Filtering and Measure Theory

Upload: hamood-riasat-khan

Post on 07-Apr-2018

TRANSCRIPT

  • 8/6/2019 Filtering and Measure Theory

    1/270

    http://www.cambridge.org/9780521838030

    Measure Theory and Filtering

    Introduction and Applications

    The estimation of noisily observed states from a sequence of data has traditionally incorporated ideas from Hilbert spaces and calculus-based probability theory. As conditional expectation is the key concept, the correct setting for filtering theory is that of a probability space. Graduate engineers, mathematicians, and those working in quantitative finance wishing to use filtering techniques will find in the first half of this book an accessible introduction to measure theory, stochastic calculus, and stochastic processes, with particular emphasis on martingales and Brownian motion. Exercises are included, solutions to which are available from www.cambridge.org. The book then provides an excellent user's guide to filtering: basic theory is followed by a thorough treatment of Kalman filtering, including recent results that extend the Kalman filter to provide parameter estimates. These ideas are then applied to problems arising in finance, genetics, and population modelling in three separate chapters, making this a comprehensive resource for both practitioners and researchers.

    Lakhdar Aggoun is Associate Professor in the Department of Mathematics and Statistics at Sultan Qaboos University, Oman.

    Robert Elliott is RBC Financial Group Professor of Finance at the University of Calgary,

    Canada.


    CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

    Editorial Board

    R. Gill (Department of Mathematics, Utrecht University)
    B. D. Ripley (Department of Statistics, University of Oxford)
    S. Ross (Department of Industrial Engineering, University of California, Berkeley)
    M. Stein (Department of Statistics, University of Chicago)
    B. Silverman (St. Peter's College, University of Oxford)

    This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

    Already published

    1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
    2. Markov Chains, by J. Norris
    3. Asymptotic Statistics, by A. W. van der Vaart
    4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
    5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
    6. Empirical Processes in M-Estimation, by Sara van de Geer
    7. Numerical Methods of Statistics, by John F. Monahan
    8. A User's Guide to Measure Theoretic Probability, by David Pollard
    9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
    10. Data Analysis and Graphics Using R, by John Maindonald and John Braun
    11. Statistical Models, by A. C. Davison
    12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll
    13. Exercises in Probability, by Loïc Chaumont and Marc Yor


    Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

    Cambridge University Press
    The Edinburgh Building, Cambridge, UK

    First published in print format 2004

    © Cambridge University Press 2004

    Information on this title: www.cambridge.org/9780521838030

    This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

    Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

    Published in the United States of America by Cambridge University Press, New York

    www.cambridge.org

    eBook (EBL)
    hardback

    Contents

    Preface

    Part I  Theory

    1  Basic probability concepts
    1.1  Random experiments and probabilities
    1.2  Conditional probabilities and independence
    1.3  Random variables
    1.4  Conditional expectations
    1.5  Problems

    2  Stochastic processes
    2.1  Definitions and general results
    2.2  Stopping times
    2.3  Discrete time martingales
    2.4  Doob decomposition
    2.5  Continuous time martingales
    2.6  Doob-Meyer decomposition
    2.7  Brownian motion
    2.8  Brownian motion process with drift
    2.9  Brownian paths
    2.10  Poisson process
    2.11  Problems

    3  Stochastic calculus
    3.1  Introduction
    3.2  Quadratic variations
    3.3  Simple examples of stochastic integrals
    3.4  Stochastic integration with respect to a Brownian motion
    3.5  Stochastic integration with respect to general martingales
    3.6  The Ito formula for semimartingales
    3.7  The Ito formula for Brownian motion
    3.8  Representation results
    3.9  Random measures
    3.10  Problems

    4  Change of measures
    4.1  Introduction
    4.2  Measure change for discrete time processes
    4.3  Girsanov's theorem
    4.4  The single jump process
    4.5  Change of parameter in Poisson processes
    4.6  Poisson process with drift
    4.7  Continuous-time Markov chains
    4.8  Problems

    Part II  Applications

    5  Kalman filtering
    5.1  Introduction
    5.2  Discrete-time scalar dynamics
    5.3  Recursive estimation
    5.4  Vector dynamics
    5.5  The EM algorithm
    5.6  Discrete-time model parameter estimation
    5.7  Finite-dimensional filters
    5.8  Continuous-time vector dynamics
    5.9  Continuous-time model parameters estimation
    5.10  Direct parameter estimation
    5.11  Continuous-time nonlinear filtering
    5.12  Problems

    6  Financial applications
    6.1  Volatility estimation
    6.2  Parameter estimation
    6.3  Filtering a price process
    6.4  Parameter estimation for a modified Kalman filter
    6.5  Estimating the implicit interest rate of a risky asset

    7  A genetics model
    7.1  Introduction
    7.2  Recursive estimates
    7.3  Approximate formulae

    8  Hidden populations
    8.1  Introduction
    8.2  Distribution estimation
    8.3  Parameter estimation
    8.4  Pathwise estimation
    8.5  A Markov chain model
    8.6  Recursive parameter estimation
    8.7  A tags loss model
    8.8  Gaussian noise approximation

    References
    Index


    Preface

    Traditional courses for engineers in filtering and signal processing have been based on elementary linear algebra, Hilbert space theory and calculus. However, the key objective underlying such procedures is the (recursive) estimation of indirectly observed states given observed data. This means that one is discussing conditional expected values, given the observations. The correct setting for conditional expected value is in the context of measurable spaces equipped with a probability measure, and the initial object of this book is to provide an overview of required measure theory. Secondly, conditional expectation, as an inverse operation, is best formulated as a form of Bayes' Theorem. A mathematically pleasing presentation of Bayes' Theorem is to consider processes as being initially defined under a reference probability. This is an idealized probability under which all the observations are independent and identically distributed. The reference probability is a much nicer measure under which to work. A suitably defined change of measure then transforms the distribution of the observations to their real world form. This setting for the derivation of the estimation and filtering results enables more general results to be obtained in a transparent way.

    The book commences with a leisurely and intuitive introduction to σ-fields and the results in measure theory that will be required.

    The first chapter also discusses random variables, integration and conditional expectation.

    Chapter 2 introduces stochastic processes, with particular emphasis on martingales and

    Brownian motion.

    Stochastic calculus is developed in Chapter 3 and techniques related to changing probability measures are described in Chapter 4.

    The change of measure method is the basic technique used in this book.

    The second part of the book commences with a treatment of Kalman filtering in

    Chapter 5. Recent results, which extend the Kalman filter and enable parameter estimates

    to be obtained, are included. These results are applied to financial models in Chapter 6. The

    final two chapters give some filtering applications to genetics and population models.

    The authors would like to express their gratitude to Professor Nadjib Bouzar of the

    Department of Mathematics and Computer Science, University of Indianapolis, for the

    incredible amount of time he spent reading through the whole manuscript and making

    many useful suggestions.

    Robert Elliott would like to acknowledge the support of NSERC and the hospitality of

    the Department of Applied Mathematics at the University of Adelaide, South Australia.


    Lakhdar Aggoun would like to acknowledge the support of the Department of Mathematics and Statistics, Sultan Qaboos University, Al-Khoud, Sultanate of Oman; the hospitality

    of the Department of Mathematical Sciences at the University of Alberta, Canada; and the

    Haskayne School of Business, University of Calgary, Calgary, Canada.


    Part I

    Theory


    1

    Basic probability concepts

    1.1 Random experiments and probabilities

    An experiment is random if its outcome cannot be predicted with certainty. A simple example is the throwing of a die. This experiment can result in any of six unpredictable outcomes 1, 2, 3, 4, 5, 6, which we list in what is usually called a sample space Ω = {1, 2, 3, 4, 5, 6}. Another example is the amount of yearly rainfall in each of the next 10 years in Auckland. Each outcome here is an ordered set containing ten nonnegative real numbers (a vector in IR^10_+); however, one has to wait 10 years before observing the outcome ω.

    Another example is the following. Let Xt be the water level of a dam at time t. If we are interested in the behavior of Xt during an interval of time [t0, t1] say, then it is necessary to consider simultaneously an uncountable family of Xt's, that is,

    Ω = {Xt : 0 ≤ Xt < ∞, t0 ≤ t ≤ t1}.

    The smallest observable outcome of an experiment is called simple. The set {1} containing 1 resulting from a throw of a die is simple. The outcome "odd number" is not simple and it occurs if and only if the throw results in any of the three simple outcomes 1, 3, 5. If the throw results in a 5, say, then the same throw also results in "a number larger than 3" or "odd number". Sets containing outcomes are called events. The events "odd number" and "a number larger than 3" are not mutually exclusive, that is, both can happen simultaneously, so that we can define the event "odd number and a number larger than 3".

    The event "odd number and even number" is clearly impossible or empty. It is called the impossible event and is denoted, in analogy with the empty set in set theory, by ∅. The event "odd number or even number" occurs no matter what the outcome ω is. It is Ω itself and is called the certain event.

    In fact possible events of the experiment can be combined naturally using the set operations union, intersection, and complementation. This leads to the concept of field or algebra (σ-field (sigma-field) or σ-algebra, respectively) which is of fundamental importance in the theory of probability.


    A nonempty class F of subsets of a nonempty set Ω is called a field or algebra if

    1. Ω ∈ F,
    2. F is closed under finite unions (or finite intersections),
    3. F is closed under complementation.

    It is a σ-field (or σ-algebra) if the stronger condition

    2′. F is closed under countable unions (or countable intersections)

    holds.
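    On a finite Ω these axioms can be checked mechanically. A minimal sketch in Python (the helper `is_field` is ours, not the book's); note that on a finite Ω a field is automatically a σ-field, since only finitely many distinct unions exist:

```python
def is_field(omega, F):
    """Check the field (algebra) axioms for a collection F of subsets of omega."""
    F = {frozenset(s) for s in F}        # frozensets can be members of a set
    omega = frozenset(omega)
    if omega not in F:                   # axiom 1: Omega belongs to F
        return False
    for A in F:
        if omega - A not in F:           # axiom 3: closed under complements
            return False
        for B in F:
            if A | B not in F:           # axiom 2: closed under finite unions
                return False
    return True

omega = {1, 2, 3, 4, 5, 6}
F1 = [set(), omega, {1, 2}, {3, 4, 5, 6}]          # generated by {1,2} and its complement
print(is_field(omega, F1))                         # True
print(is_field(omega, [set(), omega, {1, 2}]))     # False: complement of {1,2} missing
```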

    If F is a σ-field the pair (Ω, F) is called a measurable space. The sets B ∈ F are called events and are said to be measurable sets.

    For instance, the collection of finite unions of the half open intervals (a, b], . . .

    . . . a filtration {Ft, t ≥ 0} is right-continuous if Ft = ⋂_{ε>0} F_{t+ε} = F_{t+}. We may also say that a filtration {Ft, t ≥ 0} is right-continuous if new information at time t arrives precisely at time t and not an instant after t.

    It is left-continuous if {Ft} contains events strictly prior to t, that is Ft = σ(⋃_{s<t} Fs).


    It is easily seen that if Ω is finite we need only specify P on the atoms of F.

    The triple (Ω, F, P) is called a probability space.

    Nonempty events which are unlikely to occur and to which a zero probability is assigned are called negligible events or null events.

    A σ-field F is P-complete if all subsets of null events are also events. Of course, their probability is zero.

    A filtration is complete if F0 is complete, i.e. all the null events are known at the initial time.

    The mathematical object (Ω, F, Ft, P), where the filtration {Ft, t ≥ 0} is right-continuous and complete, is sometimes called a stochastic basis or a filtered probability space.

    The filtration {Ft, t ≥ 0} is said to satisfy the usual conditions if it is right-continuous and complete. For monotonic sequences of events we have the following result on continuity of probability measures.

    Theorem 1.1.3  Let (Ω, F, P) be a probability space. If {An} is an increasing sequence of events with limit A, then

    P(An) ↑ P(A),

    and if {Bn} is a decreasing sequence of events with limit B, then

    P(Bn) ↓ P(B).

    Proof  To prove the first statement, visualize the sequence {An} as a sequence of increasing concentric disks and then define the sequence of disjoint rings {Rn} (except for R1, which is the disk A1):

    R1 = A1, R2 = A2 \ A1, . . . , Rn = An \ An−1.

    Note that

    Ak = ⋃_{n=1}^{k} Rn,  A = ⋃_{n=1}^{∞} An = ⋃_{n=1}^{∞} Rn,

    so that by σ-additivity

    P(A) = Σ_{n=1}^{∞} P(Rn) = lim_{k→∞} Σ_{n=1}^{k} P(Rn) = lim_{k→∞} P(⋃_{n=1}^{k} Rn) = lim_{k→∞} P(Ak).

    The proof of the second statement follows by considering the sequence of complementary events {B̄n}, which is increasing with limit B̄, so that

    1 − P(Bn) ↑ 1 − P(B), that is P(Bn) ↓ P(B).
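    The monotone continuity in Theorem 1.1.3 can be illustrated numerically. A small sketch, assuming the measure P({k}) = 2^−k on the positive integers, with An = {1, . . . , n} increasing to A = {1, 2, . . . }:

```python
# P({k}) = 2**-k on Omega = {1, 2, 3, ...};  A_n = {1, ..., n} increases to Omega.
def P(event):
    return sum(2.0 ** -k for k in event)

for n in (1, 5, 10, 20):
    print(n, P(range(1, n + 1)))    # 0.5, 0.96875, ... increasing to 1 = P(A)
```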

    Example 1.1.4  Consider the experiment of tossing a fair coin infinitely many times and observing the outcomes of all tosses. Here each ω ∈ (H, T)^∞ is a countably infinite sequence of Heads and Tails. If we denote Heads and Tails by 0 and 1, each ω is a sequence of 0's and 1's and it can be shown that there are as many ω's as there are points in the interval [0, 1)!


    Suppose we wish to estimate the probability of the event consisting of those ω's for which the proportion of heads converges to 1/2. The so-called Strong Law of Large Numbers says that this probability is equal to one, i.e. the ω's for which the convergence to 1/2 does not hold form a negligible set. However, this negligible set is rather huge, as can be imagined!
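    The Strong Law can be illustrated (though of course not proved) by simulation; a sketch with an assumed pseudo-random fair coin:

```python
import random

random.seed(0)
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))   # n fair-coin tosses
proportion = heads / n
print(proportion)   # close to 1/2
```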

    Example 1.1.5  In Example 1.1.4 let Fn,S be the collection of infinite sequences of H's and T's with some restriction S put on the first n tosses. For instance, if n = 3,

    S = {H H T . . . , H T H . . . , T H H . . . } ⊂ (H, T)^3,

    F3,S is the collection of infinite sequences of H's and T's for which the first three entries contain exactly two H's. It is left as an exercise to show that the class F = {Fn,S, S ⊂ (H, T)^n, n ∈ IN} is a field.

    We now quote without proof from [4] the following result on extending a function P defined on sets in a field.

    Theorem 1.1.6 ([4])  If P is a probability measure on a field A, then it can be extended uniquely to the σ-field F = σ{A} generated by A, i.e. the restriction of the extension measure to the field A is P itself and by tradition they are both denoted by P.

    Let us return to the coin-tossing situation of Example 1.1.5. Using the extension theorem (Theorem 1.1.6) one can construct a (unique) probability measure P, called the product probability measure, on the space ((H, T)^∞, F), starting from an initial probability (p(H), p(T)) = (1/2, 1/2) by setting

    P(Fn,S) = Σ_S (1/2)^n = (number of infinite sequences in S) × (1/2)^n.

    It is left as an exercise to show that P does not depend on the representations of sets in F and that it is countably additive. (See [4].)

    An immediate generalization of the coin tossing experiment in Example 1.1.5 is to consider an infinite sequence of independent experiments, to which corresponds an infinite sequence of probability spaces (Ω1, F1, P1), (Ω2, F2, P2), . . . . We are interested in the space Ω = Ω1 × Ω2 × · · · of all infinite sequences ω = (ω1, ω2, . . . ). Events of interest are again cylinder sets, i.e. infinite sequences with restrictions put on the first n outcomes. The collection of all these cylinders forms a field which generates a σ-field F, often denoted F1 × F2 × · · · . A probability measure P can be defined on cylinder sets then extended uniquely to F using the Extension Theorem 1.1.6.

    In the coin-tossing experiment, an example of an event which is in F is the event F that a Head will occur. Clearly, F = ⋃_{k=1}^{∞} Fk, where Fk is the event that a Head occurs on the k-th trial and not before. Since each Fk is a cylinder set, P(Fk) is well defined for each


    k ≥ 1. Moreover the Fk's are pairwise disjoint, hence

    P(F) = Σ_{k=1}^{∞} P(Fk) = Σ_{k=1}^{∞} 1/2^k = 1.

    Note that this probability is still 1 regardless of the size of the probability of occurrence of a Head (as long as it is not 0).
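    The computation P(F) = Σ P(Fk) = 1 can be checked numerically, also for a biased coin with Head-probability p, where P(Fk) = (1 − p)^{k−1} p; a truncated sum stands in for the infinite series:

```python
# F_k = "first Head on toss k" has probability (1 - p)**(k - 1) * p; summing
# over k approximates P(a Head ever occurs), which is 1 for any p > 0.
def prob_head_ever(p, terms=10_000):
    return sum((1 - p) ** (k - 1) * p for k in range(1, terms + 1))

for p in (0.5, 0.1, 0.01):
    print(p, prob_head_ever(p))    # each sum is (numerically) 1.0
```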

    Modeling with infinite sample spaces is not a mathematical fantasy. In many very simple-minded problems infinite sequences of outcomes cannot be avoided. For example, the event "the first time a Head occurs" cannot be described in a finite sample space model because the number of trials before it occurs cannot be bounded in advance.

    In general, it is impossible to define a probability measure on all the subsets of an infinite sample space; that is, one cannot say any subset is an event. However, consider the following case.

    Example 1.1.7  Suppose that Ω is countable and let F be the σ-field 2^Ω. Then it is not difficult to define a probability measure on F. Choose P such that

    0 ≤ P({ω}) ≤ 1 and Σ_{ω∈Ω} P({ω}) = P(Ω) = 1,

    and for any F ∈ F, define P(F) = Σ_{ω∈F} P({ω}). Let {Fn}, n ∈ IN, be a sequence of disjoint sets in F and let ωn,m denote the simple events in Fn. Since we have an infinite series of nonnegative numbers,

    P(⋃_n Fn) = Σ_{n,m} P(ωn,m) = Σ_n Σ_m P(ωn,m) = Σ_n P(Fn).

    1.2 Conditional probabilities and independence

    Given a probability space (,F, P) and some event B with P(B)

    =0, we define a new

    posteriorprobability measure as follows. If A is any event we define the probability of A

    given B as

    P(A | B) = P(A and B)P(B)

    = P(A B)P(B)

    ,

    provided P(B) > 0. Otherwise P(A | B) is left undefined.What we mean by given event B is that we know that event B has occurred, that is we

    know that B, so that we no longer assign the same probabilities given by P to eventsbut assign new, or updated, probabilities given by the probability measure P(. | B). Anyevent which is mutually exclusive with B has probability zero under P(. | B) and the newprobability space is now (B,F B, P(. | B)).

    If our observation is limited to knowing whether event B has occurred or not we may as

    well define P(. | B), where B is the complement of B within . Prior to knowing wherethe outcome is we define the, now random, quantity:

    P(. | B or B)() = P(. | {B})() = P(. | B)IB () + P(. | B)IB ().


    This definition extends in an obvious way to a σ-field G generated by a finite or countable partition {B1, B2, . . . } of Ω, and the random variable P(· | G)(ω) is called the conditional probability given G. The random function P(· | G)(ω), whose values on the atoms Bi are the ordinary conditional probabilities P(· | Bi) = P(· ∩ Bi)/P(Bi), is not defined if P(Bi) = 0. In this case we have a family of functions P(· | G)(ω), one for each possible arbitrary value assigned to the undefined P(· | Bi). Usually, one version is chosen and different versions differ only on a set of probability 0.

    Example 1.2.1  Phone calls arrive at a switchboard between 8:00 a.m. and 12:00 p.m. according to the following probability distribution:

    1. P(k calls within an interval of length l) = e^{−λl} (λl)^k / k!;
    2. If I1 and I2 are disjoint intervals,

    P((k1 calls within I1) ∩ (k2 calls within I2)) = P(k1 calls within I1) P(k2 calls within I2),

    that is, events occurring within disjoint time intervals are independent.

    Suppose that the operator wants to know the probability that 0 calls arrive between 8:00 and 9:00 given that the total number of calls from 8:00 a.m. to 12:00 p.m., N812, is known. From past experience, the operator assumes that this number is near 30 calls, say. Hence

    P(0 calls within [8, 9) | 30 calls within [8, 12])
    = P((0 calls within [8, 9)) ∩ (30 calls within [9, 12])) / P(30 calls within [8, 12])
    = P(0 calls within [8, 9)) P(30 calls within [9, 12]) / P(30 calls within [8, 12])
    = (3/4)^30,

    which can be written as

    P(0 calls within [8, 9) | N812 = N) = (3/4)^N.     (1.2.1)
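    The cancellation that produces (3/4)^30 can be verified numerically. In the sketch below the hourly rate `lam` is an assumed value; the point is that the ratio does not depend on it:

```python
from math import exp, factorial

def poisson(k, rate):
    """P(k calls in an interval whose expected number of calls is `rate`)."""
    return exp(-rate) * rate ** k / factorial(k)

lam = 7.5      # assumed calls-per-hour rate; the final ratio does not depend on it
N = 30
# P(0 calls in [8,9) and 30 calls in [8,12]) = P(0 in [8,9)) * P(30 in [9,12])
numer = poisson(0, lam) * poisson(N, 3 * lam)
denom = poisson(N, 4 * lam)
print(numer / denom)       # equals (3/4)**30
print((3 / 4) ** N)
```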

    Remarks 1.2.2  Consider again Example 1.2.1.

    1. The events Fi = {ω : N812(ω) = i}, i = 0, 1, . . . form a partition of Ω and are the atoms of the σ-field generated by observing only N812, so we may write:

    P(0 calls within [8, 9) | Fi, i ∈ IN)(ω) = P(0 calls within [8, 9) | σ{Fi, i ∈ IN})(ω) = Σ_i (3/4)^i I_{Fi}(ω).

    2. Observe that since each event F ∈ σ{Fi, i ∈ IN} is a union of some Fi1, Fi2, . . . , and since we know, at the end of the experiment, which Fj contains ω, then we know


    whether or not ω lies in F, that is whether F or the complement of F has occurred. In this sense, σ{Fi, i ∈ IN} is indeed all we can answer about the experiment from what we know.

    The likelihood of occurrence of any event A could be affected by the realization of B. Roughly speaking, if the proportion of A within B is the same as the proportion of A within Ω then it is intuitively clear that P(A | B) = P(A | Ω) = P(A). Knowing that B has occurred does not change the prior probability P(A). In that case we say that events A and B are independent. Therefore two events A and B are independent if and only if

    P(A ∩ B) = P(A)P(B).

    Two σ-fields F1 and F2 are independent if and only if P(A1 ∩ A2) = P(A1)P(A2) for all A1 ∈ F1, A2 ∈ F2.

    If events A and B are independent so are σ{A} and σ{B}, because the impossible event ∅ is independent of everything else including itself, and so is Ω. Also A and B̄, Ā and B, Ā and B̄ are independent. We can say a bit more: if P(E) = 0 or P(E) = 1 then the event E is independent of any other event including E itself, which seems intuitively clear.

    Mutually exclusive events with positive probabilities provide a good example of dependent events.

    Example 1.2.3  In the die throwing experiment the σ-fields

    F1 = σ{{1, 2}, {3, 4, 5, 6}}

    and

    F2 = σ{{1, 2}, {3, 4}, {5, 6}}

    are not independent, since if we know, for instance, that ω has landed in {5, 6} (or equivalently {5, 6} has occurred) in F2 then we also know that the event {3, 4, 5, 6} in F1 has occurred. This fact can be checked by direct calculation using the definition. However, the σ-fields

    F3 = σ{{1, 2, 3}, {4, 5, 6}}

    and

    F4 = σ{{1, 4}, {2, 5}, {3, 6}}

    are independent. The occurrence of any event in either F3 or F4 does not provide any nontrivial information about the occurrence of any (nontrivial) event in the other field.
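    Both claims in Example 1.2.3 can be checked by the direct calculation mentioned. A sketch using exact rational arithmetic; for σ-fields generated by finite partitions, independence holds iff it holds on the generating atoms:

```python
from fractions import Fraction

P = lambda A: Fraction(len(A), 6)       # fair-die probability of an event A

def atoms_independent(part1, part2):
    """For sigma-fields generated by finite partitions, independence holds
    iff P(A & B) == P(A) * P(B) for every pair of generating atoms."""
    return all(P(A & B) == P(A) * P(B) for A in part1 for B in part2)

F1 = [{1, 2}, {3, 4, 5, 6}]
F2 = [{1, 2}, {3, 4}, {5, 6}]
F3 = [{1, 2, 3}, {4, 5, 6}]
F4 = [{1, 4}, {2, 5}, {3, 6}]
print(atoms_independent(F1, F2))   # False
print(atoms_independent(F3, F4))   # True
```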

    Another fundamental concept of probability theory is conditional independence. Events A and C are said to be conditionally independent given event B if

    P(A ∩ C | B) = P(A | B)P(C | B), P(B) > 0.

    The following example shows that it is not always easy to decide, under a probability

    measure, if conditional independence holds or not between events.

    Example 1.2.4  Consider the following two events:

    A1 = "person 1 is going to watch a football game next weekend",
    A2 = "person 2, with no relation at all to person 1, is going to watch a football game next weekend".


    There is no reason to doubt the independence of A1 and A2 in our model. However, consider now the event B = "next weekend's weather is good". Suppose that

    P(A1 | B) = .90, P(A2 | B) = .95, P(A1 | B̄) = .40,
    P(A2 | B̄) = .30, P(B) = .75 and P(B̄) = .25.

    Using this information it can be checked that P(A1 ∩ A2) ≠ P(A1)P(A2). The reason is that event B has linked events A1 and A2 in the sense that if we knew that A1 has occurred the probability of B should be high, resulting in the probability of A2 increasing.
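    The arithmetic behind Example 1.2.4 can be carried out explicitly if we additionally assume that A1 and A2 are conditionally independent given B and given B̄ (a modelling assumption not stated in the example, but consistent with its intent):

```python
# Assumption: A1 and A2 are conditionally independent given B and given ~B.
pA1_B, pA2_B = 0.90, 0.95      # P(A1 | B), P(A2 | B)
pA1_Bc, pA2_Bc = 0.40, 0.30    # P(A1 | ~B), P(A2 | ~B)
pB, pBc = 0.75, 0.25

pA1 = pA1_B * pB + pA1_Bc * pBc                       # total probability
pA2 = pA2_B * pB + pA2_Bc * pBc
pA1A2 = pA1_B * pA2_B * pB + pA1_Bc * pA2_Bc * pBc    # conditional independence

print(pA1A2)        # ≈ 0.67125
print(pA1 * pA2)    # ≈ 0.6103125 -> A1 and A2 are not independent
```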

    The independence concept extends to arbitrary families of events. A family of events {Aα, α ∈ I} is said to be a family of independent events if and only if any finite subfamily is independent, i.e., for any finite subset of indices {i1, i2, . . . , ik} ⊂ I,

    P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik) = P(Ai1)P(Ai2) . . . P(Aik).

    A family of σ-fields {Fα, α ∈ I} is said to be a family of independent σ-fields if and only if any finite subfamily {Fi1, Fi2, . . . , Fik} is independent; that is, if and only if any collection of events of the form {Ai1 ∈ Fi1, Ai2 ∈ Fi2, . . . , Aik ∈ Fik} is independent.

    An extremely powerful and standard tool in proving properties which are true with probability one is the Borel-Cantelli Lemma. This lemma concerns sequences of events.

    Let {An} be a monotone decreasing sequence of events, i.e.

    A1 ⊃ A2 ⊃ · · · ⊃ An ⊃ An+1 ⊃ . . . ;

    then by definition

    lim_{n→∞} An = ⋂_{n=1}^{∞} An.

    Let {Bn} be a monotone increasing sequence of events, i.e.

    B1 ⊂ B2 ⊂ · · · ⊂ Bn ⊂ Bn+1 ⊂ . . . ;

    then by definition

    lim_{n→∞} Bn = ⋃_{n=1}^{∞} Bn.

    Let {Cn} be an arbitrary sequence of events. Define

    An = sup_{k≥n} Ck = ⋃_{k=n}^{∞} Ck,

    and

    Bn = inf_{k≥n} Ck = ⋂_{k=n}^{∞} Ck.

    Event An occurs if and only if at least one of the events Cn, Cn+1, . . . occurs, and event Bn occurs if and only if all of the events Cn, Cn+1, . . . occur simultaneously.


    By construction, An and Bn are monotone: An is decreasing and Bn is increasing, so that

    A = lim_{n→∞} An = ⋂_{n=1}^{∞} An = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck,

    and

    B = lim_{n→∞} Bn = ⋃_{n=1}^{∞} Bn = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Ck.

    Event A = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck = lim sup Cn occurs if and only if infinitely many Cn occur, or Cn occurs infinitely often (Cn i.o.). To see this, suppose that ω belongs to an infinite number of the Cn's; then for every n, ω ∈ ⋃_{k=n}^{∞} Ck. Therefore ω ∈ ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck. Conversely, if ω belongs to only a finite number of the Cn's, then there is some n0 such that ω ∉ ⋃_{k=n0}^{∞} Ck. Since ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck ⊂ ⋃_{k=n0}^{∞} Ck, this shows that ω ∉ ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck if ω belongs to only a finite number of the Cn's.

    Event B = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Ck = lim inf Cn occurs if and only if all but a finite number of the Cn occur.

    Clearly lim inf Cn ⊂ lim sup Cn. Consider the following simple example of sequences of events.

    Example 1.2.5  Let A and B be any subsets of Ω and define the sequences C2n = A and C2n+1 = B. Then:

    lim sup Cn = A ∪ B, lim inf Cn = A ∩ B.
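    The sets lim sup Cn and lim inf Cn in Example 1.2.5 can be computed over a finite horizon; truncating the infinite unions and intersections at N is harmless here only because the sequence is periodic. A sketch:

```python
# Truncate the infinite unions/intersections at a horizon N; this matches the
# true limsup/liminf here only because the sequence C_n is periodic.
N = 50

def limsup(events):
    return set.intersection(*[set.union(*[events(k) for k in range(n, N)])
                              for n in range(N - 1)])

def liminf(events):
    return set.union(*[set.intersection(*[events(k) for k in range(n, N)])
                       for n in range(N - 1)])

A, B = {1, 2, 3}, {3, 4}
C = lambda n: A if n % 2 == 0 else B     # C_{2n} = A, C_{2n+1} = B
print(limsup(C))    # A | B = {1, 2, 3, 4}
print(liminf(C))    # A & B = {3}
```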

    Example 1.2.6  Let

    Ck = {(x, y) ∈ IR^2 : 0 ≤ x < k, 0 ≤ y < . . . } . . .

    . . . for every ε > 0,

    μ({ω : |f(ω)| ≥ ε}) ≤ (1/ε^p) ∫_Ω |f(ω)|^p dμ(ω).

    Proof  Let F = {ω : |f(ω)| ≥ ε}. Then

    ∫_Ω |f(ω)|^p dμ(ω) ≥ ∫_F |f(ω)|^p dμ(ω) ≥ ε^p ∫_F dμ = ε^p μ(F).
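    The inequality can be sanity-checked on simulated data. In fact, applied to the empirical measure of a sample the same termwise argument shows the empirical version holds exactly, so the assertions below never fail:

```python
import random

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]

def lhs(eps):
    """Empirical P(|f| >= eps)."""
    return sum(abs(x) >= eps for x in samples) / len(samples)

def rhs(eps, p):
    """Empirical (1/eps**p) * E[|f|**p]."""
    return sum(abs(x) ** p for x in samples) / len(samples) / eps ** p

for eps in (0.5, 1.0, 2.0):
    for p in (1, 2):
        assert lhs(eps) <= rhs(eps, p)
print("inequality holds on the sample")
```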

    In addition to almost sure convergence, which was defined in Example 1.3.11, we have

    the following types of convergence.


    First recall that L^p(Ω, F, P), p ≥ 1, is the space of random variables with finite absolute p-th moments, that is, E[|X|^p] < ∞.

    {Xk} converges to X in L^p (Xk →^{Lp} X), (0 < p < ∞), if

    E[|Xk − X|^p] → 0 (k → ∞).

    Let Fn(x) = P[Xn ≤ x], F(x) = P[X ≤ x]. Xn converges in distribution to X (Xn →^D X) if

    ∫_IR g(x) dFn(x) → ∫_IR g(x) dF(x),

    for every real valued, continuous bounded function g defined on IR. A necessary and sufficient condition for that is:

    Fn(x) → F(x),

    at every continuity point x of F [7].

    These convergence concepts are in the following relationship to each other:

    (Xk →^{a.s.} X) ⇒ (Xk →^P X) ⇒ (Xn →^D X).

    A useful concept is the uniform integrability of a family of random variables, which permits the interchange of limits and expectations.

    Definition 1.3.34  A sequence {Xn} of random variables is said to be uniformly integrable if

    sup_n E[|Xn| I_{|Xn|>A}] → 0, (A → ∞).     (1.3.4)

    A family {Xt}, t ≥ 0, of random variables is said to be uniformly integrable if

    sup_t E[|Xt| I_{|Xt|>A}] → 0, (A → ∞).     (1.3.5)

Example 1.3.35 If L is bounded in L^p(Ω, F, P) for some p > 1, then L is uniformly integrable.

Proof Choose A so large that E[|X|^p] < A for all X ∈ L. For fixed X ∈ L, let Y = |X| I_{|X|>K}. Then Y(ω) ≥ K I_{|X|>K}(ω) ≥ 0 for all ω. Since p > 1,

Y^{p−1} ≥ K^{p−1} I_{|X|>K},

and

K^{1−p} Y^p = (K^{1−p} Y^{p−1}) Y ≥ Y I_{|X|>K} = Y.

Thus

E[Y] ≤ K^{1−p} E[Y^p] ≤ K^{1−p} E[|X|^p] ≤ K^{1−p} A,

which goes to 0 when K → ∞, from which the result follows.

Basic probability concepts

The following result is a somewhat stronger version of Fatou's Lemma 1.3.16.

Theorem 1.3.36 Let {X_n} be a uniformly integrable family of random variables. Then

E[lim inf X_n] ≤ lim inf E[X_n].

Proof The proof is left as an exercise.

Corollary 1.3.37 Let {X_n} be a uniformly integrable family of random variables such that X_n → X (a.s.). Then

E|X_n| < ∞, E(X_n) → E(X), and E|X_n − X| → 0.

The following deep result (Shiryayev [36]) gives a necessary and sufficient condition for taking limits under the expectation sign.

Theorem 1.3.38 Let 0 ≤ X_n → X and E(X_n) < ∞. Then

E(X_n) → E(X) < ∞ ⟺ the family {X_n} is uniformly integrable.

Proof The sufficiency part follows from Theorem 1.3.36. To prove the necessity, note that if x is not a point of positive probability for the distribution of the random variable X, then X_n I_{X_n …

1.4 Conditional expectations

Let X = Σ_i x_i I_{A_i} be a simple random variable on a probability space (Ω, F, P). What is the expected value of X given some event B having positive probability P(B)? Under the posterior probability measure P(· | B) this is

E[X | B] = Σ_i x_i P(X = x_i | B) = (1/P(B)) Σ_i x_i P({X = x_i} ∩ B) = (1/P(B)) E[X I_B].

E[X I_B] is the probability weighted sum of the values taken on by X in the event B. We divide the weighted sum by P(B) to obtain the weighted average.

We could write as a definition:

E[X | B] = E[X I_B] / E[I_B] = E[X I_B] / P(B).

Let X = I_C and Y = I_B. The σ-field σ(Y) is generated by the atoms B and B^c = Ω − B. To see this, consider any Borel set Γ:

Y^{−1}(Γ) = ∅ if 0 ∉ Γ and 1 ∉ Γ,
          = B^c if 0 ∈ Γ, 1 ∉ Γ,
          = B if 1 ∈ Γ, 0 ∉ Γ,
          = Ω if {0, 1} ⊆ Γ.

Hence σ(Y) = {∅, B, B^c, Ω}.

Define

E[X | Y] = E[X | σ(Y)] = E[X | atoms of σ(Y)] = E[X | B, B^c].

That is,

E[I_C | B, B^c](ω) = P(C | B, B^c)(ω) = P(C | B) I_B(ω) + P(C | B^c) I_{B^c}(ω).

Hence E[X | Y] is a function constant on the atoms of σ(Y); that is, E[X | Y] is σ(Y)-measurable.

Since E[X | Y] is a random variable its mean is:

E[E[X | Y]] = E[P(C | B) I_B(ω) + P(C | B^c) I_{B^c}(ω)] = P(C ∩ B) + P(C ∩ B^c) = P(C) = E[X].

If X is an integrable random variable and Y = Σ_i y_i I_{B_i} is a simple random variable, we write

E[X | Y] = E[X | σ(Y)] = Σ_i (E[X I_{B_i}] / P(B_i)) I_{B_i}(ω).

Hence E[X | Y] is σ(Y)-measurable and

E[E[X | Y]] = Σ_i E[X I_{B_i}] = E[X].

The expected value of E[X | Y] is the same as the expected value of X.


Let X ∈ L¹ (E|X| < ∞) be a (nonnegative, for simplicity) random variable on a probability space (Ω, F, P) and let G be a sub-σ-field of F. The probability space (Ω, G, P) is a coarsening of the original one and X is, in general, not measurable with respect to G. We seek now a G-measurable random variable, which we denote temporarily by X_G, that assumes, on average, the same values as X. That is, we seek an integrable random variable X_G such that X_G is G-measurable and

∫_A X_G dP = ∫_A X dP, for all A ∈ G.

Now the set function Q(A) = ∫_A X dP is a measure absolutely continuous with respect to P, so that the Radon–Nikodym Theorem 1.3.25 guarantees the existence of a G-measurable random variable, suggestively denoted by E(X | G) and uniquely determined except on an event of probability zero, such that

∫_A X dP = ∫_A E[X | G] dP,

for all A ∈ G. We say that X_G is a version of E(X | G). For a general integrable random variable X we define E[X | G] as E[X⁺ | G] − E[X⁻ | G].

Remark 1.4.1 Let (Ω, F, P) be given, and suppose X is an L² random variable (measurable with respect to F). Let G be a sub-σ-algebra of F; that is, G is less informative than F. A natural question is: by observing only G, how much can we learn about X? Or, among all random variables which are G-measurable, which one gives us the best information (in the mean square sense) about the random variable X? It turns out that E[X | G] is the closest (G-measurable) random variable to X. This is seen by considering, for any G-measurable random variable Y,

Z = X − E[X | G].

Then:

E[(Z − Y)²] = E[(X − E[X | G])² + Y² − 2Y(X − E[X | G])] = E[E[(X − E[X | G])² | G]] + E[Y²],

since the cross term satisfies E[Y(X − E[X | G])] = E[Y E[X − E[X | G] | G]] = 0. This is minimized when Y = 0 a.s.

Example 1.4.2 Let Ω = (0, 1], X(ω) = ω, P be Lebesgue measure, and consider the σ-field

G = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]} = σ{A_1, A_2, A_3, A_4}.

E[X | G] must be constant on the atoms of G, so that

E[X | G](ω) = Σ_i x_i I_{A_i}(ω), where x_i = E[X I_{A_i}] / P(A_i).

Clearly P(A_i) = 1/4 and E[X I_{A_i}] = ∫_{A_i} x dx.


Hence

E[X | G](ω) = (1/8) I_{A_1}(ω) + (3/8) I_{A_2}(ω) + (5/8) I_{A_3}(ω) + (7/8) I_{A_4}(ω),

which is a G-measurable random variable: on each atom, E[X | G] is the midpoint of the interval.
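The computation can be replayed exactly with rational arithmetic. This small sketch (not from the book) recovers the four atom values and checks the tower property E[E[X | G]] = E[X] = 1/2.

```python
from fractions import Fraction

def cond_exp_on_atom(i):
    """x_i = E[X I_{A_i}] / P(A_i) for A_i = ((i-1)/4, i/4], X(omega) = omega."""
    lo, hi = Fraction(i - 1, 4), Fraction(i, 4)
    integral = (hi ** 2 - lo ** 2) / 2     # integral of x dx over (lo, hi]
    return integral / (hi - lo)            # P(A_i) = hi - lo = 1/4

values = [cond_exp_on_atom(i) for i in range(1, 5)]
# E[E[X | G]] = sum_i x_i P(A_i) recovers E[X] = 1/2 (the tower property).
mean = sum(values) * Fraction(1, 4)
```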

Example 1.4.3 Let X_1, X_2 and X_3 be three independent, identically distributed (i.i.d.) random variables such that

P(X_i = 1) = p = 1 − P(X_i = 0) = 1 − q.

Let S = X_1 + X_2 + X_3. Suppose that we observe X_1 and X_2 and we wish to find the (conditional) probability that S = 2 given X_1 and X_2. The σ-field generated by the (vector) random variable (X_1, X_2) is generated by the atoms {A_ij}, i, j = 0, 1, where A_ij = {ω : X_1(ω) = i, X_2(ω) = j}.

P(S = 2 | X_1, X_2)(ω) = P(S = 2 | σ{X_1, X_2})(ω)
= Σ_{i,j=0,1} P(S = 2 | A_ij) I_{A_ij}(ω)
= Σ_{i,j=0,1} [P({S = 2} ∩ A_ij) / P(A_ij)] I_{A_ij}(ω)
= Σ_{i,j=0,1} [P(i + j + X_3 = 2) P(A_ij) / P(A_ij)] I_{A_ij}(ω)
= Σ_{i,j=0,1} P(X_3 = 2 − i − j) I_{A_ij}(ω)
= P(X_3 = 0) I_{A_11} + P(X_3 = 1) I_{A_10 ∪ A_01}
= q I_{A_11}(ω) + p I_{A_10 ∪ A_01}(ω).

The expected value of the (σ{X_1, X_2}-measurable) random variable P(S = 2 | X_1, X_2) is

E[q I_{A_11}(ω) + p I_{A_10 ∪ A_01}(ω)] = q P(A_11) + p[P(A_10) + P(A_01)] = P(S = 2).

Example 1.4.4 Let f ∈ L¹[0, 1], i.e. the Lebesgue integral ∫_{[0,1)} |f(x)| dx exists and is finite. Let F_n = σ{[j2^{−n}, (j + 1)2^{−n}), j = 0, . . . , 2^n − 1}. Then

E[f | F_n](ω) = Σ_{j=0}^{2^n−1} [ ( ∫_{j2^{−n}}^{(j+1)2^{−n}} f(x) dx ) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}(ω).

Theorem 1.4.5 If X is a real F-measurable random variable and if ∫_A X dP = 0 for all A ∈ F, then X = 0 a.s.


Proof Suppose X ≥ 0 and ∫_A X dP = 0 for all A ∈ F. Write A_n = {ω : X(ω) ≥ 1/n}. Then

∫_{A_n} X dP ≥ (1/n) P(A_n) ≥ 0.

But ∫_{A_n} X dP = 0, so P(A_n) = 0 for all n. Therefore,

P({X > 0}) = P(∪_n A_n) ≤ Σ_n P(A_n) = 0.

For a general random variable X, recall that X = X⁺ − X⁻, where both X⁺ and X⁻ are nonnegative.

The following is a list of classical results on conditional expectation.

1. E(X | A) is unique (a.s.).

Proof Let X_1 = E(X | A) and let X_2 be an A-measurable random variable such that

∫_A X_2 dP = ∫_A X dP,

for all A ∈ A, and let Ω_0 = {ω : X_1 > X_2} ∈ A. Hence

∫_{Ω_0} X_1 dP = ∫_{Ω_0} E(X | A) dP = ∫_{Ω_0} X dP,

and

∫_{Ω_0} X_2 dP = ∫_{Ω_0} X dP,

so that

∫_{Ω_0} X_1 dP = ∫_{Ω_0} X_2 dP, or ∫_{Ω_0} (X_1 − X_2) dP = 0.

Using Theorem 1.4.5, X_1 = X_2 a.s.

2. If A_1 and A_2 are two sub-σ-fields of F such that A_1 ⊆ A_2, then

E(E(X | A_1) | A_2) = E(E(X | A_2) | A_1) = E(X | A_1). (1.4.1)

Proof Since E(X | A_1) is A_2-measurable, E(E(X | A_1) | A_2) = E(X | A_1). Now E(E(X | A_2) | A_1) is A_1-measurable and for A ∈ A_1,

∫_A E(E(X | A_2) | A_1) dP = ∫_A E(X | A_2) dP = ∫_A X dP = ∫_A E(X | A_1) dP.

Hence E(E(X | A_2) | A_1) = E(X | A_1) a.s.


3. If X, Y, XY ∈ L¹ and Y is A-measurable then

E[XY | A] = Y E[X | A]. (1.4.2)

Proof It is sufficient to prove the result when X and Y are positive. If Y = I_A, A ∈ A, then for every B ∈ A

∫_B XY dP = ∫_{A∩B} X dP = ∫_{A∩B} E[X | A] dP = ∫_B I_A E[X | A] dP = ∫_B Y E[X | A] dP.

That is, E[XY | A] = Y E[X | A] if Y is an indicator function. It follows that the result is true for simple functions of sets in A and therefore for the limit of a bounded increasing sequence of such functions converging to Y.

4. If X is independent of the σ-field A, then

E(X | A) = E(X). (1.4.3)

Proof First note that E(X) is A-measurable. Now, for A ∈ A we have to show that

∫_A E(X | A) dP = ∫_A E(X) dP.

However, the left hand side is equal to E[I_A X] and the right hand side is equal to E[I_A] E[X], and their equality follows from the definition of independence of random variables.

5. Conditional expectation is a projection operation, and so

E[E[X | A] | A] = E[X | A]. (1.4.4)

Example 1.4.6 Consider the joint distribution function F(x_1, x_2) of two real valued random variables X_1, X_2 and the probability measure P on the two-dimensional Borel sets generated by the distribution function F(x_1, x_2). Suppose that P is absolutely continuous with respect to two-dimensional Lebesgue measure. Then, by the Radon–Nikodym theorem, there exists a nonnegative density function f(x_1, x_2) such that for any Borel set B:

P(B) = ∫∫ I_B(x_1, x_2) f(x_1, x_2) dx_1 dx_2.

If f(x_1, x_2) > 0 everywhere,

P(B | X_2 = x_2) = ∫_{{x_1 : (x_1, x_2) ∈ B}} f(x_1, x_2) dx_1 / ∫_{−∞}^{+∞} f(x_1, x_2) dx_1,

from which we can deduce that f(x_1, x_2) / ∫_{−∞}^{+∞} f(x_1, x_2) dx_1 is the density function of the conditional probability measure P(· | X_2 = x_2).


Example 1.4.7 Let X_1 and X_2 be two random variables with a normal joint distribution. Then their probability density function has the form

φ(x_1, x_2) = [1 / (2π σ_1 σ_2 √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] ( x̃_1² − 2ρ x̃_1 x̃_2 + x̃_2² ) },

where 0 ≤ |ρ| < 1 and x̃_i = (x_i − μ_i)/σ_i, i = 1, 2. The conditional density of X_1 given X_2 = x_2 is a normal density with mean μ_1 + ρ(σ_1/σ_2)(x_2 − μ_2) and variance Var(X_1 | X_2 = x_2) = (1 − ρ²)σ_1² < σ_1² = Var(X_1) (when ρ ≠ 0). To see this, recall that, by definition, the conditional density of X_1 given X_2 is given by

φ(x_1 | x_2) = φ(x_1, x_2) / ∫_IR φ(x_1, x_2) dx_1

= [1/(2π σ_1 σ_2 √(1 − ρ²))] exp{ −( x̃_1² − 2ρ x̃_1 x̃_2 + x̃_2² ) / (2(1 − ρ²)) } ÷ [1/(√(2π) σ_2)] exp{ −x̃_2²/2 }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −( x̃_1² − 2ρ x̃_1 x̃_2 + ρ² x̃_2² ) / (2(1 − ρ²)) }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −( x̃_1 − ρ x̃_2 )² / (2(1 − ρ²)) }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −[ x_1 − (μ_1 + ρ(σ_1/σ_2)(x_2 − μ_2)) ]² / (2σ_1²(1 − ρ²)) },

and the result follows.

Thus by conditioning on X_2 we have gained some statistical information about X_1, which results in a reduction in the variability of X_1.
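The stated conditional mean and variance can be checked numerically by integrating the conditional density on a fine grid; the parameter values below are arbitrary illustrations (a sketch, not from the book).

```python
import math

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6   # illustrative parameters

def cond_density(x1, x2):
    """phi(x1 | x2): normal with mean mu1 + rho*(s1/s2)*(x2 - mu2),
    variance (1 - rho^2)*s1^2, as derived in the example."""
    m = mu1 + rho * s1 / s2 * (x2 - mu2)
    v = (1 - rho ** 2) * s1 ** 2
    return math.exp(-(x1 - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# Riemann sums on a grid around mu1: mass, mean and variance of phi(. | x2).
x2 = 0.0
step = 0.004
xs = [mu1 + (k - 5000) * step for k in range(10001)]
w = [cond_density(x, x2) * step for x in xs]
total = sum(w)
mean = sum(x * wi for x, wi in zip(xs, w)) / total
var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / total
```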

1.5 Problems

1. Let {F_i}_{i∈I} be a family of σ-fields on Ω. Prove that ∩_{i∈I} F_i is a σ-field.

2. Let A and B be two events. Express by means of the indicator functions of A and B

I_{A∪B}, I_{A∩B}, I_{A−B}, I_{B−A}, I_{(A−B)∪(B−A)},

where A − B = A ∩ B^c.

3. Let Ω = IR and define the sequences C_{2n} = [−1, 2 + 1/(2n)) and C_{2n+1} = [−2 − 1/(2n + 1), 1). Show that

lim sup C_n = [−2, 2], lim inf C_n = [−1, 1).


4. Let Ω = (ω_1, ω_2, ω_3, ω_4) and P(ω_1) = 1/12, P(ω_2) = 1/6, P(ω_3) = 1/3, and P(ω_4) = 5/12. Let

A_n = {ω_1, ω_3} if n is odd, {ω_2, ω_4} if n is even.

Find P(lim sup A_n), P(lim inf A_n), lim sup P(A_n), and lim inf P(A_n) and compare.

5. Give a proof of Theorem 1.3.36.

6. Show that a σ-field is either finite or uncountably infinite.

7. Show that if X is a random variable, then σ{|X|} ⊆ σ{X}.

8. Show that the set B_0 of countable unions of open intervals in IR is not closed under complementation and hence is not a σ-field. (Hint: enumerate the rational numbers and choose, for each one of them, an open interval containing it. Now show that the complement of the union of all these open intervals is not in B_0.)

9. Show that the class of finite unions of intervals of the form (−∞, a], (b, c], and (d, ∞) is a field but not a σ-field.

10. Show that a sequence of random variables {X_n} converges (a.s.) to X if and only if

∀ε > 0, lim_{m→∞} P[|X_n − X| ≤ ε, ∀n ≥ m] = 1.

11. Show that if {X_k} converges (a.s.) to X then {X_k} converges to X in probability, but the converse is false.

12. Consider the probability space (IN, F, P), where IN is the set of natural numbers, F is the collection of all the subsets of IN and P({k}) = 1/2^k. Let X_k(ω) = I_{[ω=k]}. Discuss the convergence (a.s.) and in probability of X_k and show that on this particular space they are equivalent.

13. Let {X_n} be a sequence of random variables with

P[X_n = 2^n] = P[X_n = −2^n] = 1/2^n, P[X_n = 0] = 1 − 1/2^{n−1}.

Show that {X_n} converges (a.s.) to 0 but E|X_n|^p does not converge to 0.

14. Let {X_n} be a sequence of random variables with

P[X_n = n^{1/2p}] = 1/n, P[X_n = 0] = 1 − 1/n.

Show that {X_n} does not converge (a.s.) to 0 but E|X_n|^p converges to 0.

15. Suppose Q is another probability measure on (Ω, F) such that P(A) = 0 implies Q(A) = 0 (Q ≪ P). Show that P-a.s. convergence implies Q-a.s. convergence.

16. Prove that if F_1 and F_2 are independent sub-σ-fields and F_3 is coarser than F_1, then F_3 and F_2 are independent.

17. Let Ω = (ω_1, ω_2, ω_3, ω_4, ω_5, ω_6), P(ω_i) = p_i = 1/6 and the sub-σ-fields

F_1 = σ{{ω_1, ω_2}, {ω_3, ω_4, ω_5, ω_6}},
F_2 = σ{{ω_1, ω_2}, {ω_3, ω_4}, {ω_5, ω_6}}.


Show that F_1 and F_2 are not independent. What can be said about the sub-σ-fields

F_3 = σ{{ω_1, ω_2}, {ω_3}, {ω_4, ω_5, ω_6}},

and

F_5 = σ{{ω_1, ω_4}, {ω_2, ω_5}, {ω_3, ω_6}}?

18. Let Ω = {(i, j) : i, j = 1, . . . , 6} and P({i, j}) = 1/36. Define the quantity

X(ω) = Σ_{k=0}^∞ k I_{{(i,j) : i+j=k}}(ω).

Is X a random variable? Find P_X(x) = P(X = x), calculate E[X] and describe σ(X), the σ-field generated by X.

19. For the function X defined in the previous exercise, describe the random variable P(A | X), where A = {(i, j) : i odd, j even}, and find its expected value E[P(A | X)].

20. Let Ω be the unit interval (0, 1] and on it be given the following σ-fields:

F_1 = σ{(0, 1/2], (1/2, 3/4], (3/4, 1]},
F_2 = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]},
F_3 = σ{(0, 1/8], (1/8, 2/8], . . . , (7/8, 1]}.

Consider the mapping

X(ω) = x_1 I_{(0, 1/4]}(ω) + x_2 I_{(1/4, 1/2]}(ω) + x_3 I_{(1/2, 3/4]}(ω) + x_4 I_{(3/4, 1]}(ω).

Find E[X | F_1], E[X | F_2], and E[X | F_3].

21. Let Ω be the unit interval and ((0, 1], P) be the Lebesgue-measurable space, and consider the following sub-σ-fields:

F_1 = σ{(0, 1/2], (1/2, 3/4], (3/4, 1]},
F_2 = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]}.

Consider the mapping

X(ω) = ω.

Find E[E[X | F_1] | F_2], E[E[X | F_2] | F_1] and compare.

22. Consider the probability measure P on the real line such that:

P(0) = p, P((0, 1)) = q, p + q = 1,

and the random variables defined on Ω = IR,

X_1(x) = 1 + x, X_2(x) = 0 · I_{x≤0} + (1 + x) I_{0<


23. Let X_1, X_2 and X_3 be three independent, identically distributed (i.i.d.) random variables such that P(X_i = 1) = p = 1 − P(X_i = 0) = 1 − q. Find P(X_1 + X_2 + X_3 = s | X_1, X_2).

24. Let X_1, X_2 and X_3 be three random variables with multinomial distribution with parameters p_1, p_2, p_3, n, that is

P(X_1 = n_1, X_2 = n_2, X_3 = n_3) = n! p_1^{n_1} p_2^{n_2} p_3^{n_3} / (n_1! n_2! n_3!),

where n_1, n_2 and n_3 are nonnegative integers such that n_1 + n_2 + n_3 = n. Show that if n is a random variable with Poisson distribution with parameter λ then the three random variables X_1, X_2, X_3 become mutually independent with Poisson distributions.

25. On Ω = [0, 1] with P being Lebesgue measure, show that

X = x_1 I_{(0, 1/2]} + x_2 I_{(1/2, 1]} and Y = y_1 I_{(0, 1/4] ∪ (3/4, 1]} + y_2 I_{(1/4, 3/4]}

are independent.

26. Show that (see Example 1.4.4)

E[f | F_n] = Σ_{j=0}^{2^n−1} [ ( ∫_{j2^{−n}}^{(j+1)2^{−n}} f(x) dx ) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}

converges a.s. and in L¹ to f as n → ∞. In particular, if f = I_E for some Borel set E, then

Σ_{j=0}^{2^n−1} [ μ(E ∩ [j2^{−n}, (j+1)2^{−n})) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}(x) → I_E(x) a.s.,

x ∈ [0, 1]. Here μ(·) is the Lebesgue measure.


2 Stochastic processes

2.1 Definitions and general results

A stochastic process is a mathematical model for any phenomenon evolving or varying in time (or over some index set), subject to random influences. Examples include the price of a commodity observed through time, the fluctuating water level behind a dam, or the distribution of shades in a noisy image observed over a region of IR². Suppose (Ω, F) is a measurable space. We shall define a stochastic process to be a mapping X_index(ω) from {index space} × Ω into a second measurable space (E, E), called the state space, or the range space. Alternatively, we can consider a stochastic process as a family {X_t}, t ∈ {index space}, of random variables all defined on a measurable space (Ω, F).

For a fixed outcome ω, X_(·)(ω) is a function describing one possible trajectory, or sample path, followed by the process. If the time index is frozen at t, say, then we have a random variable X_t(·), i.e. an F-measurable function of ω.

When the time index t is continuous, measurability, continuity, etc. in t are considered. A continuous-time stochastic process {X_t} is said to have independent increments if for all t_0 < t_1 < t_2 < · · · < t_n, the random variables X_{t_1} − X_{t_0}, X_{t_2} − X_{t_1}, . . . , X_{t_n} − X_{t_{n−1}} are independent. If for all s, X_{t+s} − X_t has the same distribution for all t, {X_t} is said to possess stationary increments.

Sometimes, a stochastic process is interpreted as just a single random variable taking values in a space of functions; that is, with each ω is associated a function. In analogy with real random variables, the state space is then endowed with a Borel σ-field (generated by the open sets of an underlying topology).

Example 2.1.1 Let

Ω = {ω_1, ω_2, . . . },

and let the time index n be finite, 0 ≤ n ≤ N. A stochastic process X in this setting is a two-dimensional array or matrix such that:

X =
X_1(ω_1) X_1(ω_2) . . .
X_2(ω_1) X_2(ω_2) . . .
. . . . . . . . .
X_N(ω_1) X_N(ω_2) . . .


Each row represents a random variable and each column is a sample path or a realization of the stochastic process X. If the time index is unbounded, each sample path is given by an infinite sequence.

Example 2.1.2 Let N = 4 in the previous example and suppose that X is given by the following array.

    2 3 5 7 11 3 2.3 1

    1 1 5.7

    2 3 6 83 19

    11 7 70 3 2 5 2 215 3 2 1 0 1 2 3

The sample space of {X_n} is IR⁴ and the stochastic process can be thought of as a mapping (in fact a random variable)

ω_i → X(ω_i) = (X_1(ω_i), . . . , X_4(ω_i)) = (x^i_1, x^i_2, x^i_3, x^i_4) = x^i ∈ IR⁴.

The random variable X induces a probability measure P_X on the Borel σ-field B(IR⁴) in the usual way, i.e., for any B ∈ B(IR⁴),

P_X(B) = P[ω : X(ω) ∈ B] = P(X^{−1}(B)).

For instance,

B_1 = {x ∈ IR⁴ : 3 ≤ x_1 ≤ 5, 2 ≤ x_2 ≤ 7}

contains a single trajectory (column 6 in the table) so that P_X(B_1) = P(ω_6).

B_2 = {x ∈ IR⁴ : max_{1≤n≤4} x_n ≥ 7}

contains four trajectories (columns 2, 3, 4 and 6 in the table) so that P_X(B_2) = P(ω_2, ω_3, ω_4, ω_6).
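Since the table itself does not survive extraction cleanly, here is a hypothetical four-path version of the same construction (the values are invented for illustration, not the book's), showing how P_X of a Borel set is read off from the trajectories:

```python
# Hypothetical 4 x 4 table: row n holds the random variable X_n, column i the
# sample path X(omega_i); each of the four outcomes carries probability 1/4.
paths = {
    1: (2.0, 3.0, -5.0, 7.0),
    2: (1.0, -1.0, 5.7, 2.0),
    3: (3.0, 6.0, -8.0, 1.9),
    4: (0.5, 7.0, 7.0, -3.0),
}
n_omega = 4
trajectories = [tuple(paths[n][i] for n in sorted(paths)) for i in range(n_omega)]

def P_X(event):
    """P_X(B) = P{omega : X(omega) in B}, with B given as a predicate on IR^4."""
    return sum(1 for x in trajectories if event(x)) / n_omega

# B = {x in IR^4 : max_n x_n >= 7} is hit by three of the four trajectories.
pB = P_X(lambda x: max(x) >= 7)
```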

Example 2.1.3 Let Ω = {ω_1, ω_2, . . . } and P be a probability measure on (Ω, F). Suppose that the time index set is the set of positive integers. A real valued stochastic process X in this setting is a two-dimensional infinite array such that:

X =
X_1(ω_1) X_1(ω_2) . . .
X_2(ω_1) X_2(ω_2) . . .
. . . . . . . . .

Here the sample space is

IR^∞ = {(x_1, x_2, . . . ) ∈ IR × IR × . . . }.


Note that the Borel σ-field B(IR^∞) coincides with the smallest σ-field containing the open sets in IR^∞ in the metric

ρ(x¹, x²) = Σ_k 2^{−k} |x¹_k − x²_k| / (1 + |x¹_k − x²_k|)

([36]). Now think of the stochastic process X as an IR^∞ valued random variable

ω_i → X(ω_i) = (X_1(ω_i), X_2(ω_i), . . . ) = (x^i_1, x^i_2, . . . ) = x^i ∈ IR^∞.

The random variable X induces a probability measure P_X on the σ-field B(IR^∞). For instance, if

A = {x ∈ IR^∞ : sup_n x_n > a} ∈ B(IR^∞),

then the set A consists of all sequences with some of their entries larger than a, and P_X(A) = P(ω : X(ω) ∈ A).
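The metric ρ can only be evaluated approximately in code, but each term is at most 2^{−k}, so truncating the sum after m terms changes the value by at most 2^{−m}. A sketch (not from the book):

```python
def seq_metric(x, y, terms=60):
    """Truncated rho(x, y) = sum_k 2^-k |x_k - y_k| / (1 + |x_k - y_k|),
    where x, y are sequences given as functions of k = 1, 2, ...;
    the neglected tail contributes at most 2^-terms."""
    total = 0.0
    for k in range(1, terms + 1):
        d = abs(x(k) - y(k))
        total += 2.0 ** -k * d / (1.0 + d)
    return total

# Distance between the zero sequence and the constant sequence 1:
# every term is 2^-k * (1/2), so the distance is essentially 1/2.
d01 = seq_metric(lambda k: 0.0, lambda k: 1.0)
```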

Example 2.1.4 (The Single Jump Process) Consider a stochastic process {X_t}, t ≥ 0, which takes its values in some measurable space (E, E) and which remains at its initial value z_0 ∈ E until a random time T, when it jumps to a random position Z. A sample path of the process is

X_t(ω) = z_0 if t < T(ω), Z(ω) if t ≥ T(ω).

The underlying probability space can be taken to be

Ω = [0, ∞] × E,

with the σ-field B ⊗ E. A probability measure P is given on (Ω, B ⊗ E) and we suppose

P([0, ∞] × {z_0}) = 0 = P({0} × E),

so that the probabilities of a zero jump and a jump at time zero are zero.

Write

F_t = P[T > t, Z ∈ E], c = inf{t : F_t = 0}.

F_t is right-continuous and monotonic decreasing, so there are only countably many points of discontinuity {u} = D where ΔF_u = F_u − F_{u−} ≠ 0. At points in D, there are positive probabilities that X jumps. Note that the more probability mass there is at a point u, the more predictable is the jump at that point.

Formally define a function Λ by setting:

dΛ(t) = P(T ∈ ]t − dt, t], Z ∈ E | T > t − dt).

Then dΛ is the probability that the jump occurs in the interval ]t − dt, t], given it has not

  • 8/6/2019 Filtering and Measure Theory

    53/270

    2.1 Definitions and general results 41

happened by t − dt. Roughly speaking we have

dΛ(t) = P(T ∈ ]t − dt, t] | T > t − dt)
= P(T ∈ ]t − dt, t]) / F_{t−dt}
= [(1 − F_t) − (1 − F_{t−dt})] / F_{t−dt}
= −(F_t − F_{t−dt}) / F_{t−dt}
= −(F_t − F_{t−}) / F_{t−}
= −dF_t / F_{t−}.

Define

Λ(t) = −∫_{]0,t]} dF_s / F_{s−}. (2.1.1)

For instance, if T is exponentially distributed with parameter λ we have

Λ(t) = −∫_{]0,t]} d exp(−λs) / exp(−λs) = λt.

Write

F^A_t = P[T > t, Z ∈ A];

then clearly the measure on (IR⁺, B(IR⁺)) given by F^A_t is absolutely continuous with respect to that given by F_t, so that there is a Radon–Nikodym derivative λ(A, s) such that

F^A_t − F^A_0 = ∫_{]0,t]} λ(A, s) dF_s. (2.1.2)

The pair (Λ, λ) is the Lévy system for the jump process. Roughly, λ(dx, s) is the conditional distribution of the jump position Z given that the jump happens at time s.
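For the exponential case the integral defining Λ can be approximated directly by a Riemann–Stieltjes sum and compared with λt. A sketch (not from the book; the rate below is an arbitrary choice):

```python
import math

lam = 0.7   # illustrative rate; any positive value works

def survival(t):
    """F_t = P(T > t) for an exponentially distributed jump time."""
    return math.exp(-lam * t)

def capital_lambda(t, n=200_000):
    """Lambda(t) = -int_{]0,t]} dF_s / F_{s-}, via a Riemann-Stieltjes sum
    over n subintervals of ]0, t]."""
    total, h = 0.0, t / n
    for k in range(n):
        s0, s1 = k * h, (k + 1) * h
        total -= (survival(s1) - survival(s0)) / survival(s0)
    return total

val = capital_lambda(2.0)   # Lambda(t) = lam * t for the exponential case
```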

Let X_t be a continuous time stochastic process. That is, the time index belongs to some interval of the real line, say t ∈ [0, ∞). If we are interested in the behavior of X_t during an interval of time [t_0, t_1] it is necessary to consider simultaneously an uncountable family of X_t's, {X_t, t_0 ≤ t ≤ t_1}. This results in a technical problem because of the uncountability of the index parameter t. Recall that σ-fields are, by definition, closed under countable operations only, and statements like {X_t ≤ x, t_0 ≤ t ≤ t_1} = ∩_{t_0≤t≤t_1} {X_t ≤ x} are not events! However, for most practical situations this difficulty is bypassed by replacing uncountable index sets by countable dense subsets without losing any significant information. In general, these arguments are based on the separability of a continuous time stochastic process. This is possible, for example, if the stochastic process X is almost surely continuous (see Definition 2.1.6).


Let X = {X_t : t ≥ 0} and Y = {Y_t : t ≥ 0} be two stochastic processes defined on the same probability space (Ω, F, P). Because of the presence of ω, the functions X_t(ω) and Y_t(ω) can be compared in different ways.

Definition 2.1.5

1. X and Y are called indistinguishable if

P({ω : X_t(ω) = Y_t(ω), ∀t ≥ 0}) = 1.

2. Y is a modification of X if for every t ≥ 0, we have

P({ω : X_t(ω) = Y_t(ω)}) = 1.

3. X and Y have the same law or probability distribution if and only if all their finite dimensional probability distributions coincide, that is, if and only if for any sequence of times 0 ≤ t_1 ≤ · · · ≤ t_n the joint probability distributions of (X_{t_1}, . . . , X_{t_n}) and (Y_{t_1}, . . . , Y_{t_n}) coincide.

Note that the first property is much stronger than the other two. The null sets in the second and third properties may depend on t.

Recall that there are different definitions of limit for sequences of random variables. So to each definition corresponds a type of continuity of a real valued time index process.

Definition 2.1.6

1. {X_t} is continuous in probability if for every t and ε > 0,

lim_{h→0} P[|X_{t+h} − X_t| > ε] = 0.

2. {X_t} is continuous in L^p if for every t,

lim_{h→0} E[|X_{t+h} − X_t|^p] = 0.

3. {X_t} is continuous almost surely (a.s.) if for every t,

P[lim_{h→0} X_{t+h} = X_t] = 1.

4. {X_t} is right continuous if for almost every ω the map t → X_t(ω) is right continuous. That is,

lim_{s↓t} X_s = X_t a.s.

If in addition

lim_{s↑t} X_s exists a.s.,

{X_t} is right continuous with left limits (rcll or corlol or càdlàg).

However, none of the above notions is strong enough to differentiate, for instance, between a process for which almost all sample paths are continuous for every t, and a process for which almost all sample paths have a countable number of discontinuities, when the two processes have the same finite dimensional distributions. A much stronger criterion for continuity is sample path continuity, which requires continuity for all t's simultaneously! In other words,


for almost all ω the function X_(·)(ω) is continuous in the usual sense. Unfortunately, the definition of a stochastic process in terms of its finite dimensional distributions does not help here since we are faced with whole intervals containing uncountable numbers of t's. Fortunately, for most useful processes in applications, continuous versions (sample path continuous), or right-continuous versions, can be constructed.

If a stochastic process with index set [0, ∞) is continuous, its sample space can be identified with C[0, ∞), the space of all real valued continuous functions. A metric on this space is

ρ(x, y) = Σ_k 2^{−k} [sup_{0≤t≤k} |x(t) − y(t)|] / [1 + sup_{0≤t≤k} |x(t) − y(t)|],

for x, y ∈ C[0, ∞). (See [36].)

Let B(C) be the smallest σ-field containing the open sets of the topology induced by ρ on C[0, ∞), the Borel σ-field. Then ([36]) the same σ-field B(C) is generated by the cylinder sets of C[0, ∞), which have the form

{x ∈ C[0, ∞) : x_{t_1} ∈ I_1, x_{t_2} ∈ I_2, . . . , x_{t_n} ∈ I_n},

where each I_i is an interval of the form (a_i, b_i]. In other words, a cylinder set is a set of functions with restrictions put on a finite number of coordinates, or, in the language of Shiryayev ([36]), it is the set of functions that, at times t_1, . . . , t_n, get through the windows I_1, . . . , I_n and at other times have arbitrary values.

An example of a Borel set from B(C) is

A = {x : sup_{t≥0} x_t > a}.

Remark 2.1.7 Note that the set A depends on the behavior of functions on an uncountable set of points and would not be in the σ-field B(C) if C[0, ∞) were replaced by the much larger space IR^{[0,∞)} (see Theorem 3, page 146 of [36]). In this latter space every Borel set is determined by restrictions imposed on the functions x on an at most countable set of points t_1, t_2, . . . .

Suppose the index parameter t is either a nonnegative integer or a nonnegative real number. The σ-fields F^X_t = σ{X_u, u ≤ t} are the smallest ones with respect to which the random variables X_u, u ≤ t, are measurable, and are naturally associated with any stochastic process {X_t}. F^X_t is sometimes called the natural filtration associated with the stochastic process {X_t}.

The σ-field F^X_t contains all the events which, by time t, are known to have occurred or not by observing X up to time t.

Often it is convenient to consider larger σ-fields than F^X_t. For instance, F_t = σ{X_u, Y_u ; u ≤ t} where {Y_t} is another stochastic process.

Definition 2.1.8 The stochastic process X is adapted to the filtration {F_t, t ≥ 0} if for each t ≥ 0, X_t is an F_t-measurable random variable.

Clearly X is adapted to F^X_t. A function f is F^X_t-measurable if the value of f(ω) can be decided by observing the history of X up to time t (and nowhere else). This follows from the multivariate version of Theorem 1.3.6. For instance, f(ω) = X_{t²}(ω) is F^X_t-measurable for 0 < t < 1 but it is not F^X_t-measurable for t > 1.


As a function of two variables (t, ω), a stochastic process should be measurable with respect to both variables to allow a minimum of good behavior.

Definition 2.1.9 A stochastic process {X_t}, t ∈ [0, ∞), on a probability space (Ω, F, P) is measurable if, for all Borel sets B in the Borel σ-field B(IR^d),

{(ω, t) : X_t(ω) ∈ B} ∈ F ⊗ B([0, ∞)).

If the probability space (Ω, F, P) is equipped with a filtration {F_t}, then a much stronger statement of measurability, which relates measurability in t and ω with the filtration {F_t}, is progressive measurability.

Definition 2.1.10 A stochastic process {X_t} on a filtered probability space (Ω, F, F_t, P) is progressively measurable if, for any t ∈ [0, ∞) and for any set B in the Borel σ-field B(IR^d),

{(ω, s) : s ≤ t, X_s(ω) ∈ B} ∈ F_t ⊗ B([0, t]).

Here B([0, t]) is the σ-field of Borel sets on the interval [0, t].

A measurable process need not be progressively measurable since σ(X_t) may contain events not in F_t.

Lemma 2.1.11 If X is a progressively measurable stochastic process, then X is adapted.

Proof Fix t. The map ω → (t, ω) from Ω into [0, t] × Ω is F_t-measurable. The map (s, ω) → X_s(ω) from [0, t] × Ω to the state space of X is B([0, t]) ⊗ F_t-measurable, by progressive measurability. By composition of the two maps the result follows.

Theorem 2.1.12 If the stochastic process {X_t : t ≥ 0} on the filtered probability space (Ω, F, F_t, P) is measurable and adapted, then it has a progressively measurable modification.

Proof See [28] page 68.

Typically, in a description of a random process, the measure space and the probability measure on it are not given. One simply describes the family of joint distribution functions of every finite collection of random variables of the process. A basic question is whether there is a stochastic process with such a family of joint distribution functions. The following theorem ([36] page 244), due to Kolmogorov, guarantees that this is the case if the joint distribution functions satisfy a set of natural consistency conditions.

Theorem 2.1.13 (Kolmogorov Consistency Theorem) For all t_1, . . . , t_k, k ∈ IN, in the time index set T, let P_{t_1,...,t_k} be probability measures on (IR^k, B(IR^k)) such that

P_{t_{π(1)},...,t_{π(k)}}(F_1 × · · · × F_k) = P_{t_1,...,t_k}(F_{π^{−1}(1)} × · · · × F_{π^{−1}(k)}),

for all permutations π on {1, 2, . . . , k}, and

P_{t_1,...,t_k}(F_1 × · · · × F_k) = P_{t_1,...,t_k,t_{k+1},...,t_{k+m}}(F_1 × · · · × F_k × IR × · · · × IR),


for all m ∈ IN, where the set on the right hand side has a total of k + m factors. Then there is a unique probability measure P on the space (IR^T, B(IR^T)) such that the restriction of P to any cylinder set B_n = {x ∈ IR^T : x_{t_1} ∈ I_1, x_{t_2} ∈ I_2, . . . , x_{t_n} ∈ I_n} is P_{t_1,...,t_n}, that is

P(B_n) = P_{t_1,...,t_n}(B_n).

Proof See [36] page 167.

    Theorem 2.1.14 (Kolmogorov's Existence Theorem) For all τ1, . . . , τk, k ∈ IN, in the time index set, let P_{τ1,...,τk} be probability measures on IR^{nk} such that

    P_{τσ(1),...,τσ(k)}(F1 × · · · × Fk) = P_{τ1,...,τk}(F_{σ−1(1)} × · · · × F_{σ−1(k)}),

    for all permutations σ of {1, 2, . . . , k}, and

    P_{τ1,...,τk}(F1 × · · · × Fk) = P_{τ1,...,τk,τk+1,...,τk+m}(F1 × · · · × Fk × IR^n × · · · × IR^n),

    for all m ∈ IN, where the set on the right hand side has a total of k + m factors. Then there exist a probability space (Ω, F, P) and a stochastic process {Xτ} on Ω into IR^n such that

    P_{τ1,...,τk}(F1 × · · · × Fk) = P[X_{τ1} ∈ F1, . . . , X_{τk} ∈ Fk],

    for all τi in the time set, k ∈ IN and all Borel sets Fi.

    Proof The proof follows essentially from Theorems 1.3.9, 1.3.10 and 2.1.13. See [36] page 247.

    Definition 2.1.15 Suppose X is a stochastic process whose index set is the positive integers Z+, and suppose {Fn} is a filtration. Then {Xn} is predictable if Xn is Fn−1-measurable, that is, Xn(ω) is known from observing events in Fn−1 at time n − 1.

    In continuous time, without loss of generality, we shall take the time index set to be [0, ∞).

    In the continuous time case, roughly speaking, a stochastic process {Xt} is predictable if knowledge about the behavior of the process is left-continuous, that is, Xt is Ft−-measurable. Stated differently, for processes which are continuous on the left one may predict their value at each point from their values at preceding points. A Poisson process (see Section 2.10) is not predictable (its sample paths are right-continuous); otherwise we would be able to predict a jump time immediately before it jumps. More precisely, a stochastic process is predictable if it is measurable with respect to the σ-field on Ω × [0, ∞) generated by the family of all left-continuous adapted stochastic processes.

    A stochastic process X with continuous time parameter is optional if it is measurable with respect to the σ-field on Ω × [0, ∞) generated by the family of all right-continuous, adapted stochastic processes which have left limits.

    Definition 2.1.16 A measurable stochastic process {Xt} with values in [0, ∞) is called an increasing process if almost every sample path X·(ω) is right-continuous and increasing.


    Theorem 2.1.17 Suppose {Xt} is an increasing process. Then Xt has a unique decomposition as X^c_t + X^d_t, where {X^c_t} is an increasing continuous process and {X^d_t} is an increasing purely discontinuous process, that is, {X^d_t} is the sum of the jumps of {Xt}.

    If {Xt} is predictable, {X^d_t} is predictable. If {Xt} is adapted, {X^c_t} is predictable.

    Proof See [11] page 69.

    2.2 Stopping times

    One of the most important questions in the study of stochastic processes is when a process first hits a certain level, or first enters a certain region of its state space. Since for each possible trajectory, or realization ω, there is a hitting time (finite or infinite), the hitting time is a random variable taking values in the index, or time, space of the stochastic process.

    Let ĪN = {1, 2, 3, . . . , ∞} and F∞ = σ(∪_{n=1}^∞ Fn).

    A random variable τ taking values in ĪN is a stopping time (or optional or Markov time) with respect to a filtration {Fn} if for all n ∈ ĪN we have {ω : τ(ω) ≤ n} ∈ Fn. An equivalent definition in discrete time is to require {ω : τ(ω) = n} ∈ Fn.

    The concept of stopping time is directly related to the flow of information through time, that is, to the filtration. The event {ω : τ(ω) ≤ n} is Fn-measurable, that is, measurable with respect to the information available up to time n. This means a stopping time is a nonanticipative function, whereas a general random variable may anticipate the future.

    Example 2.2.1 Let {Xn, Fn} be an adapted process (i.e. {Fn} is a filtration and Xn is Fn-measurable for all n). Suppose A is a measurable set of the state space of X. Then the random time

    τ = min{k : Xk ∈ A}

    is a stopping time since

    {τ ≤ n} = ∪_{k=1}^n {Xk ∈ A} ∈ Fn.

    If τ is a stopping time with respect to a filtration Fn, so is τ + m, m ∈ IN. However, τ − m, m ∈ IN, is not a stopping time since the event {τ − m = n} = {τ = n + m} is not in Fn; it is in Fn+m and hence anticipates the future.
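    As a quick numerical illustration (a sketch not in the text; the path and the helper `hitting_time` are hypothetical), the stopping-time property of Example 2.2.1 can be checked path by path: whether τ ≤ n is decided by X1, . . . , Xn alone, so truncating a path at τ cannot change the answer.

    ```python
    # Sketch (hypothetical example, not from the text): first hitting time
    # tau = min{k : X_k in A} along a fixed +/-1 random-walk path.
    def hitting_time(path, in_A):
        """Return the first k (1-based) with path[k-1] in A, or None if A is never hit."""
        for k, x in enumerate(path, start=1):
            if in_A(x):
                return k
        return None

    # a fixed sample path of a +/-1 walk
    X = [1, 0, 1, 2, 1, 2, 3, 2, 3, 4]
    tau = hitting_time(X, lambda x: x >= 3)   # first passage to level 3: tau = 7
    # {tau <= n} depends only on X_1, ..., X_n: truncating the path at tau
    # leaves the hitting time unchanged (Fn-measurability of {tau <= n}).
    assert hitting_time(X[:tau], lambda x: x >= 3) == tau
    ```
    
    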

    In order to measure the information accumulated up to a stopping time we define the σ-field Fτ of events prior to the stopping time τ. Suppose that some event B is part of this information. This means that if τ ≤ n we should be able to tell whether or not B has occurred. However, {τ ≤ n} ∈ Fn, so we should have B ∩ {τ ≤ n} ∈ Fn and B^c ∩ {τ ≤ n} ∈ Fn. We therefore define:

    Fτ = {A ∈ F∞ : A ∩ {ω : τ(ω) ≤ n} ∈ Fn ∀ n ≥ 0}.

    The next examples should help to clarify this concept.


    Example 2.2.2 Let Ω = {ωi; i = 1, . . . , 8} and the time index T = {1, 2, 3}. Consider the following filtration:

    F1 = σ{{ω1, ω2, ω3, ω4, ω5, ω6}, {ω7, ω8}},
    F2 = σ{{ω1, ω2}, {ω3, ω4}, {ω5, ω6}, {ω7, ω8}},
    F3 = σ{{ω1}, {ω2}, {ω3}, {ω4}, {ω5}, {ω6}, {ω7}, {ω8}}.

    Now define the random variable

    τ(ω1) = τ(ω2) = τ(ω5) = τ(ω6) = 2,
    τ(ω3) = τ(ω4) = τ(ω7) = τ(ω8) = 3,

    so that

    {τ = 0} = ∅, {τ = 1} = ∅, {τ = 2} = {ω1, ω2, ω5, ω6}, {τ = 3} = {ω3, ω4, ω7, ω8},

    and τ is a stopping time.

    Now Fτ = σ{all events A ∈ F∞ (= F3) such that for some n the event A is a subset of the event {ω : τ(ω) ≤ n}}. In our situation

    Fτ = σ{{ω1, ω2}, {ω5, ω6}, {ω3}, {ω4}, {ω7}, {ω8}}.

    Note that the first two simple events of Fτ, {ω1, ω2} and {ω5, ω6}, are in F2 and the rest are in F3, as they should be. Also, note that Fτ is not the σ-field generated by the random variable τ. However, a closer look shows that τ is Fτ-measurable. If, for instance, the outcome is ω1 then τ = 2 and τ^{−1}(2) = {τ = 2} = {ω1, ω2, ω5, ω6} is an atom of the σ-field generated by the random variable τ but not an atom of Fτ.
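    The atoms of Fτ in Example 2.2.2 can be generated mechanically (a sketch; the encoding of ωi as the integer i and the helper name `f_tau_atoms` are ours, not the book's): an atom of Fn enters Fτ as soon as it is contained in {τ ≤ n}, and finer partitions only contribute atoms not already covered.

    ```python
    # Sketch: atoms of F_tau for Example 2.2.2 (omega_i encoded as i = 1..8).
    partitions = {                       # atoms of F_1, F_2, F_3
        1: [{1, 2, 3, 4, 5, 6}, {7, 8}],
        2: [{1, 2}, {3, 4}, {5, 6}, {7, 8}],
        3: [{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}],
    }
    tau = {1: 2, 2: 2, 5: 2, 6: 2, 3: 3, 4: 3, 7: 3, 8: 3}

    def f_tau_atoms(partitions, tau):
        """Collect atoms A of F_n with A contained in {tau <= n}, coarsest first."""
        atoms, covered = [], set()
        for n in sorted(partitions):
            level = {w for w in tau if tau[w] <= n}   # the event {tau <= n}
            for A in partitions[n]:
                if A <= level and not (A & covered):
                    atoms.append(sorted(A))
                    covered |= A
        return sorted(atoms)

    # reproduces F_tau = sigma{{1,2}, {5,6}, {3}, {4}, {7}, {8}}
    assert f_tau_atoms(partitions, tau) == [[1, 2], [3], [4], [5, 6], [7], [8]]
    ```
    
    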

    Example 2.2.3 Consider again the experiment of tossing a fair coin infinitely many times. Each ω ∈ Ω is an infinite sequence of heads and tails and

    Ω = {H, T}^IN.

    Define the filtration:

    F1 = σ{{ω starting with H}, {ω starting with T}},
    F2 = σ{{ω starting with HH}, {ω starting with HT}, {ω starting with TH}, {ω starting with TT}}, . . . ,
    Fn = σ{{ω starting with n fixed letters}}.

    Suppose that we win one dollar each time heads comes up and lose one otherwise. Let S0 = 0 and Sn be our fortune after the n-th toss. Define the random variable τ = inf{n : Sn > 0}, which is the first time our winnings exceed our losses. Clearly, τ is a stopping time with respect to the filtration Fn.

    Here

    Fτ = σ{{ω starting with H}, {ω starting with THH}, {ω starting with THTHH}, {ω starting with TTHHH}, . . . }


    and

    τ(ω starting with H) = 1,
    τ(ω starting with THH) = 3,
    τ(ω starting with THTHH) = τ(ω starting with TTHHH) = 5.

    If ω = THTHH . . . , then the information at time τ(THTHH . . . ) = 5 is in F5 and is given by the event composed of all the ω starting with THTHH, which is an atom of Fτ. However, {τ = 5} = {{THTHH . . . }, {TTHHH . . . }}, which is not an atom of Fτ.

    If σ ≤ τ are two stopping times then Fσ ⊂ Fτ, because if A ∈ Fσ,

    A ∩ {τ ≤ n} = (A ∩ {σ ≤ n}) ∩ {τ ≤ n} ∈ Fn (2.2.1)

    for all n. From this result we see that if {τn} is an increasing sequence of stopping times, the sequence {Fτn} is a filtration.

    Example 2.2.4 Let Ω = {ωi, i = 1, . . . , 8} and the time index T = {1, 2, 3, 4}. Consider the following filtration:

    F1 = σ{{ωi, i = 1, . . . , 6}, {ω7, ω8}},
    F2 = σ{{ω1, ω2, ω3}, {ω4, ω5, ω6}, {ω7, ω8}},
    F3 = σ{{ω1, ω2}, {ω3}, {ω4}, {ω5, ω6}, {ω7, ω8}},
    F4 = σ{{ω1}, {ω2}, {ω3}, {ω4}, {ω5}, {ω6}, {ω7}, {ω8}}.

    Now define the stopping times τ1 and τ2:

    τ1(ω1) = τ1(ω2) = τ1(ω3) = τ1(ω4) = τ1(ω5) = τ1(ω6) = 2, τ1(ω7) = τ1(ω8) = 3,
    τ2(ω1) = τ2(ω2) = τ2(ω3) = 2, τ2(ω5) = τ2(ω6) = 3, τ2(ω4) = τ2(ω7) = τ2(ω8) = 4,

    so that τ1 ≤ τ2 and Fτ1 ⊂ Fτ2, where

    Fτ1 = σ{{ω1, ω2, ω3}, {ω4, ω5, ω6}, {ω7, ω8}},
    Fτ2 = σ{{ω1, ω2, ω3}, {ω4}, {ω5, ω6}, {ω7}, {ω8}}.

    For any Borel set B,

    {ω : X_{τ(ω)}(ω) ∈ B} = ∪_{n=0}^∞ {Xn(ω) ∈ B, τ(ω) = n} ∈ F,

    that is, Xτ is a random variable.

    If X∞ has been defined and X∞ ∈ F∞ = σ(∪_n Fn), then we define Xτ(ω) = X_{τ(ω)}(ω), i.e. Xτ = Σ_{n∈ĪN} Xn I_{τ=n} ∈ F∞; in fact, Xτ is Fτ-measurable.


    In the continuous time situation, definitions are more involved and the time parameter t plays a much more important role, since continuity, limits, etc. enter the scene. Let {Ft}, t ∈ [0, ∞), be a filtration. A nonnegative random variable τ is called a stopping time with respect to the filtration Ft if for all t ≥ 0 we have {ω : τ(ω) ≤ t} ∈ Ft.

    A nonnegative random variable τ is an optional time with respect to the filtration Ft if for all t ≥ 0 we have {ω : τ(ω) < t} ∈ Ft.

    Every stopping time is optional, and the two concepts coincide if the filtration is right-continuous: if τ is optional then {ω : τ(ω) ≤ t} ∈ Ft+ε for every ε > 0, and hence {ω : τ(ω) ≤ t} ∈ ∩_{ε>0} Ft+ε = Ft+ = Ft, provided that Ft is right-continuous.

    Example 2.2.5 Suppose {Xt, t ≥ 0} is continuous and adapted to the filtration {Ft, t ≥ 0}.

    1. Consider τ(ω) = inf{t : Xt(ω) = b}, the first time the process X hits level b ∈ IR (the first passage time to the level b ∈ IR). Then τ is a stopping time since

    {τ ≤ t} = ∩_{n∈IN} ∪_{r∈Q, r≤t} {|Xr − b| ≤ 1/n} ∈ Ft.

    2. Consider τ(ω) = inf{t : |Xt(ω)| ≥ 1}, the first time the process X leaves the interval [−1, +1]. Then τ is a stopping time.
    3. Consider τ(ω) = inf{t : ΔXt(ω) > 1}, which is the first time the jump ΔXt = Xt − Xt− exceeds 1. Then τ is a stopping time.

    Similarly to the discrete time case, the σ-field of events prior to a stopping time τ is defined by

    Fτ = {A ∈ F : A ∩ {ω : τ(ω) ≤ t} ∈ Ft ∀ t ≥ 0}. (2.2.2)

    Any stopping time τ is Fτ-measurable as, for s ≤ t,

    {ω : τ(ω) ≤ s} ∩ {ω : τ(ω) ≤ t} = {ω : τ(ω) ≤ min(t, s)} ∈ F_{min(t,s)} ⊂ Ft. (2.2.3)

    Hence {ω : τ(ω) ≤ s} ∈ Fτ.

    If τ1, τ2 are stopping times, then min(τ1, τ2), max(τ1, τ2) and τ1 + τ2 are stopping times as:

    1. {min(τ1, τ2) ≤ t} = {τ1 ≤ t} ∪ {τ2 ≤ t} ∈ Ft,
    2. {max(τ1, τ2) ≤ t} = {τ1 ≤ t} ∩ {τ2 ≤ t} ∈ Ft,
    3. {τ1 + τ2 ≤ t} = {τ1 = 0, τ2 ≤ t} ∪ {τ2 = 0, τ1 ≤ t} ∪ ∪_{p,q∈Q, p+q≤t} ({τ1 ≤ p} ∩ {τ2 ≤ q}) ∈ Ft, where Q is the set of rational numbers.
    4. If {τn} is a sequence of stopping times then sup_n τn is a stopping time since {sup_n τn ≤ t} = ∩_n {τn ≤ t} ∈ Ft.
    5. If τ1, τ2 are stopping times such that τ1 ≤ τ2, then Fτ1 ⊂ Fτ2.


    Perhaps one of the most important applications of the concept of stopping time is the so-called strong Markov property.

    A stochastic process {Xt} is a Markov process if

    E[f(X_{t+s}) | F^X_t] = E[f(X_{t+s}) | Xt], (P-a.s.) (2.2.4)

    where f is any bounded measurable function and F^X_t = σ{Xu, u ≤ t}. Equation (2.2.4) is termed the Markov property.

    A natural generalization of the Markov property is the strong Markov property, where the present time t in (2.2.4) is replaced by a stopping time σ and the future time t + s is replaced by another, later stopping time τ. That is, if σ and τ are stopping times and σ ≤ τ,

    E[f(Xτ) | Fσ] = E[f(Xτ) | Xσ] a.s.

    In other words, a stochastic process {Xt} has the strong Markov property if the information about the behavior of {Xt} prior to the stopping time σ is irrelevant in predicting its behavior after that time once Xσ is observed.

    2.3 Discrete time martingales

    Martingales are probably the most important type of stochastic processes used for modeling.

    They occur naturally in almost any information processing problem involving sequentialacquisition of data: for example, the sequence of estimates of a random variable based on

    increasing observations, and the sequence of likelihood ratios in a sequential hypothesis

    test are martingales.

    The stochastic process X is a submartingale (supermartingale) with respect to the filtration {Fn} if it is

    1. Fn-adapted,
    2. E[|Xn|] < ∞ for all n, and
    3. E[Xn′ | Fn] ≥ Xn a.s. (E[Xn′ | Fn] ≤ Xn a.s.) for all n′ ≥ n.

    The stochastic process X is a martingale if it is both a submartingale and a supermartingale.

    If we recall the definition of conditional expectation we see that the requirement E[Xn+1 | Fn] = Xn a.s. implies the following:

    ∫_F E[Xn+1 | Fn] dP = ∫_F Xn+1 dP, F ∈ Fn,

    and

    ∫_F Xn dP = ∫_F Xn+1 dP, F ∈ Fn. (2.3.1)

    Since Fn ⊂ Fn+1 ⊂ · · · ⊂ Fn+k, it is easily seen that

    ∫_F Xn dP = ∫_F Xn+1 dP = · · · = ∫_F Xn+k dP, F ∈ Fn. (2.3.2)

  • 8/6/2019 Filtering and Measure Theory

    63/270

    2.3 Discrete time martingales 51

    and hence, with probability 1, E[Xn+k | Fn] = Xn. Setting F = Ω and n = 1, 2, . . . in (2.3.2) gives

    E[X1] = E[X2] = · · · = E[Xn].

    A classical example of a martingale X is a player's fortune in successive plays of a fair game. If X0 is the initial fortune, then "fair" means that, on average, the fortune at some future time n, after more plays, should be neither more nor less than X0. If the game is favorable to the player, then his fortune should increase on average and Xn is a submartingale. If the game is unfavorable to the player, Xn is a supermartingale.

    The following important inequality is used to prove a fundamental result on constructing a uniformly integrable family of random variables by conditioning a fixed (integrable) random variable on a family of sub-σ-fields.

    Lemma 2.3.1 (Jensen's Inequality) Suppose X ∈ L¹. If φ : IR → IR is convex and φ(X) ∈ L¹, then

    E[φ(X) | G] ≥ φ(E[X | G]). (2.3.3)

    Proof (See, for example, [11].) Any convex function φ : IR → IR is the supremum of a family of affine functions, so there exists a sequence (φn) of real functions with φn(x) = an x + bn for each n, such that φ = sup_n φn. Therefore φ(X) ≥ an X + bn holds a.s. for each (and hence all) n. So by the positivity of E[· | G], E[φ(X) | G] ≥ sup_n (an E[X | G] + bn) = φ(E[X | G]) a.s.

    Lemma 2.3.2 Let X ∈ L^p, p ≥ 1. The family

    L = {E[X | G] : G is a sub-σ-field of F}

    is uniformly integrable.

    Proof Since φ(x) = |x|^p is convex, Jensen's Inequality 2.3.1 implies that

    |E[X | G]|^p ≤ E[|X|^p | G].

    Hence

    E[|E[X | G]|^p] ≤ E[E[|X|^p | G]] = E[|X|^p],

    that is, the family L is bounded in L^p, uniformly in G, and the result follows.

    The following theorem is a useful tool in proving convergence results for submartingales.

    Theorem 2.3.6 (Doob) If {Xn, Fn} is a submartingale then, for all n ≥ 1,

    E[Cn[a, b]] ≤ E[Xn − a]+ / (b − a),

    where Cn[a, b] is the number of up-crossings of the interval [a, b] by X1, . . . , Xn and [Xn − a]+ = max{(Xn − a), 0}.

    Proof See [36] page 474.
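    Doob's bound can be checked exactly by enumeration (a sketch under our own conventions, not from the text: the submartingale |Sn| of a fair ±1 walk, and a simple up-crossing counter).

    ```python
    # Sketch: verify E[C_n[a,b]] <= E[(X_n - a)^+]/(b - a) exactly for the
    # submartingale X_n = |S_n|, S_n a fair +/-1 walk, over all 2^N paths.
    from fractions import Fraction
    from itertools import product

    def upcrossings(xs, a, b):
        """Number of up-crossings of [a, b] by the finite sequence xs."""
        count, below = 0, False
        for x in xs:
            if x <= a:
                below = True
            elif x >= b and below:
                count += 1
                below = False
        return count

    a, b, N = 0, 2, 6
    EC = EP = Fraction(0)
    for steps in product([1, -1], repeat=N):     # each path has probability 2^-N
        S, path = 0, []
        for d in steps:
            S += d
            path.append(abs(S))                  # |S_n| is a submartingale
        EC += Fraction(upcrossings(path, a, b), 2 ** N)
        EP += Fraction(max(path[-1] - a, 0), 2 ** N)

    assert EC * (b - a) <= EP                    # Doob's up-crossing inequality
    ```
    
    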

    Theorem 2.3.7 If {Xn, Fn} is a nonnegative martingale then Xn → X∞ a.s., where X∞ is an integrable random variable.

    Proof Suppose that the event {ω : lim inf Xn(ω) < lim sup Xn(ω)} has positive probability. Then there are rationals a < b such that

    P{ω : lim inf Xn(ω) < a < b < lim sup Xn(ω)} > 0. (2.3.5)

    This means that, with positive probability, {Xn} oscillates about, or up-crosses, the interval [a, b] infinitely many times. However, using Theorem 2.3.6 and the fact that sup_n E[Xn] = E[X1] < ∞, we have:

    lim_n E[Cn[a, b]] ≤ lim_n E[Xn − a]+ / (b − a) ≤ (E[X1] + |a|) / (b − a) < ∞,

    so the total number of up-crossings of [a, b] is a.s. finite, contradicting (2.3.5). Hence Xn converges a.s. to a limit X∞, which is integrable by Fatou's lemma, since E[X∞] ≤ lim inf_n E[Xn] = E[X1] < ∞.

    Theorem 2.3.11 If {Xn, Fn} is a martingale and τ is a stopping time, then the stopped process {X_{min(n,τ)}, Fn} is a martingale; indeed,

    E[X_{min(n+1,τ)} − X_{min(n,τ)} | Fn] = E[I_{τ>n}(Xn+1 − Xn) | Fn] = I_{τ>n} E[(Xn+1 − Xn) | Fn] = 0,

    since {τ > n} ∈ Fn.

    We also have that stopping at an optional time preserves the martingale property.

    Theorem 2.3.12 (Doob Optional Sampling Theorem) Suppose {Xn, Fn} is a martingale. Let σ ≤ τ (a.s.) be stopping times such that Xσ and Xτ are integrable. Also suppose that

    lim inf_n ∫_{σ>n} |Xn| dP = 0, (2.3.6)

    and

    lim inf_n ∫_{τ>n} |Xn| dP = 0. (2.3.7)

    Then

    E[Xτ | Fσ] = Xσ. (2.3.8)

    In particular, E[Xτ] = E[Xσ].

    Proof Using the definition of conditional expectation, we have to show that, for every A ∈ Fσ,

    ∫_A I_{σ≤τ} E[Xτ | Fσ] dP = ∫_A I_{σ≤τ} Xσ dP = ∫_A I_{σ≤τ} Xτ dP.

    However, {σ ≤ τ} = ∪_{n≥0} {σ = n} ∩ {τ ≥ n}. Hence it suffices to show that, for all n ≥ 0:

    ∫_A I_{σ=n} I_{τ≥n} Xσ dP = ∫_A I_{σ=n} I_{τ≥n} Xτ dP = ∫_A I_{σ=n} I_{τ≥n} Xn dP. (2.3.9)

    Now, {ω : τ(ω) ≥ n} = {ω : τ(ω) = n} ∪ {ω : τ(ω) ≥ n + 1} and, in view of (2.3.1), the last integral in (2.3.9) is equal to

    ∫_{A∩{σ=n}∩{τ=n}} Xn dP + ∫_{A∩{σ=n}∩{τ≥n+1}} Xn+1 dP
    = ∫_{A∩{σ=n}∩{τ=n}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+1}} Xn+1 dP. (2.3.10)


    Also, {ω : τ(ω) ≥ n} = {ω : n ≤ τ(ω) ≤ n + 1} ∪ {ω : τ(ω) ≥ n + 2} and, using (2.3.1) again, (2.3.10) equals

    ∫_{A∩{σ=n}∩{n≤τ≤n+1}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+2}} Xn+2 dP.

    Repeating this step k times,

    ∫_A I_{σ=n} I_{τ≥n} Xn dP = ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+k+1}} Xn+k+1 dP,

    that is,

    ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP − ∫_{A∩{σ=n}∩{τ≥n+k+1}} Xn+k+1 dP.

    Now,

    Xn+k+1 = X+_{n+k+1} − X−_{n+k+1} = 2X+_{n+k+1} − (X+_{n+k+1} + X−_{n+k+1}) = 2X+_{n+k+1} − |Xn+k+1|,

    so that

    ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP − 2 ∫_{A∩{σ=n}∩{τ≥n+k+1}} X+_{n+k+1} dP + ∫_{A∩{σ=n}∩{τ≥n+k+1}} |Xn+k+1| dP. (2.3.11)

    Taking the limit as k → ∞ of both sides of (2.3.11) and using (2.3.7), we obtain

    ∫_{A∩{σ=n}∩{τ≥n}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP,

    which establishes (2.3.9) and finishes the proof.

    Definition 2.3.13 The stochastic process {Xn, Fn} is a local martingale if there is a sequence of stopping times {τk} increasing to ∞ with probability 1 and such that {X_{min(n,τk)}, Fn} is a martingale for each k.

    Remark 2.3.14 The interesting fact about local martingales is that they can be obtained rather naturally through a martingale transform (a stochastic integral, in the continuous time case), which is defined as follows. Suppose {Yn, Fn} is a martingale and {An, Fn} is a predictable process. Then the sequence

    Xn = A0 Y0 + Σ_{k=1}^n Ak (Yk − Yk−1)

    is called a martingale transform and is a local martingale.

    Proof To show that {Xn, Fn} is a local martingale we have to find a sequence of stopping times {τk}, k ≥ 1, increasing to infinity (P-a.s.) and such that the stopped process {X_{min(n,τk)}, Fn} is a martingale. Let τk = inf{n ≥ 0 : |An+1| > k}. Since A is predictable the τk are stopping times and clearly τk ↑ ∞ (P-a.s.). Since Y is a martingale and |A_{min(n,τk)} I_{τk>n}| ≤ k then, for all n ≥ 1,

    E[|X_{min(n,τk)} I_{τk>n}|] < ∞.

    Moreover, from Theorem 2.3.11,

    E[(X_{min(n+1,τk)} − X_{min(n,τk)}) I_{τk>n} | Fn] = I_{τk>n} A_{min(n+1,τk)} E[Y_{min(n+1,τk)} − Y_{min(n,τk)} | Fn] = 0.

    This finishes the proof.

    Example 2.3.15 Suppose that you are playing a game using the following strategy. At each time n your stake is An. Write Xn for your total gain through the n-th game, with X0 = 0 for simplicity.

    Write Fn = σ{Xk : 0 ≤ k ≤ n}. We suppose that, for each n, An is Fn−1-measurable, that is, A = {An} is predictable with respect to the filtration Fn. This means that An = An(X0, X1, . . . , Xn−1) is a function of X0, X1, . . . , Xn−1.

    If we assume that you win (or lose) at time n if a Bernoulli random variable bn is equal to 1 (or −1), then

    Xn = Σ_{k=1}^n Ak bk = Σ_{k=1}^n Ak ΔCk.

    Here ΔCk = Ck − Ck−1 and Ck = Σ_{i=1}^k bi. If C is a martingale with respect to the filtration Fn (in this case we say that the game is fair), then the same holds for X because

    E[Xn | Fn−1] = Xn−1 + An E[Cn − Cn−1 | Fn−1]
    = Xn−1 + An (E[Cn | Fn−1] − Cn−1)
    = Xn−1 + An (Cn−1 − Cn−1) = Xn−1.
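    This fairness can be verified exhaustively (a sketch; the doubling stake rule below is our own hypothetical choice of predictable {An}, not the book's): for any stake depending only on b1, . . . , bk−1, the expectation E[Xn] is exactly 0.

    ```python
    # Sketch: a predictable staking rule cannot bias a fair game.  Enumerate all
    # +/-1 outcomes b_1..b_5 and check E[X_n] = 0 for X_n = sum_k A_k b_k.
    from fractions import Fraction
    from itertools import product

    def stake(history):
        """A (hypothetical) predictable rule: double after each loss, reset on a win."""
        A = 1
        for b in history:
            A = 1 if b == 1 else 2 * A
        return A

    N = 5
    E = [Fraction(0)] * (N + 1)              # E[X_0], ..., E[X_N]
    for bs in product([1, -1], repeat=N):    # each path has probability 2^-N
        X = 0
        for k in range(1, N + 1):
            X += stake(bs[:k - 1]) * bs[k - 1]   # A_k depends only on b_1..b_{k-1}
            E[k] += Fraction(X, 2 ** N)

    assert all(e == 0 for e in E)            # the transform keeps zero mean
    ```
    
    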

    2.4 Doob decomposition

    A submartingale is a process which, on average, is nondecreasing. Unlike a martingale, which has a constant mean over time, a submartingale has a trend, an increasing predictable part, perturbed by a martingale component which is not predictable. This is made precise by the following theorem due to J. L. Doob.


    Theorem 2.4.1 (Doob Decomposition) Any submartingale {Xn} can be written (P-a.s. uniquely) as

    Xn = Yn + Zn, a.s. (2.4.1)

    where {Yn} is a martingale and {Zn} is a predictable, increasing process, i.e. E[Zn] < ∞, Z1 = 0 and Zn ≤ Zn+1 a.s. ∀n.

    Proof Write Δn = Xn − Xn−1, yi = Δi − E[Δi | Fi−1] and zi = E[Δi | Fi−1], z0 = 0. Then:

    Xn = Δ1 − E[Δ1 | F0] + Δ2 − E[Δ2 | F1] + · · · + Δn − E[Δn | Fn−1] + Σ_{i=1}^n E[Δi | Fi−1]
    = Σ_{i=1}^n yi + Σ_{i=1}^n zi
    = Yn + Zn.

    To prove uniqueness, suppose that there is another decomposition Xn = Yn + Zn = Y′n + Z′n = Σ_{i=1}^n y′i + Σ_{i=1}^n z′i. Let yn + zn = Δxn = y′n + z′n and take conditional expectations with respect to Fn−1 to get zn = z′n, because y′n is a martingale increment and z′n is predictable. This implies yn = y′n and the uniqueness of the decomposition.
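    For a concrete instance (a sketch, not from the text): for a fair ±1 walk Sn, the process Xn = Sn² is a submartingale with zn = E[Δn | Fn−1] = E[2Sn−1 bn + 1 | Fn−1] = 1, so the compensator is Zn = n and Yn = Sn² − n is the martingale part. This can be checked exactly by enumeration:

    ```python
    # Sketch: Doob decomposition of X_n = S_n^2, S_n a fair +/-1 walk:
    # compensator Z_n = n, martingale part Y_n = S_n^2 - n.
    from fractions import Fraction
    from itertools import product

    N = 4
    EY = Fraction(0)                          # exact E[Y_N] over all 2^N paths
    for steps in product([1, -1], repeat=N):
        S = sum(steps)
        EY += Fraction(S * S - N, 2 ** N)

    # Y is a martingale with Y_0 = 0, so its mean stays 0
    assert EY == 0
    ```
    
    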

    Remarks 2.4.2

    1. In Theorem 2.4.1, if {Xn} is just an Fn-adapted and integrable process, the decomposition remains valid but we lose the increasing property of the process {Zn}.
    2. The process X − Z is a martingale; as a result, Z is called the compensator of the submartingale X.
    3. A process which is the sum of a predictable process and a martingale is called a semimartingale.
    4. Uniqueness of the decomposition is ensured by the predictability of the process {Zn}.

    Definition 2.4.3 A discrete-time stochastic process {Xn}, with finite state space S = {s1, s2, . . . , sN}, defined on a probability space (Ω, F, P), is a Markov chain if

    P(Xn+1 = s_{i_{n+1}} | X0 = s_{i_0}, . . . , Xn = s_{i_n}) = P(Xn+1 = s_{i_{n+1}} | Xn = s_{i_n}),

    for all n ≥ 0 and all states s_{i_0}, . . . , s_{i_n}, s_{i_{n+1}} ∈ S. This is termed the Markov property. {Xn} is a homogeneous Markov chain if

    P(Xn+1 = sj | Xn = si) = πji

    is independent of n.


    The matrix Π = (πji) is called the probability transition matrix of the homogeneous Markov chain and it satisfies the property

    Σ_{j=1}^N πji = 1.

    Note that our transition matrix is the transpose of the traditional transition matrix defined elsewhere. The convenience of this choice will be apparent later.

    The following properties of a homogeneous Markov chain are easy to check.

    1. Let p^0 = (p^0_1, p^0_2, . . . , p^0_N) be the distribution of X0. Then

    P(X0 = s_{i_0}, X1 = s_{i_1}, . . . , Xn = s_{i_n}) = p^0_{i_0} π_{i_1 i_0} · · · π_{i_n i_{n−1}}.

    2. Let p^n = (p^n_1, p^n_2, . . . , p^n_N) be the distribution of Xn. Then

    p^n = Π^n p^0 = Π p^{n−1}.
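    A quick numerical check of property 2, under the book's column convention for Π (a sketch with hypothetical numbers; here the columns, not the rows, sum to 1):

    ```python
    # Sketch: distributions evolve as p^n = Pi p^{n-1} with a column-stochastic Pi.
    def matvec(M, v):
        """(M v)_j = sum_i M[j][i] v[i] for a matrix stored as rows."""
        return [sum(M[j][i] * v[i] for i in range(len(v))) for j in range(len(M))]

    Pi = [[0.9, 0.2],        # pi_{ji} = P(X_{n+1} = s_j | X_n = s_i)
          [0.1, 0.8]]        # each column sums to 1
    assert all(abs(Pi[0][i] + Pi[1][i] - 1.0) < 1e-12 for i in range(2))

    p = [1.0, 0.0]           # p^0: start in state s_1
    for _ in range(3):
        p = matvec(Pi, p)    # p^n = Pi p^{n-1}

    assert abs(sum(p) - 1.0) < 1e-12     # p^3 is still a probability vector
    ```
    
    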

    Example 2.4.4 Let {ξn} be a discrete-time Markov chain as in Definition 2.4.3. Consider the filtration {Fn} = σ{ξ0, ξ1, . . . , ξn}.

    Write Xn = (I(ξn = s1), I(ξn = s2), . . . , I(ξn = sN)).

    Then Xn is a discrete-time Markov chain with state space the set of unit vectors e1 = (1, 0, . . . , 0), . . . , eN = (0, . . . , 1) of IR^N. However, the probability transition matrix of X is again Π. We can write:

    E[Xn | Fn−1] = E[Xn | Xn−1] = Π Xn−1, (2.4.2)

    from which we conclude that Π Xn−1 is the predictable part of Xn, given the history of X up to time n − 1, and the nonpredictable part of Xn must be Mn = Xn − Π Xn−1. In fact it can be easily shown that Mn ∈ IR^N is a