foss lecture1
TRANSCRIPT
8/12/2019 Foss Lecture1
Lectures on Stochastic Stability
Sergey FOSS
Heriot-Watt University
This mini-course presents an overview of stochastic stability methods, mostly motivated by (but not limited to) stochastic network applications. We work with stochastic recursive sequences, and, in particular, Markov chains, in a general Polish state space. We discuss and compare methods based on (i) Lyapunov functions and fluid limits, (ii) explicit coupling (renovating events and Harris chains), (iii) monotonicity, and some others. We also discuss instability methods and perfect simulation methods.
Lectures are based on handouts of my lecture notes (Colorado State Uni, 1996; Novosibirsk State Uni, 1997-2000; Kazakh National University, 2007), on the joint overview paper with Takis Konstantopoulos (2004), on notes written by us for a Short LMS/EPSRC Course for PhD students (September 2006), and on some (more-or-less) recent publications.
Table of Topics
1. Introduction.
2. Lyapunov techniques. Criteria for Positive Recurrence and for Instability.
3. Fluid Approximation Approach.
4. Coupling and Harris Chains.
5. Monotonicity and Saturation Rule.
6. Renovation Theory, Perfect Simulation.
7. Some intriguing open problems.
1 Lecture 1. Basic Tools.
1.1 Notation, Acronyms, and Basic Concepts
R.v. random variable
i.i.d. independent identically distributed
X, Y, Z, ξ, η, ζ, . . . for r.v.s
F, G distribution functions, f density function
P probability (probability measure), E expectation, D variance
ξ ∈ F means P(ξ ≤ x) = F(x) for all x
ξ ∈ P means P(ξ ∈ B) = P(B) for all B ∈ B.
I(A), or 1(A), is the indicator function of event A: I(A) = 1 if A occurs, and I(A) = 0 otherwise.
Here are standard families of distributions:
U[a, b] (uniform), G(p) (geometric),
E(λ) (exponential), B(m, p) (binomial),
N(a, σ²) (normal), Π(λ) (Poisson).
Convergence:
ξn →a.s. ξ means P(lim ξn = ξ) = 1 or, equivalently: for all ε > 0, P(sup_{m≥n} |ξm − ξ| > ε) → 0 as n → ∞.
ξn →p ξ means: for all ε > 0, P(|ξn − ξ| > ε) → 0 as n → ∞.
The same for random vectors.
Key Properties of Convergence. Let → mean either →a.s. or →p.
(1) If ξn → ξ and ηn → η, then (ξn, ηn) → (ξ, η).
(2) If ξn → ξ and if g is a continuous function, then g(ξn) → g(ξ).
(3) More generally, assume that g is not continuous everywhere and denote by Dg the set of its discontinuity points. If ξn → ξ and if P(ξ ∈ Dg) = 0, then g(ξn) → g(ξ).
Weak convergence of distribution functions: Fn ⇒ F if, for each x such that F is continuous at x,
Fn(x) → F(x).
Equivalent form: Fn ⇒ F if, for any bounded and continuous function g, ∫ g(x) dFn(x) → ∫ g(x) dF(x).
Comment on terminology: Weak convergence is the most common term. Other terms are convergence of/in distribution(s) and convergence in law.
Weak convergence of random variables: ξn ⇒ ξ. It means: ξn ∈ Fn, ξ ∈ F, and Fn ⇒ F.
Note that ξn ⇒ ξ is just a convenient notation! There is no real convergence of the random variables on sample paths.
Relations between convergence types:
ξn →a.s. ξ implies ξn →p ξ, and ξn →p ξ implies ξn ⇒ ξ.
Both converse statements are incorrect. Here are two examples:
Example 1. Weak convergence does not imply convergence in probability. Let P(ξ1 = 1) = P(ξ1 = −1) = 1/2 and ξ_{n+1} = −ξn, n = 1, 2, . . ..
Example 2. Convergence in probability does not imply a.s. convergence. Let (Ω, F, P) = ((0, 1], B(0,1], λ), where λ is the Lebesgue measure. Let ξ0 ≡ 1. For m = 1, 2, . . ., for n such that 1 + 2 + . . . + 2^{m−1} < n ≤ 1 + 2 + . . . + 2^{m−1} + 2^m, and for i = n − (1 + 2 + . . . + 2^{m−1}), let
ξn(ω) = 1 if ω ∈ ((i − 1)/2^m, i/2^m], and ξn(ω) = 0 otherwise.
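The mechanics of Example 2 can be checked by direct computation. The following Python sketch (illustrative code with invented helper names, not part of the original notes) enumerates the blocks of indices: for a fixed ω the event {ξn = 1} occurs once in every block, i.e. for infinitely many n, while P(ξn = 1) = 2^{−m} → 0.

```python
# Sketch of Example 2 (the "moving blocks" sequence): xi_n -> 0 in probability
# but not a.s., since every omega lands in infinitely many blocks.

def block_of(n):
    """Return (m, i): block number m and position i of index n (n >= 2)."""
    m, start = 1, 1          # block m = 1 holds the indices start+1 .. start+2
    while n > start + 2**m:
        start += 2**m
        m += 1
    return m, n - start      # i runs over 1 .. 2**m inside block m

def xi(n, omega):
    """xi_n(omega) for omega in (0, 1]."""
    m, i = block_of(n)
    return 1 if (i - 1) / 2**m < omega <= i / 2**m else 0

omega = 0.3
hits = [n for n in range(2, 200) if xi(n, omega) == 1]
print(hits)              # one hit per block: xi_n(omega) = 1 infinitely often
m, _ = block_of(150)
print(2**(-m))           # P(xi_n = 1) = 2**(-m) -> 0, so xi_n ->p 0
```

For any fixed ω the hit indices never stop, which is exactly why a.s. convergence fails.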
Laws of Large Numbers.
If ξ, ξ1, ξ2, . . . are i.i.d. random variables with a finite mean, say a = Eξ, and Sn = ξ1 + . . . + ξn, then the Weak Law of Large Numbers (WLLN) says:
Sn/n →p a as n → ∞,
and the Strong Law of Large Numbers (SLLN) says:
Sn/n →a.s. a as n → ∞.
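As a quick numerical illustration of the SLLN (a simulation sketch, not a proof; the exponential distribution with mean a = 2 is chosen arbitrarily here):

```python
# Sample means S_n / n of i.i.d. E(1/2) r.v.s (mean a = 2) settle near a.
import random

random.seed(1)
a = 2.0
S, n = 0.0, 0
for n in range(1, 100_001):
    S += random.expovariate(1 / a)   # exponential with mean a
    if n in (100, 10_000, 100_000):
        print(n, S / n)              # S_n / n drifts toward a
```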
Lebesgue and Beppo Levi Theorems.
Theorem (Beppo Levi). If {ξn} is an a.s. non-negative and non-decreasing sequence of random variables, then
E lim_n ξn = lim_n Eξn,
where both sides are either finite or infinite simultaneously.
Coupling.
ξ′ is a copy of ξ if they have the same distribution: ξ′ =_D ξ. In general, ξ and ξ′ may be defined on different probability spaces.
(a) Coupling of distribution functions (d.f.) or of probability measures.
For two d.f.s F1 and F2, their coupling is a construction of a two-variate distribution function F(x1, x2) such that F(x1, ∞) = F1(x1) and F(∞, x2) = F2(x2).
Similarly, for two probability measures P1 and P2 on the real line, their coupling is a probability measure P(·) on the plane such that its projections are P1 and P2.
The same definitions of coupling may be introduced for any number of distributions (distribution functions, probability measures).
Such a coupling may also be viewed as follows: we define a probability space (Ω, F, P) and two random variables η1 and η2 on this space such that η1 ∈ F1 and η2 ∈ F2 (or, in other notation, η1 ∈ P1 and η2 ∈ P2). Then their joint distribution, say F, has marginals F1 and F2 (or, equivalently, the probability measure P(B) = P((η1, η2) ∈ B) has marginals P1 and P2).
(b) Coupling of two random variables.
Let ξ1 be defined on (Ω1, F1, P1) and ξ2 be defined on (Ω2, F2, P2).
A coupling of these two r.v.s is defined by, first, an introduction of a new probability space, say (Ω, F, P), and then, by defining a pair of r.v.s η1, η2 on this space such that η1 =_D ξ1, η2 =_D ξ2.
Examples:
(1) F1 = U(0, 1), F2 = U(0, 1);
(2) F1 = U(0, 1), F2 = E(1);
(3) F1 = U(0, 1), F2 = Π(1);
(4) F1 = B(n, p), F2 = Π(np);
(5) F1 has a density 2x I(x ∈ (0, 1)) and F2 a density 2(1 − x) I(x ∈ (0, 1)).
In each example, there are many couplings!
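For instance, in example (2) one can realize two quite different couplings on the same computer-generated probability space. A Python sketch (helper names are invented): the "monotone" (quantile) coupling drives both coordinates by one uniform variable, while the independent coupling uses separate randomness; the marginals agree in both cases.

```python
# Two of the many couplings with marginals U(0,1) and E(1).
import math, random

random.seed(0)

def independent_pair():
    return random.random(), random.expovariate(1.0)

def monotone_pair():
    u = random.random()
    return u, -math.log(1.0 - u)   # F2^{-1}(u) for the E(1) d.f.

mono = [monotone_pair() for _ in range(10_000)]
indep = [independent_pair() for _ in range(10_000)]
# Same second marginal (mean of E(1) is 1) under both couplings:
print(sum(v for _, v in mono) / len(mono))
print(sum(v for _, v in indep) / len(indep))
# But only the monotone coupling makes the coordinates functionally dependent:
print(all(v == -math.log(1.0 - u) for u, v in mono))
```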
1.2 Weak and strong convergence
Lemma 0.
If Fn ⇒ F (all Fn and F are d.f.s), then there exists a coupling (ηn, η) of {Fn} and F:
ηn →a.s. η.
Proof. For a d.f. F, define its inverse F^{−1} by
F^{−1}(z) = inf{x : F(x) ≥ z}, z ∈ (0, 1).
Let Ω = (0, 1), F be the σ-algebra of Borel subsets of (0, 1), and P the Lebesgue measure on (0, 1).
Set U(ω) = ω, ω ∈ Ω. Then U ∈ U(0, 1).
Let ηn = Fn^{−1}(U), η = F^{−1}(U), and show ηn →a.s. η. Note that ηn ∈ Fn, η ∈ F.
In order to avoid some technicalities, assume, for simplicity, that all d.f.s are continuous. Let
η̲n = inf_{m≥n} ηm,  η̄n = sup_{m≥n} ηm,  F̄n = sup_{m≥n} Fm,  F̲n = inf_{m≥n} Fm.
Then η̲n ∈ F̄n, η̄n ∈ F̲n.
Indeed,
P(η̲n ≤ x) = P(η̲n < x) = P(∃ m ≥ n : ηm < x)
= P(∃ m ≥ n : Fm^{−1}(U) < x) = P(∃ m ≥ n : U < Fm(x))
= P(U < sup_{m≥n} Fm(x)) = F̄n(x).
Similarly, P(η̄n > x) = . . . = 1 − F̲n(x).
Since F̄n(x) ↓ F(x) and F̲n(x) ↑ F(x) for all x, it is sufficient to show that, for instance,
η̲n →a.s. η.
But both {F̄n} and {η̲n} are monotone as functions of n!
Then η̲n is a.s. non-decreasing and, therefore, there exists ζ such that η̲n →a.s. ζ. Moreover, η̲n is a non-decreasing function of ω with d.f. F̄n ≥ F, so η̲n ≤ F^{−1}(U) = η a.s., and hence ζ ≤ η a.s.
If P(ζ ≠ η) > 0, then there exists x:
P(ζ ≤ x) > P(η ≤ x).
But P(ζ ≤ x) = lim F̄n(x) = F(x) = P(η ≤ x)!
Thus, we get a contradiction, and η̲n →a.s. η. By similar arguments, η̄n →a.s. η.
Therefore, ηn →a.s. η.
Problem No 1. Prove this lemma without the additional assumption that all d.f.s are continuous.
Exercises: What is F^{−1} for the following distribution functions:
U(0, 1), E(λ), N(0, 1), B(1, p), B(n, p), Π(λ), . . .?
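The coupling construction of Lemma 0 is easy to try numerically. A sketch under the assumption Fn = d.f. of E(1 + 1/n), so that Fn ⇒ F = d.f. of E(1); for the exponential family the inverse is explicit:

```python
# Quantile coupling of Lemma 0 for exponential d.f.s: eta_n = F_n^{-1}(eta)
# converges pointwise on the common uniform variable eta.
import math

def quantile_exp(lam, z):
    # F^{-1}(z) = -log(1 - z)/lam for the E(lam) distribution
    return -math.log(1.0 - z) / lam

eta = 0.7                              # one fixed point of Omega = (0, 1)
eta_n = [quantile_exp(1 + 1/n, eta) for n in (1, 10, 100, 1000)]
limit = quantile_exp(1.0, eta)
print(eta_n, limit)                    # eta_n increases toward -log(0.3)
```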
1.3 Uniform integrability
Let {ξn}_{n≥1} be a sequence of real-valued r.v.s.
Definition 1.
{ξn} are uniformly integrable (UI) if E|ξn| < ∞ for all n and, moreover, there is a function h such that
sup_n E{|ξn| I(|ξn| ≥ x)} ≤ h(x) → 0 as x → ∞.
Comments:
Actually, we can put = instead of ≤ in the definition above. But I prefer to keep ≤, since I want the upper bound h(x) to be monotone non-increasing and right-continuous.
Clearly, if {ξn} are UI, then sup_n E|ξn| is finite.
Examples:
(1) ξn ∈ E(λn), n = 1, 2, . . ., are UI if and only if min_n λn > 0.
(2) Let P(ξn = 2^n) = P(ξn = −2^n) = 1/2^n and P(ξn = 0) = 1 − 1/2^{n−1}. Then
E|ξn| = 2, Eξn = 0, ξn →p 0,
but {ξn} are not UI!
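A direct computation confirming that example (2) is not UI (illustrative Python, using the two-point-plus-atom distribution above):

```python
# xi_n = +-2**n w.p. 2**(-n) each, 0 otherwise. E|xi_n| = 2 for every n, yet
# for any threshold x the sup over n of E{|xi_n| I(|xi_n| >= x)} stays 2.

def tail_mean(n, x):
    """E{|xi_n| * I(|xi_n| >= x)} computed exactly for this distribution."""
    value, prob = 2.0**n, 2.0**(-n)
    return 2 * value * prob if value >= x else 0.0

for x in (10.0, 1e6):
    sup = max(tail_mean(n, x) for n in range(1, 60))
    print(x, sup)    # the sup does not decay, however large x is
```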
Lemma 1.
The following are equivalent:
(i) {ξn} are UI;
(ii) there exists a function g : [0, ∞) → [0, ∞) such that:
(a) g(0) > 0; g is non-decreasing; lim_{x→∞} g(x) = ∞;
(b) sup_n E{|ξn| g(|ξn|)} < ∞.
Note: g(0) > 0 is not essential!
Proof.
(ii) ⟹ (i). For each n,
E{|ξn| I(|ξn| ≥ x)} = E{ (|ξn| g(|ξn|)/g(|ξn|)) I(|ξn| ≥ x)}
≤ (1/g(x)) sup_n E{|ξn| g(|ξn|)} → 0 as x → ∞.
(i) ⟹ (ii). Assume that h(x) > 0 for all x (otherwise the statement is trivial).
For m ∈ Z, let
Am = {x : 1/2^{2(m+1)} < h(x) ≤ 1/2^{2m}}
and, for x ∈ Am, let g(x) = 2^m. From h(0) < ∞ and h(x) → 0, the sets Am cover [0, ∞), and only finitely many Am with m < 0 are non-empty.
Note that Am is an interval which is closed from the left and open from the right. Denote by zm its left boundary point, zm ∈ Am. Then
E{|ξn| g(|ξn|)} = Σ_m E{|ξn| g(|ξn|) I(|ξn| ∈ Am)} =
= Σ_m 2^m E{|ξn| I(|ξn| ∈ Am)} ≤ Σ_m 2^m E{|ξn| I(|ξn| ≥ zm)}
≤ Σ_m 2^m h(zm) ≤ Σ_m 2^m (1/2^{2m}) = Σ_m 2^{−m} < ∞.
Lemma 2.
(1) If ξn → ξ (a.s. or in probability) and {ξn} are UI, then E|ξ| < ∞, Eξn → Eξ, and E|ξn| → E|ξ|.
(2) Conversely, if the r.v.s ξn are a.s. non-negative, ξn → ξ, and Eξn → Eξ < ∞, then {ξn} are UI.
Proof. (1) (a) Assume first that all the r.v.s have a common bounded support: for some K, P(|ξn| ≤ K) = 1 for all n and P(|ξ| ≤ K) = 1. Then, for any ε > 0,
E|ξn − ξ| ≤ ε + 2K P(|ξn − ξ| > ε) → ε;
since ε > 0 is arbitrary, E|ξn| → E|ξ| and Eξn → Eξ.
(b1) Assume now that at least one of the distributions of the r.v.s has an unbounded support. Take x > 0 such that P(|ξ| = x) = 0. Since ξn →a.s. ξ,
ξn I(|ξn| < x) →a.s. ξ I(|ξ| < x).
Then, for all n, P(|ξn I(|ξn| < x)| ≤ x) = P(|ξ I(|ξ| < x)| ≤ x) = 1 ⟹ E{ξn I(|ξn| < x)} → E{ξ I(|ξ| < x)} (see (a));
and
|ξn| →a.s. |ξ| ⟹ E|ξ| ≤ lim inf E|ξn| ≤ sup_n E|ξn| ≡ K ⟹ E|ξ| ≤ K.
(b2) Show first that E|ξ| < ∞. Indeed,
E|ξ| = lim_{N→∞} E{|ξ| I(|ξ| ≤ N)} ≤ K < ∞.
Given ε > 0, choose x such that P(|ξ| = x) = 0, h(x) ≤ ε, and E{|ξ| I(|ξ| ≥ x)} ≤ ε.
Let
γn = E{|ξn| I(|ξn| ≥ x)} and γ = E{|ξ| I(|ξ| ≥ x)}.
Then
E|ξn| = E{|ξn| I(|ξn| < x)} + γn,
E|ξ| = E{|ξ| I(|ξ| < x)} + γ.
Since γn ≤ ε for all n and γ ≤ ε, then
lim sup(E|ξn| − E|ξ|) ≤ 2ε and
lim inf(E|ξn| − E|ξ|) ≥ −2ε for any ε.
Letting ε to 0, we obtain the first statement of the lemma.
Prove now the second statement. First, from Eξn → Eξ < ∞: given ε > 0, choose x0 = x0(ε) such that P(ξ = x0) = 0 and
E{ξ I(ξ ≥ x0)} ≤ ε/2.
Then we may use part (b1) from the proof of (1): for the given x0,
Eξn → Eξ ⟹ E{ξn I(ξn ≥ x0)} = Eξn − E{ξn I(ξn < x0)}
→ Eξ − E{ξ I(ξ < x0)} = E{ξ I(ξ ≥ x0)} ≤ ε/2.
Therefore, there exists n(ε) such that
E{ξn I(ξn ≥ x0)} ≤ ε for all n > n(ε).
Now, for each n = 1, 2, . . . , n(ε),
Eξn < ∞ ⟹ there exists xn : E{ξn I(ξn ≥ xn)} ≤ ε.
Let x = max(x1, . . . , x_{n(ε)}, x0). Then
E{ξn I(ξn ≥ x)} ≤ ε for all n.
Thus,
sup_n E{ξn I(ξn ≥ x)} → 0 as x → ∞.
1.4 Some useful properties of UI
Property 1. If {ξn} are UI and if {ηn} are such that |ηn| ≤ |ξn| a.s., then {ηn} are UI.
Indeed, let h(x) be from Definition 1. Then, for all x > 0,
E{|ηn| I(|ηn| > x)} ≤ E{|ξn| I(|ηn| > x)} ≤ E{|ξn| I(|ξn| > x)} ≤ h(x).
Property 2.
If {ξn} is an i.i.d. sequence with finite mean, E|ξ1| < ∞, then {ξn} are UI.
In particular, if the r.v.s {ξn} admit a stochastic integrable majorant η, i.e.
|ξn| ≤st η for all n (that is, P(|ξn| > x) ≤ P(η > x) for all x),
and if Eη < ∞, then {ξn} are UI.
Remarks. Let T = [0, ∞) and consider a family {ξt}_{t∈T}.
(a) The statement and the proof of Lemma 1 stay the same if we replace n = 1, 2, . . . by t ∈ T.
(b) Similarly, the statement and the proof of Lemma 2 stay unchanged if we replace n = 1, 2, . . . by t ∈ T = [0, ∞).
(c) Properties 1 and 3 still hold if we replace n = 1, 2, . . . by t ∈ T.
1.5 Coupling inequality. Maximal coupling. Dobrushin's theorem.
In this section, we assume that random variables are not necessarily real-valued and may take values in a general measurable space (X, B_X), which is assumed to be a complete separable metric space.
The Coupling Inequality
Let η1, η2 : (Ω, F, P) → (X, B_X) be two X-valued r.v.s. Let
P1(B) = P(η1 ∈ B), P2(B) = P(η2 ∈ B), B ∈ B_X.
Then, for B ∈ B_X,
P1(B) − P2(B) = P(η1 ∈ B, η1 = η2) + P(η1 ∈ B, η1 ≠ η2)
− P(η2 ∈ B, η1 = η2) − P(η2 ∈ B, η1 ≠ η2)
= P(η1 ∈ B, η1 ≠ η2) − P(η2 ∈ B, η1 ≠ η2) ≤ P(η1 ≠ η2).
Therefore, for any B ∈ B_X, |P1(B) − P2(B)| ≤ P(η1 ≠ η2), that is,
(∗) sup_{B∈B_X} |P1(B) − P2(B)| ≤ P(η1 ≠ η2).
The Maximal Coupling
Now we reformulate the result obtained. Note that the LHS of inequality (∗) depends on the marginal distributions P1 and P2 only and does not depend on the joint distribution of η1 and η2. Therefore, we get the following:
for any coupling of the marginal distributions P1 and P2, inequality (∗) holds. Equivalently,
(∗∗) sup_{B∈B_X} |P1(B) − P2(B)| ≤ inf over all couplings of P(η1 ≠ η2).
The following questions seem to be natural:
(?) Is there equality in (∗∗)?
(??) If the answer is yes, then does there exist a coupling such that
sup_{B∈B_X} |P1(B) − P2(B)| = P(η1 ≠ η2)?
The answers to both questions are positive! And this is the content of
Dobrushin's theorem.
Theorem 1.
Let P1 and P2 be two probability measures on a complete separable metric space (X, B_X). There exists a coupling of these probability measures such that, for ηi ∈ Pi, i = 1, 2,
sup_{B∈B_X} |P1(B) − P2(B)| = P(η1 ≠ η2).
Proof. ν(B) = P1(B) − P2(B) is a signed measure. Then the Banach theorem states that there exists a subset C ⊆ X such that
(a) ν(B) ≥ 0 for all B ⊆ C;
(b) ν(B) ≤ 0 for all B ⊆ X \ C ≡ C̄.
Note:
1) if ν(C) = 0, then P1 = P2 and the coupling is obvious;
2) ν(C) = −ν(C̄).
Assume ν(C) > 0. Introduce 4 distributions (probability measures):
Q1,1 is defined by Q1,1(B) = P1(C̄ ∩ B)/P1(C̄), B ∈ B_X (if P1(C̄) = 0, let Q1,1 be an arbitrary distribution),
and
Q2,1 is defined by Q2,1(B) = (P1(C ∩ B) − P2(C ∩ B))/ν(C), B ∈ B_X.
Similarly,
Q2,2 is defined by Q2,2(B) = P2(C ∩ B)/P2(C), B ∈ B_X (if P2(C) = 0, let Q2,2 be an arbitrary distribution),
and
Q1,2 is defined by Q1,2(B) = (P2(C̄ ∩ B) − P1(C̄ ∩ B))/ν(C), B ∈ B_X.
Then introduce 5 mutually independent r.v.s:
ζ1,1 ∈ Q1,1, ζ1,2 ∈ Q1,2, ζ2,1 ∈ Q2,1, ζ2,2 ∈ Q2,2,
and δ taking the values 1, 2, 0 with probabilities P1(C̄), P2(C), ν(C), respectively.
Now we can define η1 and η2 as follows:
η1 = ζ1,1 I(δ = 1) + ζ2,2 I(δ = 2) + ζ2,1 I(δ = 0),
η2 = ζ1,1 I(δ = 1) + ζ2,2 I(δ = 2) + ζ1,2 I(δ = 0).
Simple calculations show that ηi ∈ Pi, i = 1, 2. This is Problem No 3 for you.
Then,
P(η1 ≠ η2) ≤ P(δ = 0) = ν(C) ≤ sup_{B∈B_X} |P1(B) − P2(B)|.
So, by (∗),
P(η1 ≠ η2) = sup_{B∈B_X} |P1(B) − P2(B)|.
Comment. The Banach theorem and the Radon-Nikodym theorem are two equivalent statements formulated in slightly different ways.
There is (formally!) another proof (see, e.g., T. Lindvall's book on the coupling method) based on the Radon-Nikodym theorem:
Consider a new probability measure P(·) = (P1(·) + P2(·))/2. Let fi = dPi/dP be the corresponding densities. Then
sup_{B∈B_X} |P1(B) − P2(B)| = 1 − ∫ min(f1(x), f2(x)) P(dx),
and we may repeat the previous construction using densities.
What is the maximal coupling in the following examples?
(1) Two discrete two-point distributions.
(2) Two absolutely continuous distributions on (0, 1) with densities f1 and f2.
(3) Bernoulli and Poisson distributions.
(4) Normal and exponential distributions.
(This is another exercise for you.)
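For discrete distributions, the construction in the proof of Theorem 1 becomes a few lines of code: keep the common mass min(p1, p2) on the diagonal and spread the excesses off the diagonal. A sketch (function names are mine, not from the notes):

```python
# Maximal coupling of two discrete distributions, following the proof idea.

def maximal_coupling(p1, p2):
    """p1, p2: dicts value -> probability. Returns dict (v1, v2) -> prob."""
    support = set(p1) | set(p2)
    common = {v: min(p1.get(v, 0.0), p2.get(v, 0.0)) for v in support}
    c = sum(common.values())                 # = 1 - total variation distance
    r1 = {v: p1.get(v, 0.0) - common[v] for v in support}   # excess of P1
    r2 = {v: p2.get(v, 0.0) - common[v] for v in support}   # excess of P2
    joint = {(v, v): common[v] for v in support if common[v] > 0}
    for v1, q1 in r1.items():
        for v2, q2 in r2.items():
            if q1 > 0 and q2 > 0:            # excesses have disjoint supports
                joint[(v1, v2)] = joint.get((v1, v2), 0.0) + q1 * q2 / (1 - c)
    return joint

P1 = {0: 0.5, 1: 0.5}
P2 = {0: 0.2, 1: 0.8}
joint = maximal_coupling(P1, P2)
tv = 0.5 * sum(abs(P1.get(v, 0) - P2.get(v, 0)) for v in set(P1) | set(P2))
p_neq = sum(q for (a, b), q in joint.items() if a != b)
print(tv, p_neq)   # equal: P(eta1 != eta2) = sup_B |P1(B) - P2(B)|
```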
1.6 Probabilistic Metrics
Dobrushin's theorem provides a positive solution to one of the important problems in the theory of Probabilistic Metrics. We will briefly discuss basic concepts of this theory.
Again, consider a complete separable metric space (X, B_X) and introduce the following notation:
1) X² = X × X,
2) B²_X = B_X ⊗ B_X is the σ-algebra in X² generated by all sets B1 × B2, B1, B2 ∈ B_X,
3) diag(X²) = {(x, x), x ∈ X}.
Problem No 4. Prove that diag(X²) ∈ B²_X. (Actually, there is no need to assume that the state space is complete separable metric, and the minimal requirement for diag(X²) ∈ B²_X to hold is that the σ-algebra B_X is countably generated.)
Let P be any probability distribution on (X², B²_X). Denote by Pi, i = 1, 2, its marginal distributions:
P1(B) = P(B × X),
P2(B) = P(X × B), B ∈ B_X.
Let 𝒫 be the set of all probability distributions (measures) on (X², B²_X).
Definition 3.
A function d : 𝒫 → [0, ∞) is called a probabilistic metric if it satisfies the following conditions:
(1) P(diag(X²)) = 1 ⟹ d(P) = 0;
(2) d(P) = 0 ⟹ P1 = P2;
(3) the triangle inequality: if
P(1) has marginals P1 and P2,
P(2) has marginals P1 and P3,
P(3) has marginals P3 and P2,
then d(P(1)) ≤ d(P(2)) + d(P(3)).
Definition 4.
A probabilistic metric d is simple if it depends on the marginal distributions only (i.e., if P(1) and P(2) have the same marginals, then d(P(1)) = d(P(2))), and complex otherwise.
For a simple metric, it is reasonable to write d(P1, P2) instead of d(P), so d has the meaning of a distance between P1 and P2.
For a complex metric, we may also write d(η1, η2) instead of d(P), where (η1, η2) is a coupling of two r.v.s with joint distribution P,
P(B) = P((η1, η2) ∈ B), B ∈ B²_X.
So, d(η1, η2) may be considered as a distance between r.v.s.
We can also write d(η1, η2) for simple metrics. In this case,
d(η1, η2) = d(F1, F2) = d(P1, P2).
Examples.
Simple:
1) sup_{B∈B_X} |P1(B) − P2(B)| (Total variation norm (T.V.N.))
and, for real-valued r.v.s,
3) sup_x |F1(x) − F2(x)| (Uniform metric (U.M.))
4) inf{ε > 0 : F1(x − ε) − ε ≤ F2(x) ≤ F1(x + ε) + ε for all x} (Levy metric (L.M.))
Complex:
2) P(η1 ≠ η2) ≡ P(X² \ diag(X²)) (Indicator metric (I.M.))
5) inf{ε > 0 : P(|η1 − η2| > ε) < ε} (Ki Fan metric (K.F.M.))
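The simple metrics 3) and 4) can be evaluated numerically. A grid-based sketch (approximate by construction) for U(0,1) against its shift by h = 0.2, where one expects U.M. = h and L.M. = h/2:

```python
# Uniform and Levy metrics between the d.f. of U(0,1) and its shift by h.

h = 0.2
def F1(x):
    return min(max(x, 0.0), 1.0)       # d.f. of U(0,1)
def F2(x):
    return F1(x - h)                   # d.f. of U(0,1) + h

xs = [i / 1000 - 1 for i in range(3001)]   # grid on [-1, 2]

um = max(abs(F1(x) - F2(x)) for x in xs)   # uniform metric

def levy_ok(eps):
    tol = 1e-9                             # guard against float rounding
    return all(F1(x - eps) - eps <= F2(x) + tol for x in xs) and \
           all(F2(x) <= F1(x + eps) + eps + tol for x in xs)

lm = next(e / 1000 for e in range(1, 1001) if levy_ok(e / 1000))
print(um, lm)    # um is close to h, lm is close to h/2
```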
One of the key problems in the theory of probabilistic metrics is to find answers to the following questions.
Assume a simple metric d(P1, P2) is given. Does there exist a complex metric d̂ such that
(a) the following coupling inequality holds:
d(η1, η2) ≤ inf over all couplings of d̂(η1, η2)? (Compare with (∗∗).)
(b) If yes, then is it possible to replace ≤ by = in (a)?
(c) Does there exist a coupling such that d(η1, η2) = d̂(η1, η2)?
The following result holds:
Theorem 2.
The answers to the above questions are positive for the metrics:
(1) d = T.V.N., d̂ = I.M.;
(2) d = L.M., d̂ = K.F.M.
Comment. Statement (1) is Dobrushin's theorem. Statement (2) is Strassen's theorem (its proof is omitted).
1.7 Stopping times
Let (Ω, F, P) be a probability space and {ξn}_{n≥1} a sequence of r.v.s, ξn : Ω → R.
Denote by Fn the σ-algebra generated by ξn: Fn ⊆ F; Fn = {ξn^{−1}(B), B ∈ B}, where B is the σ-algebra of Borel sets in R.
Then, for 1 ≤ k ≤ n, F[k,n] is the σ-algebra generated by ξk, . . . , ξn; i.e.
F[k,n] is the minimal σ-algebra such that F[k,n] ⊇ Fl for all l = k, . . . , n.
Another way to describe F[k,n] is:
let ξ_{k,n} := (ξk, . . . , ξn) be a random vector; ξ_{k,n} : Ω → R^{n−k+1}. Then
F[k,n] = {ξ_{k,n}^{−1}(B), B ∈ B^{n−k+1}},
where B^{n−k+1} is the σ-algebra of Borel sets in R^{n−k+1}.
Finally, F[1,∞) is the σ-algebra generated by the whole sequence {ξn}_{n≥1}.
Good Property: for any A ∈ F[1,∞), there exists a sequence of events {An}_{n≥1}, An ∈ F[1,n], such that
P(A \ An) + P(An \ A) → 0 as n → ∞.
Let now τ : Ω → {1, 2, . . . , n, . . .} be an integer-valued r.v. (we say it is a counting r.v.).
Definition 5.
τ is a stopping time (ST) with respect to {ξn} if, for all n ≥ 1,
{τ = n} ∈ F[1,n]
(or, equivalently, {τ ≤ n} ∈ F[1,n]).
Another variant of a definition of a stopping time is:
Definition 6.
τ is an ST if there exists a family of functions hn : R^n → {0, 1} such that,
for all n ≥ 1, I(τ = n) = hn(ξ1, . . . , ξn) a.s.
(or, equivalently, I(τ ≤ n) = h̃n(ξ1, . . . , ξn) a.s.).
Examples of STs:
(1) τ = min{n ≥ 1 : ξn > x};
(2) τ = min{n ≥ 1 : Σ_{i=1}^{n} ξi > x};
(3) More examples...
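Example (2) can be written as code to make the defining property visible: whether τ = n is decided by ξ1, . . . , ξn alone. A Monte Carlo sketch (the parameter values are arbitrary):

```python
# tau = min{n >= 1 : xi_1 + ... + xi_n > x} for i.i.d. U(0,1) summands;
# the decision at step n uses only xi_1..xi_n, which is the ST property.
import random

random.seed(7)

def tau(xs, x):
    s = 0.0
    for n, xi in enumerate(xs, start=1):
        s += xi                  # information available up to time n
        if s > x:
            return n
    raise ValueError("threshold not crossed")

samples = [tau([random.random() for _ in range(200)], 5.0) for _ in range(20_000)]
print(sum(samples) / len(samples))   # Monte Carlo estimate of E tau for x = 5
```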
Assume now that {ξn} is an i.i.d. sequence and τ is an ST with P(τ < ∞) = 1. Consider the sequence θi ≡ ξ_{τ+i}, i = 1, 2, . . ..
Lemma 3.
The following statements hold:
1) {θi} is an i.i.d. sequence;
2) θi =_D ξ1;
3) {θi}_{i≥1} and the random vector (τ, ξ1, . . . , ξτ) are mutually independent.
Corollary 1.
{θi}_{i≥1} and Sτ ≡ ξ1 + . . . + ξτ are mutually independent.
Proof of Lemma 3. It is sufficient to show that,
for all k ≥ 1, m ≥ 1, and Borel sets B1, . . . , Bk and C1, . . . , Cm,
(∗) P({τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {θ1 ∈ C1, . . . , θm ∈ Cm})
= P(τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk) P(ξ1 ∈ C1, . . . , ξm ∈ Cm).
Indeed, (∗) ⟹ 1), 2), and 3).
First, take B1 = . . . = Bk = Bk+1 = . . . = R. Then, for all m,
(∗∗) P(θ1 ∈ C1, . . . , θm ∈ Cm)
= Σ_{k=1}^{∞} P(τ = k; θ1 ∈ C1, . . . , θm ∈ Cm)   (total probability formula)
= Σ_{k=1}^{∞} P(τ = k) Π_{i=1}^{m} P(ξ1 ∈ Ci)   (by (∗))
= Π_{i=1}^{m} P(ξ1 ∈ Ci).
In particular, for any j and Cj, we can take m ≥ j and Ci = R for i ≠ j.
Then the LHS of (∗∗) = P(θj ∈ Cj),
and the RHS of (∗∗) = P(ξ1 ∈ Cj).
⟹ 2)
Now, take any C1, . . . , Cm and replace in (∗∗) Π_{i=1}^{m} P(ξ1 ∈ Ci) by Π_{i=1}^{m} P(θi ∈ Ci).
⟹ 1)
Finally, take any B1, . . . , Bk and C1, . . . , Cm and replace in (∗) Π_{i=1}^{m} P(ξ1 ∈ Ci) by Π_{i=1}^{m} P(θi ∈ Ci).
⟹ 3)
So, we now prove (∗):
P({τ = k; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {θ1 ∈ C1, . . . , θm ∈ Cm})
= P({hk(ξ1, . . . , ξk) = 1; ξ1 ∈ B1, . . . , ξk ∈ Bk} ∩ {ξ_{k+1} ∈ C1, . . . , ξ_{k+m} ∈ Cm}),
where the first event belongs to F[1,k] and the second to F[k+1,k+m], so the two events are independent:
= P(. . .) P(. . .)
= P(. . .) Π_{i=1}^{m} P(ξ_{k+i} ∈ Ci) = P(. . .) Π_{i=1}^{m} P(ξ1 ∈ Ci).
Lemma 4.
(Wald identity)
Assume that E|ξ1| < ∞ and Eτ < ∞. Then
E Sτ = Eτ · Eξ1.
Now let us write ξi^{(1)} instead of ξi and τ^{(1)} instead of τ, and introduce further sequences and STs: {ξi^{(2)}}, τ^{(2)}, and so on, where {ξi^{(j+1)}} = {ξ^{(j)}_{τ^{(j)}+i}}.
Lemma 6. If τ^{(j)} is an ST w.r. to {ξi^{(j)}}_{i≥1} for each j = 1, . . . , J and if {ξi^{(j+1)}} = {ξ^{(j)}_{τ^{(j)}+i}}, then τ^{(1)} + . . . + τ^{(J)} is an ST w.r. to {ξi}_{i≥1}.
Problem No 5. Prove Lemma 6.
1.8 Two-dimensional stopping times
Let {ξ_{n,1}}_{n≥1} and {ξ_{n,2}}_{n≥1} be two sequences of r.v.s and F_{[k1,n1],[k2,n2]} the σ-algebra generated by ξ_{k1,1}, ξ_{k1+1,1}, . . . , ξ_{n1,1}; ξ_{k2,2}, ξ_{k2+1,2}, . . . , ξ_{n2,2}.
Definition 7.
A pair of r.v.s τ1, τ2 : Ω → {1, 2, . . .} is an ST w.r. to {ξ_{n,1}} and {ξ_{n,2}} if,
for all n1 ≥ 1, n2 ≥ 1, {τ1 = n1, τ2 = n2} ∈ F_{[1,n1],[1,n2]}.
Lemma 7.
If {ξ_{n,1}}_{n≥1} and {ξ_{n,2}}_{n≥1} are two mutually independent sequences and if (τ1, τ2) is an ST, then
1) each of the sequences
{θ_{i,1}} ≡ {ξ_{τ1+i,1}} and {θ_{i,2}} ≡ {ξ_{τ2+i,2}}
is i.i.d., and these sequences are mutually independent;
2) θ_{i,1} =_D ξ_{1,1}; θ_{i,2} =_D ξ_{1,2};
3) {{θ_{i,1}}_{i≥1}; {θ_{i,2}}_{i≥1}} and the random vector
(τ1, τ2; ξ_{1,1}, . . . , ξ_{τ1,1}; ξ_{1,2}, . . . , ξ_{τ2,2})
are mutually independent.
Proof is omitted.
Lemma 8.
Under the conditions of Lemma 7, assume, in addition, that
ξ_{1,1} =_D ξ_{1,2}.
Then the sequence {ψn}_{n≥1},
ψn = ξ_{n,1} if n ≤ τ1,  ψn = ξ_{n−τ1+τ2,2} if n > τ1,
is i.i.d.; ψn =_D ξ_{1,1}.
Proof. We have to show that, for all n = 1, 2, . . . and Borel sets B1, . . . , Bn,
P(ψ1 ∈ B1, . . . , ψn ∈ Bn) = Π_{i=1}^{n} P(ξ_{1,1} ∈ Bi).
1) For all n and B,
P(ψn ∈ B) = P(ξ_{n,1} ∈ B; n ≤ τ1) + P(ξ_{n−τ1+τ2,2} ∈ B; n > τ1).
Here
P(ξ_{n,1} ∈ B; n ≤ τ1) = P(ξ_{1,1} ∈ B) − P(ξ_{1,1} ∈ B) P(n > τ1)
= P(ξ_{1,1} ∈ B) P(n ≤ τ1)
and
P(ξ_{n−τ1+τ2,2} ∈ B; n > τ1) = Σ_{l=1}^{n−1} P(ξ_{τ2+n−l,2} ∈ B; τ1 = l)
= Σ_{l=1}^{n−1} P(θ_{n−l,2} ∈ B; τ1 = l)
= . . . = P(ξ_{1,2} ∈ B) P(τ1 < n).
2) Problem No 6. Prove the statement for joint distributions. Use induction arguments.
Here is another variant of a two-dimensional analogue of Lemma 3.
Lemma 9.
Assume that
(i) ζn = (ξ_{n,1}, ξ_{n,2}) is a sequence (n = 1, 2, . . .) of independent random vectors;
(ii) each of {ξ_{n,1}}_{n≥1} and {ξ_{n,2}}_{n≥1} is an i.i.d. sequence;
(iii) ξ_{1,1} =_D ξ_{1,2};
(iv) (τ1, τ2) is an ST and τ1 ≡ τ2 ≡ τ.
Then
ψn = ξ_{n,1} if n ≤ τ,  ψn = ξ_{n,2} if n > τ,
is an i.i.d. sequence; ψn =_D ξ_{1,1}.
Proof is very similar to that of Lemma 8 (omitted).
Finally, here is a further generalization of Lemma 9.
Lemma 10.
In the statement of Lemma 9, replace (i) by
(i′) there exist m1 ≥ 1, m2 ≥ 1 such that
ζn = (ξ_{(n−1)m1+1,1}, . . . , ξ_{nm1,1}; ξ_{(n−1)m2+1,2}, . . . , ξ_{nm2,2})
is an i.i.d. sequence;
and (iv) by
(iv′) (τ1, τ2) is an ST, P(τ1 ∈ {m1, 2m1, . . .}) = P(τ2 ∈ {m2, 2m2, . . .}) = 1,
and τ1/m1 ≡ τ2/m2.
Then
ψn = ξ_{n,1} if n ≤ τ1,  ψn = ξ_{n−τ1+τ2,2} if n > τ1,
is an i.i.d. sequence; ψn =_D ξ_{1,1}.
Problem No 7. Prove Lemma 10.
1.9 Stationary Sequences and Processes
Discrete Time
Definition 8.
(a) Let {ξn}_{n≥0} be a sequence of r.v.s.
It is stationary if, for all l = 1, 2, . . ., 0 ≤ i1 < i2 < . . . < il, B1, . . . , Bl ∈ B, and m = 1, 2, . . .,
P(ξ_{i1} ∈ B1, . . . , ξ_{il} ∈ Bl) = P(ξ_{i1+m} ∈ B1, . . . , ξ_{il+m} ∈ Bl). (1)
(b) Similarly, a double-infinite sequence {ξn}_{n=−∞}^{∞} is stationary if (1) holds for all m ∈ Z and B1, . . . , Bl ∈ B.
Continuous Time
Definition 8′.
(a) Let {ξt}_{t≥0} be a family of r.v.s.
It is stationary if, for all l = 1, 2, . . ., 0 ≤ t1 < t2 < . . . < tl, B1, . . . , Bl ∈ B, and u ≥ 0,
P(ξ_{t1} ∈ B1, . . . , ξ_{tl} ∈ Bl) = P(ξ_{t1+u} ∈ B1, . . . , ξ_{tl+u} ∈ Bl).
(b) Similarly, {ξt}_{t=−∞}^{∞} is stationary if the above equality holds for all u ∈ R and B1, . . . , Bl ∈ B.
Definition 9. A sequence of events {An}_{n=−∞}^{∞} is stationary if the sequence of random variables {I(An)}_{n=−∞}^{∞} is stationary.
Assume that {An}_{n=−∞}^{∞} is a stationary sequence and that P(A0) > 0 and P(∪_{n=0}^{∞} An) = 1.
Introduce the following r.v.s:
τ+ = min{n ≥ 1 : I(An) = 1} ≡ min{n ≥ 1 : ω ∈ An},
τ− = min{n ≥ 1 : I(A−n) = 1},
ν+ : P(ν+ > n) = P(Ā1 ∩ . . . ∩ Ān | A0),
ν− : P(ν− > n) = P(Ā−1 ∩ . . . ∩ Ā−n | A0).
Lemma 11.
(a) τ+ =_D τ−;
(b) ν+ =_D ν−;
(c) P(τ+ = n) = P(A0) P(ν+ ≥ n), n = 1, 2, . . .
Remark 4. The statement of the lemma is not obvious, in general.
Examples: Let {ξn} be an i.i.d. sequence, P(ξn > 0) > 0.
Then we can take a) An = {ξn > 0}; b) An = {ξn + ξ_{n−1} > 0}.
Proof of Lemma 11.
(a)
P(τ+ > n) = P(Ā1 ∩ . . . ∩ Ān)
(by stationarity, with m = −n − 1)
= P(Ā_{1+m} ∩ . . . ∩ Ā_{n+m}) = P(Ā−n ∩ . . . ∩ Ā−1) = P(τ− > n).
(b)
P(ν+ = n) = P(A0 ∩ Ā1 ∩ . . . ∩ Ā_{n−1} ∩ An) / P(A0)
= P(A−n ∩ Ā_{−n+1} ∩ . . . ∩ Ā−1 ∩ A0) / P(A0)
= P(ν− = n).
(c)
P(τ+ ≥ n) = P(Ā1 ∩ . . . ∩ Ā_{n−1}) = P(A0 ∩ Ā1 ∩ . . . ∩ Ā_{n−1}) + P(Ā0 ∩ Ā1 ∩ . . . ∩ Ā_{n−1})
= P(A0) P(Ā1 ∩ . . . ∩ Ā_{n−1} | A0) + P(Ā1 ∩ . . . ∩ Ān)
= P(A0) P(ν+ ≥ n) + P(τ+ ≥ n + 1).
⟹ P(τ+ = n) = P(τ+ ≥ n) − P(τ+ ≥ n + 1) = P(A0) P(ν+ ≥ n).
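Identity (c) can be sanity-checked by simulation in the simplest case a) above, where the events An = {ξn > 0} are independent and ν is geometric. A sketch (parameter values arbitrary):

```python
# Check of Lemma 11(c) for independent A_n with P(A_0) = p: then
# P(nu >= n) = (1 - p)**(n - 1), and P(tau = n) should equal p*(1-p)**(n-1).
import random

random.seed(3)
p = 0.3                                   # P(A_0) = P(xi_n > 0)
N = 200_000

def tau_sample():
    n = 1
    while random.random() >= p:           # A_n fails w.p. 1 - p
        n += 1
    return n

counts = {}
for _ in range(N):
    t = tau_sample()
    counts[t] = counts.get(t, 0) + 1

for n in (1, 2, 3):
    print(counts.get(n, 0) / N, p * (1 - p) ** (n - 1))   # close pairs
```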
Corollary 2.
For any k > 0, Eτ^k and Eν^{k+1} are either finite or infinite simultaneously (here τ = τ+, ν = ν+).
Indeed, by (c),
Eτ^k = Σ_{n≥1} n^k P(τ = n) = P(A0) Σ_{n≥1} n^k P(ν ≥ n) = P(A0) E(Σ_{n=1}^{ν} n^k) ≤ P(A0) Eν^{k+1}
and, using similar arguments with the lower bound,
Eτ^k ≥ (P(A0)/(k + 1)) Eν^{k+1}.
⟹ Eτ^k and Eν^{k+1} are either finite or infinite simultaneously.
1.10 On σ-algebras generated by a sequence of r.v.s
(1). Let (Ω, F, P) be a probability space and ξn : Ω → R, n = 1, 2, . . ., a sequence of r.v.s. Let F[k,n] = σ(ξk, . . . , ξn) and F[k,∞) = σ(ξk, ξ_{k+1}, . . .).
For A, B ∈ F, introduce the distance
d(A, B) = P(A \ B) + P(B \ A).
(A) Recall basic properties of σ-algebras.
1) If F(1), F(2) are σ-algebras on Ω, then F(1) ∩ F(2) is a σ-algebra, too, but F(1) ∪ F(2) may be not, in general.
2) More generally, let T be any parameter set and F(t), t ∈ T, σ-algebras on Ω; then ∩_{t∈T} F(t) is a σ-algebra, too.
By definition, F[1,∞) is the minimal σ-algebra which contains all σ-algebras F[1,n], n = 1, 2, . . .; it is the intersection of all σ-algebras containing every F[1,n], n = 1, 2, . . ..
Since F ⊇ F[1,n] for all n, ⟹ F[1,∞) ⊆ F.
(B) Now we study properties of the distance d:
(1) Clearly, d(A, B) = d(B, A) ≥ 0;
(2) d(A, C) ≤ d(A, B) + d(B, C) (the triangle inequality).
Indeed, A \ C = (A \ B) ∪ (A ∩ (B \ C)) ⊆ (A \ B) ∪ (B \ C)
⟹ P(A \ C) ≤ P(A \ B) + P(B \ C).
Similarly,
P(C \ A) ≤ P(B \ A) + P(C \ B).
(3) d(Ā, B̄) = d(A, B) (since P(Ā \ B̄) = P(B \ A));
(4) |P(A) − P(B)| = |P(A ∩ B) + P(A \ B) − P(A ∩ B) − P(B \ A)| ≤ d(A, B);
(5) d(A1 ∪ A2, B1 ∪ B2) ≤ d(A1, B1) + d(A2, B2).
Indeed, (A1 ∪ A2) \ (B1 ∪ B2) = (A1 \ (B1 ∪ B2)) ∪ (A2 \ (B1 ∪ B2)) ⊆ (A1 \ B1) ∪ (A2 \ B2)
⟹ P((A1 ∪ A2) \ (B1 ∪ B2)) ≤ P(A1 \ B1) + P(A2 \ B2).
Lemma 12.
For any A ∈ F[1,∞), there exist {An}_{n≥1}, An ∈ F[1,n], such that d(A, An) → 0.
Proof. Let U be the set of events A ∈ F such that there exist {An}_{n≥1}, An ∈ F[1,n], with d(A, An) → 0.
1) One can easily see that U ⊇ F[1,m] for all m = 1, 2, . . ..
Indeed, for any m and A ∈ F[1,m], let
An = Ω, if n < m;  An = A, if n ≥ m.
Therefore, A ∈ U.
2) Thus, it is sufficient to show that U is a σ-algebra. Then, with necessity, U ⊇ F[1,∞), and that completes the proof.
2.1) First we prove that U is an algebra, i.e.
(i) Ω ∈ U;
(ii) A ∈ U ⟹ Ā ∈ U;
(iii) for all k, A(1), . . . , A(k) ∈ U ⟹ A(1) ∪ . . . ∪ A(k) ∈ U.
(i) is obvious, (ii) follows from property (3), and (iii) follows from (5):
d(A(1) ∪ . . . ∪ A(k), A(1)_n ∪ . . . ∪ A(k)_n) ≤ Σ_{j=1}^{k} d(A(j), A(j)_n) → 0.
2.2) Now we prove that U is a σ-algebra:
(iii′) A(1), A(2), . . . ∈ U ⟹ A ≡ ∪_{j=1}^{∞} A(j) ∈ U.
Let B(k) = ∪_{j=1}^{k} A(j). Then B(k) ↑ A and P(B(k)) ↑ P(A).
⟹ for each k there exist {B(k)_n} : B(k)_n ∈ F[1,n], d(B(k), B(k)_n) → 0 as n → ∞.
Choose
n(1) = min{n ≥ 1 : d(B(1), B(1)_l) ≤ 1/2 for all l ≥ n}
and, for k ≥ 1,
n(k + 1) = min{n ≥ n(k) : d(B(k+1), B(k+1)_l) ≤ 1/2^{k+1} for all l ≥ n}.
Then let
An = Ω, if n < n(1);  An = B(k)_n, if n(k) ≤ n < n(k + 1).
Clearly, An ∈ F[1,n]. Then d(A, An) ≤ d(A, B(k)) + 1/2^k, for n(k) ≤ n < n(k + 1).
Since k → ∞ as n → ∞, d(A, An) → 0.
Lemma 13.
Let {ξn}_{n=−∞}^{∞} be a double-infinite sequence of r.v.s,
F(−∞,∞) = σ{. . . , ξ−2, ξ−1, ξ0, ξ1, ξ2, . . .}.
Then, for any A ∈ F(−∞,∞), there exist {An}, An ∈ F[−n,n], such that d(A, An) → 0.
Problem No 8. Prove Lemma 13.
(2). Sigma-algebras generated by sequences of independent r.v.s.
Definition 10.
For a sequence {ξn}_{n≥1} of r.v.s, its tail σ-algebra is
F∞ = ∩_{k=1}^{∞} F[k,∞).
Note: Since F[k+1,∞) ⊆ F[k,∞), ⟹ F∞ = ∩_{k=l}^{∞} F[k,∞) for any l.
Definition 11.
For a sequence {ξn}_{n=−∞}^{∞},
F∞ = ∩_{k=1}^{∞} F[k,∞) = ∩_{k=l}^{∞} F[k,∞), −∞ < l < ∞, and, symmetrically, F−∞ = ∩_{k} F(−∞,k].
(A σ-algebra is called trivial if each of its events has probability 0 or 1.)
Lemma 14 (Kolmogorov's zero-one law). If {ξn}_{n≥1} is a sequence of independent r.v.s, then the tail σ-algebra F∞ is trivial.
Lemma 15.
If {ξn}_{n=−∞}^{∞} is a sequence of independent r.v.s, then both F∞ and F−∞ are trivial.
Problem No 9. Prove Lemma 15.
(3). A stationary sequence of r.v.s.
Definition 12.
A sequence {ξn}_{n≥1} (or {ξn}_{n=−∞}^{∞}) is stationary if, for all l ≥ 1, 1 ≤ n1 < n2 < . . . < nl (or without the lower restriction), and k ≥ 1 (or −∞ < k < ∞),
(ξ_{n1}, . . . , ξ_{nl}) =_D (ξ_{n1+k}, . . . , ξ_{nl+k}).
For a stationary sequence, let θ denote the shift transformation taking {ξn} to {ξ_{n+1}}; θ^n A and θ^n η denote the corresponding shifts of events A and r.v.s η determined by the sequence.
Definition 13.
An F[1,∞)-measurable (or F(−∞,∞)-measurable) r.v. η is invariant (w.r. to θ) if
θη = η a.s. (i.e. P(θη = η) = 1).
An event A ∈ F[1,∞) (or A ∈ F(−∞,∞)) is invariant (w.r. to θ) if
P(A ∩ θA) = P(A).
Note that θη = η a.s. ⟺ for all x,
P({η ≤ x} ∩ {θη ≤ x}) = P(η ≤ x).
Comments, examples...
Definition 14.
A stationary sequence {ξn} is ergodic (w.r. to θ) if, for any A ∈ F[1,∞) (or A ∈ F(−∞,∞)),
A is invariant ⟹ P(A) = 0 or 1
(or: η is invariant ⟹ η = const a.s.).
Remark 5.
All invariant events (sets) form a σ-algebra F(inv) (the invariant σ-algebra).
Lemma 16.
(1) For any A ∈ F[1,∞) (or A ∈ F(−∞,∞)), the sequence of events {θ^n A, n ≥ 0} (or {θ^n A, −∞ < n < ∞}) is stationary;
(2) If {ξn} is stationary ergodic, then, for any A ∈ F[1,∞) (or A ∈ F(−∞,∞)) with P(A) > 0,
P(∪_{n=l}^{∞} θ^n A) = 1 for all l (and P(∪_{n=−∞}^{l} θ^n A) = 1 for all l).
Proof. (1) follows from definitions.
(2) Let B = ∪_{n=l}^{∞} θ^n A. Then
θB = ∪_{n=l}^{∞} θ^{n+1} A = ∪_{n=l+1}^{∞} θ^n A
and B ⊇ θB
⟹ P(B ∩ θB) = P(θB) = P(B) ⟹ B is invariant
⟹ P(B) = 0 or 1.
But P(B) ≥ P(θ^l A) = P(A) > 0 ⟹ P(B) = 1.
Lemma 17.
If A is invariant, then there exists B ∈ F∞ such that d(A, B) = 0.
Proof. There are two cases: (a) F[1,∞); (b) F(−∞,∞). Here we give a proof in the first case.
Problem No 10. Prove the lemma in case (b).
1) Let B_{0,m} = A ∩ θA ∩ θ²A ∩ . . . ∩ θ^m A, and B0 = ∩_{n=0}^{∞} θ^n A. Then
A = B_{0,0} ⊇ B_{0,1} ⊇ . . . ⊇ B_{0,m} ⊇ B_{0,m+1} ⊇ . . . ⊇ B0
and P(B_{0,m}) ↓ P(B0). But
P(B_{0,m}) = P(A) for all m ⟹ P(B0) = P(A) and d(B0, A) = 0.
2) For k ≥ 1, put Bk = θ^k B0 ≡ ∩_{n=k}^{∞} θ^n A.
Note that B_{k+1} ⊇ Bk and Bk ∈ F[k,∞),
P(Bk) = P(B0) = P(A) and d(Bk, A) = 0.
Let
B = lim_k Bk ⟹ P(B) = P(A) and d(B, A) = 0.
Since B ∈ F[k,∞) for all k ⟹ B ∈ F∞.
Remark 6.
In the case F(−∞,∞), the symmetric statement is true, too: if A is invariant, then there exists B ∈ F−∞ such that d(A, B) = 0.
Corollary 3. Any i.i.d. sequence is stationary ergodic.
Indeed, F∞ is trivial ⟹ if A is invariant, then there exists B ∈ F∞ with P(B) = 0 or 1 and d(A, B) = 0
⟹ P(A) = 0 or 1.
Remark 7.
There exists a number of weaker conditions that imply the triviality of the tail σ-algebra F∞ and, as a corollary, the ergodicity of a stationary sequence.
For instance, we can introduce the following mixing coefficients:
dk = sup_{B∈F[k,∞), A∈F(−∞,0]} |P(A ∩ B) − P(A) P(B)|,
and then show that if dk → 0 as k → ∞, then F∞ is trivial.
In general, there are examples where F∞ is not trivial, but F(inv) is trivial (i.e. the sequence is ergodic).
Example. ξ_{n+1} = −ξn, n ≥ 1; ξ1 = 1 w.pr. 1/2, = −1 w.pr. 1/2. Then