MIT OpenCourseWare — http://ocw.mit.edu
18.175 Theory of Probability
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Contents

1. Probability Spaces, Properties of Probability.
2. Random variables and their properties. Expectation.
3. Kolmogorov's Theorem about consistent distributions.
4. Laws of Large Numbers.
5. Bernstein Polynomials. Hausdorff and de Finetti theorems.
6. 0 - 1 Laws. Convergence of random series.
7. Stopping times, Wald's identity. Another proof of SLLN.
8. Convergence of Laws. Selection Theorem.
9. Characteristic Functions. Central Limit Theorem on R.
10. Multivariate normal distributions and CLT.
11. Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.
12. Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.
13. Martingales. Doob's Decomposition. Uniform Integrability.
14. Optional stopping. Inequalities for martingales.
15. Convergence of martingales. Fundamental Wald's identity.
16. Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.
17. Metrics for convergence of laws. Empirical measures.
18. Convergence and uniform tightness.
19. Strassen's Theorem. Relationships between metrics.
20. Kantorovich-Rubinstein Theorem.
21. Prekopa-Leindler inequality, entropy and concentration.
22. Stochastic Processes. Brownian Motion.
23. Donsker Invariance Principle.
24. Empirical process and Kolmogorov's chaining.
25. Markov property of Brownian motion. Reflection principles.
26. Laws of Brownian motion at stopping times. Skorohod's imbedding.
List of Figures

2.1 A random variable defined by quantile transformation.
2.2 σ(X) generated by X.
2.3 Pairwise independent but not independent r.v.s.
5.1 Polya urn model.
7.1 A sequence of stopping times.
8.1 Approximating indicator.
14.1 Stopping times of level crossings.
25.1 Reflecting the Brownian motion.
Section 1
Probability Spaces, Properties of Probability.

A pair (Ω, 𝒜) is a measurable space if 𝒜 is a σ-algebra of subsets of Ω. A collection 𝒜 of subsets of Ω is an algebra (ring) if:

1. Ω ∈ 𝒜.
2. C, B ∈ 𝒜 ⇒ C ∪ B, C ∩ B ∈ 𝒜.
3. B ∈ 𝒜 ⇒ Ω ∖ B ∈ 𝒜.
4. 𝒜 is a σ-algebra if, in addition, C_i ∈ 𝒜 for i ≥ 1 ⇒ ∪_{i≥1} C_i ∈ 𝒜.

(Ω, 𝒜, P) is a probability space if P is a probability measure on 𝒜, i.e.

1. P(Ω) = 1.
2. P(A) ≥ 0 for all A ∈ 𝒜.
3. P is countably additive: if A_i ∈ 𝒜 for i ≥ 1 and A_i ∩ A_j = ∅ for i ≠ j, then

  P(∪_{i≥1} A_i) = Σ_{i≥1} P(A_i).

An equivalent formulation of Property 3 is:

3′. P is a finitely additive measure and B_n ⊇ B_{n+1}, ∩_{n≥1} B_n = B ⇒ P(B) = lim_{n→∞} P(B_n).

Lemma 1. Properties 3 and 3′ are equivalent.

Proof.
3 ⇒ 3′: Let C_n = B_n ∖ B_{n+1}; then B_n = B ∪ (∪_{k≥n} C_k), all disjoint. By 3,

  P(B_n) = P(B) + Σ_{k≥n} P(C_k) → P(B) as n → ∞,

since the tail of the convergent series Σ_k P(C_k) vanishes.

3′ ⇒ 3: Write ∪_{i≥1} A_i = A_1 ∪ ⋯ ∪ A_n ∪ B_n, where B_n = ∪_{i>n} A_i. Then

  P(∪_{i≥1} A_i) = P(A_1) + ⋯ + P(A_n) + P(B_n).

Since B_n ⊇ B_{n+1}, we have P(B_n) → P(∩_{n≥1} B_n) = P(∅) = 0, because the A_i's are disjoint. □

Given an algebra 𝒜, let σ(𝒜) be the σ-algebra generated by 𝒜, i.e. the intersection of all σ-algebras that contain 𝒜. It is easy to see that the intersection of all such σ-algebras is itself a σ-algebra. Indeed, consider a sequence A_i for i ≥ 1 such that each A_i belongs to all σ-algebras that contain 𝒜. Then ∪_{i≥1} A_i belongs to all these σ-algebras and therefore to their intersection.

Let us recall an important result from measure theory.

Theorem 1 (Caratheodory extension). If 𝒜 is an algebra of sets and μ: 𝒜 → ℝ is a non-negative countably additive function on 𝒜, then μ can be extended to a measure on the σ-algebra σ(𝒜). If μ is σ-finite, then this extension is unique. (σ-finite means that Ω = ∪ A_i for a disjoint sequence A_i with μ(A_i) < ∞.)
Consider D_1, . . . , D_n ∈ 𝒟. If a sequence C_{ij} ∈ 𝒜 for j ≥ 1 approximates D_i, i.e. P(C_{ij} △ D_i) → 0 as j → ∞, then, by properties 1 - 3, C_j := ∪_{i≤n} C_{ij} approximates Dⁿ := ∪_{i≤n} D_i, which means that Dⁿ ∈ 𝒟. Let D = ∪_{i≥1} D_i. Then

  P(D) = P(Dⁿ) + P(D ∖ Dⁿ),

and obviously P(D ∖ Dⁿ) → 0 as n → ∞. Therefore, D ∈ 𝒟 and 𝒟 is a σ-algebra.
Section 2
Random variables and their properties. Expectation.

Let (Ω, 𝒜, P) be a probability space and (S, ℬ) be a measurable space, where ℬ is a σ-algebra of subsets of S. A random variable X: Ω → S is a measurable function, i.e.

  B ∈ ℬ ⇒ X⁻¹(B) ∈ 𝒜.

When S = ℝ we will usually consider the σ-algebra ℬ of Borel measurable sets generated by the sets (a, b] (or, equivalently, generated by the sets (a, b) or by open sets).

Lemma 3. X: Ω → ℝ is a random variable iff for all t ∈ ℝ

  {X ≤ t} := {ω : X(ω) ∈ (−∞, t]} ∈ 𝒜.

Proof. Only the "if" direction requires proof. We will prove that

  𝒟 = {D ⊆ ℝ : X⁻¹(D) ∈ 𝒜}

is a σ-algebra. Since the sets (−∞, t] ∈ 𝒟, this will imply that ℬ ⊆ 𝒟. The result follows simply because taking pre-images preserves set operations. For example, if we consider a sequence D_i ∈ 𝒟 for i ≥ 1, then

  X⁻¹(∪_{i≥1} D_i) = ∪_{i≥1} X⁻¹(D_i) ∈ 𝒜,

because X⁻¹(D_i) ∈ 𝒜 and 𝒜 is a σ-algebra. Therefore, ∪_{i≥1} D_i ∈ 𝒟. Other properties can be checked similarly, so 𝒟 is a σ-algebra. □

Let us define a measure P_X on ℬ by P_X = P ∘ X⁻¹, i.e. for B ∈ ℬ,

  P_X(B) = P(X ∈ B) = P(X⁻¹(B)).

(S, ℬ, P_X) is called the sample space of the random variable X, and P_X is called the law of X. Clearly, on this space the random variable ι: S → S defined by the identity ι(s) = s has the same law as X.

When S = ℝ, the function F(t) = P(X ≤ t) is called the cumulative distribution function (c.d.f.) of X.

Lemma 4. F is a c.d.f. of some r.v. X iff

1. 0 ≤ F(t) ≤ 1,
2. F is non-decreasing and right-continuous,
3. lim_{t→−∞} F(t) = 0, lim_{t→+∞} F(t) = 1.

Proof. The fact that any c.d.f. satisfies properties 1 - 3 is obvious. Let us show that an F which satisfies properties 1 - 3 is a c.d.f. of some r.v. X. Consider the algebra 𝒜 consisting of finite unions ∪_{i≤n} (a_i, b_i] of disjoint intervals, for all n ≥ 1. Let us define a function P on 𝒜 by

  P(∪_{i≤n} (a_i, b_i]) = Σ_{i≤n} (F(b_i) − F(a_i)).

One can show that P is countably additive on 𝒜. Then, by the Caratheodory extension Theorem 1, P extends uniquely to a measure P on σ(𝒜) = ℬ, the Borel measurable sets. This means that (ℝ, ℬ, P) is a probability space and, clearly, the random variable X: ℝ → ℝ defined by X(x) = x has c.d.f. P(X ≤ t) = F(t). Below we will sometimes abuse notation and let F denote both the c.d.f. and the probability measure P.

Alternative proof. Consider the probability space ([0,1], ℬ, λ), where λ is the Lebesgue measure. Define the r.v. X: [0,1] → ℝ by the quantile transformation

  X(t) = inf{x ∈ ℝ : F(x) ≥ t}.

The c.d.f. of X is λ(t : X(t) ≤ a) = F(a), since

  X(t) ≤ a ⟺ inf{x : F(x) ≥ t} ≤ a ⟺ ∃ a_n ↓ a with F(a_n) ≥ t ⟺ F(a) ≥ t,

where the last equivalence uses the right-continuity of F. □

Figure 2.1: A random variable defined by quantile transformation.
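The quantile transformation doubles as the standard recipe for simulating a r.v. with a given c.d.f.: apply the generalized inverse of F to a uniform variable on [0,1]. A minimal Python sketch; the exponential c.d.f. F(x) = 1 − e^{−x}, whose quantile function has the closed form −log(1 − t), is our own illustrative choice, not taken from the notes.

```python
import math
import random

def quantile_transform(F_inv, n, seed=0):
    """Draw n samples of X via the quantile transformation:
    apply X(t) = inf{x : F(x) >= t} to t ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    return [F_inv(rng.random()) for _ in range(n)]

# Illustrative choice: exponential law, F^{-1}(t) = -log(1 - t).
samples = quantile_transform(lambda t: -math.log(1.0 - t), 100_000)
sample_mean = sum(samples) / len(samples)  # EX = 1 for this law
```

Any c.d.f. works the same way; when F has flat pieces or jumps, the infimum definition above handles them automatically.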
Definition. Given a probability space (Ω, 𝒜, P) and a r.v. X: Ω → S, let σ(X) be the σ-algebra generated by the collection of sets {X⁻¹(B) : B ∈ ℬ}. Clearly, σ(X) ⊆ 𝒜. Moreover, the above collection of sets is itself a σ-algebra. Indeed, consider a sequence A_i = X⁻¹(B_i) for some B_i ∈ ℬ. Then

  ∪_{i≥1} A_i = ∪_{i≥1} X⁻¹(B_i) = X⁻¹(∪_{i≥1} B_i) = X⁻¹(B),

where B = ∪_{i≥1} B_i ∈ ℬ. σ(X) is called the σ-algebra generated by the r.v. X.

Figure 2.2: σ(X) generated by X.

Example. Consider the r.v. X on Ω = [0,1] defined in Figure 2.2, equal to 0 on [0, 1/2) and 1 on [1/2, 1]. We have P(X = 0) = 1/2, P(X = 1) = 1/2 and

  σ(X) = {∅, [0, 1/2), [1/2, 1], [0, 1]}.
Lemma 5. Consider a probability space (Ω, 𝒜, P), a measurable space (S, ℬ) and random variables X: Ω → S and Y: Ω → ℝ. Then the following are equivalent:

1. Y = g(X) for some (Borel) measurable function g: S → ℝ.
2. Y: Ω → ℝ is measurable on (Ω, σ(X)), i.e. with respect to the σ-algebra generated by X.

Remark. It should be obvious from the proof that ℝ can be replaced by any separable metric space.

Proof. The fact that 1 implies 2 is obvious, since for any Borel set B ⊆ ℝ the set B′ := g⁻¹(B) ∈ ℬ and, therefore,

  {Y = g(X) ∈ B} = {X ∈ g⁻¹(B) = B′} = X⁻¹(B′) ∈ σ(X).

Let us show that 2 implies 1. For all integers n and k consider the sets

  A_{n,k} = {ω : Y(ω) ∈ [k/2ⁿ, (k+1)/2ⁿ)} = Y⁻¹([k/2ⁿ, (k+1)/2ⁿ)).

By 2, A_{n,k} ∈ σ(X) = {X⁻¹(B) : B ∈ ℬ} and, therefore, A_{n,k} = X⁻¹(B_{n,k}) for some B_{n,k} ∈ ℬ. Let us consider the function

  g_n(X) = Σ_{k∈ℤ} (k/2ⁿ) I(X ∈ B_{n,k}).

By construction, |Y − g_n(X)| ≤ 1/2ⁿ, since

  Y(ω) ∈ [k/2ⁿ, (k+1)/2ⁿ) ⟺ X(ω) ∈ B_{n,k} ⇒ g_n(X(ω)) = k/2ⁿ.

It is easy to see that g_n(x) ≤ g_{n+1}(x) and, therefore, g(x) = lim_{n→∞} g_n(x) is a measurable function on (S, ℬ) and, clearly, Y = g(X). □
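The dyadic truncation behind the functions g_n in this proof is easy to check numerically. A small sketch; the target value y = π is just an illustration of ours:

```python
import math

def g_n(y, n):
    """Dyadic truncation used in the proof of Lemma 5: the largest
    value k/2^n (k an integer) not exceeding y."""
    return math.floor(y * 2 ** n) / 2 ** n

y = math.pi
approx = [g_n(y, n) for n in range(1, 13)]
errs = [y - g for g in approx]  # each error lies in [0, 2^{-n})
```

The sequence of approximations is nondecreasing and converges to y at rate 2^{-n}, exactly as the proof uses.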
Discrete random variables. A r.v. X: Ω → S is called discrete if P_X({s_i}_{i≥1}) = 1 for some sequence s_i ∈ S.

Absolutely continuous random variables. On a measurable space (S, ℬ), a measure P is called absolutely continuous w.r.t. a measure μ if

  B ∈ ℬ, μ(B) = 0 ⇒ P(B) = 0.

The following is a well known result from measure theory.

Theorem 2 (Radon-Nikodym). If P and μ are sigma-finite and P is absolutely continuous w.r.t. μ, then there exists a Radon-Nikodym derivative f ≥ 0 such that for all B ∈ ℬ

  P(B) = ∫_B f(s) dμ(s).

f is uniquely defined up to μ-null sets.

In the typical setting of S = ℝᵏ, a probability measure P and the Lebesgue measure μ, f is called the density of the distribution P.

Independence. Consider a probability space (Ω, 𝒞, P) and two σ-algebras 𝒜, ℬ ⊆ 𝒞. 𝒜 and ℬ are called independent if

  P(A ∩ B) = P(A) P(B) for all A ∈ 𝒜, B ∈ ℬ.
σ-algebras 𝒜_i ⊆ 𝒞 for i ≤ n are independent if

  P(A_1 ∩ ⋯ ∩ A_n) = ∏_{i≤n} P(A_i) for all A_i ∈ 𝒜_i.

σ-algebras 𝒜_i ⊆ 𝒞 for i ≤ n are pairwise independent if

  P(A_i ∩ A_j) = P(A_i) P(A_j) for all A_i ∈ 𝒜_i, A_j ∈ 𝒜_j, i ≠ j.

Random variables X_i: Ω → S for i ≤ n are (pairwise) independent if the σ-algebras σ(X_i), i ≤ n, are (pairwise) independent, which is just another convenient way to state the familiar

  P(X_1 ∈ B_1, . . . , X_n ∈ B_n) = P(X_1 ∈ B_1) ⋯ P(X_n ∈ B_n)

for any events B_1, . . . , B_n ∈ ℬ.

Example. Consider a regular tetrahedron die, Figure 2.3, with a red, a green and a blue side, and a red-green-blue base. If we roll this die, then the indicators of the different colors provide an example of pairwise independent r.v.s that are not independent, since

  P(r) = P(b) = P(g) = 1/2 and P(rb) = P(rg) = P(bg) = 1/4,

but

  P(rbg) = 1/4 ≠ P(r)P(b)P(g) = 1/8.

Figure 2.3: Pairwise independent but not independent r.v.s.
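A quick simulation of this die reproduces the pairwise-but-not-joint independence numerically. A sketch; the face encoding below is ours:

```python
import random

# Four equally likely faces: three single-color sides and a base
# carrying all three colors.
FACES = [{"r"}, {"g"}, {"b"}, {"r", "g", "b"}]

rng = random.Random(1)
n = 200_000
hits = {key: 0 for key in ("r", "g", "b", "rg", "rb", "gb", "rgb")}
for _ in range(n):
    face = rng.choice(FACES)
    for key in hits:
        if set(key) <= face:  # all colors in `key` present on this face
            hits[key] += 1
freq = {key: count / n for key, count in hits.items()}
# freq["r"] ≈ 1/2 and freq["rg"] ≈ 1/4 = freq["r"] * freq["g"] (pairwise
# independence), but freq["rgb"] ≈ 1/4 ≠ 1/8 (no joint independence).
```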
Independence of σ-algebras can be checked on generating algebras:

Lemma 6. If algebras 𝒜_i, i ≤ n, are independent, then the σ-algebras σ(𝒜_i) are independent.

Proof. Obvious by the Approximation Lemma 2. □

Lemma 7. Consider r.v.s X_i: Ω → ℝ on a probability space (Ω, 𝒜, P).

1. The X_i's are independent iff

  P(X_1 ≤ t_1, . . . , X_n ≤ t_n) = P(X_1 ≤ t_1) ⋯ P(X_n ≤ t_n).   (2.0.1)

2. If the laws of the X_i's have densities f_i(x), then the X_i's are independent iff a joint density exists and f(x_1, . . . , x_n) = ∏ f_i(x_i).
Proof. Claim 1 is obvious by Lemma 6, because (2.0.1) implies the same equality for intervals,

  P(X_1 ∈ (a_1, b_1], . . . , X_n ∈ (a_n, b_n]) = P(X_1 ∈ (a_1, b_1]) ⋯ P(X_n ∈ (a_n, b_n]),

and, therefore, for finite unions of disjoint such intervals. To check this for intervals (for example, for n = 2) we can write P(a_1 < X_1 ≤ b_1, a_2 < X_2 ≤ b_2) as

  P(X_1 ≤ b_1, X_2 ≤ b_2) − P(X_1 ≤ a_1, X_2 ≤ b_2) − P(X_1 ≤ b_1, X_2 ≤ a_2) + P(X_1 ≤ a_1, X_2 ≤ a_2)
  = P(X_1 ≤ b_1)P(X_2 ≤ b_2) − P(X_1 ≤ a_1)P(X_2 ≤ b_2) − P(X_1 ≤ b_1)P(X_2 ≤ a_2) + P(X_1 ≤ a_1)P(X_2 ≤ a_2)
  = (P(X_1 ≤ b_1) − P(X_1 ≤ a_1)) (P(X_2 ≤ b_2) − P(X_2 ≤ a_2)) = P(a_1 < X_1 ≤ b_1) P(a_2 < X_2 ≤ b_2).

To prove claim 2 we start with "⟸":

  P(∩ {X_i ∈ A_i}) = P(X ∈ A_1 × ⋯ × A_n) = ∫_{A_1×⋯×A_n} ∏ f_i(x_i) dx
  = ∏ ∫_{A_i} f_i(x_i) dx_i   {by Fubini's Theorem}
  = ∏_{i≤n} P(X_i ∈ A_i).

Next, we prove "⇒". First of all, by independence and Fubini's theorem,

  P(X ∈ A_1 × ⋯ × A_n) = ∏ P(X_i ∈ A_i) = ∫_{A_1×⋯×A_n} ∏ f_i(x_i) dx.

Therefore, the same equality holds for sets in the algebra 𝒜 that consists of finite unions of disjoint sets A_1 × ⋯ × A_n, i.e.

  P(X ∈ B) = ∫_B ∏ f_i(x_i) dx for B ∈ 𝒜.

Both P(X ∈ B) and ∫_B ∏ f_i(x_i) dx are countably additive on 𝒜 and finite,

  P(ℝⁿ) = ∫_{ℝⁿ} ∏ f_i(x_i) dx = 1.

By the Caratheodory extension Theorem 1, they extend uniquely to all Borel sets ℬ = σ(𝒜), so

  P(X ∈ B) = ∫_B ∏ f_i(x_i) dx for B ∈ ℬ. □
Expectation. If X: Ω → ℝ is a random variable on (Ω, 𝒜, P), then the expectation of X is defined as

  EX = ∫_Ω X(ω) dP(ω).

In other words, expectation is just another term for the integral with respect to a probability measure and, as a result, expectation has all the usual properties of integrals. Let us emphasize some of them.

Lemma 8.
1. If F is the c.d.f. of X, then for any measurable function g: ℝ → ℝ,

  E g(X) = ∫_ℝ g(x) dF(x).

2. If X is discrete, i.e. P(X ∈ {x_i}_{i≥1}) = 1, then EX = Σ_{i≥1} x_i P(X = x_i).
3. If X: Ω → ℝᵏ has a density f(x) on ℝᵏ and g: ℝᵏ → ℝ, then E g(X) = ∫ g(x) f(x) dx.

Proof. All these properties follow by making the change of variables x = X(ω), or ω = X⁻¹(x), i.e.

  E g(X) = ∫_Ω g(X(ω)) dP(ω) = ∫_ℝ g(x) dP ∘ X⁻¹(x) = ∫_ℝ g(x) dP_X(x),

where P_X = P ∘ X⁻¹ is the law of X. Another way to see this would be to start with indicator functions of sets, g(x) = I(x ∈ B), for which

  E g(X) = P(X ∈ B) = P_X(B) = ∫_ℝ I(x ∈ B) dP_X(x)

and, therefore, the same is true for simple step functions g(x) = Σ_{i≤n} w_i I(x ∈ B_i) for disjoint B_i. By approximation, this is true for any measurable function. □
Section 3
Kolmogorov's Theorem about consistent distributions.

The notions of a general probability space (Ω, 𝒜, P) and of a random variable X: Ω → ℝ on this space are rather abstract, and often one is really interested in the law P_X of X on the sample space (ℝ, ℬ, P_X). One can always define a random variable with this law by taking X: ℝ → ℝ to be the identity X(x) = x. Similarly, one can define a random vector X = (X_1, . . . , X_k) on ℝᵏ by defining the distribution on the Borel σ-algebra ℬᵏ first. How can we define a distribution on an infinite dimensional space or, in other words, how can we define an infinite family of random variables

  (X_t)_{t∈T} ∈ ℝᵀ = ∏_{t∈T} ℝ_t = {f : T → ℝ}

for some infinite set T? Obviously, there are various ways to do that; for example, we can define explicitly X_t = cos(tU) for some random variable U. In this section we will consider the typical situation when we start by defining the distribution on any finite subset of coordinates, i.e. for any finite subset N ⊆ T the law P_N of (X_t)_{t∈N} on the Borel σ-algebra ℬ_N on ℝᴺ is given. Clearly, these laws must satisfy a natural consistency assumption: for any finite subsets N ⊆ M and any Borel set B ∈ ℬ_N,

  P_N(B) = P_M(B × ℝ^{M∖N}).   (3.0.1)

Then the problem is to define a sample space simultaneously for the entire family (X_t)_{t∈T}, i.e. we need to define a σ-algebra of measurable events in ℝᵀ and a probability measure P on it that agrees with our finite dimensional distributions P_N. At the very least, it should contain the events expressed in terms of a finite number of coordinates, i.e. the following algebra of sets on ℝᵀ:

  𝒜 = {B × ℝ^{T∖N} : B ∈ ℬ_N, N ⊆ T finite}.

(It is easy to check that 𝒜 is an algebra.) A set B × ℝ^{T∖N} is called a cylinder, and B is the base of the cylinder.
The probability P on such sets is of course defined by

  P(B × ℝ^{T∖N}) = P_N(B).

Notice that, by the consistency assumption, P is well defined. Given two finite subsets N_1, N_2 ⊆ T and B_1 ∈ ℬ_{N_1}, the same set can be represented as

  B_1 × ℝ^{T∖N_1} = (B_1 × ℝ^{(N_1∪N_2)∖N_1}) × ℝ^{T∖(N_1∪N_2)}.

However, by consistency, P will not depend on the representation. Let σ(𝒜) be the σ-algebra generated by the algebra 𝒜, i.e. the minimal σ-algebra that contains all cylinders.

Definition. 𝒜 is called the cylindrical algebra and σ(𝒜) is the cylindrical σ-algebra on ℝᵀ.

Example. If ℕ ⊆ T then {sup_{i≥1} X_i ≤ 1} is a measurable event in σ(𝒜).
Theorem 3 (Kolmogorov). For a consistent family of distributions (3.0.1), P can be uniquely extended to σ(𝒜).

Proof. To use the Caratheodory extension Theorem 1, we need to show that P is countably additive on 𝒜 or, equivalently, that it satisfies the continuity of measure property: given a sequence B_n ∈ 𝒜,

  B_n ⊇ B_{n+1}, ∩_{n≥1} B_n = ∅ ⇒ P(B_n) → 0.

We will prove that if there exists ε > 0 such that P(B_n) > ε for all n, then ∩_{n≥1} B_n ≠ ∅. We have B_n = C_n × ℝ^{T∖N_n} for a finite subset N_n of T and C_n ∈ ℬ_{N_n}. Since B_n ⊇ B_{n+1}, we can assume that N_n ⊆ N_{n+1}. First of all, by regularity of the measure P_{N_n} there exists a compact set K_n ⊆ C_n such that

  P_{N_n}(C_n ∖ K_n) ≤ ε/2^{n+1}.

We have

  (∩_{i≤n} C_i × ℝ^{T∖N_i}) ∖ (∩_{i≤n} K_i × ℝ^{T∖N_i}) ⊆ ∪_{i≤n} (C_i ∖ K_i) × ℝ^{T∖N_i}

and, therefore,

  P((∩_{i≤n} C_i × ℝ^{T∖N_i}) ∖ (∩_{i≤n} K_i × ℝ^{T∖N_i})) ≤ Σ_{i≤n} P((C_i ∖ K_i) × ℝ^{T∖N_i}) ≤ Σ_{i≤n} ε/2^{i+1} ≤ ε/2.

Since P(B_n) = P(∩_{i≤n} C_i × ℝ^{T∖N_i}) > ε, this implies that

  P(∩_{i≤n} K_i × ℝ^{T∖N_i}) ≥ ε/2 > 0.

We can write

  ∩_{i≤n} K_i × ℝ^{T∖N_i} = (∩_{i≤n} K_i × ℝ^{N_n∖N_i}) × ℝ^{T∖N_n} = K′_n × ℝ^{T∖N_n},

where K′_n = ∩_{i≤n} (K_i × ℝ^{N_n∖N_i}) is compact in ℝ^{N_n}, since it is a closed subset of the compact K_n. We proved that

  P_{N_n}(K′_n) = P(K′_n × ℝ^{T∖N_n}) = P(∩_{i≤n} K_i × ℝ^{T∖N_i}) > 0

and, therefore, there exists a point

  xⁿ = (x₁ⁿ, . . . , x_{N_n}ⁿ, . . .) ∈ K′_n × ℝ^{T∖N_n}.

We also have the following inclusion property. For m > n,

  xᵐ ∈ K′_m × ℝ^{T∖N_m} ⊆ K′_n × ℝ^{T∖N_n}

and, therefore, (x₁ᵐ, . . . , x_{N_n}ᵐ) ∈ K′_n. Any sequence in a compact has a converging subsequence. Let {n¹_k}_{k≥1} be a subsequence such that

  (x₁^{n¹_k}, . . . , x_{N_1}^{n¹_k}) → (x₁, . . . , x_{N_1}) ∈ K′_1.

Then we can take a further subsequence {n²_k}_{k≥1} ⊆ {n¹_k}_{k≥1} such that

  (x₁^{n²_k}, . . . , x_{N_2}^{n²_k}) → (x₁, . . . , x_{N_2}) ∈ K′_2.

By iteration, we can find a subsequence {nᵐ_k}_{k≥1} ⊆ {n^{m−1}_k}_{k≥1} such that

  (x₁^{nᵐ_k}, . . . , x_{N_m}^{nᵐ_k}) → (x₁, . . . , x_{N_m}) ∈ K′_m.

Therefore, the point

  (x₁, x₂, . . .) ∈ ∩_{n≥1} K′_n × ℝ^{T∖N_n} ⊆ ∩_{n≥1} B_n,

so this last set is not empty. □
Section 4
Laws of Large Numbers.

Consider a r.v. X and a sequence of r.v.s (X_n)_{n≥1} on some probability space. We say that X_n converges to X in probability if for all ε > 0

  lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

We say that X_n converges to X almost surely, or with probability 1, if

  P(ω : lim_{n→∞} X_n(ω) = X(ω)) = 1.

Lemma 9 (Chebyshev's inequality). If a r.v. X ≥ 0, then for t > 0,

  P(X ≥ t) ≤ EX/t.

Proof.

  EX = E X I(X < t) + E X I(X ≥ t) ≥ E X I(X ≥ t) ≥ t E I(X ≥ t) = t P(X ≥ t). □
Theorem 4 (Weak law of large numbers). Consider a sequence of independent r.v.s (X_i)_{i≥1} that are centered, EX_i = 0, and have finite second moments, EX_i² ≤ K < ∞. Let X̄_n = (X_1 + ⋯ + X_n)/n. Then

  E X̄_n² = (1/n²) Σ_{i≤n} EX_i² ≤ K/n.
Then X̄_n → 0 in probability, since for 0 < ε, by Chebyshev's inequality,

  P(|X̄_n| ≥ ε) = P(X̄_n² ≥ ε²) ≤ E X̄_n²/ε² ≤ K/(nε²) → 0 as n → ∞.
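A small simulation illustrates the weak law and the Chebyshev rate K/(nε²). The choice X_i = ±1 and the deviation level below are ours:

```python
import random

rng = random.Random(2)

def fraction_deviating(n, eps, trials=2000):
    """Empirical estimate of P(|mean of n fair signs| >= eps)."""
    bad = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        bad += abs(s / n) >= eps
    return bad / trials

p_small_n = fraction_deviating(10, 0.5)
p_large_n = fraction_deviating(1000, 0.5)
# The deviation probability collapses as n grows; Chebyshev bounds it
# by K/(n eps^2) = 1/(n * 0.25) here, since E X_i^2 = 1.
```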
Strong law of large numbers. The following simple observation will be useful. If a random variable X ≥ 0, then EX = ∫₀^∞ P(X ≥ x) dx. Indeed,

  EX = ∫₀^∞ x dF(x) = ∫₀^∞ ∫₀^x 1 ds dF(x) = ∫₀^∞ ∫_s^∞ 1 dF(x) ds = ∫₀^∞ P(X ≥ s) ds.

For X ≥ 0 such that EX < ∞, this gives, in particular, Σ_{i≥1} P(X ≥ i) ≤ ∫₀^∞ P(X ≥ x) dx = EX < ∞.
and if k₀ = min{k : n(k) ≥ i}, with n(k) = 2ᵏ, then

  Σ_{k : n(k) ≥ i} 1/n(k)² = Σ_{k≥k₀} 2^{−2k} = 2^{−2k₀} (1 − 1/4)^{−1} ≤ (4/3) · 1/i² ≤ K/i².

We can continue,

  (∗) ≤ Σ_{i≥1} (1/i²) ∫₀^i x² dF(x) = Σ_{i≥1} (1/i²) Σ_{m<i} ∫_m^{m+1} x² dF(x)
     = Σ_{m≥0} (Σ_{i>m} 1/i²) ∫_m^{m+1} x² dF(x) ≤ Σ_{m≥0} (2/(m+1)) ∫_m^{m+1} x² dF(x)
     ≤ 2 Σ_{m≥0} ∫_m^{m+1} x dF(x) = 2 EX < ∞,

using Σ_{i>m} 1/i² ≤ 2/(m+1) and x²/(m+1) ≤ x on [m, m+1].
Section 5
Bernstein Polynomials. Hausdorff and de Finetti theorems.

Let us look at some applications related to the law of large numbers. Consider an i.i.d. sequence of real valued r.v.s (X_i) with distribution P_θ from a family of distributions parametrized by θ, such that

  EX_i = θ, σ²(θ) := Var(X_i) ≤ K < ∞.

Then for any bounded continuous u, E u(X̄_n) → u(θ) uniformly over the family. Indeed, for any δ > 0,

  |E u(X̄_n) − u(θ)| ≤ E |u(X̄_n) − u(θ)|
  = E |u(X̄_n) − u(θ)| (I(|X̄_n − θ| ≤ δ) + I(|X̄_n − θ| > δ))
  ≤ max_{|x−θ|≤δ} |u(x) − u(θ)| + 2 max_x |u(x)| · P(|X̄_n − θ| > δ)
  ≤ ω(δ) + 2 ‖u‖_∞ (1/δ²) E(X̄_n − θ)² ≤ ω(δ) + 2 ‖u‖_∞ K/(nδ²),

where ω(δ) is the modulus of continuity of u. Letting δ = δ_n → 0 so that nδ_n² → ∞ finishes the proof.
Example. Let (X_i) be i.i.d. with Bernoulli distribution B(θ) with probability of success θ ∈ [0,1], i.e.

  P(X_i = 1) = θ, P(X_i = 0) = 1 − θ,

and let u: [0,1] → ℝ be continuous. Then, by the above, the Bernstein polynomials

  B_n(θ) := E u(X̄_n) = Σ_{k=0}^n u(k/n) P(Σ_{i=1}^n X_i = k) = Σ_{k=0}^n u(k/n) (n choose k) θᵏ (1−θ)^{n−k} → u(θ)

uniformly on [0,1].

Example. Let (X_i) have Poisson distribution π(θ) with intensity parameter θ > 0 defined by

  P(X_i = k) = (θᵏ/k!) e^{−θ} for integer k ≥ 0.

Then it is well known (and easy to check) that EX_i = θ, σ²(θ) = θ, and the sum X_1 + ⋯ + X_n has Poisson distribution π(nθ). If u is bounded and continuous on [0, +∞), then

  E u(X̄_n) = Σ_{k=0}^∞ u(k/n) P(Σ_{i=1}^n X_i = k) = Σ_{k=0}^∞ u(k/n) ((nθ)ᵏ/k!) e^{−nθ} → u(θ)
uniformly on compact sets.

Moment problem. Consider a random variable X ∈ [0,1] and let μ_k = EXᵏ be its moments. Given a sequence (c_0, c_1, c_2, . . .), let us define a sequence of increments by Δc_k = c_{k+1} − c_k. Then

  −Δμ_k = μ_k − μ_{k+1} = E(Xᵏ − X^{k+1}) = EXᵏ(1 − X),

  (−Δ)(−Δμ_k) = (−1)² Δ²μ_k = EXᵏ(1−X) − EX^{k+1}(1−X) = EXᵏ(1−X)²,

and by induction

  (−1)ʳ Δʳ μ_k = EXᵏ(1−X)ʳ.

Clearly, (−1)ʳ Δʳ μ_k ≥ 0 since X ∈ [0,1]. If u is a continuous function on [0,1] and B_n is its corresponding Bernstein polynomial, then

  E B_n(X) = Σ_{k=0}^n u(k/n) (n choose k) EXᵏ(1−X)^{n−k} = Σ_{k=0}^n u(k/n) (n choose k) (−1)^{n−k} Δ^{n−k} μ_k.

Since B_n(x) converges uniformly to u(x), E B_n(X) converges to E u(X). Let us define

  p_k(n) = (n choose k) (−1)^{n−k} Δ^{n−k} μ_k ≥ 0, with Σ_{k=0}^n p_k(n) = 1 (take u = 1).

We can think of (p_k(n)) as the distribution of a r.v. X(n) such that

  P(X(n) = k/n) = p_k(n).   (5.0.1)

We showed that

  E B_n(X) = E u(X(n)) → E u(X)

for any continuous function u. We will later see that by definition this means that X(n) converges to X in distribution. Given the moments of a r.v. X, this construction allows us to approximate the distribution of X and the expectation of u(X).

Next, given a sequence (μ_k), when is it the sequence of moments of some [0,1]-valued r.v. X? By the above, it is necessary that

  μ_k ≥ 0, μ_0 = 1, and (−1)ʳ Δʳ μ_k ≥ 0 for all k, r.   (5.0.2)

It turns out that this is also sufficient.

Theorem 7 (Hausdorff). There exists a r.v. X ∈ [0,1] such that μ_k = EXᵏ iff (5.0.2) holds.

Proof. The idea of the proof is as follows. If μ_k are the moments of the distribution of some r.v. X, then the discrete distributions defined in (5.0.1) should approximate it. Therefore, our goal will be to show that condition (5.0.2) ensures that (p_k(n)) is indeed a distribution, and then show that the moments of (5.0.1) converge to μ_k. As a result, any limit of these distributions will be a candidate for the distribution of X.

First of all, let us express μ_k in terms of (p_k(n)). Since Δμ_k = μ_{k+1} − μ_k, we have the following inversion formula:

  μ_k = μ_{k+1} − Δμ_k = (μ_{k+2} − Δμ_{k+1}) − (Δμ_{k+1} − Δ²μ_k)
      = μ_{k+2} − 2Δμ_{k+1} + Δ²μ_k = ⋯ = Σ_{j=0}^r (r choose j) (−1)^{r−j} Δ^{r−j} μ_{k+j},
by induction. Take r = n − k. Then

  μ_k = Σ_{j=0}^{n−k} (n−k choose j) (−1)^{n−k−j} Δ^{n−k−j} μ_{k+j} = Σ_{j=0}^{n−k} [(n−k choose j)/(n choose k+j)] p_{k+j}(n).

We have

  (n−k choose j)/(n choose k+j) = [(n−k)!/(j!(n−k−j)!)] · [(k+j)!(n−k−j)!/n!] = (k+j)!(n−k)!/(j! n!) = (k+j choose k)/(n choose k),

so that

  μ_k = Σ_{j=0}^{n−k} [(k+j choose k)/(n choose k)] p_{k+j}(n) = Σ_{m=k}^{n} [(m choose k)/(n choose k)] p_m(n).

By (5.0.2), p_m(n) ≥ 0 and Σ_{m≤n} p_m(n) = μ_0 = 1, so we can consider a r.v. X(n) such that P(X(n) = m/n) = p_m(n) for 0 ≤ m ≤ n. We have

  μ_k = Σ_{m=k}^n [m(m−1)⋯(m−k+1)] / [n(n−1)⋯(n−k+1)] p_m(n)
      = Σ_{m=k}^n [(m/n)((m−1)/n)⋯((m−k+1)/n)] / [1·(1−1/n)⋯(1−(k−1)/n)] p_m(n),

which differs from

  Σ_{m=0}^n (m/n)ᵏ p_m(n) = E (X(n))ᵏ

by a quantity vanishing as n → ∞, so E(X(n))ᵏ → μ_k.

Any continuous function u can be approximated by (for example, Bernstein) polynomials, so the limit lim_{n→∞} E u(X(n)) exists. By the selection theorem that we will prove later in the course, one can choose a subsequence X(n_i) that converges to some r.v. X in distribution and, as a result,

  E (X(n_i))ᵏ → EXᵏ = μ_k,

which means that the μ_k are the moments of X. □
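The weights p_k(n) are directly computable from a moment sequence. A sketch, using the moments μ_k = 1/(k+1) of the uniform distribution on [0,1] as our own test case; for this law the weights come out exactly uniform, p_k(n) = 1/(n+1):

```python
import math

def p_weight(n, k, mu):
    """p_k(n) = C(n,k) (-1)^{n-k} Δ^{n-k} μ_k, where Δμ_k = μ_{k+1} - μ_k.
    Expanding the difference operator: (-1)^r Δ^r μ_k
    = sum_j C(r, j) (-1)^j μ_{k+j} with r = n - k."""
    r = n - k
    diff = sum((-1) ** j * math.comb(r, j) * mu(k + j) for j in range(r + 1))
    return math.comb(n, k) * diff

n = 20
mu = lambda k: 1.0 / (k + 1)  # moments of Uniform[0, 1]
probs = [p_weight(n, k, mu) for k in range(n + 1)]
total = sum(probs)                                      # should be μ_0 = 1
first_moment = sum((k / n) * probs[k] for k in range(n + 1))  # ≈ μ_1 = 1/2
```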
de Finetti's theorem. Consider an exchangeable sequence X_1, X_2, . . . , X_n, . . . of Bernoulli random variables, which means that for any n ≥ 1 the probability

  P(X_1 = x_1, . . . , X_n = x_n)

depends only on x_1 + ⋯ + x_n, i.e. it does not depend on the order of the 1s and 0s. Another way to say this is that for any n ≥ 1 and any permutation π of 1, . . . , n, the distribution of (X_{π(1)}, . . . , X_{π(n)}) does not depend on π. Then the following holds.

Theorem 8 (de Finetti). There exists a distribution F on [0,1] such that

  p_k := P(X_1 + ⋯ + X_n = k) = (n choose k) ∫₀¹ xᵏ (1−x)^{n−k} dF(x).

This means that to generate such an exchangeable sequence we can first pick x ∈ [0,1] from the distribution F and then generate a sequence of i.i.d. Bernoulli random variables with probability of success x.

Proof. Let μ_0 = 1 and for k ≥ 1 define

  μ_k = P(X_1 = 1, . . . , X_k = 1).   (5.0.3)
We have

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0) = P(X_1 = 1, . . . , X_k = 1) − P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 1)
  = μ_k − μ_{k+1} = −Δμ_k.

Next, using exchangeability,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, X_{k+2} = 0)
  = P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0) − P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, X_{k+2} = 1)
  = −Δμ_k − (−Δμ_{k+1}) = Δ²μ_k.

Similarly, by induction,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, . . . , X_n = 0) = (−1)^{n−k} Δ^{n−k} μ_k ≥ 0.

By the Hausdorff theorem, μ_k = EXᵏ for some r.v. X ∈ [0,1] with distribution F and, therefore,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, . . . , X_n = 0) = (−1)^{n−k} Δ^{n−k} μ_k
  = EXᵏ(1−X)^{n−k} = ∫₀¹ xᵏ (1−x)^{n−k} dF(x).

Since, by exchangeability, changing the order of the 1s and 0s does not affect the probability, we get

  P(X_1 + ⋯ + X_n = k) = (n choose k) ∫₀¹ xᵏ (1−x)^{n−k} dF(x). □
Example (Polya urn model). Suppose we have b blue and r red balls in the urn. We pick a ball randomly and return it together with c balls of the same color (see Figure 5.1). Consider the r.v.s

  X_i = 1 if the i-th ball picked is blue, and X_i = 0 otherwise.

Figure 5.1: Polya urn model.

The X_i's are not independent but exchangeable. For example,

  P(bbr) = (b/(b+r)) · ((b+c)/(b+r+c)) · (r/(b+r+2c)),
  P(brb) = (b/(b+r)) · (r/(b+r+c)) · ((b+c)/(b+r+2c))

are equal. To identify the distribution F in de Finetti's theorem, let us look at its moments μ_k in (5.0.3):

  μ_k = P(b⋯b, k times) = (b/(b+r)) · ((b+c)/(b+r+c)) ⋯ ((b+(k−1)c)/(b+r+(k−1)c)).

One can recognize, or easily check, that the μ_k are the moments of the Beta(α, β) distribution with density

  (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1−x)^{β−1}
on [0,1], with parameters α = b/c, β = r/c. By de Finetti's theorem, we can generate the X_i's by first picking x from the distribution Beta(b/c, r/c) and then generating i.i.d. Bernoulli (X_i)'s with probability of success x. By the strong law of large numbers, the proportion of blue balls in the first n repetitions will converge to this probability of success x, i.e. in the limit it will be random, with a Beta distribution. This example will come up once more when we talk about convergence of martingales.
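A simulation of the Polya urn illustrates the random limiting fraction. With b = r = c the de Finetti mixing distribution is Beta(1,1), i.e. uniform on [0,1]; the parameter values and run lengths below are our own choices:

```python
import random

def polya_fraction(b, r, c, n, rng):
    """Run n draws of the Polya urn (start with b blue, r red; add c
    balls of the drawn color each time); return the fraction of blue draws."""
    blue_drawn = 0
    for _ in range(n):
        if rng.random() < b / (b + r):
            b += c
            blue_drawn += 1
        else:
            r += c
    return blue_drawn / n

rng = random.Random(3)
fractions = [polya_fraction(1, 1, 1, 400, rng) for _ in range(2000)]
mean = sum(fractions) / len(fractions)  # ≈ 1/2, the mean of Beta(1, 1)
# The limit is genuinely random: the fractions spread out uniformly.
frac_below_quarter = sum(f < 0.25 for f in fractions) / len(fractions)
```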
Section 6
0 - 1 Laws. Convergence of random series.

Consider a sequence (X_i)_{i≥1} of real valued independent random variables and let σ((X_i)_{i≥1}) be the σ-algebra of events generated by this sequence, i.e. the events {(X_i)_{i≥1} ∈ B} for B in the cylindrical σ-algebra on ℝ^ℕ.

Definition. An event A ∈ σ((X_i)_{i≥1}) is called a tail event if A ∈ σ((X_i)_{i≥n}) for all n ≥ 1.

For example, if A_i ∈ σ(X_i), then

  {A_i i.o.} = ∩_{n≥1} ∪_{i≥n} A_i
is a tail event ("i.o." stands for "infinitely often"). It turns out that such events have probability 0 or 1.

Theorem 9 (Kolmogorov's 0-1 law). If A is a tail event, then P(A) = 0 or 1.

Proof. For a finite subset F = {i_1, . . . , i_n} ⊆ ℕ, let us denote X_F = (X_{i_1}, . . . , X_{i_n}). The σ-algebra σ((X_i)_{i≥1}) is generated by the algebra

  {X_F ∈ B : F ⊆ ℕ finite, B ∈ ℬ(ℝ^{|F|})}.

By the approximation lemma, we can approximate any event A ∈ σ((X_i)_{i≥1}) by events in this generating algebra. Therefore, for any ε > 0 there exists a set A′ in this algebra such that P(A △ A′) ≤ ε, and by definition A′ ∈ σ(X_1, . . . , X_n) for large enough n. This implies

  |P(A) − P(A′)| ≤ ε, |P(A) − P(A ∩ A′)| ≤ ε.

Since A is a tail event, A ∈ σ((X_i)_{i≥n+1}), which means that A and A′ are independent, i.e. P(A ∩ A′) = P(A)P(A′). We get

  P(A) ≈ P(A ∩ A′) = P(A)P(A′) ≈ P(A)P(A),

with both approximations accurate to within ε, and letting ε → 0 proves that P(A) = P(A)², so P(A) ∈ {0, 1}. □
Examples. 1. {Σ_{i≥1} X_i converges} is a tail event, so it has probability 0 or 1.
2. Consider the series Σ_{i≥1} X_i zⁱ on the complex plane, z ∈ ℂ. Its radius of convergence is

  r = 1 / limsup_{i→∞} |X_i|^{1/i}.

For any x ≥ 0, the event {r ≤ x} is, obviously, a tail event. This implies that r = const with probability 1.
The Savage-Hewitt 0 - 1 law. Next we will prove a stronger result under the more restrictive assumption that the r.v.s X_i, i ≥ 1, are not only independent but also identically distributed with law μ. Without loss of generality, we can assume that each X_i is given by the identity X_i(x) = x on its sample space (ℝ, ℬ, μ). By Kolmogorov's consistency theorem, the entire sequence (X_i)_{i≥1} can be defined on the sample space (ℝ^ℕ, ℬ_∞, P), where ℬ_∞ is the cylindrical σ-algebra and P is the measure guaranteed by the Caratheodory extension theorem. In our case the X_i's are i.i.d. and P = μ^ℕ is called the infinite product measure. It will be convenient to use the notation σ((X_i)_{i≥1}) for the cylindrical σ-algebra, since similar notation can be used for the cylindrical σ-algebra on any subset of coordinates.

Definition. An event A ∈ σ((X_i)_{i≥1}) is called exchangeable/symmetric if for all n ≥ 1,

  (x_1, x_2, . . . , x_n, x_{n+1}, . . .) ∈ A ⇒ (x_n, x_2, . . . , x_{n−1}, x_1, x_{n+1}, . . .) ∈ A.

In other words, the set A is symmetric under permutations of a finite number of coordinates. Note that any tail event is symmetric.

Theorem 10 (Savage-Hewitt 0-1 law). If A is symmetric, then P(A) = 0 or 1.

Proof. Given a sequence x = (x_1, x_2, . . .), let us define an operator

  Tx = (x_{n+1}, . . . , x_{2n}, x_1, . . . , x_n, x_{2n+1}, . . .)

that switches the first n coordinates with the second n coordinates. Since A is symmetric,

  TA = {Tx : x ∈ A} = A.

By the Approximation Lemma 2, for any ε > 0 and large enough n there exists A_n ∈ σ(X_1, . . . , X_n) such that P(A_n △ A) ≤ ε. Clearly,

  B_n := TA_n ∈ σ(X_{n+1}, . . . , X_{2n}),

and since the X_i's are i.i.d.,

  P(B_n △ A) = P(T(A_n △ A)) = P(A_n △ A) ≤ ε,

which implies that P((A_n ∩ B_n) △ A) ≤ 2ε. Therefore, we can conclude that, to within a multiple of ε,

  P(A) ≈ P(A_n), P(A) ≈ P(A_n ∩ B_n) = P(A_n)P(B_n) = P(A_n)²,

where we used the fact that the events A_n, B_n are defined in terms of different sets of coordinates and, thus, are independent. Letting ε → 0 implies that P(A) = P(A)². □
Example. Let S_n = X_1 + ⋯ + X_n and let

  r = limsup_{n→∞} (S_n − a_n)/b_n

for deterministic sequences (a_n), (b_n). The event {r ≤ x} is symmetric, since changing the order of any finite set of coordinates does not affect S_n for large enough n. As a result, P(r ≤ x) = 0 or 1, which implies that r = const with probability 1.

Random series. We already saw above that, by Kolmogorov's 0-1 law, the series Σ_{i≥1} X_i for independent (X_i)_{i≥1} converges with probability 0 or 1. This means that either S_n = X_1 + ⋯ + X_n converges to its limit S with probability one, or with probability one it does not converge. Two sections back, before the proof of the strong law of large numbers, we saw the example of a sequence which with probability one does not converge yet converges to 0 in probability. In the case when S_n does not converge with probability one, is it still possible that it converges to some random variable in probability? The answer is no, because we will now prove that for random series convergence in probability implies a.s. convergence.
22
-
8/13/2019 mit notes on theory of probability
28/125
| |
Theorem 11 (Kolmogorov's inequality). Suppose that (X_i)_{i≥1} are independent and S_n = X_1 + ⋯ + X_n. If for all j ≤ n,

  P(|S_n − S_j| ≥ a) ≤ p < 1,   (6.0.1)

then

  P(max_{1≤j≤n} |S_j| ≥ x) ≤ (1/(1−p)) P(|S_n| ≥ x − a).

Proof. First of all, let us notice that this inequality is obvious without the maximum, because (6.0.1) is equivalent to 1 − p ≤ P(|S_n − S_j| < a) and we can write

  (1−p) P(|S_j| ≥ x) ≤ P(|S_n − S_j| < a) P(|S_j| ≥ x) = P(|S_n − S_j| < a, |S_j| ≥ x) ≤ P(|S_n| ≥ x − a).

The equality is true because the events {|S_j| ≥ x} and {|S_n − S_j| < a} are independent, since the first depends only on X_1, . . . , X_j and the second only on X_{j+1}, . . . , X_n. The last inequality is true simply by the triangle inequality. To deal with the maximum, instead of looking at an arbitrary partial sum S_j we will look at the first partial sum that crosses level x. We define that first time by τ = min{j ≤ n : |S_j| ≥ x}, and let τ = n + 1 if all |S_j| < x. Then

  (1−p) P(τ ≤ n) = Σ_{j≤n} (1−p) P(τ = j) ≤ Σ_{j≤n} P(τ = j, |S_n − S_j| < a)
                 ≤ Σ_{j≤n} P(τ = j, |S_n| ≥ x − a) ≤ P(|S_n| ≥ x − a),

and notice that τ ≤ n is equivalent to max_{j≤n} |S_j| ≥ x. □
Theorem 12 (Kolmogorov). If the series Σ_{i≥1} X_i converges in probability, then it converges almost surely.

Proof. Suppose that the partial sums S_n converge to some r.v. S in probability, i.e. for any ε > 0, for large enough n ≥ n₀(ε) we have P(|S_n − S| ≥ ε) ≤ ε. If k ≥ j ≥ n ≥ n₀(ε), then

  P(|S_k − S_j| ≥ 2ε) ≤ P(|S_k − S| ≥ ε) + P(|S_j − S| ≥ ε) ≤ 2ε.

Next, we use Kolmogorov's inequality for x = 4ε and a = 2ε (we let the partial sums start at n):

  P(max_{n≤j≤k} |S_j − S_n| ≥ 4ε) ≤ (1/(1−2ε)) P(|S_k − S_n| ≥ 2ε) ≤ 2ε/(1−2ε) ≤ 3ε

for small ε. The events {max_{n≤j≤k} |S_j − S_n| ≥ 4ε} are increasing as k → ∞ and, by continuity of measure,

  P(max_{n≤j} |S_j − S_n| ≥ 4ε) ≤ 3ε.

Finally, since P(|S_n − S| ≥ ε) ≤ ε, we get

  P(max_{n≤j} |S_j − S| ≥ 5ε) ≤ 4ε.
This kind of maximal statement about any sequence S_j is actually equivalent to its a.s. convergence. To see this, take ε = 1/m², take n(m) = n₀(ε) and consider the event

  A_m = {max_{n(m)≤j} |S_j − S| ≥ 5/m²}.

We proved that P(A_m) ≤ 4/m², so Σ_m P(A_m) < ∞ and, by the Borel-Cantelli lemma, with probability one only finitely many of the events A_m occur; hence S_j → S almost surely. □
Example. Consider the random series Σ_{i≥1} ε_i/i, where P(ε_i = ±1) = 1/2. We have

  Σ_{i≥1} E(ε_i/i)² = Σ_{i≥1} 1/i² < ∞,

so the series converges a.s.
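A simulation of this series shows the partial sums settling down, as a.s. convergence predicts. A sketch; the run lengths below are our choice:

```python
import random

def random_harmonic_partial_sums(n, rng):
    """Partial sums S_k of sum_i eps_i / i with independent signs eps_i = ±1."""
    s, out = 0.0, []
    for i in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / i
        out.append(s)
    return out

rng = random.Random(4)
path = random_harmonic_partial_sums(100_000, rng)
# After the 50,000th term, the remaining variance is about 1/50000,
# so the tail of the path barely moves.
tail_osc = max(path[50_000:]) - min(path[50_000:])
```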
Section 7
Stopping times, Wald's identity. Another proof of SLLN.

Consider a sequence (X_i)_{i≥1} of independent r.v.s and an integer valued random variable V ∈ {1, 2, . . .}. We say that V is independent of the future if {V ≤ n} is independent of σ((X_i)_{i≥n+1}). We say that V is a stopping time (Markov time) if {V ≤ n} ∈ σ(X_1, . . . , X_n) for all n. Clearly, a stopping time is independent of the future. An example of a stopping time is V = min{k ≥ 1 : S_k ≥ 1}.

Suppose that V is independent of the future. We can write

  ES_V = Σ_{k≥1} E S_V I(V = k) = Σ_{k≥1} E S_k I(V = k)
       = Σ_{k≥1} Σ_{n≤k} E X_n I(V = k) ⁽*⁾= Σ_{n≥1} Σ_{k≥n} E X_n I(V = k) = Σ_{n≥1} E X_n I(V ≥ n).

In (*) we can interchange the order of summation if, for example, the double sequence is absolutely summable, by the Fubini-Tonelli theorem. Since V is independent of the future, the event {V ≥ n} = {V ≤ n−1}ᶜ is independent of σ(X_n) and we get

  ES_V = Σ_{n≥1} EX_n P(V ≥ n).   (7.0.1)

This implies the following.

Theorem 14 (Wald's identity). If (X_i)_{i≥1} are i.i.d., E|X_1| < ∞ and EV < ∞, then ES_V = EX_1 · EV.
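Wald's identity is easy to check by simulation. A sketch with our own illustrative choice of a fair six-sided die and the stopping time V = min{k : S_k ≥ 20}:

```python
import random

rng = random.Random(5)

def one_run():
    """Roll a fair die until the running total reaches 20.
    Return (S_V, V): the stopped sum and the stopping time."""
    s, v = 0, 0
    while s < 20:
        s += rng.randint(1, 6)
        v += 1
    return s, v

n = 50_000
sums, times = zip(*(one_run() for _ in range(n)))
mean_sv = sum(sums) / n
mean_v = sum(times) / n
ratio = mean_sv / mean_v  # Wald: E S_V / E V = E X_1 = 3.5
```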
Theorem 15. If (X_i)_{i≥1} are i.i.d. and V is a stopping time, then the shifted sequence (X_{V+k})_{k≥1} is independent of (V, X_1, . . . , X_V) and

  (X_{V+k})_{k≥1} =ᵈ (X_k)_{k≥1},

where "=ᵈ" means equality in distribution.

Proof. Given a subset N ⊆ ℕ and sequences (B_i) and (C_i) of Borel sets on ℝ, define the events

  A = {V ∈ N, X_1 ∈ B_1, . . . , X_V ∈ B_V}

and, for any k ≥ 1,

  D = {X_{V+1} ∈ C_1, . . . , X_{V+k} ∈ C_k}.

We have

  P(D ∩ A) = Σ_{n≥1} P(D ∩ A ∩ {V = n}) = Σ_{n≥1} P(D_n ∩ A ∩ {V = n}),

where D_n = {X_{n+1} ∈ C_1, . . . , X_{n+k} ∈ C_k}. The intersection of events is

  A ∩ {V = n} = {V = n, X_1 ∈ B_1, . . . , X_n ∈ B_n} if n ∈ N, and ∅ otherwise.

Since V is a stopping time, {V = n} ∈ σ(X_1, . . . , X_n) and A ∩ {V = n} ∈ σ(X_1, . . . , X_n). On the other hand, D_n ∈ σ(X_{n+1}, . . .) and, as a result,

  P(D ∩ A) = Σ_{n≥1} P(D_n) P(A ∩ {V = n}) = Σ_{n≥1} P(D_0) P(A ∩ {V = n}) = P(D_0) P(A),

where D_0 = {X_1 ∈ C_1, . . . , X_k ∈ C_k}, and this finishes the proof. □

Remark. One could be a little bit more careful when talking about the events generated by a vector (V, X_1, . . . , X_V) that has random length. In the proof we implicitly assumed that such events are generated by events

  A = {V ∈ N, X_1 ∈ B_1, . . . , X_V ∈ B_V},

which is a rather intuitive definition. However, one could be more formal and define the σ-algebra of events generated by (V, X_1, . . . , X_V) as the events A such that A ∩ {V ≤ n} ∈ σ(X_1, . . . , X_n) for any n ≥ 1. This means that when V ≤ n the event A is expressed only in terms of X_1, . . . , X_n. It is easy to check that with this more formal definition the proof remains exactly the same.
Let us give one interesting application of the Markov property and Wald's identity that will yield another proof of the strong law of large numbers.

Theorem 16. Suppose that (X_i)_{i≥1} are i.i.d. such that EX_1 > 0. If Z = inf_{n≥1} S_n, then P(Z > −∞) = 1. (Partial sums cannot drift down to −∞ if EX_1 > 0; of course, this is obvious by SLLN.)

Proof. Let us define (see Figure 7.1)

  τ_1 = min{k ≥ 1 : S_k ≥ 1}, Z_1 = min_{k≤τ_1} S_k, S_k^{(2)} = S_{τ_1+k} − S_{τ_1},
  τ_2 = min{k ≥ 1 : S_k^{(2)} ≥ 1}, Z_2 = min_{k≤τ_2} S_k^{(2)}, S_k^{(3)} = S^{(2)}_{τ_2+k} − S^{(2)}_{τ_2}.

By induction,

  τ_n = min{k ≥ 1 : S_k^{(n)} ≥ 1}, Z_n = min_{k≤τ_n} S_k^{(n)}, S_k^{(n+1)} = S^{(n)}_{τ_n+k} − S^{(n)}_{τ_n}.

Z_1, . . . , Z_n, . . . are i.i.d. by the Markov property.
0
1
1
!1
!2z2
0z1
Figure7.1:Asequenceofstoppingtimes.Noticethat,byconstruction,S1+ +n1 n1and
Z= inf Sk =inf{Z1, S1 +Z2, S1+2 +Z3,...}.k1
Wehave, {Z N}= {S1+...+k1 +Zk N} {k1 +Zk N}.k1 k1
Therefore,P(Z N) P(k1 +Zk N) = P(Zk Nk+1)
k1 k1= P(Z1 Nk+ 1)= P(Z1 j)NP(Z1 j) | | 0
k1 jN jNifwecanshowthatE|Z1|with probability one. This means that for all n 1, Sn +n M > for some large enough M.Dividingbothsidesbynand lettingn weget
Snliminf
n nwith probability one. We can then let 0 over some sequence. Similarly, we prove that limsupSkk 0withprobabilityone.
Section 8. Convergence of Laws. Selection Theorem.
In this section we will begin the discussion of weak convergence of distributions on metric spaces. Let (S, d) be a metric space with a metric d. Consider a measurable space (S, B) with Borel σ-algebra B generated by open sets, and let (Pn)_{n≥1} and P be probability distributions on B. We define
Cb(S) = {f : S → R continuous and bounded}.
We say that Pn → P weakly if
∫ f dPn → ∫ f dP for all f ∈ Cb(S).
Theorem 18 If S = R then Pn → P weakly iff
Fn(t) = Pn((−∞, t]) → F(t) = P((−∞, t])
for any point of continuity t of F.
Proof. (⟹) Let us approximate an indicator function by continuous functions as in figure 8.1, i.e.
φ1(x) ≤ I(x ≤ t) ≤ φ2(x), φ1, φ2 ∈ Cb(R).
For convenience of notations, instead of writing integrals w.r.t. Pn we will write expectations of a r.v. Xn
Figure 8.1: Approximating the indicator.
with distribution Pn.
P(X ≤ t − ε) ≤ Eφ1(X) ← Eφ1(Xn) ≤ Fn(t) = P(Xn ≤ t) ≤ Eφ2(Xn) → Eφ2(X) ≤ P(X ≤ t + ε)
as n → ∞. Therefore, for any ε > 0,
F(t − ε) ≤ liminf_n Fn(t) ≤ limsup_n Fn(t) ≤ F(t + ε).
Since t is a point of continuity of F, letting ε → 0 proves the result.
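A concrete example of the equivalence just proved (an added sketch, not from the notes): with Xn = X + 1/n and X ~ Bernoulli(1/2), the c.d.f.s converge at every continuity point of F, but fail to converge exactly at the jump t = 0.

```python
def shifted_bernoulli_cdf(t, shift=0.0, p=0.5):
    # c.d.f. of X + shift, X ~ Bernoulli(p); it jumps at shift and 1 + shift.
    if t < shift:
        return 0.0
    if t < 1.0 + shift:
        return 1.0 - p
    return 1.0

def F(t):          # law of X
    return shifted_bernoulli_cdf(t)

def Fn(t, n):      # law of X_n = X + 1/n, which converges weakly to X
    return shifted_bernoulli_cdf(t, shift=1.0 / n)

# At the continuity point t = 0.5: F_n(0.5) -> F(0.5) = 0.5.
vals_cont = [Fn(0.5, n) for n in (1, 10, 1000)]
# At the jump t = 0: F_n(0) = 0 for every n, but F(0) = 0.5, so pointwise
# convergence fails exactly at the discontinuity point of F.
vals_jump = [Fn(0.0, n) for n in (1, 10, 1000)]
print(vals_cont, vals_jump)
```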
(⟸) Let PC(F) be the set of points of continuity of F. Since F is monotone, the set PC(F) is dense in R. Take M large enough such that both −M, M ∈ PC(F) and P([−M, M]^c) ≤ ε. Clearly, for large enough k we have Pk([−M, M]^c) ≤ 2ε. For any n > 1, take a sequence of points
−M = x_{1n} ≤ x_{2n} ≤ · · · ≤ x_{nn} = M
such that all x_{in} ∈ PC(F) and max_i |x_{i+1,n} − x_{in}| → 0 as n → ∞. Given a function f ∈ Cb(R), consider an approximating function
fn(x) = Σ_i f(x_{in}) I(x ∈ (x_{i−1,n}, x_{in}]) + 0 · I(x ∉ [−M, M]).
F(x) is a c.d.f. on Rk (exercise). The fact that Pn are uniformly tight ensures that F(x) → 0 or 1 if all xi → −∞ or +∞. Let x be a point of continuity of F(x) and let a, b ∈ A such that ai < xi < bi for all i. We have,
F(a) ← F_{n(k)}(a) ≤ F_{n(k)}(x) ≤ F_{n(k)}(b) → F(b)
as k → ∞. Since x is a point of continuity and A is dense,
F(a) ↑ F(x) as a ↑ x, F(b) ↓ F(x) as b ↓ x,
and this proves that F_{n(k)}(x) → F(x) for all such x. Similarly to the one-dimensional case one can show that for any f ∈ Cb(Rk),
∫ f dF_{n(k)} → ∫ f dF.
f dFn(k) fdF.Proof of Theorem 19. If K is a compact then Cb(K) =C(K). Later in these lectures, when we deal inmoredetailwithconvergenceongeneralmetricspaces,wewillprovethefollowingfactwhichiswell-knownandisaconsequenceoftheStone-Weierstrasstheorem.
Fact.C(K)
is
separable
w.r.t.
norm||f|| =supxK|f(x|.
Even though we are proving Selection theorem for a general metric space, right now we are mostlyinterested in thecase S =Rk wherethis fact is asimple consequenceof the Weierstrasstheoremthat anycontinuousfunctioncanbeapproximatedbypolynomials.
Since Pn are uniformly tight, for any r ≥ 1 we can find a compact Kr such that Pn(Kr) > 1 − 1/r. Let Cr ⊆ C(Kr) be a countable and dense subset of C(Kr). By Cantor's diagonalization argument there exists a subsequence (n(k)) such that P_{n(k)}(f) converges for all f ∈ Cr for all r ≥ 1. Since Cr is dense in C(Kr) this implies that P_{n(k)}(f) converges for all f ∈ C(Kr) for all r ≥ 1. Next, for any f ∈ Cb(S),
|∫ f dP_{n(k)} − ∫_{Kr} f dP_{n(k)}| ≤ ∫_{Kr^c} |f| dP_{n(k)} ≤ ||f||∞ P_{n(k)}(Kr^c) ≤ ||f||∞/r.
This implies that the limit
I(f) := lim_k ∫ f dP_{n(k)}   (8.0.1)
exists. The question is why this limit is an integral over some probability measure P. On each of the compacts Kr we could use Riesz's representation theorem for continuous functionals on C(Kr) and then extend this representation to the union of Kr. Instead, we will prove this as a consequence of a more general result, the Stone-Daniell theorem from measure theory, which says the following.
A family of functions L = {f : S → R} is called a vector lattice if f, g ∈ L ⟹ cf + g ∈ L for c ∈ R and f ∨ g, f ∧ g ∈ L.
A functional I : L → R is called a pre-integral if
1. I(cf + g) = cI(f) + I(g),
2. f ≥ 0 ⟹ I(f) ≥ 0,
3. fn ↓ 0, sup_n ||fn||∞ < ∞ ⟹ I(fn) → 0.
On any compact Kr, fn ↓ 0 uniformly, i.e. δ_{n,r} = ||fn||_{∞,Kr} → 0 as n → ∞. Since
|∫ fn dP_{n(k)}| ≤ ∫_{Kr} |fn| dP_{n(k)} + ∫_{Kr^c} |fn| dP_{n(k)} ≤ δ_{n,r} + (1/r)||f1||∞,
we get
|I(fn)| = lim_k |∫ fn dP_{n(k)}| ≤ δ_{n,r} + (1/r)||f1||∞.
Letting n → ∞ and r → ∞ we get that I(fn) → 0. By the Stone-Daniell theorem,
I(f) = ∫ f dP
for some measure P on σ(Cb(S)). The choice of f = 1 gives I(f) = 1 = P(S), which means that P is a probability measure. Finally, let us show that σ(Cb(S)) = B, the Borel σ-algebra generated by open sets. Since any f ∈ Cb(S) is measurable on B we get σ(Cb(S)) ⊆ B. On the other hand, let F ⊆ S be any closed set and take the function f(x) = min(1, d(x, F)). We have |f(x) − f(y)| ≤ d(x, y), so f ∈ Cb(S) and
f^{−1}({0}) ∈ σ(Cb(S)).
However, since F is closed, f^{−1}({0}) = {x : d(x, F) = 0} = F, and this proves that B ⊆ σ(Cb(S)).
Theorem 21 If Pn converges weakly to P on Rk then (Pn)_{n≥1} is uniformly tight.
Proof. For any ε > 0 there exists large enough M > 0 such that P(|x| > M) ≤ ε. Take a continuous function φ with I(|x| > 2M) ≤ φ(x) ≤ I(|x| > M). Then
∫ φ(x) dPn → ∫ φ(x) dP ≤ P(|x| > M) ≤ ε.
For n large enough, n ≥ n0, we get Pn(|x| > 2M) ≤ 2ε. For n < n0 choose Mn so that Pn(|x| > Mn) ≤ 2ε. Take M′ = max{M1, . . . , M_{n0−1}, 2M}. As a result, Pn(|x| > M′) ≤ 2ε for all n ≥ 1.
Lemma 13 If for any sequence (n(k))_{k≥1} there exists a subsequence (n(k(r)))_{r≥1} such that P_{n(k(r))} → P weakly, then Pn → P weakly.
Proof. Suppose not. Then for some f ∈ Cb(S) and for some ε > 0 there exists a subsequence (n(k)) such that
|∫ f dP_{n(k)} − ∫ f dP| > ε.
But this contradicts the fact that for some subsequence P_{n(k(r))} → P weakly.
Consider r.v.s X and Xn on some probability space (Ω, A, P) with values in a metric space (S, d). Let P and Pn be their corresponding laws on Borel sets B in S. Convergence of Xn to X in probability and almost surely is defined exactly the same way as for S = R by replacing |Xn − X| with d(Xn, X).
Lemma 14 Xn → X in probability iff for any sequence (n(k)) there exists a subsequence (n(k(r))) such that X_{n(k(r))} → X a.s.
Proof. (⟸) Suppose Xn does not converge to X in probability. Then for small enough ε > 0 there exists a subsequence (n(k)) such that
P(d(X, X_{n(k)}) ≥ ε) ≥ ε.
This contradicts the existence of a subsequence X_{n(k(r))} that converges to X a.s.
(⟹) Given a subsequence (n(k)), let us choose (k(r)) so that
P(d(X_{n(k(r))}, X) ≥ 1/r) ≤ 1/r².
By Borel-Cantelli lemma, these events can occur i.o. with probability 0, which means that with probability one, for large enough r,
d(X_{n(k(r))}, X) ≤ 1/r,
i.e. X_{n(k(r))} → X a.s.
Lemma 15 If Xn → X in probability then Xn → X weakly.
Proof. By Lemma 14, for any subsequence (n(k)) there exists a subsequence (n(k(r))) such that X_{n(k(r))} → X a.s. Given f ∈ Cb(R), by the dominated convergence theorem,
Ef(X_{n(k(r))}) → Ef(X),
i.e. X_{n(k(r))} → X weakly. By Lemma 13, Xn → X weakly.
Section 9. Characteristic Functions. Central Limit Theorem on R.
Let X = (X1, . . . , Xk) be a random vector on Rk with distribution P and let t = (t1, . . . , tk) ∈ Rk. The characteristic function of X is defined by
f(t) = E e^{i(t,X)} = ∫ e^{i(t,x)} dP(x).
If X has standard normal distribution N(0, 1) and λ ∈ R then
E e^{λX} = (2π)^{−1/2} ∫ e^{λx − x²/2} dx = e^{λ²/2} (2π)^{−1/2} ∫ e^{−(x−λ)²/2} dx = e^{λ²/2}.
For complex λ = it, consider the analytic function
φ(x) = e^{itx − x²/2} for x ∈ C.
By Cauchy's theorem, the integral over a closed path is equal to 0. Let us take a closed path x + i0 for x from −∞ to +∞ and x + it for x from +∞ to −∞. Then
f(t) = (2π)^{−1/2} ∫ e^{itx − x²/2} dx = (2π)^{−1/2} ∫ e^{it(it+x) − (it+x)²/2} dx = e^{−t²/2} (2π)^{−1/2} ∫ e^{−x²/2} dx = e^{−t²/2}.   (9.0.1)
If Y has normal distribution N(m, σ²) then
E e^{itY} = E e^{it(m + σX)} = e^{itm − t²σ²/2}.
Lemma 16 If X is a real-valued r.v. such that E|X|^r < ∞ then f ∈ C^r(R).
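The formula E e^{itY} = e^{itm − t²σ²/2} is easy to check by Monte Carlo (an added illustration, not from the notes; the values m = 2, σ = 1.5, t = 0.7 are arbitrary):

```python
import cmath
import random

def empirical_cf(samples, t):
    # Monte Carlo estimate of the characteristic function E exp(itX).
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(1)
m, sigma, t = 2.0, 1.5, 0.7
samples = [rng.gauss(m, sigma) for _ in range(200_000)]

estimate = empirical_cf(samples, t)
exact = cmath.exp(1j * t * m - t * t * sigma * sigma / 2)  # e^{itm - t^2 sigma^2/2}
print(abs(estimate - exact))  # small: Monte Carlo error is O(1/sqrt(200000))
```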
by the dominated convergence theorem. This means that f ∈ C^r(R). If r = 1, E|X| < ∞ …
If P has density p then
P ∗ Q(A) = ∫∫ I(x + y ∈ A) p(x) dx dQ(y) = ∫∫ I(z ∈ A) p(z − y) dz dQ(y) = ∫ ∫_A p(z − y) dz dQ(y) = ∫_A ( ∫ p(z − y) dQ(y) ) dz,
which means that P ∗ Q has density
f(x) = ∫ p(x − y) dQ(y).   (9.0.2)
If, in addition, Q has density q then
f(x) = ∫ p(x − y) q(y) dy.
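For instance (an added sketch, not from the notes), the convolution formula reproduces the triangular density of the sum of two independent Uniform(0,1) variables; a midpoint-rule version of f(x) = ∫ p(x − y) q(y) dy checks it:

```python
def uniform_density(x):
    # Density of Uniform(0, 1).
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolution_density(x, p, q, grid=10_000):
    # Midpoint-rule approximation of f(x) = \int_0^1 p(x - y) q(y) dy.
    h = 1.0 / grid
    return sum(p(x - (k + 0.5) * h) * q((k + 0.5) * h) for k in range(grid)) * h

# The sum of two independent Uniform(0,1) variables has the triangular
# density f(x) = x on [0, 1] and f(x) = 2 - x on [1, 2].
for x, expected in [(0.3, 0.3), (1.0, 1.0), (1.6, 0.4)]:
    print(x, convolution_density(x, uniform_density, uniform_density), expected)
```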
Denote by N(0, σ²I) the law of the random vector X = (X1, . . . , Xk) of i.i.d. N(0, σ²) random variables, whose density on Rk is
∏_{i=1}^k (2πσ²)^{−1/2} e^{−x_i²/(2σ²)} = (2πσ²)^{−k/2} e^{−|x|²/(2σ²)}.
For a distribution P denote Pσ = P ∗ N(0, σ²I).
Lemma 18 Pσ = P ∗ N(0, σ²I) has density
pσ(x) = (2π)^{−k} ∫ f(t) e^{−i(t,x) − σ²|t|²/2} dt,
where f(t) = ∫ e^{i(t,x)} dP(x).
Proof. By (9.0.2), P ∗ N(0, σ²I) has density
pσ(x) = (2πσ²)^{−k/2} ∫ e^{−|x−y|²/(2σ²)} dP(y).
Using (9.0.1), we can write
e^{−(x_i−y_i)²/(2σ²)} = (2π)^{−1/2} ∫ e^{i σ^{−1}(x_i−y_i) z_i − z_i²/2} dz_i,
and taking the product over i ≤ k we get
e^{−|x−y|²/(2σ²)} = (2π)^{−k/2} ∫ e^{i σ^{−1}(x−y, z) − |z|²/2} dz.
Then we can continue
pσ(x) = (2πσ²)^{−k/2} (2π)^{−k/2} ∫∫ e^{i σ^{−1}(x−y, z) − |z|²/2} dz dP(y) = (2πσ)^{−k} ∫ f(−z/σ) e^{i σ^{−1}(x, z) − |z|²/2} dz.
Let z = σt.
Theorem 23 (Uniqueness) If
∫ e^{i(t,x)} dP(x) = ∫ e^{i(t,x)} dQ(x)
then P = Q.
Proof. By the above Lemma, Pσ = Qσ. If X ∼ P and ξ ∼ N(0, I) then X + σξ → X almost surely as σ → 0 and, therefore, Pσ → P weakly. Similarly, Qσ → Q.
We proved that the characteristic function of Sn/√n converges to the c.f. of N(0, σ²). Also, the sequence L(Sn/√n), n ≥ 1, is uniformly tight, since by Chebyshev's inequality
P(|Sn/√n| > M) ≤ σ²/M² < ε
for large enough M. To finish the proof of the CLT on the real line we apply the following.
Lemma 19 If (Pn) is uniformly tight and
fn(t) = ∫ e^{itx} dPn(x) → f(t)
then Pn → P and f(t) = ∫ e^{itx} dP(x).
Proof. For any sequence (n(k)), by the Selection Theorem, there exists a subsequence (n(k(r))) such that P_{n(k(r))} converges weakly to some distribution P. Since e^{i(t,x)} is bounded and continuous,
∫ e^{i(t,x)} dP_{n(k(r))} → ∫ e^{i(t,x)} dP(x)
as r → ∞ and, therefore, f is a c.f. of P. By the uniqueness theorem, the distribution P does not depend on the sequence (n(k)). By Lemma 13, Pn → P weakly.
Section 10. Multivariate normal distributions and CLT.
Let P be a probability distribution on Rk and let
g(t) = ∫ e^{i(t,x)} dP(x).
We proved that Pσ = P ∗ N(0, σ²I) has density
pσ(x) = (2π)^{−k} ∫ g(t) e^{−i(t,x) − σ²|t|²/2} dt.
Lemma 20 (Fourier inversion formula) If ∫ |g(t)| dt < ∞ …
It is now a simple exercise to show that for any bounded open set U,
∫_U dP(x) = ∫_U p(x) dx.
This means that P restricted to bounded sets has density p(x) and, hence, on the entire Rk.
For a random vector X = (X1, . . . , Xk) ∈ Rk we denote EX = (EX1, . . . , EXk).
Theorem 24 Consider a sequence (Xi)_{i≥1} of i.i.d. random vectors on Rk such that EX1 = 0 and E|X1|² < ∞. Then L(Sn/√n) converges weakly to a distribution P which has characteristic function
f(t) = e^{−(Ct,t)/2}, where C = Cov(X1).
For any set Γ ⊆ Rk we can write
P(Ag ∈ Γ) = P(g ∈ A^{−1}Γ) = ∫_{A^{−1}Γ} (2π)^{−k/2} exp(−|x|²/2) dx.
Let us now make the change of variables y = Ax, or x = A^{−1}y. Then
P(Ag ∈ Γ) = ∫_Γ (2π)^{−k/2} exp(−|A^{−1}y|²/2) (1/|det(A)|) dy.
But since
det(C) = det(AA^T) = det(A) det(A^T) = det(A)²,
we have |det(A)| = √(det(C)). Also
|A^{−1}y|² = (A^{−1}y)^T (A^{−1}y) = y^T (A^T)^{−1} A^{−1} y = y^T (AA^T)^{−1} y = y^T C^{−1} y.
Therefore, we get
P(Ag ∈ Γ) = ∫_Γ (2π)^{−k/2} (det C)^{−1/2} exp(−y^T C^{−1} y/2) dy.
This means that the distribution N(0, C) has the density
(2π)^{−k/2} (det C)^{−1/2} exp(−y^T C^{−1} y/2).
General case. Let us take, for example, the vector X = QD^{1/2}g for an i.i.d. standard normal vector g, so that X ∼ N(0, C). If q1, . . . , qk are the column vectors of Q then
X = QD^{1/2}g = (λ1^{1/2} g1) q1 + · · · + (λk^{1/2} gk) qk.
Therefore, in the orthonormal coordinate basis q1, . . . , qk the random vector X has coordinates λ1^{1/2} g1, . . . , λk^{1/2} gk. These coordinates are independent with normal distributions with variances λ1, . . . , λk correspondingly. When det(C) = 0, i.e. C is not invertible, some of its eigenvalues will be zero, say, λ_{n+1} = · · · = λk = 0. Then the random vector X will be concentrated on the subspace spanned by vectors q1, . . . , qn but it will not have density on the entire space Rk. On the subspace spanned by vectors q1, . . . , qn the vector X will have a density
f(x1, . . . , xn) = ∏_{i=1}^n (2πλi)^{−1/2} exp(−x_i²/(2λi)).
Let us look at a couple of properties of normal distributions.
Lemma 21 If X ∼ N(0, C) on Rk and A : Rk → Rm is linear then AX ∼ N(0, ACA^T) on Rm.
Proof. The c.f. of AX is
E e^{i(t,AX)} = E e^{i(A^T t, X)} = e^{−(CA^T t, A^T t)/2} = e^{−(ACA^T t, t)/2}.
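Lemma 21 can be sanity-checked by simulation (an added sketch; the matrices below are arbitrary choices): the sample covariance of AX should approach ACA^T.

```python
import random

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# X = Lg ~ N(0, C) with C = L L^T; then Y = AX should be N(0, A C A^T).
L = [[1.0, 0.0], [0.5, 1.0]]            # Cholesky-style factor, chosen by hand
C = matmul(L, transpose(L))             # [[1.0, 0.5], [0.5, 1.25]]
A = [[2.0, 1.0], [0.0, 3.0]]
target = matmul(matmul(A, C), transpose(A))

rng = random.Random(3)
n = 100_000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n):
    g = (rng.gauss(0, 1), rng.gauss(0, 1))
    x = (L[0][0] * g[0], L[1][0] * g[0] + L[1][1] * g[1])
    y = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    for i in range(2):
        for j in range(2):
            acc[i][j] += y[i] * y[j]
sample_cov = [[acc[i][j] / n for j in range(2)] for i in range(2)]
print(sample_cov, target)
```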
Lemma 22 X is normal on Rk iff (t, X) is normal on R for all t ∈ Rk.
Proof. (⟹) The c.f. of the real-valued random variable (t, X) is
f(λ) = E e^{iλ(t,X)} = E e^{i(λt,X)} = e^{−(Cλt,λt)/2} = e^{−λ²(Ct,t)/2},
which means that (t, X) ∼ N(0, (Ct, t)). (⟸) If (t, X) is normal then
E e^{i(t,X)} = e^{−(Ct,t)/2},
because the variance of (t, X) is (Ct, t).
Lemma 23 Let Z = (X, Y) where X = (X1, . . . , Xi) and Y = (Y1, . . . , Yj), and suppose that Z is normal on R^{i+j}. Then X and Y are independent iff Cov(Xm, Yn) = 0 for all m, n.
Proof. One way is obvious. The other way around, suppose that
C = Cov(Z) = ( D 0 ; 0 F ).
Then the c.f. of Z is
E e^{i(t,Z)} = e^{−(Ct,t)/2} = e^{−(Dt1,t1)/2 − (Ft2,t2)/2} = E e^{i(t1,X)} E e^{i(t2,Y)},
where t = (t1, t2). By uniqueness, X and Y are independent.
Lemma 24 (Continuous Mapping) Suppose that Pn → P on X and G : X → Y is a continuous map. Then Pn ∘ G^{−1} → P ∘ G^{−1} on Y. In other words, if r.v. Zn → Z weakly then G(Zn) → G(Z) weakly.
Proof. This is obvious, because for any f ∈ Cb(Y), we have f ∘ G ∈ Cb(X) and, therefore, Ef(G(Zn)) → Ef(G(Z)).
Lemma 25 If Pn → P on Rk and Qn → Q on Rm then Pn × Qn → P × Q on R^{k+m}.
Proof. By the Fubini theorem, the c.f.
∫ e^{i(t,x)} d(Pn × Qn)(x) = ∫ e^{i(t1,x1)} dPn ∫ e^{i(t2,x2)} dQn → ∫ e^{i(t1,x1)} dP ∫ e^{i(t2,x2)} dQ = ∫ e^{i(t,x)} d(P × Q).
By Lemma 19 it remains to show that (Pn × Qn) is uniformly tight. By Theorem 21, since Pn → P, (Pn) is uniformly tight. Therefore, there exists a compact K on Rk such that Pn(K) > 1 − ε. Similarly, for some compact K′ on Rm, Qn(K′) > 1 − ε. We have,
Pn × Qn(K × K′) > 1 − 2ε
and K × K′ is a compact on R^{k+m}.
Corollary 1 If Pn → P and Qn → Q both on Rk then Pn ∗ Qn → P ∗ Q.
Proof. Since the function G : R^{k+k} → Rk given by G(x, y) = x + y is continuous, by the continuous mapping lemma,
Pn ∗ Qn = (Pn × Qn) ∘ G^{−1} → (P × Q) ∘ G^{−1} = P ∗ Q.
Section 11. Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.
Instead of considering i.i.d. sequences, for each n ≥ 1 we will consider a vector (X1n, . . . , Xnn) of independent r.v.s, not necessarily identically distributed. This setting is called triangular arrays because the entire vector may change with n.
Theorem 25 Consider a vector (Xin)_{1≤i≤n} of independent r.v.s such that
EXin = 0, Var(Sn) = Σ_{i≤n} E(Xin)² = 1.
Suppose that the following Lindeberg's condition is satisfied:
Σ_{i=1}^n E(Xin)² I(|Xin| > ε) → 0 as n → ∞ for all ε > 0.   (11.0.1)
Then L(Σ_{i≤n} Xin) → N(0, 1).
Proof. First of all, L(Σ_{i≤n} Xin) is uniformly tight, because by Chebyshev's inequality
P(|Σ_{i≤n} Xin| > M) ≤ 1/M²
for large enough M. It remains to show that the characteristic function of Sn converges to e^{−t²/2}. For simplicity of notations let us omit the upper index n and write Xi instead of Xin. Since
E e^{itSn} = ∏_{i≤n} E e^{itXi},
it is enough to show that
log E e^{itSn} = Σ_{i≤n} log(1 + (E e^{itXi} − 1)) → −t²/2.   (11.0.2)
It is an easy exercise to prove, by induction on m, that for any a ∈ R,
|e^{ia} − Σ_{k≤m} (ia)^k/k!| ≤ |a|^{m+1}/(m + 1)!.   (11.0.3)
(Just integrate this inequality to make the induction step.) Using this for m = 1,
|E e^{itXi} − 1| = |E e^{itXi} − 1 − it EXi| ≤ (t²/2) EXi² ≤ (t²/2)(ε² + EXi² I(|Xi| > ε)) ≤ t²ε²   (11.0.4)
for large n by (11.0.1) and for small enough ε. Using the expansion of log(1 + z) it is easy to check that
|log(1 + z) − z| ≤ |z|² for |z| ≤ 1/2,
and, therefore, we can write
|Σ_{i≤n} log(1 + (E e^{itXi} − 1)) − Σ_{i≤n} (E e^{itXi} − 1)| ≤ Σ_{i≤n} |E e^{itXi} − 1|² ≤ (t⁴/4) Σ_{i≤n} (EXi²)² ≤ (t⁴/4) max_{i≤n} EXi² Σ_{i≤n} EXi² = (t⁴/4) max_{i≤n} EXi² → 0,
because, as in (11.0.4),
EXi² ≤ ε² + EXi² I(|Xi| > ε) → 0
for large n by (11.0.1) and for ε → 0. Finally, to show (11.0.2) it remains to show that
Σ_{i≤n} (E e^{itXi} − 1) → −t²/2.
Using (11.0.3) for m = 1, on the event |Xi| > ε,
|e^{itXi} − 1 − itXi| I(|Xi| > ε) ≤ (t²Xi²/2) I(|Xi| > ε),
and, therefore,
|e^{itXi} − 1 − itXi + t²Xi²/2| I(|Xi| > ε) ≤ t²Xi² I(|Xi| > ε).
Using (11.0.3) for m = 2, on the event |Xi| ≤ ε,
|e^{itXi} − 1 − itXi + t²Xi²/2| I(|Xi| ≤ ε) ≤ (|t|³|Xi|³/6) I(|Xi| ≤ ε) ≤ (|t|³ε/6) Xi².
Combining the last two bounds and using that EXi = 0,
|E e^{itXi} − 1 + (t²/2) EXi²| ≤ t² EXi² I(|Xi| > ε) + (|t|³ε/6) EXi².
Finally, since Σ_{i≤n} EXi² = 1,
|Σ_{i≤n} (E e^{itXi} − 1) + t²/2| ≤ t² Σ_{i≤n} EXi² I(|Xi| > ε) + |t|³ε/6 → 0
as n → ∞ (using Lindeberg's condition) and ε → 0.
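A tiny worked check of the Lindeberg condition (an added illustration): for the triangular array Xin = ξi/√n with ξi = ±1, each |Xin| = 1/√n, so the truncated second moments vanish identically once 1/√n ≤ ε.

```python
import math

def lindeberg_sum(n, eps):
    # For the array X_i^n = xi_i / sqrt(n), xi_i = +/-1: EX_i^n = 0 and the
    # variances sum to 1.  The term E (X_i^n)^2 I(|X_i^n| > eps) equals 1/n
    # when 1/sqrt(n) > eps and 0 otherwise, so the Lindeberg sum is 1 or 0.
    term = (1.0 / n) if 1.0 / math.sqrt(n) > eps else 0.0
    return n * term

eps = 0.1
sums = [lindeberg_sum(n, eps) for n in (10, 50, 100, 101, 1000)]
print(sums)  # drops to 0 once 1/sqrt(n) <= eps, i.e. n >= 100
```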
Lemma 26 If P, Q are distributions on R such that P ∗ Q = P then Q({0}) = 1.
Proof. Let us define
fP(t) = ∫ e^{itx} dP(x), fQ(t) = ∫ e^{itx} dQ(x).
The condition P ∗ Q = P implies that fP(t) fQ(t) = fP(t). Since fP(0) = 1 and fP(t) is continuous, for small enough |t| ≤ δ we have |fP(t)| > 0 and, as a result, fQ(t) = 1. Since
fQ(t) = ∫ cos(tx) dQ(x) + i ∫ sin(tx) dQ(x),
for |t| ≤ δ this implies that ∫ cos(tx) dQ(x) = 1, and since cos(s) ≤ 1 this can happen only if
Q({x : xt = 0 mod 2π}) = 1 for all |t| ≤ δ.
Take s, t such that |s|, |t| ≤ δ and s/t is irrational. For x to be in the support of Q we must have xs = 2πk and xt = 2πm for some integers k, m. This can happen only if x = 0.
Theorem 26 (Levy's equivalence) If (Xi) is a sequence of independent r.v. then Σ_{i≥1} Xi converges a.s. iff in probability iff in law.
Proof. We already proved (a Kolmogorov's theorem) that convergence in probability implies a.s. convergence. It remains to prove only that convergence in law implies convergence in probability.
Suppose that L(Sn) → P. Convergence in law implies that {L(Sn)} is uniformly tight, which easily implies that {L(Sn − Sk)}_{n,k≥1} is uniformly tight. This will imply that for any ε > 0
P(|Sn − Sk| > ε) < ε   (11.0.5)
for n ≥ k ≥ N for large enough N. Suppose not. Then there exist ε > 0 and sequences (n(l)) and (n′(l)) such that n(l) ≥ n′(l) and
P(|S_{n(l)} − S_{n′(l)}| > ε) ≥ ε.
Let us denote Yl = S_{n(l)} − S_{n′(l)}. Since {L(Yl)} is uniformly tight, by the selection theorem there exists a subsequence (l(r)) such that L(Y_{l(r)}) → Q. Since
S_{n(l(r))} = S_{n′(l(r))} + Y_{l(r)} and L(S_{n(l(r))}) = L(S_{n′(l(r))}) ∗ L(Y_{l(r)}),
letting r → ∞ we get that P = P ∗ Q. By the above Lemma, Q({0}) = 1, which implies that P(|Y_{l(r)}| > ε) < ε for large r — a contradiction. Once (11.0.5) is proved, by the Borel-Cantelli lemma we can choose an a.s. converging subsequence as in Kolmogorov's theorem, and then by (11.0.5) Sn converges in probability to the same limit.
Theorem 27 (Three series theorem) Let (Xi)_{i≥1} be a sequence of independent r.v. and let Zi = Xi I(|Xi| ≤ 1). Then Σ_{i≥1} Xi converges iff
1. Σ_{i≥1} P(|Xi| > 1) < ∞,
2. Σ_{i≥1} EZi converges,
3. Σ_{i≥1} Var(Zi) < ∞.
Proof. (⟸) By 1 and the Borel-Cantelli lemma, P({Xi ≠ Zi} i.o.) = 0, which means that Σ_{i≥1} Xi converges iff Σ_{i≥1} Zi converges. By 2, it is enough to show that Σ_{i≥1} (Zi − EZi) converges, but this follows from Theorem 13 by 3.
(⟹) If Σ_{i≥1} Xi converges a.s., then P({|Xi| > 1} i.o.) = 0, and since (Xi) are independent, by Borel-Cantelli, Σ_{i≥1} P(|Xi| > 1) < ∞. … for any ε > 0 for m, n large enough.
Suppose that Σ_{i≥1} Var(Zi) = ∞. Then
σ²_{mn} = Var(Smn) = Σ_{m≤k≤n} Var(Zk) → ∞
as n → ∞ for any fixed m. Intuitively, this should not happen: Smn → 0 in probability but their variance goes to infinity. In principle, one can construct such a sequence of random variables, but in our case it will be ruled out by Lindeberg's CLT. Because σ_{mn} → ∞, Lindeberg's theorem will imply that
Tmn = (Smn − ESmn)/σ_{mn} = Σ_{m≤k≤n} (Zk − EZk)/σ_{mn} → N(0, 1)
if m, n → ∞ and σ²_{mn} → ∞. We only need to check that
(1/σ²_{mn}) Σ_{m≤k≤n} E(Zk − EZk)² I(|Zk − EZk| > εσ_{mn}) → 0
as m, n, n − m → ∞. Since |Zk − EZk| ≤ 2 …
Section 12. Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.
Let us start with the following bound.
Lemma 27 Let X be a real-valued r.v. with distribution P and let
f(t) = E e^{itX} = ∫ e^{itx} dP(x).
Then,
P(|X| > 1/u) ≤ (7/u) ∫_0^u (1 − Re f(t)) dt.
Proof. Since
Re f(t) = ∫ cos(tx) dP(x),
we have
(1/u) ∫_0^u ∫_R (1 − cos(tx)) dP(x) dt = ∫_R (1/u) ∫_0^u (1 − cos(tx)) dt dP(x) = ∫_R (1 − sin(xu)/(xu)) dP(x)
≥ ∫_{|xu|≥1} (1 − sin(xu)/(xu)) dP(x)   (since sin(y)/y ≤ 1)
≥ (1 − sin 1) ∫_{|xu|≥1} dP(x) ≥ (1/7) P(|X| ≥ 1/u).
Theorem 28 (Levy continuity) Let (Xn) be a sequence of r.v. on Rk. Suppose that
fn(t) = E e^{i(t,Xn)} → f(t)
and f(t) is continuous at 0 along each axis. Then there exists a probability distribution P such that
f(t) = ∫ e^{i(t,x)} dP(x)
and L(Xn) → P.
Proof. By Lemma 19 we only need to show that {L(Xn)} is uniformly tight. If we denote Xn = (Xn,1, . . . , Xn,k) then the c.f.s along the i-th coordinate satisfy
fn^i(ti) := fn(0, . . . , ti, 0, . . . , 0) = E e^{i ti Xn,i} → f(0, . . . , ti, . . . , 0) =: f^i(ti).
Since fn(0) = 1 and, therefore, f(0) = 1, for any ε > 0 we can find δ > 0 such that for all i ≤ k
|f^i(ti) − 1| ≤ ε² if |ti| ≤ δ.
This implies that for large enough n
|fn^i(ti) − 1| ≤ 2ε² if |ti| ≤ δ.
Using the previous Lemma,
P(|Xn,i| > 1/δ) ≤ (7/δ) ∫_0^δ (1 − Re fn^i(ti)) dti ≤ (7/δ) ∫_0^δ |1 − fn^i(ti)| dti ≤ 14ε².
The union bound implies that
P(|Xn| > k/δ) ≤ 14kε²
and {L(Xn)}_{n≥1} is uniformly tight.
CLT describes how sums of independent r.v.s are approximated by the normal distribution. We will now give a simple example of a different approximation. Consider independent Bernoulli random variables Xin ∼ B(pin) for i ≤ n, i.e. P(Xin = 1) = pin and P(Xin = 0) = 1 − pin. If pin = p > 0 then by CLT
(Sn − np)/√(np(1 − p)) → N(0, 1).
However, if pin = pn → 0 fast enough then, for example, the Lindeberg conditions will be violated. It is well-known that if pin = pn and npn → λ then Sn has approximately the Poisson distribution Π_λ with p.f.
f(k) = (λ^k/k!) e^{−λ} for k = 0, 1, 2, . . .
Here is a version of this result.
Theorem 29 Consider independent Xi ∼ B(pi) for i ≤ n and let
Sn = X1 + · · · + Xn and λ = p1 + · · · + pn.
Then for any subset of integers B ⊆ Z,
|P(Sn ∈ B) − Π_λ(B)| ≤ Σ_{i≤n} pi².
Proof. The proof is based on a construction on one probability space. Let us construct a Bernoulli r.v. Xi ∼ B(pi) and a Poisson r.v. X′i ∼ Π_{pi} on the same probability space as follows. Let us consider the probability space ([0, 1], B, λ) with Lebesgue measure. Define
Xi = Xi(x) = 0 for 0 ≤ x ≤ 1 − pi, and 1 for 1 − pi < x ≤ 1.
Clearly, Xi ∼ B(pi). Let us construct X′i as follows. If for k ≥ 0 we define
ck = Σ_{0≤l≤k} (pi^l/l!) e^{−pi},
then
X′i = X′i(x) = 0 for 0 ≤ x ≤ c0, 1 for c0 < x ≤ c1, 2 for c1 < x ≤ c2, . . .
Clearly, X′i ∼ Π_{pi}. When is Xi ≠ X′i? Since 1 − pi ≤ e^{−pi} = c0, this can only happen for 1 − pi < x ≤ c0 and c1 < x ≤ 1, i.e.
P(Xi ≠ X′i) = (e^{−pi} − (1 − pi)) + (1 − e^{−pi} − pi e^{−pi}) = pi(1 − e^{−pi}) ≤ pi².
We construct the pairs (Xi, X′i) on separate coordinates of a product space, thus making them independent for i ≤ n. Then S′n = Σ_{i≤n} X′i ∼ Π_λ and, finally, we get
|P(Sn ∈ B) − Π_λ(B)| ≤ P(Sn ≠ S′n) ≤ Σ_{j≤n} P(Xj ≠ X′j) ≤ Σ_{j≤n} pj².
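Theorem 29 can be checked numerically (an added sketch; the pi values are arbitrary): the exact distribution of Sn is a small convolution, and its total variation distance to the Poisson law stays below Σ pi².

```python
import math

def bernoulli_sum_pmf(ps):
    # Exact p.m.f. of S = X_1 + ... + X_n, X_i ~ Bernoulli(p_i) independent,
    # built by convolving one Bernoulli factor at a time.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

ps = [0.02, 0.05, 0.01, 0.03]
lam = sum(ps)
pmf = bernoulli_sum_pmf(ps)
# sup_B |P(S in B) - Poisson(B)| equals half the l1 distance of the p.m.f.s.
tv = 0.5 * sum(abs((pmf[k] if k < len(pmf) else 0.0) - poisson_pmf(lam, k))
               for k in range(30))
bound = sum(p * p for p in ps)
print(tv, bound)  # tv is below the bound sum p_i^2
```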
Conditional expectation. Let (Ω, B, P) be a probability space and X : Ω → R be a random variable such that E|X| < ∞ …
By definition Y = E(X|A).
2. (Uniqueness) Suppose there exists another version Y′ = E(X|A) such that P(Y ≠ Y′) > 0, i.e. P(Y > Y′) > 0 or P(Y < Y′) > 0. Since both Y, Y′ are measurable on A, the set A = {Y > Y′} ∈ A. On one hand, E(Y − Y′)IA > 0. On the other hand,
E(Y − Y′)IA = EXIA − EXIA = 0,
a contradiction.
3. E(cX + Y|A) = cE(X|A) + E(Y|A).
4. If σ-algebras C ⊆ A ⊆ B then
E(E(X|A)|C) = E(X|C).
Consider a set C ∈ C ⊆ A. Then
EIC(E(E(X|A)|C)) = EIC E(X|A) = EIC X and EIC(E(X|C)) = EXIC.
We conclude by uniqueness.
5. E(X|B) = X, E(X|{∅, Ω}) = EX, and E(X|A) = EX if X is independent of A.
6. If X ≤ Z then E(X|A) ≤ E(Z|A) a.s.; the proof is similar to the proof of uniqueness.
7. (Monotone convergence) If E|Xn| < ∞ …
We can assume that X, Y ≥ 0 by decomposing X = X⁺ − X⁻, Y = Y⁺ − Y⁻. Consider a sequence of simple functions
Yn = Σ_k wk ICk, Ck ∈ A,
measurable on A such that 0 ≤ Yn ↑ Y. By the monotone convergence theorem, it is enough to prove that
E(X ICk|A) = ICk E(X|A).
Take B ∈ A. Since B ∩ Ck ∈ A,
E IB ICk E(X|A) = E IB∩Ck E(X|A) = E X IB∩Ck = E (X ICk) IB.
10. (Jensen's inequality) If f : R → R is convex then
f(E(X|A)) ≤ E(f(X)|A).
By convexity, for a subgradient f′,
f(X) ≥ f(E(X|A)) + f′(E(X|A)) (X − E(X|A)).
Taking conditional expectations of both sides,
E(f(X)|A) ≥ f(E(X|A)) + f′(E(X|A)) (E(X|A) − E(X|A)) = f(E(X|A)).
Section 13. Martingales. Doob's Decomposition. Uniform Integrability.
Let (Ω, B, P) be a probability space and let (T, ≤) be a linearly ordered set. Consider a family of σ-algebras Bt, t ∈ T, such that for t ≤ u, Bt ⊆ Bu ⊆ B.
Definition. A family (Xt, Bt)_{t∈T} is called a martingale if
1. Xt : Ω → R is measurable w.r.t. Bt; in other words, Xt is adapted to Bt.
2. E|Xt| < ∞ …
Thus, (Zn, Bn)_{n≥1} is a right-closed martingale.
Lemma 28 Let f : R → R be a convex function. Suppose that either one of two conditions holds:
1. (Xt, Bt) is a martingale,
2. (Xt, Bt) is a submartingale and f is increasing.
Then (f(Xt), Bt) is a submartingale.
Proof. 1. For t ≤ u, by Jensen's inequality,
f(Xt) = f(E(Xu|Bt)) ≤ E(f(Xu)|Bt).
2. For t ≤ u, since Xt ≤ E(Xu|Bt) and f is increasing,
f(Xt) ≤ f(E(Xu|Bt)) ≤ E(f(Xu)|Bt),
where the last step is again Jensen's inequality.
Theorem 30 (Doob's decomposition) If (Xn, Bn)_{n≥0} is a submartingale then it can be uniquely decomposed as
Xn = Zn + Yn,
where (Yn, Bn) is a martingale, Z0 = 0, Zn ≤ Zn+1 almost surely and Zn is B_{n−1}-measurable.
Proof. Let Dn = Xn − X_{n−1} and
Gn = E(Dn|B_{n−1}) = E(Xn|B_{n−1}) − X_{n−1} ≥ 0
by the definition of submartingale. Let
Hn = Dn − Gn, Yn = H1 + · · · + Hn, Zn = G1 + · · · + Gn.
Since Gn ≥ 0 a.s., Zn ≤ Z_{n+1} and, by construction, Zn is B_{n−1}-measurable. We have
E(Hn|B_{n−1}) = E(Dn|B_{n−1}) − Gn = 0
and, therefore, E(Yn|B_{n−1}) = Y_{n−1}. Uniqueness follows by construction. Suppose that Xn = Zn + Yn with all stated properties. First, since Z0 = 0, Y0 = X0. By induction, given a unique decomposition up to n − 1, we can write
Zn = E(Zn|B_{n−1}) = E(Xn − Yn|B_{n−1}) = E(Xn|B_{n−1}) − Y_{n−1}
and Yn = Xn − Zn.
Definition. We say that (Xn)_{n≥1} is uniformly integrable if
sup_n E|Xn| < ∞ and sup_n E|Xn| I(|Xn| > M) → 0 as M → ∞.
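Theorem 30 can be made concrete (an added sketch): for Xn = Sn² with Sn a ±1 random walk, E(Xn − X_{n−1}|B_{n−1}) = 1, so the decomposition is Zn = n and Yn = Sn² − n.

```python
import random

def doob_decomposition(path, cond_increment_mean):
    # Doob decomposition X_n = Y_n + Z_n of a submartingale path (x_0,...,x_N):
    # Z_n = sum_{k<=n} E(X_k - X_{k-1} | B_{k-1}) is predictable and increasing,
    # Y_n = X_n - Z_n is the martingale part.
    zs, ys = [0.0], [path[0]]
    for k in range(1, len(path)):
        zs.append(zs[-1] + cond_increment_mean(path, k))
        ys.append(path[k] - zs[k])
    return ys, zs

# X_n = S_n^2 for a +/-1 walk: E(X_k - X_{k-1} | B_{k-1}) = 1, so Z_n = n.
rng = random.Random(4)
s, path = 0, [0.0]
for _ in range(50):
    s += rng.choice((-1, 1))
    path.append(float(s * s))
ys, zs = doob_decomposition(path, lambda xs, k: 1.0)
print(zs[-1], path[-1] - ys[-1])  # Z_50 = 50.0 and X_n - Y_n = Z_n
```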
Proof. 1. If Xn = E(Y|Bn) then |Xn| = |E(Y|Bn)| ≤ E(|Y| | Bn) and E|Xn| ≤ E|Y| < ∞. Since {|Xn| > M} ∈ Bn,
|Xn| I(|Xn| > M) ≤ I(|Xn| > M) E(|Y| | Bn) = E(|Y| I(|Xn| > M) | Bn)
and, therefore,
E|Xn| I(|Xn| > M) ≤ E|Y| I(|Xn| > M) ≤ K P(|Xn| > M) + E|Y| I(|Y| > K) ≤ (K/M) E|Xn| + E|Y| I(|Y| > K) ≤ (K/M) E|Y| + E|Y| I(|Y| > K).
Letting M → ∞ and then K → ∞ proves that sup_n E|Xn| I(|Xn| > M) → 0 as M → ∞.
2. Since (Xn, Bn)_{n≤τ} is a submartingale, for Y = Xτ we have Xn ≤ E(Y|Bn). Below we will use the following observation. Since the function max(a, x) is convex and increasing in x, by Jensen's inequality,
max(a, Xn) ≤ E(max(a, Y)|Bn).   (13.0.1)
Since max(Xn, a) ≤ |a| + Xn I(Xn > |a|) and {Xn > |a|} ∈ Bn, if we take M > |a| then
E|max(Xn, a)| I(|max(Xn, a)| > M) = E max(Xn, a) I(max(Xn, a) > M) ≤ E max(Y, a) I(max(Xn, a) > M)   (by (13.0.1))
≤ K P(max(Xn, a) > M) + E|Y| I(|Y| > K) ≤ (K/M) E max(Xn, 0) + E|Y| I(|Y| > K) ≤ (K/M) E max(Y, 0) + E|Y| I(|Y| > K)   (by (13.0.1)).
Letting M → ∞ and K → ∞ finishes the proof.
Uniform integrability plays an important role when studying the convergence of martingales. The following strengthening of the dominated convergence theorem will be useful.
Lemma 30 Consider r.v.s (Xn) and X such that E|Xn| < ∞ … Then
E|Xn − X| ≤ ε + 2K P(|Xn − X| > δ) + 2 sup_n E|Xn| I(|Xn| > K) + 2E|X| I(|X| > K).
Letting n → ∞ and then ε → 0, K → ∞ proves the result.
(1 ⟹ 2) By Chebyshev's inequality,
P(|Xn − X| > ε) ≤ (1/ε) E|Xn − X| → 0
as n → ∞, so Xn → X in probability. To prove uniform integrability let us first show that for any ε > 0 there exists δ > 0 such that
P(A) < δ ⟹ E|X| IA < ε.
Otherwise, for some ε > 0 one can find a sequence of events A(n) such that
P(A(n)) ≤ 2^{−n} and E|X| I_{A(n)} > ε.
Since Σ_{n≥1} P(A(n)) < ∞, by Borel-Cantelli and dominated convergence E|X| I_{A(n)} → 0, a contradiction. Take δ as above and take M > 0 large enough so that for all n ≥ 1,
P(|Xn| > M) ≤ E|Xn|/M ≤ δ.
Then
E|Xn| I(|Xn| > M) ≤ E|Xn − X| + E|X| I(|Xn| > M) ≤ E|Xn − X| + ε.
For large enough n ≥ n0, E|Xn − X| ≤ ε and, therefore, E|Xn| I(|Xn| > M) ≤ 2ε. We can also choose M large enough so that E|Xn| I(|Xn| > M) ≤ 2ε for n ≤ n0, and this finishes the proof.
Section 14. Optional stopping. Inequalities for martingales.
Consider a sequence of σ-algebras (Bn)_{n≥0} such that Bn ⊆ B_{n+1}. An integer-valued r.v. τ ∈ {1, 2, . . .} is called a stopping time if {τ ≤ n} ∈ Bn. Let us denote by Bτ the σ-algebra of the events B such that
{τ ≤ n} ∩ B ∈ Bn, n ≥ 1.
If (Xn) is adapted to (Bn) then random variables such as Xτ or Σ_{k=1}^τ Xk are measurable on Bτ. For example,
{Xτ ∈ A} = ∪_{n≥1} {τ = n} ∩ {Xn ∈ A} = ∪_{n≥1} ({τ ≤ n} \ {τ ≤ n − 1}) ∩ {Xn ∈ A} ∈ Bτ.
Theorem 31 (Optional stopping) Let (Xn, Bn) be a martingale and τ1 ≤ τ2 < ∞ …
The second condition in (14.0.1) is violated since P(τ2 = n) = 2^{−n} and E|Sn| I(τ2 ≥ n) does not go to 0.
Proof of Theorem 31. Consider a set A ∈ B_{τ1}. We have,
EX_{τ2} IA I(τ1 ≤ τ2) = Σ_{n≥1} EX_{τ2} I(A ∩ {τ1 = n}) I(n ≤ τ2) (∗) = Σ_{n≥1} EXn I(A ∩ {τ1 = n}) I(n ≤ τ2) = EX_{τ1} IA I(τ1 ≤ τ2).
To prove (∗) it is enough to prove that for An = A ∩ {τ1 = n} ∈ Bn,
EX_{τ2} I_{An} I(n ≤ τ2) = EXn I_{An} I(n ≤ τ2).   (14.0.2)
We can write
EXn I_{An} I(n ≤ τ2) = EXn I_{An} I(τ2 = n) + EXn I_{An} I(n + 1 ≤ τ2) = EX_{τ2} I_{An} I(τ2 = n) + EXn I_{An} I(n + 1 ≤ τ2).
Since {n + 1 ≤ τ2} = {τ2 ≤ n}^c ∈ Bn, by the martingale property this equals
EX_{τ2} I_{An} I(τ2 = n) + EX_{n+1} I_{An} I(n + 1 ≤ τ2),
and by induction
= Σ_{n≤k<m} EX_{τ2} I_{An} I(τ2 = k) + EXm I_{An} I(m ≤ τ2) …
On the event A, X_{τ1} ≥ M and, therefore,
EXn IA = EX_{τ2} IA ≥ EX_{τ1} IA ≥ M EIA = M P(A).
On the other hand, EXn IA ≤ EXn⁺, and this finishes the proof.
As a corollary we obtain the second Kolmogorov's inequality. If (Xi) are independent and EXi = 0 then Sn = Σ_{1≤i≤n} Xi is a martingale and Sn² is a submartingale. Therefore,
P( max_{1≤k≤n} |Sk| ≥ M ) = P( max_{1≤k≤n} Sk² ≥ M² ) ≤ ESn²/M² = (1/M²) Σ_{1≤k≤n} Var(Xk).
Exercises.
1. Show that for any random variable Y, E|Y|^p = ∫_0^∞ p t^{p−1} P(|Y| ≥ t) dt.
2. Let X, Y be two non-negative random variables such that for every t > 0, P(Y ≥ t) ≤ t^{−1} ∫ X I(Y ≥ t) dP. For any p > 1, ||f||_p = (∫ |f|^p dP)^{1/p} and 1/p + 1/q = 1, show that ||Y||_p ≤ q ||X||_p.
3. Given a non-negative submartingale (Xn, Bn), let Xn* := max_{j≤n} Xj and X* := max_{j≥1} Xj. Prove that for any p > 1 and 1/p + 1/q = 1, ||X*||_p ≤ q sup_n ||Xn||_p. Hint: use exercise 2 and Doob's maximal inequality.
Doob's upcrossing inequality. Let (Xn, Bn)_{n≥1} be a submartingale. Given two real numbers a < b we will define a sequence of stopping times (τn) when Xn is crossing a downward and b upward as in figure 14.1. Namely, we define
Figure 14.1: Stopping times of level crossings.
τ1 = min{n ≥ 1 : Xn ≤ a}, τ2 = min{n > τ1 : Xn ≥ b},
and, by induction, for k ≥ 2,
τ_{2k−1} = min{n > τ_{2k−2} : Xn ≤ a}, τ_{2k} = min{n > τ_{2k−1} : Xn ≥ b}.
Define
β(a, b, n) = max{k : τ_{2k} ≤ n},
the number of upward crossings of [a, b] before time n.
Theorem 33 (Doob's upcrossing inequality) We have,
E β(a, b, n) ≤ E(Xn − a)⁺/(b − a).   (14.0.4)
Proof. Since x → (x − a)⁺ is an increasing convex function, Zn = (Xn − a)⁺ is also a submartingale. Clearly,
β_X(a, b, n) = β_Z(0, b − a, n),
which means that it is enough to prove (14.0.4) for nonnegative submartingales. From now on we can assume that 0 ≤ Xn and we would like to show that
E β(0, b, n) ≤ EXn/b.
Let us define a sequence of r.v.s
χj = 1 if τ_{2k−1} < j ≤ τ_{2k} for some k, and χj = 0 otherwise,
i.e. χj is the indicator of the event that at time j the process is crossing [0, b] upward. Define X0 = 0. Then
b β(0, b, n) ≤ Σ_{j=1}^n χj (Xj − X_{j−1}) = Σ_{j=1}^n I(χj = 1)(Xj − X_{j−1}).
The event
{χj = 1} = ∪_k {τ_{2k−1} < j ≤ τ_{2k}} = ∪_k ({τ_{2k−1} ≤ j − 1} ∩ {τ_{2k} ≤ j − 1}^c) ∈ B_{j−1},
i.e. the fact that at time j we are crossing upward is determined completely by the sequence up to time j − 1. Then
b E β(0, b, n) ≤ E Σ_{j=1}^n E( I(χj = 1)(Xj − X_{j−1}) | B_{j−1} ) = Σ_{j=1}^n E I(χj = 1)(E(Xj|B_{j−1}) − X_{j−1}) ≤ Σ_{j=1}^n E(Xj − X_{j−1}) = EXn,
where in the last inequality we used that (Xj, Bj) is a submartingale, E(Xj|B_{j−1}) ≥ X_{j−1}, which implies that
I(χj = 1)(E(Xj|B_{j−1}) − X_{j−1}) ≤ E(Xj|B_{j−1}) − X_{j−1}.
This finishes the proof.
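The upcrossing bound can be probed by simulation (an added sketch; the submartingale |Sn| for a ±1 walk and the levels a = 0, b = 3 are arbitrary choices). For this example the inequality is nearly tight, so the assertion below allows for Monte Carlo error.

```python
import random

def upcrossings(path, a, b):
    # Number of completed upward crossings of [a, b] by the path.
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            below = False
            count += 1
    return count

rng = random.Random(5)
a, b, n, trials = 0.0, 3.0, 200, 5000
tot_cross = tot_final = 0.0
for _ in range(trials):
    s, path = 0, []
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(abs(s))            # X_n = |S_n| is a submartingale
    tot_cross += upcrossings(path, a, b)
    tot_final += max(path[-1] - a, 0.0)
mean_cross = tot_cross / trials
bound = tot_final / trials / (b - a)    # estimate of E (X_n - a)^+ / (b - a)
print(mean_cross, bound)
```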
Section 15. Convergence of martingales. Fundamental Wald's identity.
We finally get to our main result about the convergence of martingales and submartingales.
Theorem 34 Let (Xn, Bn) …
By Doob's inequality, E β_Y(a, b, n) ≤ E(Yn − a)⁺/(b − a) = E(X1 − a)⁺/(b − a) < ∞ …
Corollary 2 A martingale (Xn, Bn) is right-closable iff it is uniformly integrable.
To prove this, apply case 3 above to (Xn) and (−Xn), which are both submartingales.
Theorem 35 (Levy's convergence) Let (Ω, B, P) be a probability space and X be a real-valued random variable on it. Given a sequence of σ-algebras
B1 ⊆ · · · ⊆ Bn ⊆ · · · ⊆ B∞ ⊆ B,
where B∞ = σ(∪_{n≥1} Bn) …
as M → ∞. Therefore, the limit Y = lim Yn exists and it is an easy exercise to show that Sn/bn → 0 (Kronecker's lemma).
3. (Polya urn scheme) Let us recall the Polya urn scheme from Section 5. Let us consider the sequence
Yn = #(blue balls after n iterations) / #(total after n iterations).
Yn is a martingale because, given that at step n the numbers of blue and red balls are b and r, the expected fraction of blue balls at step n + 1 will be
E(Y_{n+1}|Bn) = (b/(b + r)) (b + c)/(b + r + c) + (r/(b + r)) b/(b + r + c) = b/(b + r) = Yn.
Since Yn is bounded, by the martingale convergence theorem the limit Y = lim_n Yn exists. What is the distribution of Y? Let us consider the sequence
Xi = 1 if blue at step i, Xi = 0 if red at step i,
and let Sn = Σ_{i≤n} Xi. Clearly,
Yn = (b + Sn c)/(b + r + nc) ≈ Sn/n
as n → ∞ and, therefore, Sn/n → Y. The sequence (Xn) is exchangeable and by de Finetti's theorem in Section 5 we showed that
P(Sn = k) = (n choose k) ∫_0^1 x^k (1 − x)^{n−k} dμ_{b/c, r/c}(x).
For any function u ∈ C([0, 1]),
E u(Sn/n) = Σ_{k=0}^n u(k/n) (n choose k) ∫_0^1 x^k (1 − x)^{n−k} dμ_{b/c, r/c}(x) = ∫_0^1 Bn(x) dμ_{b/c, r/c}(x),
where Bn(x) is the Bernstein polynomial that approximates u(x) uniformly on [0, 1]. Therefore,
E u(Sn/n) → ∫_0^1 u(x) dμ_{b/c, r/c}(x),
which means that L(Sn/n) → μ_{b/c, r/c} = L(Y), i.e. the limit Y has the Beta distribution β(b/c, r/c).
Optional stopping for martingales revisited. Let τ be a stopping time. We would like to determine when EXτ = EX1. As we saw above in the case of two stopping times, some kind of integrability assumptions are necessary. In this simpler case, the necessary conditions are clear from the proof.
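The Pólya urn computation above is easy to simulate (an added sketch; b = 2, r = 3, c = 1, for which the limit Y should be Beta(2, 3) with mean 2/5):

```python
import random

def polya_fraction(b, r, c, n_draws, rng):
    # One run of the Polya urn: draw a ball uniformly, return it together
    # with c extra balls of the same color; report the final blue fraction.
    blue, total = b, b + r
    for _ in range(n_draws):
        if rng.random() < blue / total:
            blue += c
        total += c
    return blue / total

rng = random.Random(6)
fractions = [polya_fraction(2, 3, 1, 1000, rng) for _ in range(1000)]
mean = sum(fractions) / len(fractions)
spread = max(fractions) - min(fractions)
# The mean matches EY = b/(b+r) = 0.4, while the spread shows the limit is
# genuinely random (approximately Beta(2,3)), not a point mass.
print(mean, spread)
```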
Lemma 31 We have
EXτ = lim_n EXτ I(τ ≤ n) = EX1 iff lim_n EXn I(τ ≥ n) = 0.
Proof. We can write,
EXτ I(τ ≤ n) = Σ_{1≤k≤n} EXk I(τ = k) = Σ_{1≤k≤n} (EXk I(τ ≥ k) − EXk I(τ ≥ k + 1))
(since {τ ≥ k + 1} = {τ ≤ k}^c ∈ Bk)
= Σ_{1≤k≤n} (EXk I(τ ≥ k) − EX_{k+1} I(τ ≥ k + 1)) = EX1 − EX_{n+1} I(τ ≥ n + 1).
Example.Given0< p
by symmetry. Therefore,
E ch(λ)^{−τ} = 1/ch(λz) and E e^{−μτ} = 1/ch(z ch^{−1}(e^{μ}))
by the change of variables e^{−μ} = 1/ch(λ).
For more general stopping times the condition (15.0.1) might not be easy to check. We will now show another approach that is helpful to verify a fundamental Wald's identity. If P is the distribution of the Xi's, let Pλ be the distribution with Radon-Nikodym derivative w.r.t. P given by
dPλ/dP = e^{λx}/φ(λ),
where φ(λ) = E e^{λX1}. This is, indeed, a density since
∫ (e^{λx}/φ(λ)) dP = φ(λ)/φ(λ) = 1.
We will think of (Xn) as defined on the product space (R^∞, B, P^∞). For example, a set
{τ = n} ∈ σ(X1, . . . , Xn)
is a Borel set on R^n. We can write,
E (e^{λ(x1+···+xn)}/φ(λ)^n) I(τ ≤ n) …
Section 16. Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.
Let (S, d) be a metric space and B a Borel σ-algebra generated by open sets. Let us recall that Pn → P weakly on B if
∫ f dPn → ∫ f dP
for all f ∈ Cb(S), the real-valued bounded continuous functions on S.
For a set A ⊆ S, we denote by Ā the closure of A, by int A the interior of A, and by ∂A = Ā \ int A the boundary of A. A is called a continuity set of P if P(∂A) = 0.
Theorem 36 (Portmanteau theorem) The following are equivalent.
1. Pn → P weakly.
2. For any open set U ⊆ S, liminf_n Pn(U) ≥ P(U).
3. For any closed set F ⊆ S, limsup_n Pn(F) ≤ P(F).
4. For any continuity set A of P, lim_n Pn(A) = P(A).
Proof. 1 ⟹ 2. Let U be an open set and F = U^c. Consider a sequence of functions in Cb(S),
fm(s) = min(1, m d(s, F)),
such that fm(s) ↑ IU(s). (This is not necessarily true if U is not open.) Since Pn → P,
Pn(U) ≥ ∫ fm dPn → ∫ fm dP as n → ∞, and liminf_n Pn(U) ≥ ∫ fm dP.
Letting m → ∞, by the monotone convergence theorem,
liminf_n Pn(U) ≥ ∫ IU dP = P(U).
2 ⟺ 3. By taking complements.
2, 3 ⟹ 4. Since int A is open, Ā is closed and int A ⊆ A ⊆ Ā, by 2 and 3,
P(int A) ≤ liminf_n Pn(int A) ≤ limsup_n Pn(Ā) ≤ P(Ā).
If P(∂A) = 0 then P(Ā) = P(int A) = P(A) and, therefore, lim_n Pn(A) = P(A).
4 ⟹ 1. Consider f ∈ Cb(S) and let Fy = {s ∈ S : f(s) = y} be a level set of f. There exist at most countably many y such that P(Fy) > 0. Therefore, for any ε > 0 we can find a sequence a1 ≤ · · · ≤ aN such that
max(a_{k+1} − a_k) ≤ ε, P(F_{a_k}) = 0 for all k,
and the range of f is inside the interval (a1, aN). Let
Bk = {s ∈ S : a_k ≤ f(s) < a_{k+1}} and fε(s) = Σ_k a_k I(s ∈ Bk).
Since f is continuous, ∂Bk ⊆ F_{a_k} ∪ F_{a_{k+1}} and P(∂Bk) = 0. By 4,
∫ fε dPn = Σ_k a_k Pn(Bk) → Σ_k a_k P(Bk) = ∫ fε dP.
Since, by construction, |f(s) − fε(s)| ≤ ε, letting ε → 0 proves that ∫ f dPn → ∫ f dP.
Lipschitz functions. For a function f : S → R, let us define a Lipschitz semi-norm by
||f||L = sup_{x≠y} |f(x) − f(y)|/d(x, y).
Clearly, ||f||L = 0 iff f is constant, so ||f||L is not a norm. Let us define a bounded Lipschitz norm by
||f||BL = ||f||L + ||f||∞,
where ||f||∞ = sup_{s∈S} |f(s)|. Let BL(S, d) = {f : S → R : ||f||BL < ∞} …
Proof. Proof of 1. It is enough to consider $k = 2$. For specificity, take $\wedge$. Given $x, y \in S$, suppose that $f_1 \wedge f_2(x) \ge f_1 \wedge f_2(y) = f_1(y)$. Then
$$|f_1 \wedge f_2(y) - f_1 \wedge f_2(x)| = f_1 \wedge f_2(x) - f_1 \wedge f_2(y) \le \begin{cases} f_1(x) - f_1(y), & \text{if } f_1(x) \le f_2(x), \\ f_2(x) - f_2(y), & \text{otherwise,} \end{cases}$$
and in either case this is at most $\bigl(\|f_1\|_L \vee \|f_2\|_L\bigr)\, d(x,y)$. This finishes the proof of 1.

Proof of 2. First of all, obviously,
$$\|f_1 \wedge \cdots \wedge f_k\|_\infty \le \max_{1 \le i \le k}\|f_i\|_\infty.$$
Therefore, using 1,
$$\|f_1 \wedge \cdots \wedge f_k\|_{BL} \le \max_i \|f_i\|_\infty + \max_i \|f_i\|_L \le 2\max_i \|f_i\|_{BL}.$$
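A quick grid check of the factor-2 bound in part 2, with two illustrative functions on $[0,1]$ (a numerical sketch, not part of the notes):

```python
# Check || f ∧ g ||_BL <= 2 max(||f||_BL, ||g||_BL) for f(x) = x, g(x) = 1 - x on [0, 1].
grid = [i / 100 for i in range(101)]

def bl(f):
    sup = max(abs(f(x)) for x in grid)
    lip = max(abs(f(x) - f(y)) / (x - y)        # grid is increasing, so x > y here
              for i, x in enumerate(grid) for y in grid[:i])
    return lip + sup

f = lambda x: x
g = lambda x: 1 - x
fg_min = lambda x: min(f(x), g(x))              # the "tent" x ∧ (1 - x)
assert bl(fg_min) <= 2 * max(bl(f), bl(g))
print(round(bl(f), 6), round(bl(g), 6), round(bl(fg_min), 6))   # -> 2.0 2.0 1.5
```

Here the minimum actually has a smaller norm than either function; the lemma only guarantees the bound with the factor 2.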
Theorem 37 (Extension theorem) Given a set $A \subseteq S$ and a bounded Lipschitz function $f \in BL(A,d)$ on $A$, there exists an extension $h \in BL(S,d)$ such that $f = h$ on $A$ and $\|h\|_{BL} = \|f\|_{BL}$.

Proof. Let us first find an extension such that $\|h\|_L = \|f\|_L$. We will start by extending $f$ to one point $x \in S \setminus A$. The value $y = h(x)$ must satisfy
$$|y - f(s)| \le \|f\|_L\, d(x,s) \ \text{ for all } s \in A$$
or, equivalently,
$$\sup_{s \in A}\bigl(f(s) - \|f\|_L\, d(x,s)\bigr) \le y \le \inf_{s \in A}\bigl(f(s) + \|f\|_L\, d(x,s)\bigr).$$
Such $y$ exists iff for all $s_1, s_2 \in A$,
$$f(s_1) + \|f\|_L\, d(x,s_1) \ge f(s_2) - \|f\|_L\, d(x,s_2).$$
This inequality is satisfied because, by the triangle inequality,
$$f(s_2) - f(s_1) \le \|f\|_L\, d(s_1,s_2) \le \|f\|_L\bigl(d(s_1,x) + d(s_2,x)\bigr).$$
It remains to apply Zorn's lemma to show that $f$ can be extended to the entire $S$. Define order by inclusion:
$$f_1 \prec f_2 \ \text{ if } f_1 \text{ is defined on } A_1,\ f_2 \text{ on } A_2,\ A_1 \subseteq A_2,\ f_1 = f_2 \text{ on } A_1 \ \text{ and } \|f_1\|_L = \|f_2\|_L.$$
For any chain $\{f_\alpha\}$, $f = \bigcup_\alpha f_\alpha \succ f_\alpha$. By Zorn's lemma there exists a maximal element $h$. It is defined on the entire $S$ because, otherwise, we could extend it to one more point. To extend preserving the BL norm, take
$$h' = (h \wedge \|f\|_\infty) \vee (-\|f\|_\infty).$$
By part 1 of the previous lemma, it is easy to see that $\|h'\|_{BL} = \|f\|_{BL}$.
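The one-point extension step in the proof is constructive. A small sketch on a finite set $A \subseteq \mathbb{R}$ (the data below are illustrative; the "upper" extension $h(x) = \inf_{s \in A}(f(s) + \|f\|_L d(x,s))$ is one admissible choice of $y$):

```python
# Extend a Lipschitz function from a finite set A by taking the upper admissible value
# h(x) = min_{s in A} ( f(s) + L * d(x, s) ), where L = ||f||_L on A.

def lip_seminorm(A, f, d):
    return max(abs(f[a] - f[b]) / d(a, b) for a in A for b in A if a != b)

def extend(A, f, d, x):
    L = lip_seminorm(A, f, d)
    return min(f[s] + L * d(x, s) for s in A)

d = lambda a, b: abs(a - b)            # S = R with the usual metric
A = [0.0, 1.0, 3.0]
f = {0.0: 0.0, 1.0: 2.0, 3.0: 1.0}    # here ||f||_L = 2, attained between 0 and 1

assert all(extend(A, f, d, s) == f[s] for s in A)   # h agrees with f on A
print(extend(A, f, d, 0.5), extend(A, f, d, 2.0))   # -> 1.0 3.0
```

On $A$ itself the minimum is attained at the point, so the extension really restricts back to $f$, matching the first requirement of the theorem.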
Stone-Weierstrass Theorem.

A set $A \subseteq S$ is totally bounded if for any $\varepsilon > 0$ there exists a finite $\varepsilon$-cover of $A$, i.e. a set of points $a_1, \ldots, a_N$ such that
$$A \subseteq \bigcup_{i \le N} B(a_i, \varepsilon),$$
where $B(a, \varepsilon) = \{y \in S : d(a,y) \le \varepsilon\}$ is a ball of radius $\varepsilon$ centered at $a$.
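As a concrete instance of this definition (all parameters illustrative), $A = [0,1]^2$ is totally bounded: a grid of spacing $h = \varepsilon\sqrt{2}$ is a finite $\varepsilon$-cover, since every point of the square lies within $h\sqrt{2}/2 = \varepsilon$ of a grid point:

```python
from math import ceil, dist, sqrt
import random

def eps_cover(eps):
    """Centers of a finite eps-cover of [0,1]^2 by closed balls."""
    h = eps * sqrt(2)                  # any point is within h*sqrt(2)/2 = eps of the grid
    m = ceil(1 / h) + 1                # enough grid lines to reach past 1
    return [(i * h, j * h) for i in range(m) for j in range(m)]

random.seed(2)
eps = 0.1
centers = eps_cover(eps)
pts = [(random.random(), random.random()) for _ in range(1000)]
assert all(min(dist(p, c) for c in centers) <= eps + 1e-12 for p in pts)
print(len(centers))   # -> 81 centers suffice for eps = 0.1
```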
Let us recall the following theorem from analysis.

Theorem 38 (Arzelà-Ascoli) Let $(S,d)$ be a compact metric space and let $(C(S), d_\infty)$ be the space of continuous real-valued functions on $S$ with the uniform convergence metric
$$d_\infty(f,g) = \sup_{x \in S}|f(x) - g(x)|.$$
A subset $F \subseteq C(S)$ is totally bounded in the $d_\infty$ metric iff $F$ is equicontinuous and uniformly bounded.

Remark. Equicontinuous means that for any $\varepsilon > 0$ there exists $\delta > 0$ such that if $d(x,y) \le \delta$ then for all $f \in F$, $|f(x) - f(y)| \le \varepsilon$.

Theorem 39 (Stone-Weierstrass) Let $(S,d)$ be a compact metric space and let $F \subseteq C(S)$ be such that

1. $F$ is an algebra, i.e. for all $f, g \in F$, $c \in \mathbb{R}$, we have $cf + g \in F$, $fg \in F$.
2. $F$ separates points, i.e. if $x \ne y \in S$ then there exists $f \in F$ such that $f(x) \ne f(y)$.
3. $F$ contains constants.
Then $F$ is dense in $C(S)$.

Corollary 3 If $(S,d)$ is a compact space then $BL(S,d)$ is dense in $C(S)$.

Proof. For $F = BL(S,d)$ in the Stone-Weierstrass theorem, 3 is obvious, 1 follows from Lemma 32, and 2 follows from the extension Theorem 37, since a function defined on two points $x \ne y$ such that $f(x) \ne f(y)$ can be extended to the entire $S$.

Proof of Theorem 39. Consider bounded $f \in F$, i.e. $|f(x)| \le M$. The function $x \mapsto |x|$ defined on the interval $[-M, M]$ can be uniformly approximated by polynomials of $x$ by the Weierstrass theorem on the real line or, for example, using Bernstein's polynomials. Therefore, $|f(x)|$ can be uniformly approximated by polynomials of $f(x)$ and, by properties 1 and 3, by functions in $F$. Therefore, if $\bar F$ is the closure of $F$ in the $d_\infty$ norm, then for any $f \in \bar F$ its absolute value $|f| \in \bar F$. Therefore, for any $f, g \in \bar F$ we have
$$\min(f,g) = \tfrac{1}{2}(f+g) - \tfrac{1}{2}|f-g| \in \bar F, \qquad \max(f,g) = \tfrac{1}{2}(f+g) + \tfrac{1}{2}|f-g| \in \bar F. \qquad (16.0.1)$$
Given any points $x \ne y$ and $c, d \in \mathbb{R}$, one can always find $f \in F$ such that $f(x) = c$ and $f(y) = d$. Indeed, by property 2 we can find $g \in F$ such that $g(x) \ne g(y)$ and, as a result, the system of equations
$$a\,g(x) + b = c, \qquad a\,g(y) + b = d$$
has a solution $a, b$. Then the function $f = ag + b$ satisfies the above and it is in $F$ by 1.
Take $h \in C(S)$ and fix $x$. For any $y$ let $f_y \in \bar F$ be such that $f_y(x) = h(x)$, $f_y(y) = h(y)$. By continuity of $f_y$, for any $y \in S$ there exists an open neighborhood $U_y$ of $y$ such that
$$f_y(s) \ge h(s) - \varepsilon \ \text{ for } s \in U_y.$$
Since $(U_y)$ is an open cover of the compact $S$, there exists a finite subcover $U_{y_1}, \ldots, U_{y_N}$. Let us define a function
$$f_x(s) = \max\bigl(f_{y_1}(s), \ldots, f_{y_N}(s)\bigr) \in \bar F \ \text{ by (16.0.1)}.$$
By construction, it has the following properties:
$$f_x(x) = h(x), \qquad f_x(s) \ge h(s) - \varepsilon \ \text{ for all } s \in S.$$
Again, by continuity of $f_x(s)$ there exists an open neighborhood $U_x$ of $x$ such that
$$f_x(s) \le h(s) + \varepsilon \ \text{ for } s \in U_x.$$
Take a finite subcover $U_{x_1}, \ldots, U_{x_M}$ and define
$$h_\varepsilon(s) = \min\bigl(f_{x_1}(s), \ldots, f_{x_M}(s)\bigr) \in \bar F \ \text{ by (16.0.1)}.$$
By construction, $h_\varepsilon(s) \le h(s) + \varepsilon$ and $h_\varepsilon(s) \ge h(s) - \varepsilon$ for all $s \in S$, which means that $d_\infty(h, h_\varepsilon) \le \varepsilon$. Since $h_\varepsilon \in \bar F$, this proves that $F$ is dense in $C(S)$.
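The key analytic step — uniform approximation of $x \mapsto |x|$ by Bernstein polynomials — can be watched numerically (degrees and grids below are illustrative choices, not from the notes):

```python
from math import comb

def bernstein(g, n, t):
    """Value of the n-th Bernstein polynomial of g at t in [0, 1]."""
    return sum(g(k / n) * comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1))

# Approximate |x| on [-1, 1] via the substitution x = 2t - 1.
approx = lambda n, x: bernstein(lambda t: abs(2 * t - 1), n, (x + 1) / 2)

xs = [i / 50 - 1 for i in range(101)]
err = lambda n: max(abs(abs(x) - approx(n, x)) for x in xs)
print(err(10), err(100))   # the error decreases, roughly like n^(-1/2) at the kink x = 0
```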
Corollary 4 If $(S,d)$ is a compact space then $C(S)$ is separable in $d_\infty$.

Remark. Recall that this fact was used in the proof of the Selection Theorem, which was proved for general metric spaces.

Proof. By the above theorem, $BL(S,d)$ is dense in $C(S)$. For any integer $n \ge 1$, the set $\{f : \|f\|_{BL} \le n\}$ is uniformly bounded and equicontinuous. By the Arzelà-Ascoli theorem, it is totally bounded and, therefore, separable, which can be seen by taking finite $1/m$-covers for all $m \ge 1$. The union $\bigcup_{n \ge 1}\{\|f\|_{BL} \le n\} = BL(S,d)$ is therefore separable in $C(S)$, which is, as a result, also separable.
Section 17

Metrics for convergence of laws. Empirical measures.

Lévy-Prohorov metric.

Consider a metric space $(S,d)$. For a set $A \subseteq S$, let us denote by
$$A^\varepsilon = \{y \in S : d(x,y) < \varepsilon \ \text{ for some } x \in A\}$$
its $\varepsilon$-neighborhood. Let $\mathcal{B}$ be a Borel $\sigma$-algebra on $S$.
Definition. If $P, Q$ are probability distributions on $\mathcal{B}$ then
$$\rho(P,Q) = \inf\{\varepsilon > 0 : P(A) \le Q(A^\varepsilon) + \varepsilon \ \text{ for all } A \in \mathcal{B}\}$$
is called the Lévy-Prohorov distance between $P$ and $Q$.

Lemma 34 $\rho$ is a metric on the set of probability laws on $\mathcal{B}$.
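On a finite metric space, $\rho$ can be computed by brute force directly from the definition; a small sketch (the space and the two laws below are illustrative):

```python
from itertools import combinations

S = [0, 1, 2]                         # three-point space with metric |x - y|
d = lambda x, y: abs(x - y)
P = {0: 0.5, 1: 0.5, 2: 0.0}
Q = {0: 0.0, 1: 0.5, 2: 0.5}

def neighborhood(A, eps):             # the open eps-neighborhood A^eps
    return {y for y in S if any(d(x, y) < eps for x in A)}

def holds(eps):                       # P(A) <= Q(A^eps) + eps for all A?
    subsets = (set(c) for r in range(len(S) + 1) for c in combinations(S, r))
    return all(sum(P[x] for x in A) <= sum(Q[y] for y in neighborhood(A, eps)) + eps
               for A in subsets)

# Scan a grid of eps values for the smallest one that works.
rho = min(e for e in (k / 1000 for k in range(1, 2001)) if holds(e))
print(rho)   # -> 0.5: A = {0} forces 0.5 = P(A) <= Q({0}) + eps = eps when eps <= 1
```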
Proof. 1. First, let us show that $\rho(Q,P) = \rho(P,Q)$. Suppose that $\rho(P,Q) > \varepsilon$. Then there exists a set $A$ such that $P(A) > Q(A^\varepsilon) + \varepsilon$. Taking complements gives
$$Q\bigl((A^\varepsilon)^c\bigr) > P(A^c) + \varepsilon \ge P\bigl(((A^\varepsilon)^c)^\varepsilon\bigr) + \varepsilon,$$
where the last inequality follows from the fact that $((A^\varepsilon)^c)^\varepsilon \subseteq A^c$:
$$a \in ((A^\varepsilon)^c)^\varepsilon \implies d(a,b) < \varepsilon \ \text{for some}\ b \in (A^\varepsilon)^c; \ \text{since}\ b \notin A^\varepsilon,\ d(b,A) \ge \varepsilon \implies d(a,A) > 0 \implies a \notin A \implies a \in A^c.$$
Therefore, for the set $B = (A^\varepsilon)^c$, $Q(B) > P(B^\varepsilon) + \varepsilon$. This means that $\rho(Q,P) > \varepsilon$ and, therefore, $\rho(Q,P) \ge \rho(P,Q)$. By symmetry, $\rho(Q,P) \le \rho(P,Q)$ and $\rho(Q,P) = \rho(P,Q)$.

2. Next, let us show that if $\rho(P,Q) = 0$ then $P = Q$. For any set $F$ and any $n \ge 1$,
$$P(F) \le Q\bigl(F^{1/n}\bigr) + \frac{1}{n}.$$
If $F$ is closed then $F^{1/n} \downarrow F$ as $n \to \infty$ and, by continuity of measure,
$$P(F) \le Q\Bigl(\bigcap_{n \ge 1} F^{1/n}\Bigr) = Q(F).$$
Similarly, $Q(F) \le P(F)$ and, therefore, $P(F) = Q(F)$.
3. Finally, let us prove the triangle inequality
$$\rho(P,R) \le \rho(P,Q) + \rho(Q,R).$$
If $\rho(P,Q) < x$ and $\rho(Q,R) < y$ then for any set $A$,
$$P(A) \le Q(A^x) + x \le R\bigl((A^x)^y\bigr) + y + x \le R\bigl(A^{x+y}\bigr) + x + y,$$
which means that $\rho(P,R) \le x + y$.

Bounded Lipschitz metric.

Given probability distributions $P, Q$ on the metric space $(S,d)$, we define a bounded Lipschitz distance between them by
$$\beta(P,Q) = \sup\Bigl\{\Bigl|\int f\,dP - \int f\,dQ\Bigr| : \|f\|_{BL} \le 1\Bigr\}.$$

Lemma 35 $\beta$ is a metric on the set of probability laws on $\mathcal{B}$.

Proof. $\beta(P,Q) = \beta(Q,P)$ and the triangle inequality are obvious. It remains to prove that $\beta(P,Q) = 0$ implies $P = Q$. Given a closed set $F$, the sequence of functions $f_m(x) = \bigl(m\,d(x,F)\bigr) \wedge 1$ converges $f_m \uparrow I_U$, where $U = F^c$. Obviously, $\|f_m\|_{BL} \le m + 1$ and, therefore, $\int f_m\,dP = \int f_m\,dQ$. Letting $m \to \infty$ proves that $P(U) = Q(U)$.

The law $P$ on $(S,d)$ is tight if for any $\varepsilon > 0$ there exists a compact $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$.

Theorem 40 (Ulam) If $(S,d)$ is separable then for any law $P$ on $\mathcal{B}$ there exists a closed totally bounded set $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$. If $(S,d)$ is complete and separable then $K$ is compact and, therefore, every law is tight.
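For two point masses $\delta_0$ and $\delta_t$ on $\mathbb{R}$, $\beta$ reduces to maximizing $|f(0) - f(t)|$ over $\|f\|_{BL} \le 1$, which amounts to splitting the unit norm between $\|f\|_L = L$ and $\|f\|_\infty = 1 - L$; the resulting closed form $2t/(2+t)$ is a standard fact stated here as an assumption, checked numerically (a sketch, not from the notes):

```python
# beta(delta_0, delta_t): with ||f||_L = L and ||f||_inf = 1 - L, the best achievable
# gap |f(0) - f(t)| is min(L * t, 2 * (1 - L)) (a tent function attains it);
# maximize that over the split parameter L in [0, 1].

def beta_point_masses(t, steps=10**5):
    return max(min(L * t, 2 * (1 - L))
               for L in (k / steps for k in range(steps + 1)))

for t in (0.5, 1.0, 4.0):
    assert abs(beta_point_masses(t) - 2 * t / (2 + t)) < 1e-4
print(beta_point_masses(1.0))   # close to 2/3
```

Note that $\beta(\delta_0, \delta_t) \to 2$ as $t \to \infty$: far-apart point masses stay at bounded $\beta$-distance, unlike in the unbounded Wasserstein-type metrics.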
Proof. Consider a sequence $\{x_1, x_2, \ldots\}$ that is dense in $S$. For any $m \ge 1$, $S = \bigcup_{i \ge 1} B(x_i, \frac{1}{m})$, where $B$ denotes a closed ball, and by continuity of measure, for large enough $n(m)$,
$$P\Bigl(S \setminus \bigcup_{i=1}^{n(m)} B\bigl(x_i, \tfrac{1}{m}\bigr)\Bigr) \le \frac{\varepsilon}{2^m}.$$
If we take
$$K = \bigcap_{m \ge 1} \bigcup_{i=1}^{n(m)} B\bigl(x_i, \tfrac{1}{m}\bigr)$$
then
$$P(S \setminus K) \le \sum_{m \ge 1} \frac{\varepsilon}{2^m} = \varepsilon.$$
$K$ is closed and totally bounded by construction. If $S$ is complete, $K$ is compact.

Theorem 41 Suppose that either $(S,d)$ is separable or $P$ is tight. Then the following are equivalent.

1. $P_n \to P$.
2. For all $f \in BL(S,d)$, $\int f\,dP_n \to \int f\,dP$.
3. $\beta(P_n, P) \to 0$.
4. $\rho(P_n, P) \to 0$.
Proof. 1 ⇒ 2. Obvious.

3 ⇒ 4. In fact, we will prove that
$$\rho(P_n, P) \le 2\sqrt{\beta(P_n, P)}. \qquad (17.0.1)$$
Given a Borel set $A \subseteq S$, consider the function
$$f(x) = \Bigl(1 - \frac{1}{\varepsilon}\,d(x,A)\Bigr)^+ \ \text{ such that } \ I_A \le f \le I_{A^\varepsilon}.$$
Obviously, $\|f\|_{BL} \le 1 + \frac{1}{\varepsilon}$ and we can write
$$P_n(A) \le \int f\,dP_n \le \int f\,dP + \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\sup\Bigl\{\Bigl|\int g\,dP_n - \int g\,dP\Bigr| : \|g\|_{BL} \le 1\Bigr\} \le P(A^\varepsilon) + \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\beta(P_n,P) \le P(A^\varepsilon) + \delta,$$
where $\delta = \max\bigl(\varepsilon, (1 + \frac{1}{\varepsilon})\beta(P_n,P)\bigr)$. This implies that $\rho(P_n, P) \le \delta$. Since $\varepsilon$ is arbitrary, we can minimize $\delta = \delta(\varepsilon)$ over $\varepsilon$. If we take $\varepsilon = \sqrt{\beta}$, where $\beta = \beta(P_n,P)$, then
$$\delta = \max\bigl(\sqrt{\beta}, \sqrt{\beta} + \beta\bigr) = \sqrt{\beta} + \beta.$$
If $\beta \le 1$, this gives $\rho(P_n,P) \le 2\sqrt{\beta}$; if $\beta \ge 1$, then trivially $\rho(P_n,P) \le 1 \le 2\sqrt{\beta}$, which proves (17.0.1).

4 ⇒ 1. Suppose that $\rho(P_n, P) \to 0$, which means that there exists a sequence $\varepsilon_n \to 0$ such that
$$P_n(A) \le P(A^{\varepsilon_n}) + \varepsilon_n \ \text{ for all measurable } A \subseteq S.$$
If $A$ is closed, then $\bigcap_{n \ge 1} A^{\varepsilon_n} = A$ and, by continuity of measure,
$$\limsup_n P_n(A) \le \limsup_n \bigl(P(A^{\varepsilon_n}) + \varepsilon_n\bigr) = P(A).$$
By the portmanteau theorem, $P_n \to P$.

2 ⇒ 3. If $P$ is tight, let $K$ be a compact such that $P(S \setminus K) \le \varepsilon$. If $(S,d)$ is separable, by Ulam's theorem, let $K$ be a closed totally bounded set such that $P(S \setminus K) \le \varepsilon$. If we consider the function
$$f(x) = \Bigl(1 - \frac{1}{\varepsilon}\,d(x,K)\Bigr)^+ \ \text{ with } \ \|f\|_{BL} \le 1 + \frac{1}{\varepsilon},$$
then
$$P_n(K^\varepsilon) \ge \int f\,dP_n \to \int f\,dP \ge P(K) \ge 1 - \varepsilon,$$
which implies that for $n$ large enough, $P_n(K^\varepsilon) \ge 1 - 2\varepsilon$. This means that all $P_n$ are essentially concentrated on $K^\varepsilon$. Let
$$B = \bigl\{f : \|f\|_{BL(S,d)} \le 1\bigr\}, \qquad B_K = \bigl\{f|_K : f \in B\bigr\} \subseteq C(K),$$
where $f|_K$ denotes the restriction of $f$ to $K$. If $K$ is compact then, by the Arzelà-Ascoli theorem, $B_K$ is totally bounded with respect to $d_\infty$. If $K$ is totally bounded then we can isometrically identify functions in $B_K$ with their unique extensions to the completion $\bar K$ of $K$ and, by the Arzelà-Ascoli theorem for the compact $\bar K$, $B_K$ is again totally bounded with respect to $d_\infty$. In any case, given $\varepsilon > 0$, we can find $f_1, \ldots, f_k \in B$ such that for all $f \in B$,
$$\sup_{x \in K}|f(x) - f_j(x)| \le \varepsilon \ \text{ for some } j \le k.$$
This uniform approximation can also be extended to $K^\varepsilon$. Namely, for any $x \in K^\varepsilon$ take $y \in K$ such that $d(x,y) \le \varepsilon$. Then
$$|f(x) - f_j(x)| \le |f(x) - f(y)| + |f(y) - f_j(y)| + |f_j(y) - f_j(x)| \le \|f\|_L\,d(x,y) + \varepsilon + \|f_j\|_L\,d(x,y) \le 3\varepsilon.$$
Therefore, for any $f \in B$,
$$\Bigl|\int f\,dP_n - \int f\,dP\Bigr| \le \Bigl|\int_{K^\varepsilon} f\,dP_n - \int_{K^\varepsilon} f\,dP\Bigr| + \|f\|_\infty\Bigl(P_n\bigl((K^\varepsilon)^c\bigr) + P\bigl((K^\varepsilon)^c\bigr)\Bigr) \le \Bigl|\int_{K^\varepsilon} f\,dP_n - \int_{K^\varepsilon} f\,dP\Bigr| + 3\varepsilon$$
$$\le \Bigl|\int_{K^\varepsilon} f_j\,dP_n - \int_{K^\varepsilon} f_j\,dP\Bigr| + 3\varepsilon + 3\varepsilon + 3\varepsilon \le \max_{1 \le j \le k}\Bigl|\int f_j\,dP_n - \int f_j\,dP\Bigr| + 12\varepsilon.$$
Finally,
$$\beta(P_n, P) = \sup_{f \in B}\Bigl|\int f\,dP_n - \int f\,dP\Bigr| \le \max_{1 \le j \le k}\Bigl|\int f_j\,dP_n - \int f_j\,dP\Bigr| + 12\varepsilon$$
and, using assumption 2, $\limsup_n \beta(P_n, P) \le 12\varepsilon$. Letting $\varepsilon \to 0$ finishes the proof.

Convergence of empirical measures.

Let $(\Omega, \mathbb{P})$ be a probability space and $X_1, X_2, \ldots : \Omega \to S$ be an i.i.d. sequence of random variables with values in a metric space $(S,d)$. Let $\mu$ be the law of $X_i$ on $S$. Let us define the random empirical measures $\mu_n$ on the Borel $\sigma$-algebra $\mathcal{B}$ on $S$ by
$$\mu_n(A)(\omega) = \frac{1}{n}\sum_{i=1}^n I\bigl(X_i(\omega) \in A\bigr), \qquad A \in \mathcal{B}.$$
By the strong law of large numbers, for any $f \in C_b(S)$,
$$\int f\,d\mu_n = \frac{1}{n}\sum_{i=1}^n f(X_i) \to \mathbb{E}f(X_1) = \int f\,d\mu \ \text{ a.s.}$$
However, the set of measure zero where this convergence is violated depends on $f$, and it is not obvious that the convergence holds for all $f \in C_b(S)$ with probability one.

Theorem 42 (Varadarajan) Let $(S,d)$ be a separable metric space. Then $\mu_n$ converges to $\mu$ weakly almost surely,
$$\mathbb{P}\bigl(\omega : \mu_n(\cdot)(\omega) \to \mu \ \text{weakly}\bigr) = 1.$$

Proof. Since $(S,d)$ is separable, by Theorem 2.8.2 in R.A.P., there exists a metric $e$ on $S$ such that $(S,e)$ is totally bounded and $e$ and $d$ define the same topology, i.e. $e(s_n, s) \to 0$ if and only if $d(s_n, s) \to 0$. This, of course, means that $C_b(S,d) = C_b(S,e)$ and weak convergence of measures does not change. If $(T,e)$ is the completion of $(S,e)$ then $(T,e)$ is compact. By the Arzelà-Ascoli theorem, $BL(T,e)$ is separable with respect to the $d_\infty$ norm and, therefore, $BL(S,e)$ is also separable. Let $(f_m)$ be a dense subset of $BL(S,e)$. Then, by the strong law of large numbers,
$$\int f_m\,d\mu_n = \frac{1}{n}\sum_{i=1}^n f_m(X_i) \to \mathbb{E}f_m(X_1) = \int f_m\,d\mu \ \text{ a.s.}$$
Therefore, on the same set of probability one, $\int f_m\,d\mu_n \to \int f_m\,d\mu$ for all $m \ge 1$. Since $(f_m)$ is dense in $BL(S,e)$, on that set of probability one, $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in BL(S,e)$. Since $(S,e)$ is separable, the previous theorem implies that $\mu_n \to \mu$ weakly.
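A simulation of this convergence on $S = \mathbb{R}$ (the sample sizes, seed, and test function are illustrative choices, not from the notes):

```python
import random

random.seed(0)
f = lambda x: min(1.0, 2 * x)      # a bounded Lipschitz test function on [0, 1]
true_integral = 0.75               # integral of min(1, 2x) against Uniform[0, 1]

def empirical_integral(n):
    """Integral of f against the empirical measure mu_n of n i.i.d. uniforms."""
    return sum(f(random.random()) for _ in range(n)) / n

errs = [abs(empirical_integral(n) - true_integral) for n in (10**2, 10**4, 10**6)]
print(errs)   # the error shrinks roughly like n^(-1/2)
```

The theorem says more than this single-$f$ check: on one event of probability one, the convergence holds simultaneously for every bounded continuous $f$.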
Section 18

Convergence and uniform tightness.

In this section, we will make several connections between convergence of measures and uniform tightness on general metric spaces, which are similar to the results in the Euclidean setting. First, we will show that, in some sense, uniform tightness is necessary for convergence of laws.

Theorem 43 If $P_n \to P_0$ on $S$ and each $P_n$ is tight for $n \ge 0$, then $(P_n)_{n \ge 0}$ is uniformly tight.

Proof. Since $P_n \to P_0$ and $P_0$ is tight, by Theorem 41, the Lévy-Prohorov metric $\rho(P_n, P_0) \to 0$. Given $\varepsilon > 0$, let us take a compact $K$ such that $P_0(K) > 1 - \varepsilon$. By definition of $\rho$, we can find a sequence $\alpha(n) \to 0$ (say, $\alpha(n) = \rho(P_n, P_0) + 1/n$) such that
$$P_n\bigl(K^{\alpha(n)}\bigr) \ge P_0(K) - \alpha(n) > 1 - \varepsilon \ \text{ for all } n \ge n_0$$
(the finitely many $n < n_0$ can be handled directly by tightness of each $P_n$). By regularity of the measure $P_n$, any measurable set $A$ can be approximated from inside by its closed subsets $F$. Since $P_n$ is tight, we can choose a compact of measure close to one and, intersecting it with the closed subset $F$, we can approximate any set $A$ by its compact subsets. Therefore, there exists a compact $K_n \subseteq K^{2\alpha(n)}$ such that $P_n(K_n) > 1 - \varepsilon$. Let
$$L = K \cup \Bigl(\bigcup_{n \ge 1} K_n\Bigr).$$
Then $P_n(L) \ge P_n(K_n) > 1 - \varepsilon$. It remains to show that $L$ is compact. Consider a sequence $(x_n)$ in $L$. There are two possibilities. First, if there exists an infinite subsequence $(x_{n(k)})$ that belongs to one of the compacts $K_j$ (or to $K$) then it has a converging subsubsequence in $K_j$ and, as a result, in $L$. If not, then there exists a subsequence $(x_{n(k)})$ such that $x_{n(k)} \in K_{m(k)}$ and $m(k) \to \infty$ as $k \to \infty$. Since $K_{m(k)} \subseteq K^{2\alpha(m(k))}$, there exists $y_k \in K$ such that
$$d(x_{n(k)}, y_k) \le 2\alpha(m(k)).$$
Since $K$ is compact, the sequence $y_k \in K$ has a converging subsequence $y_{k(r)} \to y \in K$, which implies that $d(x_{n(k(r))}, y) \to 0$, i.e. $x_{n(k(r))} \to y \in L$. Therefore, $L$ is compact.

We already know from the Selection Theorem in Section 8 that any uniformly tight sequence of laws on any metric space has a converging subsequence. Under additional assumptions on $(S,d)$ we can complement the Selection Theorem and make some connections to the metrics defined in the previous section.

Theorem 44 Let $(S,d)$ be a complete separable metric space and let $\mathcal{A}$ be a subset of probability laws on $S$. Then the following are equivalent.
1. $\mathcal{A}$ is uniformly tight.
2. For any sequence $P_n \in \mathcal{A}$ there exists a converging subsequence $P_{n(k)} \to P$, where $P$ is a law on $S$.
3. $\mathcal{A}$ has compact closure in the space of probability laws equipped with the Lévy-Prohorov or bounded Lipschitz metric $\rho$ or $\beta$.
4. $\mathcal{A}$ is totally bounded with respect to $\rho$ or $\beta$.

Remark. Implications 1 ⇒ 2 ⇒ 3 ⇒ 4 hold without the completeness assumption, and the only implication where completeness will be used is 4 ⇒ 1.
Proof. 1 ⇒ 2. Any sequence $P_n \in \mathcal{A}$ is uniformly tight and, by the selection theorem, there exists a converging subsequence.

2 ⇒ 3. Since $(S,d)$ is separable, by Theorem 41, $P_n \to P$ if and only if $\rho(P_n, P) \to 0$ or, equivalently, $\beta(P_n, P) \to 0$. Every sequence in the closure $\bar{\mathcal{A}}$ can be approximated by a sequence in $\mathcal{A}$. That sequence has a converging subsequence that, obviously, converges to an element in $\bar{\mathcal{A}}$, which means that the closure of $\mathcal{A}$ is compact.

3 ⇒ 4. Compact sets are totally bounded and, therefore, if the closure $\bar{\mathcal{A}}$ is compact, the set $\mathcal{A}$ is totally bounded.
4 ⇒ 1. Since $\rho \le 2\sqrt{\beta}$ by (17.0.1), we will only deal with $\rho$. For any $\varepsilon > 0$, there exists a finite subset $B \subseteq \mathcal{A}$ such that every $Q \in \mathcal{A}$ is within $\rho$-distance $\varepsilon$ of some $P \in B$. Since $(S,d)$ is complete and separable, by Ulam's theorem, for each $P \in B$ there exists a compact $K_P$ such that $P(K_P) > 1 - \varepsilon$. Therefore,
$$K_B = \bigcup_{P \in B} K_P \ \text{ is a compact and } \ P(K_B) > 1 - \varepsilon \ \text{ for all } P \in B.$$
For any $\varepsilon > 0$, let $F$ be a finite set such that $K_B \subseteq F^\varepsilon$ (here we will denote by $F^\varepsilon$ the closed $\varepsilon$-neighborhood of $F$). For any $Q \in \mathcal{A}$ there exists $P \in B$ such that $\rho(Q,P) < \varepsilon$ and, therefore,
$$1 - \varepsilon \le P(K_B) \le P(F^\varepsilon) \le Q\bigl(F^{2\varepsilon}\bigr) + \varepsilon.$$
Thus,
$$1 - 2\varepsilon \le Q\bigl(F^{2\varepsilon}\bigr) \ \text{ for all } Q \in \mathcal{A}.$$
Given $\varepsilon > 0$, take $\varepsilon_m = \varepsilon/2^{m+1}$ and find $F_m$ as above, i.e.
$$1 - \frac{\varepsilon}{2^m} \le Q\bigl(F_m^{\varepsilon/2^m}\bigr).$$
Then
$$Q\Bigl(\bigcap_{m \ge 1} F_m^{\varepsilon/2^m}\Bigr) \ge 1 - \sum_{m \ge 1}\frac{\varepsilon}{2^m} = 1 - \varepsilon.$$
Finally, $L = \bigcap_{m \ge 1} F_m^{\varepsilon/2^m}$ is compact because it is closed and totally bounded by construction, and $S$ is complete.
Corollary 5 (Prohorov) The set of laws on a complete separable metric space is complete with respect to the metrics $\rho$ or $\beta$.

Proof. If a sequence of laws is Cauchy w.r.t. $\rho$ or $\beta$ then it is totally bounded and, by the previous theorem, it has a converging subsequence. Obviously, a Cauchy sequence will then converge to the same limit.

Finally, let us state as a result the idea which appeared in Lemma 19 in Section 9.

Lemma 36 Suppose that $(P_n)$ is uniformly tight on a metric space $(S,d)$. Suppose that all converging subsequences $(P_{n(k)})$ converge to the same limit, i.e. if $P_{n(k)} \to P_0$ then $P_0$ is independent of $(n(k))$. Then $P_n \to P_0$.

Proof. Any subsequence $(P_{n(k)})$ is uniformly tight and, by the selection theorem, it has a converging subsubsequence $(P_{n(k(r))})$, which has to converge to $P_0$. Lemma 13 in Section 8 finishes the proof.

This will be very useful when proving convergence of laws on metric spaces, such as $C([0,1])$, for example. If we can prove that $(P_n)$ is uniformly tight and, assuming that a subsequence converges, can identify the unique limit, then the sequence $P_n$ must converge to the same limit.
Section 19

Strassen's Theorem. Relationships between metrics.

Metric for convergence in probability.

Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space, $(S,d)$ a metric space, and $X, Y : \Omega \to S$ random variables with values in $S$. The quantity
$$\alpha(X,Y) = \inf\{\varepsilon \ge 0 : \mathbb{P}(d(X,Y) > \varepsilon) \le \varepsilon\}$$
is called the Ky Fan metric on the set $L^0(\Omega, S)$ of classes of equivalence of such random variables, where two r.v.s are equivalent if they are equal a.s. If we take a sequence $\varepsilon_k \downarrow \varepsilon = \alpha(X,Y)$ then $\mathbb{P}(d(X,Y) > \varepsilon_k) \le \varepsilon_k$ and since
$$I\bigl(d(X,Y) > \varepsilon_k\bigr) \uparrow I\bigl(d(X,Y) > \varepsilon\bigr),$$
by the monotone convergence theorem, $\mathbb{P}(d(X,Y) > \varepsilon) \le \varepsilon$. Thus, the infimum in the definition of $\alpha(X,Y)$ is attained.
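The Ky Fan metric can be estimated by Monte Carlo for a concrete pair, say $X \sim N(0,1)$ and $Y = X + 0.5Z$ with $Z \sim N(0,1)$ independent, so that $d(X,Y) = |0.5Z|$ (an illustrative sketch; the sample size and grid are arbitrary choices):

```python
import bisect
import random

random.seed(1)
n = 10**5
dist_samples = sorted(abs(0.5 * random.gauss(0.0, 1.0)) for _ in range(n))

def tail(eps):                      # Monte Carlo estimate of P(d(X, Y) > eps)
    return (n - bisect.bisect_right(dist_samples, eps)) / n

# Smallest eps on a grid with P(d(X, Y) > eps) <= eps.
alpha = min(e for e in (k / 1000 for k in range(1, 2001)) if tail(e) <= e)
print(alpha)   # approximately solves P(|0.5 Z| > eps) = eps, i.e. eps near 0.41
```

Since the tail probability is nonincreasing in $\varepsilon$, the first grid point where the inequality holds approximates the attained infimum.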
Lemma 37 $\alpha$ is a metric on $L^0(\Omega, S)$ which metrizes convergence in probability.

Proof. First of all, clearly, $\alpha(X,Y) = 0$ iff $X = Y$ almost surely. To prove the triangle inequality, note that
$$\mathbb{P}\bigl(d(X,Z) > \alpha(X,Y) + \alpha(Y,Z)\bigr) \le \mathbb{P}\bigl(d(X,Y) > \alpha(X,Y)\bigr) + \mathbb{P}\bigl(d(Y,Z) > \alpha(Y,Z)\bigr) \le \alpha(X,Y) + \alpha(Y,Z),$$
so that $\alpha(X,Z) \le \alpha(X,Y) + \alpha(Y,Z)$. This proves that $\alpha$ is a metric. Next, if $\varepsilon_n = \alpha(X_n, X) \to 0$ then for any $\varepsilon > 0$ and $n$ large enough so that $\varepsilon_n \le \varepsilon$,
$$\mathbb{P}(d(X_n, X) > \varepsilon) \le \mathbb{P}(d(X_n, X) > \varepsilon_n) \le \varepsilon_n \to 0.$$