MIT OpenCourseWare — http://ocw.mit.edu
18.175 Theory of Probability
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Contents

1. Probability Spaces, Properties of Probability.
2. Random variables and their properties. Expectation.
3. Kolmogorov's Theorem about consistent distributions.
4. Laws of Large Numbers.
5. Bernstein Polynomials. Hausdorff and de Finetti theorems.
6. 0 - 1 Laws. Convergence of random series.
7. Stopping times, Wald's identity. Another proof of SLLN.
8. Convergence of Laws. Selection Theorem.
9. Characteristic Functions. Central Limit Theorem on R.
10. Multivariate normal distributions and CLT.
11. Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.
12. Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.
13. Martingales. Doob's Decomposition. Uniform Integrability.
14. Optional stopping. Inequalities for martingales.
15. Convergence of martingales. Fundamental Wald's identity.
16. Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.
17. Metrics for convergence of laws. Empirical measures.
18. Convergence and uniform tightness.
19. Strassen's Theorem. Relationships between metrics.
20. Kantorovich-Rubinstein Theorem.
21. Prekopa-Leindler inequality, entropy and concentration.
22. Stochastic Processes. Brownian Motion.
23. Donsker Invariance Principle.
24. Empirical process and Kolmogorov's chaining.
25. Markov property of Brownian motion. Reflection principles.
26. Laws of Brownian motion at stopping times. Skorohod's imbedding.
List of Figures

2.1 A random variable defined by quantile transformation.
2.2 σ(X) generated by X.
2.3 Pairwise independent but not independent r.v.s.
5.1 Polya urn model.
7.1 A sequence of stopping times.
8.1 Approximating indicator.
14.1 Stopping times of level crossings.
25.1 Reflecting the Brownian motion.
Section 1
Probability Spaces, Properties of Probability.

A pair (Ω, 𝒜) is a measurable space if 𝒜 is a σ-algebra of subsets of Ω. A collection 𝒜 of subsets of Ω is an algebra (ring) if:

1. Ω ∈ 𝒜.
2. C, B ∈ 𝒜 ⇒ C ∪ B, C ∩ B ∈ 𝒜.
3. B ∈ 𝒜 ⇒ Ω ∖ B ∈ 𝒜.
4. 𝒜 is a σ-algebra if, in addition, C_i ∈ 𝒜 for i ≥ 1 ⇒ ∪_{i≥1} C_i ∈ 𝒜.

(Ω, 𝒜, P) is a probability space if P is a probability measure on 𝒜, i.e.

1. P(Ω) = 1.
2. P(A) ≥ 0 for all A ∈ 𝒜.
3. P is countably additive: if A_i ∈ 𝒜 for i ≥ 1 and A_i ∩ A_j = ∅ for i ≠ j, then

  P(∪_{i≥1} A_i) = Σ_{i≥1} P(A_i).

An equivalent formulation of Property 3 is:

3′. P is a finitely additive measure and B_n ⊇ B_{n+1}, ∩_{n≥1} B_n = B ⇒ P(B) = lim_{n→∞} P(B_n).

Lemma 1. Properties 3 and 3′ are equivalent.

Proof.
3 ⇒ 3′: Let C_n = B_n ∖ B_{n+1}; then B_n = B ∪ (∪_{k≥n} C_k), all disjoint. By 3,

  P(B_n) = P(B) + Σ_{k≥n} P(C_k) → P(B) as n → ∞,

since the tail of the convergent series Σ_k P(C_k) vanishes.

3′ ⇒ 3: Write ∪_{i≥1} A_i = A_1 ∪ ⋯ ∪ A_n ∪ B_n, where B_n = ∪_{i>n} A_i. Then

  P(∪_{i≥1} A_i) = P(A_1) + ⋯ + P(A_n) + P(B_n).

Since B_n ⊇ B_{n+1}, we have P(B_n) → P(∩_{n≥1} B_n) = P(∅) = 0, because the A_i's are disjoint. □

Given an algebra 𝒜, let σ(𝒜) be the σ-algebra generated by 𝒜, i.e. the intersection of all σ-algebras that contain 𝒜. It is easy to see that the intersection of all such σ-algebras is itself a σ-algebra. Indeed, consider a sequence A_i for i ≥ 1 such that each A_i belongs to all σ-algebras that contain 𝒜. Then ∪_{i≥1} A_i belongs to all these σ-algebras and therefore to their intersection.

Let us recall an important result from measure theory.

Theorem 1 (Caratheodory extension). If 𝒜 is an algebra of sets and μ: 𝒜 → ℝ is a non-negative countably additive function on 𝒜, then μ can be extended to a measure on the σ-algebra σ(𝒜). If μ is σ-finite, then this extension is unique. (σ-finite means that Ω = ∪ A_i for a disjoint sequence A_i with μ(A_i) < ∞.)
Consider D_1, . . . , D_n ∈ 𝒟. If a sequence C_{ij} ∈ 𝒜 for j ≥ 1 approximates D_i, i.e. P(C_{ij} △ D_i) → 0 as j → ∞, then, by properties 1 - 3, C_j := ∪_{i≤n} C_{ij} approximates Dⁿ := ∪_{i≤n} D_i, which means that Dⁿ ∈ 𝒟. Let D = ∪_{i≥1} D_i. Then

  P(D) = P(Dⁿ) + P(D ∖ Dⁿ),

and obviously P(D ∖ Dⁿ) → 0 as n → ∞. Therefore, D ∈ 𝒟 and 𝒟 is a σ-algebra.
Section 2
Random variables and their properties. Expectation.

Let (Ω, 𝒜, P) be a probability space and (S, ℬ) be a measurable space, where ℬ is a σ-algebra of subsets of S. A random variable X: Ω → S is a measurable function, i.e.

  B ∈ ℬ ⇒ X⁻¹(B) ∈ 𝒜.

When S = ℝ we will usually consider the σ-algebra ℬ of Borel measurable sets generated by the sets (a, b] (or, equivalently, generated by the sets (a, b) or by open sets).

Lemma 3. X: Ω → ℝ is a random variable iff for all t ∈ ℝ

  {X ≤ t} := {ω : X(ω) ∈ (−∞, t]} ∈ 𝒜.

Proof. Only the "if" direction requires proof. We will prove that

  𝒟 = {D ⊆ ℝ : X⁻¹(D) ∈ 𝒜}

is a σ-algebra. Since the sets (−∞, t] ∈ 𝒟, this will imply that ℬ ⊆ 𝒟. The result follows simply because taking pre-images preserves set operations. For example, if we consider a sequence D_i ∈ 𝒟 for i ≥ 1, then

  X⁻¹(∪_{i≥1} D_i) = ∪_{i≥1} X⁻¹(D_i) ∈ 𝒜,

because X⁻¹(D_i) ∈ 𝒜 and 𝒜 is a σ-algebra. Therefore, ∪_{i≥1} D_i ∈ 𝒟. Other properties can be checked similarly, so 𝒟 is a σ-algebra. □

Let us define a measure P_X on ℬ by P_X = P ∘ X⁻¹, i.e. for B ∈ ℬ,

  P_X(B) = P(X ∈ B) = P(X⁻¹(B)).

(S, ℬ, P_X) is called the sample space of the random variable X, and P_X is called the law of X. Clearly, on this space the random variable ι: S → S defined by the identity ι(s) = s has the same law as X.

When S = ℝ, the function F(t) = P(X ≤ t) is called the cumulative distribution function (c.d.f.) of X.

Lemma 4. F is a c.d.f. of some r.v. X iff

1. 0 ≤ F(t) ≤ 1,
2. F is non-decreasing and right-continuous,
3. lim_{t→−∞} F(t) = 0, lim_{t→+∞} F(t) = 1.

Proof. The fact that any c.d.f. satisfies properties 1 - 3 is obvious. Let us show that an F which satisfies properties 1 - 3 is a c.d.f. of some r.v. X. Consider the algebra 𝒜 consisting of finite unions ∪_{i≤n} (a_i, b_i] of disjoint intervals, for all n ≥ 1. Let us define a function P on 𝒜 by

  P(∪_{i≤n} (a_i, b_i]) = Σ_{i≤n} (F(b_i) − F(a_i)).

One can show that P is countably additive on 𝒜. Then, by the Caratheodory extension Theorem 1, P extends uniquely to a measure P on σ(𝒜) = ℬ, the Borel measurable sets. This means that (ℝ, ℬ, P) is a probability space and, clearly, the random variable X: ℝ → ℝ defined by X(x) = x has c.d.f. P(X ≤ t) = F(t). Below we will sometimes abuse notation and let F denote both the c.d.f. and the probability measure P.

Alternative proof. Consider the probability space ([0,1], ℬ, λ), where λ is the Lebesgue measure. Define the r.v. X: [0,1] → ℝ by the quantile transformation

  X(t) = inf{x ∈ ℝ : F(x) ≥ t}.

The c.d.f. of X is λ(t : X(t) ≤ a) = F(a), since

  X(t) ≤ a ⟺ inf{x : F(x) ≥ t} ≤ a ⟺ ∃ a_n ↓ a with F(a_n) ≥ t ⟺ F(a) ≥ t,

where the last equivalence uses the right-continuity of F. □

Figure 2.1: A random variable defined by quantile transformation.
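The quantile transformation doubles as the standard recipe for simulating a r.v. with a given c.d.f.: apply the generalized inverse of F to a uniform variable on [0,1]. A minimal Python sketch; the exponential c.d.f. F(x) = 1 − e^{−x}, whose quantile function has the closed form −log(1 − t), is our own illustrative choice, not taken from the notes.

```python
import math
import random

def quantile_transform(F_inv, n, seed=0):
    """Draw n samples of X via the quantile transformation:
    apply X(t) = inf{x : F(x) >= t} to t ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    return [F_inv(rng.random()) for _ in range(n)]

# Illustrative choice: exponential law, F^{-1}(t) = -log(1 - t).
samples = quantile_transform(lambda t: -math.log(1.0 - t), 100_000)
sample_mean = sum(samples) / len(samples)  # EX = 1 for this law
```

Any c.d.f. works the same way; when F has flat pieces or jumps, the infimum definition above handles them automatically.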
Definition. Given a probability space (Ω, 𝒜, P) and a r.v. X: Ω → S, let σ(X) be the σ-algebra generated by the collection of sets {X⁻¹(B) : B ∈ ℬ}. Clearly, σ(X) ⊆ 𝒜. Moreover, the above collection of sets is itself a σ-algebra. Indeed, consider a sequence A_i = X⁻¹(B_i) for some B_i ∈ ℬ. Then

  ∪_{i≥1} A_i = ∪_{i≥1} X⁻¹(B_i) = X⁻¹(∪_{i≥1} B_i) = X⁻¹(B),

where B = ∪_{i≥1} B_i ∈ ℬ. σ(X) is called the σ-algebra generated by the r.v. X.

Figure 2.2: σ(X) generated by X.

Example. Consider the r.v. X on Ω = [0,1] defined in Figure 2.2, equal to 0 on [0, 1/2) and 1 on [1/2, 1]. We have P(X = 0) = 1/2, P(X = 1) = 1/2 and

  σ(X) = {∅, [0, 1/2), [1/2, 1], [0, 1]}.
Lemma 5. Consider a probability space (Ω, 𝒜, P), a measurable space (S, ℬ) and random variables X: Ω → S and Y: Ω → ℝ. Then the following are equivalent:

1. Y = g(X) for some (Borel) measurable function g: S → ℝ.
2. Y: Ω → ℝ is measurable on (Ω, σ(X)), i.e. with respect to the σ-algebra generated by X.

Remark. It should be obvious from the proof that ℝ can be replaced by any separable metric space.

Proof. The fact that 1 implies 2 is obvious, since for any Borel set B ⊆ ℝ the set B′ := g⁻¹(B) ∈ ℬ and, therefore,

  {Y = g(X) ∈ B} = {X ∈ g⁻¹(B) = B′} = X⁻¹(B′) ∈ σ(X).

Let us show that 2 implies 1. For all integers n and k consider the sets

  A_{n,k} = {ω : Y(ω) ∈ [k/2ⁿ, (k+1)/2ⁿ)} = Y⁻¹([k/2ⁿ, (k+1)/2ⁿ)).

By 2, A_{n,k} ∈ σ(X) = {X⁻¹(B) : B ∈ ℬ} and, therefore, A_{n,k} = X⁻¹(B_{n,k}) for some B_{n,k} ∈ ℬ. Let us consider the function

  g_n(X) = Σ_{k∈ℤ} (k/2ⁿ) I(X ∈ B_{n,k}).

By construction, |Y − g_n(X)| ≤ 1/2ⁿ, since

  Y(ω) ∈ [k/2ⁿ, (k+1)/2ⁿ) ⟺ X(ω) ∈ B_{n,k} ⇒ g_n(X(ω)) = k/2ⁿ.

It is easy to see that g_n(x) ≤ g_{n+1}(x) and, therefore, g(x) = lim_{n→∞} g_n(x) is a measurable function on (S, ℬ) and, clearly, Y = g(X). □
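The dyadic truncation behind the functions g_n in this proof is easy to check numerically. A small sketch; the target value y = π is just an illustration of ours:

```python
import math

def g_n(y, n):
    """Dyadic truncation used in the proof of Lemma 5: the largest
    value k/2^n (k an integer) not exceeding y."""
    return math.floor(y * 2 ** n) / 2 ** n

y = math.pi
approx = [g_n(y, n) for n in range(1, 13)]
errs = [y - g for g in approx]  # each error lies in [0, 2^{-n})
```

The sequence of approximations is nondecreasing and converges to y at rate 2^{-n}, exactly as the proof uses.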
Discrete random variables. A r.v. X: Ω → S is called discrete if P_X({s_i}_{i≥1}) = 1 for some sequence s_i ∈ S.

Absolutely continuous random variables. On a measurable space (S, ℬ), a measure P is called absolutely continuous w.r.t. a measure μ if

  B ∈ ℬ, μ(B) = 0 ⇒ P(B) = 0.

The following is a well known result from measure theory.

Theorem 2 (Radon-Nikodym). If P and μ are sigma-finite and P is absolutely continuous w.r.t. μ, then there exists a Radon-Nikodym derivative f ≥ 0 such that for all B ∈ ℬ

  P(B) = ∫_B f(s) dμ(s).

f is uniquely defined up to μ-null sets.

In the typical setting of S = ℝᵏ, a probability measure P and the Lebesgue measure μ, f is called the density of the distribution P.

Independence. Consider a probability space (Ω, 𝒞, P) and two σ-algebras 𝒜, ℬ ⊆ 𝒞. 𝒜 and ℬ are called independent if

  P(A ∩ B) = P(A) P(B) for all A ∈ 𝒜, B ∈ ℬ.
σ-algebras 𝒜_i ⊆ 𝒞 for i ≤ n are independent if

  P(A_1 ∩ ⋯ ∩ A_n) = ∏_{i≤n} P(A_i) for all A_i ∈ 𝒜_i.

σ-algebras 𝒜_i ⊆ 𝒞 for i ≤ n are pairwise independent if

  P(A_i ∩ A_j) = P(A_i) P(A_j) for all A_i ∈ 𝒜_i, A_j ∈ 𝒜_j, i ≠ j.

Random variables X_i: Ω → S for i ≤ n are (pairwise) independent if the σ-algebras σ(X_i), i ≤ n, are (pairwise) independent, which is just another convenient way to state the familiar

  P(X_1 ∈ B_1, . . . , X_n ∈ B_n) = P(X_1 ∈ B_1) ⋯ P(X_n ∈ B_n)

for any events B_1, . . . , B_n ∈ ℬ.

Example. Consider a regular tetrahedron die, Figure 2.3, with a red, a green and a blue side, and a red-green-blue base. If we roll this die, then the indicators of the different colors provide an example of pairwise independent r.v.s that are not independent, since

  P(r) = P(b) = P(g) = 1/2 and P(rb) = P(rg) = P(bg) = 1/4,

but

  P(rbg) = 1/4 ≠ P(r)P(b)P(g) = 1/8.

Figure 2.3: Pairwise independent but not independent r.v.s.
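A quick simulation of this die reproduces the pairwise-but-not-joint independence numerically. A sketch; the face encoding below is ours:

```python
import random

# Four equally likely faces: three single-color sides and a base
# carrying all three colors.
FACES = [{"r"}, {"g"}, {"b"}, {"r", "g", "b"}]

rng = random.Random(1)
n = 200_000
hits = {key: 0 for key in ("r", "g", "b", "rg", "rb", "gb", "rgb")}
for _ in range(n):
    face = rng.choice(FACES)
    for key in hits:
        if set(key) <= face:  # all colors in `key` present on this face
            hits[key] += 1
freq = {key: count / n for key, count in hits.items()}
# freq["r"] ≈ 1/2 and freq["rg"] ≈ 1/4 = freq["r"] * freq["g"] (pairwise
# independence), but freq["rgb"] ≈ 1/4 ≠ 1/8 (no joint independence).
```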
Independence of σ-algebras can be checked on generating algebras:

Lemma 6. If algebras 𝒜_i, i ≤ n, are independent, then the σ-algebras σ(𝒜_i) are independent.

Proof. Obvious by the Approximation Lemma 2. □

Lemma 7. Consider r.v.s X_i: Ω → ℝ on a probability space (Ω, 𝒜, P).

1. The X_i's are independent iff

  P(X_1 ≤ t_1, . . . , X_n ≤ t_n) = P(X_1 ≤ t_1) ⋯ P(X_n ≤ t_n).   (2.0.1)

2. If the laws of the X_i's have densities f_i(x), then the X_i's are independent iff a joint density exists and f(x_1, . . . , x_n) = ∏ f_i(x_i).
Proof. Claim 1 is obvious by Lemma 6, because (2.0.1) implies the same equality for intervals,

  P(X_1 ∈ (a_1, b_1], . . . , X_n ∈ (a_n, b_n]) = P(X_1 ∈ (a_1, b_1]) ⋯ P(X_n ∈ (a_n, b_n]),

and, therefore, for finite unions of disjoint such intervals. To check this for intervals (for example, for n = 2) we can write P(a_1 < X_1 ≤ b_1, a_2 < X_2 ≤ b_2) as

  P(X_1 ≤ b_1, X_2 ≤ b_2) − P(X_1 ≤ a_1, X_2 ≤ b_2) − P(X_1 ≤ b_1, X_2 ≤ a_2) + P(X_1 ≤ a_1, X_2 ≤ a_2)
  = P(X_1 ≤ b_1)P(X_2 ≤ b_2) − P(X_1 ≤ a_1)P(X_2 ≤ b_2) − P(X_1 ≤ b_1)P(X_2 ≤ a_2) + P(X_1 ≤ a_1)P(X_2 ≤ a_2)
  = (P(X_1 ≤ b_1) − P(X_1 ≤ a_1)) (P(X_2 ≤ b_2) − P(X_2 ≤ a_2)) = P(a_1 < X_1 ≤ b_1) P(a_2 < X_2 ≤ b_2).

To prove claim 2 we start with "⟸":

  P(∩ {X_i ∈ A_i}) = P(X ∈ A_1 × ⋯ × A_n) = ∫_{A_1×⋯×A_n} ∏ f_i(x_i) dx
  = ∏ ∫_{A_i} f_i(x_i) dx_i   {by Fubini's Theorem}
  = ∏_{i≤n} P(X_i ∈ A_i).

Next, we prove "⇒". First of all, by independence and Fubini's theorem,

  P(X ∈ A_1 × ⋯ × A_n) = ∏ P(X_i ∈ A_i) = ∫_{A_1×⋯×A_n} ∏ f_i(x_i) dx.

Therefore, the same equality holds for sets in the algebra 𝒜 that consists of finite unions of disjoint sets A_1 × ⋯ × A_n, i.e.

  P(X ∈ B) = ∫_B ∏ f_i(x_i) dx for B ∈ 𝒜.

Both P(X ∈ B) and ∫_B ∏ f_i(x_i) dx are countably additive on 𝒜 and finite,

  P(ℝⁿ) = ∫_{ℝⁿ} ∏ f_i(x_i) dx = 1.

By the Caratheodory extension Theorem 1, they extend uniquely to all Borel sets ℬ = σ(𝒜), so

  P(X ∈ B) = ∫_B ∏ f_i(x_i) dx for B ∈ ℬ. □
Expectation. If X: Ω → ℝ is a random variable on (Ω, 𝒜, P), then the expectation of X is defined as

  EX = ∫_Ω X(ω) dP(ω).

In other words, expectation is just another term for the integral with respect to a probability measure and, as a result, expectation has all the usual properties of integrals. Let us emphasize some of them.

Lemma 8.
1. If F is the c.d.f. of X, then for any measurable function g: ℝ → ℝ,

  E g(X) = ∫_ℝ g(x) dF(x).

2. If X is discrete, i.e. P(X ∈ {x_i}_{i≥1}) = 1, then EX = Σ_{i≥1} x_i P(X = x_i).
3. If X: Ω → ℝᵏ has a density f(x) on ℝᵏ and g: ℝᵏ → ℝ, then E g(X) = ∫ g(x) f(x) dx.

Proof. All these properties follow by making the change of variables x = X(ω), or ω = X⁻¹(x), i.e.

  E g(X) = ∫_Ω g(X(ω)) dP(ω) = ∫_ℝ g(x) dP ∘ X⁻¹(x) = ∫_ℝ g(x) dP_X(x),

where P_X = P ∘ X⁻¹ is the law of X. Another way to see this would be to start with indicator functions of sets, g(x) = I(x ∈ B), for which

  E g(X) = P(X ∈ B) = P_X(B) = ∫_ℝ I(x ∈ B) dP_X(x)

and, therefore, the same is true for simple step functions g(x) = Σ_{i≤n} w_i I(x ∈ B_i) for disjoint B_i. By approximation, this is true for any measurable function. □
Section 3
Kolmogorov's Theorem about consistent distributions.

The notions of a general probability space (Ω, 𝒜, P) and of a random variable X: Ω → ℝ on this space are rather abstract, and often one is really interested in the law P_X of X on the sample space (ℝ, ℬ, P_X). One can always define a random variable with this law by taking X: ℝ → ℝ to be the identity X(x) = x. Similarly, one can define a random vector X = (X_1, . . . , X_k) on ℝᵏ by defining the distribution on the Borel σ-algebra ℬᵏ first. How can we define a distribution on an infinite dimensional space or, in other words, how can we define an infinite family of random variables

  (X_t)_{t∈T} ∈ ℝᵀ = ∏_{t∈T} ℝ_t = {f : T → ℝ}

for some infinite set T? Obviously, there are various ways to do that; for example, we can define explicitly X_t = cos(tU) for some random variable U. In this section we will consider the typical situation when we start by defining the distribution on any finite subset of coordinates, i.e. for any finite subset N ⊆ T the law P_N of (X_t)_{t∈N} on the Borel σ-algebra ℬ_N on ℝᴺ is given. Clearly, these laws must satisfy a natural consistency assumption: for any finite subsets N ⊆ M and any Borel set B ∈ ℬ_N,

  P_N(B) = P_M(B × ℝ^{M∖N}).   (3.0.1)

Then the problem is to define a sample space simultaneously for the entire family (X_t)_{t∈T}, i.e. we need to define a σ-algebra of measurable events in ℝᵀ and a probability measure P on it that agrees with our finite dimensional distributions P_N. At the very least, it should contain the events expressed in terms of a finite number of coordinates, i.e. the following algebra of sets on ℝᵀ:

  𝒜 = {B × ℝ^{T∖N} : B ∈ ℬ_N, N ⊆ T finite}.

(It is easy to check that 𝒜 is an algebra.) A set B × ℝ^{T∖N} is called a cylinder, and B is the base of the cylinder.
The probability P on such sets is of course defined by

  P(B × ℝ^{T∖N}) = P_N(B).

Notice that, by the consistency assumption, P is well defined. Given two finite subsets N_1, N_2 ⊆ T and B_1 ∈ ℬ_{N_1}, the same set can be represented as

  B_1 × ℝ^{T∖N_1} = (B_1 × ℝ^{(N_1∪N_2)∖N_1}) × ℝ^{T∖(N_1∪N_2)}.

However, by consistency, P will not depend on the representation. Let σ(𝒜) be the σ-algebra generated by the algebra 𝒜, i.e. the minimal σ-algebra that contains all cylinders.

Definition. 𝒜 is called the cylindrical algebra and σ(𝒜) is the cylindrical σ-algebra on ℝᵀ.

Example. If ℕ ⊆ T then {sup_{i≥1} X_i ≤ 1} is a measurable event in σ(𝒜).
Theorem 3 (Kolmogorov). For a consistent family of distributions (3.0.1), P can be uniquely extended to σ(𝒜).

Proof. To use the Caratheodory extension Theorem 1, we need to show that P is countably additive on 𝒜 or, equivalently, that it satisfies the continuity of measure property: given a sequence B_n ∈ 𝒜,

  B_n ⊇ B_{n+1}, ∩_{n≥1} B_n = ∅ ⇒ P(B_n) → 0.

We will prove that if there exists ε > 0 such that P(B_n) > ε for all n, then ∩_{n≥1} B_n ≠ ∅. We have B_n = C_n × ℝ^{T∖N_n} for a finite subset N_n of T and C_n ∈ ℬ_{N_n}. Since B_n ⊇ B_{n+1}, we can assume that N_n ⊆ N_{n+1}. First of all, by regularity of the measure P_{N_n} there exists a compact set K_n ⊆ C_n such that

  P_{N_n}(C_n ∖ K_n) ≤ ε/2^{n+1}.

We have

  (∩_{i≤n} C_i × ℝ^{T∖N_i}) ∖ (∩_{i≤n} K_i × ℝ^{T∖N_i}) ⊆ ∪_{i≤n} (C_i ∖ K_i) × ℝ^{T∖N_i}

and, therefore,

  P((∩_{i≤n} C_i × ℝ^{T∖N_i}) ∖ (∩_{i≤n} K_i × ℝ^{T∖N_i})) ≤ Σ_{i≤n} P((C_i ∖ K_i) × ℝ^{T∖N_i}) ≤ Σ_{i≤n} ε/2^{i+1} ≤ ε/2.

Since P(B_n) = P(∩_{i≤n} C_i × ℝ^{T∖N_i}) > ε, this implies that

  P(∩_{i≤n} K_i × ℝ^{T∖N_i}) ≥ ε/2 > 0.

We can write

  ∩_{i≤n} K_i × ℝ^{T∖N_i} = (∩_{i≤n} K_i × ℝ^{N_n∖N_i}) × ℝ^{T∖N_n} = K′_n × ℝ^{T∖N_n},

where K′_n = ∩_{i≤n} (K_i × ℝ^{N_n∖N_i}) is compact in ℝ^{N_n}, since it is a closed subset of the compact K_n. We proved that

  P_{N_n}(K′_n) = P(K′_n × ℝ^{T∖N_n}) = P(∩_{i≤n} K_i × ℝ^{T∖N_i}) > 0

and, therefore, there exists a point

  xⁿ = (x₁ⁿ, . . . , x_{N_n}ⁿ, . . .) ∈ K′_n × ℝ^{T∖N_n}.

We also have the following inclusion property. For m > n,

  xᵐ ∈ K′_m × ℝ^{T∖N_m} ⊆ K′_n × ℝ^{T∖N_n}

and, therefore, (x₁ᵐ, . . . , x_{N_n}ᵐ) ∈ K′_n. Any sequence in a compact has a converging subsequence. Let {n¹_k}_{k≥1} be a subsequence such that

  (x₁^{n¹_k}, . . . , x_{N_1}^{n¹_k}) → (x₁, . . . , x_{N_1}) ∈ K′_1.

Then we can take a further subsequence {n²_k}_{k≥1} ⊆ {n¹_k}_{k≥1} such that

  (x₁^{n²_k}, . . . , x_{N_2}^{n²_k}) → (x₁, . . . , x_{N_2}) ∈ K′_2.

By iteration, we can find a subsequence {nᵐ_k}_{k≥1} ⊆ {n^{m−1}_k}_{k≥1} such that

  (x₁^{nᵐ_k}, . . . , x_{N_m}^{nᵐ_k}) → (x₁, . . . , x_{N_m}) ∈ K′_m.

Therefore, the point

  (x₁, x₂, . . .) ∈ ∩_{n≥1} K′_n × ℝ^{T∖N_n} ⊆ ∩_{n≥1} B_n,

so this last set is not empty. □
Section 4
Laws of Large Numbers.

Consider a r.v. X and a sequence of r.v.s (X_n)_{n≥1} on some probability space. We say that X_n converges to X in probability if for all ε > 0

  lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

We say that X_n converges to X almost surely, or with probability 1, if

  P(ω : lim_{n→∞} X_n(ω) = X(ω)) = 1.

Lemma 9 (Chebyshev's inequality). If a r.v. X ≥ 0, then for t > 0,

  P(X ≥ t) ≤ EX/t.

Proof.

  EX = E X I(X < t) + E X I(X ≥ t) ≥ E X I(X ≥ t) ≥ t E I(X ≥ t) = t P(X ≥ t). □
Theorem 4 (Weak law of large numbers). Consider a sequence of independent r.v.s (X_i)_{i≥1} that are centered, EX_i = 0, and have finite second moments, EX_i² ≤ K < ∞. Let X̄_n = (X_1 + ⋯ + X_n)/n. Then

  E X̄_n² = (1/n²) Σ_{i≤n} EX_i² ≤ K/n.
Then X̄_n → 0 in probability, since for 0 < ε, by Chebyshev's inequality,

  P(|X̄_n| ≥ ε) = P(X̄_n² ≥ ε²) ≤ E X̄_n²/ε² ≤ K/(nε²) → 0 as n → ∞.
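A small simulation illustrates the weak law and the Chebyshev rate K/(nε²). The choice X_i = ±1 and the deviation level below are ours:

```python
import random

rng = random.Random(2)

def fraction_deviating(n, eps, trials=2000):
    """Empirical estimate of P(|mean of n fair signs| >= eps)."""
    bad = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        bad += abs(s / n) >= eps
    return bad / trials

p_small_n = fraction_deviating(10, 0.5)
p_large_n = fraction_deviating(1000, 0.5)
# The deviation probability collapses as n grows; Chebyshev bounds it
# by K/(n eps^2) = 1/(n * 0.25) here, since E X_i^2 = 1.
```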
Strong law of large numbers. The following simple observation will be useful. If a random variable X ≥ 0, then EX = ∫₀^∞ P(X ≥ x) dx. Indeed,

  EX = ∫₀^∞ x dF(x) = ∫₀^∞ ∫₀^x 1 ds dF(x) = ∫₀^∞ ∫_s^∞ 1 dF(x) ds = ∫₀^∞ P(X ≥ s) ds.

For X ≥ 0 such that EX < ∞, this gives, in particular, Σ_{i≥1} P(X ≥ i) ≤ ∫₀^∞ P(X ≥ x) dx = EX < ∞.
and if k₀ = min{k : n(k) ≥ i}, with n(k) = 2ᵏ, then

  Σ_{k : n(k) ≥ i} 1/n(k)² = Σ_{k≥k₀} 2^{−2k} = 2^{−2k₀} (1 − 1/4)^{−1} ≤ (4/3) · 1/i² ≤ K/i².

We can continue,

  (∗) ≤ Σ_{i≥1} (1/i²) ∫₀^i x² dF(x) = Σ_{i≥1} (1/i²) Σ_{m<i} ∫_m^{m+1} x² dF(x)
     = Σ_{m≥0} (Σ_{i>m} 1/i²) ∫_m^{m+1} x² dF(x) ≤ Σ_{m≥0} (2/(m+1)) ∫_m^{m+1} x² dF(x)
     ≤ 2 Σ_{m≥0} ∫_m^{m+1} x dF(x) = 2 EX < ∞,

using Σ_{i>m} 1/i² ≤ 2/(m+1) and x²/(m+1) ≤ x on [m, m+1].
Section 5
Bernstein Polynomials. Hausdorff and de Finetti theorems.

Let us look at some applications related to the law of large numbers. Consider an i.i.d. sequence of real valued r.v.s (X_i) with distribution P_θ from a family of distributions parametrized by θ, such that

  EX_i = θ, σ²(θ) := Var(X_i) ≤ K < ∞.

Then for any bounded continuous u, E u(X̄_n) → u(θ) uniformly over the family. Indeed, for any δ > 0,

  |E u(X̄_n) − u(θ)| ≤ E |u(X̄_n) − u(θ)|
  = E |u(X̄_n) − u(θ)| (I(|X̄_n − θ| ≤ δ) + I(|X̄_n − θ| > δ))
  ≤ max_{|x−θ|≤δ} |u(x) − u(θ)| + 2 max_x |u(x)| · P(|X̄_n − θ| > δ)
  ≤ ω(δ) + 2 ‖u‖_∞ (1/δ²) E(X̄_n − θ)² ≤ ω(δ) + 2 ‖u‖_∞ K/(nδ²),

where ω(δ) is the modulus of continuity of u. Letting δ = δ_n → 0 so that nδ_n² → ∞ finishes the proof.
Example. Let (X_i) be i.i.d. with Bernoulli distribution B(θ) with probability of success θ ∈ [0,1], i.e.

  P(X_i = 1) = θ, P(X_i = 0) = 1 − θ,

and let u: [0,1] → ℝ be continuous. Then, by the above, the Bernstein polynomials

  B_n(θ) := E u(X̄_n) = Σ_{k=0}^n u(k/n) P(Σ_{i=1}^n X_i = k) = Σ_{k=0}^n u(k/n) (n choose k) θᵏ (1−θ)^{n−k} → u(θ)

uniformly on [0,1].

Example. Let (X_i) have Poisson distribution π(θ) with intensity parameter θ > 0 defined by

  P(X_i = k) = (θᵏ/k!) e^{−θ} for integer k ≥ 0.

Then it is well known (and easy to check) that EX_i = θ, σ²(θ) = θ, and the sum X_1 + ⋯ + X_n has Poisson distribution π(nθ). If u is bounded and continuous on [0, +∞), then

  E u(X̄_n) = Σ_{k=0}^∞ u(k/n) P(Σ_{i=1}^n X_i = k) = Σ_{k=0}^∞ u(k/n) ((nθ)ᵏ/k!) e^{−nθ} → u(θ)
uniformly on compact sets.

Moment problem. Consider a random variable X ∈ [0,1] and let μ_k = EXᵏ be its moments. Given a sequence (c_0, c_1, c_2, . . .), let us define a sequence of increments by Δc_k = c_{k+1} − c_k. Then

  −Δμ_k = μ_k − μ_{k+1} = E(Xᵏ − X^{k+1}) = EXᵏ(1 − X),

  (−Δ)(−Δμ_k) = (−1)² Δ²μ_k = EXᵏ(1−X) − EX^{k+1}(1−X) = EXᵏ(1−X)²,

and by induction

  (−1)ʳ Δʳ μ_k = EXᵏ(1−X)ʳ.

Clearly, (−1)ʳ Δʳ μ_k ≥ 0 since X ∈ [0,1]. If u is a continuous function on [0,1] and B_n is its corresponding Bernstein polynomial, then

  E B_n(X) = Σ_{k=0}^n u(k/n) (n choose k) EXᵏ(1−X)^{n−k} = Σ_{k=0}^n u(k/n) (n choose k) (−1)^{n−k} Δ^{n−k} μ_k.

Since B_n(x) converges uniformly to u(x), E B_n(X) converges to E u(X). Let us define

  p_k(n) = (n choose k) (−1)^{n−k} Δ^{n−k} μ_k ≥ 0, with Σ_{k=0}^n p_k(n) = 1 (take u = 1).

We can think of (p_k(n)) as the distribution of a r.v. X(n) such that

  P(X(n) = k/n) = p_k(n).   (5.0.1)

We showed that

  E B_n(X) = E u(X(n)) → E u(X)

for any continuous function u. We will later see that by definition this means that X(n) converges to X in distribution. Given the moments of a r.v. X, this construction allows us to approximate the distribution of X and the expectation of u(X).

Next, given a sequence (μ_k), when is it the sequence of moments of some [0,1]-valued r.v. X? By the above, it is necessary that

  μ_k ≥ 0, μ_0 = 1, and (−1)ʳ Δʳ μ_k ≥ 0 for all k, r.   (5.0.2)

It turns out that this is also sufficient.

Theorem 7 (Hausdorff). There exists a r.v. X ∈ [0,1] such that μ_k = EXᵏ iff (5.0.2) holds.

Proof. The idea of the proof is as follows. If μ_k are the moments of the distribution of some r.v. X, then the discrete distributions defined in (5.0.1) should approximate it. Therefore, our goal will be to show that condition (5.0.2) ensures that (p_k(n)) is indeed a distribution, and then show that the moments of (5.0.1) converge to μ_k. As a result, any limit of these distributions will be a candidate for the distribution of X.

First of all, let us express μ_k in terms of (p_k(n)). Since Δμ_k = μ_{k+1} − μ_k, we have the following inversion formula:

  μ_k = μ_{k+1} − Δμ_k = (μ_{k+2} − Δμ_{k+1}) − (Δμ_{k+1} − Δ²μ_k)
      = μ_{k+2} − 2Δμ_{k+1} + Δ²μ_k = ⋯ = Σ_{j=0}^r (r choose j) (−1)^{r−j} Δ^{r−j} μ_{k+j},
by induction. Take r = n − k. Then

  μ_k = Σ_{j=0}^{n−k} (n−k choose j) (−1)^{n−k−j} Δ^{n−k−j} μ_{k+j} = Σ_{j=0}^{n−k} [(n−k choose j)/(n choose k+j)] p_{k+j}(n).

We have

  (n−k choose j)/(n choose k+j) = [(n−k)!/(j!(n−k−j)!)] · [(k+j)!(n−k−j)!/n!] = (k+j)!(n−k)!/(j! n!) = (k+j choose k)/(n choose k),

so that

  μ_k = Σ_{j=0}^{n−k} [(k+j choose k)/(n choose k)] p_{k+j}(n) = Σ_{m=k}^{n} [(m choose k)/(n choose k)] p_m(n).

By (5.0.2), p_m(n) ≥ 0 and Σ_{m≤n} p_m(n) = μ_0 = 1, so we can consider a r.v. X(n) such that P(X(n) = m/n) = p_m(n) for 0 ≤ m ≤ n. We have

  μ_k = Σ_{m=k}^n [m(m−1)⋯(m−k+1)] / [n(n−1)⋯(n−k+1)] p_m(n)
      = Σ_{m=k}^n [(m/n)((m−1)/n)⋯((m−k+1)/n)] / [1·(1−1/n)⋯(1−(k−1)/n)] p_m(n),

which differs from

  Σ_{m=0}^n (m/n)ᵏ p_m(n) = E (X(n))ᵏ

by a quantity vanishing as n → ∞, so E(X(n))ᵏ → μ_k.

Any continuous function u can be approximated by (for example, Bernstein) polynomials, so the limit lim_{n→∞} E u(X(n)) exists. By the selection theorem that we will prove later in the course, one can choose a subsequence X(n_i) that converges to some r.v. X in distribution and, as a result,

  E (X(n_i))ᵏ → EXᵏ = μ_k,

which means that the μ_k are the moments of X. □
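The weights p_k(n) are directly computable from a moment sequence. A sketch, using the moments μ_k = 1/(k+1) of the uniform distribution on [0,1] as our own test case; for this law the weights come out exactly uniform, p_k(n) = 1/(n+1):

```python
import math

def p_weight(n, k, mu):
    """p_k(n) = C(n,k) (-1)^{n-k} Δ^{n-k} μ_k, where Δμ_k = μ_{k+1} - μ_k.
    Expanding the difference operator: (-1)^r Δ^r μ_k
    = sum_j C(r, j) (-1)^j μ_{k+j} with r = n - k."""
    r = n - k
    diff = sum((-1) ** j * math.comb(r, j) * mu(k + j) for j in range(r + 1))
    return math.comb(n, k) * diff

n = 20
mu = lambda k: 1.0 / (k + 1)  # moments of Uniform[0, 1]
probs = [p_weight(n, k, mu) for k in range(n + 1)]
total = sum(probs)                                      # should be μ_0 = 1
first_moment = sum((k / n) * probs[k] for k in range(n + 1))  # ≈ μ_1 = 1/2
```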
de Finetti's theorem. Consider an exchangeable sequence X_1, X_2, . . . , X_n, . . . of Bernoulli random variables, which means that for any n ≥ 1 the probability

  P(X_1 = x_1, . . . , X_n = x_n)

depends only on x_1 + ⋯ + x_n, i.e. it does not depend on the order of the 1s and 0s. Another way to say this is that for any n ≥ 1 and any permutation π of 1, . . . , n, the distribution of (X_{π(1)}, . . . , X_{π(n)}) does not depend on π. Then the following holds.

Theorem 8 (de Finetti). There exists a distribution F on [0,1] such that

  p_k := P(X_1 + ⋯ + X_n = k) = (n choose k) ∫₀¹ xᵏ (1−x)^{n−k} dF(x).

This means that to generate such an exchangeable sequence we can first pick x ∈ [0,1] from the distribution F and then generate a sequence of i.i.d. Bernoulli random variables with probability of success x.

Proof. Let μ_0 = 1 and for k ≥ 1 define

  μ_k = P(X_1 = 1, . . . , X_k = 1).   (5.0.3)
We have

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0) = P(X_1 = 1, . . . , X_k = 1) − P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 1)
  = μ_k − μ_{k+1} = −Δμ_k.

Next, using exchangeability,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, X_{k+2} = 0)
  = P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0) − P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, X_{k+2} = 1)
  = −Δμ_k − (−Δμ_{k+1}) = Δ²μ_k.

Similarly, by induction,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, . . . , X_n = 0) = (−1)^{n−k} Δ^{n−k} μ_k ≥ 0.

By the Hausdorff theorem, μ_k = EXᵏ for some r.v. X ∈ [0,1] with distribution F and, therefore,

  P(X_1 = 1, . . . , X_k = 1, X_{k+1} = 0, . . . , X_n = 0) = (−1)^{n−k} Δ^{n−k} μ_k
  = EXᵏ(1−X)^{n−k} = ∫₀¹ xᵏ (1−x)^{n−k} dF(x).

Since, by exchangeability, changing the order of the 1s and 0s does not affect the probability, we get

  P(X_1 + ⋯ + X_n = k) = (n choose k) ∫₀¹ xᵏ (1−x)^{n−k} dF(x). □
Example (Polya urn model). Suppose we have b blue and r red balls in the urn. We pick a ball randomly and return it together with c balls of the same color (see Figure 5.1). Consider the r.v.s

  X_i = 1 if the i-th ball picked is blue, and X_i = 0 otherwise.

Figure 5.1: Polya urn model.

The X_i's are not independent but exchangeable. For example,

  P(bbr) = (b/(b+r)) · ((b+c)/(b+r+c)) · (r/(b+r+2c)),
  P(brb) = (b/(b+r)) · (r/(b+r+c)) · ((b+c)/(b+r+2c))

are equal. To identify the distribution F in de Finetti's theorem, let us look at its moments μ_k in (5.0.3):

  μ_k = P(b⋯b, k times) = (b/(b+r)) · ((b+c)/(b+r+c)) ⋯ ((b+(k−1)c)/(b+r+(k−1)c)).

One can recognize, or easily check, that the μ_k are the moments of the Beta(α, β) distribution with density

  (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1−x)^{β−1}
on [0,1], with parameters α = b/c, β = r/c. By de Finetti's theorem, we can generate the X_i's by first picking x from the distribution Beta(b/c, r/c) and then generating i.i.d. Bernoulli (X_i)'s with probability of success x. By the strong law of large numbers, the proportion of blue balls in the first n repetitions will converge to this probability of success x, i.e. in the limit it will be random, with a Beta distribution. This example will come up once more when we talk about convergence of martingales.
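A simulation of the Polya urn illustrates the random limiting fraction. With b = r = c the de Finetti mixing distribution is Beta(1,1), i.e. uniform on [0,1]; the parameter values and run lengths below are our own choices:

```python
import random

def polya_fraction(b, r, c, n, rng):
    """Run n draws of the Polya urn (start with b blue, r red; add c
    balls of the drawn color each time); return the fraction of blue draws."""
    blue_drawn = 0
    for _ in range(n):
        if rng.random() < b / (b + r):
            b += c
            blue_drawn += 1
        else:
            r += c
    return blue_drawn / n

rng = random.Random(3)
fractions = [polya_fraction(1, 1, 1, 400, rng) for _ in range(2000)]
mean = sum(fractions) / len(fractions)  # ≈ 1/2, the mean of Beta(1, 1)
# The limit is genuinely random: the fractions spread out uniformly.
frac_below_quarter = sum(f < 0.25 for f in fractions) / len(fractions)
```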
Section 6
0 - 1 Laws. Convergence of random series.

Consider a sequence (X_i)_{i≥1} of real valued independent random variables and let σ((X_i)_{i≥1}) be the σ-algebra of events generated by this sequence, i.e. the events {(X_i)_{i≥1} ∈ B} for B in the cylindrical σ-algebra on ℝ^ℕ.

Definition. An event A ∈ σ((X_i)_{i≥1}) is called a tail event if A ∈ σ((X_i)_{i≥n}) for all n ≥ 1.

For example, if A_i ∈ σ(X_i), then

  {A_i i.o.} = ∩_{n≥1} ∪_{i≥n} A_i
is a tail event ("i.o." stands for "infinitely often"). It turns out that such events have probability 0 or 1.

Theorem 9 (Kolmogorov's 0-1 law). If A is a tail event, then P(A) = 0 or 1.

Proof. For a finite subset F = {i_1, . . . , i_n} ⊆ ℕ, let us denote X_F = (X_{i_1}, . . . , X_{i_n}). The σ-algebra σ((X_i)_{i≥1}) is generated by the algebra

  {X_F ∈ B : F ⊆ ℕ finite, B ∈ ℬ(ℝ^{|F|})}.

By the approximation lemma, we can approximate any event A ∈ σ((X_i)_{i≥1}) by events in this generating algebra. Therefore, for any ε > 0 there exists a set A′ in this algebra such that P(A △ A′) ≤ ε, and by definition A′ ∈ σ(X_1, . . . , X_n) for large enough n. This implies

  |P(A) − P(A′)| ≤ ε, |P(A) − P(A ∩ A′)| ≤ ε.

Since A is a tail event, A ∈ σ((X_i)_{i≥n+1}), which means that A and A′ are independent, i.e. P(A ∩ A′) = P(A)P(A′). We get

  P(A) ≈ P(A ∩ A′) = P(A)P(A′) ≈ P(A)P(A),

with both approximations accurate to within ε, and letting ε → 0 proves that P(A) = P(A)², so P(A) ∈ {0, 1}. □
Examples. 1. {Σ_{i≥1} X_i converges} is a tail event, so it has probability 0 or 1.
2. Consider the series Σ_{i≥1} X_i zⁱ on the complex plane, z ∈ ℂ. Its radius of convergence is

  r = 1 / limsup_{i→∞} |X_i|^{1/i}.

For any x ≥ 0, the event {r ≤ x} is, obviously, a tail event. This implies that r = const with probability 1.
The Savage-Hewitt 0 - 1 law. Next we will prove a stronger result under the more restrictive assumption that the r.v.s X_i, i ≥ 1, are not only independent but also identically distributed with law μ. Without loss of generality, we can assume that each X_i is given by the identity X_i(x) = x on its sample space (ℝ, ℬ, μ). By Kolmogorov's consistency theorem, the entire sequence (X_i)_{i≥1} can be defined on the sample space (ℝ^ℕ, ℬ_∞, P), where ℬ_∞ is the cylindrical σ-algebra and P is the measure guaranteed by the Caratheodory extension theorem. In our case the X_i's are i.i.d. and P = μ^ℕ is called the infinite product measure. It will be convenient to use the notation σ((X_i)_{i≥1}) for the cylindrical σ-algebra, since similar notation can be used for the cylindrical σ-algebra on any subset of coordinates.

Definition. An event A ∈ σ((X_i)_{i≥1}) is called exchangeable/symmetric if for all n ≥ 1,

  (x_1, x_2, . . . , x_n, x_{n+1}, . . .) ∈ A ⇒ (x_n, x_2, . . . , x_{n−1}, x_1, x_{n+1}, . . .) ∈ A.

In other words, the set A is symmetric under permutations of a finite number of coordinates. Note that any tail event is symmetric.

Theorem 10 (Savage-Hewitt 0-1 law). If A is symmetric, then P(A) = 0 or 1.

Proof. Given a sequence x = (x_1, x_2, . . .), let us define an operator

  Tx = (x_{n+1}, . . . , x_{2n}, x_1, . . . , x_n, x_{2n+1}, . . .)

that switches the first n coordinates with the second n coordinates. Since A is symmetric,

  TA = {Tx : x ∈ A} = A.

By the Approximation Lemma 2, for any ε > 0 and large enough n there exists A_n ∈ σ(X_1, . . . , X_n) such that P(A_n △ A) ≤ ε. Clearly,

  B_n := TA_n ∈ σ(X_{n+1}, . . . , X_{2n}),

and since the X_i's are i.i.d.,

  P(B_n △ A) = P(T(A_n △ A)) = P(A_n △ A) ≤ ε,

which implies that P((A_n ∩ B_n) △ A) ≤ 2ε. Therefore, we can conclude that, to within a multiple of ε,

  P(A) ≈ P(A_n), P(A) ≈ P(A_n ∩ B_n) = P(A_n)P(B_n) = P(A_n)²,

where we used the fact that the events A_n, B_n are defined in terms of different sets of coordinates and, thus, are independent. Letting ε → 0 implies that P(A) = P(A)². □
Example. Let S_n = X_1 + ⋯ + X_n and let

  r = limsup_{n→∞} (S_n − a_n)/b_n

for deterministic sequences (a_n), (b_n). The event {r ≤ x} is symmetric, since changing the order of any finite set of coordinates does not affect S_n for large enough n. As a result, P(r ≤ x) = 0 or 1, which implies that r = const with probability 1.

Random series. We already saw above that, by Kolmogorov's 0-1 law, the series Σ_{i≥1} X_i for independent (X_i)_{i≥1} converges with probability 0 or 1. This means that either S_n = X_1 + ⋯ + X_n converges to its limit S with probability one, or with probability one it does not converge. Two sections back, before the proof of the strong law of large numbers, we saw the example of a sequence which with probability one does not converge yet converges to 0 in probability. In the case when S_n does not converge with probability one, is it still possible that it converges to some random variable in probability? The answer is no, because we will now prove that for random series convergence in probability implies a.s. convergence.
22
-
8/13/2019 mit notes on theory of probability
28/125
| |
Theorem 11 (Kolmogorov's inequality). Suppose that (X_i)_{i≥1} are independent and S_n = X_1 + ⋯ + X_n. If for all j ≤ n,

  P(|S_n − S_j| ≥ a) ≤ p < 1,   (6.0.1)

then

  P(max_{1≤j≤n} |S_j| ≥ x) ≤ (1/(1−p)) P(|S_n| ≥ x − a).

Proof. First of all, let us notice that this inequality is obvious without the maximum, because (6.0.1) is equivalent to 1 − p ≤ P(|S_n − S_j| < a) and we can write

  (1−p) P(|S_j| ≥ x) ≤ P(|S_n − S_j| < a) P(|S_j| ≥ x) = P(|S_n − S_j| < a, |S_j| ≥ x) ≤ P(|S_n| ≥ x − a).

The equality is true because the events {|S_j| ≥ x} and {|S_n − S_j| < a} are independent, since the first depends only on X_1, . . . , X_j and the second only on X_{j+1}, . . . , X_n. The last inequality is true simply by the triangle inequality. To deal with the maximum, instead of looking at an arbitrary partial sum S_j we will look at the first partial sum that crosses level x. We define that first time by τ = min{j ≤ n : |S_j| ≥ x}, and let τ = n + 1 if all |S_j| < x. Then

  (1−p) P(τ ≤ n) = Σ_{j≤n} (1−p) P(τ = j) ≤ Σ_{j≤n} P(τ = j, |S_n − S_j| < a)
                 ≤ Σ_{j≤n} P(τ = j, |S_n| ≥ x − a) ≤ P(|S_n| ≥ x − a),

and notice that τ ≤ n is equivalent to max_{j≤n} |S_j| ≥ x. □
Theorem 12 (Kolmogorov). If the series Σ_{i≥1} X_i converges in probability, then it converges almost surely.

Proof. Suppose that the partial sums S_n converge to some r.v. S in probability, i.e. for any ε > 0, for large enough n ≥ n₀(ε) we have P(|S_n − S| ≥ ε) ≤ ε. If k ≥ j ≥ n ≥ n₀(ε), then

  P(|S_k − S_j| ≥ 2ε) ≤ P(|S_k − S| ≥ ε) + P(|S_j − S| ≥ ε) ≤ 2ε.

Next, we use Kolmogorov's inequality for x = 4ε and a = 2ε (we let the partial sums start at n):

  P(max_{n≤j≤k} |S_j − S_n| ≥ 4ε) ≤ (1/(1−2ε)) P(|S_k − S_n| ≥ 2ε) ≤ 2ε/(1−2ε) ≤ 3ε

for small ε. The events {max_{n≤j≤k} |S_j − S_n| ≥ 4ε} are increasing as k → ∞ and, by continuity of measure,

  P(max_{n≤j} |S_j − S_n| ≥ 4ε) ≤ 3ε.

Finally, since P(|S_n − S| ≥ ε) ≤ ε, we get

  P(max_{n≤j} |S_j − S| ≥ 5ε) ≤ 4ε.
This kind of maximal statement about any sequence S_j is actually equivalent to its a.s. convergence. To see this, take ε = 1/m², take n(m) = n₀(ε) and consider the event

  A_m = {max_{n(m)≤j} |S_j − S| ≥ 5/m²}.

We proved that P(A_m) ≤ 4/m², so Σ_m P(A_m) < ∞ and, by the Borel-Cantelli lemma, with probability one only finitely many of the events A_m occur; hence S_j → S almost surely. □
Example. Consider the random series Σ_{i≥1} ε_i/i, where P(ε_i = ±1) = 1/2. We have

  Σ_{i≥1} E(ε_i/i)² = Σ_{i≥1} 1/i² < ∞,

so the series converges a.s.
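A simulation of this series shows the partial sums settling down, as a.s. convergence predicts. A sketch; the run lengths below are our choice:

```python
import random

def random_harmonic_partial_sums(n, rng):
    """Partial sums S_k of sum_i eps_i / i with independent signs eps_i = ±1."""
    s, out = 0.0, []
    for i in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / i
        out.append(s)
    return out

rng = random.Random(4)
path = random_harmonic_partial_sums(100_000, rng)
# After the 50,000th term, the remaining variance is about 1/50000,
# so the tail of the path barely moves.
tail_osc = max(path[50_000:]) - min(path[50_000:])
```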
Section 7
Stopping times, Wald's identity. Another proof of SLLN.

Consider a sequence (X_i)_{i≥1} of independent r.v.s and an integer valued random variable V ∈ {1, 2, . . .}. We say that V is independent of the future if {V ≤ n} is independent of σ((X_i)_{i≥n+1}). We say that V is a stopping time (Markov time) if {V ≤ n} ∈ σ(X_1, . . . , X_n) for all n. Clearly, a stopping time is independent of the future. An example of a stopping time is V = min{k ≥ 1 : S_k ≥ 1}.

Suppose that V is independent of the future. We can write

  ES_V = Σ_{k≥1} E S_V I(V = k) = Σ_{k≥1} E S_k I(V = k)
       = Σ_{k≥1} Σ_{n≤k} E X_n I(V = k) ⁽*⁾= Σ_{n≥1} Σ_{k≥n} E X_n I(V = k) = Σ_{n≥1} E X_n I(V ≥ n).

In (*) we can interchange the order of summation if, for example, the double sequence is absolutely summable, by the Fubini-Tonelli theorem. Since V is independent of the future, the event {V ≥ n} = {V ≤ n−1}ᶜ is independent of σ(X_n) and we get

  ES_V = Σ_{n≥1} EX_n P(V ≥ n).   (7.0.1)

This implies the following.

Theorem 14 (Wald's identity). If (X_i)_{i≥1} are i.i.d., E|X_1| < ∞ and EV < ∞, then ES_V = EX_1 · EV.
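Wald's identity is easy to check by simulation. A sketch with our own illustrative choice of a fair six-sided die and the stopping time V = min{k : S_k ≥ 20}:

```python
import random

rng = random.Random(5)

def one_run():
    """Roll a fair die until the running total reaches 20.
    Return (S_V, V): the stopped sum and the stopping time."""
    s, v = 0, 0
    while s < 20:
        s += rng.randint(1, 6)
        v += 1
    return s, v

n = 50_000
sums, times = zip(*(one_run() for _ in range(n)))
mean_sv = sum(sums) / n
mean_v = sum(times) / n
ratio = mean_sv / mean_v  # Wald: E S_V / E V = E X_1 = 3.5
```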
Theorem 15. If (X_i)_{i≥1} are i.i.d. and V is a stopping time, then the shifted sequence (X_{V+k})_{k≥1} is independent of (V, X_1, . . . , X_V) and

  (X_{V+k})_{k≥1} =ᵈ (X_k)_{k≥1},

where "=ᵈ" means equality in distribution.

Proof. Given a subset N ⊆ ℕ and sequences (B_i) and (C_i) of Borel sets on ℝ, define the events

  A = {V ∈ N, X_1 ∈ B_1, . . . , X_V ∈ B_V}

and, for any k ≥ 1,

  D = {X_{V+1} ∈ C_1, . . . , X_{V+k} ∈ C_k}.

We have

  P(D ∩ A) = Σ_{n≥1} P(D ∩ A ∩ {V = n}) = Σ_{n≥1} P(D_n ∩ A ∩ {V = n}),

where D_n = {X_{n+1} ∈ C_1, . . . , X_{n+k} ∈ C_k}. The intersection of events is

  A ∩ {V = n} = {V = n, X_1 ∈ B_1, . . . , X_n ∈ B_n} if n ∈ N, and ∅ otherwise.

Since V is a stopping time, {V = n} ∈ σ(X_1, . . . , X_n) and A ∩ {V = n} ∈ σ(X_1, . . . , X_n). On the other hand, D_n ∈ σ(X_{n+1}, . . .) and, as a result,

  P(D ∩ A) = Σ_{n≥1} P(D_n) P(A ∩ {V = n}) = Σ_{n≥1} P(D_0) P(A ∩ {V = n}) = P(D_0) P(A),

where D_0 = {X_1 ∈ C_1, . . . , X_k ∈ C_k}, and this finishes the proof. □

Remark. One could be a little bit more careful when talking about the events generated by a vector (V, X_1, . . . , X_V) that has random length. In the proof we implicitly assumed that such events are generated by events

  A = {V ∈ N, X_1 ∈ B_1, . . . , X_V ∈ B_V},

which is a rather intuitive definition. However, one could be more formal and define the σ-algebra of events generated by (V, X_1, . . . , X_V) as the events A such that A ∩ {V ≤ n} ∈ σ(X_1, . . . , X_n) for any n ≥ 1. This means that when V ≤ n the event A is expressed only in terms of X_1, . . . , X_n. It is easy to check that with this more formal definition the proof remains exactly the same.
Let us give one interesting application of the Markov property and Wald's identity that will yield another proof of the strong law of large numbers.

Theorem 16. Suppose that (X_i)_{i≥1} are i.i.d. such that EX_1 > 0. If Z = inf_{n≥1} S_n, then P(Z > −∞) = 1. (Partial sums cannot drift down to −∞ if EX_1 > 0; of course, this is obvious by SLLN.)

Proof. Let us define (see Figure 7.1)

  τ_1 = min{k ≥ 1 : S_k ≥ 1}, Z_1 = min_{k≤τ_1} S_k, S_k^{(2)} = S_{τ_1+k} − S_{τ_1},
  τ_2 = min{k ≥ 1 : S_k^{(2)} ≥ 1}, Z_2 = min_{k≤τ_2} S_k^{(2)}, S_k^{(3)} = S^{(2)}_{τ_2+k} − S^{(2)}_{τ_2}.

By induction,

  τ_n = min{k ≥ 1 : S_k^{(n)} ≥ 1}, Z_n = min_{k≤τ_n} S_k^{(n)}, S_k^{(n+1)} = S^{(n)}_{τ_n+k} − S^{(n)}_{τ_n}.

Z_1, . . . , Z_n, . . . are i.i.d. by the Markov property.
0
1
1
!1
!2z2
0z1
Figure7.1:Asequenceofstoppingtimes.Noticethat,byconstruction,S1+ +n1 n1and
Z= inf Sk =inf{Z1, S1 +Z2, S1+2 +Z3,...}.k1
Wehave, {Z N}= {S1+...+k1 +Zk N} {k1 +Zk N}.k1 k1
Therefore,P(Z N) P(k1 +Zk N) = P(Zk Nk+1)
k1 k1= P(Z1 Nk+ 1)= P(Z1 j)NP(Z1 j) | | 0
k1 jN jNifwecanshowthatE|Z1|with probability one. This means that for all n 1, Sn +n M > for some large enough M.Dividingbothsidesbynand lettingn weget
Snliminf
n nwith probability one. We can then let 0 over some sequence. Similarly, we prove that limsupSkk 0withprobabilityone.
Section 8. Convergence of Laws. Selection Theorem.
In this section we will begin the discussion of weak convergence of distributions on metric spaces. Let (S, d) be a metric space with a metric d. Consider a measurable space (S, B) with Borel σ-algebra B generated by open sets, and let (Pn)_{n≥1} and P be probability distributions on B. We define
Cb(S) = {f : S → R continuous and bounded}.
We say that Pn → P weakly if
∫ f dPn → ∫ f dP for all f ∈ Cb(S).
Theorem 18 If S = R then Pn → P weakly iff
Fn(t) = Pn((−∞, t]) → F(t) = P((−∞, t])
for any point of continuity t of F.
Proof. (⟹) Let us approximate an indicator function by continuous functions as in figure 8.1, i.e.
φ1(x) ≤ I(x ≤ t) ≤ φ2(x), φ1, φ2 ∈ Cb(R).
For convenience of notations, instead of writing integrals w.r.t. Pn we will write expectations of a r.v. Xn
Figure 8.1: Approximating the indicator.
with distribution Pn.
P(X ≤ t − ε) ≤ Eφ1(X) ← Eφ1(Xn) ≤ Fn(t) = P(Xn ≤ t) ≤ Eφ2(Xn) → Eφ2(X) ≤ P(X ≤ t + ε)
as n → ∞. Therefore, for any ε > 0,
F(t − ε) ≤ liminf_n Fn(t) ≤ limsup_n Fn(t) ≤ F(t + ε).
Since t is a point of continuity of F, letting ε → 0 proves the result.
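A concrete example of the equivalence just proved (an added sketch, not from the notes): with Xn = X + 1/n and X ~ Bernoulli(1/2), the c.d.f.s converge at every continuity point of F, but fail to converge exactly at the jump t = 0.

```python
def shifted_bernoulli_cdf(t, shift=0.0, p=0.5):
    # c.d.f. of X + shift, X ~ Bernoulli(p); it jumps at shift and 1 + shift.
    if t < shift:
        return 0.0
    if t < 1.0 + shift:
        return 1.0 - p
    return 1.0

def F(t):          # law of X
    return shifted_bernoulli_cdf(t)

def Fn(t, n):      # law of X_n = X + 1/n, which converges weakly to X
    return shifted_bernoulli_cdf(t, shift=1.0 / n)

# At the continuity point t = 0.5: F_n(0.5) -> F(0.5) = 0.5.
vals_cont = [Fn(0.5, n) for n in (1, 10, 1000)]
# At the jump t = 0: F_n(0) = 0 for every n, but F(0) = 0.5, so pointwise
# convergence fails exactly at the discontinuity point of F.
vals_jump = [Fn(0.0, n) for n in (1, 10, 1000)]
print(vals_cont, vals_jump)
```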
(⟸) Let PC(F) be the set of points of continuity of F. Since F is monotone, the set PC(F) is dense in R. Take M large enough such that both −M, M ∈ PC(F) and P([−M, M]^c) ≤ ε. Clearly, for large enough k we have Pk([−M, M]^c) ≤ 2ε. For any n > 1, take a sequence of points
−M = x_{1n} ≤ x_{2n} ≤ · · · ≤ x_{nn} = M
such that all x_{in} ∈ PC(F) and max_i |x_{i+1,n} − x_{in}| → 0 as n → ∞. Given a function f ∈ Cb(R), consider an approximating function
fn(x) = Σ_i f(x_{in}) I(x ∈ (x_{i−1,n}, x_{in}]) + 0 · I(x ∉ [−M, M]).
F(x) is a c.d.f. on Rk (exercise). The fact that Pn are uniformly tight ensures that F(x) → 0 or 1 if all xi → −∞ or +∞. Let x be a point of continuity of F(x) and let a, b ∈ A such that ai < xi < bi for all i. We have,
F(a) ← F_{n(k)}(a) ≤ F_{n(k)}(x) ≤ F_{n(k)}(b) → F(b)
as k → ∞. Since x is a point of continuity and A is dense,
F(a) ↑ F(x) as a ↑ x, F(b) ↓ F(x) as b ↓ x,
and this proves that F_{n(k)}(x) → F(x) for all such x. Similarly to the one-dimensional case one can show that for any f ∈ Cb(Rk),
∫ f dF_{n(k)} → ∫ f dF.
f dFn(k) fdF.Proof of Theorem 19. If K is a compact then Cb(K) =C(K). Later in these lectures, when we deal inmoredetailwithconvergenceongeneralmetricspaces,wewillprovethefollowingfactwhichiswell-knownandisaconsequenceoftheStone-Weierstrasstheorem.
Fact.C(K)
is
separable
w.r.t.
norm||f|| =supxK|f(x|.
Even though we are proving Selection theorem for a general metric space, right now we are mostlyinterested in thecase S =Rk wherethis fact is asimple consequenceof the Weierstrasstheoremthat anycontinuousfunctioncanbeapproximatedbypolynomials.
Since Pn are uniformly tight, for any r ≥ 1 we can find a compact Kr such that Pn(Kr) > 1 − 1/r. Let Cr ⊆ C(Kr) be a countable and dense subset of C(Kr). By Cantor's diagonalization argument there exists a subsequence (n(k)) such that P_{n(k)}(f) converges for all f ∈ Cr for all r ≥ 1. Since Cr is dense in C(Kr) this implies that P_{n(k)}(f) converges for all f ∈ C(Kr) for all r ≥ 1. Next, for any f ∈ Cb(S),
|∫ f dP_{n(k)} − ∫_{Kr} f dP_{n(k)}| ≤ ∫_{Kr^c} |f| dP_{n(k)} ≤ ||f||∞ P_{n(k)}(Kr^c) ≤ ||f||∞/r.
This implies that the limit
I(f) := lim_k ∫ f dP_{n(k)}   (8.0.1)
exists. The question is why this limit is an integral over some probability measure P. On each of the compacts Kr we could use Riesz's representation theorem for continuous functionals on C(Kr) and then extend this representation to the union of Kr. Instead, we will prove this as a consequence of a more general result, the Stone-Daniell theorem from measure theory, which says the following.
A family of functions L = {f : S → R} is called a vector lattice if f, g ∈ L ⟹ cf + g ∈ L for c ∈ R and f ∨ g, f ∧ g ∈ L.
A functional I : L → R is called a pre-integral if
1. I(cf + g) = cI(f) + I(g),
2. f ≥ 0 ⟹ I(f) ≥ 0,
3. fn ↓ 0, sup_n ||fn||∞ < ∞ ⟹ I(fn) → 0.
On any compact Kr, fn ↓ 0 uniformly, i.e. δ_{n,r} = ||fn||_{∞,Kr} → 0 as n → ∞. Since
|∫ fn dP_{n(k)}| ≤ ∫_{Kr} |fn| dP_{n(k)} + ∫_{Kr^c} |fn| dP_{n(k)} ≤ δ_{n,r} + (1/r)||f1||∞,
we get
|I(fn)| = lim_k |∫ fn dP_{n(k)}| ≤ δ_{n,r} + (1/r)||f1||∞.
Letting n → ∞ and r → ∞ we get that I(fn) → 0. By the Stone-Daniell theorem,
I(f) = ∫ f dP
for some measure P on σ(Cb(S)). The choice of f = 1 gives I(f) = 1 = P(S), which means that P is a probability measure. Finally, let us show that σ(Cb(S)) = B, the Borel σ-algebra generated by open sets. Since any f ∈ Cb(S) is measurable on B we get σ(Cb(S)) ⊆ B. On the other hand, let F ⊆ S be any closed set and take the function f(x) = min(1, d(x, F)). We have |f(x) − f(y)| ≤ d(x, y), so f ∈ Cb(S) and
f^{−1}({0}) ∈ σ(Cb(S)).
However, since F is closed, f^{−1}({0}) = {x : d(x, F) = 0} = F, and this proves that B ⊆ σ(Cb(S)).
Theorem 21 If Pn converges weakly to P on Rk then (Pn)_{n≥1} is uniformly tight.
Proof. For any ε > 0 there exists large enough M > 0 such that P(|x| > M) ≤ ε. Take a continuous function φ with I(|x| > 2M) ≤ φ(x) ≤ I(|x| > M). Then
∫ φ(x) dPn → ∫ φ(x) dP ≤ P(|x| > M) ≤ ε.
For n large enough, n ≥ n0, we get Pn(|x| > 2M) ≤ 2ε. For n < n0 choose Mn so that Pn(|x| > Mn) ≤ 2ε. Take M′ = max{M1, . . . , M_{n0−1}, 2M}. As a result, Pn(|x| > M′) ≤ 2ε for all n ≥ 1.
Lemma 13 If for any sequence (n(k))_{k≥1} there exists a subsequence (n(k(r)))_{r≥1} such that P_{n(k(r))} → P weakly, then Pn → P weakly.
Proof. Suppose not. Then for some f ∈ Cb(S) and for some ε > 0 there exists a subsequence (n(k)) such that
|∫ f dP_{n(k)} − ∫ f dP| > ε.
But this contradicts the fact that for some subsequence P_{n(k(r))} → P weakly.
Consider r.v.s X and Xn on some probability space (Ω, A, P) with values in a metric space (S, d). Let P and Pn be their corresponding laws on Borel sets B in S. Convergence of Xn to X in probability and almost surely is defined exactly the same way as for S = R by replacing |Xn − X| with d(Xn, X).
Lemma 14 Xn → X in probability iff for any sequence (n(k)) there exists a subsequence (n(k(r))) such that X_{n(k(r))} → X a.s.
Proof. (⟸) Suppose Xn does not converge to X in probability. Then for small enough ε > 0 there exists a subsequence (n(k)) such that
P(d(X, X_{n(k)}) ≥ ε) ≥ ε.
This contradicts the existence of a subsequence X_{n(k(r))} that converges to X a.s.
(⟹) Given a subsequence (n(k)), let us choose (k(r)) so that
P(d(X_{n(k(r))}, X) ≥ 1/r) ≤ 1/r².
By Borel-Cantelli lemma, these events can occur i.o. with probability 0, which means that with probability one, for large enough r,
d(X_{n(k(r))}, X) ≤ 1/r,
i.e. X_{n(k(r))} → X a.s.
Lemma 15 If Xn → X in probability then Xn → X weakly.
Proof. By Lemma 14, for any subsequence (n(k)) there exists a subsequence (n(k(r))) such that X_{n(k(r))} → X a.s. Given f ∈ Cb(R), by the dominated convergence theorem,
Ef(X_{n(k(r))}) → Ef(X),
i.e. X_{n(k(r))} → X weakly. By Lemma 13, Xn → X weakly.
Section 9. Characteristic Functions. Central Limit Theorem on R.
Let X = (X1, . . . , Xk) be a random vector on Rk with distribution P and let t = (t1, . . . , tk) ∈ Rk. The characteristic function of X is defined by
f(t) = E e^{i(t,X)} = ∫ e^{i(t,x)} dP(x).
If X has standard normal distribution N(0, 1) and λ ∈ R then
E e^{λX} = (2π)^{−1/2} ∫ e^{λx − x²/2} dx = e^{λ²/2} (2π)^{−1/2} ∫ e^{−(x−λ)²/2} dx = e^{λ²/2}.
For complex λ = it, consider the analytic function
φ(x) = e^{itx − x²/2} for x ∈ C.
By Cauchy's theorem, the integral over a closed path is equal to 0. Let us take a closed path x + i0 for x from −∞ to +∞ and x + it for x from +∞ to −∞. Then
f(t) = (2π)^{−1/2} ∫ e^{itx − x²/2} dx = (2π)^{−1/2} ∫ e^{it(it+x) − (it+x)²/2} dx = e^{−t²/2} (2π)^{−1/2} ∫ e^{−x²/2} dx = e^{−t²/2}.   (9.0.1)
If Y has normal distribution N(m, σ²) then
E e^{itY} = E e^{it(m + σX)} = e^{itm − t²σ²/2}.
Lemma 16 If X is a real-valued r.v. such that E|X|^r < ∞ then f ∈ C^r(R).
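The formula E e^{itY} = e^{itm − t²σ²/2} is easy to check by Monte Carlo (an added illustration, not from the notes; the values m = 2, σ = 1.5, t = 0.7 are arbitrary):

```python
import cmath
import random

def empirical_cf(samples, t):
    # Monte Carlo estimate of the characteristic function E exp(itX).
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(1)
m, sigma, t = 2.0, 1.5, 0.7
samples = [rng.gauss(m, sigma) for _ in range(200_000)]

estimate = empirical_cf(samples, t)
exact = cmath.exp(1j * t * m - t * t * sigma * sigma / 2)  # e^{itm - t^2 sigma^2/2}
print(abs(estimate - exact))  # small: Monte Carlo error is O(1/sqrt(200000))
```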
by the dominated convergence theorem. This means that f ∈ C^r(R). If r = 1, E|X| < ∞ …
If P has density p then
P ∗ Q(A) = ∫∫ I(x + y ∈ A) p(x) dx dQ(y) = ∫∫ I(z ∈ A) p(z − y) dz dQ(y) = ∫ ∫_A p(z − y) dz dQ(y) = ∫_A ( ∫ p(z − y) dQ(y) ) dz,
which means that P ∗ Q has density
f(x) = ∫ p(x − y) dQ(y).   (9.0.2)
If, in addition, Q has density q then
f(x) = ∫ p(x − y) q(y) dy.
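For instance (an added sketch, not from the notes), the convolution formula reproduces the triangular density of the sum of two independent Uniform(0,1) variables; a midpoint-rule version of f(x) = ∫ p(x − y) q(y) dy checks it:

```python
def uniform_density(x):
    # Density of Uniform(0, 1).
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolution_density(x, p, q, grid=10_000):
    # Midpoint-rule approximation of f(x) = \int_0^1 p(x - y) q(y) dy.
    h = 1.0 / grid
    return sum(p(x - (k + 0.5) * h) * q((k + 0.5) * h) for k in range(grid)) * h

# The sum of two independent Uniform(0,1) variables has the triangular
# density f(x) = x on [0, 1] and f(x) = 2 - x on [1, 2].
for x, expected in [(0.3, 0.3), (1.0, 1.0), (1.6, 0.4)]:
    print(x, convolution_density(x, uniform_density, uniform_density), expected)
```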
Denote by N(0, σ²I) the law of the random vector X = (X1, . . . , Xk) of i.i.d. N(0, σ²) random variables, whose density on Rk is
∏_{i=1}^k (2πσ²)^{−1/2} e^{−x_i²/(2σ²)} = (2πσ²)^{−k/2} e^{−|x|²/(2σ²)}.
For a distribution P denote Pσ = P ∗ N(0, σ²I).
Lemma 18 Pσ = P ∗ N(0, σ²I) has density
pσ(x) = (2π)^{−k} ∫ f(t) e^{−i(t,x) − σ²|t|²/2} dt,
where f(t) = ∫ e^{i(t,x)} dP(x).
Proof. By (9.0.2), P ∗ N(0, σ²I) has density
pσ(x) = (2πσ²)^{−k/2} ∫ e^{−|x−y|²/(2σ²)} dP(y).
Using (9.0.1), we can write
e^{−(x_i−y_i)²/(2σ²)} = (2π)^{−1/2} ∫ e^{i σ^{−1}(x_i−y_i) z_i − z_i²/2} dz_i,
and taking the product over i ≤ k we get
e^{−|x−y|²/(2σ²)} = (2π)^{−k/2} ∫ e^{i σ^{−1}(x−y, z) − |z|²/2} dz.
Then we can continue
pσ(x) = (2πσ²)^{−k/2} (2π)^{−k/2} ∫∫ e^{i σ^{−1}(x−y, z) − |z|²/2} dz dP(y) = (2πσ)^{−k} ∫ f(−z/σ) e^{i σ^{−1}(x, z) − |z|²/2} dz.
Let z = σt.
Theorem 23 (Uniqueness) If
∫ e^{i(t,x)} dP(x) = ∫ e^{i(t,x)} dQ(x)
then P = Q.
Proof. By the above Lemma, Pσ = Qσ. If X ∼ P and ξ ∼ N(0, I) then X + σξ → X almost surely as σ → 0 and, therefore, Pσ → P weakly. Similarly, Qσ → Q.
We proved that the characteristic function of Sn/√n converges to the c.f. of N(0, σ²). Also, the sequence L(Sn/√n), n ≥ 1, is uniformly tight, since by Chebyshev's inequality
P(|Sn/√n| > M) ≤ σ²/M² < ε
for large enough M. To finish the proof of the CLT on the real line we apply the following.
Lemma 19 If (Pn) is uniformly tight and
fn(t) = ∫ e^{itx} dPn(x) → f(t)
then Pn → P and f(t) = ∫ e^{itx} dP(x).
Proof. For any sequence (n(k)), by the Selection Theorem, there exists a subsequence (n(k(r))) such that P_{n(k(r))} converges weakly to some distribution P. Since e^{i(t,x)} is bounded and continuous,
∫ e^{i(t,x)} dP_{n(k(r))} → ∫ e^{i(t,x)} dP(x)
as r → ∞ and, therefore, f is a c.f. of P. By the uniqueness theorem, the distribution P does not depend on the sequence (n(k)). By Lemma 13, Pn → P weakly.
Section 10. Multivariate normal distributions and CLT.
Let P be a probability distribution on Rk and let
g(t) = ∫ e^{i(t,x)} dP(x).
We proved that Pσ = P ∗ N(0, σ²I) has density
pσ(x) = (2π)^{−k} ∫ g(t) e^{−i(t,x) − σ²|t|²/2} dt.
Lemma 20 (Fourier inversion formula) If ∫ |g(t)| dt < ∞ …
It is now a simple exercise to show that for any bounded open set U,
∫_U dP(x) = ∫_U p(x) dx.
This means that P restricted to bounded sets has density p(x) and, hence, on the entire Rk.
For a random vector X = (X1, . . . , Xk) ∈ Rk we denote EX = (EX1, . . . , EXk).
Theorem 24 Consider a sequence (Xi)_{i≥1} of i.i.d. random vectors on Rk such that EX1 = 0 and E|X1|² < ∞. Then L(Sn/√n) converges weakly to a distribution P which has characteristic function
f(t) = e^{−(Ct,t)/2}, where C = Cov(X1).
For any set Γ ⊆ Rk we can write
P(Ag ∈ Γ) = P(g ∈ A^{−1}Γ) = ∫_{A^{−1}Γ} (2π)^{−k/2} exp(−|x|²/2) dx.
Let us now make the change of variables y = Ax, or x = A^{−1}y. Then
P(Ag ∈ Γ) = ∫_Γ (2π)^{−k/2} exp(−|A^{−1}y|²/2) (1/|det(A)|) dy.
But since
det(C) = det(AA^T) = det(A) det(A^T) = det(A)²,
we have |det(A)| = √(det(C)). Also
|A^{−1}y|² = (A^{−1}y)^T (A^{−1}y) = y^T (A^T)^{−1} A^{−1} y = y^T (AA^T)^{−1} y = y^T C^{−1} y.
Therefore, we get
P(Ag ∈ Γ) = ∫_Γ (2π)^{−k/2} (det C)^{−1/2} exp(−y^T C^{−1} y/2) dy.
This means that the distribution N(0, C) has the density
(2π)^{−k/2} (det C)^{−1/2} exp(−y^T C^{−1} y/2).
General case. Let us take, for example, the vector X = QD^{1/2}g for an i.i.d. standard normal vector g, so that X ∼ N(0, C). If q1, . . . , qk are the column vectors of Q then
X = QD^{1/2}g = (λ1^{1/2} g1) q1 + · · · + (λk^{1/2} gk) qk.
Therefore, in the orthonormal coordinate basis q1, . . . , qk the random vector X has coordinates λ1^{1/2} g1, . . . , λk^{1/2} gk. These coordinates are independent with normal distributions with variances λ1, . . . , λk correspondingly. When det(C) = 0, i.e. C is not invertible, some of its eigenvalues will be zero, say, λ_{n+1} = · · · = λk = 0. Then the random vector X will be concentrated on the subspace spanned by vectors q1, . . . , qn but it will not have density on the entire space Rk. On the subspace spanned by vectors q1, . . . , qn the vector X will have a density
f(x1, . . . , xn) = ∏_{i=1}^n (2πλi)^{−1/2} exp(−x_i²/(2λi)).
Let us look at a couple of properties of normal distributions.
Lemma 21 If X ∼ N(0, C) on Rk and A : Rk → Rm is linear then AX ∼ N(0, ACA^T) on Rm.
Proof. The c.f. of AX is
E e^{i(t,AX)} = E e^{i(A^T t, X)} = e^{−(CA^T t, A^T t)/2} = e^{−(ACA^T t, t)/2}.
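Lemma 21 can be sanity-checked by simulation (an added sketch; the matrices below are arbitrary choices): the sample covariance of AX should approach ACA^T.

```python
import random

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# X = Lg ~ N(0, C) with C = L L^T; then Y = AX should be N(0, A C A^T).
L = [[1.0, 0.0], [0.5, 1.0]]            # Cholesky-style factor, chosen by hand
C = matmul(L, transpose(L))             # [[1.0, 0.5], [0.5, 1.25]]
A = [[2.0, 1.0], [0.0, 3.0]]
target = matmul(matmul(A, C), transpose(A))

rng = random.Random(3)
n = 100_000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n):
    g = (rng.gauss(0, 1), rng.gauss(0, 1))
    x = (L[0][0] * g[0], L[1][0] * g[0] + L[1][1] * g[1])
    y = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    for i in range(2):
        for j in range(2):
            acc[i][j] += y[i] * y[j]
sample_cov = [[acc[i][j] / n for j in range(2)] for i in range(2)]
print(sample_cov, target)
```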
Lemma 22 X is normal on Rk iff (t, X) is normal on R for all t ∈ Rk.
Proof. (⟹) The c.f. of the real-valued random variable (t, X) is
f(λ) = E e^{iλ(t,X)} = E e^{i(λt,X)} = e^{−(Cλt,λt)/2} = e^{−λ²(Ct,t)/2},
which means that (t, X) ∼ N(0, (Ct, t)). (⟸) If (t, X) is normal then
E e^{i(t,X)} = e^{−(Ct,t)/2},
because the variance of (t, X) is (Ct, t).
Lemma 23 Let Z = (X, Y) where X = (X1, . . . , Xi) and Y = (Y1, . . . , Yj), and suppose that Z is normal on R^{i+j}. Then X and Y are independent iff Cov(Xm, Yn) = 0 for all m, n.
Proof. One way is obvious. The other way around, suppose that
C = Cov(Z) = ( D 0 ; 0 F ).
Then the c.f. of Z is
E e^{i(t,Z)} = e^{−(Ct,t)/2} = e^{−(Dt1,t1)/2 − (Ft2,t2)/2} = E e^{i(t1,X)} E e^{i(t2,Y)},
where t = (t1, t2). By uniqueness, X and Y are independent.
Lemma 24 (Continuous Mapping) Suppose that Pn → P on X and G : X → Y is a continuous map. Then Pn ∘ G^{−1} → P ∘ G^{−1} on Y. In other words, if r.v. Zn → Z weakly then G(Zn) → G(Z) weakly.
Proof. This is obvious, because for any f ∈ Cb(Y), we have f ∘ G ∈ Cb(X) and, therefore, Ef(G(Zn)) → Ef(G(Z)).
Lemma 25 If Pn → P on Rk and Qn → Q on Rm then Pn × Qn → P × Q on R^{k+m}.
Proof. By the Fubini theorem, the c.f.
∫ e^{i(t,x)} d(Pn × Qn)(x) = ∫ e^{i(t1,x1)} dPn ∫ e^{i(t2,x2)} dQn → ∫ e^{i(t1,x1)} dP ∫ e^{i(t2,x2)} dQ = ∫ e^{i(t,x)} d(P × Q).
By Lemma 19 it remains to show that (Pn × Qn) is uniformly tight. By Theorem 21, since Pn → P, (Pn) is uniformly tight. Therefore, there exists a compact K on Rk such that Pn(K) > 1 − ε. Similarly, for some compact K′ on Rm, Qn(K′) > 1 − ε. We have,
Pn × Qn(K × K′) > 1 − 2ε
and K × K′ is a compact on R^{k+m}.
Corollary 1 If Pn → P and Qn → Q both on Rk then Pn ∗ Qn → P ∗ Q.
Proof. Since the function G : R^{k+k} → Rk given by G(x, y) = x + y is continuous, by the continuous mapping lemma,
Pn ∗ Qn = (Pn × Qn) ∘ G^{−1} → (P × Q) ∘ G^{−1} = P ∗ Q.
Section 11. Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.
Instead of considering i.i.d. sequences, for each n ≥ 1 we will consider a vector (X1n, . . . , Xnn) of independent r.v.s, not necessarily identically distributed. This setting is called triangular arrays because the entire vector may change with n.
Theorem 25 Consider a vector (Xin)_{1≤i≤n} of independent r.v.s such that
EXin = 0, Var(Sn) = Σ_{i≤n} E(Xin)² = 1.
Suppose that the following Lindeberg's condition is satisfied:
Σ_{i=1}^n E(Xin)² I(|Xin| > ε) → 0 as n → ∞ for all ε > 0.   (11.0.1)
Then L(Σ_{i≤n} Xin) → N(0, 1).
Proof. First of all, L(Σ_{i≤n} Xin) is uniformly tight, because by Chebyshev's inequality
P(|Σ_{i≤n} Xin| > M) ≤ 1/M²
for large enough M. It remains to show that the characteristic function of Sn converges to e^{−t²/2}. For simplicity of notations let us omit the upper index n and write Xi instead of Xin. Since
E e^{itSn} = ∏_{i≤n} E e^{itXi},
it is enough to show that
log E e^{itSn} = Σ_{i≤n} log(1 + (E e^{itXi} − 1)) → −t²/2.   (11.0.2)
It is an easy exercise to prove, by induction on m, that for any a ∈ R,
|e^{ia} − Σ_{k≤m} (ia)^k/k!| ≤ |a|^{m+1}/(m + 1)!.   (11.0.3)
(Just integrate this inequality to make the induction step.) Using this for m = 1,
|E e^{itXi} − 1| = |E e^{itXi} − 1 − it EXi| ≤ (t²/2) EXi² ≤ (t²/2)(ε² + EXi² I(|Xi| > ε)) ≤ t²ε²   (11.0.4)
for large n by (11.0.1) and for small enough ε. Using the expansion of log(1 + z) it is easy to check that
|log(1 + z) − z| ≤ |z|² for |z| ≤ 1/2,
and, therefore, we can write
|Σ_{i≤n} log(1 + (E e^{itXi} − 1)) − Σ_{i≤n} (E e^{itXi} − 1)| ≤ Σ_{i≤n} |E e^{itXi} − 1|² ≤ (t⁴/4) Σ_{i≤n} (EXi²)² ≤ (t⁴/4) max_{i≤n} EXi² Σ_{i≤n} EXi² = (t⁴/4) max_{i≤n} EXi² → 0,
because, as in (11.0.4),
EXi² ≤ ε² + EXi² I(|Xi| > ε) → 0
for large n by (11.0.1) and for ε → 0. Finally, to show (11.0.2) it remains to show that
Σ_{i≤n} (E e^{itXi} − 1) → −t²/2.
Using (11.0.3) for m = 1, on the event |Xi| > ε,
|e^{itXi} − 1 − itXi| I(|Xi| > ε) ≤ (t²Xi²/2) I(|Xi| > ε),
and, therefore,
|e^{itXi} − 1 − itXi + t²Xi²/2| I(|Xi| > ε) ≤ t²Xi² I(|Xi| > ε).
Using (11.0.3) for m = 2, on the event |Xi| ≤ ε,
|e^{itXi} − 1 − itXi + t²Xi²/2| I(|Xi| ≤ ε) ≤ (|t|³|Xi|³/6) I(|Xi| ≤ ε) ≤ (|t|³ε/6) Xi².
Combining the last two bounds and using that EXi = 0,
|E e^{itXi} − 1 + (t²/2) EXi²| ≤ t² EXi² I(|Xi| > ε) + (|t|³ε/6) EXi².
Finally, since Σ_{i≤n} EXi² = 1,
|Σ_{i≤n} (E e^{itXi} − 1) + t²/2| ≤ t² Σ_{i≤n} EXi² I(|Xi| > ε) + |t|³ε/6 → 0
as n → ∞ (using Lindeberg's condition) and ε → 0.
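A tiny worked check of the Lindeberg condition (an added illustration): for the triangular array Xin = ξi/√n with ξi = ±1, each |Xin| = 1/√n, so the truncated second moments vanish identically once 1/√n ≤ ε.

```python
import math

def lindeberg_sum(n, eps):
    # For the array X_i^n = xi_i / sqrt(n), xi_i = +/-1: EX_i^n = 0 and the
    # variances sum to 1.  The term E (X_i^n)^2 I(|X_i^n| > eps) equals 1/n
    # when 1/sqrt(n) > eps and 0 otherwise, so the Lindeberg sum is 1 or 0.
    term = (1.0 / n) if 1.0 / math.sqrt(n) > eps else 0.0
    return n * term

eps = 0.1
sums = [lindeberg_sum(n, eps) for n in (10, 50, 100, 101, 1000)]
print(sums)  # drops to 0 once 1/sqrt(n) <= eps, i.e. n >= 100
```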
Lemma 26 If P, Q are distributions on R such that P ∗ Q = P then Q({0}) = 1.
Proof. Let us define
fP(t) = ∫ e^{itx} dP(x), fQ(t) = ∫ e^{itx} dQ(x).
The condition P ∗ Q = P implies that fP(t) fQ(t) = fP(t). Since fP(0) = 1 and fP(t) is continuous, for small enough |t| ≤ δ we have |fP(t)| > 0 and, as a result, fQ(t) = 1. Since
fQ(t) = ∫ cos(tx) dQ(x) + i ∫ sin(tx) dQ(x),
for |t| ≤ δ this implies that ∫ cos(tx) dQ(x) = 1, and since cos(s) ≤ 1 this can happen only if
Q({x : xt = 0 mod 2π}) = 1 for all |t| ≤ δ.
Take s, t such that |s|, |t| ≤ δ and s/t is irrational. For x to be in the support of Q we must have xs = 2πk and xt = 2πm for some integers k, m. This can happen only if x = 0.
Theorem 26 (Levy's equivalence) If (Xi) is a sequence of independent r.v. then Σ_{i≥1} Xi converges a.s. iff in probability iff in law.
Proof. We already proved (a Kolmogorov's theorem) that convergence in probability implies a.s. convergence. It remains to prove only that convergence in law implies convergence in probability.
Suppose that L(Sn) → P. Convergence in law implies that {L(Sn)} is uniformly tight, which easily implies that {L(Sn − Sk)}_{n,k≥1} is uniformly tight. This will imply that for any ε > 0
P(|Sn − Sk| > ε) < ε   (11.0.5)
for n ≥ k ≥ N for large enough N. Suppose not. Then there exist ε > 0 and sequences (n(l)) and (n′(l)) such that n(l) ≥ n′(l) and
P(|S_{n(l)} − S_{n′(l)}| > ε) ≥ ε.
Let us denote Yl = S_{n(l)} − S_{n′(l)}. Since {L(Yl)} is uniformly tight, by the selection theorem there exists a subsequence (l(r)) such that L(Y_{l(r)}) → Q. Since
S_{n(l(r))} = S_{n′(l(r))} + Y_{l(r)} and L(S_{n(l(r))}) = L(S_{n′(l(r))}) ∗ L(Y_{l(r)}),
letting r → ∞ we get that P = P ∗ Q. By the above Lemma, Q({0}) = 1, which implies that P(|Y_{l(r)}| > ε) < ε for large r — a contradiction. Once (11.0.5) is proved, by the Borel-Cantelli lemma we can choose an a.s. converging subsequence as in Kolmogorov's theorem, and then by (11.0.5) Sn converges in probability to the same limit.
Theorem 27 (Three series theorem) Let (Xi)_{i≥1} be a sequence of independent r.v. and let Zi = Xi I(|Xi| ≤ 1). Then Σ_{i≥1} Xi converges iff
1. Σ_{i≥1} P(|Xi| > 1) < ∞,
2. Σ_{i≥1} EZi converges,
3. Σ_{i≥1} Var(Zi) < ∞.
Proof. (⟸) By 1 and the Borel-Cantelli lemma, P({Xi ≠ Zi} i.o.) = 0, which means that Σ_{i≥1} Xi converges iff Σ_{i≥1} Zi converges. By 2, it is enough to show that Σ_{i≥1} (Zi − EZi) converges, but this follows from Theorem 13 by 3.
(⟹) If Σ_{i≥1} Xi converges a.s., then P({|Xi| > 1} i.o.) = 0, and since (Xi) are independent, by Borel-Cantelli, Σ_{i≥1} P(|Xi| > 1) < ∞. … for any ε > 0 for m, n large enough.
Suppose that Σ_{i≥1} Var(Zi) = ∞. Then
σ²_{mn} = Var(Smn) = Σ_{m≤k≤n} Var(Zk) → ∞
as n → ∞ for any fixed m. Intuitively, this should not happen: Smn → 0 in probability but their variance goes to infinity. In principle, one can construct such a sequence of random variables, but in our case it will be ruled out by Lindeberg's CLT. Because σ_{mn} → ∞, Lindeberg's theorem will imply that
Tmn = (Smn − ESmn)/σ_{mn} = Σ_{m≤k≤n} (Zk − EZk)/σ_{mn} → N(0, 1)
if m, n → ∞ and σ²_{mn} → ∞. We only need to check that
(1/σ²_{mn}) Σ_{m≤k≤n} E(Zk − EZk)² I(|Zk − EZk| > εσ_{mn}) → 0
as m, n, n − m → ∞. Since |Zk − EZk| ≤ 2 …
Section 12. Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.
Let us start with the following bound.
Lemma 27 Let X be a real-valued r.v. with distribution P and let
f(t) = E e^{itX} = ∫ e^{itx} dP(x).
Then,
P(|X| > 1/u) ≤ (7/u) ∫_0^u (1 − Re f(t)) dt.
Proof. Since
Re f(t) = ∫ cos(tx) dP(x),
we have
(1/u) ∫_0^u ∫_R (1 − cos(tx)) dP(x) dt = ∫_R (1/u) ∫_0^u (1 − cos(tx)) dt dP(x) = ∫_R (1 − sin(xu)/(xu)) dP(x)
≥ ∫_{|xu|≥1} (1 − sin(xu)/(xu)) dP(x)   (since sin(y)/y ≤ 1)
≥ (1 − sin 1) ∫_{|xu|≥1} dP(x) ≥ (1/7) P(|X| ≥ 1/u).
Theorem 28 (Levy continuity) Let (Xn) be a sequence of r.v. on Rk. Suppose that
fn(t) = E e^{i(t,Xn)} → f(t)
and f(t) is continuous at 0 along each axis. Then there exists a probability distribution P such that
f(t) = ∫ e^{i(t,x)} dP(x)
and L(Xn) → P.
Proof. By Lemma 19 we only need to show that {L(Xn)} is uniformly tight. If we denote Xn = (Xn,1, . . . , Xn,k) then the c.f.s along the i-th coordinate satisfy
fn^i(ti) := fn(0, . . . , ti, 0, . . . , 0) = E e^{i ti Xn,i} → f(0, . . . , ti, . . . , 0) =: f^i(ti).
Since fn(0) = 1 and, therefore, f(0) = 1, for any ε > 0 we can find δ > 0 such that for all i ≤ k
|f^i(ti) − 1| ≤ ε² if |ti| ≤ δ.
This implies that for large enough n
|fn^i(ti) − 1| ≤ 2ε² if |ti| ≤ δ.
Using the previous Lemma,
P(|Xn,i| > 1/δ) ≤ (7/δ) ∫_0^δ (1 − Re fn^i(ti)) dti ≤ (7/δ) ∫_0^δ |1 − fn^i(ti)| dti ≤ 14ε².
The union bound implies that
P(|Xn| > k/δ) ≤ 14kε²
and {L(Xn)}_{n≥1} is uniformly tight.
CLT describes how sums of independent r.v.s are approximated by the normal distribution. We will now give a simple example of a different approximation. Consider independent Bernoulli random variables Xin ∼ B(pin) for i ≤ n, i.e. P(Xin = 1) = pin and P(Xin = 0) = 1 − pin. If pin = p > 0 then by CLT
(Sn − np)/√(np(1 − p)) → N(0, 1).
However, if pin = pn → 0 fast enough then, for example, the Lindeberg conditions will be violated. It is well-known that if pin = pn and npn → λ then Sn has approximately the Poisson distribution Π_λ with p.f.
f(k) = (λ^k/k!) e^{−λ} for k = 0, 1, 2, . . .
Here is a version of this result.
Theorem 29 Consider independent Xi ∼ B(pi) for i ≤ n and let
Sn = X1 + · · · + Xn and λ = p1 + · · · + pn.
Then for any subset of integers B ⊆ Z,
|P(Sn ∈ B) − Π_λ(B)| ≤ Σ_{i≤n} pi².
Proof. The proof is based on a construction on one probability space. Let us construct a Bernoulli r.v. Xi ∼ B(pi) and a Poisson r.v. X′i ∼ Π_{pi} on the same probability space as follows. Let us consider the probability space ([0, 1], B, λ) with Lebesgue measure. Define
Xi = Xi(x) = 0 for 0 ≤ x ≤ 1 − pi, and 1 for 1 − pi < x ≤ 1.
Clearly, Xi ∼ B(pi). Let us construct X′i as follows. If for k ≥ 0 we define
ck = Σ_{0≤l≤k} (pi^l/l!) e^{−pi},
then
X′i = X′i(x) = 0 for 0 ≤ x ≤ c0, 1 for c0 < x ≤ c1, 2 for c1 < x ≤ c2, . . .
Clearly, X′i ∼ Π_{pi}. When is Xi ≠ X′i? Since 1 − pi ≤ e^{−pi} = c0, this can only happen for 1 − pi < x ≤ c0 and c1 < x ≤ 1, i.e.
P(Xi ≠ X′i) = (e^{−pi} − (1 − pi)) + (1 − e^{−pi} − pi e^{−pi}) = pi(1 − e^{−pi}) ≤ pi².
We construct the pairs (Xi, X′i) on separate coordinates of a product space, thus making them independent for i ≤ n. Then S′n = Σ_{i≤n} X′i ∼ Π_λ and, finally, we get
|P(Sn ∈ B) − Π_λ(B)| ≤ P(Sn ≠ S′n) ≤ Σ_{j≤n} P(Xj ≠ X′j) ≤ Σ_{j≤n} pj².
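Theorem 29 can be checked numerically (an added sketch; the pi values are arbitrary): the exact distribution of Sn is a small convolution, and its total variation distance to the Poisson law stays below Σ pi².

```python
import math

def bernoulli_sum_pmf(ps):
    # Exact p.m.f. of S = X_1 + ... + X_n, X_i ~ Bernoulli(p_i) independent,
    # built by convolving one Bernoulli factor at a time.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

ps = [0.02, 0.05, 0.01, 0.03]
lam = sum(ps)
pmf = bernoulli_sum_pmf(ps)
# sup_B |P(S in B) - Poisson(B)| equals half the l1 distance of the p.m.f.s.
tv = 0.5 * sum(abs((pmf[k] if k < len(pmf) else 0.0) - poisson_pmf(lam, k))
               for k in range(30))
bound = sum(p * p for p in ps)
print(tv, bound)  # tv is below the bound sum p_i^2
```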
Conditional expectation. Let (Ω, B, P) be a probability space and X : Ω → R be a random variable such that E|X| < ∞ …
By definition Y = E(X|A).
2. (Uniqueness) Suppose there exists another version Y′ = E(X|A) such that P(Y ≠ Y′) > 0, i.e. P(Y > Y′) > 0 or P(Y < Y′) > 0. Since both Y, Y′ are measurable on A, the set A = {Y > Y′} ∈ A. On one hand, E(Y − Y′)IA > 0. On the other hand,
E(Y − Y′)IA = EXIA − EXIA = 0,
a contradiction.
3. E(cX + Y|A) = cE(X|A) + E(Y|A).
4. If σ-algebras C ⊆ A ⊆ B then
E(E(X|A)|C) = E(X|C).
Consider a set C ∈ C ⊆ A. Then
EIC(E(E(X|A)|C)) = EIC E(X|A) = EIC X and EIC(E(X|C)) = EXIC.
We conclude by uniqueness.
5. E(X|B) = X, E(X|{∅, Ω}) = EX, and E(X|A) = EX if X is independent of A.
6. If X ≤ Z then E(X|A) ≤ E(Z|A) a.s.; the proof is similar to the proof of uniqueness.
7. (Monotone convergence) If E|Xn| < ∞ …
We can assume that X, Y ≥ 0 by decomposing X = X⁺ − X⁻, Y = Y⁺ − Y⁻. Consider a sequence of simple functions
Yn = Σ_k wk ICk, Ck ∈ A,
measurable on A such that 0 ≤ Yn ↑ Y. By the monotone convergence theorem, it is enough to prove that
E(X ICk|A) = ICk E(X|A).
Take B ∈ A. Since B ∩ Ck ∈ A,
E IB ICk E(X|A) = E IB∩Ck E(X|A) = E X IB∩Ck = E (X ICk) IB.
10. (Jensen's inequality) If f : R → R is convex then
f(E(X|A)) ≤ E(f(X)|A).
By convexity, for a subgradient f′,
f(X) ≥ f(E(X|A)) + f′(E(X|A)) (X − E(X|A)).
Taking conditional expectations of both sides,
E(f(X)|A) ≥ f(E(X|A)) + f′(E(X|A)) (E(X|A) − E(X|A)) = f(E(X|A)).
Section 13. Martingales. Doob's Decomposition. Uniform Integrability.
Let (Ω, B, P) be a probability space and let (T, ≤) be a linearly ordered set. Consider a family of σ-algebras Bt, t ∈ T, such that for t ≤ u, Bt ⊆ Bu ⊆ B.
Definition. A family (Xt, Bt)_{t∈T} is called a martingale if
1. Xt : Ω → R is measurable w.r.t. Bt; in other words, Xt is adapted to Bt.
2. E|Xt| < ∞ …
Thus, (Zn, Bn)_{n≥1} is a right-closed martingale.
Lemma 28 Let f : R → R be a convex function. Suppose that either one of two conditions holds:
1. (Xt, Bt) is a martingale,
2. (Xt, Bt) is a submartingale and f is increasing.
Then (f(Xt), Bt) is a submartingale.
Proof. 1. For t ≤ u, by Jensen's inequality,
f(Xt) = f(E(Xu|Bt)) ≤ E(f(Xu)|Bt).
2. For t ≤ u, since Xt ≤ E(Xu|Bt) and f is increasing,
f(Xt) ≤ f(E(Xu|Bt)) ≤ E(f(Xu)|Bt),
where the last step is again Jensen's inequality.
Theorem 30 (Doob's decomposition) If (Xn, Bn)_{n≥0} is a submartingale then it can be uniquely decomposed as
Xn = Zn + Yn,
where (Yn, Bn) is a martingale, Z0 = 0, Zn ≤ Zn+1 almost surely and Zn is B_{n−1}-measurable.
Proof. Let Dn = Xn − X_{n−1} and
Gn = E(Dn|B_{n−1}) = E(Xn|B_{n−1}) − X_{n−1} ≥ 0
by the definition of submartingale. Let
Hn = Dn − Gn, Yn = H1 + · · · + Hn, Zn = G1 + · · · + Gn.
Since Gn ≥ 0 a.s., Zn ≤ Z_{n+1} and, by construction, Zn is B_{n−1}-measurable. We have
E(Hn|B_{n−1}) = E(Dn|B_{n−1}) − Gn = 0
and, therefore, E(Yn|B_{n−1}) = Y_{n−1}. Uniqueness follows by construction. Suppose that Xn = Zn + Yn with all stated properties. First, since Z0 = 0, Y0 = X0. By induction, given a unique decomposition up to n − 1, we can write
Zn = E(Zn|B_{n−1}) = E(Xn − Yn|B_{n−1}) = E(Xn|B_{n−1}) − Y_{n−1}
and Yn = Xn − Zn.
Definition. We say that (Xn)_{n≥1} is uniformly integrable if
sup_n E|Xn| < ∞ and sup_n E|Xn| I(|Xn| > M) → 0 as M → ∞.
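Theorem 30 can be made concrete (an added sketch): for Xn = Sn² with Sn a ±1 random walk, E(Xn − X_{n−1}|B_{n−1}) = 1, so the decomposition is Zn = n and Yn = Sn² − n.

```python
import random

def doob_decomposition(path, cond_increment_mean):
    # Doob decomposition X_n = Y_n + Z_n of a submartingale path (x_0,...,x_N):
    # Z_n = sum_{k<=n} E(X_k - X_{k-1} | B_{k-1}) is predictable and increasing,
    # Y_n = X_n - Z_n is the martingale part.
    zs, ys = [0.0], [path[0]]
    for k in range(1, len(path)):
        zs.append(zs[-1] + cond_increment_mean(path, k))
        ys.append(path[k] - zs[k])
    return ys, zs

# X_n = S_n^2 for a +/-1 walk: E(X_k - X_{k-1} | B_{k-1}) = 1, so Z_n = n.
rng = random.Random(4)
s, path = 0, [0.0]
for _ in range(50):
    s += rng.choice((-1, 1))
    path.append(float(s * s))
ys, zs = doob_decomposition(path, lambda xs, k: 1.0)
print(zs[-1], path[-1] - ys[-1])  # Z_50 = 50.0 and X_n - Y_n = Z_n
```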
Proof. 1. If Xn = E(Y|Bn) then |Xn| = |E(Y|Bn)| ≤ E(|Y| | Bn) and E|Xn| ≤ E|Y| < ∞. Since {|Xn| > M} ∈ Bn,
|Xn| I(|Xn| > M) ≤ I(|Xn| > M) E(|Y| | Bn) = E(|Y| I(|Xn| > M) | Bn)
and, therefore,
E|Xn| I(|Xn| > M) ≤ E|Y| I(|Xn| > M) ≤ K P(|Xn| > M) + E|Y| I(|Y| > K) ≤ (K/M) E|Xn| + E|Y| I(|Y| > K) ≤ (K/M) E|Y| + E|Y| I(|Y| > K).
Letting M → ∞ and then K → ∞ proves that sup_n E|Xn| I(|Xn| > M) → 0 as M → ∞.
2. Since (Xn, Bn)_{n≤τ} is a submartingale, for Y = Xτ we have Xn ≤ E(Y|Bn). Below we will use the following observation. Since the function max(a, x) is convex and increasing in x, by Jensen's inequality,
max(a, Xn) ≤ E(max(a, Y)|Bn).   (13.0.1)
Since max(Xn, a) ≤ |a| + Xn I(Xn > |a|) and {Xn > |a|} ∈ Bn, if we take M > |a| then
E|max(Xn, a)| I(|max(Xn, a)| > M) = E max(Xn, a) I(max(Xn, a) > M) ≤ E max(Y, a) I(max(Xn, a) > M)   (by (13.0.1))
≤ K P(max(Xn, a) > M) + E|Y| I(|Y| > K) ≤ (K/M) E max(Xn, 0) + E|Y| I(|Y| > K) ≤ (K/M) E max(Y, 0) + E|Y| I(|Y| > K)   (by (13.0.1)).
Letting M → ∞ and K → ∞ finishes the proof.
Uniform integrability plays an important role when studying the convergence of martingales. The following strengthening of the dominated convergence theorem will be useful.
Lemma 30 Consider r.v.s (Xn) and X such that E|Xn| < ∞ … Then
E|Xn − X| ≤ ε + 2K P(|Xn − X| > δ) + 2 sup_n E|Xn| I(|Xn| > K) + 2E|X| I(|X| > K).
Letting n → ∞ and then ε → 0, K → ∞ proves the result.
(1 ⟹ 2) By Chebyshev's inequality,
P(|Xn − X| > ε) ≤ (1/ε) E|Xn − X| → 0
as n → ∞, so Xn → X in probability. To prove uniform integrability let us first show that for any ε > 0 there exists δ > 0 such that
P(A) < δ ⟹ E|X| IA < ε.
Otherwise, for some ε > 0 one can find a sequence of events A(n) such that
P(A(n)) ≤ 2^{−n} and E|X| I_{A(n)} > ε.
Since Σ_{n≥1} P(A(n)) < ∞, by Borel-Cantelli and dominated convergence E|X| I_{A(n)} → 0, a contradiction. Take δ as above and take M > 0 large enough so that for all n ≥ 1,
P(|Xn| > M) ≤ E|Xn|/M ≤ δ.
Then
E|Xn| I(|Xn| > M) ≤ E|Xn − X| + E|X| I(|Xn| > M) ≤ E|Xn − X| + ε.
For large enough n ≥ n0, E|Xn − X| ≤ ε and, therefore, E|Xn| I(|Xn| > M) ≤ 2ε. We can also choose M large enough so that E|Xn| I(|Xn| > M) ≤ 2ε for n ≤ n0, and this finishes the proof.
Section 14. Optional stopping. Inequalities for martingales.
Consider a sequence of σ-algebras (Bn)_{n≥0} such that Bn ⊆ B_{n+1}. An integer-valued r.v. τ ∈ {1, 2, . . .} is called a stopping time if {τ ≤ n} ∈ Bn. Let us denote by Bτ the σ-algebra of the events B such that
{τ ≤ n} ∩ B ∈ Bn, n ≥ 1.
If (Xn) is adapted to (Bn) then random variables such as Xτ or Σ_{k=1}^τ Xk are measurable on Bτ. For example,
{Xτ ∈ A} = ∪_{n≥1} {τ = n} ∩ {Xn ∈ A} = ∪_{n≥1} ({τ ≤ n} \ {τ ≤ n − 1}) ∩ {Xn ∈ A} ∈ Bτ.
Theorem 31 (Optional stopping) Let (Xn, Bn) be a martingale and τ1 ≤ τ2 < ∞ …
The second condition in (14.0.1) is violated since P(τ2 = n) = 2^{−n} and E|Sn| I(τ2 ≥ n) does not go to 0.
Proof of Theorem 31. Consider a set A ∈ B_{τ1}. We have,
EX_{τ2} IA I(τ1 ≤ τ2) = Σ_{n≥1} EX_{τ2} I(A ∩ {τ1 = n}) I(n ≤ τ2) (∗) = Σ_{n≥1} EXn I(A ∩ {τ1 = n}) I(n ≤ τ2) = EX_{τ1} IA I(τ1 ≤ τ2).
To prove (∗) it is enough to prove that for An = A ∩ {τ1 = n} ∈ Bn,
EX_{τ2} I_{An} I(n ≤ τ2) = EXn I_{An} I(n ≤ τ2).   (14.0.2)
We can write
EXn I_{An} I(n ≤ τ2) = EXn I_{An} I(τ2 = n) + EXn I_{An} I(n + 1 ≤ τ2) = EX_{τ2} I_{An} I(τ2 = n) + EXn I_{An} I(n + 1 ≤ τ2).
Since {n + 1 ≤ τ2} = {τ2 ≤ n}^c ∈ Bn, by the martingale property this equals
EX_{τ2} I_{An} I(τ2 = n) + EX_{n+1} I_{An} I(n + 1 ≤ τ2),
and by induction
= Σ_{n≤k<m} EX_{τ2} I_{An} I(τ2 = k) + EXm I_{An} I(m ≤ τ2) …
On the event A, X_{τ1} ≥ M and, therefore,
EXn IA = EX_{τ2} IA ≥ EX_{τ1} IA ≥ M EIA = M P(A).
On the other hand, EXn IA ≤ EXn⁺, and this finishes the proof.
As a corollary we obtain the second Kolmogorov's inequality. If (Xi) are independent and EXi = 0 then Sn = Σ_{1≤i≤n} Xi is a martingale and Sn² is a submartingale. Therefore,
P( max_{1≤k≤n} |Sk| ≥ M ) = P( max_{1≤k≤n} Sk² ≥ M² ) ≤ ESn²/M² = (1/M²) Σ_{1≤k≤n} Var(Xk).
Exercises.
1. Show that for any random variable Y, E|Y|^p = ∫_0^∞ p t^{p−1} P(|Y| ≥ t) dt.
2. Let X, Y be two non-negative random variables such that for every t > 0, P(Y ≥ t) ≤ t^{−1} ∫ X I(Y ≥ t) dP. For any p > 1, ||f||_p = (∫ |f|^p dP)^{1/p} and 1/p + 1/q = 1, show that ||Y||_p ≤ q ||X||_p.
3. Given a non-negative submartingale (Xn, Bn), let Xn* := max_{j≤n} Xj and X* := max_{j≥1} Xj. Prove that for any p > 1 and 1/p + 1/q = 1, ||X*||_p ≤ q sup_n ||Xn||_p. Hint: use exercise 2 and Doob's maximal inequality.
Doob's upcrossing inequality. Let (Xn, Bn)_{n≥1} be a submartingale. Given two real numbers a < b we will define a sequence of stopping times (τn) when Xn is crossing a downward and b upward as in figure 14.1. Namely, we define
Figure 14.1: Stopping times of level crossings.
τ1 = min{n ≥ 1 : Xn ≤ a}, τ2 = min{n > τ1 : Xn ≥ b},
and, by induction, for k ≥ 2,
τ_{2k−1} = min{n > τ_{2k−2} : Xn ≤ a}, τ_{2k} = min{n > τ_{2k−1} : Xn ≥ b}.
Define
β(a, b, n) = max{k : τ_{2k} ≤ n},
the number of upward crossings of [a, b] before time n.
Theorem 33 (Doob's upcrossing inequality) We have,
E β(a, b, n) ≤ E(Xn − a)⁺/(b − a).   (14.0.4)
Proof. Since x → (x − a)⁺ is an increasing convex function, Zn = (Xn − a)⁺ is also a submartingale. Clearly,
β_X(a, b, n) = β_Z(0, b − a, n),
which means that it is enough to prove (14.0.4) for nonnegative submartingales. From now on we can assume that 0 ≤ Xn and we would like to show that
E β(0, b, n) ≤ EXn/b.
Let us define a sequence of r.v.s
χj = 1 if τ_{2k−1} < j ≤ τ_{2k} for some k, and χj = 0 otherwise,
i.e. χj is the indicator of the event that at time j the process is crossing [0, b] upward. Define X0 = 0. Then
b β(0, b, n) ≤ Σ_{j=1}^n χj (Xj − X_{j−1}) = Σ_{j=1}^n I(χj = 1)(Xj − X_{j−1}).
The event
{χj = 1} = ∪_k {τ_{2k−1} < j ≤ τ_{2k}} = ∪_k ({τ_{2k−1} ≤ j − 1} ∩ {τ_{2k} ≤ j − 1}^c) ∈ B_{j−1},
i.e. the fact that at time j we are crossing upward is determined completely by the sequence up to time j − 1. Then
b E β(0, b, n) ≤ E Σ_{j=1}^n E( I(χj = 1)(Xj − X_{j−1}) | B_{j−1} ) = Σ_{j=1}^n E I(χj = 1)(E(Xj|B_{j−1}) − X_{j−1}) ≤ Σ_{j=1}^n E(Xj − X_{j−1}) = EXn,
where in the last inequality we used that (Xj, Bj) is a submartingale, E(Xj|B_{j−1}) ≥ X_{j−1}, which implies that
I(χj = 1)(E(Xj|B_{j−1}) − X_{j−1}) ≤ E(Xj|B_{j−1}) − X_{j−1}.
This finishes the proof.
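The upcrossing bound can be probed by simulation (an added sketch; the submartingale |Sn| for a ±1 walk and the levels a = 0, b = 3 are arbitrary choices). For this example the inequality is nearly tight, so the assertion below allows for Monte Carlo error.

```python
import random

def upcrossings(path, a, b):
    # Number of completed upward crossings of [a, b] by the path.
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            below = False
            count += 1
    return count

rng = random.Random(5)
a, b, n, trials = 0.0, 3.0, 200, 5000
tot_cross = tot_final = 0.0
for _ in range(trials):
    s, path = 0, []
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(abs(s))            # X_n = |S_n| is a submartingale
    tot_cross += upcrossings(path, a, b)
    tot_final += max(path[-1] - a, 0.0)
mean_cross = tot_cross / trials
bound = tot_final / trials / (b - a)    # estimate of E (X_n - a)^+ / (b - a)
print(mean_cross, bound)
```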
Section 15. Convergence of martingales. Fundamental Wald's identity.
We finally get to our main result about the convergence of martingales and submartingales.
Theorem 34 Let (Xn, Bn) …
By Doob's inequality, E β_Y(a, b, n) ≤ E(Yn − a)⁺/(b − a) = E(X1 − a)⁺/(b − a) < ∞ …
Corollary 2 A martingale (Xn, Bn) is right-closable iff it is uniformly integrable.
To prove this, apply case 3 above to (Xn) and (−Xn), which are both submartingales.
Theorem 35 (Levy's convergence) Let (Ω, B, P) be a probability space and X be a real-valued random variable on it. Given a sequence of σ-algebras
B1 ⊆ · · · ⊆ Bn ⊆ · · · ⊆ B∞ ⊆ B,
where B∞ = σ(∪_{n≥1} Bn) …
as M → ∞. Therefore, the limit Y = lim Yn exists and it is an easy exercise to show that Sn/bn → 0 (Kronecker's lemma).
3. (Polya urn scheme) Let us recall the Polya urn scheme from Section 5. Let us consider the sequence
Yn = #(blue balls after n iterations) / #(total after n iterations).
Yn is a martingale because, given that at step n the numbers of blue and red balls are b and r, the expected fraction of blue balls at step n + 1 will be
E(Y_{n+1}|Bn) = (b/(b + r)) (b + c)/(b + r + c) + (r/(b + r)) b/(b + r + c) = b/(b + r) = Yn.
Since Yn is bounded, by the martingale convergence theorem the limit Y = lim_n Yn exists. What is the distribution of Y? Let us consider the sequence
Xi = 1 if blue at step i, Xi = 0 if red at step i,
and let Sn = Σ_{i≤n} Xi. Clearly,
Yn = (b + Sn c)/(b + r + nc) ≈ Sn/n
as n → ∞ and, therefore, Sn/n → Y. The sequence (Xn) is exchangeable and by de Finetti's theorem in Section 5 we showed that
P(Sn = k) = (n choose k) ∫_0^1 x^k (1 − x)^{n−k} dμ_{b/c, r/c}(x).
For any function u ∈ C([0, 1]),
E u(Sn/n) = Σ_{k=0}^n u(k/n) (n choose k) ∫_0^1 x^k (1 − x)^{n−k} dμ_{b/c, r/c}(x) = ∫_0^1 Bn(x) dμ_{b/c, r/c}(x),
where Bn(x) is the Bernstein polynomial that approximates u(x) uniformly on [0, 1]. Therefore,
E u(Sn/n) → ∫_0^1 u(x) dμ_{b/c, r/c}(x),
which means that L(Sn/n) → μ_{b/c, r/c} = L(Y), i.e. the limit Y has the Beta distribution β(b/c, r/c).
Optional stopping for martingales revisited. Let τ be a stopping time. We would like to determine when EXτ = EX1. As we saw above in the case of two stopping times, some kind of integrability assumptions are necessary. In this simpler case, the necessary conditions are clear from the proof.
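The Pólya urn computation above is easy to simulate (an added sketch; b = 2, r = 3, c = 1, for which the limit Y should be Beta(2, 3) with mean 2/5):

```python
import random

def polya_fraction(b, r, c, n_draws, rng):
    # One run of the Polya urn: draw a ball uniformly, return it together
    # with c extra balls of the same color; report the final blue fraction.
    blue, total = b, b + r
    for _ in range(n_draws):
        if rng.random() < blue / total:
            blue += c
        total += c
    return blue / total

rng = random.Random(6)
fractions = [polya_fraction(2, 3, 1, 1000, rng) for _ in range(1000)]
mean = sum(fractions) / len(fractions)
spread = max(fractions) - min(fractions)
# The mean matches EY = b/(b+r) = 0.4, while the spread shows the limit is
# genuinely random (approximately Beta(2,3)), not a point mass.
print(mean, spread)
```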
Lemma 31 We have
EXτ = lim_n EXτ I(τ ≤ n) = EX1 iff lim_n EXn I(τ ≥ n) = 0.
Proof. We can write,
EXτ I(τ ≤ n) = Σ_{1≤k≤n} EXk I(τ = k) = Σ_{1≤k≤n} (EXk I(τ ≥ k) − EXk I(τ ≥ k + 1))
(since {τ ≥ k + 1} = {τ ≤ k}^c ∈ Bk)
= Σ_{1≤k≤n} (EXk I(τ ≥ k) − EX_{k+1} I(τ ≥ k + 1)) = EX1 − EX_{n+1} I(τ ≥ n + 1).
Example.Given0< p
by symmetry. Therefore,
E ch(λ)^{−τ} = 1/ch(λz) and E e^{−μτ} = 1/ch(z ch^{−1}(e^{μ}))
by the change of variables e^{−μ} = 1/ch(λ).
For more general stopping times the condition (15.0.1) might not be easy to check. We will now show another approach that is helpful to verify a fundamental Wald's identity. If P is the distribution of the Xi's, let Pλ be the distribution with Radon-Nikodym derivative w.r.t. P given by
dPλ/dP = e^{λx}/φ(λ),
where φ(λ) = E e^{λX1}. This is, indeed, a density since
∫ (e^{λx}/φ(λ)) dP = φ(λ)/φ(λ) = 1.
We will think of (Xn) as defined on the product space (R^∞, B, P^∞). For example, a set
{τ = n} ∈ σ(X1, . . . , Xn)
is a Borel set on R^n. We can write,
E (e^{λ(x1+···+xn)}/φ(λ)^n) I(τ ≤ n) …
Section 16. Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.
Let (S, d) be a metric space and B a Borel σ-algebra generated by open sets. Let us recall that Pn → P weakly on B if
∫ f dPn → ∫ f dP
for all f ∈ Cb(S), the real-valued bounded continuous functions on S.
For a set A ⊆ S, we denote by Ā the closure of A, by int A the interior of A, and by ∂A = Ā \ int A the boundary of A. A is called a continuity set of P if P(∂A) = 0.
Theorem 36 (Portmanteau theorem) The following are equivalent.
1. Pn → P weakly.
2. For any open set U ⊆ S, liminf_n Pn(U) ≥ P(U).
3. For any closed set F ⊆ S, limsup_n Pn(F) ≤ P(F).
4. For any continuity set A of P, lim_n Pn(A) = P(A).
Proof. 1 ⟹ 2. Let U be an open set and F = U^c. Consider a sequence of functions in Cb(S),
fm(s) = min(1, m d(s, F)),
such that fm(s) ↑ IU(s). (This is not necessarily true if U is not open.) Since Pn → P,
Pn(U) ≥ ∫ fm dPn → ∫ fm dP as n → ∞, and liminf_n Pn(U) ≥ ∫ fm dP.
Letting m → ∞, by the monotone convergence theorem,
liminf_n Pn(U) ≥ ∫ IU dP = P(U).
2 ⟺ 3. By taking complements.
2, 3 ⟹ 4. Since int A is open, Ā is closed and int A ⊆ A ⊆ Ā, by 2 and 3,
P(int A) ≤ liminf_n Pn(int A) ≤ limsup_n Pn(Ā) ≤ P(Ā).
If P(∂A) = 0 then P(Ā) = P(int A) = P(A) and, therefore, lim_n Pn(A) = P(A).
4 ⟹ 1. Consider f ∈ Cb(S) and let Fy = {s ∈ S : f(s) = y} be a level set of f. There exist at most countably many y such that P(Fy) > 0. Therefore, for any ε > 0 we can find a sequence a1 ≤ · · · ≤ aN such that
max(a_{k+1} − a_k) ≤ ε, P(F_{a_k}) = 0 for all k,
and the range of f is inside the interval (a1, aN). Let
Bk = {s ∈ S : a_k ≤ f(s) < a_{k+1}} and fε(s) = Σ_k a_k I(s ∈ Bk).
Since f is continuous, ∂Bk ⊆ F_{a_k} ∪ F_{a_{k+1}} and P(∂Bk) = 0. By 4,
∫ fε dPn = Σ_k a_k Pn(Bk) → Σ_k a_k P(Bk) = ∫ fε dP.
Since, by construction, |f(s) − fε(s)| ≤ ε, letting ε → 0 proves that ∫ f dPn → ∫ f dP.
Lipschitz functions. For a function f : S → R, let us define a Lipschitz semi-norm by
||f||L = sup_{x≠y} |f(x) − f(y)|/d(x, y).
Clearly, ||f||L = 0 iff f is constant, so ||f||L is not a norm. Let us define a bounded Lipschitz norm by
||f||BL = ||f||L + ||f||∞,
where ||f||∞ = sup_{s∈S} |f(s)|. Let BL(S, d) = {f : S → R : ||f||BL < ∞} …
Proof. Proof of 1. It is enough to consider $k = 2$. For specificity, take $\wedge$. Given $x, y \in S$, suppose that $f_1 \wedge f_2(x) \ge f_1 \wedge f_2(y) = f_1(y)$. Then
$$|f_1 \wedge f_2(y) - f_1 \wedge f_2(x)| = f_1 \wedge f_2(x) - f_1 \wedge f_2(y) \le \begin{cases} f_1(x) - f_1(y), & \text{if } f_1(x) \le f_2(x), \\ f_2(x) - f_2(y), & \text{otherwise,} \end{cases}$$
and in either case this is at most $\bigl(\|f_1\|_L \vee \|f_2\|_L\bigr)\, d(x,y)$. This finishes the proof of 1.

Proof of 2. First of all, obviously,
$$\|f_1 \wedge \cdots \wedge f_k\|_\infty \le \max_{1 \le i \le k}\|f_i\|_\infty.$$
Therefore, using 1,
$$\|f_1 \wedge \cdots \wedge f_k\|_{BL} \le \max_i \|f_i\|_\infty + \max_i \|f_i\|_L \le 2\max_i \|f_i\|_{BL}.$$
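A quick grid check of the factor-2 bound in part 2, with two illustrative functions on $[0,1]$ (a numerical sketch, not part of the notes):

```python
# Check || f ∧ g ||_BL <= 2 max(||f||_BL, ||g||_BL) for f(x) = x, g(x) = 1 - x on [0, 1].
grid = [i / 100 for i in range(101)]

def bl(f):
    sup = max(abs(f(x)) for x in grid)
    lip = max(abs(f(x) - f(y)) / (x - y)        # grid is increasing, so x > y here
              for i, x in enumerate(grid) for y in grid[:i])
    return lip + sup

f = lambda x: x
g = lambda x: 1 - x
fg_min = lambda x: min(f(x), g(x))              # the "tent" x ∧ (1 - x)
assert bl(fg_min) <= 2 * max(bl(f), bl(g))
print(round(bl(f), 6), round(bl(g), 6), round(bl(fg_min), 6))   # -> 2.0 2.0 1.5
```

Here the minimum actually has a smaller norm than either function; the lemma only guarantees the bound with the factor 2.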
Theorem 37 (Extension theorem) Given a set $A \subseteq S$ and a bounded Lipschitz function $f \in BL(A,d)$ on $A$, there exists an extension $h \in BL(S,d)$ such that $f = h$ on $A$ and $\|h\|_{BL} = \|f\|_{BL}$.

Proof. Let us first find an extension such that $\|h\|_L = \|f\|_L$. We will start by extending $f$ to one point $x \in S \setminus A$. The value $y = h(x)$ must satisfy
$$|y - f(s)| \le \|f\|_L\, d(x,s) \ \text{ for all } s \in A$$
or, equivalently,
$$\sup_{s \in A}\bigl(f(s) - \|f\|_L\, d(x,s)\bigr) \le y \le \inf_{s \in A}\bigl(f(s) + \|f\|_L\, d(x,s)\bigr).$$
Such $y$ exists iff for all $s_1, s_2 \in A$,
$$f(s_1) + \|f\|_L\, d(x,s_1) \ge f(s_2) - \|f\|_L\, d(x,s_2).$$
This inequality is satisfied because, by the triangle inequality,
$$f(s_2) - f(s_1) \le \|f\|_L\, d(s_1,s_2) \le \|f\|_L\bigl(d(s_1,x) + d(s_2,x)\bigr).$$
It remains to apply Zorn's lemma to show that $f$ can be extended to the entire $S$. Define order by inclusion:
$$f_1 \prec f_2 \ \text{ if } f_1 \text{ is defined on } A_1,\ f_2 \text{ on } A_2,\ A_1 \subseteq A_2,\ f_1 = f_2 \text{ on } A_1 \ \text{ and } \|f_1\|_L = \|f_2\|_L.$$
For any chain $\{f_\alpha\}$, $f = \bigcup_\alpha f_\alpha \succ f_\alpha$. By Zorn's lemma there exists a maximal element $h$. It is defined on the entire $S$ because, otherwise, we could extend it to one more point. To extend preserving the BL norm, take
$$h' = (h \wedge \|f\|_\infty) \vee (-\|f\|_\infty).$$
By part 1 of the previous lemma, it is easy to see that $\|h'\|_{BL} = \|f\|_{BL}$.
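The one-point extension step in the proof is constructive. A small sketch on a finite set $A \subseteq \mathbb{R}$ (the data below are illustrative; the "upper" extension $h(x) = \inf_{s \in A}(f(s) + \|f\|_L d(x,s))$ is one admissible choice of $y$):

```python
# Extend a Lipschitz function from a finite set A by taking the upper admissible value
# h(x) = min_{s in A} ( f(s) + L * d(x, s) ), where L = ||f||_L on A.

def lip_seminorm(A, f, d):
    return max(abs(f[a] - f[b]) / d(a, b) for a in A for b in A if a != b)

def extend(A, f, d, x):
    L = lip_seminorm(A, f, d)
    return min(f[s] + L * d(x, s) for s in A)

d = lambda a, b: abs(a - b)            # S = R with the usual metric
A = [0.0, 1.0, 3.0]
f = {0.0: 0.0, 1.0: 2.0, 3.0: 1.0}    # here ||f||_L = 2, attained between 0 and 1

assert all(extend(A, f, d, s) == f[s] for s in A)   # h agrees with f on A
print(extend(A, f, d, 0.5), extend(A, f, d, 2.0))   # -> 1.0 3.0
```

On $A$ itself the minimum is attained at the point, so the extension really restricts back to $f$, matching the first requirement of the theorem.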
Stone-Weierstrass Theorem.

A set $A \subseteq S$ is totally bounded if for any $\varepsilon > 0$ there exists a finite $\varepsilon$-cover of $A$, i.e. a set of points $a_1, \ldots, a_N$ such that
$$A \subseteq \bigcup_{i \le N} B(a_i, \varepsilon),$$
where $B(a, \varepsilon) = \{y \in S : d(a,y) \le \varepsilon\}$ is a ball of radius $\varepsilon$ centered at $a$.
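As a concrete instance of this definition (all parameters illustrative), $A = [0,1]^2$ is totally bounded: a grid of spacing $h = \varepsilon\sqrt{2}$ is a finite $\varepsilon$-cover, since every point of the square lies within $h\sqrt{2}/2 = \varepsilon$ of a grid point:

```python
from math import ceil, dist, sqrt
import random

def eps_cover(eps):
    """Centers of a finite eps-cover of [0,1]^2 by closed balls."""
    h = eps * sqrt(2)                  # any point is within h*sqrt(2)/2 = eps of the grid
    m = ceil(1 / h) + 1                # enough grid lines to reach past 1
    return [(i * h, j * h) for i in range(m) for j in range(m)]

random.seed(2)
eps = 0.1
centers = eps_cover(eps)
pts = [(random.random(), random.random()) for _ in range(1000)]
assert all(min(dist(p, c) for c in centers) <= eps + 1e-12 for p in pts)
print(len(centers))   # -> 81 centers suffice for eps = 0.1
```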
Let us recall the following theorem from analysis.

Theorem 38 (Arzelà-Ascoli) Let $(S,d)$ be a compact metric space and let $(C(S), d_\infty)$ be the space of continuous real-valued functions on $S$ with the uniform convergence metric
$$d_\infty(f,g) = \sup_{x \in S}|f(x) - g(x)|.$$
A subset $F \subseteq C(S)$ is totally bounded in the $d_\infty$ metric iff $F$ is equicontinuous and uniformly bounded.

Remark. Equicontinuous means that for any $\varepsilon > 0$ there exists $\delta > 0$ such that if $d(x,y) \le \delta$ then for all $f \in F$, $|f(x) - f(y)| \le \varepsilon$.

Theorem 39 (Stone-Weierstrass) Let $(S,d)$ be a compact metric space and let $F \subseteq C(S)$ be such that

1. $F$ is an algebra, i.e. for all $f, g \in F$, $c \in \mathbb{R}$, we have $cf + g \in F$, $fg \in F$.
2. $F$ separates points, i.e. if $x \ne y \in S$ then there exists $f \in F$ such that $f(x) \ne f(y)$.
3. $F$ contains constants.
Then $F$ is dense in $C(S)$.

Corollary 3 If $(S,d)$ is a compact space then $BL(S,d)$ is dense in $C(S)$.

Proof. For $F = BL(S,d)$ in the Stone-Weierstrass theorem, 3 is obvious, 1 follows from Lemma 32, and 2 follows from the extension Theorem 37, since a function defined on two points $x \ne y$ such that $f(x) \ne f(y)$ can be extended to the entire $S$.

Proof of Theorem 39. Consider bounded $f \in F$, i.e. $|f(x)| \le M$. The function $x \mapsto |x|$ defined on the interval $[-M, M]$ can be uniformly approximated by polynomials of $x$ by the Weierstrass theorem on the real line or, for example, using Bernstein's polynomials. Therefore, $|f(x)|$ can be uniformly approximated by polynomials of $f(x)$ and, by properties 1 and 3, by functions in $F$. Therefore, if $\bar F$ is the closure of $F$ in the $d_\infty$ norm, then for any $f \in \bar F$ its absolute value $|f| \in \bar F$. Therefore, for any $f, g \in \bar F$ we have
$$\min(f,g) = \tfrac{1}{2}(f+g) - \tfrac{1}{2}|f-g| \in \bar F, \qquad \max(f,g) = \tfrac{1}{2}(f+g) + \tfrac{1}{2}|f-g| \in \bar F. \qquad (16.0.1)$$
Given any points $x \ne y$ and $c, d \in \mathbb{R}$, one can always find $f \in F$ such that $f(x) = c$ and $f(y) = d$. Indeed, by property 2 we can find $g \in F$ such that $g(x) \ne g(y)$ and, as a result, the system of equations
$$a\,g(x) + b = c, \qquad a\,g(y) + b = d$$
has a solution $a, b$. Then the function $f = ag + b$ satisfies the above and it is in $F$ by 1.
Take $h \in C(S)$ and fix $x$. For any $y$ let $f_y \in \bar F$ be such that $f_y(x) = h(x)$, $f_y(y) = h(y)$. By continuity of $f_y$, for any $y \in S$ there exists an open neighborhood $U_y$ of $y$ such that
$$f_y(s) \ge h(s) - \varepsilon \ \text{ for } s \in U_y.$$
Since $(U_y)$ is an open cover of the compact $S$, there exists a finite subcover $U_{y_1}, \ldots, U_{y_N}$. Let us define a function
$$f_x(s) = \max\bigl(f_{y_1}(s), \ldots, f_{y_N}(s)\bigr) \in \bar F \ \text{ by (16.0.1)}.$$
By construction, it has the following properties:
$$f_x(x) = h(x), \qquad f_x(s) \ge h(s) - \varepsilon \ \text{ for all } s \in S.$$
Again, by continuity of $f_x(s)$ there exists an open neighborhood $U_x$ of $x$ such that
$$f_x(s) \le h(s) + \varepsilon \ \text{ for } s \in U_x.$$
Take a finite subcover $U_{x_1}, \ldots, U_{x_M}$ and define
$$h_\varepsilon(s) = \min\bigl(f_{x_1}(s), \ldots, f_{x_M}(s)\bigr) \in \bar F \ \text{ by (16.0.1)}.$$
By construction, $h_\varepsilon(s) \le h(s) + \varepsilon$ and $h_\varepsilon(s) \ge h(s) - \varepsilon$ for all $s \in S$, which means that $d_\infty(h, h_\varepsilon) \le \varepsilon$. Since $h_\varepsilon \in \bar F$, this proves that $F$ is dense in $C(S)$.
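The key analytic step — uniform approximation of $x \mapsto |x|$ by Bernstein polynomials — can be watched numerically (degrees and grids below are illustrative choices, not from the notes):

```python
from math import comb

def bernstein(g, n, t):
    """Value of the n-th Bernstein polynomial of g at t in [0, 1]."""
    return sum(g(k / n) * comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1))

# Approximate |x| on [-1, 1] via the substitution x = 2t - 1.
approx = lambda n, x: bernstein(lambda t: abs(2 * t - 1), n, (x + 1) / 2)

xs = [i / 50 - 1 for i in range(101)]
err = lambda n: max(abs(abs(x) - approx(n, x)) for x in xs)
print(err(10), err(100))   # the error decreases, roughly like n^(-1/2) at the kink x = 0
```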
Corollary 4 If $(S,d)$ is a compact space then $C(S)$ is separable in $d_\infty$.

Remark. Recall that this fact was used in the proof of the Selection Theorem, which was proved for general metric spaces.

Proof. By the above theorem, $BL(S,d)$ is dense in $C(S)$. For any integer $n \ge 1$, the set $\{f : \|f\|_{BL} \le n\}$ is uniformly bounded and equicontinuous. By the Arzelà-Ascoli theorem, it is totally bounded and, therefore, separable, which can be seen by taking finite $1/m$-covers for all $m \ge 1$. The union $\bigcup_{n \ge 1}\{\|f\|_{BL} \le n\} = BL(S,d)$ is therefore separable in $C(S)$, which is, as a result, also separable.
Section 17

Metrics for convergence of laws. Empirical measures.

Lévy-Prohorov metric.

Consider a metric space $(S,d)$. For a set $A \subseteq S$, let us denote by
$$A^\varepsilon = \{y \in S : d(x,y) < \varepsilon \ \text{ for some } x \in A\}$$
its $\varepsilon$-neighborhood. Let $\mathcal{B}$ be a Borel $\sigma$-algebra on $S$.
Definition. If $P, Q$ are probability distributions on $\mathcal{B}$ then
$$\rho(P,Q) = \inf\{\varepsilon > 0 : P(A) \le Q(A^\varepsilon) + \varepsilon \ \text{ for all } A \in \mathcal{B}\}$$
is called the Lévy-Prohorov distance between $P$ and $Q$.

Lemma 34 $\rho$ is a metric on the set of probability laws on $\mathcal{B}$.
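On a finite metric space, $\rho$ can be computed by brute force directly from the definition; a small sketch (the space and the two laws below are illustrative):

```python
from itertools import combinations

S = [0, 1, 2]                         # three-point space with metric |x - y|
d = lambda x, y: abs(x - y)
P = {0: 0.5, 1: 0.5, 2: 0.0}
Q = {0: 0.0, 1: 0.5, 2: 0.5}

def neighborhood(A, eps):             # the open eps-neighborhood A^eps
    return {y for y in S if any(d(x, y) < eps for x in A)}

def holds(eps):                       # P(A) <= Q(A^eps) + eps for all A?
    subsets = (set(c) for r in range(len(S) + 1) for c in combinations(S, r))
    return all(sum(P[x] for x in A) <= sum(Q[y] for y in neighborhood(A, eps)) + eps
               for A in subsets)

# Scan a grid of eps values for the smallest one that works.
rho = min(e for e in (k / 1000 for k in range(1, 2001)) if holds(e))
print(rho)   # -> 0.5: A = {0} forces 0.5 = P(A) <= Q({0}) + eps = eps when eps <= 1
```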
Proof. 1. First, let us show that $\rho(Q,P) = \rho(P,Q)$. Suppose that $\rho(P,Q) > \varepsilon$. Then there exists a set $A$ such that $P(A) > Q(A^\varepsilon) + \varepsilon$. Taking complements gives
$$Q\bigl((A^\varepsilon)^c\bigr) > P(A^c) + \varepsilon \ge P\bigl(((A^\varepsilon)^c)^\varepsilon\bigr) + \varepsilon,$$
where the last inequality follows from the fact that $((A^\varepsilon)^c)^\varepsilon \subseteq A^c$:
$$a \in ((A^\varepsilon)^c)^\varepsilon \implies d(a,b) < \varepsilon \ \text{for some}\ b \in (A^\varepsilon)^c; \ \text{since}\ b \notin A^\varepsilon,\ d(b,A) \ge \varepsilon \implies d(a,A) > 0 \implies a \notin A \implies a \in A^c.$$
Therefore, for the set $B = (A^\varepsilon)^c$, $Q(B) > P(B^\varepsilon) + \varepsilon$. This means that $\rho(Q,P) > \varepsilon$ and, therefore, $\rho(Q,P) \ge \rho(P,Q)$. By symmetry, $\rho(Q,P) \le \rho(P,Q)$ and $\rho(Q,P) = \rho(P,Q)$.

2. Next, let us show that if $\rho(P,Q) = 0$ then $P = Q$. For any set $F$ and any $n \ge 1$,
$$P(F) \le Q\bigl(F^{1/n}\bigr) + \frac{1}{n}.$$
If $F$ is closed then $F^{1/n} \downarrow F$ as $n \to \infty$ and, by continuity of measure,
$$P(F) \le Q\Bigl(\bigcap_{n \ge 1} F^{1/n}\Bigr) = Q(F).$$
Similarly, $Q(F) \le P(F)$ and, therefore, $P(F) = Q(F)$.
3. Finally, let us prove the triangle inequality
$$\rho(P,R) \le \rho(P,Q) + \rho(Q,R).$$
If $\rho(P,Q) < x$ and $\rho(Q,R) < y$ then for any set $A$,
$$P(A) \le Q(A^x) + x \le R\bigl((A^x)^y\bigr) + y + x \le R\bigl(A^{x+y}\bigr) + x + y,$$
which means that $\rho(P,R) \le x + y$.

Bounded Lipschitz metric.

Given probability distributions $P, Q$ on the metric space $(S,d)$, we define a bounded Lipschitz distance between them by
$$\beta(P,Q) = \sup\Bigl\{\Bigl|\int f\,dP - \int f\,dQ\Bigr| : \|f\|_{BL} \le 1\Bigr\}.$$

Lemma 35 $\beta$ is a metric on the set of probability laws on $\mathcal{B}$.

Proof. $\beta(P,Q) = \beta(Q,P)$ and the triangle inequality are obvious. It remains to prove that $\beta(P,Q) = 0$ implies $P = Q$. Given a closed set $F$, the sequence of functions $f_m(x) = \bigl(m\,d(x,F)\bigr) \wedge 1$ converges $f_m \uparrow I_U$, where $U = F^c$. Obviously, $\|f_m\|_{BL} \le m + 1$ and, therefore, $\int f_m\,dP = \int f_m\,dQ$. Letting $m \to \infty$ proves that $P(U) = Q(U)$.

The law $P$ on $(S,d)$ is tight if for any $\varepsilon > 0$ there exists a compact $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$.

Theorem 40 (Ulam) If $(S,d)$ is separable then for any law $P$ on $\mathcal{B}$ there exists a closed totally bounded set $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$. If $(S,d)$ is complete and separable then $K$ is compact and, therefore, every law is tight.
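For two point masses $\delta_0$ and $\delta_t$ on $\mathbb{R}$, $\beta$ reduces to maximizing $|f(0) - f(t)|$ over $\|f\|_{BL} \le 1$, which amounts to splitting the unit norm between $\|f\|_L = L$ and $\|f\|_\infty = 1 - L$; the resulting closed form $2t/(2+t)$ is a standard fact stated here as an assumption, checked numerically (a sketch, not from the notes):

```python
# beta(delta_0, delta_t): with ||f||_L = L and ||f||_inf = 1 - L, the best achievable
# gap |f(0) - f(t)| is min(L * t, 2 * (1 - L)) (a tent function attains it);
# maximize that over the split parameter L in [0, 1].

def beta_point_masses(t, steps=10**5):
    return max(min(L * t, 2 * (1 - L))
               for L in (k / steps for k in range(steps + 1)))

for t in (0.5, 1.0, 4.0):
    assert abs(beta_point_masses(t) - 2 * t / (2 + t)) < 1e-4
print(beta_point_masses(1.0))   # close to 2/3
```

Note that $\beta(\delta_0, \delta_t) \to 2$ as $t \to \infty$: far-apart point masses stay at bounded $\beta$-distance, unlike in the unbounded Wasserstein-type metrics.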
Proof. Consider a sequence $\{x_1, x_2, \ldots\}$ that is dense in $S$. For any $m \ge 1$, $S = \bigcup_{i \ge 1} B(x_i, \frac{1}{m})$, where $B$ denotes a closed ball, and by continuity of measure, for large enough $n(m)$,
$$P\Bigl(S \setminus \bigcup_{i=1}^{n(m)} B\bigl(x_i, \tfrac{1}{m}\bigr)\Bigr) \le \frac{\varepsilon}{2^m}.$$
If we take
$$K = \bigcap_{m \ge 1} \bigcup_{i=1}^{n(m)} B\bigl(x_i, \tfrac{1}{m}\bigr)$$
then
$$P(S \setminus K) \le \sum_{m \ge 1} \frac{\varepsilon}{2^m} = \varepsilon.$$
$K$ is closed and totally bounded by construction. If $S$ is complete, $K$ is compact.

Theorem 41 Suppose that either $(S,d)$ is separable or $P$ is tight. Then the following are equivalent.

1. $P_n \to P$.
2. For all $f \in BL(S,d)$, $\int f\,dP_n \to \int f\,dP$.
3. $\beta(P_n, P) \to 0$.
4. $\rho(P_n, P) \to 0$.
Proof. 1 ⇒ 2. Obvious.

3 ⇒ 4. In fact, we will prove that
$$\rho(P_n, P) \le 2\sqrt{\beta(P_n, P)}. \qquad (17.0.1)$$
Given a Borel set $A \subseteq S$, consider the function
$$f(x) = \Bigl(1 - \frac{1}{\varepsilon}\,d(x,A)\Bigr)^+ \ \text{ such that } \ I_A \le f \le I_{A^\varepsilon}.$$
Obviously, $\|f\|_{BL} \le 1 + \frac{1}{\varepsilon}$ and we can write
$$P_n(A) \le \int f\,dP_n \le \int f\,dP + \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\sup\Bigl\{\Bigl|\int g\,dP_n - \int g\,dP\Bigr| : \|g\|_{BL} \le 1\Bigr\} \le P(A^\varepsilon) + \Bigl(1 + \frac{1}{\varepsilon}\Bigr)\beta(P_n,P) \le P(A^\varepsilon) + \delta,$$
where $\delta = \max\bigl(\varepsilon, (1 + \frac{1}{\varepsilon})\beta(P_n,P)\bigr)$. This implies that $\rho(P_n, P) \le \delta$. Since $\varepsilon$ is arbitrary, we can minimize $\delta = \delta(\varepsilon)$ over $\varepsilon$. If we take $\varepsilon = \sqrt{\beta}$, where $\beta = \beta(P_n,P)$, then
$$\delta = \max\bigl(\sqrt{\beta}, \sqrt{\beta} + \beta\bigr) = \sqrt{\beta} + \beta.$$
If $\beta \le 1$, this gives $\rho(P_n,P) \le 2\sqrt{\beta}$; if $\beta \ge 1$, then trivially $\rho(P_n,P) \le 1 \le 2\sqrt{\beta}$, which proves (17.0.1).

4 ⇒ 1. Suppose that $\rho(P_n, P) \to 0$, which means that there exists a sequence $\varepsilon_n \to 0$ such that
$$P_n(A) \le P(A^{\varepsilon_n}) + \varepsilon_n \ \text{ for all measurable } A \subseteq S.$$
If $A$ is closed, then $\bigcap_{n \ge 1} A^{\varepsilon_n} = A$ and, by continuity of measure,
$$\limsup_n P_n(A) \le \limsup_n \bigl(P(A^{\varepsilon_n}) + \varepsilon_n\bigr) = P(A).$$
By the portmanteau theorem, $P_n \to P$.

2 ⇒ 3. If $P$ is tight, let $K$ be a compact such that $P(S \setminus K) \le \varepsilon$. If $(S,d)$ is separable, by Ulam's theorem, let $K$ be a closed totally bounded set such that $P(S \setminus K) \le \varepsilon$. If we consider the function
$$f(x) = \Bigl(1 - \frac{1}{\varepsilon}\,d(x,K)\Bigr)^+ \ \text{ with } \ \|f\|_{BL} \le 1 + \frac{1}{\varepsilon},$$
then
$$P_n(K^\varepsilon) \ge \int f\,dP_n \to \int f\,dP \ge P(K) \ge 1 - \varepsilon,$$
which implies that for $n$ large enough, $P_n(K^\varepsilon) \ge 1 - 2\varepsilon$. This means that all $P_n$ are essentially concentrated on $K^\varepsilon$. Let
$$B = \bigl\{f : \|f\|_{BL(S,d)} \le 1\bigr\}, \qquad B_K = \bigl\{f|_K : f \in B\bigr\} \subseteq C(K),$$
where $f|_K$ denotes the restriction of $f$ to $K$. If $K$ is compact then, by the Arzelà-Ascoli theorem, $B_K$ is totally bounded with respect to $d_\infty$. If $K$ is totally bounded then we can isometrically identify functions in $B_K$ with their unique extensions to the completion $\bar K$ of $K$ and, by the Arzelà-Ascoli theorem for the compact $\bar K$, $B_K$ is again totally bounded with respect to $d_\infty$. In any case, given $\varepsilon > 0$, we can find $f_1, \ldots, f_k \in B$ such that for all $f \in B$,
$$\sup_{x \in K}|f(x) - f_j(x)| \le \varepsilon \ \text{ for some } j \le k.$$
This uniform approximation can also be extended to $K^\varepsilon$. Namely, for any $x \in K^\varepsilon$ take $y \in K$ such that $d(x,y) \le \varepsilon$. Then
$$|f(x) - f_j(x)| \le |f(x) - f(y)| + |f(y) - f_j(y)| + |f_j(y) - f_j(x)| \le \|f\|_L\,d(x,y) + \varepsilon + \|f_j\|_L\,d(x,y) \le 3\varepsilon.$$
Therefore, for any $f \in B$,
$$\Bigl|\int f\,dP_n - \int f\,dP\Bigr| \le \Bigl|\int_{K^\varepsilon} f\,dP_n - \int_{K^\varepsilon} f\,dP\Bigr| + \|f\|_\infty\Bigl(P_n\bigl((K^\varepsilon)^c\bigr) + P\bigl((K^\varepsilon)^c\bigr)\Bigr) \le \Bigl|\int_{K^\varepsilon} f\,dP_n - \int_{K^\varepsilon} f\,dP\Bigr| + 3\varepsilon$$
$$\le \Bigl|\int_{K^\varepsilon} f_j\,dP_n - \int_{K^\varepsilon} f_j\,dP\Bigr| + 3\varepsilon + 3\varepsilon + 3\varepsilon \le \max_{1 \le j \le k}\Bigl|\int f_j\,dP_n - \int f_j\,dP\Bigr| + 12\varepsilon.$$
Finally,
$$\beta(P_n, P) = \sup_{f \in B}\Bigl|\int f\,dP_n - \int f\,dP\Bigr| \le \max_{1 \le j \le k}\Bigl|\int f_j\,dP_n - \int f_j\,dP\Bigr| + 12\varepsilon$$
and, using assumption 2, $\limsup_n \beta(P_n, P) \le 12\varepsilon$. Letting $\varepsilon \to 0$ finishes the proof.

Convergence of empirical measures.

Let $(\Omega, \mathbb{P})$ be a probability space and $X_1, X_2, \ldots : \Omega \to S$ be an i.i.d. sequence of random variables with values in a metric space $(S,d)$. Let $\mu$ be the law of $X_i$ on $S$. Let us define the random empirical measures $\mu_n$ on the Borel $\sigma$-algebra $\mathcal{B}$ on $S$ by
$$\mu_n(A)(\omega) = \frac{1}{n}\sum_{i=1}^n I\bigl(X_i(\omega) \in A\bigr), \qquad A \in \mathcal{B}.$$
By the strong law of large numbers, for any $f \in C_b(S)$,
$$\int f\,d\mu_n = \frac{1}{n}\sum_{i=1}^n f(X_i) \to \mathbb{E}f(X_1) = \int f\,d\mu \ \text{ a.s.}$$
However, the set of measure zero where this convergence is violated depends on $f$, and it is not obvious that the convergence holds for all $f \in C_b(S)$ with probability one.

Theorem 42 (Varadarajan) Let $(S,d)$ be a separable metric space. Then $\mu_n$ converges to $\mu$ weakly almost surely,
$$\mathbb{P}\bigl(\omega : \mu_n(\cdot)(\omega) \to \mu \ \text{weakly}\bigr) = 1.$$

Proof. Since $(S,d)$ is separable, by Theorem 2.8.2 in R.A.P., there exists a metric $e$ on $S$ such that $(S,e)$ is totally bounded and $e$ and $d$ define the same topology, i.e. $e(s_n, s) \to 0$ if and only if $d(s_n, s) \to 0$. This, of course, means that $C_b(S,d) = C_b(S,e)$ and weak convergence of measures does not change. If $(T,e)$ is the completion of $(S,e)$ then $(T,e)$ is compact. By the Arzelà-Ascoli theorem, $BL(T,e)$ is separable with respect to the $d_\infty$ norm and, therefore, $BL(S,e)$ is also separable. Let $(f_m)$ be a dense subset of $BL(S,e)$. Then, by the strong law of large numbers,
$$\int f_m\,d\mu_n = \frac{1}{n}\sum_{i=1}^n f_m(X_i) \to \mathbb{E}f_m(X_1) = \int f_m\,d\mu \ \text{ a.s.}$$
Therefore, on the same set of probability one, $\int f_m\,d\mu_n \to \int f_m\,d\mu$ for all $m \ge 1$. Since $(f_m)$ is dense in $BL(S,e)$, on that set of probability one, $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in BL(S,e)$. Since $(S,e)$ is separable, the previous theorem implies that $\mu_n \to \mu$ weakly.
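A simulation of this convergence on $S = \mathbb{R}$ (the sample sizes, seed, and test function are illustrative choices, not from the notes):

```python
import random

random.seed(0)
f = lambda x: min(1.0, 2 * x)      # a bounded Lipschitz test function on [0, 1]
true_integral = 0.75               # integral of min(1, 2x) against Uniform[0, 1]

def empirical_integral(n):
    """Integral of f against the empirical measure mu_n of n i.i.d. uniforms."""
    return sum(f(random.random()) for _ in range(n)) / n

errs = [abs(empirical_integral(n) - true_integral) for n in (10**2, 10**4, 10**6)]
print(errs)   # the error shrinks roughly like n^(-1/2)
```

The theorem says more than this single-$f$ check: on one event of probability one, the convergence holds simultaneously for every bounded continuous $f$.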
Section 18

Convergence and uniform tightness.

In this section, we will make several connections between convergence of measures and uniform tightness on general metric spaces, which are similar to the results in the Euclidean setting. First, we will show that, in some sense, uniform tightness is necessary for convergence of laws.

Theorem 43 If $P_n \to P_0$ on $S$ and each $P_n$ is tight for $n \ge 0$, then $(P_n)_{n \ge 0}$ is uniformly tight.

Proof. Since $P_n \to P_0$ and $P_0$ is tight, by Theorem 41, the Lévy-Prohorov metric $\rho(P_n, P_0) \to 0$. Given $\varepsilon > 0$, let us take a compact $K$ such that $P_0(K) > 1 - \varepsilon$. By definition of $\rho$, we can find a sequence $\alpha(n) \to 0$ (say, $\alpha(n) = \rho(P_n, P_0) + 1/n$) such that
$$P_n\bigl(K^{\alpha(n)}\bigr) \ge P_0(K) - \alpha(n) > 1 - \varepsilon \ \text{ for all } n \ge n_0$$
(the finitely many $n < n_0$ can be handled directly by tightness of each $P_n$). By regularity of the measure $P_n$, any measurable set $A$ can be approximated from inside by its closed subsets $F$. Since $P_n$ is tight, we can choose a compact of measure close to one and, intersecting it with the closed subset $F$, we can approximate any set $A$ by its compact subsets. Therefore, there exists a compact $K_n \subseteq K^{2\alpha(n)}$ such that $P_n(K_n) > 1 - \varepsilon$. Let
$$L = K \cup \Bigl(\bigcup_{n \ge 1} K_n\Bigr).$$
Then $P_n(L) \ge P_n(K_n) > 1 - \varepsilon$. It remains to show that $L$ is compact. Consider a sequence $(x_n)$ in $L$. There are two possibilities. First, if there exists an infinite subsequence $(x_{n(k)})$ that belongs to one of the compacts $K_j$ (or to $K$) then it has a converging subsubsequence in $K_j$ and, as a result, in $L$. If not, then there exists a subsequence $(x_{n(k)})$ such that $x_{n(k)} \in K_{m(k)}$ and $m(k) \to \infty$ as $k \to \infty$. Since $K_{m(k)} \subseteq K^{2\alpha(m(k))}$, there exists $y_k \in K$ such that
$$d(x_{n(k)}, y_k) \le 2\alpha(m(k)).$$
Since $K$ is compact, the sequence $y_k \in K$ has a converging subsequence $y_{k(r)} \to y \in K$, which implies that $d(x_{n(k(r))}, y) \to 0$, i.e. $x_{n(k(r))} \to y \in L$. Therefore, $L$ is compact.

We already know from the Selection Theorem in Section 8 that any uniformly tight sequence of laws on any metric space has a converging subsequence. Under additional assumptions on $(S,d)$ we can complement the Selection Theorem and make some connections to the metrics defined in the previous section.

Theorem 44 Let $(S,d)$ be a complete separable metric space and let $\mathcal{A}$ be a subset of probability laws on $S$. Then the following are equivalent.
1. $\mathcal{A}$ is uniformly tight.
2. For any sequence $P_n \in \mathcal{A}$ there exists a converging subsequence $P_{n(k)} \to P$, where $P$ is a law on $S$.
3. $\mathcal{A}$ has compact closure in the space of probability laws equipped with the Lévy-Prohorov or bounded Lipschitz metric $\rho$ or $\beta$.
4. $\mathcal{A}$ is totally bounded with respect to $\rho$ or $\beta$.

Remark. Implications 1 ⇒ 2 ⇒ 3 ⇒ 4 hold without the completeness assumption, and the only implication where completeness will be used is 4 ⇒ 1.
Proof. 1 ⇒ 2. Any sequence $P_n \in \mathcal{A}$ is uniformly tight and, by the selection theorem, there exists a converging subsequence.

2 ⇒ 3. Since $(S,d)$ is separable, by Theorem 41, $P_n \to P$ if and only if $\rho(P_n, P) \to 0$ or, equivalently, $\beta(P_n, P) \to 0$. Every sequence in the closure $\bar{\mathcal{A}}$ can be approximated by a sequence in $\mathcal{A}$. That sequence has a converging subsequence that, obviously, converges to an element in $\bar{\mathcal{A}}$, which means that the closure of $\mathcal{A}$ is compact.

3 ⇒ 4. Compact sets are totally bounded and, therefore, if the closure $\bar{\mathcal{A}}$ is compact, the set $\mathcal{A}$ is totally bounded.
4 ⇒ 1. Since $\rho \le 2\sqrt{\beta}$ by (17.0.1), we will only deal with $\rho$. For any $\varepsilon > 0$, there exists a finite subset $B \subseteq \mathcal{A}$ such that every $Q \in \mathcal{A}$ is within $\rho$-distance $\varepsilon$ of some $P \in B$. Since $(S,d)$ is complete and separable, by Ulam's theorem, for each $P \in B$ there exists a compact $K_P$ such that $P(K_P) > 1 - \varepsilon$. Therefore,
$$K_B = \bigcup_{P \in B} K_P \ \text{ is a compact and } \ P(K_B) > 1 - \varepsilon \ \text{ for all } P \in B.$$
For any $\varepsilon > 0$, let $F$ be a finite set such that $K_B \subseteq F^\varepsilon$ (here we will denote by $F^\varepsilon$ the closed $\varepsilon$-neighborhood of $F$). For any $Q \in \mathcal{A}$ there exists $P \in B$ such that $\rho(Q,P) < \varepsilon$ and, therefore,
$$1 - \varepsilon \le P(K_B) \le P(F^\varepsilon) \le Q\bigl(F^{2\varepsilon}\bigr) + \varepsilon.$$
Thus,
$$1 - 2\varepsilon \le Q\bigl(F^{2\varepsilon}\bigr) \ \text{ for all } Q \in \mathcal{A}.$$
Given $\varepsilon > 0$, take $\varepsilon_m = \varepsilon/2^{m+1}$ and find $F_m$ as above, i.e.
$$1 - \frac{\varepsilon}{2^m} \le Q\bigl(F_m^{\varepsilon/2^m}\bigr).$$
Then
$$Q\Bigl(\bigcap_{m \ge 1} F_m^{\varepsilon/2^m}\Bigr) \ge 1 - \sum_{m \ge 1}\frac{\varepsilon}{2^m} = 1 - \varepsilon.$$
Finally, $L = \bigcap_{m \ge 1} F_m^{\varepsilon/2^m}$ is compact because it is closed and totally bounded by construction, and $S$ is complete.
Corollary 5 (Prohorov) The set of laws on a complete separable metric space is complete with respect to the metrics $\rho$ or $\beta$.

Proof. If a sequence of laws is Cauchy w.r.t. $\rho$ or $\beta$ then it is totally bounded and, by the previous theorem, it has a converging subsequence. Obviously, a Cauchy sequence will then converge to the same limit.

Finally, let us state as a result the idea which appeared in Lemma 19 in Section 9.

Lemma 36 Suppose that $(P_n)$ is uniformly tight on a metric space $(S,d)$. Suppose that all converging subsequences $(P_{n(k)})$ converge to the same limit, i.e. if $P_{n(k)} \to P_0$ then $P_0$ is independent of $(n(k))$. Then $P_n \to P_0$.

Proof. Any subsequence $(P_{n(k)})$ is uniformly tight and, by the selection theorem, it has a converging subsubsequence $(P_{n(k(r))})$, which has to converge to $P_0$. Lemma 13 in Section 8 finishes the proof.

This will be very useful when proving convergence of laws on metric spaces, such as $C([0,1])$, for example. If we can prove that $(P_n)$ is uniformly tight and, assuming that a subsequence converges, can identify the unique limit, then the sequence $P_n$ must converge to the same limit.
Section 19

Strassen's Theorem. Relationships between metrics.

Metric for convergence in probability.

Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space, $(S,d)$ a metric space, and $X, Y : \Omega \to S$ random variables with values in $S$. The quantity
$$\alpha(X,Y) = \inf\{\varepsilon \ge 0 : \mathbb{P}(d(X,Y) > \varepsilon) \le \varepsilon\}$$
is called the Ky Fan metric on the set $L^0(\Omega, S)$ of classes of equivalence of such random variables, where two r.v.s are equivalent if they are equal a.s. If we take a sequence $\varepsilon_k \downarrow \varepsilon = \alpha(X,Y)$ then $\mathbb{P}(d(X,Y) > \varepsilon_k) \le \varepsilon_k$ and since
$$I\bigl(d(X,Y) > \varepsilon_k\bigr) \uparrow I\bigl(d(X,Y) > \varepsilon\bigr),$$
by the monotone convergence theorem, $\mathbb{P}(d(X,Y) > \varepsilon) \le \varepsilon$. Thus, the infimum in the definition of $\alpha(X,Y)$ is attained.
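The Ky Fan metric can be estimated by Monte Carlo for a concrete pair, say $X \sim N(0,1)$ and $Y = X + 0.5Z$ with $Z \sim N(0,1)$ independent, so that $d(X,Y) = |0.5Z|$ (an illustrative sketch; the sample size and grid are arbitrary choices):

```python
import bisect
import random

random.seed(1)
n = 10**5
dist_samples = sorted(abs(0.5 * random.gauss(0.0, 1.0)) for _ in range(n))

def tail(eps):                      # Monte Carlo estimate of P(d(X, Y) > eps)
    return (n - bisect.bisect_right(dist_samples, eps)) / n

# Smallest eps on a grid with P(d(X, Y) > eps) <= eps.
alpha = min(e for e in (k / 1000 for k in range(1, 2001)) if tail(e) <= e)
print(alpha)   # approximately solves P(|0.5 Z| > eps) = eps, i.e. eps near 0.41
```

Since the tail probability is nonincreasing in $\varepsilon$, the first grid point where the inequality holds approximates the attained infimum.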
Lemma 37 $\alpha$ is a metric on $L^0(\Omega, S)$ which metrizes convergence in probability.

Proof. First of all, clearly, $\alpha(X,Y) = 0$ iff $X = Y$ almost surely. To prove the triangle inequality, note that
$$\mathbb{P}\bigl(d(X,Z) > \alpha(X,Y) + \alpha(Y,Z)\bigr) \le \mathbb{P}\bigl(d(X,Y) > \alpha(X,Y)\bigr) + \mathbb{P}\bigl(d(Y,Z) > \alpha(Y,Z)\bigr) \le \alpha(X,Y) + \alpha(Y,Z),$$
so that $\alpha(X,Z) \le \alpha(X,Y) + \alpha(Y,Z)$. This proves that $\alpha$ is a metric. Next, if $\varepsilon_n = \alpha(X_n, X) \to 0$ then for any $\varepsilon > 0$ and $n$ large enough so that $\varepsilon_n \le \varepsilon$,
$$\mathbb{P}(d(X_n, X) > \varepsilon) \le \mathbb{P}(d(X_n, X) > \varepsilon_n) \le \varepsilon_n \to 0.$$