
Faculty of Science, Department of Mathematics

Limit Theorems for General Empirical Processes

Master thesis submitted in partial fulfillment of the requirements for the degree of Master in Mathematics

Gauthier Dierickx

Promotor: Prof. Dr. Uwe Einmahl

MAY 2012

Thanks.

I would like to thank first of all my parents for their support during my studies, and then my fellow students. I’m also very grateful to all the professors who introduced me to the many exciting disciplines of modern mathematics and physics with everlasting enthusiasm.

I’d like to express my deepest gratitude to Prof. Dr. U. Einmahl for accepting me for a master thesis under his direction, for his unlimited patience and for the stimulating talks we had on different subjects. I’m also thankful for his invitation to the biannual congress Deutsche Stochastiktage 2012 held in Mainz, which motivated me even more to explore the large field of probability theory and mathematical statistics.


Abstract.

This master thesis is about uniform limit theorems and its main goal is to present a uniform strong law and a uniform weak convergence result for the empirical process indexed by general classes of functions.

The topic of uniform versions of the classical limit theorems in probability started in the 1930’s, when Glivenko and then Cantelli proved that the empirical distribution function converges uniformly with probability one to the unknown distribution function of the underlying random variables. This can be considered as a uniform version of the strong law of large numbers. First versions of the central limit theorem for the empirical process were obtained by Doob, Donsker, Kolmogorov and Skorokhod around 1950. In the 1970’s a general theory was started by Vapnik and Cervonenkis, who introduced a new method suitable to treat empirical processes indexed by general function classes. Dudley also made major contributions to this theory. Later Giné and Zinn obtained further refinements.

More recently, a weak convergence theory for nonmeasurable mappings has been developed, from which one can infer the limit results for empirical processes. The purpose of this thesis is to give an account of this approach based on the books by Dudley and by van der Vaart and Wellner.

This version is not the definitive one, but a slightly adapted version, in which the main typos of the first version are corrected and some shortcuts in proofs are implemented, following the comments of the members of the jury during the presentation.


Contents

Thanks.

Abstract.

Introduction.

1 Empirical measures and processes.
    1.1 Definitions and problems.
    1.2 The classical cases.
        1.2.1 The Glivenko–Cantelli theorem.
        1.2.2 Donsker’s theorem.
    1.3 The problems.

2 Weak Convergence
    2.1 Outer probability and expectation
    2.2 Perfect functions
    2.3 Convergence: almost uniformly, outer probability
    2.4 Convergence in law
    2.5 The (extended) Portmanteau Theorem
    2.6 Asymptotic tightness and measurability
    2.7 Spaces of bounded functions

3 Vapnik–Cervonenkis classes.
    3.1 Introduction: definitions, fundamental lemma.
    3.2 Uniform bounds for packing number of VC class.

4 On measurability.
    4.1 Admissibility.
    4.2 Suslin image admissibility.

5 Uniform limit theorems.
    5.1 Entropy and Covering Numbers.
    5.2 A Symmetrization Lemma.
    5.3 Martingale property, Glivenko–Cantelli theorem.
    5.4 Pollard’s Central Limit Theorem.

A Topology and Measure Theory.
    A.1 Metric and topological spaces.
        A.1.1 Definitions.
        A.1.2 Some important theorems.
    A.2 Measure Theory.
        A.2.1 Rings, algebras, σ–algebras and (outer) measures.
        A.2.2 (Sub)Martingales and reversed (sub)martingales.

B Gaussian Processes.

C More about Suslin / Analytic Sets.
    C.1 The Borel Isomorphism Theorem.
    C.2 Definitions and properties of Analytical Sets.
    C.3 Universal measurability of Analytic Sets.
    C.4 A selection theorem for Analytic Sets.

D Entropy and useful inequalities.
    D.1 Entropy.
    D.2 Exponential inequalities.

Introduction.

This master thesis deals with uniform limit theorems for empirical measures and processes, i.e. limit theorems for normalized finite sums like

$$ n^{-1}\sum_{i=1}^{n} \delta_{X_i}(\cdot) := n^{-1}\sum_{i=1}^{n} I_{\cdot}(X_i) \quad\text{and}\quad n^{-1/2}\sum_{i=1}^{n} \bigl( \delta_{X_i}(\cdot) - P(\cdot) \bigr). $$

The name empirical refers to the fact that the measure and the process are based upon the data, namely the $X_i$, which are supposed to be i.i.d. and to come from a given distribution $F$.

In the classical case, for a fixed subset $A$ of $S$ (some metric space, e.g. $\mathbb{R}$), the law of large numbers shows that the empirical measure of $A$ converges a.s. to $P(A)$, where $P$ denotes the probability measure associated with the distribution function $F$. The CLT can also be applied to the empirical process: for fixed $A$ it is just a centered and rescaled Binomial random variable, and it converges in distribution to some normal random variable.

In the case $S = \mathbb{R}$ and $A_t := ]-\infty, t]$, for $t \in \mathbb{R}$, the Glivenko–Cantelli theorem assures us that the empirical measure converges a.s. uniformly over all the sets $A_t$, $t \in \mathbb{R}$. For the empirical process a similar theorem, proved by Donsker, is available. It says that we have convergence to the Brownian Bridge $Y$ in the space $(D[0,1], d_\infty)$, the space of cadlag functions equipped with the uniform topology;

$$ \| \sqrt{n}(F_n - F) - Y_n \|_\infty \to 0 $$

where the $Y_n$ are Brownian Bridges, $F$ is the distribution function of the uniform distribution $U[0,1]$ and

$$ \sqrt{n}(F_n(t) - t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl( I_{[0,t]}(X_i) - t \bigr). $$

Or in other words, the empirical process converges in distribution, uniformly over all closed intervals $[0,t]$, $t \in [0,1]$, to a Brownian Bridge. The main goal is to present similar results, but for larger classes of functions instead of classes of sets, and in general spaces.

In chapter one we will briefly give the general framework in which we will be working, and go a bit further into detail for the classical cases.

In order to achieve our goal, many problems have to be tackled. The class over which the supremum is taken could possibly be too big to still have measurability of the empirical measure and process. So we will need a general theory of convergence a.s., in probability and in distribution for non measurable functions. The development of this theory will be our main concern in the 2nd chapter. Definitions of almost uniform convergence, convergence in outer probability and weak convergence for non measurable functions will be given. It will be seen that we can recover pretty much everything of the measurable case for non measurable functions: a general, extended portmanteau theorem will be proven together with conditions characterizing non measurable weak convergence.

The third chapter, a rather short one, will be about special classes of functions. One way of measuring the size of classes of functions (and sets) will be presented, and then Vapnik–Cervonenkis classes will be defined and two important properties of those classes will be proven. Some examples of VC classes are given at the end.

Because measurability of the empirical measure and process is in general not satisfied, we will dedicate a chapter, the 4th one, to conditions which imply measurability of the empirical measure and process.

The main chapter is where everything from the preceding chapters is combined to prove uniform limit theorems. The main theorem, a uniform central limit theorem for the empirical process due to David Pollard and extended by Richard Mansfield Dudley, is stated and proved. Thereafter two corollaries about weak convergence for special types of VC classes are shown.

Because empirical process theory makes use of topology, a large appendix on topology has been written, and some definitions and theorems of measure theory are recalled, sometimes with a proof. A third part of the appendix is concerned with some properties of analytic sets, which play a crucial role in the development of conditions implying measurability of the empirical measure and the empirical process.

Chapter 1

Empirical measures and processes.

1.1 Definitions and problems.

In the following part, we will mainly be concerned with "empirical measures" and "empirical processes". In order to define such objects, we need a sequence of i.i.d. random variables. One way to obtain such a sequence is to take a countable product $(S^\infty, \mathcal{B}^\infty, P^\infty)$ of copies of a probability space $(S, \mathcal{B}, P)$ (called the sample space), with product σ–algebra and product measure, and to let the coordinate maps onto $(S, \mathcal{B}, P)$ be the random variables. Defining $X_1, X_2, \cdots$ in this way will be called the standard model. Now we are able to define empirical measures:

$$ P_n : S^\infty \times \mathcal{B} \to [0,1] : (s_\infty, A) \mapsto n^{-1}\sum_{i=1}^{n} \delta_{X_i(s_\infty)}(A) $$

where $\delta_x(A) := I_A(x)$. Defined this way, $P_n(s_\infty, \cdot)$ is for every fixed $s_\infty$ a probability measure on $\mathcal{B}$, and actually on $2^S := \mathcal{P}(S)$. The empirical process is defined as

$$ \nu_n : S^\infty \times \mathcal{B} \to [-\sqrt{n}, \sqrt{n}] : (s_\infty, A) \mapsto n^{1/2}\bigl( P_n(s_\infty, A) - P(A) \bigr). $$

If we fix a Borel set $B$, then $\nu_n(B)$ is, for each positive integer $n$, a random variable on $(S^\infty, \mathcal{B}^\infty, P^\infty)$. $P(B)$ is fixed, so constant, and $P_n(\omega, B)$ is then a normalized Binomial variable, as a normalized sum of i.i.d. Bernoulli($P(B)$) random variables, each of which records whether $X_i$ lies in $B$ or not. Therefore, letting $n$ tend to infinity and applying the classical one dimensional Central Limit Theorem A.2.21, we know that $\nu_n(B)$ converges in distribution to a normal random variable with mean zero and variance $P(B)(1 - P(B))$.
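As a numerical illustration of this pointwise CLT, the following Python sketch simulates $\nu_n(B)$ for the hypothetical choice $P = U[0,1]$ and $B = [0, 0.3]$. The distribution, the set, the sample size, the number of repetitions and the seed are all assumptions made only for this example. Under these assumptions the sample mean and sample variance of the simulated values should be close to $0$ and to $P(B)(1 - P(B)) = 0.21$.

```python
import random
import statistics

def nu_n(sample, indicator, p_b):
    """Empirical process nu_n(B) = sqrt(n) * (P_n(B) - P(B)) for one sample."""
    n = len(sample)
    p_n = sum(1 for x in sample if indicator(x)) / n
    return n ** 0.5 * (p_n - p_b)

def simulate(n=400, reps=2000, seed=0):
    """Draw `reps` independent copies of nu_n(B); return their mean and variance."""
    rng = random.Random(seed)          # fixed seed, only for reproducibility
    in_b = lambda x: x <= 0.3          # hypothetical set B = [0, 0.3]
    p_b = 0.3                          # P(B) under the uniform distribution
    values = [nu_n([rng.random() for _ in range(n)], in_b, p_b)
              for _ in range(reps)]
    return statistics.mean(values), statistics.variance(values)

mean, var = simulate()
print(mean, var)  # expected to be near 0 and near 0.3 * 0.7 = 0.21
```

Note that the variance of $\nu_n(B)$ equals $P(B)(1 - P(B))$ for every $n$; only the shape of the distribution changes as $n$ grows.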

For later purposes, we will need a certain stochastic process, commonly called the Brownian Bridge. It will appear as a limit distribution of empirical processes, and is an example of a Gaussian Process. So we will study general Gaussian Processes indexed by classes of functions, and we will see that the Brownian Bridge is an example of such a class indexed Gaussian Process.

Here

$$ G_P : (S^\infty, \mathcal{B}^\infty, P^\infty) \to \ell^\infty(\mathcal{F}), $$

where $G_P(f)$, $f \in \mathcal{F}$, is a one dimensional Gaussian random variable with mean zero and variance $\mathrm{Var}(G_P(f)) := \mathrm{Var}(f)$.

The covariance w.r.t. $P$ also induces a pseudometric on $L^2(P)$ as follows: $\rho_P(f,g) := E[(G_P(f) - G_P(g))^2]^{1/2}$. On $L^2_0 := \{ f : f \in L^2, \int f\,dP = 0 \}$ this pseudometric coincides with the usual one, for

$$ \rho_P(f,g) = E[(G_P(f-g))^2]^{1/2} = \mathrm{Var}(f-g)^{1/2} = E[(f-g)^2]^{1/2}. $$
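The difference between $\rho_P$ and the $L^2(P)$ distance can be checked by hand on a finite sample space. The sketch below assumes a hypothetical three point distribution and uses the identity $\rho_P(f,g)^2 = \mathrm{Var}(f-g)$ from the display above; all names and numbers are illustrative only.

```python
from fractions import Fraction

# Hypothetical distribution P on the three points 0, 1, 2.
points = [0, 1, 2]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def mean(h):
    """Integral of h with respect to P."""
    return sum(w * h(x) for x, w in zip(points, weights))

def rho_squared(f, g):
    """rho_P(f, g)^2 = Var_P(f - g)."""
    m = mean(lambda x: f(x) - g(x))
    return mean(lambda x: (f(x) - g(x) - m) ** 2)

def l2_squared(f, g):
    """Squared L^2(P) distance E[(f - g)^2]."""
    return mean(lambda x: (f(x) - g(x)) ** 2)

# f and g differ by a constant: rho_P cannot separate them, the L^2 distance can.
f = lambda x: x
g = lambda x: x + 5
print(rho_squared(f, g), l2_squared(f, g))    # 0 25

# Centered functions (both have P-mean zero): the two distances agree.
f0 = lambda x: x - 1
g0 = lambda x: (x - 1) ** 2 - Fraction(1, 2)
print(rho_squared(f0, g0), l2_squared(f0, g0))  # 3/4 3/4
```

The first pair shows why $\rho_P$ is only a pseudometric on $L^2(P)$: functions differing by a constant are at $\rho_P$ distance zero.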

1.2 The classical cases.

1.2.1 The Glivenko–Cantelli theorem.

Let $A_t := ]-\infty, t]$, $t \in \mathbb{R}$, be the closed half lines.

Theorem 1.2.1 (Glivenko–Cantelli). Let $X_1, X_2, \cdots$ be i.i.d. random variables with common distribution function $F$. Then $\sup_x |F_n(x) - F(x)| \to 0$ a.s. as $n \to \infty$, where

$$ F_n(x, \omega) := \frac{1}{n}\sum_{i=1}^{n} I_{]-\infty, x]}(X_i(\omega)). $$

I.e. $\| P_n - P \|_{\{A_t : t \in \mathbb{R}\}} \to 0$ a.s.

Proof. We refer to [Bill1] chapter 4 theorem 20.6 on page 286 or [Dud1] theorem 11.4.2 on page 400 or even [Pol] pages 13–16 for a proof.
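The statement can be illustrated numerically. The following sketch assumes that $F$ is the $U[0,1]$ distribution function, so that $F(x) = x$ on $[0,1]$, and computes $\sup_x |F_n(x) - F(x)|$ exactly from the order statistics. The function name, the sample sizes and the seed are choices made for this example only.

```python
import random

def sup_distance_uniform(sample):
    """Return sup_x |F_n(x) - x| for a sample with values in [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # F_n is a step function jumping at the order statistics, so the supremum
    # is approached at a jump: compare x_(i) with the value of F_n just after
    # the jump, (i + 1) / n, and just before it, i / n (0-based index i).
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(1)  # fixed seed, only for reproducibility
distances = {}
for n in (100, 10_000, 100_000):
    distances[n] = sup_distance_uniform([rng.random() for _ in range(n)])
    print(n, distances[n])
```

In such runs the distance typically shrinks at the rate $n^{-1/2}$, which is the scaling that Donsker's theorem in the next subsection makes precise.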

1.2.2 Donsker’s theorem.

Theorem 1.2.2 (Donsker). Let $X_i$ be i.i.d. random variables, uniformly distributed on $[0,1]$. Let $\alpha_n$ be the $n$th empirical process. Then $\alpha_n$ converges weakly in $D[0,1]$, with the Skorokhod topology, to a certain Gaussian Process, known as the Brownian Bridge.

Recall that by definition, we have weak convergence iff for every real–valued, bounded function $G$ on $D[0,1]$ that is continuous w.r.t. the Skorokhod topology:

$$ \int G\,d\mu_n \to \int G\,dW, $$

where $\mu_n$ denotes the probability measure associated with the empirical process and $W$ the one of the Brownian Bridge.

Proof. We refer to [Bill2] chapter 3 theorem 14.3 on page 149 for a proof.
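A standard fact, not stated above, is that the Brownian Bridge $Y$ on $[0,1]$ has covariance $E[Y(s)Y(t)] = \min(s,t) - st$, and that the uniform empirical process $\alpha_n(t) = n^{-1/2}\sum_{i=1}^{n} (I_{[0,t]}(X_i) - t)$ has exactly this covariance for every $n$. The Monte Carlo sketch below checks this for the hypothetical values $s = 0.3$, $t = 0.6$; the sample size, the number of repetitions and the seed are assumptions made for the illustration.

```python
import random

def alpha_n(sample, t):
    """Uniform empirical process at t: n^{-1/2} * sum_i (1{X_i <= t} - t)."""
    n = len(sample)
    return (sum(1 for x in sample if x <= t) - n * t) / n ** 0.5

def empirical_covariance(s, t, n=200, reps=4000, seed=2):
    """Monte Carlo estimate of E[alpha_n(s) * alpha_n(t)]."""
    rng = random.Random(seed)  # fixed seed, only for reproducibility
    acc = 0.0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        acc += alpha_n(sample, s) * alpha_n(sample, t)
    # Both factors have mean zero, so the mean product estimates the covariance.
    return acc / reps

cov = empirical_covariance(0.3, 0.6)
print(cov, min(0.3, 0.6) - 0.3 * 0.6)  # estimate next to the target value 0.12
```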

In Billingsley’s classical book [Bill2], the Skorokhod topology is defined at the beginning of chapter 3. This is done because $D$ with the uniform metric is not separable; it actually contains an uncountable discrete set, which causes measurability problems. In the original proof of Donsker there was thus a mistake, due to the nonmeasurability: one cannot associate a probability measure with a nonmeasurable function. But this problem was taken care of by Skorokhod, who introduced a new metric on $D$, which turns $D$ into a Polish space, i.e. a separable complete metric space.

As seen in [Bill2] chapter 3 section 15, changing the metric of $D$ amounts in fact to reducing the Borel σ–algebra of $D$ for the uniform topology to the ball σ–algebra of that same uniform topology. The ball σ–algebra is the σ–algebra generated by the open balls, and in non separable spaces the two σ–algebras can be different. The relation between the uniform topology and the Skorokhod topology in terms of σ–algebras is as follows: the ball σ–algebra of the uniform topology equals the Borel σ–algebra of the Skorokhod topology.

1.3 The problems.

The problem with empirical processes in non separable metric spaces is that the σ–algebra (on the codomain) is often too large to allow the process to be Borel measurable, and thus $G \circ \nu_n$ need not be measurable for every real–valued, continuous and bounded function $G$. This is exactly what happens in Donsker’s (classical) theorem, since $D[0,1]$ with the uniform metric is non separable. We refer to [Bill2] chapter 3 section 15 for a proof of the non separability of $(D, \|\cdot\|_\infty)$ and for the non measurability of the empirical process as a stochastic process in $(D, \|\cdot\|_\infty)$.

As mentioned after Donsker’s theorem in the previous section, one can take care of this non measurability by introducing a new metric on $D[0,1]$. This is the same as considering the empirical process as a random variable in $D$ w.r.t. the ball σ–algebra for the uniform topology. So one could think of developing a theory of weak convergence for that specific (ball) σ–algebra, restricting the real–valued bounded continuous $G$ to be measurable w.r.t. the ball σ–algebra, as done in [Bill2] chapter 1 section 6.

But it turns out that for general classes of functions the empirical process indexed by such classes is not even measurable for the ball σ–algebra of the uniform topology. So one actually needs a new approach to what weak convergence means and how it can be defined (such that it is consistent with the theory of weak convergence for measurable random variables). One can then address the problem of determining for which general index classes we still have Donsker type results, i.e. central limit theorems for stochastic processes in the space of uniformly bounded real–valued functions with the uniform norm.

In particular we need a new definition for convergence in law, since the integral of a nonmeasurable function is not defined. To achieve this, we will use the concept of upper integral; that will be one of the subjects of the next chapter.

Chapter 2

Weak convergence for non-measurable functions

2.1 Outer probability and expectation

Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space and set $\bar{\mathbb{R}} := [-\infty, \infty]$.

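Definition 2.1.1. Let $g : \mathcal{X} \to \bar{\mathbb{R}}$ be a (not necessarily measurable) function. The outer expectation (or the upper $P$–integral) of $g$ is defined as

$$ E^*_P[g] := \int^* g\,dP := \inf\bigl\{ E_P[h] : h \geq g,\ h : \mathcal{X} \to \bar{\mathbb{R}},\ P\text{–semi–integrable} \bigr\}. $$

Recall that a function $h : \mathcal{X} \to \bar{\mathbb{R}}$ is called $P$–semi–integrable if it is $\mathcal{A}$–measurable and at least one of the integrals $\int h^+\,dP$, $\int h^-\,dP$ is finite.

Similarly, we can define an outer measure $P^* : 2^{\mathcal{X}} (:= \mathcal{P}(\mathcal{X})) \to [0,1]$ by setting

$$ P^*(B) = \inf\{ P(A) : A \supset B,\ A \in \mathcal{A} \}, \quad \text{for } B \subset \mathcal{X}. $$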

Lemma 2.1.1. Given $B \subset \mathcal{X}$, we can find a set $A \in \mathcal{A}$, $A \supset B$, such that $P^*(B) = P(A)$. Moreover, this set is $P$–almost surely unique, i.e. if $A_1, A_2 \supset B$ are two sets in $\mathcal{A}$ with $P^*(B) = P(A_1) = P(A_2)$, we have $P(A_1 \Delta A_2) = 0$. We call any set $A \in \mathcal{A}$ with $B \subset A$ and $P^*(B) = P(A)$ a measurable cover of $B$.

Proof. Choose a sequence of sets $A_n \in \mathcal{A}$, $A_n \supset B$, such that $P(A_n) \to P^*(B)$. Then clearly by monotonicity

$$ P^*(B) \leq P\Bigl( \bigcap_{n=1}^{\infty} A_n \Bigr) \leq \inf_{n \geq 1} P(A_n) = P^*(B), $$

which shows that $\bigcap_{n=1}^{\infty} A_n$ is a measurable cover. If $A_1, A_2$ are measurable covers of $B$, we obtain again by monotonicity that $A_1 \cap A_2$ is a measurable cover of $B$ and hence that $P(A_1 \cap A_2) = P(A_1) = P(A_2)$, which implies that $P(A_1 \Delta A_2) = 0$. Indeed $A_1$ is the disjoint union of $A_1 \setminus A_2$ and $A_1 \cap A_2$. Hence $P(A_1 \setminus A_2) = 0$, similarly $P(A_2 \setminus A_1) = 0$, and recall that the symmetric difference is defined as $A_1 \Delta A_2 := (A_1 \setminus A_2) \cup (A_2 \setminus A_1)$.
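On a finite sample space whose σ–algebra is generated by a partition, measurable covers can be computed explicitly: the smallest measurable set containing $B$ is the union of the partition cells (atoms) that meet $B$. The sketch below works this out for a hypothetical six point space; the partition and its probabilities are invented for the example.

```python
from fractions import Fraction

# Hypothetical toy space X = {0, ..., 5}; the sigma-algebra is generated by
# the partition below, and P is given by the probabilities of the atoms.
atoms = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]
prob = {atoms[0]: Fraction(1, 2), atoms[1]: Fraction(1, 3), atoms[2]: Fraction(1, 6)}

def measurable_cover(b):
    """Smallest measurable set containing b: the union of the atoms meeting b."""
    cover = set()
    for atom in atoms:
        if atom & b:
            cover |= atom
    return frozenset(cover)

def outer_probability(b):
    """P*(b) = P(measurable cover of b)."""
    return sum((prob[atom] for atom in atoms if atom & b), Fraction(0))

b = frozenset({1, 5})             # not measurable: it splits the atom {0, 1}
b_complement = frozenset({0, 2, 3, 4})
print(sorted(measurable_cover(b)))       # [0, 1, 5]
print(outer_probability(b))              # 2/3
print(outer_probability(b_complement))   # 5/6
```

Here $P^*(B) + P^*(B^c) = 3/2 > 1$, which shows that $P^*$ is in general not additive on non measurable sets.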

We next show that there are also measurable cover functions for the outer integrals. We need some additional notation. Let $L^0 = L^0(\mathcal{X}, \mathcal{A}, P, \bar{\mathbb{R}})$ be the set of all Borel measurable functions $f : \mathcal{X} \to \bar{\mathbb{R}}$.

Definition 2.1.2. If $\mathcal{J} \subset L^0(\mathcal{X}, \mathcal{A}, P, \bar{\mathbb{R}})$, a function $f \in L^0$ is called an essential infimum of $\mathcal{J}$, if

i) $f \leq j$ $P$–a.s. for all $j \in \mathcal{J}$;

ii) $f \geq g$ $P$–a.s. for all measurable $g$ satisfying: $g \leq j$ for all $j \in \mathcal{J}$.

From this definition it immediately follows that the essential infimum is $P$–almost surely unique, provided that it exists; existence will be shown in the following theorem.

Theorem 2.1.2. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space and $\mathcal{J} \subset L^0(\mathcal{X}, \mathcal{A}, P, \bar{\mathbb{R}})$. Then an essential infimum of $\mathcal{J}$ exists.

Proof. W.l.o.g. we assume that $\mathcal{J} \neq \emptyset$. Define

$$ \mathcal{J}_1 = \{ \min(f_1, \cdots, f_n) : f_i \in \mathcal{J},\ i = 1, \cdots, n,\ n \in \mathbb{N} \} $$

and set $c = \inf\{ E[\arctan(j)] : j \in \mathcal{J}_1 \}$. Clearly, $c \in [-\pi/2, \pi/2]$. Choose then a sequence $j_n \in \mathcal{J}_1$ so that $E[\arctan(j_n)] \to c$. Set $h_n = \min(j_1, \cdots, j_n)$. Since $h_n \in \mathcal{J}_1$ and $h_n \leq j_n$, we still have $E[\arctan(h_n)] \to c$. The sequence $\arctan(h_n)$ is monotone, thus it converges to a measurable function $H = \arctan(h)$. By bounded convergence (theorem A.2.19) we have $E[\arctan(h)] = c$. Take an arbitrary $j \in \mathcal{J}$. Then $h_n \wedge j \in \mathcal{J}_1$ and $h_n \wedge j \to h \wedge j$. Using again bounded convergence we see that

$$ c \leq E[\arctan(h_n \wedge j)] \to E[\arctan(h \wedge j)] \leq E[\arctan(h)] = c. $$

Thus $E[\arctan(h \wedge j)] = E[\arctan(h)]$, which implies that $h \wedge j = h$, or $h \leq j$, $P$–almost surely. If $g$ is another $\mathcal{A}$–measurable function satisfying $g \leq j$, $j \in \mathcal{J}$, we also have $g \leq h_n$, $n \geq 1$, which trivially implies that $g \leq \lim_{n \to \infty} h_n = h$ a.s. Thus, $h$ is an essential infimum for $\mathcal{J}$.

Consider the function class $\mathcal{J} = \{ j \in L^0 : j \geq f \text{ everywhere} \}$. From the above proof it is then clear that we can choose a version of the essential infimum, which we will denote in the sequel by $f^*$, so that $f^* \geq f$ everywhere. It is easy to see that if $E^*[f] < \infty$, then $f^*$ is $P$–semi–integrable and $E^*[f] = E[f^*]$.

Lemma 2.1.3. We have for all functions $f, g : \mathcal{X} \to ]-\infty, \infty]$

i) $(f + g)^* \leq f^* + g^*$ a.s.,

ii) $(f - g)^* \geq f^* - g^*$, whenever both sides are defined a.s.

Proof. i) The RHS is well–defined, because $-\infty < f^*, g^* \leq \infty$ everywhere, and it is also measurable and $\geq f + g$ everywhere by definition.

ii) On the set where $g^* = +\infty$, $f^*$ is finite a.s. by assumption. So on that set $f^* - g^* = -\infty$ and thus $\leq (f - g)^*$. On the set where $g^*$ is finite, $g$ is too, and then $f = (f - g) + g$. So by part (i): $f^* \leq (f - g)^* + g^*$, and then $f^* - g^* \leq (f - g)^*$.
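For a σ–algebra generated by a finite partition in which every atom has strictly positive probability, the measurable cover function $f^*$ is simply the maximum of $f$ over the atom containing the point, and $E^*[f] = E[f^*]$. The following sketch (with a hypothetical partition and hypothetical functions $f$, $g$) verifies part (i) of the lemma in this toy setting and shows that the inequality can be strict.

```python
from fractions import Fraction

# Hypothetical toy space {0, ..., 5}: atoms of the sigma-algebra and their
# (strictly positive) probabilities.
atoms = [(0, 1), (2, 3, 4), (5,)]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def upper_envelope(f):
    """f*: on each atom, the maximum of f over that atom (finite toy case)."""
    star = {}
    for atom in atoms:
        m = max(f[x] for x in atom)
        for x in atom:
            star[x] = m
    return star

def outer_expectation(f):
    """E*[f] = E[f*] = sum over atoms of P(atom) * max of f on the atom."""
    return sum((w * max(f[x] for x in atom) for atom, w in zip(atoms, weights)),
               Fraction(0))

f = {0: 1, 1: 0, 2: 2, 3: 0, 4: 1, 5: 3}
g = {0: 0, 1: 1, 2: 0, 3: 2, 4: 1, 5: -3}
f_plus_g = {x: f[x] + g[x] for x in f}

f_star, g_star, sum_star = upper_envelope(f), upper_envelope(g), upper_envelope(f_plus_g)
print(all(sum_star[x] <= f_star[x] + g_star[x] for x in f))   # True
print(outer_expectation(f_plus_g),
      outer_expectation(f) + outer_expectation(g))            # 7/6 7/3
```

The gap between $7/6$ and $7/3$ comes from the atoms on which $f$ and $g$ attain their maxima at different points.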

Lemma 2.1.4. Let $V$ be a vector space equipped with a seminorm $\|\cdot\|$. For any pair of functions $X, Y : \mathcal{X} \to V$:

i) $\|X + Y\|^* \leq (\|X\| + \|Y\|)^* \leq \|X\|^* + \|Y\|^*$ a.s.;

ii) $\|cX\|^* = |c|\,\|X\|^*$ a.s.; $c \in \mathbb{R}$.

Proof. The first inequality in (i) follows from the triangle inequality and the definition of measurable cover, whereas the second inequality in (i) is a consequence of the previous lemma (2.1.3). For $c = 0$ the second assertion is trivial, so consider $c \neq 0$. Clearly $\|cX\|^* \leq |c|\,\|X\|^*$, because the RHS is measurable and $\|cX\| = |c|\,\|X\| \leq |c|\,\|X\|^*$, and by definition $\|cX\|^* \leq h$ a.s. for each $h \in L^0$ with $h \geq \|cX\|$. The converse inequality holds too. For fixed $c$ and any $h \in L^0$ with $h \geq \|cX\|$, we have $h/|c| \geq \|X\|$; in particular this applies to $h = \|cX\|^* \geq \|cX\|$, thus $\|cX\|^*/|c| \geq \|X\|^*$.

In order to distribute the star over a product or a sum, one needs independence.

Lemma 2.1.5. Let $(\mathcal{X}_j, \mathcal{A}_j, P_j)$, $j = 1, \cdots, n$; $n \in \mathbb{N}$, be probability spaces and let $f_j$ be functions from $\mathcal{X}_j$ into $\bar{\mathbb{R}}$.

i) Suppose that either $f_j \geq 0$; $j = 1, \cdots, n$, or $f_1 \equiv 1$, $n = 2$. Then on $\prod_{j=1}^{n} (\mathcal{X}_j, \mathcal{A}_j, P_j)$, with $x := (x_1, \cdots, x_n)$ and $f(x) = \prod_{j=1}^{n} f_j(x_j)$, we have $f^*(x) = \prod_{j=1}^{n} f_j^*(x_j)$ a.s. (as usual $0 \cdot \infty := 0$).

ii) If $f_j > -\infty$ for all $x_j$; $j = 1, \cdots, n$, and $g(x) := \sum_{j=1}^{n} f_j(x_j)$, then $g^*((x_1, \cdots, x_n)) = \sum_{j=1}^{n} f_j^*(x_j)$.

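Proof. We start with the proof for the sum. By induction it is enough to consider the case $n = 2$. By lemma 2.1.3, $g^*(x_1, x_2) \leq f_1^*(x_1) + f_2^*(x_2)$ a.s. Now we continue by contradiction, so assume that on a set $C \subset \mathcal{X}_1 \times \mathcal{X}_2$ of strictly positive probability we have a strict inequality. Then

$$ C = C \cap \Bigl( \bigcup_{q \in \mathbb{Q}} C_q \Bigr), \quad C_q := \{ (x,y) : g^*(x,y) < q < f_1^*(x) + f_2^*(y) \} $$

(note that $C_q$ is indeed a measurable set). And then $P(C) \leq \sum_{q \in \mathbb{Q}} P(C_q \cap C)$, so for at least one $q$ we have $P(C_q \cap C) > 0$. Denote that $q$ by $t$ and replace $C$ by $C_t \cap C$. Thus there is a rational $t$ such that $g^*(x,y) < t < f_1^*(x) + f_2^*(y)$ for all $(x,y) \in C$. Now consider $h : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q} : (p,q) \mapsto p + q$, then define

$$ Q_t = \{ (q,p) \in \mathbb{Q} \times \mathbb{Q} : h(q,p) \in ]t, \infty[ \} \quad\text{and}\quad C_{q,p} := \{ (x,y) : f_1^*(x) > q,\ f_2^*(y) > p \}. $$

As above $C_t \subset \bigcup_{(q,p) \in Q_t} C_{q,p}$. And because $0 < P(C) \leq \sum_{(q,p) \in Q_t} P(C \cap C_{q,p})$, there are rationals $q, r$ with $q + r > t$ such that, after replacing $C$ by $C \cap C_{q,r}$, we still have $P(C) > 0$ and $q < f_1^*(x)$, $r < f_2^*(y)$ for all $(x,y) \in C$.

Consider the section $C_x := \{ y \in \mathcal{X}_2 : (x,y) \in C \}$. By (the proof of) the Tonelli–Fubini theorem (A.2.17) there is a set $D_1 \subset \mathcal{X}_1$, $P_1(D_1) > 0$, such that $P_2(C_x) > 0$ for all $x \in D_1$. If $f_1(x) \leq q$ for all $x \in D_1$, then $f_1^* \leq q$ a.s. on $D_1$. But for any $x \in D_1$ and $y \in C_x \neq \emptyset$ we have $f_1^*(x) > q$, a contradiction. So $f_1(x) > q$ for some $x \in D_1$; fix that $x$. Then for any $y \in C_x$: $q + f_2(y) < f_1(x) + f_2(y) \leq g^*(x,y)$. Thus $f_2(y) < g^*(x,y) - q$ for all $y \in C_x$, and hence $f_2^*(y) \leq g^*(x,y) - q$ for almost all $y \in C_x$. For any such $y$: $q + f_2^*(y) \leq g^*(x,y) < t < q + r$ and so $f_2^*(y) < r$, a contradiction.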

Now we treat the case of the products. It is clear from the definition that $f^*(x) \leq \prod_{j=1}^{n} f_j^*(x_j)$ a.s. and also $1^* \equiv 1$. As before, it is enough to consider the case $n = 2$ for the converse inequality. Suppose that on a set of strictly positive probability one has $f^*(x,y) < f_1^*(x) f_2^*(y)$. As before, for some rational $r$, on a set of strictly positive probability: $f^*(x,y) < r < f_1^*(x) f_2^*(y)$.

If $f_1 \equiv 1$, we have $f(x,y) \leq f^*(x,y) < r < f_2^*(y)$ on some set of strictly positive probability. Then by Tonelli–Fubini (A.2.17): for some $x$, $f_2(y) \leq f^*(x,y) < r < f_2^*(y)$ on a set of $y$ with strictly positive probability, contradicting the choice of $f_2^*$.

If $f_1 \geq 0$, $f_2 \geq 0$, then as before there are a set $C \subset \mathcal{X}_1 \times \mathcal{X}_2$ of strictly positive probability and rationals $a, b$ with $ab > r$ such that on $C$ we have $f_1^*(x) > a > 0$, $f_2^*(y) > b > 0$ and $f^*(x,y) < r < ab < f_1^*(x) f_2^*(y)$. Repeating the argument above, there is a set $D_1 \subset \mathcal{X}_1$ with $P_1(D_1) > 0$ and $P_2(C_u) > 0$ for every $u \in D_1$, and some $u \in D_1$ with $f_1(u) > a$; fix such a $u$. Then for any $v \in C_u$: $f_2(v) \leq f^*(u,v)/a$, and so $f_2^*(v) \leq f^*(u,v)/a$ for almost all $v \in C_u$. For such a $v$ the following holds: $a f_2^*(v) \leq f^*(u,v) < r < ab$ and hence $f_2^*(v) < b$, again a contradiction.
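The role of the sign condition in part (i) can be seen in a finite toy model. The sketch below assumes two hypothetical finite spaces whose σ–algebras are generated by partitions with atoms of strictly positive probability, so that the star of a function is its maximum over the atom, and the atoms of the product σ–algebra are products of atoms. All partitions and functions are invented for the example.

```python
from itertools import product

# Hypothetical partitions generating the sigma-algebras on two finite spaces.
atoms_1 = [(0, 1), (2,)]
atoms_2 = [(0,), (1, 2)]

def star(f, atoms):
    """Measurable cover function: maximum of f over the atom containing x."""
    return {x: max(f[z] for z in atom) for atom in atoms for x in atom}

def product_star(f1, f2):
    """Cover function of (x, y) -> f1(x) * f2(y) on the product space, whose
    atoms are the products of the atoms of the two factors."""
    out = {}
    for a1, a2 in product(atoms_1, atoms_2):
        m = max(f1[x] * f2[y] for x in a1 for y in a2)
        for x in a1:
            for y in a2:
                out[(x, y)] = m
    return out

def factorizes(f1, f2):
    """Check whether f*(x, y) = f1*(x) * f2*(y) at every point."""
    s1, s2, s = star(f1, atoms_1), star(f2, atoms_2), product_star(f1, f2)
    return all(s[(x, y)] == s1[x] * s2[y] for (x, y) in s)

print(factorizes({0: 1, 1: 3, 2: 0}, {0: 2, 1: 0, 2: 4}))    # True
print(factorizes({0: -2, 1: 1, 2: 0}, {0: 2, 1: -2, 2: 1}))  # False
```

In the second call the product of two negative values exceeds the product of the two maxima, which is why the lemma restricts to nonnegative factors (or to $f_1 \equiv 1$).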


Lemma 2.1.6. Let $f : \mathcal{X} \to \bar{\mathbb{R}}$ be a function. Then for any real number $t$:

i) $P^*(f > t) = P(f^* > t)$,

ii) $\forall\, \varepsilon > 0 : P^*(f \geq t) \leq P(f^* \geq t) \leq P^*(f \geq t - \varepsilon)$.

Proof.

i) Because $f \leq f^*$ implies $\{f > t\} \subset \{f^* > t\}$ and the latter is measurable, by definition of the outer measure $P^*(f > t) \leq P(f^* > t)$. Now let $A$ be a measurable cover of $\{f > t\}$; since we can always redefine $A := A \cap \{f^* > t\}$, w.l.o.g. one may suppose $A \subset \{f^* > t\}$. If $P(A) < P(f^* > t)$, or in other words if $P(\{f^* > t\} \cap A^c) > 0$, then define $g := f^* I_A + t I_{A^c}$. By construction $g \geq f$, but on $\{f^* > t\} \cap A^c$ (a set of strictly positive probability) $f^*(\omega) > t I_{A^c}(\omega) = g(\omega)$. Because $g$ is measurable this is a contradiction with the definition of $f^*$. Hence $P(A) \geq P(f^* > t)$.

ii) As in (i), $\{f \geq t\} \subset \{f^* \geq t\}$, implying $P^*(f \geq t) \leq P(f^* \geq t)$. If $0 < \delta \leq \varepsilon$, then using the result from (i), we get

$$ P(f^* > t - \delta) = P^*(f > t - \delta) \leq P^*(f \geq t - \varepsilon). $$

Because of the continuity from above of $P$;

$$ P(f^* \geq t) = \lim_{\delta \to 0} P(f^* > t - \delta) \leq P^*(f \geq t - \varepsilon). $$

Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space. Then for any function $f$ on $\mathcal{X}$ into $\bar{\mathbb{R}}$, let

$$ \int_* f\,dP := \sup\Bigl\{ \int h\,dP : h \ P\text{-semi-integrable and } h \leq f \Bigr\}. $$

It is easy to see that $\int_* f\,dP = -\int^* (-f)\,dP$.

We can also define $f_*$ as the essential supremum of all measurable functions $g \leq f$. Then in exactly the same way as for $f^*$, $f_*$ is $P$–a.s. unique, and whenever one of both exists $\int_* f\,dP = \int f_*\,dP$. The relation between essential supremum and essential infimum is $-f_* = (-f)^*$.

2.2 Perfect functions

For a function $g : \mathcal{X}_1 \to \mathcal{X}_2$, define $g[A] := \{ g(x) : x \in A \}$, for $A \subset \mathcal{X}_1$. In this part we will investigate which conditions have to be satisfied by a measurable function $g$ such that for each real–valued $f$: $(f \circ g)^* = f^* \circ g$. It will turn out later that such functions $g$ are useful.

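Theorem 2.2.1. Let $(\mathcal{X}_1, \mathcal{A}, P)$ be a probability space, $(\mathcal{X}_2, \mathcal{B})$ a measurable space, and $g : \mathcal{X}_1 \to \mathcal{X}_2$ a measurable function. Let $Q := P \circ g^{-1}$ be the image measure through $g$ on $\mathcal{B}$. For any real–valued function $f$ on $\mathcal{X}_2$, define $f^*$ w.r.t. $Q$. Then the following are equivalent:

a) For any $A \in \mathcal{A}$ there is a $B \in \mathcal{B}$ with $B \subset g[A]$ and $Q(B) \geq P(A)$;

b) for any $A \in \mathcal{A}$ where $P(A) > 0$, there is a $B \in \mathcal{B}$ with $B \subset g[A]$ and $Q(B) > 0$;

c) for every real–valued function $f$ on $\mathcal{X}_2$, $(f \circ g)^* = f^* \circ g$ $P$–a.s.;

d) for any $D \subset \mathcal{X}_2$, $(I_D \circ g)^* = I_D^* \circ g$.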

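Proof.

a)⇒ b) is trivial.

b)⇒ c) Note that $(f \circ g)^* \leq f^* \circ g$ always holds. So suppose $(f \circ g)^* < f^* \circ g$ on some set of strictly positive probability. Note that

$$ \{ f^* \circ g > (f \circ g)^* \} = \bigcup_{q \in \mathbb{Q}} \bigl( \{ f^* \circ g > q \} \cap \{ q > (f \circ g)^* \} \bigr) $$

and thus, as in the proof of lemma 2.1.5, for some rational $r$: $(f \circ g)^* < r < f^* \circ g$ on a set $A \in \mathcal{A}$ with $P(A) > 0$. Using (b) we find a $B \in \mathcal{B}$ with $B \subset g[A]$ and $Q(B) > 0$. Then $f \circ g < r$ on $A$ implies $f < r$ on $B$, and so $f^* \leq r$ on $B$ a.s. But $f^* \circ g > r$ on $A$, a contradiction.

c)⇒ d) Again this is trivial.

d)⇒ a) Take a set $A \in \mathcal{A}$, and define $D := \mathcal{X}_2 \setminus g[A]$. Then there is some set $C \in \mathcal{B}$ with $I_D^* = I_C$. Since $I_D^* \geq I_D$, we have $D \subset C$, and $I_D^* \circ g = (I_D \circ g)^* = 0$ a.s. on $A$ (if not, this would contradict our choice of $I_D^*$). Let $B := \mathcal{X}_2 \setminus C \in \mathcal{B}$. Then $B \subset g[A]$ and

$$ Q(B) = 1 - \int I_C\,dQ = 1 - \int_{\mathcal{X}_2} I_C\,d(P \circ g^{-1}) = 1 - \int_{\mathcal{X}_1} I_D^* \circ g\,dP = 1 - \int_{\mathcal{X}_1} (I_D \circ g)^*\,dP \geq P(A). $$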


Definition 2.2.1. A function $g$ satisfying one, and so by the previous theorem all four, of the conditions above will be called perfect or $P$–perfect.

Next, we show that on product spaces the projections are perfect.

Proposition 2.2.2. Suppose $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$, $P$ is a product probability $\nu \times \mu$ on $\mathcal{A} = \mathcal{A}_1 \otimes \mathcal{A}_2$ and $g$ is the natural projection from $\mathcal{X}$ onto $\mathcal{X}_2$. Then $g$ is perfect.

Proof. Note that $P \circ g^{-1} = \mu$. For any $B \subset \mathcal{X}$ let $B_{x_2} := \{ x_1 : (x_1, x_2) \in B \}$, $x_2 \in \mathcal{X}_2$, be a section of $B$. If $B$ is measurable, define $C := \{ x_2 : \nu(B_{x_2}) > 0 \}$. This set is contained in $\mathcal{A}_2$ since $x_2 \mapsto \nu(B_{x_2})$ is $\mathcal{A}_2$-measurable by Tonelli–Fubini A.2.17. Clearly $C \subset g[B]$ and, since $P(B) = \int \nu(B_{x_2})\,d\mu(x_2) \leq \mu(C)$, we have $\mu(C) \geq P(B)$. So by condition (a) of the previous theorem, i.e. theorem 2.2.1, $g$ is perfect.

This proof is symmetric in the arguments, so the projection onto the first coordinate is perfect too. Moreover one can easily consider a countable product space without changing the argument.

2.3 Convergence almost uniformly and in outer probability

Definition 2.3.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(S, d)$ a metric space and $(f_n)_{n \in \mathbb{N}}$, $f_0$ functions from $\Omega$ into $S$.

(i) $f_n$ is said to converge to $f_0$ in outer probability iff $d(f_n, f_0)^* \to 0$ in probability, or equivalently $P^*\{ d(f_n, f_0) > \varepsilon \} \to 0$, $n \to \infty$, for every $\varepsilon > 0$.

(ii) $f_n$ is said to converge almost uniformly to $f_0$ iff $d(f_n, f_0)^* \to 0$ $P$–almost surely.

The following result gives a characterization of almost uniform convergence.

Proposition 2.3.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(S, d)$ a metric space and $f_0, f_n$; $n \in \mathbb{N}$, functions from $\Omega$ into $S$. Then the following are equivalent:

A) $f_n \to f_0$ almost uniformly;

B) There exist measurable $h_n \geq d(f_n, f_0)$ with $h_n \to 0$ a.s.;

C) For any $\varepsilon > 0$: $P^*\{ \sup_{n \geq m} d(f_n, f_0) > \varepsilon \} \downarrow 0$ as $m \to \infty$;

D) For any $\delta > 0$ there is some $B \in \mathcal{A}$ with $P(B) > 1 - \delta$ such that $f_n \to f_0$ uniformly on $B$.

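Proof. Clearly from the definition of almost uniform convergence we have that (A) and (B) are equivalent.

A)⇒ C) From the definition of essential infimum, one immediately gets $d(f_n, f_0) \leq d(f_n, f_0)^*$ and thus $(\sup_{n \geq m} d(f_n, f_0))^* \leq \sup_{n \geq m} d(f_n, f_0)^*$. Consequently, for every $\varepsilon > 0$,

$$ P^*\Bigl\{ \sup_{n \geq m} d(f_n, f_0) > \varepsilon \Bigr\} = P\Bigl\{ \bigl( \sup_{n \geq m} d(f_n, f_0) \bigr)^* > \varepsilon \Bigr\} \leq P\Bigl\{ \sup_{n \geq m} d(f_n, f_0)^* > \varepsilon \Bigr\} \to 0 \quad (m \to \infty). $$

C)⇒ D) Take for $k = 1, 2, \cdots$ the sets

$$ C_k := \Bigl\{ \sup_{n \geq m(k)} d(f_n, f_0) > 1/k \Bigr\}. $$

Then by (C), for $m(k)$ large enough, $P^*(C_k) < 2^{-k}$. If we take measurable covers $B_k$ for $C_k$ with $P(B_k) < 2^{-k}$, then on $A_r := \bigcap_{k > r} B_k^c$ we have that $f_n \to f_0$ uniformly, and $P(A_r) \geq 1 - 2^{-r}$.

D)⇒ A) Assuming (D) we obtain sets $B_k$ with $P(B_k) \uparrow 1$ and $f_n \to f_0$ uniformly on $B_k$. Then set $C_k := \bigcup_{j=1}^{k} B_j$, so that $C_1 \subset C_2 \subset \cdots$ and $P(C_k) \uparrow 1$. Now for $m(k)$ large enough, $d(f_n, f_0) < 1/k$ on $C_k$ for all $n \geq m(k)$. Then also $d(f_n, f_0)^* \leq 1/k$ a.s. on $C_k$ for these $n$, and so $d(f_n, f_0)^* \to 0$ a.s.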

Part (D) is the same as what is usually called "Egorov’s theorem" (for almost surely convergent sequences of measurable functions) in the literature.

Like for measurable functions and convergence in probability, continuous functions preserve convergence in outer probability.

Proposition 2.3.2. Let $(S, d)$ and $(Y, e)$ be metric spaces and $(\Omega, \mathcal{A}, P)$ a probability space. Let $f_n$ be functions from $\Omega$ into $S$ for $n = 0, 1, \cdots$ such that $f_n \to f_0$ in outer probability as $n \to \infty$. Assume that $f_0$ has separable range and is Borel measurable. Let $g$ be continuous from $S$ into $Y$. Then $g(f_n) \to g(f_0)$ in outer probability.

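Proof. Given $\varepsilon > 0$, define for $k = 1, 2, \cdots$

$$ B_k := \{ x \in S : d(x,y) < 1/k \text{ implies } e(g(x), g(y)) \leq \varepsilon \text{ for all } y \in S \}. $$

We claim that $B_k$ is closed. To see that, we take a sequence $x_m$ in $B_k$ converging to some $x$. If $d(x,y) < 1/k$ we also have $d(x_m, y) < 1/k$ for $m \geq m_k$ ($k$ is fixed!) and by definition of $B_k$ it follows that $e(g(x_m), g(y)) \leq \varepsilon$, $m \geq m_k$. By continuity of $g$, $g(x_m) \to g(x)$ and also for $m \geq m_k$, $\varepsilon \geq e(g(x_m), g(y)) \to e(g(x), g(y))$, whence $x \in B_k$. Secondly note that $B_k \uparrow S$ as $k \to \infty$ by the continuity of $g$, which implies that $f_0^{-1}(B_k) \uparrow \Omega$.

Pick $k$ large enough such that $P(f_0^{-1}(B_k)) > 1 - \varepsilon$. Now it follows from the definition of $B_k$ that $\{ e(g(f_n), g(f_0)) > \varepsilon \} \cap f_0^{-1}(B_k) \subset \{ d(f_n, f_0) \geq 1/k \}$. Hence, for $n$ large enough,

$$ \begin{aligned} P^*\{ e(g(f_n), g(f_0)) > \varepsilon \} &\leq P^*\bigl( \{ e(g(f_n), g(f_0)) > \varepsilon \} \cap f_0^{-1}(B_k) \bigr) + P^*\bigl( \{ e(g(f_n), g(f_0)) > \varepsilon \} \cap (f_0^{-1}(B_k))^c \bigr) \\ &\leq P^*\{ d(f_n, f_0) \geq 1/k \} + P\bigl( (f_0^{-1}(B_k))^c \bigr) \\ &\leq P^*\{ d(f_n, f_0) \geq 1/k \} + \varepsilon < 2\varepsilon. \end{aligned} $$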

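Lemma 2.3.3. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(g_n)_{n=0}^{\infty}$ a sequence of uniformly bounded real–valued functions on $\Omega$ such that $g_0$ is measurable. If $g_n \to g_0$ in outer probability, then $\limsup_{n \to \infty} \int^* g_n\,dP \leq \int g_0\,dP$.

Proof. Let $|g_n(x)| \leq M < \infty$ for all $n \in \mathbb{N}$ and $x \in \Omega$. Replacing $g_n$ by $g_n/M$ we can and do assume that $M = 1$. Given $\varepsilon > 0$, we have for $n$ large enough,

$$ P^*(|g_n - g_0| > \varepsilon) = P(A_n) < \varepsilon, $$

where $A_n = \{ |g_n - g_0|^* > \varepsilon \}$. Then:

$$ g_n^* \leq |g_n - g_0|^* + g_0 \leq |g_n - g_0|^* I_{A_n} + |g_n - g_0|^* I_{A_n^c} + g_0 \leq 2 I_{A_n} + \varepsilon + g_0. $$

It follows that for any $\varepsilon > 0$ and $n$ large enough,

$$ \int^* g_n\,dP = \int g_n^*\,dP \leq 2P(A_n) + \varepsilon + \int g_0\,dP \leq 3\varepsilon + \int g_0\,dP, $$

and the lemma has been proven.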

2.4 Convergence in law

Now we are able to give a definition for convergence of laws, where only the limit has to be measurable. In this section we assume that $(S, d)$ is a metric space.

Definition 2.4.1 (J. Hoffmann–Jørgensen). Let $(\Omega_n, \mathcal{A}_n, P_n)$, $n = 0, 1, \dots$, be probability spaces. Consider further a sequence $Y_n : \Omega_n \to S$, $n \ge 0$. Suppose that the range of $Y_0$ is included in some separable subset of $S$ and that $Y_0$ is Borel measurable with respect to the Borel sets on its range. Then, for $n \to \infty$, $Y_n$ is said to converge in law to $Y_0$, written $Y_n \Rightarrow Y_0$, if for every bounded and continuous real-valued function $g$ on $S$,

\[ \int^* g(Y_n) \, dP_n \to \int g(Y_0) \, dP_0. \]

Remark. Note that an equivalent condition could be given in terms of the lower integrals $\int_* g(Y_n) \, dP_n$, as we have $-\int^* g(Y_n) \, dP_n = \int_* -g(Y_n) \, dP_n$, $n \ge 1$.

As in the classical case, convergence in outer probability implies convergence in law.

Theorem 2.4.1. Let $Y_n : \Omega \to S$ be a sequence such that $Y_n \to Y_0$ in outer probability. If $Y_0$ is measurable with separable range, then $Y_n \Rightarrow Y_0$.

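Proof. Let $G$ be a bounded, continuous function from $S$ into $\mathbb{R}$. Applying proposition 2.3.2 to $G$, we obtain that $G(Y_n) \to G(Y_0)$ in outer probability. The same holds for $-G$. Then lemma 2.3.3 tells us that

\[ \limsup_{n\to\infty} \int^* G(Y_n) \, dP \le \int G(Y_0) \, dP \quad\text{and}\quad \limsup_{n\to\infty} \int^* -G(Y_n) \, dP \le \int -G(Y_0) \, dP. \]

Rewriting the last inequality step by step gives

\[
\begin{aligned}
\limsup_{n\to\infty} \int^* -G(Y_n) \, dP &\le \int -G(Y_0) \, dP, \\
\limsup_{n\to\infty} \Big( -\int_* G(Y_n) \, dP \Big) &\le -\int G(Y_0) \, dP, \\
-\liminf_{n\to\infty} \int_* G(Y_n) \, dP &\le -\int G(Y_0) \, dP, \\
\liminf_{n\to\infty} \int_* G(Y_n) \, dP &\ge \int G(Y_0) \, dP.
\end{aligned}
\]

Noticing that trivially $\liminf_{n\to\infty} \int_* G(Y_n) \, dP \le \liminf_{n\to\infty} \int^* G(Y_n) \, dP$, we get

\[ \int G(Y_0) \, dP \le \liminf_{n\to\infty} \int^* G(Y_n) \, dP \le \limsup_{n\to\infty} \int^* G(Y_n) \, dP \le \int G(Y_0) \, dP, \]

and we obtain convergence in law.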

Here is a theorem about convergence where perfect functions are used. With the aid of those functions, one is able to prove that convergence in outer probability implies convergence in distribution. Note that the domain of the functions $f_n$ is here only a measurable space; no measure is yet defined. This is different from theorem 2.4.1.

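Theorem 2.4.2. For $n \ge 0$, let $(Y_n, \mathcal{B}_n)$ be measurable spaces, $g_n : \Omega \to Y_n$ perfect $\mathcal{A}, \mathcal{B}_n$-measurable mappings and $f_n : Y_n \to S$. Suppose also that the range of $f_0$ is separable and that $f_0$ is Borel measurable. Let $Q_n := P \circ g_n^{-1}$ on $\mathcal{B}_n$. If $f_n \circ g_n \to f_0 \circ g_0$ in outer probability as $n \to \infty$, then $f_n \Rightarrow f_0$ as $n \to \infty$ for $f_n$ on $(Y_n, \mathcal{B}_n, Q_n)$.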

Proof. By Theorem 2.4.1, $f_n \circ g_n \Rightarrow f_0 \circ g_0$. Let $H$ be a bounded, continuous, real-valued function on $S$. Writing this out gives

\[ \int^* H(f_n(g_n)) \, dP \to \int H(f_0(g_0)) \, dP = \int H(f_0) \, dQ_0, \]

where the last equality follows by the image measure theorem. The following equalities will finish the proof:

\[
\begin{aligned}
\int^* H(f_n(g_n)) \, dP &= \int \big(H(f_n(g_n))\big)^* \, dP \\
&= \int \big(H(f_n)\big)^* \circ g_n \, dP \\
&= \int \big(H(f_n)\big)^* \, dQ_n \\
&= \int^* H(f_n) \, dQ_n.
\end{aligned}
\]

The first step holds because of the definition of the upper integral and the essential infimum, the second and third because $g_n$ is perfect (and thus also measurable), and the last one holds for the same reason as the first. Combining the two facts, we get

\[ \int^* H(f_n) \, dQ_n = \int^* H(f_n(g_n)) \, dP \to \int H(f_0(g_0)) \, dP = \int H(f_0) \, dQ_0 \]

as claimed.

2.5 The (extended) Portmanteau Theorem

In the previous section we defined the outer measure and expectation of any set or function. We also defined weak convergence for non-measurable maps. This was motivated by the definition of weak convergence for the usual case where one considers (sequences of) measurable functions. In the classical case we can give an intuitive explanation of what weak convergence is: by the portmanteau theorem, weak convergence of a sequence of probability measures is the same as convergence of the measures for some, thus not necessarily all, Borel sets. As in the classical case, there is also a portmanteau theorem for (a sequence of) non-measurable functions.

Theorem 2.5.1 ((Extended) portmanteau theorem). Let $(S, d)$ be a metric space. For $n = 0, 1, 2, \dots$ let $(X_n, \mathcal{A}_n, Q_n)$ be a probability space and $f_n : X_n \to S$ a mapping. Suppose that $f_0$ has separable range $S_0$ and is measurable. Let $P_0 := Q_0 \circ f_0^{-1}$. Then the following are equivalent:

a) $f_n \Rightarrow f_0$;

b) $\limsup_{n\to\infty} E^*_{Q_n}[G(f_n)] \le \int G \, dP_0$ for each bounded continuous / Lipschitz real-valued function $G$ on $S$;

c) $E^*_{Q_n}[G(f_n)] \to \int G \, dP_0$ for each bounded and real-valued Lipschitz function $G$ on $S$;

d) For any closed $F \subset S$, $P_0(F) \ge \limsup_{n\to\infty} Q_n^*(f_n \in F)$;

e) For any open $U \subset S$, $P_0(U) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in U)$;

f) For any continuity set $A \subset S$ of $P_0$, i.e. $P_0(\partial A) = 0$, $Q_n^*(f_n \in A) \to P_0(A)$ and $(Q_n)_*(f_n \in A) \to P_0(A)$.

Before starting we briefly recall the definition of weak convergence in the general (i.e. not necessarily measurable) setting, see also definition 2.4.1. So $f_n \Rightarrow f_0$ iff $\int^* G(f_n) \, dQ_n \to \int G(f_0) \, dQ_0$ for each $G \in C_b(S)$, or equivalently iff $\int_* G(f_n) \, dQ_n \to \int G(f_0) \, dQ_0$ for all $G \in C_b(S)$.

Proof. a) ⇒ b) Trivial.

b) ⇒ c) Considering $-G$ instead of $G$, we find that

\[
\begin{aligned}
\int -G(f_0) \, dQ_0 &\ge \limsup_{n\to\infty} \int^* -G(f_n) \, dQ_n, \\
-\int G(f_0) \, dQ_0 &\ge \limsup_{n\to\infty} \Big( -\int_* G(f_n) \, dQ_n \Big), \\
-\int G(f_0) \, dQ_0 &\ge -\liminf_{n\to\infty} \int_* G(f_n) \, dQ_n.
\end{aligned}
\]

Next note that

\[ \limsup_{n\to\infty} E^*[G(f_n)] \le E[G(f_0)] \le \liminf_{n\to\infty} E_*[G(f_n)] \le \liminf_{n\to\infty} E^*[G(f_n)]. \]

Thus (b) implies (c).

c) ⇒ d) Let $F$ be any closed set of $(S, d)$. Let $\{g_k\}_{k\ge1}$ be a sequence of Lipschitz functions with $I_F \le g_k$ and $g_k \downarrow I_F$, e.g. $g_k(x) := \max(1 - k\,d(x, F), 0)$. These functions satisfy $g_k \ge I_F$ and are obviously non-increasing in $k$. They are Lipschitz (lemma A.1.13) and bounded by 1. From (c) (or also (b)), for every $k$:

\[ \limsup_{n\to\infty} \int^* g_k \circ f_n \, dQ_n \le \int g_k \, dP_0. \]

As $g_k \ge I_F$ for every $k$, we have

\[ \int^* g_k \circ f_n \, dQ_n \ge \int^* I_F \circ f_n \, dQ_n = Q_n^*(f_n \in F). \]

First taking the $\limsup$ on both sides,

\[ \limsup_{n\to\infty} Q_n^*(f_n \in F) \le \limsup_{n\to\infty} \int^* g_k \circ f_n \, dQ_n \le \int g_k \, dP_0, \]

and then letting $k \to \infty$, we finally get that

\[ P_0(F) = \lim_{k\to\infty} \int g_k \, dP_0 \ge \limsup_{n\to\infty} Q_n^*(f_n \in F). \]

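d) ⇒ e) For $U \subset S$ open, let $F := S \setminus U$. Then $F$ is closed, hence by (d)

\[ P_0(F) \ge \limsup_{n\to\infty} Q_n^*(f_n \in F). \]

Rewriting the whole expression in terms of $U$:

\[
\begin{aligned}
1 - P_0(U) &\ge \limsup_{n\to\infty} \int^* (1 - I_U) \circ f_n \, dQ_n, \\
1 - P_0(U) &\ge \limsup_{n\to\infty} \Big( 1 - \int_* I_U \circ f_n \, dQ_n \Big), \\
1 - P_0(U) &\ge 1 - \liminf_{n\to\infty} (Q_n)_*(f_n \in U), \\
\liminf_{n\to\infty} (Q_n)_*(f_n \in U) &\ge P_0(U).
\end{aligned}
\]

In fact it is not hard to see that (d) $\iff$ (e).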

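d) + e) ⇒ f) By definition, the boundary of a set $A \subset S$, with $S$ a topological space, is $\partial A := \bar{A} \setminus A^\circ$. For continuity sets $A$ of $P_0$, due to the trivial inclusions $A^\circ \subset A \subset \bar{A}$, one has $P_0(A^\circ) = P_0(A) = P_0(\bar{A})$. To prove $Q_n^*(f_n \in A) \to P_0(A)$, start by noting that

\[ \limsup_{n\to\infty} Q_n^*(f_n \in A) \le \limsup_{n\to\infty} Q_n^*(f_n \in \bar{A}) \le P_0(\bar{A}) = P_0(A), \]

where the second inequality follows from assumption (d) and the equality from the remark above. Using assumption (e) we get similarly

\[ P_0(A) = P_0(A^\circ) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in A^\circ) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in A) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in \bar{A}). \]

Taking all facts together, and using $(Q_n)_* \le Q_n^*$,

\[ \limsup_{n\to\infty} Q_n^*(f_n \in A) = \liminf_{n\to\infty} (Q_n)_*(f_n \in A) = P_0(A). \]

The proof of $(Q_n)_*(f_n \in A) \to P_0(A)$ goes along the same inequalities. We use

\[ \limsup_{n\to\infty} (Q_n)_*(f_n \in A) \le \limsup_{n\to\infty} Q_n^*(f_n \in \bar{A}) \le P_0(\bar{A}) = P_0(A) \]

and

\[ P_0(A) = P_0(A^\circ) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in A^\circ) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in A). \]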

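f) ⇒ a) Let $G$ be a bounded real-valued continuous function. Then for some $M > 0$: $G(S) \subset [-M, M]$. For at most countably many $t \in [-M, M]$, $\{G < t\}$ is not a continuity set of $P_0$. This is easy to see: $\{G < t\}$ is open because $G$ is continuous, and its closure in $S$ is certainly contained in $\{G \le t\}$, which is a closed set. Hence $\partial\{G < t\} \subset \{G = t\}$. Let $F_t := \{G = t\}$; then $P_0(F_t) > 0$ for at most countably many $t$. Indeed, consider the set

\[ A := \{u \in [-M, M] : P_0(F_u) > 0\} = \bigcup_{n\ge1} \Big\{u \in [-M, M] : P_0(F_u) > \frac{1}{n}\Big\}. \]

The number of $t \in [-M, M]$ with $P_0(F_t) > 1/n$ is at most $n$ (since $P_0\{G \in [-M, M]\} = P_0(S) = P_0(S_0) = 1$ and the sets $F_t$ are disjoint), and a countable union of finite (or at most countable) sets remains (at most) countable. Let $\varepsilon > 0$, and for any integer $k$ let

\[ B_{k,\varepsilon} := \{s \in S : k\varepsilon \le G(s) < (k+1)\varepsilon\}, \]

so that $\partial B_{k,\varepsilon} \subset F_{k\varepsilon} \cup F_{(k+1)\varepsilon}$. By choosing $\varepsilon$ appropriately we can take all $B_{k,\varepsilon}$ to be continuity sets. Then

\[
\begin{aligned}
\int G \, dP_0 - \varepsilon &\le \sum_k k\varepsilon P_0(B_{k,\varepsilon}) = \lim_{n\to\infty} \sum_k k\varepsilon \, Q_n^*(f_n \in B_{k,\varepsilon}) \\
&\le \liminf_{n\to\infty} \int^* G \circ f_n \, dQ_n \le \limsup_{n\to\infty} \int^* G \circ f_n \, dQ_n \\
&\le \limsup_{n\to\infty} \sum_k (k+1)\varepsilon \, Q_n^*(f_n \in B_{k,\varepsilon}) \le \sum_k (k+1)\varepsilon P_0(B_{k,\varepsilon}) \\
&\le \int G \, dP_0 + \varepsilon.
\end{aligned}
\]

Because $G$ is bounded, the sums over $k$ are finite sums, thus they exist. So every limit point of $\int^* G \circ f_n \, dQ_n$ lies within $\varepsilon$ of $\int G \, dP_0$. Since $\varepsilon > 0$ can be taken arbitrarily small, $\lim_{n\to\infty} \int^* G \circ f_n \, dQ_n = \int G \, dP_0$ and $f_n \Rightarrow f_0$.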
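The Lipschitz approximations $g_k(x) = \max(1 - k\,d(x, F), 0)$ used in step (c) ⇒ (d) can be checked numerically. The following is a minimal sketch, assuming $S = \mathbb{R}$ with the usual distance and the closed set $F = [0, 1]$; the function names `dist_to_F`, `g` and `indicator_F` are illustrative and not part of the thesis.

```python
def dist_to_F(x, a=0.0, b=1.0):
    """Distance d(x, F) from a real number x to the closed interval F = [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def g(k, x):
    """Lipschitz approximation g_k(x) = max(1 - k * d(x, F), 0) of the indicator I_F."""
    return max(1.0 - k * dist_to_F(x), 0.0)


def indicator_F(x, a=0.0, b=1.0):
    """Indicator function I_F of the closed interval F = [a, b]."""
    return 1.0 if a <= x <= b else 0.0


ks = (1, 10, 100, 1000)
for x in (-0.5, -0.1, 0.0, 0.5, 1.0, 1.01, 2.0):
    # For each point: the values g_k(x) for increasing k, followed by I_F(x).
    print(x, [round(g(k, x), 4) for k in ks], indicator_F(x))
```

At each test point the printed values are non-increasing in $k$, stay above $I_F$, and equal $I_F$ once $k$ exceeds $1/d(x, F)$, which is the monotone convergence $g_k \downarrow I_F$ used in the proof.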

2.6 Asymptotic tightness and measurability

Let $(S, d)$ be a metric space and set, for $A \subset S$ and $\delta > 0$,

\[ A^\delta := \{y \in S : d(y, A) < \delta\}. \]

Note that these sets are open in $S$ ($d(\cdot, A)$ is 1-Lipschitz continuous by lemma A.1.13).

Definition 2.6.1. A p-measure $Q$ defined on the Borel σ-algebra of $(S, d)$ is said to be tight iff for every $\varepsilon > 0$ there exists a compact set $K$ in $S$ such that $Q(K) \ge 1 - \varepsilon$.

A map $f : \mathcal{X} \to S$ which is measurable w.r.t. $\mathcal{A}$ and the Borel sets of $S$ is said to be a tight random variable iff its law $L(f) := Q \circ f^{-1}$ is a tight measure on $(S, d)$.

If $(X_n, \mathcal{A}_n, Q_n)$ are p-spaces, a sequence $f_n : X_n \to S$, $n \ge 1$, is said to be asymptotically tight iff for every $\varepsilon > 0$ there is a compact set $K$ such that

\[ \liminf_{n\to\infty} (Q_n)_*(f_n \in K^\delta) \ge 1 - \varepsilon \quad\text{for every } \delta > 0. \]

Such a sequence is said to be asymptotically measurable iff

\[ \int^* G \circ f_n \, dQ_n - \int_* G \circ f_n \, dQ_n \to 0 \]

for all $G \in C_b(S)$.

It is easy to see that if $f$ is tight, it has separable support. From the definition of weak convergence it also follows that $f_n$ has to be asymptotically measurable if $f_n$ converges weakly to an $f : \mathcal{X} \to S$ with separable support. Moreover, $f_n$ being asymptotically tight and $f_n \Rightarrow f$ implies that $f$ is tight.

The next theorem is an extension of Prohorov's theorem to possibly non-measurable mappings and gives a kind of reverse implication, namely that asymptotic measurability and asymptotic tightness imply that there is a weakly convergent subsequence.

Theorem 2.6.1 (Prohorov). Let $(X_n, \mathcal{A}_n, Q_n)$ be probability spaces and $f_n : X_n \to S$ be mappings for $n = 1, 2, \dots$. If $f_n$ is asymptotically tight and asymptotically measurable, then there exists a subsequence $f_{n_j}$ that converges weakly to a tight Borel law.

Proof. We refer to [vdVaartAndWell], theorem 1.3.9, page 21, for a proof.

The following lemma will help to verify whether a given sequence $f_n$ is asymptotically measurable.

Lemma 2.6.2. Let $(X_n, \mathcal{A}_n, Q_n)$ be a probability space and $f_n : X_n \to S$ a mapping for $n = 1, 2, \dots$. If $\{f_n\}_{n\ge1}$ is asymptotically tight, and

\[ \int^* G \circ f_n \, dQ_n - \int_* G \circ f_n \, dQ_n \to 0 \quad\text{for all } G \in \mathcal{F}, \tag{2.6.1} \]

where $\mathcal{F}$ is a subalgebra of $C_b(S)$ that separates the points of $(S, d)$, then $\{f_n\}_{n\ge1}$ is asymptotically measurable.

Proof. Let $\varepsilon > 0$ and choose a compact set $K \subset S$ such that

\[ \liminf_{n\to\infty} (Q_n)_*(f_n \in K^\delta) \ge 1 - \varepsilon \quad\text{for every } \delta > 0. \]

Next note that one can clearly add all constant functions to $\mathcal{F}$. $\mathcal{F}$ then remains a subalgebra for which the assumption, equation (2.6.1), still holds.

Then by the Stone–Weierstrass theorem (A.1.20) the restriction of $\mathcal{F}$ to $K$ is uniformly dense in $C_b(K)$. So for $G \in C_b(S)$ there exists an $F \in \mathcal{F}$ with

\[ |G(x) - F(x)| \le \varepsilon/4 \quad\text{for all } x \in K. \]

Using the compactness of $K$, we can say more. Actually it is even true that

\[ |G(x) - F(x)| < \varepsilon/3 \]

for all $x \in K^\delta$ and for some $\delta > 0$. This follows from lemma A.1.23.

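For $\{f_n \in K^\delta\}$ choose a measurable subset $A_n$, e.g. $\{f_n \in K^\delta\}_*$, such that

\[ Q_n(A_n) = (Q_n)_*(f_n \in K^\delta). \]

Then from the definition of inner and outer cover,

\[ Q_n(X_n \setminus \{f_n \in K^\delta\}_*) = Q_n^*(f_n \notin K^\delta), \]

and for $n$ large enough and $G \in C_b(S)$:

\[
\begin{aligned}
& Q_n\big(|(G \circ f_n)^* - (G \circ f_n)_*| > \varepsilon\big) \\
&\le Q_n\big(\{|(G \circ f_n)^* - (G \circ f_n)_*| > \varepsilon\} \cap \{f_n \in K^\delta\}_*\big) + Q_n\big(X_n \setminus \{f_n \in K^\delta\}_*\big) \\
&\le Q_n\big(\{|(G \circ f_n)^* - (G \circ f_n)_*| > \varepsilon\} \cap \{f_n \in K^\delta\}_*\big) + \varepsilon \\
&\le Q_n\big(\{|(G \circ f_n)^* - (F \circ f_n)^*| + |(F \circ f_n)^* - (F \circ f_n)_*| + |(F \circ f_n)_* - (G \circ f_n)_*| > \varepsilon\} \cap A_n\big) + \varepsilon \\
&\le Q_n\big(|(F \circ f_n)^* - (F \circ f_n)_*| > \varepsilon/3\big) + \varepsilon.
\end{aligned}
\]

The second inequality holds by asymptotic tightness, and the last one by uniform approximation on $K^\delta$. Note that we used part (ii) of lemma 2.1.3 and also lemma 2.1.4 for

\[ |(G \circ f_n)^* - (F \circ f_n)^*| \le |(G \circ f_n) - (F \circ f_n)|^* \]

and that, since $\{f_n \in K^\delta\}_* \subset \{f_n \in K^\delta\}$,

\[
\begin{aligned}
I_{\{f_n \in K^\delta\}_*} \, \big|[(G \circ f_n) - (F \circ f_n)]^*\big| &\le \big(\varepsilon I_{\{f_n \in K^\delta\}_*}\big)^* = \varepsilon I_{\{f_n \in K^\delta\}_*}, \\
I_{\{f_n \in K^\delta\}_*} \, \big|(G \circ f_n) - (F \circ f_n)\big|^* &\le \varepsilon I_{\{f_n \in K^\delta\}_*}.
\end{aligned}
\]

Also

\[ |(F \circ f_n)_* - (G \circ f_n)_*| = |-(-F \circ f_n)^* + (-G \circ f_n)^*| \le |(G \circ f_n) - (F \circ f_n)|^*, \]

and the latter term is less than $\varepsilon/3$ on $A_n = \{f_n \in K^\delta\}_*$. Since, for $F \in \mathcal{F}$, $E[(F \circ f_n)^* - (F \circ f_n)_*] \to 0$, we have $(F \circ f_n)^* - (F \circ f_n)_* \to 0$ in probability, and so $(G \circ f_n)^* - (G \circ f_n)_* \to 0$ in probability too. This is a uniformly bounded sequence and it easily follows that

\[ \int^* G \circ f_n \, dQ_n - \int_* G \circ f_n \, dQ_n = E[(G \circ f_n)^* - (G \circ f_n)_*] \to 0. \]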

We still need a uniqueness condition for tight Borel measures on the Borel subsets of $S$. Combined with Prohorov's theorem, which gives weakly convergent subsequences, one can then prove weak convergence to a unique limit.

Lemma 2.6.3. Let $\mu, \nu$ be two finite Borel measures on a metric space $(S, d)$.

i) If $\int F \, d\mu = \int F \, d\nu$ for all $F \in C_b(S)$, then $\mu = \nu$.

ii) Let $\mu, \nu$ be tight Borel probability measures on $(S, d)$. If $\int F \, d\mu = \int F \, d\nu$ for every $F$ in a vector lattice (see A.1.12 for a definition) $\mathcal{F} \subset C_b(S)$ that contains the constants and separates points of $S$, then $\mu = \nu$.

Proof.

i) Let $G$ be an open set of $S$ and define $h_k(x) := \min(k\,d(x, S \setminus G), 1)$. Then $0 \le h_k \le 1$, and actually $h_k \le I_G$, because $h_k(x) = 0$ for $x \in S \setminus G$ and $h_k$ is bounded by 1 everywhere. Also $h_k \le h_{k+1}$, because for $x \notin G$ we have $h_k(x) = 0 = h_{k+1}(x)$, and for $x \in G$

\[ k\,d(x, S \setminus G) \le (k+1)\,d(x, S \setminus G). \]

The $h_k$ are clearly continuous and bounded (they are actually $k$-Lipschitz, compare with the proof of (c) ⇒ (d) in the extended portmanteau theorem 2.5.1) and converge monotonically to $I_G$. By the Monotone Convergence theorem A.2.18:

\[ \mu(G) = \lim_{k\to\infty} \int h_k \, d\mu = \lim_{k\to\infty} \int h_k \, d\nu = \nu(G). \]

The open sets generate the Borel σ-algebra and form a π-system, i.e. they are closed under finite intersections. So both measures $\mu$ and $\nu$ extend uniquely to the whole σ-algebra and thus are equal, theorem A.2.16.

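ii) Let $\varepsilon > 0$ and choose a compact $K \subset S$ for which $\min(\mu(K), \nu(K)) \ge 1 - \varepsilon$. By a version of the Stone–Weierstrass theorem (A.1.21), a vector lattice $\mathcal{F} \subset C_b(K)$ which contains the constants and separates points of $K$ is uniformly dense in $C_b(K)$. Let $G \in C_b(S)$. Then $G$ is uniformly bounded, so $\|G\|_\infty \le M$ for some $M > 0$, and $g := (G + M)/2M$ satisfies $0 \le g \le 1$. Since $\mathcal{F}$ is uniformly dense in $C_b(K)$, take $F \in \mathcal{F}$ with $|g(x) - F(x)| \le \varepsilon$ for all $x \in K$. Because $0 \le g \le 1$, we can also take $f := \max(\min(F, 1), 0)$ as an $\varepsilon$-approximation of $g$ on $K$. First, if $y \in \{F > 1\} \cap K$:

\[ F(y) - g(y) = |g(y) - F(y)| \le \varepsilon \quad\text{and}\quad 0 \le 1 - g(y) \le |F(y) - g(y)|. \]

Thus $|\min(F, 1)(x) - g(x)| \le \varepsilon$ for all $x \in K$. In the second step, for $z \in \{\min(F, 1) < 0\} \cap K$:

\[ 0 \le g(z) \le g(z) - \min(F, 1)(z) = |g(z) - \min(F, 1)(z)| \le \varepsilon. \]

Now we bound $A := |\int g \, d\mu - \int g \, d\nu|$. Since $\mathcal{F}$ is a vector lattice, $f = \max(\min(F, 1), 0) \in \mathcal{F}$, and $f$ is uniformly bounded by 1. Hence

\[
\begin{aligned}
A &= \Big| \int g \, d\mu - \int g \, d\nu \Big| \\
&= \Big| \int (g - f) \, d\mu + \int f \, d\mu - \int (g - f) \, d\nu - \int f \, d\nu \Big| \\
&\le 2\varepsilon + \Big| \int_{S \setminus K} (g - f) \, d\mu \Big| + \Big| \int f \, d\mu - \int f \, d\nu \Big| + \Big| \int_{S \setminus K} (g - f) \, d\nu \Big| \\
&\le 2\varepsilon + 2\varepsilon + \Big| \int f \, d\mu - \int f \, d\nu \Big| + 2\varepsilon = 6\varepsilon,
\end{aligned}
\]

since by assumption $\int H \, d\mu = \int H \, d\nu$ for all $H \in \mathcal{F}$. Because $g = (G + M)/2M$ and $\mu, \nu$ are probability measures,

\[ \Big| \int g \, d\mu - \int g \, d\nu \Big| = \frac{1}{2M} \Big| \int G \, d\mu - \int G \, d\nu \Big|. \]

We thus obtain $|\int G \, d\mu - \int G \, d\nu| \le 12M\varepsilon$ for any $\varepsilon > 0$, so $\int G \, d\mu = \int G \, d\nu$ for all $G \in C_b(S)$. By part (i), $\mu$ equals $\nu$ on the Borel σ-algebra of $(S, d)$.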

2.7 Spaces of bounded functions

Let $T$ be an arbitrary set. We now look at the space

\[ \ell^\infty(T) := \{F : T \to \mathbb{R} : F \text{ is uniformly bounded}\}. \]

This is a normed space with the norm $\|z\|_T := \sup_{t\in T} |z(t)|$ and corresponding metric $d(z_1, z_2) = \|z_1 - z_2\|_T$, $z_1, z_2 \in \ell^\infty(T)$.

Then any stochastic process $(f(t, \cdot))_{t\in T}$ on a probability space $(\mathcal{X}, \mathcal{A}, P)$ which is (pointwise) bounded gives a mapping $f : \mathcal{X} \to \ell^\infty(T)$. This mapping possesses some measurability, since the functions $f(t, \cdot) : \mathcal{X} \to \mathbb{R}$ have to be $\mathcal{A}$-measurable. This will allow us to simplify the general criterion for asymptotic measurability, see lemma 2.6.2, somewhat.


Lemma 2.7.1. Let $(X_n, \mathcal{A}_n, Q_n)$ be a probability space, $T$ a set and $f_n$ a function from $X_n$ into $\ell^\infty(T)$ for $n = 1, 2, \dots$. Suppose that $f_n$ is asymptotically tight. Then $f_n$ is asymptotically measurable iff $(f_n(t_1), \dots, f_n(t_k)) : X_n \to \mathbb{R}^k$ is asymptotically measurable for any choice of $t_1, \dots, t_k \in T$ and $k \ge 1$.

Proof. Assume $f_n : X_n \to \ell^\infty(T)$ is asymptotically measurable. Then we have to show that the same is true for $(f_n(t_1), \dots, f_n(t_k)) : X_n \to \mathbb{R}^k$ for $t_1, \dots, t_k \in T$. Note that $(f_n(t_1), \dots, f_n(t_k)) = g \circ f_n$, where $g = (\pi_{t_1}, \dots, \pi_{t_k}) : \ell^\infty(T) \to \mathbb{R}^k$, with $\pi_t : \ell^\infty(T) \to \mathbb{R} : z \mapsto z(t)$ being the projection onto the $t$-th coordinate of $z$. $\pi_t$ is trivially continuous since $\ell^\infty(T)$ bears the uniform topology. This of course implies that $g : \ell^\infty(T) \to \mathbb{R}^k$ is continuous as well.

Let $H \in C_b(\mathbb{R}^k)$. Then we have trivially

\[ E\big[(H(f_n(t_1), \dots, f_n(t_k)))^* - (H(f_n(t_1), \dots, f_n(t_k)))_*\big] = E\big[((H \circ g) \circ f_n)^* - ((H \circ g) \circ f_n)_*\big]. \]

Since $H \circ g \in C_b(\ell^\infty(T))$, the latter term converges to zero as $n \to \infty$. Hence, as claimed, $(f_n(t_1), \dots, f_n(t_k))$ is asymptotically measurable.

Conversely, consider the collection of continuous functions

\[ \mathcal{F} := \{h : \ell^\infty(T) \to \mathbb{R} : z \mapsto h(z) := G(z(t_1), \dots, z(t_k)) : G \in C_b(\mathbb{R}^k);\ t_i \in T,\ i = 1, \dots, k;\ k \in \mathbb{N}\}. \]

Such a class of functions has certain properties: it is an algebra and a vector lattice (see definition A.1.12 for definitions). For the proofs we refer to lemma A.1.22. Moreover, $\mathcal{F}$ also separates elements of $\ell^\infty(T)$. Let $z_1 \ne z_2$; then $z_1(t) \ne z_2(t)$ for some $t \in T$. Both functions are bounded, say by $M$. Then $h(z) := \max(\min(M, z(t)), -M)$ is bounded (by $M$) and continuous, as a composition of continuous functions (projection on the $t$-th coordinate, min and max), it belongs to $\mathcal{F}$, and $h(z_1) \ne h(z_2)$.

By Lemma 2.6.2 we have asymptotic measurability if

\[ E[(F \circ f_n)^* - (F \circ f_n)_*] \to 0 \quad\text{as } n \to \infty, \quad \forall F \in \mathcal{F}. \]

By definition of the function class $\mathcal{F}$ this is the same as asymptotic measurability of $(f_n(t_1), \dots, f_n(t_k))$ for all $t_1, \dots, t_k \in T$ and $k \ge 1$.

In the sequel we will call the (finite-dimensional) distributions of $f$, i.e. all distributions of $(f(t_1), \dots, f(t_k))$ for $t_1, \dots, t_k \in T$ and $k \ge 1$, the marginal distributions of $f$, or simply the marginals. Using the same argument as in the above proof, we can infer the following from Lemma 2.6.3.

Lemma 2.7.2. Let $f, g$ be two tight Borel measurable maps from a probability space $(\mathcal{X}, \mathcal{A}, P)$ into $\ell^\infty(T)$. Then $f$ and $g$ are equal in Borel law iff all corresponding marginals of $f$ and $g$ are equal in law.

Proof. One implication is trivial. If $f$ and $g$ have the same Borel law, then, because the projections are measurable, the $f(t)$ are measurable, and for $t_i \in T$, $i = 1, \dots, k$, $(f(t_1), \dots, f(t_k))$ is measurable too. Hence

\[
\begin{aligned}
P((f(t_1), \dots, f(t_k)) \in B) &= P(((\pi_{t_1}, \dots, \pi_{t_k}) \circ f) \in B) \\
&= P(f \in A) \\
&= P(g \in A) \\
&= P(((\pi_{t_1}, \dots, \pi_{t_k}) \circ g) \in B) \\
&= P((g(t_1), \dots, g(t_k)) \in B),
\end{aligned}
\]

where $B$ is a Borel subset of $\mathbb{R}^k$ and $A := \{F \in \ell^\infty(T) : (F(t_1), \dots, F(t_k)) \in B\}$.

Conversely, suppose all marginals are equal in distribution. Consider again the collection of functions

\[ \mathcal{F} := \{h : \ell^\infty(T) \to \mathbb{R} : z \mapsto h(z) := G(z(t_1), \dots, z(t_k)) : G \in C_b(\mathbb{R}^k);\ t_i \in T,\ i = 1, \dots, k;\ k \in \mathbb{N}\}. \]

We need that $\int h \circ f \, dP = \int h \circ g \, dP$ for all $h \in \mathcal{F}$. But this is true, since $h$ depends only on finitely many coordinates and the marginals of $f$ and $g$ are equal. Then, by lemma 2.6.3, $f$ and $g$ are equal in Borel law.

Combining the two above lemmas with Prohorov's theorem, we get the following.

Theorem 2.7.3. Let $(X_n, \mathcal{A}_n, Q_n)$ be a probability space, and let $f_n$ be a mapping from $X_n$ to $\ell^\infty(T)$ for $n = 1, 2, \dots$. Then $f_n$ converges weakly to a tight limit iff $\{f_n\}_{n\ge1}$ is asymptotically tight and the marginals $(f_n(t_1), \dots, f_n(t_k))$ converge weakly to a limit for every finite subset $\{t_1, \dots, t_k\} \subset T$, $k = 1, 2, \dots$.

Proof. Suppose $f_n$ converges to a tight limit, say $f : X_0 \to \ell^\infty(T)$, which is defined on a p-space $(X_0, \mathcal{A}_0, Q_0)$. We first show that $f_n$ is asymptotically tight. So let $\varepsilon > 0$. Since $f$ is tight, take a compact subset $K$ of $\ell^\infty(T)$ with $Q_0\{f \in K\} \ge 1 - \varepsilon$. By part (e) of the extended portmanteau theorem 2.5.1, since $K^\delta$ is open,

\[ 1 - \varepsilon \le Q_0(f \in K^\delta) \le \liminf_{n\to\infty} (Q_n)_*(f_n \in K^\delta) \]

for any $\delta > 0$. That the marginals converge follows from the continuous mapping theorem.

Now assume $f_n$ is asymptotically tight and the marginals converge for any finite subset $T_0$ of $T$. From the definition of weak convergence it then follows that $(f_n(t))_{t\in T_0}$ is asymptotically measurable. So, by lemma 2.7.1, $f_n$ is asymptotically measurable, and it was given that $f_n$ is also asymptotically tight. Thus, by Prohorov's theorem 2.6.1, $f_n$ is relatively compact. So one may possibly have different weak limits along subsequences, but since all marginals converge to a certain (unique) limit and, by lemma 2.7.2, the marginals are enough to characterize a tight limit, the limit must always be the same. So $f_n$ converges weakly to a unique limit.

As seen from theorem 2.7.3, two conditions have to be satisfied for weak convergence. Convergence of the marginals is usually the easier part, because a lot of techniques are already available. The second condition, tightness, is much harder. It will be related to an asymptotic uniform equicontinuity condition on the sample paths $t \mapsto f_n(t)$.

Definition 2.7.1. Let $\rho$ be a semimetric on $T$. A sequence $f_n : X_n \to \ell^\infty(T)$ is said to be asymptotically uniformly $\rho$-equicontinuous in probability iff for every $\varepsilon, \eta > 0$ there exists a $\delta > 0$ such that

\[ \limsup_{n\to\infty} Q_n^*\Big( \sup\{|f_n(s) - f_n(t)| : s, t \in T,\ \rho(s, t) < \delta\} > \varepsilon \Big) < \eta. \tag{2.7.1} \]

Condition (2.7.1) will also be called the asymptotic equicontinuity condition.
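To get a feeling for the quantity inside (2.7.1), the following sketch evaluates the modulus $\sup\{|z(s) - z(t)| : \rho(s, t) < \delta\}$ for one simulated path. It rests on several assumptions that are not in the text: $T$ is replaced by a finite grid in $[0, 1]$, $\rho$ is the usual distance, and the path is the uniform empirical process $\sqrt{n}(F_n(t) - t)$ of a simulated sample. The names `uniform_empirical_process` and `modulus` are illustrative only.

```python
import random
from bisect import bisect_right


def uniform_empirical_process(sample, grid):
    """Values sqrt(n) * (F_n(t) - t) on the grid, for a sample from U[0, 1]."""
    n = len(sample)
    ordered = sorted(sample)
    return [n ** 0.5 * (bisect_right(ordered, t) / n - t) for t in grid]


def modulus(values, grid, delta):
    """sup |z(s) - z(t)| over grid points s, t with |s - t| < delta."""
    best = 0.0
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            if grid[j] - grid[i] >= delta:
                break  # the grid is increasing, so later points are even farther away
            best = max(best, abs(values[i] - values[j]))
    return best


random.seed(0)  # fixed seed, only to make the run reproducible
grid = [i / 200 for i in range(201)]
sample = [random.random() for _ in range(2000)]
path = uniform_empirical_process(sample, grid)
for delta in (0.01, 0.05, 0.2, 2.0):
    print(delta, modulus(path, grid, delta))
```

The modulus is non-decreasing in $\delta$. Condition (2.7.1) asks that, for small $\delta$, it exceeds $\varepsilon$ only with small outer probability, uniformly for large $n$; a single simulated path on a grid can only illustrate this, not verify it.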

Theorem 2.7.4. A sequence $f_n : (X_n, \mathcal{A}_n, Q_n) \to \ell^\infty(T)$ is asymptotically tight iff $f_n(t)$ is asymptotically tight in $\mathbb{R}$ for every $t \in T$, and there exists a semimetric $\rho$ on $T$ such that $(T, \rho)$ is totally bounded and $\{f_n\}_{n\ge1}$ is asymptotically uniformly $\rho$-equicontinuous in probability.

Addendum. Moreover, if $f_n \Rightarrow f_0$, then almost all paths $t \mapsto f_0(\cdot)(t)$ are uniformly $\rho$-continuous; and the semimetric $\rho$ can, without loss of generality, be taken equal to any semimetric $\rho$ for which $(T, \rho)$ is totally bounded and the paths $t \mapsto f_0(\cdot)(t)$ are uniformly $\rho$-continuous.

Proof.

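⇐ Let $\zeta > 0$ and let $\{\varepsilon_m\}_{m\ge1}$ be a sequence with $\varepsilon_m > 0$ and $\varepsilon_m \downarrow 0$.

We claim that $\|f_n\|_T := \sup_{t\in T} |f_n(t)|$ is an asymptotically tight sequence in $\mathbb{R}$. Indeed, let $\varepsilon = 1$. Since $f_n$ is asymptotically uniformly $\rho$-equicontinuous in probability, condition (2.7.1) for this particular choice of $\varepsilon$ and $\eta := \zeta$ implies that there is a $\delta > 0$ such that

\[ \liminf_{n\to\infty} (Q_n)_*\Big( \sup\{|f_n(s) - f_n(t)| : s, t \in T,\ \rho(s, t) < \delta\} \le 1 \Big) \ge 1 - \zeta, \]

and since $(T, \rho)$ is totally bounded, $T \subset \bigcup_{j=1}^r B_{(T,\rho)}(t_j, \delta)$ for some $t_j \in T$. Hence if $x_n \in \{\sup\{|f_n(s) - f_n(t)| : s, t \in T,\ \rho(s, t) < \delta\} \le 1\}$, we have $\|f_n(x_n)\|_T \le \max_{1\le j\le r} |f_n(x_n)(t_j)| + 1$. Moreover, the functions $f_n(t_j) : X_n \to \mathbb{R}$ are asymptotically tight. This implies that there exist constants $M_j > 0$ so that $\liminf_{n\to\infty} (Q_n)_*(|f_n(t_j)| \le M_j) \ge 1 - \zeta/r$, $1 \le j \le r$. Setting $M = \max(M_1, \dots, M_r) + 1$, we can conclude that

\[ \liminf_{n\to\infty} (Q_n)_*\big( \|f_n\|_T \le M \big) \ge 1 - 2\zeta. \]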

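For $\varepsilon_m$ and $\eta := 2^{-m}\zeta$, choose $\delta_m > 0$ such that (2.7.1) is valid. For each $m$, finitely many balls of radius $\delta_m$ cover $T$; they could have non-empty intersections. To avoid that we make a partition out of them, as usual by excluding the previous balls, i.e.

\[ B_k^{(m)} := B_{(T,\rho)}(t_k^{(m)}, \delta_m) \setminus \bigcup_{j=1}^{k-1} B_{(T,\rho)}(t_j^{(m)}, \delta_m) \]

for all $m \ge 1$ and $k \in \{1, \dots, q_m\}$. For $m$ fixed, let $z_j$, $1 \le j \le p_m$, be the functions from $T$ into $\mathbb{R}$ that are constant on each $B_k^{(m)}$, $1 \le k \le q_m$, and take one of the values $0, \pm\varepsilon_m, \dots, \pm\lceil M/\varepsilon_m \rceil \varepsilon_m$ on each $B_k^{(m)}$. The number of such functions is at most $(2\lceil M/\varepsilon_m \rceil + 1)^{q_m}$, hence finite. Further let

\[ K_m := \bigcup_{j=1}^{p_m} B_{\ell^\infty(T)}(z_j, \varepsilon_m), \]

a finite union of closed balls. Next, for $x_n$ in

\[ \{\|f_n\|_T \le M\} \cap \Big\{ \max_{1\le k\le q_m} \sup_{s,t\in B_k^{(m)}} |f_n(s) - f_n(t)| \le \varepsilon_m \Big\} \]

it follows that $x_n$ lies in $\{f_n \in K_m\}$. Indeed, for $t_k^{(m)}$ as above one has $|f_n(x_n)(t_k^{(m)})| \le M \le \lceil M/\varepsilon_m \rceil \varepsilon_m$; moreover

\[ f_n(x_n)(t_k^{(m)}) \in [l\varepsilon_m, (l+1)\varepsilon_m] \quad\text{for some } -\lceil M/\varepsilon_m \rceil \le l \le \lceil M/\varepsilon_m \rceil - 1, \]

and for all $s \in B_k^{(m)}$:

\[ f_n(x_n)(s) \in B_{\ell^\infty(T)}\big(l\varepsilon_m I_{B_k^{(m)}}, \varepsilon_m\big) \cup B_{\ell^\infty(T)}\big((l+1)\varepsilon_m I_{B_k^{(m)}}, \varepsilon_m\big). \]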

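Let $K := \bigcap_{m\ge1} K_m$. Because all $K_m$ are closed, $K$ is closed too. Moreover, $K$ is also totally bounded in $\ell^\infty(T)$: for $\xi > 0$, choose $\varepsilon_m < \xi$, which can be done since $\varepsilon_m \downarrow 0$, and note that

\[ K \subset K_m = \bigcup_{j=1}^{p_m} B_{\ell^\infty(T)}(z_j, \varepsilon_m) \subset \bigcup_{j=1}^{p_m} B_{\ell^\infty(T)}(z_j, \xi). \]

Hence $K$ is compact. Moreover, given any $\delta > 0$ there is an $m = m_\delta \ge 1$ so that

\[ K^\delta \supset \bigcap_{i=1}^m K_i. \]

This will be proved by contradiction. Let $\delta > 0$ be arbitrary and suppose that $K^\delta \not\supset \bigcap_{i=1}^m K_i$ for each $m$. Then there is a sequence $\{u_m\}_{m\ge1}$ with $u_m \in \bigcap_{i=1}^m K_i$ and $u_m \notin K^\delta$.

In particular $u_m \in K_1$ for all $m \ge 1$. Since $K_1 = \bigcup_{j=1}^{p_1} B_{\ell^\infty(T)}(z_j, \varepsilon_1)$ is a finite union, there exists a subsequence $\{u_{k_1(n)}\}_{n\ge1}$ and a closed ball $B_1$ of $K_1$ such that

\[ u_{k_1(n)} \in B_1 := B_{\ell^\infty(T)}(z_1, \varepsilon_1). \]

Next, since $k_1(n) > n$, $u_{k_1(n)} \in K_2$ for all $n \ge 1$. So there exists a (closed) ball $B_2$ of $K_2$ and a further subsequence $\{u_{k_2(n)}\}_{n\ge1}$ of $\{u_{k_1(n)}\}_{n\ge1}$ such that $u_{k_2(n)} \in B_2$, $n \ge 1$. Continuing recursively, we obtain at stage $j$ a subsequence $\{u_{k_j(n)}\}_{n\ge1}$ of $\{u_{k_{j-1}(n)}\}_{n\ge1}$ such that $u_{k_j(n)} \in B_j$, with $B_j$ a closed ball making up $K_j$.

Now let $\{u_{k_n(n)}\}_{n\ge1}$ denote the diagonal subsequence; for each $m \ge 1$ and $n \ge m$, $u_{k_n(n)} \in B_m$, a closed ball of $K_m$, which has radius $\varepsilon_m$. So $\{u_{k_n(n)}\}_{n\ge1}$ is a Cauchy sequence in $\ell^\infty(T)$ (at every stage $m$, elements of the tail of the diagonal sequence lie at most $2\varepsilon_m$ away from each other, and $\varepsilon_m \downarrow 0$). It is well known that the space $(\ell^\infty(T), \|\cdot\|_T)$ is complete, in other words any Cauchy sequence converges. Let $u$ denote the limit of $\{u_{k_n(n)}\}_{n\ge1}$. Since the limit of $\{u_{k_m(m)}\}_{m\ge1}$ is the same as that of $\{u_{k_m(m)}\}_{m\ge j+1}$ for any $j \ge 1$, one obtains that $u \in B_j$ for any $j \ge 1$, because

\[ \{u_{k_m(m)}\}_{m\ge j+1} \subset B_j \quad\text{and}\quad \bar{B}_j = B_j. \]

So in particular $u \in K$. On the other hand, since all $u_m \notin K^\delta$,

\[ \{u_{k_m(m)}\}_{m\ge1} \subset (K^\delta)^c = \overline{(K^\delta)^c}. \]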

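Thus $u \in (K^\delta)^c \cap K$, which is clearly a contradiction. Finally, if $f_n(x_n) \notin K^\delta$, then $f_n(x_n) \notin \bigcap_{i=1}^m K_i$ for the fixed $m = m_\delta$, and it follows that

\[
\begin{aligned}
\limsup_{n\to\infty} Q_n^*\big(f_n \notin K^\delta\big)
&\le \limsup_{n\to\infty} Q_n^*\Big(f_n \notin \bigcap_{i=1}^m K_i\Big) \\
&\le \limsup_{n\to\infty} Q_n^*\Big( \bigcup_{i=1}^m \Big( \{\|f_n\|_T > M\} \cup \Big\{ \max_{1\le k\le q_i} \sup_{s,t\in B_k^{(i)}} |f_n(s) - f_n(t)| > \varepsilon_i \Big\} \Big) \Big) \\
&\le \limsup_{n\to\infty} \Big( Q_n^*\big(\|f_n\|_T > M\big) + \sum_{i=1}^m Q_n^*\Big( \max_{1\le k\le q_i} \sup_{s,t\in B_k^{(i)}} |f_n(s) - f_n(t)| > \varepsilon_i \Big) \Big) \\
&\le 2\zeta + \sum_{i=1}^m 2^{-i}\zeta < 3\zeta.
\end{aligned}
\]

$f_n$ is thus asymptotically tight.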

⇒ If $f_n$ is asymptotically tight, then $f_n(t)$ will also be asymptotically tight for any $t \in T$. Let $\varepsilon > 0$; then there exists a compact $K \subset \ell^\infty(T)$ such that

\[ \liminf_{n\to\infty} (Q_n)_*(f_n \in K^\delta) \ge 1 - \varepsilon \]

for all $\delta > 0$. Now since $\pi_t$ is continuous, $\pi_t(K)$ is compact (A.1.4). Moreover $\pi_t(K^\delta) \subset (\pi_t(K))^\delta$, because if $h \in K^\delta$ then for some $k \in K$: $\sup_{u\in T} |k(u) - h(u)| < \delta$, in particular $|\pi_t(h) - \pi_t(k)| = |h(t) - k(t)| < \delta$, and

\[ (Q_n)_*(f_n \in K^\delta) \le (Q_n)_*\big(f_n(t) \in (\pi_t(K))^\delta\big). \]

We now take compact sets $K_1 \subset K_2 \subset \cdots$ such that for any $\varepsilon > 0$:

\[ \liminf_{n\to\infty} (Q_n)_*(f_n \in K_m^\varepsilon) \ge 1 - 1/m. \]

For each fixed $m$ we define the semimetric $\rho_m : T \times T \to \mathbb{R}_+$ by

\[ \rho_m(s, t) := \sup_{z\in K_m} |z(s) - z(t)|, \quad s, t \in T. \]

The triangle inequality follows from the triangle inequality for the absolute value.

We now claim that $(T, \rho_m)$ is a totally bounded space. Let $\eta > 0$. Since $K_m$ is compact it is also totally bounded: $K_m \subset \bigcup_{i=1}^k B_{d_\infty}(z_i, \eta)$. Next we partition $\mathbb{R}^k$ into cubes of edge $\eta$. Since each $z_i$, $i = 1, \dots, k$, is uniformly bounded in $\mathbb{R}$, $(z_1, z_2, \dots, z_k)$ is uniformly bounded in $\mathbb{R}^k$. Hence the set $A := \{(z_1(t), z_2(t), \dots, z_k(t)) : t \in T\}$ has non-empty intersection with only finitely many cubes, say $p$ such cubes. For each of these cubes we pick one element $s$ from $T$ such that $(z_1(s), z_2(s), \dots, z_k(s))$ lies in that cube. Thus we obtain finitely many $t_j$, $j = 1, \dots, p$, one for each cube, such that $(z_1(t_j), z_2(t_j), \dots, z_k(t_j))$ lies in that cube.

Then $T \subset \bigcup_{i=1}^p B_{\rho_m}(t_i, 3\eta)$. First note that

\[ \sup_{z\in K_m} |z(t) - z(s)| \le 2\eta + \max_{1\le j\le k} |z_j(t) - z_j(s)|, \]

since $K_m \subset \bigcup_{i=1}^k B_{d_\infty}(z_i, \eta)$. Secondly, $(z_1(t), \dots, z_k(t))$ lies in the cube (of edge $\eta$) of some (unique) $(z_1(t_i), \dots, z_k(t_i))$. So for $t \in T$ fixed, choose $t_i \in \{t_1, \dots, t_p\}$ such that $(z_1(t), \dots, z_k(t))$ lies in the cube of $(z_1(t_i), \dots, z_k(t_i))$:

\[
\begin{aligned}
\rho_m(t, t_i) &= \sup_{z\in K_m} |z(t) - z(t_i)| \\
&\le 2\eta + \max_{1\le j\le k} |z_j(t) - z_j(t_i)| \\
&\le 2\eta + \eta = 3\eta.
\end{aligned}
\]

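Without loss of generality we can take another semimetric that is bounded by 1 and induces the same topology as $\rho_m$, e.g. $\min(\rho_m, 1)$. We now define a semimetric

\[ \rho(s, t) := \sum_{m\ge1} 2^{-m} \min(\rho_m(s, t), 1) \]

on $T$. With this new semimetric $T$ is still totally bounded. Let $\eta > 0$, and take $m \in \mathbb{N}$ with $2^{-m} < \eta$. $(T, \rho_m)$ is totally bounded, so there are finitely many $t_i$, $i = 1, \dots, p$, with $T \subset \bigcup_{i=1}^p B_{\rho_m}(t_i, \eta)$. Because $K_l \subset K_m$, we have $\rho_l \le \rho_m$ for $1 \le l \le m$. For $t \in T$, take $t_i$ so that $t \in B_{\rho_m}(t_i, \eta)$. Then

\[
\begin{aligned}
\rho(t, t_i) &\le \sum_{l=1}^m 2^{-l} \rho_l(t, t_i) + \sum_{l\ge m+1} 2^{-l} \\
&\le \sum_{l=1}^m 2^{-l} \eta + 2^{-m} < 2\eta.
\end{aligned}
\]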

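Now we prove the asymptotic uniform $\rho$-equicontinuity in probability of $f_n$. Let $z \in K_m$; from the definition of $\rho_m$ it follows that $|z(t) - z(s)| \le \rho_m(t, s)$. Because $f_n$ is asymptotically tight, it would suffice to have

\[ K_m^\varepsilon \subset \Big\{ z \in \ell^\infty(T) : \sup_{\rho(s,t) < 2^{-m}\varepsilon} |z(s) - z(t)| \le 3\varepsilon \Big\} \]

for some $m \in \mathbb{N}$. Indeed, if $z \in K_m^\varepsilon$, then for some $\tilde{z} \in K_m$:

\[
\begin{aligned}
|z(s) - z(t)| &\le |z(s) - \tilde{z}(s)| + |\tilde{z}(s) - \tilde{z}(t)| + |\tilde{z}(t) - z(t)| \\
&\le \varepsilon + \rho_m(s, t) + \varepsilon \le 2\varepsilon + 2^m \rho(s, t),
\end{aligned}
\]

since $\min(\rho_m, 1) \le 2^m \rho$. So if $\delta < 2^{-m}\varepsilon$:

\[ \liminf_{n\to\infty} (Q_n)_*\Big( \sup_{\rho(s,t) < \delta} |f_n(s) - f_n(t)| \le 3\varepsilon \Big) \ge 1 - 1/m. \]

Since this is true for any $m \ge 1$ and any $\varepsilon > 0$, we easily see that condition (2.7.1) is satisfied.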

It remains to prove the addendum. So assume that $f_n \Rightarrow f_0$, where $f_0$ is defined on the probability space $(X_0, \mathcal{A}_0, Q_0)$. Defining the sets $K_m$ as in the above proof of the implication "⇒", we have that $Q_0(f_0 \in K_m) \ge 1 - 1/m$, $m \ge 1$, which trivially implies that $Q_0(f_0 \in \bigcup_{m=1}^\infty K_m) = 1$. By definition of $\rho_m$ all functions in $K_m$ are uniformly $\rho_m$-continuous, which also implies that they are uniformly $\rho$-continuous (since $\rho_m \wedge 1 \le 2^m \rho$). Indeed,

\[ |z(t) - z(s)| \le \sup_{z'\in K_m} |z'(t) - z'(s)| = \rho_m(s, t) \]

for $z \in K_m$. Thus $\bigcup_{m=1}^\infty K_m$ is a subset of the uniformly $\rho$-continuous functions on $T$.

To prove the last part of the addendum, we note first that the set of uniformly continuous functions on a totally bounded semimetric space $T$ is separable and complete, so that $f_0$ is tight (for a proof see [Bill2], chapter 1, theorem 1.3 on page 8). Indeed, let

\[ (UC(T), \|\cdot\|_T) := \{g : (T, \rho) \to \mathbb{R} : g \text{ is uniformly continuous}\}. \]

Then it is well known (from standard real analysis) that $(UC(T), \|\cdot\|_T)$ is a closed subset of the complete space $(\ell^\infty(T), \|\cdot\|_T)$, hence $(UC(T), \|\cdot\|_T)$ is complete too. For the separability, recall that any metric space can be completed; in particular there exists a complete metric space $(S, e)$ and an isometry $\varphi$ from $T$ into $S$ such that $\varphi(T)$ is dense in $S$ (e.g. see [Dud1], theorem 2.5.1 on page 58, for a proof). Also $\varphi(T)$ is still totally bounded (since an isometry is Lipschitz continuous). We claim that $\overline{\varphi(T)}$ $(= S)$ is still totally bounded, so that $(S, e)$ is compact. Let $\varepsilon > 0$ and $x \in \overline{\varphi(T)}$. Since $\varphi(T)$ is totally bounded, there exist $\varphi(t_j)$, $1 \le j \le n$ (an isometry is always injective, so $\varphi : T \to \varphi(T)$ is a bijection), with

\[ \varphi(T) \subset \bigcup_{j=1}^n B(\varphi(t_j), \varepsilon/2). \]

Also there exists a sequence $\{y_k\} \subset \varphi(T)$ with $y_k \to x$, hence for $k$ large enough $e(y_k, x) < \varepsilon/2$, and for some $\varphi(t_l)$: $e(\varphi(t_l), y_k) < \varepsilon/2$, so that

\[ \overline{\varphi(T)} \subset \bigcup_{j=1}^n B(\varphi(t_j), \varepsilon). \]

Hence $(S, e)$ is a compact metric space and $T$ is isometric to a dense subset of $S$, so w.l.o.g. one can assume $T$ is a dense subset of a compact space. Now the separability of $(UC(T), \|\cdot\|_T)$ follows from the separability of $C(S)$ (theorem A.1.20: consider the polynomial functions on $S$) together with the string of equalities

\[ UC(T) = UC(S) = C(S), \]

where the first equality follows by extending uniformly continuous functions on $T$ to $S$ by means of a limit argument, and the second from the fact that continuous functions on a compact space are uniformly continuous (corollary A.1.11). Further consider

\[ F_{\delta,\varepsilon} = \{z \in \ell^\infty(T) : \sup\{|z(s) - z(t)| : s, t \in T,\ \rho(s, t) < \delta\} \ge \varepsilon\}. \]

Note that such sets are closed subsets of $\ell^\infty(T)$. Consequently, we have

\[ \limsup_{n\to\infty} Q_n^*(f_n \in F_{\delta,\varepsilon}) \le Q_0(f_0 \in F_{\delta,\varepsilon}) \]

(by theorem 2.5.1 (d)). Since the paths $t \mapsto f_0(t)$ are $Q_0$-a.s. uniformly $\rho$-continuous, we have for any $\varepsilon > 0$

\[ Q_0(f_0 \in F_{\delta,\varepsilon}) \to 0 \quad\text{as } \delta \to 0. \]

Combining the last two conditions, we get (2.7.1) for the semimetric $\rho$.
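The step in the proof of "⇐" that turns a finite cover by balls into a partition ("excluding the previous" balls) can be sketched as follows. The finite index set, the centres and the radius are hypothetical values chosen for illustration, and `disjointify` is an illustrative name.

```python
def disjointify(cover):
    """Turn a finite cover B_1, ..., B_q into disjoint cells B_k minus (B_1 ∪ ... ∪ B_{k-1})."""
    seen = set()
    cells = []
    for ball in cover:
        cell = set(ball) - seen  # remove everything already covered by earlier balls
        cells.append(cell)
        seen |= cell
    return cells


T = [i / 10 for i in range(11)]   # hypothetical finite index set
centers = [0.0, 0.3, 0.6, 0.9]    # hypothetical centres t_k
delta = 0.25                      # hypothetical radius
cover = [{t for t in T if abs(t - c) < delta} for c in centers]
cells = disjointify(cover)
print(cells)
```

Each cell is contained in the ball it came from, the cells are pairwise disjoint, and their union is still all of $T$. Some cells may be empty in general, which is harmless for the construction of the step functions $z_j$.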

Chapter 3

Vapnik–Cervonenkis classes.

3.1 Introduction: definitions and a fundamental lemma.

This part is concerned with a special kind of classes of sets. If one desires to obtain results on convergence in law, in (outer) probability or almost uniformly for a sequence of probability measures, uniformly over a class of sets, it should not be surprising that such (uniform) convergence will not hold for every such class. One of the paths leading towards a solution uses the concept of a VC class. The condition imposed ensures that the class of sets considered is, in some sense to be defined later on, not too big. Unfortunately this is not enough; one also has to impose some (mild) measurability conditions on that class. The good news, however, is that this last remark is more of a theoretical nature: in daily practice the statistician hardly ever encounters classes that violate the measurability condition.

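Definition 3.1.1. Let $\mathcal X$ be any set and $\mathcal C$ a class of subsets of $\mathcal X$. For $A \subset \mathcal X$, let $\mathcal C_A := \mathcal C \sqcap A := A \sqcap \mathcal C := \{C \cap A : C \in \mathcal C\}$. Denote the cardinality of $A$ by $|A|$ and let $2^A := \{B : B \subset A\}$. Let $\Delta^{\mathcal C}(A) := |\mathcal C_A|$. $\mathcal C$ is said to shatter $A$ iff $\mathcal C \sqcap A = 2^A$. If $A$ is finite, then $\mathcal C$ shatters $A$ iff $\Delta^{\mathcal C}(A) = 2^{|A|}$.

Let $m^{\mathcal C}(n) := \max\{\Delta^{\mathcal C}(F) : F \subset \mathcal X, |F| = n\}$ for $n = 0, 1, 2, \dots$. If $\mathcal X$ is finite, say $|\mathcal X| < n$, then $m^{\mathcal C}(m) \le 2^n$ for all $m$. Now we define

\[ V(\mathcal C) := \begin{cases} \inf\{n : m^{\mathcal C}(n) < 2^n\}, & \text{if the infimum is finite;} \\ +\infty, & \text{if } m^{\mathcal C}(n) = 2^n \text{ for all } n, \end{cases} \]

\[ S(\mathcal C) := \begin{cases} \sup\{n : m^{\mathcal C}(n) = 2^n\}, & \text{if } \mathcal C \text{ is nonempty;} \\ -1, & \text{if } \mathcal C \text{ is empty.} \end{cases} \]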

We clearly have $S(\mathcal C) = V(\mathcal C) - 1$, so if one is finite, then the other is too. $S(\mathcal C)$ is, by definition, the largest cardinality of a set shattered by $\mathcal C$, and $V(\mathcal C)$ is the smallest $n$ such that no set of cardinality $n$ is shattered by $\mathcal C$. $\mathcal C$ will be called a Vapnik–Cervonenkis or VC class whenever $V(\mathcal C) < \infty$.

In the case $\mathcal X = \mathbb R$ or more generally $\mathbb R^d$, one can give classes of sets which are VC classes. We only work out one example and give references to books where more examples can be found. In the case of $\mathbb R$, the class $\mathcal C := \{\,]-\infty, t] : t \in \mathbb R\}$ is a VC class with $S(\mathcal C) = 1$. Indeed, any singleton is shattered. Now take any set of two points $\{x_1, x_2\}$ and assume w.l.o.g. $x_1 < x_2$; then $\{x_1, x_2\} \sqcap \mathcal C = \{\emptyset, \{x_1\}, \{x_1, x_2\}\}$, but $\{x_2\}$ cannot be isolated by sets of $\mathcal C$: every set that contains $x_2$ has to contain $x_1$ too. For similar results in higher dimensions, we refer to [vdVaartAndWell] Example 2.6.1 on page 135.
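The half-line example can be checked by brute force on a finite point set. The following Python sketch is only an illustration; the function names are ours and do not appear in the references. It computes the trace of the class of half-lines on a finite set and compares its size with $2^{|A|}$.

```python
def trace(points, thresholds):
    """Traces {C ∩ A} of the half-lines C = ]-inf, t] on the finite set A = points."""
    return {frozenset(x for x in points if x <= t) for t in thresholds}


def is_shattered_by_half_lines(points):
    """True iff the class {]-inf, t] : t real} shatters the finite set `points`."""
    pts = sorted(set(points))
    if not pts:
        return True  # the empty set is shattered by any nonempty class
    # A half-line cuts out an initial segment of pts, so one threshold below
    # all points plus one threshold at each point realise every possible trace.
    thresholds = [pts[0] - 1] + pts
    return len(trace(pts, thresholds)) == 2 ** len(pts)


if __name__ == "__main__":
    print(is_shattered_by_half_lines([0.3]))        # a singleton
    print(is_shattered_by_half_lines([0.3, 1.7]))   # a two-point set
```

A singleton is reported as shattered and a two-point set is not, in agreement with $S(\mathcal C) = 1$.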

A nice property of VC classes is that the property is preserved by (finitely many) Boolean operations. This is an easy way to generate VC classes, starting from simple VC classes. For more on this subject we refer to [vdVaartAndWell] section 2.6.5 (pages 146–149).

If $\mathcal X$ is finite, with $n$ elements, then $2^{\mathcal X}$ is a VC class, with $S(2^{\mathcal X}) = n$. Let ${}_NC_{\le k} := \sum_{j=0}^{k} \binom{N}{j}$, where $\binom{N}{j}$ is 0 for $j > N$. We have an identity as in Pascal’s triangle.

Proposition 3.1.1. ${}_NC_{\le k} = {}_{N-1}C_{\le k} + {}_{N-1}C_{\le k-1}$ for $k = 1, 2, \dots$ and $N = 1, 2, \dots$.

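Proof. For each $j = 1, 2, \dots, N$ we have
\[ \binom{N}{j} = \binom{N-1}{j} + \binom{N-1}{j-1}, \]
and the same identity holds trivially for $j > N$, since by definition $\binom{N}{j} = 0$ for $j > N$. Summing over $j = 1, \dots, k$ and adding the term $j = 0$, for which $\binom{N}{0} = \binom{N-1}{0} = 1$, finishes the proof.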
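The identity of proposition 3.1.1 is easy to confirm numerically. The sketch below is an illustration only (the helper name `c_le` is ours); it uses exact integer arithmetic from the Python standard library, where `math.comb(n, j)` returns 0 for $j > n$, matching the convention above.

```python
from math import comb


def c_le(n, k):
    """Partial binomial sum nC<=k = sum_{j=0}^{k} binom(n, j)."""
    if k < 0:
        return 0
    return sum(comb(n, j) for j in range(k + 1))


def pascal_identity_holds(max_n=12, max_k=15):
    """Check nC<=k = (n-1)C<=k + (n-1)C<=(k-1) for 1 <= n <= max_n, 1 <= k <= max_k."""
    return all(
        c_le(n, k) == c_le(n - 1, k) + c_le(n - 1, k - 1)
        for n in range(1, max_n + 1)
        for k in range(1, max_k + 1)
    )


if __name__ == "__main__":
    print(pascal_identity_holds())
```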

The next fact illustrates why we consider VC classes, and why in some sense VC classes are small. Clearly for any non-VC class $m^{\mathcal C}(n) = 2^n$ for all $n$, meaning that arbitrarily large finite sets are shattered. If however $\mathcal C$ is a VC class, then the next facts will show that $m^{\mathcal C}(n) = O(n^r)$ for some $r \in \mathbb N$. Therefore, for a VC class $m^{\mathcal C}(n)$ grows only polynomially in $n$, which is much slower than the exponential growth for non-VC classes.

Theorem 3.1.2 (Sauer’s lemma). If $m^{\mathcal C}(n) > {}_nC_{\le k-1}$, $k \ge 1$, then $m^{\mathcal C}(k) = 2^k$. So if $S(\mathcal C) < \infty$, then $m^{\mathcal C}(n) \le {}_nC_{\le S(\mathcal C)}$ for all $n$.

Proof. The proof goes by induction on $k$ and $n$. For $k = 1$: ${}_nC_{\le 0} = 1 < m^{\mathcal C}(n)$. Thus $m^{\mathcal C}(n) \ge 2$, and so $\mathcal C$ must contain at least two elements. So for some singleton $G = \{x\}$: $\Delta^{\mathcal C}(G) = 2$. If $k > n$, then ${}_nC_{\le k-1} = 2^n \ge m^{\mathcal C}(n)$, so the assumption implies $k \le n$.

Assume that the statement is true for $k \le K$ and $n \ge k$. Now fix $k := K + 1$. As above, we only need to consider $n \ge k = K + 1$. For $n = k = K + 1$, the condition

\[ m^{\mathcal C}(n) > {}_nC_{\le n-1} = 2^n - 1 \]

implies $m^{\mathcal C}(n) = 2^n$. To continue, suppose now the statement holds for all $(k \le)\, n \le N$. We will prove it then for $n := N + 1$. So one starts with

\[ m^{\mathcal C}(N+1) = m^{\mathcal C}(n) > {}_nC_{\le k-1} = {}_{N+1}C_{\le K}; \]

by definition of $m^{\mathcal C}(n)$ there exists a set

\[ H_n := \{x_1, \dots, x_n\} \quad \text{such that} \quad \Delta^{\mathcal C}(H_n) > {}_nC_{\le K}. \]

Let $H_N := H_n \setminus \{x_n\}$ (recall $n := N + 1$). If $\Delta^{\mathcal C}(H_N) > {}_NC_{\le K}$ we have by our induction hypothesis that $m^{\mathcal C}(k) = 2^k$. So assume now that

\[ \Delta^{\mathcal C}(H_N) \le {}_NC_{\le K}. \]

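Let $\mathcal C_n := H_n \sqcap \mathcal C := \{H_n \cap A : A \in \mathcal C\}$ and $\mathcal C_N := H_N \sqcap \mathcal C$. Furthermore we will need the following sets, called full sets. A set $E \subset H_N$ is said to be full iff $E$ and $E \cup \{x_n\}$ both belong to $\mathcal C_n$, i.e. there exist $C_1, C_2 \in \mathcal C$ with $E = H_n \cap C_1$ and $E \cup \{x_n\} = H_n \cap C_2$. Denote by $f$ the number of full sets. The map

\[ \mathcal C_n \to \mathcal C_N : D \mapsto D \cap H_N \]

is onto: for $B := D \cap H_N$ with $D \in \mathcal C$, take $D \cap H_n \in \mathcal C_n$; then $(D \cap H_n) \cap H_N = B$. We claim $\Delta^{\mathcal C}(H_n) = \Delta^{\mathcal C}(H_N) + f$. Indeed, a full set $E$ has two preimages under this map, namely $E$ and $E \cup \{x_n\}$, and a non-full set has only one.

Let $\mathcal F$ be the collection of full sets. Suppose $f = \Delta^{\mathcal F}(H_N) > {}_NC_{\le K-1}$. Then by our induction hypothesis there is a set $G \subset H_N$ of cardinality $K$ with $\Delta^{\mathcal F}(G) = 2^K$, i.e. $G$ is shattered by the collection of full sets. Let $J := G \cup \{x_n\}$; then $\mathrm{card}(J) = K + 1$ and, since the union is a disjoint one, $\Delta^{\mathcal C}(J) = 2^{K+1}$, so $m^{\mathcal C}(k) = 2^k$. If however $f \le {}_NC_{\le K-1}$, then

\[ \Delta^{\mathcal C}(H_n) = \Delta^{\mathcal C}(H_N) + f \le {}_NC_{\le K} + {}_NC_{\le K-1} = {}_{N+1}C_{\le K}, \]

where the last equality follows from proposition 3.1.1. This contradicts our assumption $\Delta^{\mathcal C}(H_n) > {}_nC_{\le K}$.

For the second part it is enough to note that $S(\mathcal C) < +\infty$ implies $m^{\mathcal C}(S(\mathcal C) + j) < 2^{S(\mathcal C)+j}$ for all $j \ge 1$. So by the lemma just proven, applied with $k = S(\mathcal C) + 1$, $m^{\mathcal C}(n) \le {}_nC_{\le S(\mathcal C)}$ for all $n$ (for $n \ge S(\mathcal C) + 1$ by the lemma, and for $n \le S(\mathcal C)$ trivially, since then ${}_nC_{\le S(\mathcal C)} = 2^n$).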
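Sauer’s lemma can be tested by exhaustive search on small finite ground sets. The following Python sketch is an illustration under our own conventions (classes are stored as sets of `frozenset`s, all names are ours); it computes $m^{\mathcal C}(n)$ and $S(\mathcal C)$ directly from definition 3.1.1 and compares $m^{\mathcal C}(n)$ with ${}_nC_{\le S(\mathcal C)}$ for randomly generated classes.

```python
import random
from itertools import combinations
from math import comb


def growth(classes, ground, n):
    """m^C(n): maximal number of distinct traces on an n-point subset of `ground`."""
    return max(
        len({c & frozenset(F) for c in classes})
        for F in combinations(ground, n)
    )


def vc_index(classes, ground):
    """S(C): largest n with m^C(n) = 2^n, and -1 for the empty class."""
    if not classes:
        return -1
    return max(
        n for n in range(len(ground) + 1)
        if growth(classes, ground, n) == 2 ** n
    )


def sauer_bound(n, s):
    """nC<=s = sum_{j=0}^{s} binom(n, j)."""
    return sum(comb(n, j) for j in range(s + 1))


def check_sauer(seed=0, trials=20, size=6):
    """Compare m^C(n) with nC<=S(C) for random classes on {0, ..., size-1}."""
    rng = random.Random(seed)
    ground = list(range(size))
    for _ in range(trials):
        classes = {
            frozenset(x for x in ground if rng.random() < 0.5)
            for _ in range(rng.randint(1, 12))
        }
        s = vc_index(classes, ground)
        for n in range(size + 1):
            if growth(classes, ground, n) > sauer_bound(n, s):
                return False
    return True


if __name__ == "__main__":
    print(check_sauer())
```

The search is exponential in the size of the ground set, so it is only meant for very small examples.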

Proposition 3.1.3 (Vapnik–Cervonenkis). Let $n$ and $k$ be nonnegative integers with $k + 2 \le n$; then ${}_nC_{\le k} \le 1.5\, n^k/k!$.

Proof. Two proofs are provided: one nearly optimal, relying on Stirling’s formula, and another, more elementary one, based on purely probabilistic methods, where we prove ${}_nC_{\le k} \le (e/k)^k n^k$ for all $n \ge k + 1$.

• For $k = 0$: ${}_nC_{\le k} = 1 < 1.5 = 1.5\, n^k/k!$. So we may assume $k \ge 1$. Recall the binomial theorem: $(a + b)^p = \sum_{j=0}^{p} \binom{p}{j} a^{p-j} b^j$ for $p = 1, 2, \dots$ and $a, b \in \mathbb R$. We deduce easily:

\[ (n+1)^k = \sum_{j=0}^{k} \binom{k}{j} n^{k-j} 1^j \ge \binom{k}{0} n^k + \binom{k}{1} n^{k-1} = n^{k-1}(k + n). \]

And further, by dividing the above inequality by $k!$:

\[ \frac{(n+1)^k}{k!} \ge \frac{n^{k-1}}{(k-1)!} + \frac{n^k}{k!}. \tag{3.1.1} \]

We will continue by induction on $n$ and $k$. For $k = 1$ and $n \ge k + 2\ (= 3)$, ${}_nC_{\le k} = 1 + n < (1/2)n + n = 1.5\, n^k/k!$. Until now, we proved it for $k = 0, 1$ and $n \ge k + 2$. But, before considering general $k$ and $n$, we prove it for a specific value of $n$, namely $n := k + 2$, and $k \ge 2$. For $n = k + 2$ the desired inequality is

\[ 2^n - n - 1 \le \frac{1.5\, n^{n-2}}{(n-2)!} = \frac{n^{n-1}}{n} \cdot \frac{n-1}{n-1} \cdot \frac{1.5}{(n-2)!} = \frac{1.5\, n^{n-1}(n-1)}{n!}. \tag{3.1.2} \]

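Here the left-hand side equals ${}_nC_{\le n-2}$, since from the binomial theorem it is deduced that

\[ 2^n = (1+1)^n = \sum_{j=0}^{n} \binom{n}{j} 1^{n-j} 1^j = \sum_{j=0}^{n-2} \binom{n}{j} + \binom{n}{n-1} + \binom{n}{n} = {}_nC_{\le n-2} + n + 1. \]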

Since for $k = 0$ nothing was to be proved, we verify by hand that the inequality holds for $k = 1, 2, 3$ and $4$, or equivalently for $n = 3, 4, 5$ and $6$:

\[ \begin{aligned}
n = 3:&\quad 2^3 - 3 - 1 = 4 < 4.5 = \frac{1.5 \cdot 3^{3-1}(3-1)}{3!}, \\
n = 4:&\quad 2^4 - 4 - 1 = 11 < 12 = \frac{1.5 \cdot 4^{4-1}(4-1)}{4!}, \\
n = 5:&\quad 2^5 - 5 - 1 = 26 < 31.25 = \frac{1.5 \cdot 5^{5-1}(5-1)}{5!}, \\
n = 6:&\quad 2^6 - 6 - 1 = 57 < 81 = \frac{1.5 \cdot 6^{6-1}(6-1)}{6!},
\end{aligned} \]

so 3.1.2 is true for $n = 3, 4, 5$ and $6$.

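For greater $n$ we use Stirling’s formula

\[ n! \le \left(\frac{n}{e}\right)^n (2\pi n)^{1/2} \exp\left(\frac{1}{12n}\right), \]

see [Dud2] theorem 1.3.13 on page 17. Multiplying 3.1.2 by $n!$, we see that for $n \ge 7$ (i.e. $k \ge 5$ and $n = k + 2$) it is enough to bound the right-hand side of

\[ (2^n - n - 1)\, n! \le 2^n n! \le 2^n \left(\frac{n}{e}\right)^n (2\pi n)^{1/2} \exp\left(\frac{1}{12n}\right) \]

by $1.5\, n^n (1 - 1/n)$. This will follow from noticing that $(e/2)^n \ge 2n^{1/2}$ for $n \ge 7$. Indeed, if this is true, then

\[ 2^n \left(\frac{n}{e}\right)^n (2\pi n)^{1/2} \exp\left(\frac{1}{12n}\right) \le \frac{1}{2} n^{-1/2} n^n \sqrt{2\pi}\, n^{1/2} \exp\left(\frac{1}{12n}\right) = \sqrt{\frac{\pi}{2}} \exp\left(\frac{1}{12n}\right) n^n < 1.5\, n^n \left(1 - \frac{1}{n}\right), \]

since, for $n = 7$,

\[ \sqrt{\frac{\pi}{2}} \exp\left(\frac{1}{12 \cdot 7}\right) < 1.27 < 1.28 < 1.5 \left(1 - \frac{1}{7}\right), \]

where for larger $n$ the left-hand side becomes smaller and the right-hand side greater. It remains to compare the functions $(e/2)^x$ and $2x^{1/2}$ on $[7, +\infty[$. At $x = 7$:

\[ (e/2)^7 > 8 > 6 > 2\sqrt{7}. \]

Also $((e/2)^x)' = (e/2)^x \ln(e/2)$ and $(2x^{1/2})' = x^{-1/2}$, and for $x \ge 7$

\[ (e/2)^x \ln(e/2) > 8 \cdot 0.3 > 1 > x^{-1/2}. \]

So from the fundamental theorem of calculus one deduces that $(e/2)^x \ge 2x^{1/2}$ for all $x \ge 7$. Thus equation 3.1.2 is valid for all $n \ge 3$.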

Now we have come to the last step. Suppose that for $k = 1, \dots, K$ and $n \ge k + 2$ the result holds (notice that we did prove it for $k = 1$). Then let $k = K + 1$; we have to prove it for $n = k + j = (K + 1) + j$, where $j \ge 2$, by induction on $j$. For $j = 2$ nothing has to be proved, since it was done in the previous paragraph. So we pass from $j$ to $j + 1$, i.e. $n = (K + 1) + j + 1$:

\[ \begin{aligned}
{}_nC_{\le k} &= {}_{(K+1)+j+1}C_{\le K+1} \\
&= {}_{(K+1)+j}C_{\le K+1} + {}_{(K+1)+j}C_{\le K} \\
&\le 1.5\, ((K+1)+j)^{K+1}/(K+1)! + 1.5\, (K+1+j)^{K}/K! \\
&\le 1.5\, ((K+1)+j+1)^{K+1}/(K+1)! \\
&= 1.5\, n^k/k!,
\end{aligned} \]

where the first equality is the definition, the second is true by proposition 3.1.1, the first inequality holds by our induction hypotheses, and the second inequality by equation 3.1.1.

• A probabilistic proof of Vapnik–Cervonenkis’s theorem. Let $Y$ be a Binomial$(n, 1/2)$ random variable. Then, for $s$ a positive integer,

\[ {}_nC_{\le s} := \sum_{j=0}^{s} \binom{n}{j} = 2^n \Pr(Y \le s) = 2^n \Pr(s - Y \ge 0). \]

Let $a > 1$ be arbitrary for now; then

\[ 2^n \Pr(s - Y \ge 0) = 2^n \Pr(a^{s-Y} \ge 1), \]

and by Markov’s inequality the last term is bounded by $2^n E[a^{s-Y}]$. So

\[ {}_nC_{\le s} \le 2^n a^s E[a^{-Y}] = 2^n a^s E[r^Y], \]

where $r = 1/a < 1$. Since $Y \sim \mathrm{Bin}(n, 1/2)$, we can decompose it as a sum of $n$ independent Bernoulli$(1/2)$, i.e. Binomial$(1, 1/2)$, random variables $Y_i$ ($\Pr(Y_i = 0) = 1/2 = \Pr(Y_i = 1)$), such that $\sum_i Y_i$ is equal in distribution to $Y$. Using the independence of the $Y_i$’s, we rewrite the above expectation as

\[ 2^n a^s E[r^Y] = 2^n a^s E\big[r^{\sum_i Y_i}\big] = 2^n a^s E\Big[\prod_i r^{Y_i}\Big] = 2^n a^s \prod_i E[r^{Y_i}]. \]

Recall that the expectation of a discrete random variable $Z$ is given by $\sum_z z \Pr(Z = z)$. Hence

\[ 2^n a^s \prod_i E[r^{Y_i}] = 2^n a^s \prod_{i=1}^{n} \left(r^0 \cdot \tfrac12 + r^1 \cdot \tfrac12\right) = 2^n a^s (1/2)^n (1 + r)^n = a^s (1 + r)^n. \]

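Thus let $\mathcal C$ be a VC class; then $S(\mathcal C) < +\infty$ and so, by Sauer’s lemma 3.1.2, $m^{\mathcal C}(n) \le {}_nC_{\le S(\mathcal C)}$ for all $n$. Let $s := S(\mathcal C)$; then by the previous discussion one has the inequality

\[ m^{\mathcal C}(n) \le {}_nC_{\le S(\mathcal C)} \le a^s (1 + r)^n \]

for each $a > 1$ and $r = 1/a$. Let $n > s\ (= S(\mathcal C))$ and choose $r = s/n < 1$. Then

\[ r^{-s} (1 + r)^n = \left(\frac{n}{s}\right)^s \left(1 + \frac{s}{n}\right)^n \le \left(\frac{n}{s}\right)^s \exp(s) = n^s (e/s)^s, \]

and $(e/s)^s$ is a constant that depends only on $S(\mathcal C)$.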
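Both bounds on ${}_nC_{\le k}$ can be compared with the exact partial binomial sums. The Python sketch below is only a numerical illustration over a small range (the function names are ours); the first check uses exact integer arithmetic, the second uses floating point, which is harmless here because the probabilistic bound is far from tight.

```python
from math import comb, e, factorial


def c_le(n, k):
    """Partial binomial sum nC<=k = sum_{j=0}^{k} binom(n, j)."""
    return sum(comb(n, j) for j in range(k + 1))


def stirling_type_bound_holds(n, k):
    """nC<=k <= 1.5 n^k / k!, checked exactly as 2 * nC<=k * k! <= 3 * n^k."""
    return 2 * c_le(n, k) * factorial(k) <= 3 * n ** k


def probabilistic_bound_holds(n, s):
    """nC<=s <= (e n / s)^s, stated above for n > s >= 1."""
    return c_le(n, s) <= (e * n / s) ** s


if __name__ == "__main__":
    print(all(stirling_type_bound_holds(n, k)
              for k in range(12) for n in range(k + 2, 40)))
    print(all(probabilistic_bound_holds(n, s)
              for s in range(1, 12) for n in range(s + 1, 40)))
```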

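Definition 3.1.2. Let $(\mathcal X, \mathcal A)$ be a measurable space and $\mathcal C \subset \mathcal A$; we define

\[ \mathrm{dens}(\mathcal C) := \inf\{r \in \mathbb R^+_0 : \text{there is a } K < +\infty \text{ such that } m^{\mathcal C}(n) \le K n^r \text{ for all } n \ge 1\}. \]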

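Corollary 3.1.3. For any set $\mathcal X$ and $\mathcal C \subset 2^{\mathcal X}$, $\mathrm{dens}(\mathcal C) \le S(\mathcal C)$, and conversely, if $\mathrm{dens}(\mathcal C) < +\infty$, then $S(\mathcal C) < +\infty$ too.

Proof. We first show $\mathrm{dens}(\mathcal C) \le S(\mathcal C)$, where we may assume $S(\mathcal C) < +\infty$. By Sauer’s lemma (3.1.2) we have $m^{\mathcal C}(n) \le {}_nC_{\le S(\mathcal C)}$ for all $n \ge 1$, and by proposition 3.1.3 there is a $K$ such that $m^{\mathcal C}(n) \le K n^{S(\mathcal C)}$ for all $n \ge S(\mathcal C) + 2$. By taking a larger constant $K$, the result holds for all $n \ge 1$.

In the other direction, we note that if $\mathrm{dens}(\mathcal C) < +\infty$, then there is a constant $K$ and a strictly positive real number $r$ such that $m^{\mathcal C}(n) < K n^r$ for all $n \ge 1$. Since $n^r/2^n \to 0$ as $n \to \infty$, we have $m^{\mathcal C}(n) < 2^n$ for all $n$ from some $n_0$ on, and hence $S(\mathcal C) < +\infty$.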

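3.2 Uniform bounds for packing number of VC class.

Definition 3.2.1. Let $(\mathcal X, \mathcal A)$ be a measurable space and $\mathcal C \subset \mathcal A$. For a law $P$ on $\mathcal A$ we define a semimetric $d_P$ on $\mathcal A$ as follows:

\[ d_P : \mathcal A \times \mathcal A \to \mathbb R^+ : (A, B) \mapsto d_P(A, B) := P(A \Delta B), \]

where $A \Delta B := (A \setminus B) \cup (B \setminus A)$.

That it is symmetric is obvious, and $P(A \Delta B) = 0$ iff $A = B$ $P$–a.s. The triangle inequality follows from $A \Delta B \subset (A \Delta C) \cup (B \Delta C)$ for any $A, B, C$ and the subadditivity of $P$.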

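Proof.

\[ \begin{aligned}
A \Delta B &= (A \setminus B) \cup (B \setminus A) = (A \cap B^c) \cup (A^c \cap B) \\
&= [(A \cap B^c \cap C) \cup (A \cap B^c \cap C^c)] \cup [(A^c \cap B \cap C) \cup (A^c \cap B \cap C^c)] \\
&\subset [(B^c \cap C) \cup (A \cap C^c)] \cup [(A^c \cap C) \cup (B \cap C^c)] \\
&= [(A \cap C^c) \cup (A^c \cap C)] \cup [(B \cap C^c) \cup (B^c \cap C)] \\
&= (A \Delta C) \cup (B \Delta C).
\end{aligned} \]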
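For a law $P$ concentrated on finitely many points, the semimetric $d_P$ can be computed exactly. The following Python sketch is only an illustration with hypothetical names; it checks the triangle inequality over all triples of subsets of a four-point set and shows why $d_P$ is in general only a semimetric: two different sets have distance zero when they differ on a $P$-null set.

```python
from fractions import Fraction
from itertools import combinations


def d_P(a, b, weights):
    """d_P(A, B) = P(A Δ B) for a law P with point masses `weights` on a finite set."""
    return sum(weights[x] for x in set(a) ^ set(b))


def all_subsets(universe):
    """All subsets of a finite collection of points, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def triangle_inequality_holds(weights):
    """Check d_P(A, B) <= d_P(A, C) + d_P(C, B) over all triples of subsets."""
    subsets = all_subsets(weights)
    return all(
        d_P(a, b, weights) <= d_P(a, c, weights) + d_P(c, b, weights)
        for a in subsets for b in subsets for c in subsets
    )


if __name__ == "__main__":
    # Hypothetical law on {0, 1, 2, 3}; the point 3 carries no mass.
    P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}
    print(triangle_inequality_holds(P))
    print(d_P({0, 3}, {0}, P))  # distance between two different sets
```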

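Definition 3.2.2. Let $(\mathcal X, \mathcal A)$ be a measurable space and $\mathcal C \subset \mathcal A$; we define the index of $\mathcal C$ as

\[ s(\mathcal C) := \inf\{w : \text{there is a } K = K(w, \mathcal C) < +\infty \text{ such that for every law } P \text{ on } \mathcal A \text{ and } 0 < \varepsilon \le 1,\ D(\varepsilon, \mathcal C, d_P) \le K \varepsilon^{-w}\}. \]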

We state the definition of the packing number and of an envelope function for a class of real-valued measurable functions on some probability space.

Definition 3.2.3. Let $(S, d)$ be a metric space, let $\varepsilon > 0$ and $A \subset S$. The packing number of the set $A$ is the quantity defined as

\[ D(\varepsilon, A, d) := \sup\{n : \text{there exists a set } F \subset A \text{ having } n \text{ elements and satisfying } d(x, y) > \varepsilon \text{ for all } x \ne y,\ x, y \in F\}. \]
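For a finite set $A$ the packing number can be found by exhaustive search. The sketch below is an illustration only (the function name is ours, and the search is exponential in $|A|$, so it is suitable for very small examples); it follows the definition literally, including the strict inequality $d(x, y) > \varepsilon$.

```python
from itertools import combinations


def packing_number(points, dist, eps):
    """D(eps, A, d): largest size of a subset of A with pairwise distances > eps."""
    pts = list(points)
    for n in range(len(pts), 0, -1):
        for subset in combinations(pts, n):
            if all(dist(x, y) > eps for x, y in combinations(subset, 2)):
                return n
    return 0  # only reached for the empty set


if __name__ == "__main__":
    grid = range(11)  # the points 0, 1, ..., 10 on the real line
    for eps in (0.5, 2, 10):
        print(eps, packing_number(grid, lambda x, y: abs(x - y), eps))
```

On the grid $\{0, 1, \dots, 10\}$ with $\varepsilon = 2$ the points must be at least 3 apart, so a maximal packing is $\{0, 3, 6, 9\}$.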

Let $\mathcal F \subset L^2(\mathcal X, \mathcal A, P)$, where $(\mathcal X, \mathcal A, P)$ is some probability space. Let

\[ F_{\mathcal F}(x) := \sup\{|f(x)| : f \in \mathcal F\} = \|\delta_x\|_{\mathcal F}, \]

where $\delta_x(g) := g(x)$. A measurable function $F \ge F_{\mathcal F}$ is called an envelope function for $\mathcal F$. If $F_{\mathcal F}$ is $\mathcal A$–measurable, then it is said to be the envelope function. For any law $Q$ on $(\mathcal X, \mathcal A)$, $F^*_{\mathcal F}$, the essential infimum as defined in chapter 2 section 1, is an envelope function for $\mathcal F$ depending on $Q$.

The next theorem will be useful when $\mathrm{dens}(\mathcal C)$ is finite, because then the index $s(\mathcal C)$ will be finite too, and we then have a uniform bound on the packing number for any law $P$.

Theorem 3.2.1. For any measurable space (X ,A) and C ⊂ A, dens(C) ≥ s(C).

Proof. We show $\mathrm{dens}(\mathcal C) \ge s(\mathcal C)$. Let $P$ be a probability measure on $\mathcal A$ and let $\varepsilon \in ]0, 1]$. By definition $\mathrm{dens}(\mathcal C)$ is the infimum of all real numbers $r$ such that $m^{\mathcal C}(n) \le M n^r$ for all $n$, with $M$ some constant. It will be enough to show that for each such $r$ and each $s > r$, for some constant $K = K(s)$: $D(\varepsilon, \mathcal C, d_P) \le K(s) \varepsilon^{-s}$.
Let $m \le D(\varepsilon, \mathcal C, d_P)$; then there are some $A_1, \dots, A_m \in \mathcal C$ satisfying $d_P(A_i, A_j) = P(A_i \Delta A_j) > \varepsilon$ for all $1 \le i < j \le m$. We show that $m^{\mathcal C}(n) \ge m$ for $n$ large.

As usual, $X_i$ denote the coordinates on the countable product space $(\mathcal X^\infty, \mathcal A^\infty, P^\infty)$, which are i.i.d. with law $P$. Let $P_n$ be the empirical measure. Consider for $n = 1, 2, \dots$:

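\[ \begin{aligned}
&\Pr\{\text{for some } i \ne j,\ X_k \notin A_i \Delta A_j \text{ for every } k \le n\} \\
&\quad= \Pr\Big(\bigcup_{1 \le i < j \le m} \bigcap_{k=1}^{n} \{X_k \notin A_i \Delta A_j\}\Big) \\
&\quad\le \sum_{1 \le i < j \le m} \Pr\Big(\bigcap_{k=1}^{n} \{X_k \notin A_i \Delta A_j\}\Big) \\
&\quad= \sum_{1 \le i < j \le m} \prod_{k=1}^{n} \Pr\{X_k \notin A_i \Delta A_j\} = \sum_{1 \le i < j \le m} \big(1 - P(A_i \Delta A_j)\big)^n \\
&\quad\le \sum_{1 \le i < j \le m} (1 - \varepsilon)^n \le \frac{m(m-1)}{2} (1 - \varepsilon)^n.
\end{aligned} \]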

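Now an easy calculation shows that $\frac{m(m-1)}{2}(1 - \varepsilon)^n$ is strictly smaller than 1 if

\[ n > -\ln\left(\frac{m(m-1)}{2}\right) \Big/ \ln(1 - \varepsilon). \]

For such $n$ there is a strictly positive probability that for all $i \ne j$

\[ P_n(A_i \Delta A_j) = \frac{1}{n} \sum_{k=1}^{n} I_{A_i \Delta A_j}(X_k) > 0. \]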

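It is now easy to see that $m^{\mathcal C}(n) \ge m$: on this event the traces of $A_1, \dots, A_m$ on $\{X_1, \dots, X_n\}$ are pairwise different.
Let $r > \mathrm{dens}(\mathcal C)$; then there is an $M = M(r, \mathcal C) < +\infty$ for which $m^{\mathcal C}(n) \le M n^r$ for all $n$. Remark that $-\ln(1 - \varepsilon) \ge \varepsilon$, since by Taylor’s formula around 0: $\ln(1 + x) \le x$. Then $m \le m^{\mathcal C}(n) \le M n^r$ for $n$ large enough, in particular for

\[ n := 2 \ln(m^2)\, \varepsilon^{-1} > -\ln\left(\frac{m(m-1)}{2}\right) \Big/ \ln(1 - \varepsilon). \]

If $m \ge 2$, then $m \le M (2 \ln(m^2))^r \varepsilon^{-r}$, which can be written as

\[ m (\ln(m))^{-r} \le M_1 \varepsilon^{-r} \]

for $M_1 := 4^r M$. Let $\delta > 0$; then $(\ln(m))^r \le C m^{\delta}$ for some constant $C$, because $\ln(x)$, from a certain point on, grows slower than any $x^{1/n}$, by de l’Hôpital’s rule. Also

\[ \ln(\exp(n^2)) = n^2 < e^n = (\exp(n^2))^{1/n}. \]

So in particular for all $x > 0$: $\ln(x) < n^2 x^{1/n}$, and $C$ can be taken to be $n^{2r}$. Hence for the smallest $n > r/\delta$:

\[ \ln(m) < n^2 m^{1/n}, \qquad (\ln(m))^r < (\lceil r/\delta \rceil)^{2r} m^{r/n} < (\lceil r/\delta \rceil)^{2r} m^{\delta}, \]

and this holds for all $m \ge 1$. Now for all $m \ge 2$ (recall that $\lim_{x \to 0} x \ln(x) = 0$),

\[ m^{1-\delta} < m (\ln(m))^{-r} (\lceil r/\delta \rceil)^{2r} \le (\lceil r/\delta \rceil)^{2r} M_1 \varepsilon^{-r}, \]

and thus

\[ m \le \big((\lceil r/\delta \rceil)^{2r}\big)^{1/(1-\delta)} (M_1 \varepsilon^{-r})^{1/(1-\delta)}. \]

Let $M_2(r, \mathcal C, \delta) := ((\lceil r/\delta \rceil)^{2r})^{1/(1-\delta)} (M_1)^{1/(1-\delta)}$; then $M_2 < +\infty$ for all $0 < \delta < 1$. Also, for $\delta$ small, $1/(1 - \delta) > 1$ and $r/(1 - \delta) > r > \mathrm{dens}(\mathcal C)$. Since $m \le D(\varepsilon, \mathcal C, d_P)$ was arbitrary, $D(\varepsilon, \mathcal C, d_P) \le M_2 \varepsilon^{-r/(1-\delta)}$. Hence letting $r \downarrow \mathrm{dens}(\mathcal C)$ and $\delta \downarrow 0$ gives $\mathrm{dens}(\mathcal C) \ge s(\mathcal C)$.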
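The two sample sizes appearing in this proof can be compared numerically. The Python sketch below is an illustration only (function names are ours, and $\varepsilon < 1$ is assumed so that $\ln(1 - \varepsilon)$ is defined); it evaluates the union bound $\binom{m}{2}(1 - \varepsilon)^n$, the threshold $-\ln(m(m-1)/2)/\ln(1 - \varepsilon)$ and the explicit choice $n = 2\ln(m^2)/\varepsilon$.

```python
from math import ceil, comb, log


def union_bound(m, eps, n):
    """Upper bound binom(m, 2) * (1 - eps)^n from the proof."""
    return comb(m, 2) * (1 - eps) ** n


def threshold(m, eps):
    """-ln(m(m-1)/2) / ln(1 - eps); any larger n makes the union bound < 1."""
    return -log(comb(m, 2)) / log(1 - eps)


def proof_choice(m, eps):
    """The explicit value n = 2 ln(m^2) / eps used in the proof."""
    return 2 * log(m ** 2) / eps


if __name__ == "__main__":
    for m in (2, 5, 50):
        for eps in (0.5, 0.1, 0.01):
            n = ceil(proof_choice(m, eps))
            print(m, eps, threshold(m, eps), n, union_bound(m, eps, n))
```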

Remark. Actually, $\mathrm{dens}(\mathcal C)$ and $s(\mathcal C)$ are equal, but this inequality is sufficient for our purposes later on in chapter 5. For a complete proof, we refer to [Dud1] theorem 4.6.1 on pages 156–157.

Definition 3.2.4. Let $\mathcal X$ be a set and $\mathcal F$ a class of real–valued functions on $\mathcal X$. For $f \in \mathcal F$, the subgraph of $f$ is the subset

\[ \{(x, t) : 0 \le t \le f(x) \text{ if } f(x) > 0, \text{ or } f(x) \le t \le 0 \text{ if } f(x) < 0\} \]

of $\mathcal X \times \mathbb R$. If $\mathcal D$ is a class of subsets of $\mathcal X \times \mathbb R$, and if for all $f \in \mathcal F$ the subgraph of $f$ lies in $\mathcal D$, then $\mathcal F$ will be called a subgraph class. If $\mathcal D$ is a VC class, then $\mathcal F$ is called a VC subgraph class.

Like the previous theorem, i.e. theorem 3.2.1, the next theorem provides a uniform bound on the packing number, for any probability measure $Q$ on a finite set.

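Theorem 3.2.2. Let $1 \le p < +\infty$. Let $(\mathcal X, \mathcal A, Q)$ be a probability space and $\mathcal F$ a VC subgraph class of measurable real–valued functions on $\mathcal X$. Let $F \in L^p(\mathcal X, \mathcal A, Q)$ be an envelope function for $\mathcal F$ such that $\int F \, dQ > 0$ and $F \ge 1$. Let $\mathcal C$ be the collection of all the subgraphs of functions in $\mathcal F$. Then for any $W > S(\mathcal C)$ there is an $A := A(W, S(\mathcal C)) < +\infty$ such that

\[ D^{(p)}_F(\varepsilon, \mathcal F, Q) \le A \left(\frac{2^{p-1}}{\varepsilon^p}\right)^W \quad \text{for } 0 < \varepsilon \le 1. \]

Proof. Let $\varepsilon \in ]0, 1[$ and take $m$ maximal and $f_1, \dots, f_m \in \mathcal F$ such that

\[ \int |f_i - f_j|^p \, dQ > \varepsilon^p \int F^p \, dQ \quad \text{for } 1 \le i < j \le m, \]

which can be done since $D^{(p)}_F(\varepsilon, \mathcal F, Q)$ is finite.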

Suppose $p = 1$. For any $B \in \mathcal A$, let

\[ (FQ)(B) := \int_B F \, dQ \quad \text{and} \quad Q_F := (FQ)/Q(F), \]

where $Q(F) = \int F \, dQ$; then $Q_F$ is a probability measure on $\mathcal A$. Furthermore let $k = k(m, \varepsilon)$ be the smallest integer such that

\[ \exp\left(\frac{k\varepsilon}{2}\right) > \binom{m}{2}; \tag{3.2.1} \]

then $k \le 1 + (4 \ln(m))/\varepsilon$. Let $X_1, \dots, X_k$ be i.i.d. with law $Q_F$. Given $X_i$, define $Y_i$ as random variables uniformly distributed on $[-F(X_i), F(X_i)]$, such that the vectors $(X_i, Y_i)$, $1 \le i \le k$, are independent. Denote the subgraph of $f_j$ by $C_j$, $1 \le j \le m$. For all $i$ and $j \ne s$:

\[ \Pr((X_i, Y_i) \in C_j \Delta C_s) = \int \frac{|f_j(X_i) - f_s(X_i)|}{2F(X_i)} \, dQ_F(X_i), \]

since conditionally on $X_i = x_i$, $Y_i$ is uniformly distributed over $[-F(x_i), F(x_i)]$, and

\[ \lambda(\{t \in [-F(x_i), F(x_i)] : (x_i, t) \in C_j \Delta C_s\}) = |f_j(x_i) - f_s(x_i)|. \]

Indeed, let $M(x_i) := \max(f_j(x_i), f_s(x_i))$ and $m(x_i) := \min(f_j(x_i), f_s(x_i))$. If both $f_j(x_i), f_s(x_i) \ge 0$, we take all $t \in ]m(x_i), M(x_i)]$; if both $f_j(x_i), f_s(x_i) \le 0$, then we take $t \in [m(x_i), M(x_i)[$; and if one is nonnegative and the other nonpositive, then we have $t \in [m(x_i), M(x_i)]$ up to a Lebesgue null set. So whatever the case, the set has Lebesgue measure $M(x_i) - m(x_i) = |f_j(x_i) - f_s(x_i)|$.
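The case distinction above can be mirrored in a few lines of Python. The sketch is an illustration under a simplifying assumption of ours: the $t$-section of a subgraph at a point where the function takes the value $v$ is treated as the closed interval between 0 and $v$, which differs from the definition only by Lebesgue null sets. The function names are ours.

```python
def section(v):
    """t-section of the subgraph at a point where the function takes the value v."""
    return (0, v) if v >= 0 else (v, 0)


def sym_diff_length(a, b):
    """Lebesgue measure of the symmetric difference of the sections for values a, b."""
    (l1, r1), (l2, r2) = section(a), section(b)
    overlap = max(0, min(r1, r2) - max(l1, l2))
    return (r1 - l1) + (r2 - l2) - 2 * overlap


if __name__ == "__main__":
    for a, b in [(2, 5), (-1, 3), (-2, -5)]:
        print(a, b, sym_diff_length(a, b), abs(a - b))
```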

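We continue:

\[ \begin{aligned}
\int \frac{|f_j(X_i) - f_s(X_i)|}{2F(X_i)} \, dQ_F(X_i) &= \int \frac{|f_j(X_i) - f_s(X_i)|}{2F(X_i)} \cdot \frac{F(X_i)}{Q(F)} \, dQ(X_i) \\
&= \int |f_j(X_i) - f_s(X_i)| \, dQ(X_i) \Big/ (2Q(F)) > \varepsilon/2.
\end{aligned} \]

The last inequality is valid since $p = 1$, $m = D^{(1)}_F(\varepsilon, \mathcal F, Q)$ and the $f_i$ were chosen such that $\int |f_i - f_j| \, dQ > \varepsilon Q(F)$.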

Let $A_{jsk}$ be the event that $(X_i, Y_i) \notin C_j \Delta C_s$ for all $1 \le i \le k$. By independence of the vectors $(X_i, Y_i)$ and by the previous inequality:

\[ \Pr(A_{jsk}) \le \left(1 - \frac{\varepsilon}{2}\right)^k \le \exp\left(-\frac{k\varepsilon}{2}\right) \]

and

\[ \Pr\Big(\bigcup_{j \ne s} A_{jsk}\Big) \le \binom{m}{2} \exp\left(-\frac{k\varepsilon}{2}\right) < 1, \]

because $k$ was chosen as the smallest positive integer such that equation 3.2.1 is valid. So with positive probability, for all $j \ne s$ there exists an $i \le k$ such that $(X_i, Y_i) \in C_j \Delta C_s$. We denote the first couple with this property by $(X_l, Y_l)$. This means nothing more than that the traces of the sets $C_p$, $1 \le p \le m$, on $\{(X_i, Y_i) : 1 \le i \le k\}$ are all different. Hence $m^{\mathcal C}(k) \ge m$, since otherwise we would have fewer than $m$ different sets (as in the proof of theorem 3.2.1).

Let $S := S(\mathcal C)$. By Sauer’s lemma (3.1.2) and the Vapnik–Cervonenkis proposition (3.1.3), $m^{\mathcal C}(k) \le 1.5\, k^S/S!$ for all $k \ge S + 2$. Hence for some constant $C$ depending only on $S$, $m^{\mathcal C}(k) \le C k^S$. So

\[ m \le m^{\mathcal C}(k) \le C k^S \le C \left(1 + \frac{4 \ln(m)}{\varepsilon}\right)^S. \]

As in the proof of theorem 3.2.1, for any $\alpha > 0$ there is an $m_0$ such that for all $m \ge m_0$: $1 + 4 \ln(m) \le m^{\alpha}$, and then, since $\varepsilon < 1$, $m^{1 - \alpha S} \le C \varepsilon^{-S}$. We choose $\alpha > 0$ small enough such that $\alpha S < 1$; this implies $m \le C_1 \varepsilon^{-S/(1 - \alpha S)}$ for all $m \ge m_0$, with $C_1$ some constant. For any $W > S$ we can solve $W = \frac{S}{1 - \alpha S}$ for $\alpha$, which gives $\alpha = \frac{W - S}{WS}$. For such $\alpha$ we certainly have $\alpha S < 1$, and since $\varepsilon < 1$, $\varepsilon^{-W} > 1$, so $m \le A \varepsilon^{-W}$, where $A := \max(C_1, m_0)$. Here $m_0$ depends only on $W$ and $S$ (through the choice of $\alpha$), so that $A$ is a function of $W$ and $S$ only.

The theorem is true for $p = 1$.

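Consider the case of $1 < p < +\infty$. Let $Q_{F,p} := F^{p-1} Q / Q(F^{p-1})$ and let $f_1, \dots, f_m$ be as above, so that $\int |f_i - f_j|^p \, dQ > \varepsilon^p Q(F^p)$; for $i \ne j$,

\[ \begin{aligned}
\varepsilon^p \int F^p \, dQ < \int |f_i - f_j|^p \, dQ &\le \int |f_i - f_j| (2F)^{p-1} \, dQ \\
&= \int |f_i - f_j| \, dQ_{2F,p} \; Q((2F)^{p-1}) \\
&= \int |f_i - f_j| \, dQ_{F,p} \; Q((2F)^{p-1})
\end{aligned} \tag{3.2.2} \]

(recall that we assumed $F \ge 1$). The last equality is valid because $Q_{2F,p} = Q_{F,p}$. Let

\[ \delta := \frac{\varepsilon^p Q(F^p)}{Q_{F,p}(F)\, Q((2F)^{p-1})}; \]

plugging that value for $\delta$ into equation 3.2.2:

\[ \delta\, Q_{F,p}(F) = \frac{\varepsilon^p Q(F^p)}{Q((2F)^{p-1})} \cdot \frac{Q_{F,p}(F)}{Q_{F,p}(F)} < \int |f_i - f_j| \, dQ_{F,p}. \]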

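By the case $p = 1$ we conclude that the following inequalities hold:

\[ D^{(p)}_F(\varepsilon, \mathcal F, Q) \le D^{(1)}_F(\delta, \mathcal F, Q_{F,p}) \le A \delta^{-W}. \]

We simplify our expression for $\delta$:

\[ \delta = \frac{\varepsilon^p Q(F^p)}{Q_{F,p}(F)\, Q((2F)^{p-1})} = \frac{\varepsilon^p Q(F^p)}{2^{p-1} \frac{Q(F^p)}{Q(F^{p-1})} Q(F^{p-1})} = \frac{\varepsilon^p}{2^{p-1}}. \]

As claimed:

\[ D^{(p)}_F(\varepsilon, \mathcal F, Q) \le A \delta^{-W} = A \left(\frac{2^{p-1}}{\varepsilon^p}\right)^W. \]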
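The choice of $k$ in equation 3.2.1 and the bound $k \le 1 + (4 \ln(m))/\varepsilon$ used in the case $p = 1$ can be checked numerically. The Python sketch below is only an illustration (the function name is ours); it searches for the smallest positive integer $k$ with $\exp(k\varepsilon/2) > \binom{m}{2}$ by direct iteration.

```python
from math import comb, exp, log


def smallest_k(m, eps):
    """Smallest positive integer k with exp(k * eps / 2) > binom(m, 2)."""
    k = 1
    while exp(k * eps / 2) <= comb(m, 2):
        k += 1
    return k


if __name__ == "__main__":
    for m in (2, 3, 10, 100):
        for eps in (1.0, 0.5, 0.05):
            k = smallest_k(m, eps)
            print(m, eps, k, 1 + 4 * log(m) / eps)
```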

Chapter 4

On measurability.

As we mentioned earlier, the empirical measure, the empirical process, or even the supremum over a possibly uncountable class of functions need not be measurable. Here we provide a way to ensure that measurability, or some more general measurability conditions like universal measurability, are satisfied.

4.1 Admissibility.

Let $\mathcal X$ be a set and $\mathcal F$ a family of real–valued functions on $\mathcal X$ that are measurable for some σ–algebra $\mathcal A$ on $\mathcal X$. Then consider the natural map

\[ \mathcal F \times \mathcal X \to \mathbb R : (f, x) \mapsto f(x). \]

In general there will not be any σ–algebra on $\mathcal F$ such that the evaluation map is jointly measurable.

Definition 4.1.1. Let $(\mathcal X, \mathcal B)$ be a measurable space. Then $(\mathcal X, \mathcal B)$ will be called separable if $\mathcal B$ is generated by some countable family $\mathcal C \subset \mathcal B$ and $\mathcal B$ contains all singletons $\{x\}$, $x \in \mathcal X$. From now on, a space $(\mathcal X, \mathcal B)$ will always be assumed to be a separable measurable space!

Let $\mathcal F$ be a family of real–valued functions on $\mathcal X$. Then $\mathcal F$ is said to be admissible iff there is a σ–algebra $\mathcal T$ on $\mathcal F$ such that the evaluation map

\[ \mathcal F \times \mathcal X \to \mathbb R : (f, x) \mapsto f(x) \]

is $\mathcal T \otimes \mathcal B, \mathcal R$–measurable, where $\mathcal R$ is the Borel σ–algebra on $\mathbb R$. If this condition, namely joint measurability, is satisfied, $\mathcal T$ will be called an admissible structure for $\mathcal F$.
$\mathcal F$ will be called image admissible via $(Y, \mathcal S, T)$ if $(Y, \mathcal S)$ is a measurable space and $T$ is a function from $Y$ onto $\mathcal F$ such that the map

\[ Y \times \mathcal X \to \mathbb R : (y, x) \mapsto T(y)(x) \]

is jointly measurable. These last two definitions can also be applied to classes of sets $\mathcal C$ by letting $\mathcal F := \{I_C : C \in \mathcal C\}$.

Remark. Note that every separable metric space equipped with its Borel σ–algebra is a separable measurable space in the sense of the above definition. Metrizability implies the Hausdorff property, so singletons are closed, hence Borel. Secondly, in a metric space separability is equivalent to second countability, i.e. the topology has a countable base. Since every open set is then a countable union of elements of that base, the σ–algebra generated by the countable base equals the Borel σ–algebra.

If one considers general (Hausdorff) separable topological spaces, not necessarily metrizable and hence not necessarily satisfying the countable base axiom, then the space with its Borel σ–algebra may no longer be a separable measurable space. Indeed, consider e.g. $\mathbb R^{\mathbb R}$ with the product topology, i.e. all real–valued functions on $\mathbb R$ with the topology of pointwise convergence. It is well known that this space is not metrizable and is separable and Hausdorff (property A.1.8 for the non-metrizability, the remark in the proof of property A.1.9 for the separability, and [Will] chapter 5 theorem 13.8(b) on page 87 for the Hausdorff property and chapter 5 theorem 16.2(c) on page 108 for the non-existence of a countable base). In fact, $\mathbb R^{\mathbb R}$ with its Borel σ–algebra is not a separable measurable space. Indeed, as in the case of the Borel σ–algebra on $\mathbb R$, one shows that the size of a countably generated σ–algebra is bounded by the cardinality of the continuum ($c$). On the other hand, the Borel σ–algebra of the product topology on $\mathbb R^{\mathbb R}$ is at least as large as the space itself, since the singletons are all closed sets. So its cardinality is at least that of $\mathbb R^{\mathbb R}$, which is strictly greater than $c$. For the second claim, recall that $\mathrm{Card}(\mathbb R^{\mathbb R}) = \mathrm{Card}(\mathcal P(\mathbb R)) > \mathrm{Card}(\mathbb R) = c$. The first claim follows from an argument using transfinite induction. As later on, before theorem 4.1.2, one defines a Borel hierarchy as follows. Let $(W, \le)$ be an uncountable well-ordered set such that every segment is at most countable. Then

$\mathcal A_0$ := any countable class of Borel sets,
$\mathcal A_x$ := the class formed out of all differences and (at most) countable unions of elements of $\mathcal A_y$,

for any $x$ with finite segment and $y := \max\{a : a < x\}$. And finally, for the infinite segments,

$\mathcal A_z$ := the class formed out of all differences and (at most) countable unions of elements of $\bigcup_{t<z} \mathcal A_t$.

One proves that $\mathcal B = \bigcup_{w \in (W, \le)} \mathcal A_w$, and notes that each class $\mathcal A_w$ has cardinality at most $c$, hence

\[ |\mathcal B| = \Big|\bigcup_{w \in (W, \le)} \mathcal A_w\Big| \le \sum_{w \in (W, \le)} |\mathcal A_w| \le \sum_{w \in (W, \le)} c = c \cdot c = c. \]

We refer the interested reader to [Jech] chapter 1 section 6 for cardinal arithmetic and chapter 7 section 39 for a description from the inside of (Borel) σ–algebras.

Example 4.1.1. Let us first consider an example of such $\mathcal X$ and $\mathcal F$. Take $(K, d)$ a compact metric space, and $\mathcal F$ some set of continuous real–valued functions on $K$, such that $\mathcal F$ is compact for the supremum norm on $C(K)$. Then, by the Arzela–Ascoli theorem (theorem A.1.18), the functions in $\mathcal F$ are uniformly equicontinuous on $K$ (definition A.1.10), i.e. for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $x, y \in K$ and $f \in \mathcal F$ one has

\[ |f(x) - f(y)| < \varepsilon \quad \text{whenever } d(x, y) < \delta. \]

It follows that the evaluation map is jointly continuous. Indeed, for a neighbourhood $B_{\mathbb R}(f(x), \varepsilon)$ of $f(x)$, the set $B_{(C(K), d_\infty)}(f, \varepsilon/2) \times B_{(K, d)}(x, \delta)$, with $\delta$ chosen such that

\[ d(x, y) < \delta \text{ implies } |g(x) - g(y)| < \varepsilon/2 \text{ for all } g \in \mathcal F, \]

is a neighbourhood of $(f, x)$ that is sent by the evaluation map into $B_{\mathbb R}(f(x), \varepsilon)$. Indeed, let $(h, z) \in B_{(C(K), d_\infty)}(f, \varepsilon/2) \times B_{(K, d)}(x, \delta)$; then

\[ |\mathrm{eval}(f, x) - \mathrm{eval}(h, z)| \le |f(x) - f(z)| + |f(z) - h(z)| < \varepsilon, \]

where the first term is small by uniform (equi)continuity of $f$, and the second by definition of the supremum metric.

Now $K$, which plays the role of $\mathcal X$, is chosen to be a compact metric space, so in particular $K$ is separable. $(C(K), d_\infty)$ is a metric space and by the Stone–Weierstrass theorem A.1.20 it is separable too. $\mathcal F$ endowed with the relative topology is separable as a subspace of a separable metric space. By A.2.6 the Borel σ–algebra of the product topology is the same as the product σ–algebra generated by the product of the two Borel σ–algebras.

Therefore the map is jointly measurable and $\mathcal F$ is admissible for the Borel σ–algebra generated by the relative topology of uniform convergence on $\mathcal F$.

Theorem 4.1.1. For any separable space $(\mathcal X, \mathcal B)$, there is a subset $Y$ of $I := [0, 1]$ and a 1–1 function $M$ from $\mathcal X$ onto $Y$ which is a measurable isomorphism for the Borel σ–algebra on $Y$.


Proof. The Borel sets of $Y$ are the intersections of Borel sets of $I$ with $Y$. Let $\mathcal A := \{A_j\}_{j \ge 1}$ be a countable set of generators for $\mathcal B$. Consider the map

\[ f : \mathcal X \to 2^\infty : x \mapsto \{I_{A_j}(x)\}_{j \ge 1}, \]

with the discrete topology on $\{0, 1\}$ and $2^\infty := \{0, 1\}^{\mathbb N}$ equipped with the product topology (see definition A.1.6). Then $f$ is a 1–1 function onto its range $f(\mathcal X) = Z$, see lemma A.2.7.

Moreover $f$ is a measurable isomorphism of $\mathcal X$ onto $Z$. First consider $C_j := \pi_j^{-1}\{1\}$, $D_j := \pi_j^{-1}\{0\}$. Note that such sets form a subbase for the product topology on $2^\infty$, see definition A.1.6. The space is separable and metrizable (propositions A.1.9 and A.1.8), and hence second–countable (proposition A.1.3); therefore open sets are countable unions of finite intersections of those subbase elements. Hence open sets are in the σ–algebra generated by the subbase of the product topology, which thus equals the Borel σ–algebra of $2^\infty$. Then $f^{-1}(C_j) = A_j$ and $f^{-1}(D_j) = A_j^c$, which are measurable. Conversely, to prove measurability of $f^{-1} : Z \to \mathcal X$, it is enough to look at the generating sets $A_j$, because $f$ is a bijection (and so preserves unions and intersections):

\[ f(A_j) = \{\{x_n\}_{n \ge 1} \in Z : x_j = 1\} = \pi_j^{-1}(1) \cap Z = C_j \cap Z, \]

where $\pi_j : \prod_{m \ge 1} \{0, 1\}_m \to \{0, 1\}$ denotes the $j$-th coordinate projection. $C_j \cap Z$ is a Borel set of the relative σ–algebra on $Z$, because $C_j$ is a Borel (because closed) subset of $2^\infty$. Next we look at the map

\[ g : 2^\infty \to I : \{z_j\}_{j \ge 1} \mapsto \sum_{j=1}^{\infty} \frac{2 z_j}{3^j}. \]

Then $g(2^\infty)$ is the Cantor set. The Cantor set can also be described recursively as a countable intersection of closed sets (see e.g. [Will] chapter 6 section 17). Let $C_1 = [0, 1/3] \cup [2/3, 1]$ and $C_n$ the subset of $I$ where we removed the (open) middle third of the intervals $[j/3^{n-1}, (j+1)/3^{n-1}]$, $j = 0, \dots, 3^{n-1} - 1$. Then $C := g(2^\infty) = \bigcap_n C_n$. Using that representation $g$ is seen to be 1–1: if $x \ne y$, let $n \in \mathbb N$ be the first index such that $x_n \ne y_n$; then in the $n$–th step $g(x)$ lies in the first or third part, and $g(y)$ in the other, of such an interval $[j/3^{n-1}, (j+1)/3^{n-1}]$ for some $j$.
If we equip $2^\infty$ with the product topology (definition A.1.6), then it is metrizable (lemma A.1.8), and one may take the following metric:

\[ e(x, y) := \sum_{j=1}^{\infty} d(x_j, y_j)/2^j, \]

where $d$ denotes the discrete metric on $\{0, 1\}$. Now it is easy to see that $g$ is Lipschitz continuous and thus continuous:

\[ |g(x) - g(y)| = \Big|\sum_{j=1}^{\infty} \frac{2 x_j}{3^j} - \sum_{j=1}^{\infty} \frac{2 y_j}{3^j}\Big| = \Big|\sum_{j=1}^{\infty} \frac{2 (x_j - y_j)}{3^j}\Big| \le \sum_{j=1}^{\infty} \frac{2 |x_j - y_j|}{3^j} \le 2 \sum_{j=1}^{\infty} \frac{|x_j - y_j|}{2^j} = 2 e(x, y). \]

By Tychonoff’s theorem A.1.16, $2^\infty$ is a compact topological space; $I$ is a Hausdorff topological space and $g$ is a continuous injection, so it is a well-known fact that $g$ is a homeomorphism onto its image (theorem A.1.7). Because the product σ–algebra and the Borel σ–algebra on $2^\infty$ are the same, $g$ is easily seen to be a measurable isomorphism between $2^\infty$ and $g(2^\infty)$. So the restriction of $g$ to $Z$ is a measurable isomorphism onto its range $Y$. The composition $g \circ f$, called the Marczewski function,

\[ M(x) := \sum_{j=1}^{\infty} \frac{2 I_{A_j}(x)}{3^j}, \]

is 1–1 from $\mathcal X$ onto $Y \subset I := [0, 1]$ and is also a measurable isomorphism onto $Y$.
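The maps $g$, $e$ and $M$ can be explored on finite 0/1 sequences, viewed as infinite sequences padded with zeros. The Python sketch below is an illustration only: all names are ours, and the use of finitely many generating sets given as membership predicates is a simplifying assumption, since the proof works with a countable family. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product


def cantor_map(bits):
    """g on a finite 0/1 sequence (padded with zeros): sum of 2 z_j / 3^j, j >= 1."""
    return sum(Fraction(2 * z, 3 ** j) for j, z in enumerate(bits, start=1))


def product_metric(x, y):
    """e(x, y) = sum_j d(x_j, y_j) / 2^j for 0/1 sequences of equal length."""
    return sum(Fraction(int(a != b), 2 ** j)
               for j, (a, b) in enumerate(zip(x, y), start=1))


def marczewski(x, generators):
    """M(x) for finitely many generating sets given as membership predicates."""
    return cantor_map([1 if A(x) else 0 for A in generators])


def lipschitz_and_injective(length=6):
    """Check |g(x) - g(y)| <= 2 e(x, y) and injectivity on all sequences of a given length."""
    seqs = list(product((0, 1), repeat=length))
    lipschitz = all(
        abs(cantor_map(x) - cantor_map(y)) <= 2 * product_metric(x, y)
        for x in seqs for y in seqs
    )
    injective = len({cantor_map(x) for x in seqs}) == len(seqs)
    return lipschitz and injective


if __name__ == "__main__":
    print(lipschitz_and_injective())
    # Hypothetical example: the sets A_q = {x : x <= q}, q = 0, 1, 2, 3, on {0, ..., 4}.
    gens = [lambda x, q=q: x <= q for q in (0, 1, 2, 3)]
    print([marczewski(x, gens) for x in range(5)])
```

In the example the generating sets separate the five points, so the five values of $M$ are pairwise different.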

Let $(\mathcal X, \mathcal B)$ be a separable space, where $\mathcal B$ is generated by $A_1, A_2, \dots$. By taking the union of the (finite) algebras generated by $A_1, A_2, \dots, A_n$, we can and do take $\mathcal A := \{A_j\}_{j \ge 1}$ to be an algebra.
Indeed, let $\mathcal G_n$ be the algebra generated by $A_i$, $1 \le i \le n$. Then by proposition A.2.1, $\mathcal G_n$ is finite. Let $\mathcal A := \bigcup_n \mathcal G_n$; then $\mathcal A$ is at most countable. Note that $\mathcal G_n \subset \mathcal G_{n+1}$, so that $\mathcal A$ is an algebra: it trivially contains $\mathcal X$, and it contains complements and finite unions, since for a finite number $m$ of sets there exists an algebra, namely some $\mathcal G_k$, which contains all of the sets. Remark that the same does not hold for σ–algebras; consider on $]0, 1]$ the finite σ–algebras generated by (refined) partitions of $]0, 1]$:

\[ \mathcal A_n := \mathcal A\left(\left\{\left]\frac{j}{2^n}, \frac{j+1}{2^n}\right] : 0 \le j \le 2^n - 1\right\}\right); \]

none of the $\mathcal A_n$ contains $\{1\}$, but $\{1\} = \bigcap_n ](2^n - 1)/2^n, 1]$ and the latter set is contained in $\sigma(\bigcup_n \mathcal A_n)$.
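On a finite ground set the algebra generated by finitely many sets can be listed explicitly through its atoms. The Python sketch below is an illustration with hypothetical names; it groups the points by their membership pattern with respect to the generators and takes all unions of the resulting atoms, which also makes the finiteness of each $\mathcal G_n$ and the inclusion $\mathcal G_n \subset \mathcal G_{n+1}$ visible in a small case.

```python
from itertools import combinations


def generated_algebra(universe, generators):
    """Algebra on a finite universe generated by finitely many subsets."""
    atoms = {}
    for x in universe:
        # Points with the same membership pattern lie in the same atom.
        key = tuple(x in g for g in generators)
        atoms.setdefault(key, set()).add(x)
    atom_list = [frozenset(a) for a in atoms.values()]
    algebra = set()
    for r in range(len(atom_list) + 1):
        for combo in combinations(atom_list, r):
            algebra.add(frozenset().union(*combo))
    return algebra


if __name__ == "__main__":
    A1, A2 = {0, 1, 2, 3}, {0, 1, 4, 5}
    G1 = generated_algebra(range(8), [A1])
    G2 = generated_algebra(range(8), [A1, A2])
    print(len(G1), len(G2), G1 <= G2)
```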

Let $\mathcal F_0$ be the class of all finite sums $\sum_{j=1}^{n} q_j I_{A_j}$ with rational $q_j$ and $n = 1, 2, \dots$. Now we are going to define the Borel or Banach classes. Here we will need the existence of a well–ordered set with certain properties. Recall that a well–ordered set is a pair $(\mathcal X, \le)$, where $\mathcal X$ is a set and $\le$ is a partial order on $\mathcal X$, i.e. $\le$ is reflexive ($a \le a$), antisymmetric (if $a \le b$ and $b \le a$, then $a = b$) and transitive (if $a \le b$ and $b \le c$, then $a \le c$) for all $a, b, c \in \mathcal X$. Moreover that partial order must be linear, i.e. $a < b$ or $b < a$ for all different $a, b \in \mathcal X$, and such that every nonempty subset of $\mathcal X$ has a smallest element.
Let $W$ be an uncountable well–ordered set such that for every $a \in W$ the segment $W_a$, i.e. $\{x \in W : x \le a, x \ne a\} = \{x \in W : x < a\}$, is at most countable. Such a set exists; one can e.g. take $\Omega$, the first uncountable ordinal, with its usual order. For more information about ordinals we refer the (interested) reader to the excellent book on Real and Abstract Analysis by Hewitt and Stromberg, chapter 1 section 4.

Denote by 0 the smallest element of $(W, \le)$, which exists since $W$ is a well–ordered set. We proceed by transfinite induction to define $\mathcal F_v$ for all $v \in W$. $\mathcal F_0$ is already defined. Supposing that $\mathcal F_v$ is defined for all $v < w$, let $\mathcal F_w$ be the class of all functions $f$ which are the everywhere limit of a sequence of functions from $\bigcup_{v < w} \mathcal F_v$. Let $\mathcal U := \bigcup_{v \in W} \mathcal F_v$.

Theorem 4.1.2. $\mathcal U$ is the set of all measurable real functions on $\mathcal X$, with $(\mathcal X, \mathcal B)$ a separable space.

Proof. By construction of $\mathcal U$, every function it contains is measurable, since the functions in $\mathcal F_0$ are measurable and everywhere limits of sequences of measurable functions are measurable. Recall also that every measurable function can be written as the almost everywhere pointwise limit of simple functions; changing the function and the simple functions on a set of measure zero, one has everywhere pointwise convergence.

Conversely, let

\[ \mathcal C := \{B \in \mathcal B : I_B \in \mathcal U\}. \]

Step 1 By definition of $\mathcal F_0$, $\mathcal C$ contains $\mathcal A$, the generating algebra. We show that $\mathcal C$ is a monotone class (definition A.2.4).
Let $B_n \in \mathcal C$ for $n \ge 1$; then $I_{B_n} \in \mathcal U$ for every $n$, and thus $I_{B_n} \in \mathcal F_{v_n}$ for some $v_n \in (W, \le)$. Denote the segment of $v_n$ (i.e. all $x \in W$ with $x < v_n$) by $W_{v_n}$; then $W_{v_n} \cup \{v_n\}$ is an at most countable subset of $W$ (by definition of $(W, \le)$), and so $\bigcup_n (W_{v_n} \cup \{v_n\})$ is at most countable too. $(W, \le)$ was chosen to be an uncountable set such that every segment is at most countable, so

\[ V := W \setminus \bigcup_n (W_{v_n} \cup \{v_n\}) \ne \emptyset. \]

Choose $w \in V$ and a further element $w' \in (W, \le)$ with $w < w'$, which can also be found, since $W \setminus (W_w \cup \{w\})$ is nonempty ($(W, \le)$ being uncountable, the segment of $w$ being at most countable). Then clearly $I_{B_n} \in \mathcal F_w \subset \mathcal F_{w'}$ for all $n$. If there is an everywhere limit for $\{I_{B_n}\}_{n \ge 1}$, then it necessarily belongs to $\mathcal F_{w'}$, and a fortiori to $\mathcal U$.


• Suppose that $B_n \uparrow B$ for some $B \in \mathcal B$. Let $x \in B$; then $x \in \bigcup_n B_n$, so $x \in B_{k_0}$ for some $k_0$. From $B_n \subset B_{n+j}$ one gets $I_{B_m}(x) = 1 = I_B(x)$ for all $m \ge k_0$. And if $x \notin B$, then $x \notin B_n$ for any $n$ (implying $I_{B_n}(x) = 0 = I_B(x)$). So in any case $I_{B_n}(x) \to I_B(x)$ as $n \to \infty$ for all $x \in \mathcal X$.

• Suppose $B_n \downarrow B$; then for $x \in B$, $x$ lies in every $B_n$ and $I_{B_n}(x) = 1 = I_B(x)$. On the other hand, if $x \notin B$, then $x \notin B_k$ for some $k$. Since $B_m \supset B_{m+p}$ for all $p \ge 1$, $x \notin B_m$ for all $m \ge k$. Thus in both cases $I_{B_n}(x) \to I_B(x)$ as $n \to \infty$.

We deduced that for $B_n \uparrow B$ or $B_n \downarrow B$, $I_{B_n} \to I_B$ everywhere, and thus $I_B \in \mathcal F_{w'}$ by definition of $\mathcal F_{w'}$, so $I_B \in \mathcal U$. This ensures that $\mathcal C$ is a monotone class, and since it contains $\mathcal A$, the algebra generating $\mathcal B$, by the Monotone Class Theorem (A.2.10), $\mathcal C = \mathcal B$.

Step 2.1 In the next step we fix real constants $a, b$ and a set $A \in \mathcal{A}$. Let

$$\mathcal{C}^{(a,b)}_A := \{B \in \mathcal{B} : aI_A + bI_B \in \mathcal{U}\}.$$

That $\mathcal{A} \subset \mathcal{C}^{(a,b)}_A$ is easy to see. Indeed, for $E \in \mathcal{A}$ consider $a_nI_A + b_nI_E$, where $a_n, b_n$ are rationals converging to $a$ and $b$, and note that $a_nI_A + b_nI_E \to aI_A + bI_E$ and $a_nI_A + b_nI_E \in \mathcal{F}_0$ for all $n$. As in step 1 we prove that $\mathcal{C}^{(a,b)}_A$ is a monotone class, and as such must equal $\mathcal{B}$. Let $B_n \in \mathcal{C}^{(a,b)}_A$ for all $n \ge 1$.

• Suppose $B_n \uparrow B$. As in step 1, $I_{B_n}(x) \uparrow I_B(x)$ for all $x \in \mathcal{X}$, hence $aI_A + bI_{B_n} \to aI_A + bI_B$ everywhere.

• Suppose $B_n \downarrow B$. As in step 1, $I_{B_n}(x) \downarrow I_B(x)$ for all $x \in \mathcal{X}$, hence $aI_A + bI_{B_n} \to aI_A + bI_B$ everywhere.

In both cases $aI_A + bI_{B_n}$ and $aI_A + bI_B$ belong to $\mathcal{F}_{w'}$ for some $w' \in W$, by an argument as in step 1. Hence, by the monotone class theorem (A.2.10), $\mathcal{C}^{(a,b)}_A = \mathcal{B}$ for all $a, b \in \mathbb{R}$ and all $A \in \mathcal{A}$.

Step 2.2 We continue by letting:

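$$\mathcal{C}^{(c,d)}_C := \{D \in \mathcal{B} : cI_C + dI_D \in \mathcal{U}\}, \quad \text{with } c, d \in \mathbb{R} \text{ and } C \in \mathcal{B}.$$

We claim that $\mathcal{C}^{(c,d)}_C \supset \mathcal{A}$. Indeed, by step 2.1, for any $A \in \mathcal{A}$ we have $\mathcal{C}^{(d,c)}_A = \mathcal{B}$, which means $dI_A + cI_C \in \mathcal{U}$ for any $C \in \mathcal{B}$. This is the same as saying $cI_C + dI_A \in \mathcal{U}$, which is equivalent to $\mathcal{C}^{(c,d)}_C \supset \mathcal{A}$ for any $c, d \in \mathbb{R}$, $C \in \mathcal{B}$. Now, by arguments as in step 1 (using the monotone class theorem), it is not hard to see that for any $c, d \in \mathbb{R}$, $C \in \mathcal{B}$, $\mathcal{C}^{(c,d)}_C = \mathcal{B}$.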

As a temporary conclusion: eIE + fIF ∈ U for any e, f ∈ R, E,F ∈ B.

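Step 3 Then we go to classes of the form

$$\mathcal{C}^{(c_1,c_2,c)}_{A_1,A_2} := \{C \in \mathcal{B} : c_1I_{A_1} + c_2I_{A_2} + cI_C \in \mathcal{U}\},$$

for $c_1, c_2, c$ arbitrary fixed real constants and $A_1, A_2$ fixed elements of $\mathcal{A}$. $\mathcal{C}^{(c_1,c_2,c)}_{A_1,A_2}$ will turn out to be a monotone class, and since $\mathcal{A} \subset \mathcal{C}^{(c_1,c_2,c)}_{A_1,A_2}$, it will equal $\mathcal{B}$. Letting

$$\mathcal{C}^{(c_1,c_2,c)}_{B_1,B_2} := \{C \in \mathcal{B} : c_1I_{B_1} + c_2I_{B_2} + cI_C \in \mathcal{U}\},$$

we have, again by similar arguments as above, that $\mathcal{C}^{(c_1,c_2,c)}_{B_1,B_2}$ contains $\mathcal{A}$ and is a monotone class, so equals $\mathcal{B}$. One proceeds first by showing that $\mathcal{C}^{(c_1,c_2,c)}_{A,B}$ and $\mathcal{C}^{(c_1,c_2,c)}_{B,A}$ equal $\mathcal{B}$, for any $c_1, c_2, c \in \mathbb{R}$, $A \in \mathcal{A}$ and $B \in \mathcal{B}$.

It is clear that we can repeat those steps inductively for any $n$ real constants and $n$ elements of $\mathcal{B}$. In particular, all functions $\sum_{i=1}^n c_iI_{B_i}$ lie in $\mathcal{U}$.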

Step 4 Let $f$ be a measurable function. Decompose $f$ as $f^+ - f^-$, where $f^+ := \max(f, 0)$ and $f^- := \max(-f, 0)$. Each term is a measurable function and is the everywhere limit of simple functions (theorem A.2.12). So one has two sequences of simple functions, which can be merged into a single sequence $\{f_n\}$ of simple functions converging everywhere to $f$. We now argue as in step 1 to obtain $f \in \mathcal{U}$. By our previous discussion (steps 1 to 3) each $f_n$ belongs to $\mathcal{U}$, so to some $\mathcal{F}_{v_n}$. Each segment $W_{v_n}$ with top $v_n$ is at most countable, and a countable union of at most countable sets remains at most countable. Since $W$ is uncountable, there is some element $w \in (W,\le)$ with $v_n < w$ for all $n$. Then all $f_n$ are contained in $\mathcal{F}_w$. Since $W_w \cup \{w\}$ is again at most countable, there is a $w' \in (W,\le)$ with $w < w'$; then $f \in \mathcal{F}_{w'}$ as the everywhere limit of the $f_n$.

The following main theorem provides useful characterizations of the admissibility condition of a class of (real–valued measurable) functions.

Theorem 4.1.3 (Aumann). Let $I := [0, 1]$ with its usual Borel σ–algebra. Given a separable measurable space $(\mathcal{X},\mathcal{B})$ and a class $\mathcal{F}$ of measurable real–valued functions on $\mathcal{X}$, the following are equivalent:

i) $\mathcal{F} \subset \mathcal{F}_w$ for some $w \in (W,\le)$;

ii) there is a jointly measurable function $G : I \times \mathcal{X} \to \mathbb{R}$ such that for each $f \in \mathcal{F}$, $f = G(t, \cdot)$ for some $t \in I$;

iii) there is a separable admissible structure for $\mathcal{F}$;

iv) $\mathcal{F}$ is admissible;

v) $2^{\mathcal{F}}$ is an admissible structure for $\mathcal{F}$;

vi) $\mathcal{F}$ is image admissible via some $(Y,\mathcal{S}, T)$.

Proof. The implication from (iii) to (iv) is trivial, as is the one from (iv) to (v). The implication from (v) to (iv) also follows immediately. From (iv) to (iii): every real–valued measurable function $H$ is measurable w.r.t. some countably generated sub–σ–algebra (e.g. the one generated by the sets $\{H > q\}$ for rational $q$). In particular, the evaluation map is measurable for a countably generated sub–σ–algebra. Let $\mathcal{T}$ be the admissible structure for $\mathcal{F}$; we have

$$\mathrm{eval} : (\mathcal{F} \times \mathcal{X}, \mathcal{C}) \to (\mathbb{R},\mathcal{B}(\mathbb{R})) : (f, x) \mapsto \mathrm{eval}((f, x)) := f(x),$$

where $\mathcal{C} := \sigma(\{\mathrm{eval} > q\} : q \in \mathbb{Q}) \subset \mathcal{T} \otimes \mathcal{B}$. So $\mathcal{C}$ is countably generated. Moreover $A_q := \{\mathrm{eval} > q\} \in \mathcal{T} \otimes \mathcal{B}$. By lemma A.2.5, each $A_q$ lies in the σ–algebra generated by (at most) countably many measurable rectangles, i.e. by $T^{(q)}_i \times B^{(q)}_i$, $i \ge 1$. But then $\mathcal{C}$ is also generated by at most countably many measurable rectangles, namely $T^{(q)}_i \times B^{(q)}_i$, $q \in \mathbb{Q}$ and $i \ge 1$. Let

$$\mathcal{D} := \sigma(\{T^{(q)}_i : q \in \mathbb{Q}, i \ge 1\}) =: \sigma(\{R_j : j \ge 0\}),$$

where we renumbered the sets $T^{(q)}_i$.

It suffices now to show that $\{f\} \in \mathcal{D}$ for all $f \in \mathcal{F}$, since then $\mathcal{D}$ would be separable, i.e. countably generated and containing the singletons. Let $f \in \mathcal{F}$; we claim that

$$\{f\} = \Big(\bigcap_{j \in N_1} R_j\Big) \cap \Big(\bigcap_{k \in N_2} (R_k)^c\Big),$$

where $j \in N_1$ iff $f \in R_j$ and $k \in N_2$ iff $f \notin R_k$. That $f$ is contained in that intersection is trivial. Let $g \neq f$; then $f(x) \neq g(x)$ for some $x \in \mathcal{X}$. The function $\mathrm{eval}(\cdot, x) : \mathcal{F} \to \mathbb{R} : h \mapsto h(x)$ is $\mathcal{D}$ measurable, as seen in the proof of the Tonelli–Fubini theorem A.2.17. Let $|f(x) - g(x)| = \varepsilon > 0$; then $g \notin \{h \in \mathcal{F} : |f(x) - h(x)| < \varepsilon/2\}$, where the latter set lies in $\mathcal{D}$. Suppose that $g$ is also contained in $(\bigcap_{j \in N_1} R_j) \cap (\bigcap_{k \in N_2} (R_k)^c)$; we derive a contradiction. As in the proof of lemma A.2.7, since $f$ and $g$ lie in $R_j$ together or in $R_j^c$ together, we have $f \in C$ iff $g \in C$ for all $C \in \mathcal{D}$. This contradicts the fact that the set $\{h \in \mathcal{F} : |f(x) - h(x)| < \varepsilon/2\}$ lies in $\mathcal{D}$ and contains $f$ but not $g$. As hoped, $\mathcal{D}$ separates the elements of $\mathcal{F}$, hence $\mathcal{D}$ is separable.

Thus (iii) ⇐⇒ (iv) ⇐⇒ (v).

The equivalence between (ii) and (iii). First assume (ii). By AC, choose for each $f \in \mathcal{F}$ a unique $t \in I$ such that $f = G(t, \cdot)$, and let $J$ be the set of all those $t$'s. The restriction of the map $G$ to $J \times \mathcal{X}$ remains jointly measurable for $J \sqcap \mathcal{B}(I)$, the relative Borel σ–algebra on $J$, and $\mathcal{B}$, since it is the composition of $(i, \mathrm{id}_{\mathcal{X}}) : J \times \mathcal{X} \to I \times \mathcal{X}$ and $G$, where $i : J \to I$ is the natural injection. The function $(i, \mathrm{id}_{\mathcal{X}})$ is measurable, since both components are measurable.

For the converse implication we use theorem 4.1.1 to find a measurable isomorphism $\Phi$ from a subset $Y$ of $I$, with its Borel σ–algebra, onto $\mathcal{F}$, with $\Phi(t, \cdot) = f(\cdot)$. Then by admissibility

$$G := \mathrm{eval} \circ (\Phi, \mathrm{id}_{\mathcal{X}}) : Y \times \mathcal{X} \to \mathbb{R}$$

is measurable, as it is the composition of the measurable functions: the evaluation map $\mathcal{F} \times \mathcal{X} \to \mathbb{R}$ and the map $(\Phi, \mathrm{id}_{\mathcal{X}}) : Y \times \mathcal{X} \to \mathcal{F} \times \mathcal{X}$ (consider the measurable rectangles which generate $\mathcal{T} \otimes \mathcal{B}$). Using theorem 4.2.5 in [Dud1] on page 127, which states that any measurable real–valued function can be extended to the whole domain by another measurable function, we obtain (ii).

Thus (ii) ⇐⇒ (iii).

(iv) implies (vi), let T be an admissible structure for F , in particular (F , T )is a measurable space. Let (Y,S) := (F , T ), and T : Y → F the identity. ThenF is trivially image admissible via that (Y,S, T ).

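Now the converse: if (vi) holds, there is an onto function $T : (Y,\mathcal{S}) \to \mathcal{F}$ such that the map $(y, x) \mapsto T(y)(x)$ is (jointly) measurable. By AC, we choose a subset $Z$ of $Y$ such that $T$ restricted to $Z$ is 1–1 and onto. By $\mathcal{S}_Z$ and $T_Z$ we denote the restrictions of $\mathcal{S}$ and $T$ to $Z$. By taking the composition of the measurable maps $(i, \mathrm{id}_{\mathcal{X}}) : (Z \times \mathcal{X},\mathcal{S}_Z \otimes \mathcal{B}) \to (Y \times \mathcal{X},\mathcal{S} \otimes \mathcal{B})$, where $i : Z \to Y$ denotes the natural injection, and $Y \times \mathcal{X} \to \mathbb{R}$, we see that $\mathcal{F}$ remains image admissible via $(Z,\mathcal{S}_Z, T_Z)$. Because $T$ restricted to $Z$ is a bijection, we can define a σ–algebra on $\mathcal{F}$ by $\mathcal{T} := \{T[A] : A \in \mathcal{S}_Z\}$, where $T[A] := \{T(a) : a \in A\}$.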

Thus (iv) ⇐⇒ (vi)

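Finally, it only remains to prove the equivalence between (i) and (ii). Assume (ii) holds. Then $(I \times \mathcal{X},\mathcal{B}(I) \otimes \mathcal{B})$ is a separable space: if $\{A_i\}_{i \ge 1}$ are Borel sets of $I$ that generate $\mathcal{B}(I)$, if $\{B_i\}_{i \ge 1}$ satisfies $\sigma(B_i : i \in \mathbb{N}) = \mathcal{B}$, and if both families of generators are algebras, then $\{A_k \times B_l\}_{k,l \ge 1}$ generates the product σ–algebra, as seen in theorem A.2.11. Thus the product σ–algebra

$$\mathcal{B}(I) \otimes \mathcal{B} := \sigma(\mathcal{A} \times \mathcal{B}), \quad \text{where } \mathcal{A} \times \mathcal{B} := \{A \times B : A \in \mathcal{A}, B \in \mathcal{B}\},$$

is countably generated. Let $G$ be the given jointly measurable map

$$G : (I \times \mathcal{X},\mathcal{B}(I) \otimes \mathcal{B}) \to \mathbb{R}$$

such that for each $f \in \mathcal{F}$, $f = G(t, \cdot)$ for some $t \in I$. Then, as a real measurable function, $G$ belongs to $\mathcal{U}$ by theorem 4.1.2, and thus there exists a $v \in (W,\le)$ with $G \in \mathcal{F}_v$. We claim that the sections $G(t, \cdot)$, among which are the functions from $\mathcal{F}$, are contained in $\mathcal{F}_v$ on $\mathcal{X}$. To prove this assertion we proceed by transfinite induction. Let $v = 0$, i.e. $v$ is the smallest element of $(W,\le)$; then any $H \in \mathcal{F}_0$ is of the form

$$H(t, x) := \sum_{j=1}^n c_j I_{A_{i_j} \times B_{l_j}}(t, x), \quad \text{with } c_j \in \mathbb{Q},$$

and its sections

$$H(t, \cdot) = \sum_{j=1}^n c_j I_{A_{i_j}}(t) I_{B_{l_j}}(\cdot),$$

for each fixed $t \in [0, 1]$, are trivially in $\mathcal{F}_0$ for the algebra $\{B_j\}_{j \ge 1}$. Suppose the assertion (i.e. the sections $F(t, \cdot)$ of any function $F(\cdot, \cdot) \in \mathcal{F}_u$ are in $\mathcal{F}_u$ for the algebra $\{B_j\}_{j \ge 1}$) holds for all $u < w$; we have to show that if $H \in \mathcal{F}_w$, then all its sections $H(t, \cdot)$ lie in $\mathcal{F}_w$ too. Since $H \in \mathcal{F}_w$, by definition of $\mathcal{F}_w$ there is a sequence $\{H_n\}_{n \ge 1}$ with $H_n \in \mathcal{F}_{v_n}$, $v_n < w$, such that $H_n(z, y) \to H(z, y)$ as $n \to \infty$ for all $(z, y) \in I \times \mathcal{X}$. Fix $t \in I$. By our induction hypothesis $H_n(t, \cdot) \in \mathcal{F}_{v_n}$ for all $n$, and trivially $H_n(t, x) \to H(t, x)$ for all $x \in \mathcal{X}$. Hence $H(t, \cdot) \in \mathcal{F}_w$ for every $t \in I$.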

Next assume (i) holds; in proving (ii) we will need the concept of a universal class α function. Recall that one possible choice for $(W,\le)$ was Ω, the first uncountable ordinal. Ordinals and their elements, which are also ordinals, are traditionally denoted by Greek letters, such as α, β, γ and so on. This explains the name of such functions. Since Ω and our $(W,\le)$ are order isomorphic, we could also call such a function a universal class $w$ function.

First we state the definition: a function $G : I \times \mathcal{X} \to \mathbb{R}$ is called a universal class $w$ function iff every function $f \in \mathcal{F}_w$ on $\mathcal{X}$ is of the form $G(t, \cdot)$ for some $t \in I$. To finish the proof, we use Lebesgue's theorem A.2.13, which asserts that such universal class α or $w$ functions exist.

Thus (i) ⇐⇒ (ii).

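Theorem 4.1.4. Let $(\mathcal{X},\mathcal{B})$ be a separable measurable space, $0 < p < +\infty$ and $\mathcal{F} \subset L^p(\mathcal{X},\mathcal{B}, Q)$, where $\mathcal{F}$ is admissible. Then if $\mathcal{F}$ is also image admissible via $(Y,\mathcal{S}, T)$, $U \subset \mathcal{F}$ and $U$ is relatively $d_{p,Q}$–open in $\mathcal{F}$, we have $T^{-1}(U) \in \mathcal{S}$.

Proof. $(\mathcal{X},\mathcal{B})$ is separable, thus in particular countably generated. Elements of $L^p(\mathcal{X},\mathcal{B}, Q)$ are approximated by simple functions, which in turn are approximated by the (countable) class of functions $\mathcal{F}_0$ as defined before Aumann's theorem (4.1.3); thus $L^p(\mathcal{X},\mathcal{B}, Q)$ is separable for $d_{p,Q}$. By lemma A.1.3 a metric space is separable iff it is second–countable, so for $U$ open in $L^p(\mathcal{X},\mathcal{B}, Q)$, $U$ can be written as a countable union of open sets, even open balls:

$$U = \bigcup_{n \ge 1} \{f : d_{p,Q}(f, g_n) < r_n\}, \quad \text{with } g_n \in U \text{ and } 0 < r_n < +\infty.$$

The pseudometric $d_{p,Q}$ depends on the value of $p$:

$$d_{p,Q}(f, g) = \begin{cases} \left(\int |f - g|^p \, dQ\right)^{1/p}, & \text{if } 1 \le p < +\infty; \\ \int |f - g|^p \, dQ, & \text{if } 0 < p < 1. \end{cases}$$

It therefore suffices to show that $\{y \in Y : d_{p,Q}(T(y), g) < r\} \in \mathcal{S}$ for any $r > 0$ and $g \in L^p(\mathcal{X},\mathcal{B}, Q)$.

For $p > 0$ the functions $(\cdot)^{1/p}$ are continuous real–valued functions, hence measurable. Thus we need only prove the measurability of the maps

$$y \mapsto \int |T(y) - g|^p \, dQ$$

for any fixed $g$. Let $g$ be fixed. Because $(x, y) \mapsto T(y)(x)$ is jointly measurable, the function

$$\mathcal{X} \times Y \to \mathbb{R} : (x, y) \mapsto |T(y)(x) - g(x)|^p$$

is jointly measurable. So we can reduce the proof to the case $p = 1$, $g \equiv 0$ and $T(y)(x) \ge 0$ for all $x \in \mathcal{X}$, $y \in Y$. By Tonelli–Fubini (theorem A.2.17) the function

$$Y \to \mathbb{R} : y \mapsto \int T(y)(x) \, dQ(x)$$

is measurable.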
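The two branches of the pseudometric $d_{p,Q}$ used in the proof above can be made concrete for a discrete law. The following is only a minimal sketch in Python, under the assumption that $Q = \sum_k w_k \delta_{x_k}$ has finite support; the function name `d_p` and its arguments are hypothetical and not part of the thesis.

```python
def d_p(f, g, points, weights, p):
    """d_{p,Q}(f, g) for the discrete law Q = sum_k weights[k] * delta_{points[k]}.

    Assumption for this sketch: Q has finite support, so the integral is a
    finite weighted sum.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    integral = sum(w * abs(f(x) - g(x)) ** p for x, w in zip(points, weights))
    # For p >= 1 the p-th root is taken; for 0 < p < 1 the integral itself
    # is used, as in the case distinction of the proof.
    return integral ** (1.0 / p) if p >= 1 else integral


# Q = (delta_0 + delta_1) / 2, f(x) = x, g = 0.
points, weights = [0.0, 1.0], [0.5, 0.5]
identity = lambda x: x
zero = lambda x: 0.0
print(d_p(identity, zero, points, weights, 2))    # square root of 1/2
print(d_p(identity, zero, points, weights, 0.5))  # no root is taken here
```

For $0 < p < 1$ no root is taken because $|a + b|^p \le |a|^p + |b|^p$, so the integral already satisfies the triangle inequality.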

4.2 Suslin image admissibility.

In the previous section we discussed the concept of admissibility for a class $\mathcal{F}$. This was useful in the sense that it provides conditions under which the evaluation function $(x, f) \mapsto f(x)$ is jointly measurable. What we would like is to deduce measurability of $\|\cdot\|_{\mathcal{F}}$, the supremum over $\mathcal{F}$ of e.g. the empirical measure or process. It turns out that rather easy counterexamples for measurability of $\|\cdot\|_{\mathcal{F}}$ exist, even when $\mathcal{F}$ is admissible. Therefore we need a stronger notion. This notion, called Suslin image admissibility, will assure us that $\|P_{2n}\|_{\mathcal{F}}$ or $\|\nu_n\|_{\mathcal{F}}$ satisfy some measurability condition. Here we state such a counterexample.

Example 4.2.1. Let $(\mathcal{X},\mathcal{B})$ be $(I,\mathcal{B}(I))$ and let $P$ be Lebesgue measure on $I$. Furthermore let $A$ be a non–Lebesgue measurable set (such a set exists under the assumption that the Axiom of Choice (AC) is true). Let $\mathcal{C} := \{\{x\} : x \in A\}$.

• We claim that $\mathcal{C}$ is an admissible class. Indeed part (ii) of Aumann's theorem (4.1.3) is satisfied for the choice

$$G : I \times \mathcal{X} \to \mathbb{R} : (t, x) \mapsto G(t, x) := \begin{cases} 1, & \text{if } t = x; \\ 0, & \text{otherwise,} \end{cases}$$

because $G = I_D$, where $D$ denotes the diagonal of $I \times I$, which is closed in a Hausdorff space, hence Borel.

• We now claim that $\|P_1\|_{\mathcal{C}}$ is nonmeasurable, where

$$\|P_1\|_{\mathcal{C}} : (\mathcal{X}^\infty,\mathcal{B}^\infty, P^\infty) \to (I,\mathcal{B}(I)) : \{x_n\}_{n \ge 1} \mapsto \sup_{a \in \mathcal{C}} |\delta_{X_1(\{x_n\}_{n \ge 1})}(a)| = \sup_{a \in A} \delta_{x_1}(a).$$

One can also see $\|P_1\|_{\mathcal{C}}$ as a function from $\mathcal{X}$ into $I$. Then $\{\|P_1\|_{\mathcal{C}} = 1\} = A$, since $\sup_{a \in A} \delta_a(x) = 1$ iff $x = a$ for some $a \in A$. Hence $\|P_1\|_{\mathcal{C}}$ is not measurable. In fact $\|P_n\|_{\mathcal{C}}$ will never be measurable for any $n$, since we can repeat the previous argument, namely $\{\|P_n\|_{\mathcal{C}} = 1\} = \prod_{i=1}^n A_i$, where $A_i = A$ for all $i$, a nonmeasurable set.

It is now clear that admissibility alone is not strong enough to imply measurability of the supremum of the empirical measure or process.

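Definition 4.2.1. If $(\mathcal{X},\mathcal{A})$ is a measurable space and $\mathcal{F}$ a set of functions on $\mathcal{X}$, then a function

$$\Psi : \mathcal{F} \times \mathcal{X} \to \mathbb{R} : (f, x) \mapsto \Psi(f, x)$$

will be called image admissible Suslin via $(Y,\mathcal{S}, T)$ iff $(Y,\mathcal{S})$ is a Suslin measurable space, $T$ is a function from $Y$ onto $\mathcal{F}$ and

$$\Psi \circ (T, \mathrm{id}_{\mathcal{X}}) : (Y \times \mathcal{X},\mathcal{S} \otimes \mathcal{A}) \to (\mathbb{R},\mathcal{B}(\mathbb{R})) : (y, x) \mapsto \Psi(T(y), x)$$

is measurable.

The following main theorem will have as an important corollary that the supremum over a Suslin image admissible class $\mathcal{F}$ is (universally) measurable.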

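Theorem 4.2.1 (Selection Theorem of Sainte–Beuve). Let $(\mathcal{X},\mathcal{B})$ be any measurable space and $\Psi : \mathcal{F} \times \mathcal{X} \to \mathbb{R}$ be image admissible Suslin via $(Y,\mathcal{S}, T)$. Then for any Borel set $B \subset \mathbb{R}$,

$$\Pi_\Psi(B) := \{x \in \mathcal{X} : \Psi(f, x) \in B \text{ for some } f \in \mathcal{F}\} = \{x \in \mathcal{X} : (f, x) \in \Psi^{-1}(B) \text{ for some } f \in \mathcal{F}\} = \mathrm{proj}_{\mathcal{X}}(\Psi^{-1}(B))$$

is a u.m. set in $\mathcal{X}$, and there is a u.m. function $H$ from $\Pi_\Psi(B)$ into $Y$ such that $\Psi(T(H(x)), x) \in B$ for all $x \in \Pi_\Psi(B)$.

Proof. Since $\Psi$ is image admissible Suslin via some $(Y,\mathcal{S}, T)$, there is a Polish space $(Z,\mathcal{Z})$ and a Borel map $g$ from $Z$ onto $Y$. So $\Psi$ is also image admissible Suslin via $(Z, \sigma(\mathcal{Z}), T \circ g)$: the map $(g, \mathrm{id}_{\mathcal{X}}) : Z \times \mathcal{X} \to Y \times \mathcal{X}$ is measurable, so its composition with $\Psi \circ (T, \mathrm{id}_{\mathcal{X}})$ remains measurable.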

Firstly, for any measurable set $V$ in a product σ–algebra, there are countably many $A_n \subset Y$ and $E_n \subset \mathcal{X}$ such that $V$ is contained in the σ–algebra generated by the countably many rectangles $A_n \times E_n$, see lemma A.2.5, since the product σ–algebra is generated by all measurable rectangles. Here in particular one will consider sets of the form

$$V := \{(y, x) : \Psi(T(y), x) \in B\} \subset Y \times \mathcal{X}.$$

Secondly, $\Psi$ is a real–valued map, and recall that the Borel σ–algebra of $\mathbb{R}$ is countably generated (the topology is countably generated, hence the σ–algebra too). Exactly as in the proof of Aumann's theorem (4.1.3), in the implication from (iv) to (iii), one can show that $\Psi$ is measurable for an at most countably generated (sub–)σ–algebra of $\mathcal{S} \otimes \mathcal{B}$, e.g.

$$\sigma(\{\Psi > q\} : q \in \mathbb{Q}).$$

Combining the two facts just mentioned, one has for any $B$ among the (countably many) generators of $\mathcal{B}(\mathbb{R})$ that the set

$$\{\Psi(T(y), x) \in B\}$$

is contained in a countably generated sub–σ–algebra of the product σ–algebra. As already said, $\mathcal{B}(\mathbb{R})$ is countably generated, so one concludes that $(y, x) \mapsto \Psi(T(y), x)$ is measurable for the product σ–algebra of $\mathcal{S}$ (on $Y$) and $\mathcal{C} := \sigma(\{A_n \subset \mathcal{X} : A_n \in \mathcal{B}, n \in \mathbb{N}\})$ (on $\mathcal{X}$) (exactly as in the proof of Aumann's theorem, (iv) implies (iii)).


By repeating the argument in the proof of theorem 4.1.1 we can define a Marczewski function

$$b : (\mathcal{X}, \mathcal{C}) \to ([0, 1],\mathcal{B}(I)) : x \mapsto b(x) := \sum_{j \ge 1} 2 I_{A_j}(x)/3^j$$

for the sequence $\{A_n\}$. In fact, $b$ maps $\mathcal{X}$ in a measurable way into the Cantor set (for a definition of the Cantor set, see the proof of theorem 4.1.1). Note that $b$ will not necessarily be a measurable isomorphism this time, because the sub–σ–algebra generated by $\{A_n\}$ need not be separable.
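To see how a Marczewski function encodes membership in the sets $A_j$ as ternary digits 0 or 2, here is a small Python sketch. It truncates the series to finitely many sets, which is an assumption made only for illustration; the name `marczewski` and the two example sets are hypothetical.

```python
from fractions import Fraction


def marczewski(x, sets):
    """Truncated Marczewski function: sum over j of 2 * I_{A_j}(x) / 3**j.

    `sets` is a finite list of membership predicates standing in for
    A_1, A_2, ...; the thesis uses a countable sequence.
    """
    return sum(Fraction(2 * int(A(x)), 3 ** j) for j, A in enumerate(sets, start=1))


# Two example sets on [0, 1]: A_1 = [0, 1/2), A_2 = [1/4, 3/4).
example_sets = [lambda x: x < 0.5, lambda x: 0.25 <= x < 0.75]

for x in (0.1, 0.2, 0.3, 0.6, 0.9):
    print(x, marczewski(x, example_sets))
```

Points 0.1 and 0.2 lie in exactly the same sets and therefore receive the same value, which illustrates why $b$ need not be one-to-one when the generated σ–algebra does not separate points.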

Our next step is to rewrite $\Psi$ in a different way: $\Psi(T(\cdot), \cdot) = F(\cdot, b(\cdot))$ for some $F : Y \times C \to \mathbb{R}$ which is $\mathcal{S} \otimes \mathcal{B}(C)/\mathcal{B}(\mathbb{R})$ measurable, by lemma 4.2.2. The set

$$\{(y, c) \in Y \times C : F(y, c) \in B\} = \{F \in B\},$$

with $B$ a Borel set of $\mathbb{R}$, is contained in $\mathcal{S} \otimes \mathcal{B}(C)$ by measurability of $F$. Our aim is to prove that $\{F \in B\}$ is an analytic set (or Suslin space, which is the same). Note that $C$, as a Borel set of the Polish space $[0, 1]$, is Suslin, by the remark after definition/theorem C.2.1. By Suslin image admissibility $Y$ is Suslin too. If the product $Y \times C$ is Suslin too, then we are done, since $\{F \in B\}$, as a Borel set of a Suslin space, is Suslin, see the remark after definition/theorem C.2.1. But Polish spaces are stable under products (lemma A.1.14), so $Y \times C$ is the measurable image of a Polish space, hence is a Suslin set.

To continue, note that by definition/theorem C.2.1, projections of analytic sets remain analytic, hence

$$\Pi_F(B) := \{c \in C : F(y, c) \in B \text{ for some } y \in Y\} = \pi_C(F^{-1}(B))$$

is Suslin.

Suslin subsets of Polish spaces need not be Borel sets, but they are universally measurable, i.e. they are contained in the measure–theoretic completion (definition A.2.2) of the Borel σ–algebra (of the Polish space) for any measure $\mu$ defined on the Borel σ–algebra (theorem C.3.1). By another selection theorem, C.4.1, there is a u.m. function $h$ from $\Pi_F(B)$ into $Y$ such that $F(h(c), c) \in B$ for all $c \in \Pi_F(B)$. Using lemma A.2.4, which states that inverse images of u.m. sets through measurable functions remain u.m. sets, we conclude that

$$\Pi_\Psi(B) := b^{-1}(\{c \in C : F(y, c) \in B \text{ for some } y \in Y\}) = b^{-1}(\Pi_F(B))$$

is a u.m. set of $(\mathcal{X},\mathcal{B})$. Let

$$H := h \circ b : (\mathcal{X},\mathcal{B}) \to (Y,\mathcal{S});$$

then $H$ is u.m. Indeed, for any $E \in \mathcal{S}$, consider $H^{-1}(E) = b^{-1}(h^{-1}(E))$. Since $h$ is u.m. by the selection theorem for analytic sets C.4.1, $h^{-1}(E)$ is by definition a u.m. set in $\Pi_F(B)$ $(\subset (C,\mathcal{B}(C)))$ with its relative Borel σ–algebra. Then, again by lemma A.2.4, inverse images under $b$ (which is measurable) preserve u.m. sets, so $b^{-1}(h^{-1}(E))$ is also a u.m. set in $(\mathcal{X},\mathcal{B})$. As claimed, $H$ is u.m.

We conclude with the following:

$$\Psi(T(H(x)), x) = \Psi(T(h(b(x))), x) = F(h(b(x)), b(x)) = F(h(c), c) \in B, \quad \text{where } c := b(x),$$

for all $x \in \Pi_\Psi(B) = b^{-1}(\Pi_F(B))$, because $\Psi(T(\cdot), \cdot) = F(\cdot, b(\cdot))$ and $(h(c), c) \in \{F \in B\}$ for all $c \in \Pi_F(B)$.

Lemma 4.2.2. Let $(\mathcal{X}, \mathcal{C})$ and $(Y,\mathcal{S})$ be measurable spaces and

$$\varphi : (Y \times \mathcal{X},\mathcal{S} \otimes \mathcal{C}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$$

a measurable function. Then there is an $\mathcal{S} \otimes \mathcal{B}(C)/\mathcal{B}(\mathbb{R})$ measurable function $F : Y \times C \to \mathbb{R} : (y, c) \mapsto F(y, c)$, where $C$ is the Cantor set and $\mathcal{B}(C)$ its Borel σ–algebra, such that $\varphi(y, x) = (F \circ (\mathrm{id}_Y, b))(y, x)$ for all $(y, x) \in Y \times \mathcal{X}$. As in theorem 4.2.1, $b : \mathcal{X} \to C$ is a Marczewski function.

Proof. We work in several steps and use the monotone class theorem (A.2.10).

(i) In the first step we consider indicator functions of measurable rectangles, so let $I_{A \times B}$ with $A \in \mathcal{S}$, $B \in \mathcal{B}$. Then if we define

$$F := I_{A \times b^{-1}(B)} \circ (T, \mathrm{id}_C),$$

the result holds. For a finite weighted sum of indicators $\sum_{i=1}^n c_i I_{A_i \times B_i}$ we can take $F$ as $\sum_{i=1}^n c_i I_{A_i \times b^{-1}(B_i)} \circ (T, \mathrm{id}_C)$. Hence the result holds for finite sums of weighted indicator functions of measurable rectangles.

(ii) In the second step we prove that indicator functions of finite unions of mea-surable rectangles satisfy the condition of the lemma.

This follows easily from collecting some trivial facts. First of all, σ–algebras are in particular semirings, and a (finite) product of semirings is again a semiring (see [Dud1], proposition 3.2.2 on page 95, for a proof). Hence the class of measurable rectangles is contained in a semiring.

Next, it is also well known that the class $S^+$ consisting of all finite disjoint unions of elements of a semiring is a ring, and in fact is even the ring generated by the semiring, $R(\mathcal{S} \otimes \mathcal{C})$, i.e. the smallest ring containing the elements of the semiring. (See [Dud1], proposition 3.2.3 on page 96, for a proof.) Since a ring is closed under finite unions, finite unions of measurable rectangles are contained in $R(\mathcal{S} \otimes \mathcal{C})$ and thus also in $S^+$. Elements of $S^+$ are finite unions of mutually disjoint sets, hence a finite union of measurable rectangles has a representation as a finite disjoint union of (other) measurable rectangles.

If we combine this with the result of the first step, we have that indicator functions of finite unions of measurable rectangles, i.e. $I_{\bigcup_{i=1}^n A_i \times B_i}$, satisfy the condition of the lemma.

(iii) In the third step we claim that the class $\mathcal{D}$ of all finite unions of measurable rectangles is an algebra. This follows directly from noting that $\mathcal{D} = S^+$ by the previous discussion in (ii). Now $S^+$ is a ring and contains the whole space $Y \times \mathcal{X}$, hence is an algebra.

(iv) In the fourth step we show that the lemma is true for the indicator function of any measurable set. In the third step we had that

$$\mathcal{D} := \Big\{\bigcup_{i=1}^n (A_i \times B_i) : A_i \in \mathcal{S}, B_i \in \mathcal{C}, n \ge 1\Big\}$$

is an algebra. Consider the class

$$\mathcal{E} := \{C \in \mathcal{S} \otimes \mathcal{C} : I_C \text{ satisfies the statement of the lemma}\}.$$

In the second step we had that indicator functions of elements of $\mathcal{D}$ can be written through an $F : Y \times C \to \mathbb{R}$; in other words, $\mathcal{D} \subset \mathcal{E}$. We claim that $\mathcal{E}$ is a monotone class (definition A.2.4).

Let $E_n \uparrow E$ with $E_n \in \mathcal{E}$; then $I_{E_n} \uparrow I_E$ everywhere. Likewise, let $F_n \downarrow F$ with $F_n \in \mathcal{E}$; then $I_{F_n} \downarrow I_F$ everywhere. In both cases we have functions $\psi_n : Y \times \mathcal{X} \to \mathbb{R}$ for which there exist $F_n : Y \times C \to \mathbb{R}$ with $\psi_n(y, x) = F_n(y, b(x))$, and the sequence $\psi_n$ converges pointwise to some ($\mathcal{S} \otimes \mathcal{C}$ measurable) $\psi$. Define

$$F : (Y \times C,\mathcal{S} \otimes \mathcal{B}(C)) \to (\mathbb{R},\mathcal{B}(\mathbb{R})) : (y, c) \mapsto F(y, c) := \begin{cases} \lim_{n \to \infty} F_n(y, c), & \text{whenever the limit exists;} \\ 0, & \text{otherwise.} \end{cases}$$

As seen in the proof of theorem 4.2.5 on page 127 of [Dud1], the set where the $F_n$ converge is measurable, hence $F$ is measurable. For any $(y, x)$:

$$\psi(y, x) = \lim_{n \to \infty} \psi_n(y, x) = \lim_{n \to \infty} F_n(y, b(x)) = F(y, b(x)),$$

since the limit clearly exists. So in particular the limit sets belong to $\mathcal{E}$. So $\mathcal{E}$ is a monotone class that contains an algebra, namely $\mathcal{D}$, which generates the product σ–algebra $\mathcal{S} \otimes \mathcal{C}$ $(= \sigma(\mathcal{S} \times \mathcal{C}))$. By the monotone class theorem (A.2.10), $\mathcal{E} = \mathcal{S} \otimes \mathcal{C}$.

(v) Let $\psi$ be any $\mathcal{S} \otimes \mathcal{C}$ measurable function. By theorem A.2.12, there exists a sequence of (measurable) simple functions $\psi_n$ which converges pointwise everywhere to $\psi$. Those $\psi_n$ are weighted sums of indicator functions; indicator functions satisfy the lemma by the fourth step, and, as in the first step, weighted sums of indicators which satisfy the lemma also satisfy the lemma. Then, again as in the fourth step, $\psi$, as the everywhere limit of the $\psi_n$, satisfies the lemma.

In the last step we proved that any measurable function satisfies the condition of the lemma, hence we are done.

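An easy corollary assures that taking the supremum of an image admissible Suslin map over a class of functions gives a universally measurable function.

Corollary 4.2.2. Let $(\mathcal{X},\mathcal{A})$ be a measurable space, $\mathcal{F}$ a set of real–valued measurable functions on $\mathcal{X}$ and $(f, x) \mapsto \Psi(f, x)$ be real–valued and image admissible Suslin via some $(Y,\mathcal{S}, T)$. Then $x \mapsto \sup\{\Psi(f, x) : f \in \mathcal{F}\}$ and $x \mapsto \sup\{|\Psi(f, x)| : f \in \mathcal{F}\}$ are u.m. functions.

Proof. The class of all u.m. sets is a σ–algebra, by theorem A.2.3. Then, because $\sigma(f^{-1}(\mathcal{C})) = f^{-1}(\sigma(\mathcal{C}))$ and the Borel σ–algebra on $\mathbb{R}$ is generated by sets of the form $]t,+\infty[$, it is enough to prove that sets like

$$\Big\{x \in \mathcal{X} : \sup_{f \in \mathcal{F}} \Psi(f, x) \in \, ]t,+\infty[\Big\}$$

are universally measurable. This follows from Sainte–Beuve's Selection Theorem (4.2.1):

$$\Big\{x \in \mathcal{X} : \sup_{f \in \mathcal{F}} \Psi(f, x) \in \, ]t,+\infty[\Big\} = \{x \in \mathcal{X} : \Psi(f, x) > t \text{ for some } f \in \mathcal{F}\} = \Pi_\Psi(]t,+\infty[).$$

The latter set is universally measurable. The same argument applies to $|\Psi|$, which is again image admissible Suslin as the composition of $\Psi$ with a continuous function.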
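The identity between the superlevel set of the supremum and the projection $\Pi_\Psi(]t,+\infty[)$ can be checked directly on a finite class. The sketch below is only an illustration under that finiteness assumption (for which measurability is of course not an issue); the names `sup_exceeds`, `in_projection` and the example class are hypothetical.

```python
def sup_exceeds(psi, cls, x, t):
    """True iff sup over f in cls of psi(f, x) lies in ]t, +infinity[."""
    return max(psi(f, x) for f in cls) > t


def in_projection(psi, cls, x, t):
    """True iff psi(f, x) > t for some f in cls, i.e. x is in the projection."""
    return any(psi(f, x) > t for f in cls)


# Example: evaluation map on the class f_c(x) = -(x - c)**2, c in {0, 1, 2}.
evaluate = lambda f, x: f(x)
example_class = [lambda x, c=c: -(x - c) ** 2 for c in (0, 1, 2)]

for x in (0, 1, 3):
    for t in (-5, -1, 0):
        assert sup_exceeds(evaluate, example_class, x, t) == in_projection(
            evaluate, example_class, x, t
        )
```

The strict inequality matters: with $\ge t$ in place of $> t$ the two descriptions can differ for an infinite class whose supremum is not attained.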

Finally we state a theorem showing that the image admissible Suslin property of finitely many classes $\mathcal{F}_i$ is preserved when composing with a measurable function.

Theorem 4.2.3. Let $\Psi_i$, $i = 1, \dots, d$, be image admissible Suslin real–valued functions on $\mathcal{X} \times \mathcal{F}_i$, for one measurable space $(\mathcal{X},\mathcal{A})$, via some $(Y_i,\mathcal{S}_i, T_i)$. Let $g : \mathbb{R}^d \to \mathbb{R}$ be a Borel measurable function; then

$$\mathcal{X} \times \mathcal{F}_1 \times \dots \times \mathcal{F}_d \to \mathbb{R} : (x, f_1, \dots, f_d) \mapsto g(\Psi_1(x, f_1), \dots, \Psi_d(x, f_d))$$

is image admissible Suslin.

Proof. It suffices to note that, by lemma A.1.15, $(Y^d, \otimes_{i=1}^d \mathcal{S}_i)$ is again a Suslin measurable space, that

$$T : (Y^d, \otimes_{i=1}^d \mathcal{S}_i) \to \prod_{i=1}^d \mathcal{F}_i : (y_1, \dots, y_d) \mapsto (T_1(y_1), \dots, T_d(y_d))$$

is onto, and that

$$\Psi \circ T : (x, y_1, \dots, y_d) \mapsto g(\Psi_1(x, T_1(y_1)), \dots, \Psi_d(x, T_d(y_d)))$$

is $\mathcal{A} \otimes \mathcal{S}/\mathcal{B}(\mathbb{R})$ measurable as a composition of two measurable functions: $g$ is given to be measurable, and the second can be written as the composition

$$\mathcal{X} \times Y^d \to (\mathcal{X} \times Y)^d \to \mathbb{R}^d : (x, y_1, \dots, y_d) \mapsto ((x, y_1), \dots, (x, y_d)) \mapsto (\Psi_1(x, T_1(y_1)), \dots, \Psi_d(x, T_d(y_d))).$$

Since $\mathbb{R}^d$ is separable and $(\mathcal{X} \times Y)^d$ is also a separable measurable space, it is enough to look at the component functions. The $\Psi_i(\cdot, T_i(\cdot))$ are measurable functions for $1 \le i \le d$, and $(x, y_1, \dots, y_d) \mapsto (x, y_i)$, $1 \le i \le d$, is measurable too as a projection.

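Chapter 5

Uniform limit theorems for the empirical process and measure.

5.1 Entropy and Covering Numbers.

For $\mathcal{F}$ a class of measurable functions on $(\mathcal{X},\mathcal{A})$ let

$$F_{\mathcal{F}}(x) := \sup\{|f(x)| : f \in \mathcal{F}\} = \|\delta_x\|_{\mathcal{F}},$$

where $\delta_x : \mathcal{F} \to \mathbb{R}$ is a linear functional. A measurable function $F \ge F_{\mathcal{F}}$ is called an envelope function for $\mathcal{F}$. If $F_{\mathcal{F}}$ is $\mathcal{A}$–measurable, then it is said to be the envelope function. Given a law $P$ on $(\mathcal{X},\mathcal{A})$, we call $F^*_{\mathcal{F}}$, the essential infimum of $F_{\mathcal{F}}$, the envelope function of $\mathcal{F}$ for $P$.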

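Definition 5.1.1. Let $\Gamma$ be the set of all laws on $(\mathcal{X},\mathcal{A})$ of the form $n^{-1}\sum_{j=1}^n \delta_{x_j}$ for some $x_j \in \mathcal{X}$, $j = 1, \dots, n$, $n \in \mathbb{N}_0$. For $\varepsilon > 0$, $0 < p < +\infty$ and $\gamma \in \Gamma$ let

$$D^{(p)}_F(\varepsilon,\mathcal{F}, \gamma) := \sup\Big\{m : \text{for some } f_1, \dots, f_m \in \mathcal{F} \text{ and all } i \neq j, \ \int |f_i - f_j|^p \, d\gamma > \varepsilon^p \int F^p \, d\gamma\Big\}.$$

$\log D^{(p)}_F(\varepsilon,\mathcal{F}, \gamma)$ is called the Koltchinskii–Pollard entropy.

One also sets $\sup_{\gamma \in \Gamma} D^{(p)}_F(\varepsilon,\mathcal{F}, \gamma) = D^{(p)}_F(\varepsilon,\mathcal{F})$.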

In other words, $m = D^{(p)}_F(\varepsilon,\mathcal{F}, \gamma)$ is the greatest non-negative integer for which there are $m$ different functions of $\mathcal{F}$ such that no function $f_i$ is contained in the closed ball around any $f_j$, $j \neq i$, with radius $\varepsilon(\int F^p \, d\gamma)^{1/p}$ for $p \in [1,+\infty[$ and $\varepsilon^p \int F^p \, d\gamma$ for $p \in \, ]0, 1[$. Also, having an envelope function that is finite allows us to look at the unit ball of $L^p(\mathcal{X},\mathcal{A}, \gamma)$. This entropy number is also easily seen to be a covering number. Indeed, if some part of $\mathcal{F}$ were not covered by those balls, then one could easily add one more function such that the condition remains satisfied. In fact $D^{(p)}_F(\varepsilon,\mathcal{F}, \gamma)$ is the greatest number of functions of $\mathcal{F}$ one can pick such that the closed balls of radius $\varepsilon(\int F^p \, d\gamma)^{1/p}$ around them cover $\mathcal{F}$ and no center lies in the ball around another center.
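The separation condition behind the Koltchinskii–Pollard entropy can be explored numerically for a finite class and an empirical law $\gamma$. The following Python sketch builds a separated family greedily; this is an assumption of the sketch and not the thesis's definition, since a greedy family is only maximal (it cannot be extended), so its size is a lower bound for $D^{(p)}_F(\varepsilon,\mathcal{F},\gamma)$. All names are hypothetical.

```python
def empirical_integral(h, sample):
    """Integral of h with respect to the empirical law on `sample`."""
    return sum(h(x) for x in sample) / len(sample)


def greedy_separated_family(functions, envelope, sample, eps, p):
    """Indices of a greedily chosen family with pairwise
    integral |f_i - f_j|^p dgamma > eps^p * integral envelope^p dgamma."""
    threshold = eps ** p * empirical_integral(lambda x: envelope(x) ** p, sample)
    chosen = []
    for idx, f in enumerate(functions):
        separated = all(
            empirical_integral(lambda x: abs(f(x) - functions[j](x)) ** p, sample)
            > threshold
            for j in chosen
        )
        if separated:
            chosen.append(idx)
    return chosen


# Indicators of [0, k/10], k = 0..10, with envelope 1 and a 10-point sample.
functions = [lambda x, k=k: 1.0 if x <= k / 10 else 0.0 for k in range(11)]
sample = [(2 * i + 1) / 20 for i in range(10)]
chosen = greedy_separated_family(functions, lambda x: 1.0, sample, eps=0.25, p=1)
print(chosen)
```

By maximality, every function of the class lies within the threshold distance of some chosen one, which mirrors the covering remark made above.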

5.2 A Symmetrization Lemma.

Our objective is to bound the probability of a not necessarily measurable supremum. In proving limit theorems it is often useful to make use of symmetrization. If we have an empirical process $\nu_n$, then we can construct two (or actually as many as we want) independent and identically distributed copies $\nu'_n$ and $\nu''_n$ of the original process $\nu_n$. The Koltchinskii–Pollard method provides a way to bound the probability of the original empirical process by the probability of the difference of the two copies. This difference is easier to handle when conditioned in a suitable way: by conditioning one reduces the difference of these processes to a discrete random variable taking at most $2^n$ values. For discrete random variables one has very strong techniques at one's disposal, namely exponential inequalities, which will allow us to consider very large (index) classes of sets.

For $n = 1, 2, \dots$ we are given $X_1, \dots, X_{2n}$ independent and identically distributed random variables; in our case they will be the coordinates of the product probability space $(\mathcal{X}^{2n},\mathcal{A}^{2n}, P^{2n})$. Now let $\{\sigma_i\}_{i=1}^n$ be independent random variables, independent of the $\{X_i\}_{i=1}^{2n}$, with

$$\Pr(\sigma_i = 2i) = 1/2 = \Pr(\sigma_i = 2i - 1).$$

One can e.g. let the $\sigma_i$ be the coordinates of the space $\{1, 2\} \times \dots \times \{2n - 1, 2n\}$. Then it is easy to see that $\{X_{\sigma_i(\omega)}(\omega)\}_{i=1}^n$ are i.i.d. with law $P$. We first define further random variables and then calculate their joint distribution. Define the random variables $\{\tau_i\}_{i=1}^n$ as follows:

$$\tau_i = \begin{cases} 2i, & \text{if } \sigma_i = 2i - 1; \\ 2i - 1, & \text{if } \sigma_i = 2i. \end{cases}$$

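Furthermore, let

$$P'_n := n^{-1}\sum_{j=1}^n \delta_{X_{\sigma_j}}, \qquad P''_n := n^{-1}\sum_{j=1}^n \delta_{X_{\tau_j}},$$

$$\nu'_n := n^{1/2}(P'_n - P), \qquad \nu''_n := n^{1/2}(P''_n - P),$$

and the differences

$$P^0_n := P'_n - P''_n, \qquad \nu^0_n := \nu'_n - \nu''_n = n^{1/2}P^0_n.$$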
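The construction of the two half samples can be simulated directly. The Python sketch below is a hedged illustration: it draws each $\sigma_i$ uniformly from $\{2i-1, 2i\}$, takes $\tau_i$ to be the other index of the pair, and evaluates $\nu^0_n(f) = n^{-1/2}\sum_{i=1}^n (f(X_{\sigma_i}) - f(X_{\tau_i}))$. The function names and the use of Python's `random.Random` as the source of the $\sigma_i$ are assumptions of the sketch.

```python
import random


def symmetrize(xs, rng):
    """Split x_1, ..., x_2n (stored 0-based in `xs`) into the samples
    (X_sigma_1, ..., X_sigma_n) and (X_tau_1, ..., X_tau_n)."""
    n = len(xs) // 2
    first, second = [], []
    for i in range(1, n + 1):
        sigma_i = rng.choice((2 * i - 1, 2 * i))  # uniform on {2i-1, 2i}
        tau_i = 4 * i - 1 - sigma_i               # the other index of the pair
        first.append(xs[sigma_i - 1])
        second.append(xs[tau_i - 1])
    return first, second


def nu0(f, first, second):
    """nu_n^0(f) = n^(-1/2) * sum_i (f(X_sigma_i) - f(X_tau_i)); P cancels."""
    n = len(first)
    return (sum(map(f, first)) - sum(map(f, second))) / n ** 0.5


rng = random.Random(0)
xs = [10, 11, 12, 13, 14, 15]
first, second = symmetrize(xs, rng)
print(first, second, nu0(lambda x: x, first, second))
```

Swapping the roles of the two half samples flips the sign of $\nu^0_n(f)$, which is the symmetry exploited later.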


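Independence. We claim that $\{X_{\sigma_i(\omega)}(\omega)\}_{i=1}^n$ and $\{X_{\tau_j(\omega)}(\omega)\}_{j=1}^n$ are mutually independent with law $P$.

Proof. Mutual independence and law of the $X_\sigma$, $X_\tau$. Let $A^1_i := \{\sigma_i = 2i\}$, $A^0_i := \{\sigma_i = 2i - 1\}$. Assume (w.l.o.g.) that the $X_j$, $\sigma_k$ are defined on the same probability space $(\Omega,\mathcal{A}, Q)$, which is taken to be $(\mathcal{X}^{2n} \times Y^n,\mathcal{B}^{2n} \otimes \mathcal{C}^n, P^{2n} \times Q^n)$, with $X_j$, $\sigma_k$ the projections onto $\mathcal{X}$, respectively $Y$. Pick measurable sets $B_j \subset \mathcal{X}$ for $j \in \{1, \dots, 2n\}$. Let

$$\mathbf{Y} := (Y_1, \dots, Y_{2n}) := (X_{\sigma_1}, \dots, X_{\sigma_n}, X_{\tau_1}, \dots, X_{\tau_n})$$

and for $b \in \{0, 1\}^n$ notice that when $\sigma_i = 2i - b_i$, then $\tau_i = 2i - 1 + b_i$. We calculate the distribution of $\mathbf{Y}$:

$$\begin{aligned}
\Pr\Big(\bigcap_{j=1}^{2n} \{Y_j \in B_j\}\Big)
&= \Pr\Big(\Big(\bigcap_{j=1}^{2n} \{Y_j \in B_j\}\Big) \cap \bigcup_{b \in \{0,1\}^n} \bigcap_{i=1}^n \{\sigma_i = 2i - b_i\}\Big) \\
&= \Pr\Big(\bigcup_{b \in \{0,1\}^n} \Big(\Big(\bigcap_{j=1}^{2n} \{Y_j \in B_j\}\Big) \cap \Big(\bigcap_{i=1}^n \{\sigma_i = 2i - b_i\}\Big)\Big)\Big) \\
&= \sum_{b \in \{0,1\}^n} \Pr\Big(\Big(\bigcap_{j=1}^{2n} \{Y_j \in B_j\}\Big) \cap \Big(\bigcap_{i=1}^n \{\sigma_i = 2i - b_i\}\Big)\Big) \\
&= \sum_{b \in \{0,1\}^n} \Pr\Big(\Big(\bigcap_{j=1}^{n} \{X_{2j - b_j} \in B_j\}\Big) \cap \Big(\bigcap_{j=1}^{n} \{X_{2j - 1 + b_j} \in B_{n+j}\}\Big) \cap \Big(\bigcap_{i=1}^n \{\sigma_i = 2i - b_i\}\Big)\Big) \\
&= \sum_{b \in \{0,1\}^n} 2^{-n} \prod_{j=1}^n \Pr(X_{2j - b_j} \in B_j) \prod_{j=1}^n \Pr(X_{2j - 1 + b_j} \in B_{n+j}) \\
&= \prod_{j=1}^{2n} P(B_j).
\end{aligned}$$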

The fourth equality follows from the fact that for each $b \in \{0, 1\}^n$ the values $\sigma_i, \tau_i$ are all different, and each value of $\{1, \dots, 2n\}$ is taken by one of them. The fifth is implied by the mutual independence of the $\sigma_i$ and $X_j$.
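The combinatorial fact used for the fourth equality, namely that for every $b \in \{0,1\}^n$ the indices $\sigma_i = 2i - b_i$ and $\tau_i = 2i - 1 + b_i$ together form a permutation of $\{1, \dots, 2n\}$, can be checked by brute force for small $n$. The sketch below does this in Python; the helper name `index_vector` is hypothetical.

```python
from itertools import product


def index_vector(b):
    """Indices (sigma_1, ..., sigma_n, tau_1, ..., tau_n) for a given b,
    with sigma_i = 2i - b_i and tau_i = 2i - 1 + b_i."""
    n = len(b)
    sigma = [2 * i - b[i - 1] for i in range(1, n + 1)]
    tau = [2 * i - 1 + b[i - 1] for i in range(1, n + 1)]
    return sigma + tau


n = 3
for b in product((0, 1), repeat=n):
    # Every index 1..2n is hit exactly once, whatever b is.
    assert sorted(index_vector(b)) == list(range(1, 2 * n + 1))
print(index_vector((0, 0)), index_vector((1, 0)))
```

Since each of the $2^n$ choices of $b$ has probability $2^{-n}$ and contributes the same product $\prod_{j=1}^{2n} P(B_j)$, the sum in the last step collapses to that product.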

It follows, by the definition of $(\nu'_n(f))_{f \in \mathcal{F}}$ and $(\nu''_n(f))_{f \in \mathcal{F}}$, that both processes are independent, under suitable measurability conditions, i.e. assumptions on $\mathcal{F}$. Here by independence of stochastic processes we mean that all the finite dimensional distributions are independent, i.e. for any $k \ge 1$, $f_1, \dots, f_k \in \mathcal{F}$:

$$(\nu'_n(f_1), \nu'_n(f_2), \dots, \nu'_n(f_k)) \perp\!\!\!\perp (\nu''_n(f_1), \nu''_n(f_2), \dots, \nu''_n(f_k)),$$

where $\perp\!\!\!\perp$ stands for (statistical) independence.

Identical distribution. By similar arguments one sees that both are identically distributed, assuming again measurability of the processes.

The $\sigma_i$ are defined on another probability space; e.g. one could take the product space $\prod_{i=1}^n \{2i - 1, 2i\}$ with the usual product σ–algebra and product measure $Q := \otimes_{i=1}^n Q_i$, where for $1 \le i \le n$:

$$Q_i(\{2i\}) = Q_i(\{2i - 1\}),$$

i.e. $Q_i = \frac{1}{2}(\delta_{2i} + \delta_{2i-1})$. So the processes $\nu'_n, \nu''_n$ will be defined on $\mathcal{X}^{2n} \times Z$.

Remark. Note also that $P^0_n$ has a symmetric distribution; this is easily seen by conditioning on the $\sigma$'s.

Theorem 5.2.1 (Symmetrization Lemma). Let $\zeta > 0$ and $\mathcal{F} \subset L^2(\mathcal{X},\mathcal{A}, P)$ s.t. $\int |f|^2 \, dP \le \zeta^2$ for all $f \in \mathcal{F}$ (or $\mathcal{F} \subset B_{L^2(P)}(0, \zeta)$). Assume further that $\mathcal{F}$ is image admissible Suslin via some $(Y,\mathcal{S}, T)$. Then for any $\eta > 0$,

$$\Pr\{\|\nu^0_n\|_{\mathcal{F}} > \eta\} \ge (1 - \zeta^2\eta^{-2}) \Pr\{\|\nu_n\|_{\mathcal{F}} > 2\eta\}.$$

Remark. By abuse of notation we will write $\nu_n$ for the empirical process on $(\mathcal{X}^\infty,\mathcal{A}^\infty)$ and for its restriction to $(\mathcal{X}^n,\mathcal{A}^n)$, and the same for $\nu'_n$, $\nu''_n$ and any pre–images of (measurable) sets through the empirical process or one of its copies.

Proof. We don't need to work with outer probabilities, since the events will turn out to be (universally) measurable, implying that they equal (at least $P^\infty$–a.s.) a measurable event.

Because we are dealing with suprema over large classes, measurability isn't necessarily satisfied. It is here that the Suslin condition appears, to ensure measurability. So we first prove that the events considered are indeed elements of the (completed, see definition A.2.2) σ–algebra $\mathcal{A}^\infty$. Recall that

$$\nu''_n := n^{1/2}(P''_n - P) := n^{1/2}\Big(n^{-1}\sum_{i=1}^n (\delta_{X_{\tau_i}} - P)\Big),$$

or more precisely

$$\nu''_n : (\mathcal{X}^\infty \times Z) \times \mathcal{F} \to \mathbb{R} : ((a, z), f) \mapsto \frac{1}{\sqrt{n}}\sum_{i=1}^n \big((f \circ X_{\tau_i(z)})(a) - P(f)\big).$$

It will turn out to be image admissible Suslin.
Step 1: Measurability of the events and image Suslin admissibility of theempirical process and its copies.

i) Because F is image admissible Suslin, there exists a Suslin space (Y,S)and a map T : Y → F onto, such that

eval(T (·), ·) :(X × Y,A⊗ S)→ R :

(x, y) 7→ eval(T (y), x) := T (y)(x)

is A ⊗ S/B(R) measurable. Secondly, as immediate consequence of theabove remark together with the proof of the theorem of Tonelli–Fubini(A.2.17), on gets the measurable map

(X × Y,A⊗ S)→ R : (x, y) 7→ IX (x)

∫T (y)(u) dP (u).

One then trivially obtains measurability of

Φ : (X × Y,A⊗ S)→ R : ((x, z), y) 7→ T (y)(x)−∫T (y)(u) dP (u).

ii) Recall that the Xi were the coordinate functions on (X∞,A∞) and are thusmeasurable A∞/A. Since the spaces (X ,A) and (Y,S) are separable mea-surable spaces, the product spaces (X×Y ) and (X n×Y ) are also separable,as a consequence:

(Xi, idY ) : (X∞ × Y,A∞ ⊗ S)→ (X × Y,A⊗ S), i ≥ 1

and(Xn, idY ) : (X∞ × Y,A∞ ⊗ S)→ (X n × Y,An ⊗ S),

where Xn := (X1, · · · , Xn), n ≥ 1; are measurable maps.

HenceΨi := Φ (Xi, idY )

is also measurable.


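Remark that since the space $(\mathcal X,\mathcal A)$ is a separable measurable space,

\[
(\mathcal X^\infty,\mathcal A^\infty)\quad\text{and}\quad(\mathcal X^\infty\times Z,\mathcal A^\infty\otimes\mathcal Z)
\]

are also separable measurable spaces (proof of lemma A.2.14).

It is now clear that the (normalised) sum of $\Psi_i$, $1\le i\le n$, which equals $\nu_n$, is measurable. An immediate consequence is measurability of

\[
H := \{(x,f) : |\nu_n(f)| > 2\eta,\ f\in\mathcal F\}\subset\mathcal X^\infty\times\mathcal F.
\]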

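iii) Recall that $X_{\tau_i} : (\mathcal X^\infty\times Z,\mathcal A^\infty\otimes\mathcal Z)\to(\mathcal X,\mathcal A)$, and

\[
X_{\tau_i}(\{x_n\}_{n\ge1},z)=
\begin{cases}
x_{2i} & \text{if } \tau_i(z)=2i;\\
x_{2i-1} & \text{if } \tau_i(z)=2i-1.
\end{cases}
\]

Then $X_{\tau_i}$ is measurable, since for any $A\in\mathcal A$:

\[
\{X_{\tau_i}\in A\}=\{(X_{2i},\tau_i)\in A\times\{2i\}\}\cup\{(X_{2i-1},\tau_i)\in A\times\{2i-1\}\}.
\]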

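iv) Let $X_\tau := (X_{\tau_1},\dots,X_{\tau_n})$. Then $X_\tau : \mathcal X^\infty\times Z\to\mathcal X^n$ is measurable, as said earlier in step 1, point three. We will see that we can go in a measurable way from $\nu_n$ to $\nu''_n$. Indeed our goal is to construct a function

\[
\gamma_n : (\mathcal X^\infty\times Z,\mathcal A^\infty\otimes\mathcal Z)\to(\mathcal X^\infty,\mathcal A^\infty)
\]

such that $\mathbf X_n\circ\gamma_n = X_\tau$. Recall that $Z := \prod_{j=1}^{\infty}\{2j-1,2j\}$, and let

\[
\gamma_n(x^\infty,z) = (x_{z_1},x_{z_2},\dots,x_{z_n},x_{2n+1},x_{2n+2},\dots).
\]

Recall

\[
\mathcal A^\infty = \sigma\big(A_1\times\dots\times A_n\times\mathcal X^\infty : A_i\in\mathcal A,\ 1\le i\le n,\ n\ge1\big),
\]

then it is enough to consider sets $B := \{\gamma_n\in A_1\times\dots\times A_n\times\mathcal X^\infty\}$ to prove measurability. Trivially:

\[
B=\bigcup_{z\in\prod_{j=1}^{n}\{2j-1,2j\}}\Big(\Big(\prod_{k=1}^{2n}C_k\Big)\times\mathcal X^\infty\times\{z\}\times\prod_{i\ge n+1}\{2i-1,2i\}\Big)
\]

where

\[
C_{2k} :=
\begin{cases}
A_k, & \text{if } z_k=2k,\\
\mathcal X, & \text{otherwise,}
\end{cases}
\qquad\text{and}\qquad
C_{2k-1} :=
\begin{cases}
A_k, & \text{if } z_k=2k-1,\\
\mathcal X, & \text{otherwise,}
\end{cases}
\]

for $k\in\{1,\dots,n\}$. And the equality $\mathbf X_n\circ\gamma_n = X_\tau$ is clearly satisfied too.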
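The action of $\gamma_n$ on a point can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the sequence $x^\infty$ is truncated to a finite tuple, and the names `x_tau`, `gamma_n` and `first_n` are hypothetical helpers, not notation from the text.

```python
def x_tau(x, z):
    """Return (x_{z_1}, ..., x_{z_n}); the indices stored in z are 1-based."""
    return tuple(x[j - 1] for j in z)

def gamma_n(x, z):
    """Finite-horizon version of gamma_n: selected coordinates, then x_{2n+1}, x_{2n+2}, ..."""
    n = len(z)
    # z_j has to pick one index out of the pair {2j-1, 2j} (j is 0-based below)
    assert all(z[j] in (2 * j + 1, 2 * j + 2) for j in range(n))
    return x_tau(x, z) + tuple(x[2 * n:])

def first_n(y, n):
    """The map (X_1, ..., X_n): keep the first n coordinates."""
    return tuple(y[:n])

x = ("a1", "a2", "a3", "a4", "a5", "a6", "a7")   # n = 3, plus one tail coordinate
z = (2, 3, 6)                                     # z_1 in {1,2}, z_2 in {3,4}, z_3 in {5,6}
y = gamma_n(x, z)
print(y)                                          # ('a2', 'a3', 'a6', 'a7')
print(first_n(y, len(z)) == x_tau(x, z))          # True
```

The last line is the identity $\mathbf X_n\circ\gamma_n = X_\tau$ evaluated at one point.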

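Step 2: Further let

\[
H'' := \{((x,z),f) : |\nu''_n(f)| > 2\eta,\ f\in\mathcal F\}\subset(\mathcal X^\infty\times Z)\times\mathcal F.
\]

We now show that $H''$ is a measurable subset of $(\mathcal X^\infty\times Z)\times\mathcal F$. By step 1, it follows that $\nu''_n = \nu_n\circ(\gamma_n,\mathrm{id}_{\mathcal F})$ and moreover

\[
H'' = \{|\nu''_n|\in\,]2\eta,+\infty[\,\} = \{|\nu_n\circ(\gamma_n,\mathrm{id}_{\mathcal F})|\in\,]2\eta,+\infty[\,\} = \{(\gamma_n,\mathrm{id}_{\mathcal F})\in H\}
\]

and is thus a measurable subset of $(\mathcal X^\infty\times Z)\times\mathcal F$. One also has $\|\nu''_n\|_{\mathcal F} = \|\nu_n\|_{\mathcal F}\circ\gamma_n$. In step 1, $\nu_n$ was shown to be Suslin image admissible, so $\|\nu_n\|_{\mathcal F}$ is a u.m. function by corollary 4.2.2, and further from lemma A.2.4 universal measurability of $\|\nu''_n\|_{\mathcal F}$ is deduced.

By theorem 4.2.1, there exists a universally measurable selector

\[
h_1 : \mathrm{proj}_{\mathcal X^\infty}(H)\to\mathcal F
\]

such that $(x^\infty,h_1(x^\infty))\in H$. Define a universally measurable function (lemma A.2.4)

\[
h : \mathrm{proj}_{\mathcal X^\infty\times Z}(H'')\to\mathcal F
\]

by letting $h = h_1\circ\gamma_n$. By abuse of notation $\nu_n$ stands also for

\[
\mathcal X^n\times\mathcal F\to\mathbb R : \frac{1}{\sqrt n}\sum_{i=1}^{n}(f(X_i)-P(f))
\]

where $X_i : \mathcal X^n\to\mathcal X$; this because the empirical process depends only on the first $n$ coordinates. Then it is easy to see that one can suppose:

\[
X_\tau : \mathcal X^{2n}\times Z\to\mathcal X^n,\qquad \nu''_n : \mathcal X^{2n}\times Z\times\mathcal F\to\mathbb R
\]

and that as above there is a measurable $\gamma_n : \mathcal X^{2n}\times Z\to\mathcal X^n$ such that $\gamma_n = X_\tau$. So $H$ will also stand for a measurable set of $\mathcal X^n\times\mathcal F$, and $H'' := \gamma_n^{-1}(H)$. Then it is clear that $h = h_1\circ\gamma_n = h_1\circ X_\tau$, where both functions are u.m. Let $J = \mathrm{proj}(H)$, then $J$ is a u.m. set of $\mathcal X^n$ and $h$ is a u.m. function, as the composition $h_1\circ\gamma_n = h_1\circ X_\tau$. The set $\gamma_n^{-1}(J) = \{X_\tau\in J\}$ is a u.m. set of $\mathcal X^{2n}\times Z$ (lemma A.2.4). And $h$ acts as a fixed $f\in\mathcal F$ on the former set.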

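Step 3: Let $\mathcal T_n$ be the smallest σ–algebra for which

\[
X_\tau(\cdot) = (X_{\tau_1},\dots,X_{\tau_n}) : \mathcal X^{2n}\times Z\to\mathcal X^n
\]

is measurable. In other words: $\mathcal T_n = \sigma\{X_{\tau_i} : i=1,\dots,n\}$. From the definition of conditional probability one has the following identity:

\[
(P^{(2n)}\otimes Q)\big(\{|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\}\cap\{X_\tau\in J\}\,\|\,\mathcal T_n\big)
=(P^{(2n)}\otimes Q)\big(|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\,\|\,\mathcal T_n\big)\,I_{\{X_\tau\in J\}}.
\]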

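We will bound the conditional probability on the set $\{X_\tau(\cdot)\in J\}$, so that one can consider $h_1(X_\tau)(\cdot)$ to act as a given, fixed $f\in\mathcal F$. Recall that $\nu_n(f,X_\sigma)=\nu'_n(f,\cdot)$ so that

\[
\nu'_n(\cdot,h_1(X_\tau(\cdot))) = \nu_n(X_\sigma,X_\tau).
\]

Because $(X_\sigma,X_\tau)$ are independent:

\[
\begin{aligned}
E\big[I_{\{|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\}}\,\|\,\mathcal T_n\big]
&= E\big[(G\circ\nu\circ(\mathrm{id}_{\mathcal X^{2n}\times Z},h_1))(X_\sigma,X_\tau)\,\|\,\mathcal T_n\big]\\
&= E\big[(G\circ\nu\circ(\mathrm{id}_{\mathcal X^{2n}\times Z},h_1))(X_\sigma,x)\,\|\,X_\tau=x\big]\\
&= E\big[I_{\{|\nu'_n(\cdot,h_1(x))|\le\eta\}}\,\|\,X_\tau=x\big]. \qquad (5.2.1)
\end{aligned}
\]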

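Further, by monotonicity of the conditional probability one sees that:

\[
\begin{aligned}
(P^{(2n)}\otimes Q)\big(|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\,\|\,\mathcal T_n\big)
&=\Pr\Big(\big|\nu''_n(\cdot,h_1(X_\tau(\cdot)))-\big(\nu''_n(\cdot,h_1(X_\tau(\cdot)))-\nu'_n(\cdot,h_1(X_\tau(\cdot)))\big)\big|\le\eta\,\Big\|\,\mathcal T_n\Big)\\
&\le\Pr\Big(\big|\nu''_n(\cdot,h_1(X_\tau(\cdot)))\big|-\big|\nu''_n(\cdot,h_1(X_\tau(\cdot)))-\nu'_n(\cdot,h_1(X_\tau(\cdot)))\big|\le\eta\,\Big\|\,\mathcal T_n\Big).
\end{aligned}
\]

Now on $\{X_\tau\in J\}$, by definition $\|\nu''_n\|_{\mathcal F}\ge 2\eta$. Recalling that $\nu^0_n := \nu'_n-\nu''_n$, the last conditional probability is bounded on this set by

\[
(P^{(2n)}\otimes Q)\Big(2\eta-\eta\le\big|\nu''_n(\cdot,h_1(X_\tau(\cdot)))-\nu'_n(\cdot,h_1(X_\tau(\cdot)))\big|\,\Big\|\,\mathcal T_n\Big)
\le(P^{(2n)}\otimes Q)\big(\eta\le\|\nu^0_n\|_{\mathcal F}\,\|\,\mathcal T_n\big).
\]

So we have, on the set $\{X_\tau\in J\}$, the following inequality:

\[
(P^{(2n)}\otimes Q)\big(|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\,\|\,\mathcal T_n\big)
\le(P^{(2n)}\otimes Q)\big(\eta\le\|\nu^0_n\|_{\mathcal F}\,\|\,\mathcal T_n\big).
\]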

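The conditional probability on $X_\tau$ allows one to handle $h_1(X_\tau)(\cdot)$ as a fixed $f\ (\in\mathcal F)$, see equation 5.2.1. Also, since by construction $\nu'_n$ is independent of $\mathcal T_n=\sigma(X_\tau)$, one has by Chebyshev’s inequality,

\[
(P^{(2n)}\otimes Q)(|\nu'_n(f)|\ge\eta)\le\frac{\mathrm{Var}(\nu'_n(f))}{\eta^2}\le\Big(\frac{\zeta}{\eta}\Big)^2,
\]

where the bound of the variance is deduced from:

\[
\begin{aligned}
\mathrm{Var}(\nu'_n(f)) &= \mathrm{Var}(\nu_n(f)) = n\,\mathrm{Var}(P_n(f)-P(f))\\
&= n^{-1}\,\mathrm{Var}\Big(\sum_{i=1}^{n}(f(X_i)-P(f))\Big) = \mathrm{Var}(f(X_1)-P(f)) = \mathrm{Var}(f(X_1))\\
&= \int f^2\,dP-\Big(\int f\,dP\Big)^2\le\int f^2\,dP\le\zeta^2
\end{aligned}
\]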

(recall that $\nu_n$ and $\nu'_n$ were identically distributed and that the $X_i$ were all i.i.d.). Taking the complement of the event $\{|\nu'_n(f)|\ge\eta\}$:

\[
(P^{(2n)}\otimes Q)(|\nu'_n(f)|\le\eta)\ge(P^{(2n)}\otimes Q)(|\nu'_n(f)|<\eta)\ge 1-\Big(\frac{\zeta}{\eta}\Big)^2.
\]
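The variance identity and the Chebyshev bound can be checked exactly on a toy model. The sketch below is only an illustration: the three-point space, the function `f`, and the values of `n`, `eta` and `zeta` are hypothetical choices, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: X uniform on {0, 1, 2}, f bounded by zeta = 1.
support = (0, 1, 2)
f = {0: Fraction(-1), 1: Fraction(1, 2), 2: Fraction(1)}
zeta = Fraction(1)
n = 3
eta = Fraction(2)

Pf = sum(f[x] for x in support) / len(support)
var_f = sum((f[x] - Pf) ** 2 for x in support) / len(support)

# nu_n(f)^2 = (sum_i (f(X_i) - P f))^2 / n, enumerated over all 3^n equally likely samples.
samples = list(product(support, repeat=n))
nu_sq = [sum(f[x] - Pf for x in s) ** 2 / n for s in samples]
var_nu = sum(nu_sq) / len(samples)        # E nu_n(f) = 0, so this equals Var(nu_n(f))

# P(|nu_n(f)| >= eta), computed exactly through nu_n(f)^2 >= eta^2.
tail = Fraction(sum(1 for v in nu_sq if v >= eta ** 2), len(samples))

print(var_nu == var_f)                                   # Var(nu_n(f)) = Var(f(X_1))
print(tail <= var_nu / eta ** 2 <= (zeta / eta) ** 2)    # Chebyshev, then the zeta bound
```

Both printed comparisons are instances of the two displayed inequalities; the enumeration replaces the integrals because the sample space is finite.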

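Hence, still on the set $\{X_\tau\in J\}$:

\[
(P^{(2n)}\otimes Q)\big(\eta<\|\nu^0_n\|_{\mathcal F}\,\|\,\mathcal T_n\big)
\ge(P^{(2n)}\otimes Q)\big(|\nu'_n(\cdot,h_1(X_\tau(\cdot)))|\le\eta\,\|\,\mathcal T_n\big)
\ge 1-\Big(\frac{\zeta}{\eta}\Big)^2.
\]

More precisely:

\[
(P^{(2n)}\otimes Q)\big(\eta<\|\nu^0_n\|_{\mathcal F}\,\|\,\mathcal T_n\big)\,I_{\{X_\tau\in J\}}\ge\Big(1-\Big(\frac{\zeta}{\eta}\Big)^2\Big)I_{\{X_\tau\in J\}}.
\]

Finally, integrating both sides out, and using $\{X_\tau\in J\}=\{\|\nu''_n\|_{\mathcal F}\ge 2\eta\}$ together with $\nu''_n\stackrel{d}{=}\nu_n$, gives:

\[
(P^{(2n)}\otimes Q)\big(\eta<\|\nu^0_n\|_{\mathcal F}\big)\ge\Big(1-\Big(\frac{\zeta}{\eta}\Big)^2\Big)P^{(n)}\big(\|\nu_n\|_{\mathcal F}>2\eta\big).
\]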

5.3 A martingale property and a uniform law of large numbers for the empirical process.

A technique often used to prove a limit exists is based upon proving the sequence has some martingale property (or is related to a martingale), because we have nice limit theorems for martingales and also maximal inequalities. We first state formally the definitions of martingale, submartingale and reversed (sub)martingale and then prove two theorems about martingale behaviour of the empirical measures.

The empirical measures enjoy many interesting properties and one of them is that its supremum over a class of integrable functions is a reversed submartingale.

Remark that here n and k will denote negative integers, as it is usual for reversed submartingales.

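Theorem 5.3.1. Let $(\mathcal X,\mathcal A,P)$ be a probability space, $\mathcal F\subset\mathcal L^1(\mathcal X,\mathcal A,P)$ and let $\{P_n\}_{n\ge1}$ be the empirical measures for $P$. Let $\mathcal S_n$ be the smallest σ–algebra for which

\[
P_k(\cdot,f) := \frac1k\sum_{i=1}^{k}f(X_i(\cdot)),
\]

for all $k\ge n$ and all $f\in\mathcal L^1(\mathcal X,\mathcal A,P)$, are measurable. Then

i) For any $f\in\mathcal F$, $\{P_n(f),\mathcal S_n\}_{n\ge1}$ is a reversed martingale, i.e.

\[
E[P_{n-1}(f)\,\|\,\mathcal S_n] = P_n(f)\ \text{a.s. if } n\ge2.
\]

ii) (F. Strobl) If $\mathcal F$ has an envelope function $F\in\mathcal L^1(\mathcal X,\mathcal A,P)$ and if for each $n$, $\|P_n-P\|_{\mathcal F}$ is measurable for the completion of $P^n$ (the product measure on $(\mathcal X^n,\mathcal A^n)$), then $\{\|P_n-P\|_{\mathcal F},\mathcal S_n\}_{n\ge1}$ is a reversed submartingale, i.e.

\[
\|P_n-P\|_{\mathcal F}\le E[\|P_k-P\|_{\mathcal F}\,\|\,\mathcal S_n]\ \text{a.s. for } k\le n
\]

for $n\ge2$.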


Proof. i) $\mathcal S_m\subset\mathcal S_{m-1}$, $m=2,3,\dots$, is an immediate consequence of the definition of $\mathcal S_n$, $n\ge1$. Hence

\[
\mathcal S_1\supset\mathcal S_2\supset\mathcal S_3\supset\cdots.
\]

For $n$ fixed, each set of $\mathcal S_n$ will be invariant under permutation of the first $n$ coordinates $X_i$. Indeed let $A\in\mathcal S_n$, then for some $k\le n$, $f\in\mathcal F$ and $B\in\mathcal B(\mathbb R)$: $A = P_k(f)^{-1}(B)$.

This follows from the particular choice of the $X_i$ (the projections) and the commutativity of the addition in $\mathbb R$.

\[
\mathcal S_n = \sigma\big((P_k(f))^{-1}(B) : B\in\mathcal B(\mathbb R),\ f\in\mathcal F,\ k\le n\big)
\]

All sets of the form $(P_k(f))^{-1}(B)$ are invariant under permutation of the first $n$ coordinates. Moreover the class $\mathcal C$ of all sets invariant under permutations of the first $n$ coordinates is a σ–algebra, lemma A.2.8. Thus $\mathcal S_n\subset\mathcal C$ and indeed sets in $\mathcal S_n$ are invariant under permutations of the first $n$ coordinates.

Let $1\le i<j\le n$:

\[
E[f(X_i)\,\|\,\mathcal S_n] = E[f(X_j)\,\|\,\mathcal S_n]
\]

is valid because both are $\mathcal S_n$ measurable (by definition of the conditional expectation) and for any $A\in\mathcal S_n$, $\Pi\in\mathrm{Sym}(n)$ (the symmetric group of order $n$):

\[
(x_1,\dots,x_n,x_{n+1},\dots)\in A\iff(x_{\Pi(1)},\dots,x_{\Pi(n)},x_{n+1},\dots)\in A.
\]

Hence $X_i(A)=X_j(A)$: if $x\in X_i(A)$, then for some $\{y_m\}_{m\ge1}\in A$: $y_i=x$, but then $(y_{\Pi(1)},\dots,y_{\Pi(n)},y_{n+1},\dots)\in A$ where $\Pi\in\mathrm{Sym}(n)$ with $\Pi(j)=i$, so

\[
X_j((y_{\Pi(1)},\dots,y_{\Pi(n)},y_{n+1},\dots)) = y_{\Pi(j)} = y_i = x\iff x\in X_j(A).
\]

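It follows:

\[
\begin{aligned}
E[f(X_i)I_A] &= \int_A f(X_i(\{x_m\}_{m\ge1}))\,dP^\infty(\{x_m\}_{m\ge1})\\
&= \int_{X_i(A)} f(x_i)\,d(P^\infty\circ X_i^{-1})(x_i) = \int_{X_i(A)} f(x)\,dP(x)\\
&= \int_{X_j(A)} f(x)\,dP(x) = \int_{X_j(A)} f(x_j)\,d(P^\infty\circ X_j^{-1})(x_j)\\
&= \int_A f(X_j(\{x_m\}_{m\ge1}))\,dP^\infty(\{x_m\}_{m\ge1})\\
&= E[f(X_j)I_A].
\end{aligned}
\]

We used the image measure formula and the fact that our random variables are identically distributed. Recall: $X_n : (\mathcal X^\infty,\mathcal A^\infty,P^\infty)\to(\mathcal X,\mathcal A,P)$ is the $n$–th projection. Thus summing over $1\le i\le k$ for $k=1,\dots,n$ and dividing by $k$ gives:

\[
E[P_k(f)\,\|\,\mathcal S_n] = E[f(X_1)\,\|\,\mathcal S_n].
\]

In particular for the choices $k=n-1$ and $k=n$:

\[
E[P_{n-1}(f)\,\|\,\mathcal S_n] = E[f(X_1)\,\|\,\mathcal S_n] = E[P_n(f)\,\|\,\mathcal S_n] = P_n(f).
\]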

ii) Now we prove the second part. We first prove the integrability of $\|P_n-P\|_{\mathcal F}$; taking the expectation (w.r.t. the completion of $P^n$) is allowed since we assume it completion measurable. Because one has an integrable envelope function:

\[
\|P\|_{\mathcal F} = \sup_{f\in\mathcal F}|P(f)|\le\sup_{f\in\mathcal F}P(|f|)\le P(F)<+\infty,
\]

\[
\|P_n\|_{\mathcal F} = \sup\{|P_n(f)| : f\in\mathcal F\}\le P_n(F)<+\infty,
\]

so

\[
E[\|P_n-P\|_{\mathcal F}]\le E[\|P_n\|_{\mathcal F}+\|P\|_{\mathcal F}]\le 2P(F)<+\infty.
\]

So it remains to prove

\[
\|P_n-P\|_{\mathcal F}\le E[\|P_k-P\|_{\mathcal F}\,\|\,\mathcal S_n]\ \text{a.s. for } k\ge n
\]

in order to have a reversed submartingale. Next, it will be shown that $\|P_n-P\|_{\mathcal F}$ is measurable for the completion of $\mathcal S_n$. Because the open half lines $\{]q,+\infty[\ : q\in\mathbb Q^+\}$ form a generating class for the Borel σ–algebra of the real line, it suffices to prove that the sets

\[
A_q := A_{q,n} := \{\|P_n-P\|_{\mathcal F}\in\,]q,+\infty[\,\}
\]

are in the completion of $\mathcal S_n$ (which is a σ–algebra according to definition A.2.2). First, by our assumption on $\|P_n-P\|_{\mathcal F}$, there exist sets

\[
C_q,D_q\in\mathcal A^n : C_q\subset A_q\subset D_q\ \text{and}\ P^n(D_q\setminus C_q)=0.
\]

Our purpose is now to find sets $U_q,V_q\in\mathcal S_n$ such that the above equation remains true (for these sets). Consider

\[
C^\Pi_q := C^\Pi_{q,n} := \{(x_{\Pi(1)},\dots,x_{\Pi(n)}) : (x_1,\dots,x_n)\in C_q\},
\]

the set obtained by permuting the first $n$ coordinates of the set $C_{q,n}$ according to $\Pi\in\mathrm{Sym}(n)$. Then if we let $U_q := \bigcup_{\Pi\in\mathrm{Sym}(n)}C^\Pi_q$, then $U_q$ is invariant under any permutation of the first $n$ coordinates and, because $A_q$ was invariant, $U_q\subset A_q$. Now in the same way we define $D^\Pi_q$ and we let $V_q := \bigcap_{\Pi\in\mathrm{Sym}(n)}D^\Pi_q$; then $V_q$ is invariant under permutations and $A_q\subset V_q$.

Because the mappings

\[
f_\Pi : (\mathcal X^n,\mathcal A^n)\to(\mathcal X^n,\mathcal A^n) : (x_1,\dots,x_n)\mapsto f_\Pi((x_1,\dots,x_n)) := (x_{\Pi(1)},\dots,x_{\Pi(n)})
\]

are measurable, $U_q$ and $V_q$ are both measurable too. Thus $U_q,V_q$ are in $\mathcal A^n$ and are invariant under permutations; by theorem A.2.9 they lie in $\mathcal S_n$, and $A_q$ lies in the completion of $\mathcal S_n$, so that $\|P_n-P\|_{\mathcal F}$ is measurable for the completion of $\mathcal S_n$.

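For $i=1,\dots,n+1$ let

\[
P_{n,i} := n^{-1}\sum_{j=1,\,j\ne i}^{n+1}\delta_{X_j}.
\]

Then trivially $P_{n,n+1} = P_n$ and for $i\ne n+1$, $P_{n,i}$ has the same properties as $P_n$. In particular, as above, $\|P_{n,i}-P\|_{\mathcal F}$ is measurable for the completion of $\mathcal A^{n+1}$.

For any $n\ge1$ one has:

\[
\begin{aligned}
\|P_{n+1}-P\|_{\mathcal F} &= \Big\|(n+1)^{-1}\Big(\sum_{l=1}^{n+1}\delta_{X_l}\Big)-P\Big\|_{\mathcal F}\\
&= \frac{1}{n+1}\Big\|n^{-1}\Big(\sum_{l=1}^{n+1}n\,\delta_{X_l}\Big)-(n+1)P\Big\|_{\mathcal F}\\
&= \frac{1}{n+1}\Big\|\Big(\sum_{i=1}^{n+1}P_{n,i}\Big)-(n+1)P\Big\|_{\mathcal F}
\end{aligned}
\]

where the last step requires only an easy, rapid calculation (namely rearranging the sum $\sum_{l=1}^{n+1}n\,\delta_{X_l}$). By the triangle inequality:

\[
\|P_{n+1}-P\|_{\mathcal F}\le(n+1)^{-1}\sum_{i=1}^{n+1}\|P_{n,i}-P\|_{\mathcal F}.
\]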
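The leave-one-out decomposition and the resulting triangle inequality can be verified exactly on a small example. This is a sketch under assumptions made only for illustration: the four sample points, the finite class of indicators of $[0,t]$ and the choice of $P$ as the uniform law on $[0,1]$ (so that $P(f_t)=t$) are all hypothetical.

```python
from fractions import Fraction

# Hypothetical toy data: a sample of size n + 1 = 4 and a small class of functions.
sample = [Fraction(1, 10), Fraction(2, 5), Fraction(1, 2), Fraction(9, 10)]
thresholds = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
# f_t = indicator of [0, t]; under the uniform law on [0, 1], P(f_t) = t.
F = [(lambda x, t=t: Fraction(int(x <= t)), t) for t in thresholds]

def emp(points, f):
    """Empirical measure of `points` applied to f."""
    return sum(f(x) for x in points) / len(points)

def sup_dev(points):
    """||P_points - P||_F for the finite class F."""
    return max(abs(emp(points, f) - Pf) for f, Pf in F)

n = len(sample) - 1
loo = [sample[:i] + sample[i + 1:] for i in range(n + 1)]   # supports of the P_{n,i}

# P_{n+1}(f) equals the average of the P_{n,i}(f), exactly.
identity_ok = all(emp(sample, f) == sum(emp(s, f) for s in loo) / (n + 1) for f, _ in F)
lhs = sup_dev(sample)
rhs = sum(sup_dev(s) for s in loo) / (n + 1)
print(identity_ok, lhs <= rhs)
```

The first flag checks the rearrangement of the sum, the second the triangle inequality for the supremum norm over the class.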

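Also, since $\|P_{n+1}-P\|_{\mathcal F}$ is $\mathcal S_{n+1}$ measurable: $\|P_{n+1}-P\|_{\mathcal F} = E\big(\|P_{n+1}-P\|_{\mathcal F}\,\|\,\mathcal S_{n+1}\big)$. And, by the previous calculation and monotonicity and linearity of the conditional expectation:

\[
E\big(\|P_{n+1}-P\|_{\mathcal F}\,\|\,\mathcal S_{n+1}\big)\le\frac{1}{n+1}\sum_{i=1}^{n+1}E\big(\|P_{n,i}-P\|_{\mathcal F}\,\|\,\mathcal S_{n+1}\big).
\]

The submartingale property would follow if $E\big(\|P_{n,i}-P\|_{\mathcal F}\,\|\,\mathcal S_{n+1}\big) = E\big(\|P_n-P\|_{\mathcal F}\,\|\,\mathcal S_{n+1}\big)$ a.s. But this is easy to see.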

Recall:

\[
P_{n,i} := n^{-1}\sum_{j=1,\,j\ne i}^{n+1}\delta_{X_j}\quad\text{and}\quad P_n := n^{-1}\sum_{j=1}^{n}\delta_{X_j}.
\]

And thus

\[
\begin{aligned}
\|P_{n,i}-P\|_{\mathcal F} &= H\circ(X_1,\dots,X_{i-1},X_{i+1},\dots,X_{n+1}),\\
\|P_n-P\|_{\mathcal F} &= H\circ(X_1,\dots,X_{i-1},X_i,X_{i+1},\dots,X_n),
\end{aligned}
\]

for some (universally) measurable function $H$. Now since

\[
(X_1,\dots,X_{i-1},X_{i+1},\dots,X_{n+1})(A) = (X_1,\dots,X_{i-1},X_i,X_{i+1},\dots,X_n)(A)
\]

for all $A\in\mathcal S_{n+1}$ (by invariance under permutation of the first $n+1$ coordinates for all sets in $\mathcal S_{n+1}$), one has:

\[
\int_A\|P_{n,i}-P\|_{\mathcal F}\,dP^\infty = \int_A\|P_n-P\|_{\mathcal F}\,dP^\infty.
\]

Hence the proof is complete.

Before coming to the main theorem of this section we state a definition.

Definition 5.3.1. Let $(\mathcal X,\mathcal A,P)$ be a probability space, $\mathcal F\subset\mathcal L^1(\mathcal X,\mathcal A,P)$ a class of integrable real–valued functions. Then $\mathcal F$ will be called a strong (respectively weak) Glivenko–Cantelli class for $P$ iff $\|P_n-P\|_{\mathcal F}\to0$ as $n\to\infty$ almost uniformly (respectively in outer probability).

Theorem 5.3.2 (Glivenko–Cantelli theorem). Let $(\mathcal X,\mathcal A,P)$ be a probability space, $F\in\mathcal L^1(\mathcal X,\mathcal A,P)$, and $\mathcal F\subset\mathcal L^1(\mathcal X,\mathcal A,P)$ where $F$ is an envelope function of $\mathcal F$. Suppose that $\mathcal F$ is image admissible Suslin via $(Y,\mathcal S,T)$. Also assume that

\[
D^{(1)}_F(\delta,\mathcal F)<\infty\ \text{for all } \delta>0.
\]

Recall that

\[
X_j : (\mathcal X^\infty,\mathcal A^\infty,P^\infty)\to(\mathcal X,\mathcal A,P) : \{x_n\}_{n\ge1}\mapsto x_j
\]

is the standard model. Then $\lim_{n\to\infty}\|P_n-P\|_{\mathcal F}=0$, $P^\infty$–a.s., i.e. $\mathcal F$ is a (strong) Glivenko–Cantelli class for $P$, or the empirical process satisfies a uniform law of large numbers.
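The classical case behind this theorem, indicators of half lines, lends itself to a quick numerical illustration. The sketch below is not part of the argument: it assumes $P$ is the uniform law on $[0,1]$, uses Python's pseudo-random generator with a fixed seed, and the function name `sup_deviation` is hypothetical.

```python
import random

def sup_deviation(n, rng):
    """sup_t |F_n(t) - t| for n i.i.d. uniform(0, 1) draws (class of half-line indicators)."""
    xs = sorted(rng.random() for _ in range(n))
    # The supremum is attained at a jump of F_n, either at or just before an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)          # fixed seed, purely for reproducibility
devs = {n: sup_deviation(n, rng) for n in (100, 10_000)}
for n, d in devs.items():
    print(n, round(d, 4))
```

For growing $n$ the printed deviation shrinks, in line with $\|P_n-P\|_{\mathcal F}\to0$; a simulation of course only illustrates the statement for one sample path.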

Proof. By corollary 4.2.2 the Suslin image admissibility property of the class $\mathcal F$ makes $\|P_n-P\|_{\mathcal F}$ universally measurable, in particular it is completion measurable. Then by theorem 5.3.1 part (ii), $\|P_n-P\|_{\mathcal F}$ (with the right filtration $\mathcal S_n$ as in theorem 5.3.1) is a reversed submartingale. Being nonnegative it converges a.s. and in $L^1$ by Doob’s theorem A.2.24. If we prove that $\lim_{n\to\infty}\|P_n-P\|_{\mathcal F}=0$ in probability, then the a.s. (and $L^1$) limit will also be 0.

$F I_{\{F>n\}}\to0$ pointwise for any $x\in\mathcal X$ as $n\to\infty$, so for $\varepsilon>0$, take $M$ large enough such that $P(F I_{\{F>M\}})<\varepsilon/4$ by Lebesgue’s theorem of dominated convergence A.2.19.

Next we use this to bound

\[
\|(P_n-P)I_{\{F>M\}}\|_{\mathcal F} := \sup\{|(P_n-P)(f I_{\{F>M\}})| : f\in\mathcal F\}
\]

by:

\[
\|(P_n-P)I_{\{F>M\}}\|_{\mathcal F}\le\|P_nI_{\{F>M\}}\|_{\mathcal F}+\|PI_{\{F>M\}}\|_{\mathcal F}\le P_n(FI_{\{F>M\}})+P(FI_{\{F>M\}}).
\]

Then by the usual Strong Law of Large Numbers (A.2.20) we have

\[
P_n(FI_{\{F>M\}})+P(FI_{\{F>M\}})\to2P(FI_{\{F>M\}})<\varepsilon/2\ \text{a.s. as } n\to\infty.
\]

This means we can look at the set $\mathcal F I_{\{F\le M\}} := \{f I_{\{F\le M\}} : f\in\mathcal F\}$, which is also Suslin image admissible, and so pretend that the envelope function is bounded (a.s.). $\mathcal F$ is image admissible Suslin via some $(Y,\mathcal S,T)$, then $\mathcal F I_{\{F\le M\}}$ is also image admissible Suslin as follows

\[
\mathcal X\times Y\to\mathbb R : (x,y)\mapsto T(y)(x)\,I_{\{F\le M\}}(x),
\]

which as a product of two measurable functions is again measurable. Let $\mathcal F_M$ stand for the modified class $\mathcal F I_{\{F\le M\}}$ and $F_M$ for $F I_{\{F\le M\}}$.

In the next step we will use the symmetrization lemma 5.2.1. Since $F\le M$ a.s., let $\zeta := M$ and $\eta := n^{1/2}\varepsilon/4>2M$ for $n$ large enough. The symmetrization lemma 5.2.1 says

\[
(P^{(2n)}\otimes Q)\{n^{1/2}\|P'_n-P''_n\|_{\mathcal F}>\eta\}\ge(1-\zeta^2\eta^{-2})\,P^{(2n)}\{n^{1/2}\|P_n-P\|_{\mathcal F}>2\eta\},
\]

and applied to our particular choice of $\eta,\zeta$ it gives

\[
(P^{(2n)}\otimes Q)\{\|P'_n-P''_n\|_{\mathcal F}>\varepsilon/4\}\ge(1-\zeta^2\eta^{-2})\,P^{(2n)}\{\|P_n-P\|_{\mathcal F}>\varepsilon/2\}\ge\frac34P^{(2n)}\{\|P_n-P\|_{\mathcal F}>\varepsilon/2\}.
\]

Therefore it is enough to show $\|P^0_n\|_{\mathcal F} = \|P'_n-P''_n\|_{\mathcal F}\to0$ in probability.

Note that for the modified class $\mathcal F_M$ one also has $D^{(1)}_F(\delta,\mathcal F_M)<+\infty$ for all $\delta>0$, lemma D.1.1. Next we let $\gamma := P_{2n}$, and we use lemma 5.3.3 in order to choose functions $g_1,\dots,g_m\in\mathcal F_M$ which satisfy the condition in equation 5.1.1 for $\varepsilon := \varepsilon/(9M)$ and such that $g_j := g^{(x)}_{jn}$ depends in a measurable way on $x := (x_1,\dots,x_n)$.

For $g\in\mathcal F_M$ and $m := D^{(1)}_{F_M}(\varepsilon/(9M),\mathcal F_M,P_{2n}(x))$, we have

\[
\min_{1\le j\le m}P_{2n}(|g-g_j|)\le\frac{\varepsilon}{9M}P_{2n}(F_M)
\]

and for each $j$ where the minimum is reached:

\[
|P^0_n(g)-P^0_n(g_j)| = |P'_n(g-g_j)-P''_n(g-g_j)|\le(P'_n+P''_n)(|g-g_j|) \qquad (5.3.1)
\]

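and recalling that for each $z\in Z$

\[
(\sigma_1,\dots,\sigma_n,\tau_1,\dots,\tau_n)(z)
\]

is a permutation of $(1,\dots,2n)$:

\[
\begin{aligned}
(P'_n+P''_n)(|g-g_j|) &= n^{-1}\Big(\Big(\sum_{i=1}^{n}\delta_{X_{\sigma_i}}+\sum_{i=1}^{n}\delta_{X_{\tau_i}}\Big)(|g-g_j|)\Big)\\
&= n^{-1}\Big(\Big(\sum_{i=1}^{2n}\delta_{X_i}\Big)(|g-g_j|)\Big) = 2P_{2n}(|g-g_j|)\le2\frac{\varepsilon}{9M}P_{2n}(F_M) \qquad (5.3.2)
\end{aligned}
\]

where $(k_i)_{i=1}^{n}\in\{0,1\}^n$.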
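The identity $(P'_n+P''_n)(h) = 2P_{2n}(h)$ only uses that $(\sigma,\tau)(z)$ enumerates $\{1,\dots,2n\}$. A minimal sketch, assuming that $\sigma_i$ is the index of the pair $\{2i-1,2i\}$ not chosen by $\tau_i$; the sample, the vector `z` and the test function `h` are hypothetical.

```python
from fractions import Fraction

def split(x, z):
    """Split x_1..x_2n into (X_tau, X_sigma): z_i is tau_i, sigma_i is the other index of the pair."""
    n = len(z)
    tau = list(z)
    sigma = [(4 * i + 3) - z[i] for i in range(n)]   # (2i-1) + 2i for the 1-based pair number i+1
    return [x[j - 1] for j in tau], [x[j - 1] for j in sigma]

def mean(values):
    return sum(values) / len(values)

x = [Fraction(k, 7) for k in (3, 1, 4, 1, 5, 9)]     # hypothetical sample, n = 3
z = (2, 3, 6)
x_tau, x_sigma = split(x, z)

def h(u):
    """Stands in for |g - g_j|."""
    return abs(u - Fraction(1, 2))

lhs = mean([h(u) for u in x_sigma]) + mean([h(u) for u in x_tau])   # (P'_n + P''_n)(h)
rhs = 2 * mean([h(u) for u in x])                                    # 2 P_2n(h)
print(lhs == rhs)
```

Exact fractions are used so that the equality is checked without rounding.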

As $n\to\infty$, by the usual strong law of large numbers (A.2.20):

\[
\frac{2\varepsilon}{9M}P_{2n}(F_M)\to\frac{2\varepsilon}{9M}P(F_M)\quad P^\infty\text{–a.s.} \qquad (5.3.3)
\]

and since $F_M\le M$: $(2\varepsilon P(F_M))/(9M)<\varepsilon/4$. Further, since

\[
m := D^{(1)}_{F_M}(\varepsilon/(9M),\mathcal F_M,P_{2n}(x))<D^{(1)}_F(\delta,\mathcal F_M)<+\infty,
\]

$m$ can be chosen to be independent of $n$ and in particular thus remains bounded as $n\to\infty$. Recall the well known bound for the probability of a finite maximum:

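\[
\begin{aligned}
(P^{(\infty)}\otimes Q)\Big\{\max_{1\le j\le m}|P^0_n(g_j)|>\varepsilon/4\Big\}
&= (P^{(\infty)}\otimes Q)\bigcup_{1\le j\le m}\{|P^0_n(g_j)|>\varepsilon/4\}\\
&\le\sum_{1\le j\le m}(P^{(\infty)}\otimes Q)\{|P^0_n(g_j)|>\varepsilon/4\}\\
&\le m\max_{1\le j\le m}\big((P^{(\infty)}\otimes Q)\{|P^0_n(g_j)|>\varepsilon/4\}\big).
\end{aligned}
\]

We show now that the last term goes to zero, as $n\to\infty$. Finally one concludes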

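\[
\begin{aligned}
(P^{(\infty)}\otimes Q)\{\|P^0_n\|_{\mathcal F_M}>\varepsilon/2\}
&\le(P^{(\infty)}\otimes Q)\Big\{\sup_{g\in\mathcal F_M}|P^0_n(g)|>\varepsilon/2\Big\}\\
&\le(P^{(\infty)}\otimes Q)\Big\{\sup_{g\in\mathcal F_M}\min_{1\le j\le m}|P^0_n(g-g_j)|+\max_{1\le j\le m}|P^0_n(g_j)|>\varepsilon/2\Big\}\\
&\le(P^{(\infty)}\otimes Q)^*\Big\{\sup_{g\in\mathcal F_M}\min_{1\le j\le m}|P^0_n(g-g_j)|>\varepsilon/4\Big\}
+m\max_{1\le j\le m}\big((P^{(\infty)}\otimes Q)\{|P^0_n(g_j)|>\varepsilon/4\}\big).
\end{aligned}
\]

Finally we will show that both probabilities can be made arbitrarily small for $n$ large enough.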

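Firstly, combining the results of equations 5.3.1 and 5.3.2:

\[
\begin{aligned}
(P^{(\infty)}\otimes Q)^*\Big\{\sup_{g\in\mathcal F_M}\min_{1\le j\le m}|P^0_n(g-g_j)|>\varepsilon/4\Big\}
&\le P^\infty\Big\{\Big|2\frac{\varepsilon}{9M}P_{2n}(F_M)\Big|>\varepsilon/4\Big\}\\
&\le P^\infty\Big\{\Big|2\frac{\varepsilon}{9M}P_{2n}(F_M)-2\frac{\varepsilon}{9M}P(F_M)\Big|+\Big|2\frac{\varepsilon}{9M}P(F_M)\Big|>\varepsilon/4\Big\}\\
&\le P^\infty\Big\{\Big|2\frac{\varepsilon}{9M}P_{2n}(F_M)-2\frac{\varepsilon}{9M}P(F_M)\Big|>\frac{\varepsilon}{36}\Big\}
<\varepsilon/2
\end{aligned}
\]

where we used equation 5.3.3 for the last inequality.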

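And secondly

\[
m\max_{1\le j\le m}\big((P^{(\infty)}\otimes Q)\{|P^0_n(g_j)|>\varepsilon/4\}\big)\le m\frac{16}{\varepsilon^2}\max_{1\le j\le m}\mathrm{Var}(P^0_n(g_j))
\]

because

\[
E[P^0_n(g_j)] = E\Big[n^{-1}\sum_{i=1}^{n}\big(g_j(X_{\sigma_i})-g_j(X_{\tau_i})\big)\Big] = n^{-1}\sum_{i=1}^{n}0 = 0.
\]

Furthermore, since the $X_{\sigma_i},X_{\tau_i}$, $1\le i\le n$, are all independent:

\[
\begin{aligned}
m\frac{16}{\varepsilon^2}\max_{1\le j\le m}\mathrm{Var}(P^0_n(g_j))
&\le m\frac{16}{\varepsilon^2}\frac{1}{n^2}\max_{1\le j\le m}\mathrm{Var}\Big(\sum_{i=1}^{n}g_j(X_{\sigma_i})-\sum_{i=1}^{n}g_j(X_{\tau_i})\Big)\\
&\le m\frac{16}{\varepsilon^2}\frac{1}{n^2}\,2n\max_{1\le j\le m}\mathrm{Var}(g_j(X_1))\\
&\le\frac{32m}{\varepsilon^2n}M^2<\varepsilon/2
\end{aligned}
\]

if $n$ is large enough.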
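How large $n$ has to be in the last display follows from solving $32mM^2/(\varepsilon^2n)<\varepsilon/2$ for $n$, i.e. $n>64mM^2/\varepsilon^3$. A small sketch of this arithmetic; the helper name `min_sample_size` and the numerical values of `m`, `M` and `eps` are hypothetical.

```python
import math

def min_sample_size(m, M, eps):
    """Smallest integer n with 32*m*M**2 / (eps**2 * n) < eps/2, i.e. n > 64*m*M**2 / eps**3."""
    return math.floor(64 * m * M ** 2 / eps ** 3) + 1

m, M, eps = 10, 2.0, 0.5        # hypothetical values
n0 = min_sample_size(m, M, eps)
print(n0, 32 * m * M ** 2 / (eps ** 2 * n0) < eps / 2)
```

The required $n$ grows like $\varepsilon^{-3}$, which is harmless here since only convergence in probability is needed.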

We have just proved that, by theorems 5.3.1 and A.2.24, $\|P_n-P\|_{\mathcal F}$ converges a.s. and in $L^1$ to 0.

Lemma 5.3.3. Let $(\mathcal X^n,\mathcal A^n,P^n)$ be a probability space for $n\ge1$ and $X_j$ the $j$–th projection onto $(\mathcal X,\mathcal A)$, i.e. the standard model. For each $x\in\mathcal X^{2n}$ and $P_{2n} := (2n)^{-1}\sum_{i=1}^{2n}\delta_{X_i}$,

\[
D^{(1)}_F(\delta,\mathcal F,P_{2n}) = k(x),
\]

where $k$ is a universally measurable function and for $k(x)=k$, $\eta_k$ is a universally measurable function from $\mathcal X^{2n}$ into $Y^k$ such that the functions $f_i$, in equation 5.1.1, can be taken as $T((\eta_{k(x)}(x))_i)$; $i=1,\dots,k(x)$.

Proof. For any $\delta>0$: $k(x) := D^{(1)}_F(\delta,\mathcal F,P_{2n}(x))<+\infty$. If $k(x)=k$, then since $(Y,\mathcal S)$ is a Suslin measurable space, there exists a Polish space $(S,d)$ and a Borel measurable map $b$ from $(S,d)$ onto $(Y,\mathcal S)$. From lemma A.1.15 one has that $Y^k$ with the product σ–algebra is a Suslin measurable space too.

The set

\[
B_k := \{(x,y_1,\dots,y_k)\in\mathcal X^{2n}\times Y^k : P_{2n}(|T(y_i)-T(y_j)|)>\delta P_{2n}(F),\ i\ne j\},
\]

with $x=(x_1,\dots,x_{2n})\in\mathcal X^{2n}$, is product measurable by image admissibility as follows.

For $1\le i\ne j\le k$ let $g_{i,j} : \mathbb R^k\to\mathbb R : a\mapsto|\pi_i(a)-\pi_j(a)|$, then clearly $g_{i,j}$ is a Borel measurable function. Using theorem 4.2.3 one has that

\[
|T(y_i)(x)-T(y_j)(x)|-\delta F(x) = g_{i,j}(T(y_1),\dots,T(y_k))(x)-\delta F(x)
\]

is Suslin image admissible, i.e. jointly measurable. Since $(\mathcal X^{2n},\mathcal A^{2n})$, by lemma A.2.11, is again a separable measurable space:

\[
(\mathcal X^{2n}\times Y^k,\mathcal A^{2n}\otimes\mathcal S^k)\to\mathbb R : (x,y_1,\dots,y_k)\mapsto(2n)^{-1}\sum_{l=1}^{2n}\big(g_{i,j}(T(y_1),\dots,T(y_k))(x_l)-\delta F(x_l)\big) = P_{2n}\big(|T(y_i)-T(y_j)|-\delta F\big)(x)
\]

is measurable. Hence

\[
B_k=\bigcap_{1\le i\ne j\le k}\Big\{(2n)^{-1}\sum_{l=1}^{2n}\big(g_{i,j}(T(y_1),\dots,T(y_k))(x_l)-\delta F(x_l)\big)\in\,]0,+\infty[\,\Big\}
\]

is product measurable. Let $y^{(k)} := (y_1,\dots,y_k)$ and $A_k$ the projection, i.e.

\[
A_k := \{x\in\mathcal X^{2n} : (x,y^{(k)})\in B_k\ \text{for some } y^{(k)}\},\ \text{for } k\ge2,
\]

and $A_1 := \mathcal X^{2n}$. By Sainte–Beuve’s selection theorem 4.2.1, $A_k$ is a universally measurable set of $\mathcal X^{2n}$ and there exists a universally measurable function $\eta_k$ from $A_k$ into $Y^k$ such that $(x,\eta_k(x))\in B_k$ for all $x\in A_k$.

Let $\eta(x) := \eta_k(x)$ if and only if $x\in A_k\setminus A_{k+1}$, $k\ge1$. For each $x$, we let $k(x)$ be the unique $k$ such that $x\in A_k\setminus A_{k+1}$. This means that $(\eta_k(x))_i$, $i=1,\dots,k$, satisfies the condition of equation 5.1.1 for $\gamma=P_{2n}(x)$, and is maximal, in the sense that we cannot add another function such that $(x,\eta_{k+1}(x))\in B_{k+1}$, or in other words such that equation 5.1.1 remains true for $\gamma=P_{2n}(x)$ and $(\eta_{k+1}(x))_i$, $i=1,\dots,k+1$. If we let $k(x)$ be the maximal $k$ for which $x\in A_k\setminus A_{k+1}$, then $k(x)$ is also u.m. Indeed,

\[
k^{-1}\{n\} = (A_n\setminus A_{n+1})\cap\Big(\bigcap_{j=n+1}^{M-1}(A_j\setminus A_{j+1})\Big)
\]

where $D^{(1)}_F(\delta,\mathcal F,P_{2n})\le D^{(1)}_F(\delta,\mathcal F)=M<+\infty$. All the sets $A_l$ are u.m., so $k(x)$ is a universally measurable function.
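For a finite class and a fixed sample, a family as in the lemma can be produced by a greedy pass. The sketch below is only an illustration and makes two simplifications that should be kept in mind: it works with a finite list of functions instead of a Suslin-parametrised class, and the greedy family is maximal in the sense that it cannot be extended, which need not be a family of largest possible cardinality. The names `emp_l1` and `greedy_separated` and the toy class are hypothetical.

```python
def emp_l1(f, g, xs):
    """P_2n(|f - g|) for the empirical measure of the points xs."""
    return sum(abs(f(x) - g(x)) for x in xs) / len(xs)

def greedy_separated(funcs, envelope, xs, delta):
    """Greedily pick functions that are pairwise more than delta * P_2n(F) apart in L1(P_2n)."""
    radius = delta * sum(envelope(x) for x in xs) / len(xs)
    chosen = []
    for f in funcs:
        if all(emp_l1(f, g, xs) > radius for g in chosen):
            chosen.append(f)
    return chosen, radius

# Hypothetical toy class: indicators of [0, t] for t on a grid, envelope F = 1.
xs = [k / 10 for k in range(10)]                      # the 2n = 10 sample points
funcs = [(lambda x, t=t: float(x <= t)) for t in [k / 20 for k in range(21)]]
chosen, radius = greedy_separated(funcs, lambda x: 1.0, xs, delta=0.25)
print(len(chosen))
```

By construction every function of the list lies within `radius` of some chosen function, so the chosen functions also serve as centers of a cover.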


5.4 A uniform central limit theorem with entropy condition for the empirical process.

Here follows the main theorem: a central limit theorem for the empirical process $\nu_n$, where the class of functions is Suslin image admissible via some $(Y,\mathcal S,T)$. It was first stated by D. Pollard, and then extended to the present form by R.M. Dudley. We will first need a lemma which relates the covering number of $\mathcal F$ to the covering of the squares of differences of functions of $\mathcal F$. Those differences also appear in the conditions of the asymptotic equicontinuity, definition 2.7.1. After the main theorem several (easy) corollaries will be given. Those corollaries will allow us to give concrete examples of classes $\mathcal F$ for which the central limit theorem for the empirical process is valid (by means of Pollard’s entropy condition being satisfied).

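Lemma 5.4.1. Let $(\mathcal X,\mathcal A,P)$ be a probability space, $F\in\mathcal L^2(\mathcal X,\mathcal A,P)$ and $\mathcal F\subset\mathcal L^2(\mathcal X,\mathcal A,P)$ having as envelope function $F$. Let $H := 4F^2$ and

\[
\mathcal H := \{(f-g)^2 : f,g\in\mathcal F\}.
\]

Then $0\le\varphi(x)\le H(x)$ for all $\varphi\in\mathcal H$ and $x\in\mathcal X$, and for any $\delta>0$,

\[
D^{(1)}_H(4\delta,\mathcal H)\le D^{(2)}_F(\delta,\mathcal F)^2.
\]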

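Proof. First note that from the definition of $H$, it follows that $0\le\varphi\le H$ for all $\varphi\in\mathcal H$. For any given $\gamma\in\Gamma$, we choose $m\le D^{(2)}_F(\delta,\mathcal F)^2$ and $f_1,\dots,f_m\in\mathcal F$ such that the balls $B_{L^p(\mathcal X,\mathcal A,\gamma)}(f_i,r_p)$ cover $\mathcal F$, where:

\[
r_p = \delta\Big(\int F^p\,d\gamma\Big)^{1/p}\ \text{for } p=1,2.
\]

For any $f,g\in\mathcal F$ take $f_i,f_j$ such that

\[
\max\big\{\gamma((f_i-f)^2),\gamma((f_j-g)^2)\big\}\le\delta^2\gamma(F^2);
\]

this is possible because the closed balls cover $\mathcal F$. Then letting

\[
A := \gamma\big((f-g)^2-(f_i-f_j)^2\big),
\]

by the Cauchy–Schwarz inequality (in the second step):

\[
\begin{aligned}
A &= \gamma\big([f-g-(f_i-f_j)][f-g+f_i-f_j]\big)\\
&\le\gamma\big([f-f_i-(g-f_j)]^2\big)^{1/2}\,\gamma\big((f-g+f_i-f_j)^2\big)^{1/2}\\
&\le\gamma\big([f-f_i-(g-f_j)]^2\big)^{1/2}\,4\gamma(F^2)^{1/2}\\
&\le4^{1/2}\Big(\max\big\{\gamma((f-f_i)^2),\gamma((g-f_j)^2)\big\}\Big)^{1/2}\,4\gamma(F^2)^{1/2}\\
&\le8\delta\gamma(F^2) = 2\delta\gamma(H)
\end{aligned}
\]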


where we used that

\[
(f-g+f_i-f_j)^2\le2(f-g)^2+2(f_i-f_j)^2\le4H = 16F^2.
\]
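The chain of inequalities can be checked numerically on a toy example. The sketch below assumes a measure $\gamma$ that is uniform on five points and an envelope $F\equiv1$; all vectors are hypothetical. Note that the code bounds the absolute version $\gamma(|(f-g)^2-(f_i-f_j)^2|)$, which is the $L^1(\gamma)$ distance used for the balls afterwards and satisfies the same bound, since Cauchy–Schwarz applies to the absolute value of the product.

```python
import math

# Hypothetical toy setting: gamma is the uniform measure on 5 points, envelope F = 1.
F = [1.0] * 5
f = [0.9, -0.2, 0.4, 1.0, -0.7]
g = [0.1, 0.3, -0.5, 0.8, 0.0]
fi = [0.8, -0.1, 0.5, 0.9, -0.6]     # plays the role of the centre near f
fj = [0.2, 0.2, -0.4, 0.7, 0.1]      # plays the role of the centre near g

def gamma(values):
    """Integral with respect to the uniform measure on the 5 points."""
    return sum(values) / len(values)

def sq_diff(u, v):
    return [(a - b) ** 2 for a, b in zip(u, v)]

gF2 = gamma([a * a for a in F])
# smallest delta for which both centres are within delta * gamma(F^2)^(1/2) in L2(gamma)
delta = math.sqrt(max(gamma(sq_diff(f, fi)), gamma(sq_diff(g, fj))) / gF2)

A = gamma([abs(a - b) for a, b in zip(sq_diff(f, g), sq_diff(fi, fj))])
bound = 8 * delta * gF2              # equals 2 * delta * gamma(H) with H = 4 F^2
print(A <= bound)
```

In this example the bound is far from sharp, which is typical: the factor $4\gamma(F^2)^{1/2}$ comes from the crude envelope estimate of the second factor.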

So if one takes functions of the form $h_{k(i,j)} := (f_i-f_j)^2$, for $k(i,j) = in-j$; $i,j=1,\dots,n$, the closed balls centered in those $h_{k(i,j)}$ with radius $2\delta\gamma(H)$ will cover $\mathcal H$. To finish the proof let $n\le D^{(1)}_H(4\delta,\mathcal H)$ and $H_1,\dots,H_n\in\mathcal H$ such that no $H_j$ lies in the (closed) ball centered in $H_i$ of radius $4\delta\gamma(H)$ for any $i\ne j$, i.e.

\[
\int|H_j-H_i|\,d\gamma>4\delta\gamma(H).
\]

Then we would like to have $n\le D^{(2)}_F(\delta,\mathcal F)^2$. But this is straightforward, for the closed balls centered in $h_{k(i,j)}$ with radius $2\delta\gamma(H)$ cover $\mathcal H$. So with each $H_i$ associate one $h_{k(i,j)}$. Because of the triangle inequality the same $h_{k(i,j)}$ cannot be used for two different $H_{i_1},H_{i_2}$ (otherwise the condition $\gamma(|H_{i_1}-H_{i_2}|)>4\delta\gamma(H)$ would be violated). Thus we have an injective map from $\{H_i : i=1,\dots,n\}$ into $\{h_{k(j,l)} : j,l=1,\dots,m\}$. This holds for all $n\le D^{(1)}_H(4\delta,\mathcal H)$, so $D^{(1)}_H(4\delta,\mathcal H)\le D^{(2)}_F(\delta,\mathcal F)^2$.

This lemma is more effective if one has $D^{(2)}_F(\delta,\mathcal F)^2<\infty$ for all $\delta>0$, not just for one single $\delta$, because then $D^{(1)}_H(\varepsilon,\mathcal H)<\infty$ for all $\varepsilon>0$. Also sets of squared differences of elements of $\mathcal F$ are important; they show up in the condition for asymptotic equicontinuity (2.7.1).

We first give a formal definition of a Donsker class.

Definition 5.4.1. Let $(\mathcal X,\mathcal A,P)$ be a probability space and $\mathcal F\subset\mathcal L^2(\mathcal X,\mathcal A,P)$. $\mathcal F$ is said to be a Donsker class for $P$ (or to be a $P$–Donsker class, or to satisfy the central limit theorem for empirical measures), if $\nu_n\Rightarrow G_P$, where $G_P$ stands for a tight Borel measurable map, in the metric space $(\ell^\infty(\mathcal F),\|\cdot\|_{\mathcal F})$.

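Theorem 5.4.2 (Pollard). Let $(\mathcal X,\mathcal A,P)$ be a probability space and $\mathcal F\subset\mathcal L^2(\mathcal X,\mathcal A,P)$. Let $\mathcal F$ be image admissible Suslin via $(Y,\mathcal S,T)$ and have an envelope function $F\in\mathcal L^2(\mathcal X,\mathcal A,P)$. Suppose that

\[
\int_0^1\big(\log D^{(2)}_F(x,\mathcal F)\big)^{1/2}\,dx<\infty. \qquad (5.4.1)
\]

Then $\mathcal F$ is a $P$–Donsker class, i.e. $\nu_n\Rightarrow G_P$ in $\ell^\infty(\mathcal F)$ and there exists a tight version of $G_P$ (also $G_P$ will thus be a tight Brownian bridge process).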

The hypothesis 5.4.1 is called Pollard’s entropy condition.

Remark. Equation 5.4.1 gives a condition on the rate of growth for the entropy. In particular we obtain information about the behaviour of $D^{(2)}_F(x,\mathcal F)$ near $x=0$: it may not increase too rapidly if the integral has to converge. One always has $\log D^{(2)}_F(x,\mathcal F)\ge0$.

We note also that $D^{(2)}_F(x,\mathcal F)$ is non increasing in $x$.
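To get a feeling for condition 5.4.1, one can evaluate the integral for a hypothetical polynomial bound $D^{(2)}_F(x,\mathcal F)\le A\,x^{-V}$ with $A\ge1$. This growth rate is an assumption made only for the sketch, it is not derived in the text. For $A=V=1$ the substitution $x=e^{-t}$ gives the exact value $\Gamma(3/2)=\sqrt\pi/2$, which the midpoint rule below reproduces.

```python
import math

def entropy_integral(A, V, steps=100_000):
    """Midpoint-rule value of the integral over (0, 1) of sqrt(log(A * x**(-V)))."""
    h = 1.0 / steps
    return h * sum(math.sqrt(math.log(A) - V * math.log((k + 0.5) * h)) for k in range(steps))

value = entropy_integral(A=1.0, V=1.0)
print(round(value, 3), round(math.sqrt(math.pi) / 2, 3))
```

The integrand is unbounded near $x=0$ but grows only like $(\log(1/x))^{1/2}$, so the integral is finite for every polynomial rate.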

Theorem 2.7.3 gives conditions on $\nu_n$ to converge weakly to a Gaussian process (here a Brownian bridge) $G_P$ in the space $(\ell^\infty(\mathcal F),\|\cdot\|_{\mathcal F})$. As mentioned in the second chapter, fifth section, we can and will use theorem 2.7.4, which gives equivalent conditions for asymptotic tightness of $\nu_n$. Namely, $\nu_n(f)$ has to be asymptotically tight for all $f\in\mathcal F$, but this is trivial since by the usual CLT (A.2.21) it converges to a normal variable on $\mathbb R$, and Borel laws on Polish spaces are tight ([Bill2] chapter 1 theorem 1.3 on page 8). In the case of Polish spaces, tightness of the limit is the same as uniform tightness and asymptotic tightness.

Indeed, when the limit is tight (which is the case since any Borel law on a Polish space is tight), by the portmanteau theorem 2.5.1 part (e), the sequence is asymptotically tight ($K^\delta$ is open!). Also a finite family of measures on a Polish space is still tight, so choosing $K$ a little bigger, we have uniform tightness. (A family of probability measures $\Pi$ is called uniformly tight iff for every $\varepsilon>0$ there exists a compact $K$ such that for all $P\in\Pi$: $P(K)>1-\varepsilon$.) Conversely, if a sequence of probability measures is uniformly tight, then it is trivially asymptotically tight. By Prohorov’s theorem (in Polish spaces) (see [Bill2] chapter 1 theorem 5.9 on page 59), $\Pi$ is relatively compact, hence the limit is a probability measure and on a Polish space the limit is tight. As claimed we have that $\nu_n(f)$ is asymptotically tight for every $f\in\mathcal F$.

Since the limit process is a Gaussian process in $(\ell^\infty(\mathcal F),\|\cdot\|_{\mathcal F})$, it is enough to prove, by theorem 2.7.4, firstly that $\mathcal F$ is totally bounded for $\rho_P$ (actually one just needs some semimetric as seen in the theorems from chapter 2 section 7) and secondly that $\nu_n$ satisfies the asymptotic equicontinuity condition (2.7.1).

Thus we will split the proof in two parts, and start with the easiest part, the total boundedness of $\mathcal F$.

Proof. Under the weaker assumption $D^{(2)}_F(\delta,\mathcal F)<+\infty$ for all $\delta>0$, one has that $\mathcal F$ is totally bounded in $\mathcal L^2(\mathcal X,\mathcal A,P)$. Assume $P(F^2)>0$, otherwise $\mathcal F$ is trivially totally bounded. Let

\[
\mathcal H := \{(f-g)^2 : f,g\in\mathcal F\}.
\]

Then by the above lemma 5.4.1: $D^{(1)}_H(\delta,\mathcal H)<+\infty$ for all $\delta>0$. Since $\mathcal H$ has an integrable envelope function, is Suslin image admissible ($\mathcal F$ is, and subtraction and squaring are continuous operations on $\mathbb R$, so are measurable) and has a finite entropy, $\mathcal H$ is a $P$–strong Glivenko–Cantelli class by theorem 5.3.2. In other words:

\[
\sup\{|(P_n-P)((f-g)^2)| : f,g\in\mathcal F\}\to0\ \text{almost surely as } n\to\infty.
\]

Also by the usual Strong Law of Large Numbers

\[
\int F^2\,dP_n = n^{-1}\sum_{i=1}^{n}F^2(X_i)\to\int F^2\,dP\ \text{a.s. as } n\to\infty.
\]

We choose $n_0$ such that for all $n\ge n_0$:

\[
2\int F^2\,dP>\int F^2\,dP_{2n}\quad\text{and}\quad\varepsilon/2>\sup\{|(P_{2n}-P)((f-g)^2)| : f,g\in\mathcal F\}.
\]

Take also $0<\delta<(\varepsilon/(4P(F^2)))^{1/2}$ and choose $f_1,\dots,f_m\in\mathcal F$ such that no $f_i$ lies in any of the closed balls of radius $\delta(P_{2n}(F^2))^{1/2}$ centered in $f_j$, $1\le j\le m$, $j\ne i$, for all $i=1,\dots,m$ (recall that the balls cover $\mathcal F$). Then for any $f\in\mathcal F$ we have, for some $j\in\{1,\dots,m\}$:

\[
\begin{aligned}
P((f-f_j)^2) &\le|(P_{2n}-P)((f-f_j)^2)|+P_{2n}((f-f_j)^2)\\
&\le\sup_{g,h\in\mathcal F}|(P_{2n}-P)((g-h)^2)|+\delta^2P_{2n}(F^2)\\
&<\varepsilon/2+2\delta^2P(F^2)<\varepsilon.
\end{aligned}
\]

We proved that $\mathcal F$ is totally bounded in the $L^2$ metric.

In the second part we are concerned with the asymptotic equicontinuity condition, definition 2.7.1.

Proof. This second part is quite lengthy; many things have to be done and then put together to arrive at the conclusion. We divide it and start with a rough sketch of the proof.

i) In the first step, we explain briefly how we will tackle the problem.

So given $\varepsilon>0$, we need to find a $\delta>0$ such that

\[
\limsup_{n\to\infty}\Pr{}^*\big\{\sup\{|\nu_n(f-g)| : f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}>\varepsilon\big\}<\varepsilon \qquad (5.4.2)
\]

where $\Pr^* = (P^\infty)^*$ and $\rho_P(f,g) := (\int_{\mathcal X}(f-g)^2\,dP)^{1/2}$ is the $L^2(P)$ semimetric, see the appendix on Gaussian processes for more about the semimetric.

So let $\varepsilon>0$. We start defining subclasses:

\[
\mathcal F_{j,\delta} := \Big\{f-f_j : f\in\mathcal F,\ \int(f-f_j)^2\,dP<\delta^2\Big\}
\]

of $\mathcal F$, where $\delta>0$ will be specified later on. $\mathcal F_{j,\delta}$ is Suslin image admissible and has a finite entropy, lemma D.1.2. As in the first step, by lemma 5.4.1 (to bound the entropy of the class of squared differences of elements of $\mathcal F_{j,\delta}$), and theorem 5.3.2 (the strong Glivenko–Cantelli theorem)

\[
\sup\{|(P_n-P)((f-g)^2)| : f,g\in\mathcal F_{j,\delta}\}\to0\ \text{almost surely as } n\to\infty.
\]

We define the set

\[
\mathcal F^{2n}_{j,\delta} := \Big\{f-f_j : f\in\mathcal F,\ \int(f-f_j)^2\,dP_{2n}<\delta^2\Big\}.
\]

Note that in the limit the sets $\mathcal F_{j,\delta}$ and $\mathcal F^{2n}_{j,\delta}$ are concordant. Indeed for $f-f_j\in\mathcal F_{j,\delta}$:

\[
P_{2n}((f-f_j)^2)\le(P_{2n}-P)((f-f_j)^2)+P((f-f_j)^2)<\varepsilon_n+\delta^2
\]

and vice versa $\lim_{n\to\infty}\mathcal F^{2n}_{j,\delta}\subset\mathcal F_{j,\delta}$ (on a set of $P^\infty$ measure 1).

Then for $P_{2n}(x)$ fixed, $\mathcal F^{2n}_{j,\delta}$ is Suslin image admissible. $(\mathcal X^{2n},\mathcal A^{2n})$ is a separable measurable space, because $(\mathcal X,\mathcal A)$ is, see theorem A.2.11. So on a set of big enough probability, we can change $\mathcal F_{j,\delta}$ for the more tractable sets $\mathcal F^{2n}_{j,\delta}$.

But as seen in equation 5.4.2 we are still stuck with a supremum over many large sets. A way out is firstly to condition on $x$ and then to use the finite entropy for the discrete measures $P_{2n}(x)$ to reduce the supremum to a maximum over a finite set: every function lies in some closed ball around one of the finitely many centers. And this will be done in a measurable way, so if we are willing to add a small error we can reduce the supremum to a finite maximum.

The (conditional) probability over that maximum will then be bounded, using exponential inequalities, in such a way that the final bound will not depend on the choice of the centers.

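Let $T(y_j)=f_j$ for $y_j$ fixed,

\[
\mathcal X^{2n}\times Y\to\mathbb R : (x,y)\mapsto(2n)^{-1}\sum_{l=1}^{2n}(T(y)-T(y_j))^2(x_l)-\delta^2
\]

is $\mathcal A^{2n}\otimes\mathcal S$ measurable, so the set

\[
C := \Big\{(x,y)\in\mathcal X^{2n}\times Y : (2n)^{-1}\sum_{l=1}^{2n}(T(y)-T(y_j))^2(x_l)-\delta^2\in\,]-\infty,0[\,\Big\}
\]

belongs to $\mathcal A^{2n}\otimes\mathcal S$. The function $I_{C(x)}(\cdot)$ is $\mathcal S$ measurable, theorem A.2.17 of Tonelli–Fubini: $C(x) := \{y : (x,y)\in C\}$ is $\mathcal S$ measurable. The function

\[
(Y,\mathcal S)\to\mathcal F^{2n}_{j,\delta} : y\mapsto I_{C(x)}(y)(T(y)-T(y_j))
\]

is onto (the zero function lies also in $\mathcal F^{2n}_{j,\delta}$) and $(z,y)\mapsto I_{C(x)}(y)(T(y)(z)-T(y_j)(z))$ is $\mathcal A\otimes\mathcal S$ measurable.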

We want to bound

\[
\Pr{}^*\big\{\sup\{|\nu_n(f-g)| : f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}>\varepsilon\big\}.
\]

If $\rho_P(f,g)<\delta$ then $f-g\in\mathcal F_{g,\delta} := \{f-g : f\in\mathcal F,\ \rho_P(f,g)<\delta\}$, and we start to bound it, by the symmetrization lemma (5.2.1), for $\eta=\varepsilon/2$ and $\zeta=\varepsilon/4$, like this:

\[
\Pr{}^*\big\{\sup\{|\nu^0_n(f-g)| : f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}>\varepsilon/2\big\}
\ge\big(1-(\varepsilon/4)^2(2/\varepsilon)^2\big)\Pr{}^*\big\{\sup\{|\nu_n(f-g)| : f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}>\varepsilon\big\}
\]

where $\nu^0_n := \nu'_n-\nu''_n$ and $\nu'_n,\nu''_n$ are independent copies of $\nu_n$. The probability involving $\nu^0_n$ will be treated as follows: conditionally on $\sigma((X_1,\dots,X_{2n})) =: \mathcal A_{2n}$ and on a set of probability $P^\infty$ converging to 1, for $\varepsilon,\eta>0$:

\[
\Pr\Big\{\Pr\Big[\sup\Big\{|\nu^0_n(f-g)| : f,g\in\mathcal F,\ \int(f-g)^2\,dP_{2n}<\delta^2\Big\}>3\eta\,\Big\|\,\mathcal A_{2n}\Big]>3\varepsilon\Big\}<3\varepsilon
\]

for $\delta$ small enough and $n$ large enough, since then integrating out over $(X_1,\dots,X_{2n})$ yields an upper bound for equation 5.4.2. We are allowed to drop the star since the event

\[
\Big\{\sup\Big\{|\nu^0_n(f-g)| : f,g\in\mathcal F,\ \int(f-g)^2\,dP_{2n}<\delta^2\Big\}>3\eta\Big\}
\]

is measurable; for a proof we refer to lemma D.1.3.


ii) In this second step we go more into detail about the conditioning on $x$.

Given $\mathcal A_{2n}$, that is given $(X_1,\dots,X_{2n}) = (x_1,\dots,x_{2n}) =: x$, let $\|f\|_{2n} := (P_{2n}(f^2))^{1/2}$. Let $\delta_i := 2^{-i}$; $i\ge1$. Now we choose subsets $\mathcal F(i,x)$; $i\ge1$ of $\mathcal F$ such that for all $i$ and $f\in\mathcal F$

\[
\min\{\|f-g\|_{2n} : g\in\mathcal F(i,x)\}\le\delta_i\|F\|_{2n},
\]

in other words $\mathcal F(i,x)$ is such that the closed balls

\[
B_{L^2(P_{2n}(x))}(g,\delta_i\|F\|_{2n})
\]

for $g\in\mathcal F(i,x)$ cover $\mathcal F$. This can be done (entropy is finite) and in fact for all $x$, i.e. for all $P_{2n}(x)\in\Gamma$, where $\Gamma$ is the set in definition 5.1.1, $|\mathcal F(i,x)|$ can be chosen smaller than or equal to $D^{(2)}_F(\delta_i,\mathcal F)$, which is always finite. But we can say more about the elements of $\mathcal F(i,x)$. For each $i$ fixed, $\mathcal F(i,x)$ can be written as

\[
\{g^{(x)}_{i,1},\dots,g^{(x)}_{i,k(i,x)}\} = \{T(y_{i,1}(x)),\dots,T(y_{i,k(i,x)}(x))\}
\]

where, by lemma 5.3.3, $g^{(x)}_{i,j} = T(y_{i,j}(x))$, $1\le j\le k(i,x)$, and

\[
k(i,\cdot) : (\mathcal X^{2n},\mathcal A^{2n})\to\{0,\dots,D^{(2)}(\delta_i,\mathcal F,P_{2n}(x))\} : x\mapsto k(i,x)
\]

\[
y_{im}(\cdot) : (\mathcal X^{2n},\mathcal A^{2n})\to Y : x\mapsto y_{im}(x)
\]

with $y_{im}$ only defined for $1\le m\le k(i,x)$, are universally measurable functions.

For each $f\in\mathcal F$, let $f_i := g := g_{im}\in\mathcal F(i,x)$ such that

\[
\|f-f_i\|_{2n} = \min\{\|f-g\|_{2n} : g\in\mathcal F(i,x)\}, \qquad (5.4.3)
\]

and in case the minimum in 5.4.3 is achieved for multiple $g_{im}$, we choose the one with minimal $m$. Let $\mathcal A_k$ be the σ–algebra of universally measurable sets for probability measures defined on $\mathcal A^k$. We claim that

m(·, ·, i) : X 2n × Y → 1, · · · , D(2)F (δi,F) : (x, y) 7→ m(x, y, i)

is A2n ⊗ S measurable (for i fixed).Indeed, let Ap; 1 ≤ p ≤ D

(2)F (δi,F) be

(x, y) :((2n)−1

2n∑l=1

(T (y)(xl)− g(x)i,p (xl))

2−

δi(2n)−1

2n∑l=1

F (xl))∈]−∞, 0]

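Consider first $\{(x,y) \in \mathcal{X}^{2n} \times Y : m(x,y,i) = 1\} = A_1$ and $\{m(x,y,i) = N\} = \big( \bigcap_{l=1}^{N-1} A_l^c \big) \cap A_N \cap \{N \leq k(i,x)\}$. So it is enough to prove that each $A_p$ is a measurable set. Clearly, since $F$ is measurable, $(x,y) \mapsto \delta_i (2n)^{-1} \sum_{l=1}^{2n} F(x_l) I_Y(y)$ is $\mathcal{A}^{2n} \otimes \mathcal{S}$ measurable, and since $\mathcal{F}$ is Suslin image admissible, $(x,y) \mapsto (2n)^{-1} \sum_{l=1}^{2n} T(y)(x_l)$ is also $\mathcal{A}^{2n} \otimes \mathcal{S}$ measurable. For $g^{(x)}_{i,m}$ consider the function below: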

\[
\mathcal{X}^{2n} \to (Y \times \mathcal{X})^{2n} \to \mathbb{R}
\]

\[
x \mapsto \big( (y^{(x)}_{i,m}, x_1), \cdots, (y^{(x)}_{i,m}, x_{2n}) \big) \mapsto (2n)^{-1} \sum_{l=1}^{2n} T(y^{(x)}_{i,m})(x_l);
\]

(recall $g^{(x)}_{i,m}(u) = T(y^{(x)}_{i,m})(u)$) which is measurable, as a composition of two measurable functions. We can conclude that

\[
(u, x, y) \mapsto g^{(x)}_{i,m(x,y,i)}(u)
\]

is $\mathcal{A}^{2n+1} \otimes \mathcal{S}$ measurable. Hence

\[
\mathcal{X}^{2n} \times Y \to \mathbb{R} : (x,y) \mapsto \nu_n^0(T(y) - T(y)_i),
\]

with $T(y)_i = g^{(x)}_{i,m(x,y,i)}$, is $\mathcal{A}^{2n} \otimes \mathcal{S}$ measurable (as in the beginning of the proof of 5.2.1, as a composition of the measurable function $(T(y) - T(y)_i)$ and $(X_{\sigma(1)}, \cdots, X_{\sigma(n)}, X_{\tau(1)}, \cdots, X_{\tau(n)})$). It equals an $\mathcal{A}^{2n} \otimes \mathcal{S}$ measurable function $G(x,y)$ for all $x$ on a set of $P^{2n}$ measure one. As in the proof of corollary 4.2.2, $x \mapsto \sup_y |G(x,y)|$ is a universally measurable function. Thus for $i$ fixed:

\[
\sup_y |\nu_n^0(T(y) - T(y)_i)| \text{ is } P^{2n}\text{–measurable in } x. \tag{5.4.4}
\]

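For any $f \in \mathcal{F}$, by our choice of $f_i$:

\[
\|f_i - f\|_{2n} \leq \delta_i \|F\|_{2n} = 2^{-i} \|F\|_{2n},
\]

so $\|f_i - f\|_{2n} \to 0$ as $i \to \infty$, and moreover for any fixed positive integer $r$, any $p > r$ and $x_l \in \{x_1, \cdots, x_{2n}\}$:

\[
\Big| \Big( f - f_r - \sum_{j=r+1}^{p} (f_j - f_{j-1}) \Big)(x_l) \Big| = |(f - f_p)(x_l)| \leq (2n) 2^{-p} \|F\|_{2n} \tag{5.4.5}
\]

so $(f - f_r)(x_l) = \sum_{r < j < +\infty} (f_j - f_{j-1})(x_l)$.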

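iii) In the third step we rewrite the integral condition into an equivalent series condition, which we use to show the existence of a related series. That related series will be useful in later steps.

Further let $H_j := \log D_F^{(2)}(\delta_j, \mathcal{F})$, $j \geq 0$, with $\delta_j := 2^{-j}$. The condition 5.4.1 is equivalent to the convergence of the infinite series $\sum_{j \geq 0} \delta_j H_j^{1/2}$:

\[
\begin{aligned}
\int_0^1 \big( \log D_F^{(2)}(\delta, \mathcal{F}) \big)^{1/2} d\delta
&\leq \sum_{j \geq 0} \int_{2^{-j-1}}^{2^{-j}} \big( \log D_F^{(2)}(\delta, \mathcal{F}) \big)^{1/2} d\delta \\
&\leq \sum_{j \geq 0} \int_{2^{-j-1}}^{2^{-j}} \big( \log D_F^{(2)}(2^{-j-1}, \mathcal{F}) \big)^{1/2} d\delta \\
&\leq \sum_{j \geq 0} 2^{-(j+1)} H_{j+1}^{1/2} \\
&\leq \sum_{j \geq 0} 2^{-j} H_j^{1/2}
\end{aligned}
\]

since the entropy $D_F^{(2)}(\delta, \mathcal{F})$ is a non increasing function of $\delta$ (the remark just before the proof), and conversely:

\[
\begin{aligned}
\sum_{j \geq 0} 2^{-j} H_j^{1/2}
&= \sum_{j \geq 0} \int_{2^{-j}}^{2^{-j+1}} \big( \log D_F^{(2)}(2^{-j}, \mathcal{F}) \big)^{1/2} d\delta \\
&\leq \sum_{j \geq 0} 2 \int_{2^{-j-1}}^{2^{-j}} \big( \log D_F^{(2)}(2^{-j}, \mathcal{F}) \big)^{1/2} d\delta \\
&\leq \sum_{j \geq 0} 2 \int_{2^{-j-1}}^{2^{-j}} \big( \log D_F^{(2)}(\delta, \mathcal{F}) \big)^{1/2} d\delta \\
&= 2 \int_0^1 \big( \log D_F^{(2)}(\delta, \mathcal{F}) \big)^{1/2} d\delta.
\end{aligned}
\]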
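The two-sided comparison between the entropy integral and the dyadic series can be checked numerically. The sketch below is only an illustration: it assumes a hypothetical entropy bound of the form $D(\delta) = K\delta^{-2\omega}$ (the shape obtained later in proposition 5.4.3 for $p = 2$), and the values of `K` and `OMEGA` are arbitrary choices that do not come from the text.

```python
import math

# Hypothetical entropy bound D(delta) = K * delta**(-2*omega).
# K and OMEGA are illustrative assumptions, not values from the thesis.
K, OMEGA = 3.0, 1.5


def sqrt_log_entropy(delta):
    """(log D(delta))^{1/2} for the assumed, non increasing entropy bound."""
    return math.sqrt(math.log(K * delta ** (-2.0 * OMEGA)))


def entropy_integral(steps=200_000):
    """Midpoint rule for the integral over ]0, 1] of (log D(delta))^{1/2}."""
    h = 1.0 / steps
    return sum(sqrt_log_entropy((i + 0.5) * h) for i in range(steps)) * h


def dyadic_series(terms=200):
    """Partial sum of sum_{j>=0} 2^{-j} H_j^{1/2} with H_j = log D(2^{-j})."""
    return sum(2.0 ** (-j) * sqrt_log_entropy(2.0 ** (-j)) for j in range(terms))


integral = entropy_integral()
series = dyadic_series()
# Expected ordering from step iii: integral <= series <= 2 * integral.
print(integral, series, 2.0 * integral)
```

For any non increasing entropy function the same ordering should hold; only the numerical values depend on the assumed form of $D$.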


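For all $x$: $|\mathcal{F}(j,x)| \leq D_F^{(2)}(\delta_j, \mathcal{F}) \leq \exp(H_j)$, with $\delta_j := 2^{-j}$; let

\[
\eta_j := \max\big( j\delta_j, (576 P(F^2) \delta_j^2 H_j)^{1/2} \big) > 0
\]

(since $j\delta_j > 0$). In particular

\[
\eta_j^2 \geq 576 P(F^2) \delta_j^2 H_j. \tag{5.4.6}
\]

For this choice of $\eta_j$ we have $\sum_{j \geq 1} \eta_j < +\infty$. Indeed

\[
\eta_j \leq j\delta_j + (576 P(F^2) \delta_j^2 H_j)^{1/2}
\]

and $\sum_{j \geq 1} j\delta_j < +\infty$: write $j\delta_j = \big( j/\sqrt{2^j} \big) \sqrt{2^{-j}}$, where $j/\sqrt{2^j} \to 0$ as $j \to \infty$ and $\sum_{j \geq 1} \sqrt{2^{-j}} < +\infty$, so

\[
\sum_{1 \leq j \leq N_0} \frac{j}{\sqrt{2^j}} \sqrt{2^{-j}} + \sqrt{2^{-N_0}} \sum_{j \geq 1} \sqrt{2^{-j}} < +\infty
\]

if $j/\sqrt{2^j} < 1$ for all $j \geq N_0$. The second term also has a convergent series:

\[
\sum_{j \geq 1} (576 P(F^2) \delta_j^2 H_j)^{1/2} = 24 (P(F^2))^{1/2} \sum_{j \geq 1} \delta_j H_j^{1/2} < +\infty.
\]

We also have:

\[
\sum_{j \geq 1} \exp\big( -2\eta_j^2 / (576 \delta_j^2 P(F^2)) \big) \leq \sum_{j \geq 1} \exp\big( -j^2 / (288 P(F^2)) \big) < +\infty \tag{5.4.7}
\]

because $-j^2 \delta_j^2 \geq -\eta_j^2$.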
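As a sanity check on the choice of $\eta_j$, the following sketch evaluates the quantities above for a hypothetical entropy $H_j = \log(K\delta_j^{-2\omega})$. The constants `P_F2`, `K` and `OMEGA` are illustrative assumptions and are not taken from the text.

```python
import math

# Illustrative assumptions: P(F^2) = 1 and D(delta) = K * delta**(-2*omega).
P_F2, K, OMEGA = 1.0, 3.0, 1.5


def delta(j):
    return 2.0 ** (-j)


def H(j):
    """H_j = log D(delta_j) for the assumed entropy bound."""
    return math.log(K * delta(j) ** (-2.0 * OMEGA))


def eta(j):
    """eta_j = max(j * delta_j, (576 P(F^2) delta_j^2 H_j)^{1/2})."""
    return max(j * delta(j), math.sqrt(576.0 * P_F2 * delta(j) ** 2 * H(j)))


indices = range(1, 200)
partial_eta_sum = sum(eta(j) for j in indices)
# Terms of the exponential series in (5.4.7).
exp_terms = [
    math.exp(-2.0 * eta(j) ** 2 / (576.0 * delta(j) ** 2 * P_F2)) for j in indices
]
print(partial_eta_sum, sum(exp_terms))
```

Both partial sums stabilise quickly here, which is consistent with the convergence claims, although a finite computation of course proves nothing about the general case.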

iv) As announced in the first step, we will bound the supremum, using that in fact it reduces to a maximum, and then the exponential series of the third step will come in handy. We turn our attention to

\[
\Pr\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j \,\Big|\, \mathcal{A}_{2n} \Big\}, \tag{5.4.8}
\]

where in equation 5.4.4 and the discussion above we showed that $\sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)|$ is measurable for $r$ fixed.

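Since, for $x$ given, $f - f_r = \sum_{j > r} (f_j - f_{j-1})$ pointwise on $\{x_1, \cdots, x_{2n}\}$, as seen in equation 5.4.5, it follows that

\[
\begin{aligned}
\nu_n^0(f - f_r)
&= n^{1/2} P_n^0(f - f_r) \\
&= n^{1/2} \big( P_n'(f - f_r) - P_n''(f - f_r) \big) \\
&= n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\sigma(p)}}(f - f_r) - n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\tau(p)}}(f - f_r) \\
&= n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\sigma(p)}}\Big( \sum_{j > r} (f_j - f_{j-1}) \Big) - n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\tau(p)}}\Big( \sum_{j > r} (f_j - f_{j-1}) \Big) \\
&= \sum_{j > r} n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\sigma(p)}}(f_j - f_{j-1}) - \sum_{j > r} n^{-1/2} \sum_{p=1}^{n} \delta_{X_{\tau(p)}}(f_j - f_{j-1}) \\
&= \sum_{j > r} \nu_n^0(f_j - f_{j-1})
\end{aligned}
\]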


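and this because $(X_{\sigma(1)}, \cdots, X_{\sigma(n)}, X_{\tau(1)}, \cdots, X_{\tau(n)})$ is a permutation of $(X_1, \cdots, X_{2n}) = x$. Also given $x$:

\[
\begin{aligned}
\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j \Big\}
&\subset \Big\{ \sup_{f \in \mathcal{F}} \Big| \nu_n^0\Big( \sum_{j > r} (f_j - f_{j-1}) \Big) \Big| > \sum_{j > r} \eta_j \Big\} \\
&\subset \Big\{ \sup_{f \in \mathcal{F}} \sum_{j > r} |\nu_n^0(f_j - f_{j-1})| > \sum_{j > r} \eta_j \Big\} \\
&\subset \Big\{ \sum_{j > r} \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \sum_{j > r} \eta_j \Big\}
\end{aligned}
\]

and since

\[
\bigcap_{j > r} \Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| \leq \eta_j \Big\} \subset \Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| \leq \sum_{j > r} \eta_j \Big\}
\]

we have got that

\[
\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j \Big\} \subset \bigcup_{j > r} \Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \eta_j \Big\}.
\]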

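So the probability in equation 5.4.8 can be bounded:

\[
\begin{aligned}
\Pr\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j \,\Big|\, \mathcal{A}_{2n} \Big\}
&= E\Big[ I_{\{\sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j\}} \,\Big|\, \mathcal{A}_{2n} \Big] \\
&\leq E\Big[ I_{\bigcup_{j > r} \{\sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \eta_j\}} \,\Big|\, \mathcal{A}_{2n} \Big] \\
&\leq E\Big[ \sum_{j > r} I_{\{\sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \eta_j\}} \,\Big|\, \mathcal{A}_{2n} \Big] \\
&\leq \sum_{j > r} E\Big[ I_{\{\sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \eta_j\}} \,\Big|\, \mathcal{A}_{2n} \Big] \\
&\leq \sum_{j > r} \Pr\Big( \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})| > \eta_j \,\Big|\, \mathcal{A}_{2n} \Big),
\end{aligned}
\]

which in turn is bounded by:

\[
\sum_{j > r} \exp(H_j + H_{j-1}) \sup_{f \in \mathcal{F}} \Pr\big\{ |\nu_n^0(f_j - f_{j-1})| > \eta_j \,\big|\, \mathcal{A}_{2n} \big\}. \tag{5.4.9}
\]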

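This can be seen as follows. Conditionally on $x$ (thus the first $2n$ coordinates of $x_\infty$ are fixed)

\[
\Big\{ x_\infty : \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})(x_\infty)| > \eta_j \Big\} = \bigcup_{l=1}^{k(j,x)} \bigcup_{q=1}^{k(j-1,x)} \big\{ x_\infty : |\nu_n^0(g_{j,l} - g_{j-1,q})(x_\infty)| > \eta_j \big\}.
\]

Hence

\[
\begin{aligned}
\Pr\Big( \Big\{ x_\infty : \sup_{f \in \mathcal{F}} |\nu_n^0(f_j - f_{j-1})(x_\infty)| > \eta_j \Big\} \,\Big|\, \mathcal{A}_{2n} \Big)
&\leq \sum_{l=1}^{k(j,x)} \sum_{q=1}^{k(j-1,x)} \Pr\big( \{ x_\infty : |\nu_n^0(g_{j,l} - g_{j-1,q})(x_\infty)| > \eta_j \} \,\big|\, \mathcal{A}_{2n} \big) \\
&\leq \exp(H_j + H_{j-1}) \sup_{f \in \mathcal{F}} \Pr\big( \{ x_\infty : |\nu_n^0(f_j - f_{j-1})(x_\infty)| > \eta_j \} \,\big|\, \mathcal{A}_{2n} \big)
\end{aligned}
\]

since $k(j,x) \leq \exp(H_j)$.

In the following lines we will use an exponential inequality to get a bound on the (conditional) probability and then give a second bound that no longer depends on the class $\mathcal{F}$.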

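Let, for fixed $j$ $(> r)$ and $f \in \mathcal{F}$,

\[
z_i := (f_j - f_{j-1})(x_{2i}) - (f_j - f_{j-1})(x_{2i-1})
\]

and $e(i) := I_{\{\sigma(i) = 2i-1\}}$, $1 \leq i \leq n$. The signs $(-1)^{e(i)}$ are random variables taking the values $1$ and $-1$ with probability $1/2$, thus they are Rademacher random variables. Then

\[
\nu_n^0(f_j - f_{j-1}) = n^{-1/2} \sum_{i=1}^{n} (-1)^{e(i)} z_i.
\]

So we can apply Hoeffding's inequality D.2.2 (in the last step):

\[
\begin{aligned}
\Pr\big( |\nu_n^0(f_j - f_{j-1})| > \eta_j \,\big|\, \mathcal{A}_{2n} \big)
&= \Pr\Big\{ n^{-1/2} \Big| \sum_{i=1}^{n} (-1)^{e(i)} z_i \Big| > \eta_j \Big\} \\
&= \Pr\Big( \Big\{ \sum_{i=1}^{n} (-1)^{e(i)} z_i > \eta_j n^{1/2} \Big\} \cup \Big\{ \sum_{i=1}^{n} (-1)^{e(i)} z_i < -\eta_j n^{1/2} \Big\} \Big) \\
&= \Pr\Big( \sum_{i=1}^{n} (-1)^{e(i)} z_i > \eta_j n^{1/2} \Big) + \Pr\Big( \sum_{i=1}^{n} (-1)^{e(i)} z_i < -\eta_j n^{1/2} \Big) \\
&\leq 2 \exp\Big( -\tfrac{1}{2} n \eta_j^2 \Big/ \sum_{i=1}^{n} z_i^2 \Big).
\end{aligned}
\]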
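The Hoeffding bound for Rademacher sums used here can be compared with the exact tail probability for a small $n$ by enumerating all sign patterns. This is only a sketch: the increments `z` are hypothetical numbers chosen for illustration and the helper names are not from the text.

```python
import itertools
import math


def exact_two_sided_tail(z, t):
    """Pr(|sum_i s_i z_i| > t) for independent uniform signs s_i in {-1, +1}."""
    n = len(z)
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=n)
        if abs(sum(s * zi for s, zi in zip(signs, z))) > t
    )
    return hits / 2 ** n


def hoeffding_bound(z, t):
    """2 exp(-t^2 / (2 sum z_i^2)), the two-sided Hoeffding bound."""
    return 2.0 * math.exp(-t * t / (2.0 * sum(zi * zi for zi in z)))


# Hypothetical increments z_i (assumption, for illustration only).
z = [0.3, -1.2, 0.7, 0.5, -0.9, 1.1, 0.2, -0.4, 0.8, -0.6]
n = len(z)
eta = 0.8
# The event |nu_n^0| > eta is the event |sum_i s_i z_i| > eta * n^{1/2}.
t = eta * math.sqrt(n)
print(exact_two_sided_tail(z, t), hoeffding_bound(z, t))
```

With $t = \eta n^{1/2}$ the bound equals $2\exp(-\tfrac{1}{2} n\eta^2 / \sum_i z_i^2)$, which is the expression obtained in the last line of the display above.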


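Now, still for fixed $j$, we bound $\sum_{i=1}^{n} z_i^2$ uniformly for $f \in \mathcal{F}$ in order to obtain a bound for $\Pr\big( |\nu_n^0(f_j - f_{j-1})| > \eta_j \,\big|\, \mathcal{A}_{2n} \big)$:

\[
\begin{aligned}
\sum_{i=1}^{n} z_i^2
&= \sum_{i=1}^{n} \big[ (f_j - f_{j-1})(x_{2i}) - (f_j - f_{j-1})(x_{2i-1}) \big]^2 \\
&= \sum_{i=1}^{n} \big[ ((f_j - f_{j-1})(x_{2i}))^2 - 2 (f_j - f_{j-1})(x_{2i}) (f_j - f_{j-1})(x_{2i-1}) + ((f_j - f_{j-1})(x_{2i-1}))^2 \big] \\
&\leq \sum_{i=1}^{n} 2 \big[ ((f_j - f_{j-1})(x_{2i}))^2 + ((f_j - f_{j-1})(x_{2i-1}))^2 \big] \\
&\leq 2 (2n) P_{2n}((f_j - f_{j-1})^2) = 4n \|f_j - f_{j-1}\|_{2n}^2 \\
&\leq 4n \big( \|f_j - f\|_{2n} + \|f - f_{j-1}\|_{2n} \big)^2 \\
&\leq 4n \big( \delta_j \|F\|_{2n} + \delta_{j-1} \|F\|_{2n} \big)^2 = 4n \|F\|_{2n}^2 (\delta_j + \delta_{j-1})^2 \\
&= 4n \|F\|_{2n}^2 (3\delta_j)^2 = 36 n \delta_j^2 \|F\|_{2n}^2,
\end{aligned}
\]

which is less than $72 n \delta_j^2 P(F^2)$ on the set $B_n := \{ \|F\|_{2n}^2 \leq 2 P(F^2) \}$. The probability of $B_n$ goes to 1, since, by the strong law of large numbers (theorem A.2.20):

\[
\|F\|_{2n}^2 = P_{2n}(F^2) = (2n)^{-1} \sum_{j=1}^{2n} F^2(X_j) \to P(F^2)
\]

$P^\infty$–almost surely, as $n \to \infty$.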

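v) We now gather all the results obtained so far. On the set $B_n$, equation 5.4.8 is bounded by equation 5.4.9, which in turn is smaller than

\[
\begin{aligned}
\sum_{j > r} \exp(H_j + H_{j-1}) \sup_{f \in \mathcal{F}} 2 \exp\Big( -\tfrac{1}{2} n \eta_j^2 \Big/ \sum_{i=1}^{n} z_i^2 \Big)
&\leq \sum_{j > r} \exp(2H_j) \sup_{f \in \mathcal{F}} 2 \exp\Big( -\tfrac{1}{2} n \eta_j^2 \Big/ \sum_{i=1}^{n} z_i^2 \Big) \\
&\leq \sum_{j > r} \exp(2H_j) \, 2 \exp\Big( \frac{-\eta_j^2}{144 \delta_j^2 P(F^2)} \Big).
\end{aligned}
\]

By equation 5.4.6 ($\eta_j^2 \geq 576 P(F^2) \delta_j^2 H_j$) in the second step, this equals

\[
\begin{aligned}
2 \sum_{j > r} \exp\Big( \frac{2H_j \cdot 288 \delta_j^2 P(F^2)}{288 \delta_j^2 P(F^2)} \Big) \exp\Big( \frac{-2\eta_j^2}{288 \delta_j^2 P(F^2)} \Big)
&\leq 2 \sum_{j > r} \exp\Big( \frac{\eta_j^2}{288 \delta_j^2 P(F^2)} \Big) \exp\Big( \frac{-2\eta_j^2}{288 \delta_j^2 P(F^2)} \Big) \\
&= 2 \sum_{j > r} \exp\Big( \frac{-\eta_j^2}{288 \delta_j^2 P(F^2)} \Big)
\end{aligned}
\]

and by equation 5.4.7 the last series is convergent, so for $r$ large enough

\[
2 \sum_{j > r} \exp\Big( \frac{-\eta_j^2}{288 \delta_j^2 P(F^2)} \Big) < \varepsilon.
\]

We also chose our $\eta_j$ such that $\sum_{j \geq 1} \eta_j$ is convergent. So for $r$ large enough $\sum_{j > r} \eta_j < \eta$ and, almost surely on $B_n$,

\[
\Pr\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \eta \,\Big|\, \mathcal{A}_{2n} \Big\} \leq \Pr\Big\{ \sup_{f \in \mathcal{F}} |\nu_n^0(f - f_r)| > \sum_{j > r} \eta_j \,\Big|\, \mathcal{A}_{2n} \Big\} \leq \varepsilon.
\]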

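vi) This small part is similar to step 4, where we used an exponential inequality to bound a certain probability uniformly in $\mathcal{F}$.

If $\|f - g\|_{2n}^2 < \delta_r^2 P(F^2)$ and if $\|F\|_{2n}^2 \leq 2 P(F^2)$, i.e. $x \in B_n$, then by definition of $f_r$:

\[
\begin{aligned}
\|f_r - g_r\|_{2n} &\leq \|f_r - f\|_{2n} + \|f - g\|_{2n} + \|g - g_r\|_{2n} \\
&\leq \delta_r \|F\|_{2n} + \delta_r (P(F^2))^{1/2} + \delta_r \|F\|_{2n} \\
&\leq \delta_r (P(F^2))^{1/2} + 2 \delta_r \sqrt{2} (P(F^2))^{1/2} \\
&\leq 4 \delta_r (P(F^2))^{1/2}.
\end{aligned}
\]

By an argument as in the fourth step (measurability follows by arguments as in the second step):

\[
\Pr\Big\{ \sup\big\{ |\nu_n^0(f_r - g_r)| : \|f - g\|_{2n} < \delta,\ f, g \in \mathcal{F} \big\} > \eta \,\Big|\, \mathcal{A}_{2n} \Big\}
\leq (\mathrm{card}\, \mathcal{F}(r,x))^2 \sup \Pr\big\{ |\nu_n^0(f_r - g_r)| > \eta \,\big|\, \mathcal{A}_{2n} \big\},
\]

where the supremum on the right is taken over $f, g \in \mathcal{F}$ with $\|f - g\|_{2n} < \delta$, and where $(\mathrm{card}\, \mathcal{F}(r,x))^2 \leq \big( D_F^{(2)}(\delta_r, \mathcal{F}) \big)^2 = \exp(2H_r)$. And again by Hoeffding's inequality D.2.2, as in the fourth step, here applied to

\[
\nu_n^0(f_r - g_r) = n^{-1/2} \sum_{i=1}^{n} (-1)^{e(i)} z_i, \quad \text{with} \quad z_i := (f_r - g_r)(x_{2i}) - (f_r - g_r)(x_{2i-1})
\]

(see also step 4 for the definition of the random variable $e(i)$), we have that the last probability is

\[
\leq (\mathrm{card}\, \mathcal{F}(r,x))^2 \sup 2 \exp\big( -\eta^2 / [8 \|f_r - g_r\|_{2n}^2] \big)
\leq (\mathrm{card}\, \mathcal{F}(r,x))^2 \, 2 \exp\big( -\eta^2 / [8 \sup \|f_r - g_r\|_{2n}^2] \big)
\]

since $\sum_{i=1}^{n} z_i^2 \leq 4n \|f_r - g_r\|_{2n}^2$, and then

\[
\begin{aligned}
&\leq 2 \exp(2H_r) \exp\Big( \frac{-\eta^2}{8 \cdot 4^2 \delta_r^2 P(F^2)} \Big) \\
&= 2 \exp\Big( \frac{(2H_r)(128 \delta_r^2 P(F^2)) - \eta^2}{128 \delta_r^2 P(F^2)} \Big) \\
&\leq 2 \exp\Big( \frac{-\eta^2}{256 \delta_r^2 P(F^2)} \Big) < \varepsilon
\end{aligned}
\]

if $\eta^2 \geq 512 \delta_r^2 H_r P(F^2)$; for $\delta_r$ small enough the last expression will be less than $\varepsilon$. Note also that we showed in the third step (at the end) that $\sum_j \delta_j H_j^{1/2}$ is convergent, hence the general term goes to zero and, since $(\cdot)^2$ is continuous, $\delta_j^2 H_j \to 0$ as $j \to \infty$. So for any $\eta > 0$ we can find an $r$ large enough so that $512 \delta_r^2 H_r P(F^2) \leq \eta^2$ and the exponential is smaller than some given $\varepsilon$. This holds for all $x \in B_n$, where $B_n$ are sets whose probability converges to one, as seen at the end of the fourth step.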

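vii) Finally,

\[
\sup\{ |\nu_n^0(f - g)| : \|f - g\|_{2n} < \delta \}
\leq 2 \sup\{ |\nu_n^0(f - f_r)| : \|f - g\|_{2n} < \delta \} + \sup\{ |\nu_n^0(f_r - g_r)| : \|f - g\|_{2n} < \delta \}.
\]

Hence, by subadditivity of $\Pr$:

\[
\begin{aligned}
\Pr\big( \sup\{ |\nu_n^0(f - g)| : \|f - g\|_{2n} < \delta \} > 3\eta \,\big|\, \mathcal{A}_{2n} \big)
&\leq \Pr\big( 2 \sup\{ |\nu_n^0(f - f_r)| : \|f - g\|_{2n} < \delta \} > 2\eta \,\big|\, \mathcal{A}_{2n} \big) \\
&\quad + \Pr\big( \sup\{ |\nu_n^0(f_r - g_r)| : \|f - g\|_{2n} < \delta \} > \eta \,\big|\, \mathcal{A}_{2n} \big).
\end{aligned}
\]

In the sixth step we showed that the second term can be made small if we let $r$ be large, and at the end of the fifth step, again for $r$ large, we showed that the first term can be made small. In parts 4 and 6 we noted that all the events appearing here are measurable.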

We give two corollaries for special classes $\mathcal{F}$, the second of which is also due to Pollard. The first states that a class of indicators of sets in a Vapnik–Cervonenkis class, multiplied by a square integrable function and satisfying the measurability condition seen in chapter 5, section 2, is a Donsker class. The second says that VC subgraph classes, with an additional measurability property, are also Donsker classes.

Corollary 5.4.2. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space, $F \in L^2(\mathcal{X}, \mathcal{A}, P)$ and $\mathcal{F} := \{ F I_C : C \in \mathcal{C} \}$ where $\mathcal{C}$ is a Suslin image admissible Vapnik–Cervonenkis class of sets. Then $\mathcal{F}$ is a $P$–Donsker class.

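Proof. Clearly $F$ is measurable and if $\mathcal{C}$ is Suslin image admissible via some $(Y, \mathcal{S}, T)$, i.e. $\mathcal{X} \times Y \to \mathbb{R} : (x,y) \mapsto T(y)(x)$ is jointly measurable and $T : Y \to \mathcal{C}$ is onto, then $Y \to \mathcal{F} : y \mapsto F\,T(y)$ is onto and $\mathcal{X} \times Y \to \mathbb{R} : (x,y) \mapsto F(x) T(y)(x)$ is still jointly measurable. Thus $\mathcal{F}$ is Suslin image admissible. By proposition 5.4.3, the entropy of $\mathcal{F}$ satisfies the condition of equation 5.4.1. Indeed let $p = 2$, $K < +\infty$ (we may take $K \geq 1$) and $\mathrm{dens}(\mathcal{C}) < \omega < +\infty$. Then

\[
\begin{aligned}
\int_0^1 \big( \log D_F^{(2)}(x, \mathcal{F}) \big)^{1/2} dx
&\leq \int_0^1 \big( \log(K x^{-p\omega}) \big)^{1/2} dx \\
&= \int_0^1 \big( \log(K x^{-2\omega}) \big)^{1/2} dx \\
&= \int_0^1 \big( \log(K) + \log(x^{-2\omega}) \big)^{1/2} dx.
\end{aligned}
\]

Since $(a + b)^{1/2} \leq a^{1/2} + b^{1/2}$ for all $a, b \geq 0$:

\[
\begin{aligned}
&\leq \big( \log(K) \big)^{1/2} + \int_0^1 \big( \log(x^{-2\omega}) \big)^{1/2} dx \\
&= c + \sqrt{2\omega} \int_0^1 \big( \log(1/x) \big)^{1/2} dx \\
&= c + \sqrt{2\omega} \int_1^{+\infty} \big( \log(u) \big)^{1/2} u^{-2} \, du
\end{aligned}
\]

where $c := \big( \log(K) \big)^{1/2}$ and where we put $1/x = u$. Continuing and putting $\log(u) = t$:

\[
= c + \sqrt{2\omega} \int_0^{+\infty} t^{1/2} \exp(-2t) \exp(t) \, dt.
\]

Recall that the gamma function $\Gamma(z)$ has a representation as an improper integral, namely $\int_0^{+\infty} t^{z-1} e^{-t} \, dt$, on the right half-plane $\{ z \in \mathbb{C} : \Re(z) > 0 \}$ (proposition IV.1.1 on page 193 in [FreitagAndBusam]) and satisfies the functional equation

\[
\Gamma(z + 1) = z \Gamma(z), \quad \text{for all } z \in \mathbb{C} \setminus \{0, -1, -2, \cdots\}
\]

by proposition IV.1.2 on page 195 in [FreitagAndBusam], and by proposition IV.1.11 on page 201 in [FreitagAndBusam]:

\[
\Gamma(z) \Gamma(1 - z) = \frac{\pi}{\sin(\pi z)} \quad \text{for all } z \in \mathbb{C} \setminus \mathbb{Z}.
\]

Taking $z = 1/2$ in the last identity gives $\Gamma(1/2) = \sqrt{\pi}$, so we conclude that

\[
\int_0^{+\infty} t^{1/2} e^{-t} \, dt = \Gamma(3/2) = \tfrac{1}{2} \Gamma(1/2) = \tfrac{1}{2} \sqrt{\pi}.
\]

Hence

\[
\int_0^1 \big( \log D_F^{(2)}(x, \mathcal{F}) \big)^{1/2} dx \leq \sqrt{\log(K)} + \sqrt{\frac{\omega\pi}{2}}
\]

where the right-hand side is finite, and so theorem 5.4.2 applies, giving the result.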
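The value $\Gamma(3/2) = \sqrt{\pi}/2$ and the resulting bound can be verified numerically. The sketch below uses only Python's standard `math` module; the function names and the sample values of `K` and `omega` are illustrative assumptions.

```python
import math


def gamma_three_halves_quadrature(upper=40.0, steps=100_000):
    """Midpoint rule for the integral of t^{1/2} e^{-t} over [0, upper].

    The tail beyond `upper` is of order e^{-upper} and is neglected.
    """
    h = upper / steps
    return sum(
        math.sqrt((i + 0.5) * h) * math.exp(-(i + 0.5) * h) for i in range(steps)
    ) * h


def entropy_integral_bound(K, omega):
    """sqrt(log K) + sqrt(2*omega) * Gamma(3/2), assuming K >= 1."""
    return math.sqrt(math.log(K)) + math.sqrt(2.0 * omega) * math.gamma(1.5)


print(gamma_three_halves_quadrature(), math.gamma(1.5), math.sqrt(math.pi) / 2.0)
print(entropy_integral_bound(3.0, 1.5))
```

The three printed values on the first line should agree to several decimal places, and the bound coincides with $\sqrt{\log K} + \sqrt{\omega\pi/2}$.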

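Proposition 5.4.3. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space and $\mathcal{C} \subset \mathcal{A}$, with $\mathrm{dens}(\mathcal{C}) < +\infty$, and let $F \in L^p(\mathcal{X}, \mathcal{A}, P)$ for $p \in [1, +\infty[$ and $F \geq 0$. If $\mathcal{F} := \{ F I_C : C \in \mathcal{C} \}$, then for any $\omega > \mathrm{dens}(\mathcal{C})$ there is a constant $0 < K < +\infty$ such that

\[
D_F^{(p)}(\delta, \mathcal{F}) \leq K \delta^{-p\omega}, \quad \delta \in \, ]0, 1].
\]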

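Proof. $D_F^{(p)}(\delta, \mathcal{F}) = \sup\{ D_F^{(p)}(\delta, \mathcal{F}, \gamma) : \gamma \in \Gamma \}$, so let $\gamma \in \Gamma$, and let $G$ be the smallest set in $\mathcal{X}$ with $\gamma(G) = 1$. If $F(x) = 0$ for each $x \in G$, then $D_F^{(p)}(\delta, \mathcal{F}, \gamma) = 1$, since otherwise, by the definition

\[
D_F^{(p)}(\delta, \mathcal{F}, \gamma) := \sup\Big\{ m : f_1, \cdots, f_m \in \mathcal{F}, \text{ and for all } i \neq j,\ \int |f_i - f_j|^p \, d\gamma > \delta^p \int F^p \, d\gamma \Big\},
\]

we would have for different functions $f_i, f_j$:

\[
\begin{aligned}
0 = \delta^p \int F^p \, d\gamma &< \int |f_i - f_j|^p \, d\gamma \\
&= (\|f_i - f_j\|_{L^p(\gamma)})^p \\
&\leq (\|f_i\|_{L^p(\gamma)} + \|f_j\|_{L^p(\gamma)})^p \\
&\leq 2^p (\|F\|_{L^p(\gamma)})^p = 0,
\end{aligned}
\]

a contradiction. But if $D_F^{(p)}(\delta, \mathcal{F}, \gamma) = 1$ then for any $K \geq 1$

\[
D_F^{(p)}(\delta, \mathcal{F}, \gamma) \leq K \delta^{-p\omega}, \quad 0 < \delta \leq 1.
\]

So we may suppose that $F(x) > 0$ for some $x \in G$. Let $C(1), \cdots, C(m) \in \mathcal{C}$, with $m$ maximal and such that

\[
(\|F I_{C(i)} - F I_{C(j)}\|_{L^p(\gamma)})^p > \delta^p \int F^p \, d\gamma
\]

for all $i \neq j$. Let $Q = Q_\gamma$ be the probability measure defined by

\[
B \mapsto \Big( \int_B F^p \, d\gamma \Big) \Big/ \gamma(F^p);
\]

then, since $|F I_{C(i)} - F I_{C(j)}|(x) = F(x)$ if $x \in C(i) \setminus C(j)$ or $x \in C(j) \setminus C(i)$, and $0$ otherwise:

\[
Q(C(i) \Delta C(j)) = \Big( \int_{C(i) \Delta C(j)} F^p \, d\gamma \Big) \Big/ \gamma(F^p) > \delta^p.
\]

Then by maximality and theorem 3.2.1, there is a strictly positive constant $K < +\infty$ depending only on $\omega$ and $\mathcal{C}$ such that

\[
D_F^{(p)}(\delta, \mathcal{F}, \gamma) \leq m \leq D(\delta^p, \mathcal{C}, d_Q) \leq K(\omega, \mathcal{C}) \delta^{-p\omega}
\]

where $d_Q(A, B) := Q(A \Delta B)$ for $A, B \in \mathcal{A}$, as in definition 3.2.1.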

We now give the second corollary, as announced just after the central limit theorem (5.4.2).

Corollary 5.4.3 (Pollard). Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space and let $\mathcal{F}$ be a Suslin image admissible Vapnik–Cervonenkis subgraph class of functions with envelope $F \in L^2(\mathcal{X}, \mathcal{A}, P)$. Then $\mathcal{F}$ is a $P$–Donsker class.

Proof. This is an immediate consequence of theorem 5.4.2 and theorem 3.2.2 for $p = 2$.

Indeed, by theorem 3.2.2, $D_F^{(2)}(\varepsilon, \mathcal{F})$ is bounded by $A (2/\varepsilon^2)^W$ for any $W \geq S(\mathcal{C})$, with $A = A(S(\mathcal{C}), W) < +\infty$ a constant. By calculations similar to those carried out in corollary 5.4.2 one shows that

\[
\int_0^1 \sqrt{\log\big( D_F^{(2)}(\varepsilon, \mathcal{F}) \big)} \, d\varepsilon
\]

is finite, so that theorem 5.4.2 applies.

Appendix A

Topology and Measure Theory.

A.1 Metric and topological spaces.

A.1.1 Definitions.

Definition A.1.1. A pair $(S, \mathcal{T})$, where $S$ is a set and $\mathcal{T}$ a class of subsets of $S$, which satisfies:

i) $S$ and $\emptyset$ belong to $\mathcal{T}$,

ii) whenever $O_1, O_2 \in \mathcal{T}$, then $O_1 \cap O_2 \in \mathcal{T}$,

iii) if $I$ is any index set and $\mathcal{A} = \{ O_i \mid O_i \in \mathcal{T}, i \in I \}$, then $\cup \mathcal{A} = \cup_{i \in I} O_i \in \mathcal{T}$ too,

is said to be a topological space. $\mathcal{T}$ is called a topology on $S$.

Let $S$ be any set. A function

\[
d : S \times S \to \mathbb{R}^+ : (s, r) \mapsto d(s, r)
\]

is said to be a metric iff it has the properties:

M1) $d(s, r) = 0 \iff s = r$,

M2) $d(s, r) = d(r, s)$,

M3) $d(s, t) \leq d(s, r) + d(r, t)$

for all $s, r, t$ in $S$. The pair $(S, d)$ is called a metric space.

A topological space $(S, \mathcal{T})$ is said to be separable iff there is a countable set $D \subset S$ such that for any $s \in S$ and $s \in O \in \mathcal{T}$: $O \cap D \neq \emptyset$, i.e. the set $D$ is dense.


A subclass $\mathcal{B}$ of a topology $\mathcal{T}$ is said to be a base for $\mathcal{T}$ iff every $O \in \mathcal{T}$ is a union of elements of $\mathcal{B}$. A topological space $(S, \mathcal{T})$ is said to be second–countable or $A_2$ iff $\mathcal{T}$ has a countable base.

There are many other ways to introduce a topology on a set. We will discuss two of them which are used in this text.

For the first characterization we need the concept of closure (operator).

Definition A.1.2. Let $(S, \mathcal{T})$ be a topological space and $F \subset S$: $F$ is said to be closed in $S$ iff $S \setminus F \in \mathcal{T}$, i.e. the complement of $F$ is open.

Let $(S, \mathcal{T})$ be a topological space and $E \subset S$; the closure of $E$ in $S$ is the set

\[
\overline{E} = \mathrm{Cl}_S(E) = \bigcap \{ F \subset S : F \text{ is closed and } E \subset F \}.
\]

Theorem A.1.1. The operation $A \mapsto \overline{A}$ on a topological space $(S, \mathcal{T})$ satisfies the following properties:

a) $E \subset \overline{E}$;

b) $\overline{(\overline{E})} = \overline{E}$;

c) $\overline{A \cup B} = \overline{A} \cup \overline{B}$;

d) $\overline{\emptyset} = \emptyset$;

e) $E$ is closed in $S$ iff $\overline{E} = E$.

Moreover, given a set $S$ and a mapping $A \mapsto \overline{A}$ of $\mathcal{P}(S)$ into $\mathcal{P}(S)$ satisfying (a) through (d), if we define closed sets in $S$ by condition (e), then $S$ becomes a topological space and its closure operator is the operation we started with.

Proof. We refer to [Will] chapter 1 theorem 3.7 on page 25 for a proof.

The second characterization is by means of the neighborhoods of points.

Definition A.1.3. Let $(S, \mathcal{T})$ be a topological space and let $s \in S$; a neighborhood of $s$ is a set $U$ which contains an open set $V$, i.e. $V \in \mathcal{T}$, containing $s$. The collection of all neighborhoods of the point $s$ will be denoted by $\mathcal{V}(s)$ and is called a neighborhood system at $s$.

Theorem A.1.2. Let $(S, \mathcal{T})$ be a topological space and $s \in S$; the neighborhood system $\mathcal{V}(s)$ at $s$ has the following properties:

a) if $U \in \mathcal{V}(s)$, then $s \in U$;


b) for all U, V ∈ V(s): U ∩ V ∈ V(s);

c) for all U ∈ V(s) there exists a V ∈ V(s): U ∈ V(t) for all t ∈ V ;

d) for all U ∈ V(s) and V ⊂ S: if U ⊂ V , then V ∈ V(s);

e) $G \subset S$ is open iff $G$ contains a neighborhood of each of its points.

Conversely, if in a set $S$ a collection $\mathcal{V}(s)$ of subsets of $S$ is assigned to each $s \in S$ so as to satisfy (a) through (d), and if (e) is used to define "open" sets, the result is a topology on $S$ in which the neighborhood system at each $s \in S$ is precisely $\mathcal{V}(s)$.

Proof. We refer to [Will] chapter 1 theorem 4.2 on page 31 for a proof.

Here is a lemma which gives the relation between separable and second–countable topological spaces.

Lemma A.1.3. Let $(S, \mathcal{T})$ be a second–countable topological space, then $S$ is separable. If $\mathcal{T}$ is induced by a metric, i.e. $\mathcal{T}$ is the topology generated by the metric $d$, and $S$ is separable, then $S$ is also second–countable.

Proof. For the first assertion let $\mathcal{B}$ be a countable base, and from each non–empty $B \in \mathcal{B}$ pick some $b_B \in B$. The set $D := \{ b_B : B \in \mathcal{B} \}$ is at most countable, and its intersection with any non–empty open set is non void, because it contains an element from each non–empty set of the base of the topology. Thus $D$ is dense.

If $\mathcal{T}$ is induced by a metric, and $D$ is an at most countable dense set, then define

\[
\mathcal{B} := \{ B(b, 1/n) : b \in D, n \geq 1 \}.
\]

If $O$ is open then $O = \cup_{b \in O \cap D} B(b, 1/n_b)$, with $1/n_b$ as big as possible such that $B(b, 1/n_b) \subset O$. Indeed, for every $a \in O$: $B(a, 1/m) \subset O$ for some $m := m(a)$. But $D$ is dense, so $D \cap B(a, 1/(2m)) \neq \emptyset$. Thus for some $b \in D \cap O$: $a \in B(b, 1/(2m)) \subset O$ and $B(b, 1/(2m)) \subset B(b, 1/n_b)$.

We give here a definition of the important Hausdorff topological space.

Definition A.1.4. Let $(S, \mathcal{T})$ be a topological space. $(S, \mathcal{T})$ is called a Hausdorff topological space iff for every $x, y \in S$ with $x \neq y$ there exist open neighbourhoods $V_x \in \mathcal{V}(x)$ and $V_y \in \mathcal{V}(y)$ such that $V_x \cap V_y = \emptyset$.

Now we come to the definition of compact topological spaces.


Definition A.1.5. Let $(K, \mathcal{T})$ be a topological space. $(K, \mathcal{T})$ is called a compact topological space iff for every collection of open sets that covers $K$, there exists a finite open subcover.

There is an equivalent characterization in terms of (ultra)filters. Recall that a collection $\mathcal{W}$ of subsets of a set $X$ is said to be a filter iff it satisfies

a) $\mathcal{W} \neq \emptyset$ and $\emptyset \notin \mathcal{W}$;

b) for all $W, Z \subset X$: if $W \in \mathcal{W}$ and $W \subset Z$, then $Z \in \mathcal{W}$;

c) for all $W, Z \in \mathcal{W}$: $W \cap Z \in \mathcal{W}$.

A filter $\mathcal{U}$ which is maximal in the class of filters, i.e. $\mathcal{U} = \mathcal{M}$ for any other filter $\mathcal{M}$ such that $\mathcal{U} \subset \mathcal{M}$, is said to be an ultrafilter. Let $(S, \mathcal{T})$ be a topological space; a filter $\mathcal{W}$ is said to converge to $s \in S$ iff $\mathcal{V}(s) \subset \mathcal{W}$. Now we can state an equivalent condition for compactness of $S$: $S$ is a compact topological space iff every ultrafilter $\mathcal{U}$ on $S$ converges.

For a proof of the equivalence between these two apparently different definitions, see e.g. [Will] chapter 6 theorem 17.4 on page 118 or [Dud1] theorem 2.2.5 on page 36.

Proposition A.1.4. Let $(K, \mathcal{T})$ be a compact topological space and $(S, \mathcal{T}')$ any topological space. If $g$ is a continuous function from $K$ into $S$, then $g(K)$ is compact.

Proof. Let $\{ U_i : i \in I \}$, $I$ an index set, be an open cover of $g(K)$. Then $\{ g^{-1}(U_i) : i \in I \}$ is an open cover of $K$. Hence there exists a finite $J \subset I$ such that $\{ g^{-1}(U_j) : j \in J \}$ covers $K$. Because

\[
g(g^{-1}(U_j)) \subset U_j, \quad \text{we get} \quad g(K) \subset \cup_{j \in J} U_j.
\]

Two propositions about compactness and the Hausdorff property.

Proposition A.1.5. Let $(S, \mathcal{T})$ be a compact topological space. Then any closed subset $F$ of $S$ is compact. If $S$ is Hausdorff, then a compact subspace of $S$ is closed.

Proof. Let $F$ be closed in $S$. Let $\{ U_i : i \in I \}$ be open sets such that $F \subset \cup_{i \in I} U_i$. Then $S \subset (\cup_{i \in I} U_i) \cup (S \setminus F)$, and since $S$ is compact there exists a finite

\[
J \subset I : S \subset (\cup_{i \in J} U_i) \cup (S \setminus F),
\]

but then $F \subset \cup_{i \in J} U_i$, hence $F$ is compact.

Let $K$ be a compact set contained in $S$. Let $k \in \overline{K}$ and suppose $k \notin K$. Then it is enough to show that we can separate $k$ and $K$ by disjoint open neighbourhoods, which would contradict $k \in \overline{K}$. By the Hausdorff property, for each $l \in K$ there exist disjoint open neighbourhoods $V_l \in \mathcal{V}(l)$ and $W_l \in \mathcal{V}(k)$ of $l$ and $k$ respectively. Then $K \subset \cup_{l \in K} V_l$ and by compactness there exists a finite subset $L$ of $K$ such that $K \subset \cup_{l \in L} V_l$. The sets $V := \cup_{l \in L} V_l$ and $W := \cap_{l \in L} W_l$ remain open. The latter is an open neighbourhood of $k$ which has an empty intersection with $K$:

\[
W \cap K \subset W \cap \Big( \bigcup_{l \in L} V_l \Big) \subset \bigcup_{l \in L} (W \cap V_l) \subset \bigcup_{l \in L} (W_l \cap V_l) = \emptyset.
\]

Proposition A.1.6. Let $(S, \mathcal{T})$ be a Hausdorff topological space, $K \subset S$ compact and $y \in S \setminus K$. Then there exist disjoint (open) neighborhoods $W$ of $y$ and $V_K$ of $K$.

Proof. Let $k \in K$. Since $y \neq k$, we can choose disjoint open neighborhoods $V_k$ and $V_y^k$ of $k$ and $y$ respectively, by the Hausdorff property. Then $K \subset \cup_{k \in K} V_k$ and by compactness there exists a finite $L \subset K$ with $K \subset \cup_{k \in L} V_k$. Let $V_K := \cup_{k \in L} V_k$, so $V_K \in \mathcal{V}_S(K)$, and $W := \cap_{k \in L} V_y^k \in \mathcal{V}_S(y)$. Finally: $V_K \cap W = \emptyset$.

Theorem A.1.7. Let $K$ be a compact topological space, $S$ a Hausdorff topological space and $g$ a continuous bijection from $K$ onto $S$. Then $g$ is a homeomorphism, i.e. $g^{-1}$ is continuous too.

Proof. Let $G$ be an open set of $K$ and consider $(g^{-1})^{-1}(G) = g(G)$. Since $G$ is open, $K \setminus G =: F \subset K$ is compact, as a closed subspace of a compact space.

Since $g$ is continuous, $g(F)$ is compact in $S$, and hence closed (a compact set in a Hausdorff space is closed). Then $g(G) = S \setminus g(F)$ is open.

Products allow one to create new spaces from old ones. If the spaces in the beginning were topological spaces, then one can define a new topology on the product such that all projections are continuous, namely the product topology. This topology is the smallest which makes the projections continuous.

Definition A.1.6. Let $(S_i, \mathcal{T}_i)$, $i \in I$, be topological spaces, with $I$ an index set. Let $\prod_{i \in I} S_i$ denote the product of the spaces $S_i$. Let $\pi_j : \prod_{i \in I} S_i \to S_j$ be the projection onto $S_j$. The product topology on $\prod_{i \in I} S_i$, denoted by $\prod_{i \in I} \mathcal{T}_i$, is the topology generated by, or having as basis, the sets $\cap_{l=1}^{k} A_{m_l}$, where

\[
A_{m_l} := \pi_{m_l}^{-1}(G)
\]

for $G$ any open set of $(S_{m_l}, \mathcal{T}_{m_l})$. It is known as the coarsest (i.e. smallest) topology making all the projections $\pi_j$ continuous. It is the topology of pointwise convergence.

We state three propositions about properties that countable product spaces inherit from their (metric) factors: the first is about metrizability, the second about separability and the third about completeness.

Proposition A.1.8. Let $(S_n, d_n)$ be metric spaces for $n \geq 1$. The product space $S := \prod_{n=1}^{\infty} S_n$ equipped with the product topology is metrizable, e.g. by the metric

\[
d(\{x_n\}, \{y_n\}) := \sum_{n \geq 1} 2^{-n} f(d_n(x_n, y_n)),
\]

with $f(t) := t/(1 + t)$.

Proof. We refer to [Dud1] proposition 2.4.4 on page 50 or to [Will] chapter 7 theorem 22.3 on page 161 for a proof.
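To make the product metric concrete, here is a small sketch for sequences of real numbers, taking every factor metric to be $d_n(a, b) = |a - b|$ and truncating the series to the finitely many coordinates supplied. Both choices are assumptions made for illustration only.

```python
def f(t):
    """Bounded transform f(t) = t / (1 + t), increasing and subadditive on [0, inf)."""
    return t / (1.0 + t)


def product_metric(x, y):
    """Truncated d(x, y) = sum_{n>=1} 2^{-n} f(|x_n - y_n|) over the given coordinates."""
    return sum(
        2.0 ** (-(n + 1)) * f(abs(a - b)) for n, (a, b) in enumerate(zip(x, y))
    )


x = [0.0, 1.0, -2.0, 5.0, 0.5]
y = [1.0, 1.0, 3.0, -4.0, 0.25]
z = [10.0, -1.0, 0.0, 2.0, 7.0]

print(product_metric(x, y), product_metric(y, z), product_metric(x, z))
```

Because $f \leq 1$, the distance is always below $\sum_{n \geq 1} 2^{-n} = 1$, which is why coordinates far out in the sequence contribute very little.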

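Proposition A.1.9. Let $(S_n, \mathcal{T}_n)$ be (non empty) separable topological spaces for $n \geq 1$. Then $S := \prod_{n \geq 1} S_n$ with the product topology is separable.

Proof. We prove the proposition only for countable products. For an optimal result we refer to [Will] theorem 16.4(c) on page 109.

Let $D_n$ be a countable dense subset of $S_n$. We choose an $\{a_n\} \in S$ (this can be done, since by AC $S$ is non empty), and define $D$ as the union over all $n$ of $\prod_{j=1}^{n} D_j \times \prod_{j \geq n+1} \{a_j\}$; then obviously $D$ is at most countable. It is also dense. A set of the basis for the product topology is given by $\cap_{j=1}^{n} \pi_{i_j}^{-1}(A_{i_j})$, with $A_{i_j}$ non empty and open in $S_{i_j}$ and, say, $i_1 < \cdots < i_n$. Then

\[
\Big( \bigcap_{j=1}^{n} \pi_{i_j}^{-1}(A_{i_j}) \Big) \cap D \supset \bigcap_{j=1}^{n} \pi_{i_j}^{-1}(A_{i_j}) \cap \Big( \prod_{j=1}^{i_n} D_j \times \prod_{j \geq i_n + 1} \{a_j\} \Big)
\]

and the right-hand side is non empty: if $k \in \{i_1, \cdots, i_n\}$ then $D_k \cap A_k \neq \emptyset$; if $k \notin \{i_1, \cdots, i_n\}$ and $k \leq i_n$ the $k$-th coordinate may be any element of $D_k$; and for $k > i_n$, $S_k \cap \{a_k\} = \{a_k\}$. Hence $D$ has a non empty intersection with any set of the basis, thus also with any open neighbourhood of any point $\{s_n\}$ in $S$. So $D$ is dense.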

Since a countable product of metric spaces with the product topology is metrizable (proposition A.1.8), it makes sense to ask whether completeness carries over. This is the purpose of the next proposition.

Proposition A.1.10. Let $(S_n, d_n)$ be complete metric spaces; then the product space $S$ with the product topology and with the metric as in proposition A.1.8 is complete.

Proof. Let $\{x_m\}_{m \geq 1}$, with $x_m := \{s_{nm}\}_{n \geq 1} \in S$, be a Cauchy sequence in $S$ for $d$. For $n$ fixed, consider $\pi_n(x_m) = s_{nm}$. By proposition 2.4.3 on page 49 of [Dud1] the identity from $(S_n, f \circ d_n) \to (S_n, d_n)$ is uniformly continuous, so a sequence $\{y_k\}_{k \geq 1}$ in $S_n$ is Cauchy for $f \circ d_n$ iff it is Cauchy for $d_n$. From the definition of the metric $d$ on the product space $S$, it is easy to see that $\{\pi_n(x_m)\}_{m \geq 1}$ is a Cauchy sequence in $(S_n, f \circ d_n)$, therefore also in $(S_n, d_n)$, which is complete by assumption. Indeed, let $\varepsilon > 0$; then for any $p, q \geq N$ for some $N \geq 1$:

\[
2^{-n} f(d_n(s_{np}, s_{nq})) \leq \sum_{k \geq 1} 2^{-k} f(d_k(s_{kp}, s_{kq})) = d(x_p, x_q) < \varepsilon 2^{-n}.
\]

So coordinatewise we have a limit, say $x^{(n)}$. Let $x := \{x^{(n)}\}_{n \geq 1}$; finally we need to show $x_m \to x$ in $(S, d)$. Let $\eta > 0$ and fix the smallest $n$ so that $2^{-n} < \eta/2$, choose $N_i$, $1 \leq i \leq n$, such that $f(d_i(s_{ip}, x^{(i)})) < \eta/(2n)$ for all $p \geq N_i$ and let $N := \max_{1 \leq i \leq n} N_i + 1$; then for all $q \geq N$:

\[
\begin{aligned}
\sum_{k \geq 1} 2^{-k} f(d_k(s_{kq}, x^{(k)}))
&= \sum_{i=1}^{n} 2^{-i} f(d_i(s_{iq}, x^{(i)})) + 2^{-n} \sum_{j \geq 1} 2^{-j} f(d_{n+j}(s_{n+j,q}, x^{(n+j)})) \\
&\leq \sum_{i=1}^{n} \frac{\eta}{2n} + 2^{-n} \sum_{j \geq 1} 2^{-j} < \eta
\end{aligned}
\]

since $f \leq 1$.

Often real–valued sequences don't converge, but bounded monotone sequences do converge. For a bounded sequence one can therefore always define the following two limits.

Definition A.1.7. Let $\{x_n\}_{n \geq 1}$ be a bounded sequence of real numbers. Define $y_n := \sup_{k \geq n} x_k$. The $y_n$ form a bounded non increasing sequence, thus a convergent one, with limit $\inf_n y_n$. Denote $\limsup_n x_n := \lim_{n \to \infty} \sup_{k \geq n} x_k$; as said above the limit exists and equals $\inf_n y_n$. One can define $z_n := \inf_{k \geq n} x_k$; the $z_n$ then form a bounded non decreasing sequence, so it must converge to its supremum, $\sup_n z_n$. Then let $\liminf_n x_n := \lim_{n \to \infty} \inf_{k \geq n} x_k$. This lim inf then equals $\sup_n z_n$.

Lemma A.1.11. Let $\{x_n\}_{n \geq 1}$ be a bounded sequence in $\mathbb{R}$. If $y < \limsup_{n \to \infty} x_n$ then there exists a subsequence $\{x_{k(n)}\}_{n \geq 1}$ with $x_{k(n)} > y$ for all $n$.

Proof. By contradiction, suppose such a subsequence doesn't exist. Then $x_n > y$ for only finitely many indices, so there exists an $n_0$ with $x_n \leq y$ for all $n \geq n_0$, and thus $\sup_{k \geq n} x_k \leq y$ for all $n \geq n_0$, implying that $\limsup_n x_n \leq y$, a contradiction.
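The tail suprema $y_n$ and tail infima $z_n$ from definition A.1.7 can be computed for a finite stretch of a sequence. The sketch below uses the hypothetical example $x_n = (-1)^n (1 + 1/n)$, whose lim sup is $1$ and lim inf is $-1$; a finite list is of course only a stand-in for an infinite sequence.

```python
def tail_suprema(xs):
    """y_n = sup_{k >= n} x_k, computed over a finite list by a backward scan."""
    out, running = [], float("-inf")
    for x in reversed(xs):
        running = max(running, x)
        out.append(running)
    return out[::-1]


def tail_infima(xs):
    """z_n = inf_{k >= n} x_k, computed over a finite list by a backward scan."""
    out, running = [], float("inf")
    for x in reversed(xs):
        running = min(running, x)
        out.append(running)
    return out[::-1]


# Hypothetical example sequence x_n = (-1)^n (1 + 1/n), n = 1, ..., 2000.
xs = [(-1) ** n * (1.0 + 1.0 / n) for n in range(1, 2001)]
ys = tail_suprema(xs)
zs = tail_infima(xs)
print(ys[1000], zs[1000])
```

The printed values are close to $1$ and $-1$ respectively, and the lists `ys` and `zs` are monotone, as the definition predicts.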


Lemma A.1.12. Let $R, S$ be topological spaces, where $S$ moreover is Hausdorff, and $f$ a continuous function from $R$ into $S$. Let $F_n$ be closed sets of $R$ for $n \in \mathbb{N}$ and $K$ a compact set of $R$ such that $F_n \downarrow K$ as $n \to \infty$. If for every open $U \supset K$ there is an $n$ with $F_n \subset U$, then:

\[
f[K] = \bigcap_{n \geq 1} f[F_n] = \bigcap_{n \geq 1} \overline{f[F_n]}.
\]

Proof. The inclusions from left to right follow from $K = \cap_n F_n$ and

\[
f[K] = f\Big[ \bigcap_{n \geq 1} F_n \Big] \subset \bigcap_{n \geq 1} f[F_n] \subset \bigcap_{n \geq 1} \overline{f[F_n]}.
\]

For the converse inclusions, we propose two ways. For both it is enough to show

\[
f[K] \supset \cap_n \overline{f[F_n]}.
\]

$\supset$: shorter proof. By considering the complements, one has to show:

\[
S \setminus f[K] \subset \cup_n \, S \setminus \overline{f[F_n]}.
\]

Let $y \in S \setminus f[K]$. Since $f$ is continuous and $K$ compact, $f[K]$ is compact too (proposition A.1.4). In any Hausdorff space, a compact set and a point not belonging to that compact set can be separated by disjoint open neighborhoods (proposition A.1.6). Hence there are $V \in \mathcal{V}_S(f[K])$ and $W \in \mathcal{V}_S(y)$ open and disjoint. Since $f$ is continuous, there exists an open neighborhood $U \in \mathcal{V}_R(K)$ with $(f[K] \subset) f[U] \subset V$. Now by the condition, for some $N$ large enough, $F_n \subset U$ for $n \geq N$. Hence $f[F_n] \subset f[U] \subset V$, and $y \notin \overline{f[F_n]}$ (otherwise the open neighborhood $W$ of $y$ would meet $f[F_n] \subset V$, contradicting $V \cap W = \emptyset$). So for $n \geq N$:

\[
y \in S \setminus \overline{f[F_n]} \subset \cup_{n \geq N} S \setminus \overline{f[F_n]} \subset \cup_n \, S \setminus \overline{f[F_n]}.
\]

⊃: longer proof. So take y ∈ ∩nf [Fn] and suppose that for every x ∈ K has

open neighborhood Vx : y /∈ f [Vx]. The Vx form an open cover of the compactset K, let Vxi ; i = 1, · · · , n be an open subcover, which we will denote by U .Now since U is open, U ⊃ K and f [U ] = ∪ni=1f [Vxi ] by definition of the closureoperator, y /∈ f [U ]. But the condition on the Fn implies Fn ⊂ U from a certain,fixed index n on. Thus y /∈ f [U ] ⊃ f [Fn] ⊃ f [Fn] = f [Fn], a contradiction.

So take x ∈ K such that y ∈ f [V ]¯ for every V open neighborhood of x. Wecan drop the closure because if y /∈ f [V ] for some V open neighborhood of x, thenf(x) 6= y and by the Hausdorff property there are open disjoint neighborhoods

V1, V2 of f(x), y. But then V C2 is a closed neighborhood of f(x) and f−1(V1) ⊂

f−1(V C2 ) where the first is open, the second closed and both neighborhoods of x.

f [f−1(V1)] ⊂ f [f−1(V C2 )] ⊂ V C

2 = V C2

which contradicts $y \in \overline{f[V]}$ for all open neighborhoods $V$ of $x$. Then we claim that for that $x$ we have $f(x) = y$. If this were not true, then again by the Hausdorff property there would be disjoint open neighborhoods $W, Z$ of $f(x)$ and $y$ respectively. But then $f^{-1}(W)$ is an open neighborhood of $x$ such that $y \in f[f^{-1}(W)]$, a contradiction with the choice of $W$ and $Z$. So $f(x) = y$, and hence $y \in f[K]$.

Definition A.1.8. Let $(S, d)$, $(S', d')$ be two metric spaces and $f : S \to S'$ a function. Then $f$ is said to be $k$–Lipschitz continuous iff for all $r, s \in S$:

\[
d'(f(r), f(s)) \leq k \, d(r, s).
\]

Lemma A.1.13. Let $(S, d)$ be a metric space and $A$ any non–empty subset of $S$.

i) The function

\[
d(x, A) := \inf\{ d(x, y) : y \in A \}
\]

is well defined (any non–empty subset of $\mathbb{R}$ that is bounded from below has a greatest lower bound, which is the same as the infimum) and is 1–Lipschitz continuous.

ii) The function $g_k(x) := \max(1 - k \, d(x, A), 0)$ is $k$–Lipschitz, $k = 1, 2, \cdots$.

Proof.

i) We have to show that |d(x,A)−d(y, A)| ≤ d(x, y). Note first that d(x,A) ≤d(x, z) for any z ∈ A, so that by the triangle inequality it follows:

d(x,A) ≤ d(x, z) ≤ d(x, y) + d(y, z).

Because this holds for all z ∈ A, we get d(x,A) − d(y, A) ≤ d(x, y).Interchanging the roles of x and y gives the desired result.

ii) If gk(x) = 0 = gk(y) there is nothing to prove, if gk(x) > 0 and gk(y) > 0

|gk(x)− gk(y)| = k|d(x,A)− d(y, A)| ≤ kd(x, y),

by step (i). Finally if gk(x) = 0 and gk(y) > 0, then

1− kd(x,A) ≤ 0 ⇐⇒ 1 ≤ kd(x,A)

0 < 1− kd(y, A) ⇐⇒ kd(y, A) < 1

kd(y, A) < 1 ≤ kd(x,A) ⇐⇒ 0 < 1− kd(y, A) ≤ kd(x,A)− kd(y, A)


so

|gk(y)| = |1− kd(y, A)| ≤ k|d(x,A)− d(y, A)| ≤ kd(x, y).

Hence gk is k–Lipschitz.
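Both Lipschitz properties can be checked numerically on random points. The sketch below works in the Euclidean plane with a finite set $A$; the random sample, the seed and the helper names are illustrative assumptions.

```python
import math
import random


def dist_to_set(x, A):
    """d(x, A) = inf{d(x, y) : y in A}; a minimum here because A is finite."""
    return min(math.dist(x, a) for a in A)


def g(k, x, A):
    """g_k(x) = max(1 - k * d(x, A), 0)."""
    return max(1.0 - k * dist_to_set(x, A), 0.0)


random.seed(0)
# Hypothetical finite set A and sample points in the plane.
A = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
points = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(40)]


def worst_violation(k):
    """Largest value of |g_k(x) - g_k(y)| - k * d(x, y) over all sample pairs."""
    return max(
        abs(g(k, x, A) - g(k, y, A)) - k * math.dist(x, y)
        for x in points
        for y in points
    )


print(worst_violation(1), worst_violation(3))
```

A non positive value (up to rounding) for every $k$ is what the lemma predicts; the functions $g_k$ decrease to the indicator of the closure of $A$ as $k$ grows.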

Definition A.1.9. A topological space $(R, \mathcal{T})$ is said to be Polish iff there exists a metric $d$ that metrizes the topology and such that $d$ is a complete and separable metric. A separable measurable space $(Y, \mathcal{S})$ will be called a Suslin space iff there exists a Polish space $R$ and a Borel measurable map from $R$ onto $Y$. If $(Y, \mathcal{S})$ is a measurable space, a subset $Z$ of $Y$ will be called a Suslin set iff $(Z, Z \sqcap \mathcal{S})$ is a Suslin space, where $Z \sqcap \mathcal{S} := \{ Z \cap S : S \in \mathcal{S} \}$ is the relative σ–algebra on $Z$.

Let $(\mathcal{X}, \mathcal{B})$ be a given measurable space and $M \subset \mathcal{X}$. $M$ is called universally measurable (u.m.) iff for every probability measure $P$ on $\mathcal{B}$, $M$ is measurable for the completion of $P$. In other words, for every such $P$ there exist $A, B \in \mathcal{B}$ with $A \subset M \subset B$ and $P(A) = P(B)$.

Polish spaces behave well under taking (at most countable) products, as is seen in the following lemma.

Lemma A.1.14. Let $(S_i, d_i)$, $i \geq 1$, be Polish spaces; then $S := \prod_{i \geq 1} S_i$ with the product topology is also Polish.

Proof. By proposition A.1.8 the product topology is metrizable. The product of countably many separable spaces remains separable (proposition A.1.9). Finally, as seen in proposition A.1.10, the countable product remains complete. So as claimed, $S$ with the product topology is metrizable by a complete and separable metric.

Lemma A.1.15. Let $(Y, \mathcal{S})$ be a Suslin measurable space; then $Y^k$ with the product σ–algebra remains a Suslin measurable space.

Proof. Since $(Y, \mathcal{S})$ is a Suslin measurable space, by definition there exists a Polish space $(S, d)$ and a Borel measurable map $b$ from $(S, d)$ onto $(Y, \mathcal{S})$. $(S^k, d_{sum})$ is again Polish (lemma A.1.14) and

\[
(b, \cdots, b) : (S^k, d_{sum}) \to (Y^k, \otimes_{i=1}^{k} \mathcal{S})
\]

is measurable. $(Y^k, \otimes_{i=1}^{k} \mathcal{S})$, as a finite product of separable measurable spaces, remains separable (theorem A.2.11).

A.1.2 Some important theorems.

The first theorem, due to Tychonoff, describes compact sets in arbitrary product spaces. It is a very powerful theorem. Actually it even happens to be equivalent to the Axiom of Choice.

Theorem A.1.16 (Tychonoff). Let (Kj, Tj)j∈J be a family of compact topological spaces, where J is an index set. Then the product space $(\prod_{j} K_j, \prod_{j} \mathcal{T}_j)$ endowed with the product topology is compact too.

Proof. We refer to [Dud1] theorem 2.2.8 page 39 or [Will] chapter 6 theorem 17.8 page 120 for a proof.

Compact sets play an important role everywhere in mathematics. In $\mathbb{R}^d$, $\mathbb{C}^d$ (with the usual Euclidean metric) compact sets are those sets that are closed and bounded. More generally, in uniform spaces compactness is equivalent to completeness together with total boundedness. Here we deal primarily with function spaces, e.g. the space of all real–valued continuous functions or all real–valued cadlag (right continuous with left limits) functions on some (compact) (Hausdorff) topological space (K, T ). The next theorem, of Arzela–Ascoli, completely characterizes relative compactness in function spaces. It will turn out that the concept of equicontinuity plays an important role. Before going to the theorem we give a definition of equicontinuity.

Definition A.1.10. Let (S, d) be a metric space and F ⊂ C(S). F is said to be equicontinuous at s ∈ S iff for each ε > 0 there is a δ > 0 such that for all t ∈ S with d(s, t) < δ:

|f(s)− f(t)| < ε for all f ∈ F .

If F is equicontinuous at every s ∈ S, then F is said to be equicontinuous. If for any ε > 0 the δ > 0 in the definition of equicontinuity can be chosen independently of s, i.e. it is suitable for all s, t ∈ S less than δ away from each other, then F is called uniformly equicontinuous.

Theorem A.1.17. Let (K, d) be a compact metric space and (S, e) a metric space. Any equicontinuous family F of functions from K into S is uniformly equicontinuous.

Proof. Suppose by contradiction that this is not the case. Then from the negation of the definition of a uniformly equicontinuous family it follows that there exists an ε > 0 such that for every δn = 1/n > 0 there are xn, yn ∈ K and fn ∈ F such that

d(xn, yn) < 1/n, but e(fn(xn), fn(yn)) ≥ ε

By compactness of K we choose a subsequence {xk(n)}n≥1 that converges to some x ∈ K. Because d(xk(n), yk(n)) < 1/k(n) → 0, we also have yk(n) → x. For n large enough, by equicontinuity at x one has

e(fk(n)(xk(n)), fk(n)(x)) < ε/2 and e(fk(n)(yk(n)), fk(n)(x)) < ε/2

This would imply e(fk(n)(xk(n)), fk(n)(yk(n))) < ε, a contradiction.

A first corollary of theorem A.1.17 is that continuous functions on compact domains are in fact uniformly continuous. And this remains true for any finite family of continuous functions.

Corollary A.1.11. Let (K, d) be a compact metric space, (S, e) a metric space and f a continuous function from K into S. Then f is uniformly continuous.

Proof. By the previous theorem, A.1.17, it is enough to show that the family {f} is equicontinuous. But in this case this reduces to the continuity of f .

Theorem A.1.18 (Arzela–Ascoli). Let (K, e) be a compact metric space and F ⊂ C(K), where C(K) is the space of all real–valued continuous functions equipped with the uniform topology (induced by the uniform metric). F is totally bounded in (C(K), d∞) iff F is uniformly bounded (i.e. bounded for d∞) and equicontinuous.

Proof. Assume that F is totally bounded. We will first prove the (uniform) equicontinuity of F . Let ε > 0, then there are f1, · · · , fn ∈ F such that $\mathcal{F} \subset \cup_{j=1}^n B_{d_\infty}(f_j, \varepsilon)$. Each of the functions fj is a continuous function from a compact space to a metric space, and so is uniformly continuous. The set {f1, · · · , fn} is finite. That it is uniformly equicontinuous is seen as follows: for ε > 0, choose δj > 0 such that

|fj(x)− fj(y)| < ε whenever e(x, y) < δj .

Let δ := min{δj(ε) : j = 1, · · · , n}, then δ > 0 and for j = 1, · · · , n:

|fj(x)− fj(y)| < ε whenever e(x, y) < δ.

For every f ∈ F choose j such that d∞(f, fj) < ε. With δ as above, if e(x, y) < δ then:

|f(x)− f(y)| ≤ |f(x)− fj(x)|+ |fj(x)− fj(y)|+ |fj(y)− f(y)| < 3ε.

F is totally bounded, hence also bounded. For ε = 1 there are f1, · · · , fm ∈ F such that $\mathcal{F} \subset \cup_{i=1}^m B_{d_\infty}(f_i, 1)$. Let M := max{d∞(fi, fj) : 1 ≤ i < j ≤ m} < +∞. Then obviously $\mathcal{F} \subset B_{d_\infty}(f_1, M + 1)$.

Conversely, let F be equicontinuous and uniformly bounded, then by theorem A.1.17 F is uniformly equicontinuous. Because F is uniformly bounded there exists 0 < M < +∞ such that |f(x)| < M for all x ∈ K and f ∈ F . The set [−M,M ] is compact. Also $C(K) \subset \mathbb{R}^K := \{g : K \to \mathbb{R} \text{ function}\}$. So $\mathcal{F} \subset \mathbb{R}^K$ and since f(x) ∈ [−M,M ] for each x ∈ K, f ∈ F , one sees that $\mathcal{F} \subset [-M,M]^K$, which is a compact space by Tychonoff’s theorem (A.1.16). Taking the closure of F in the product topology of $\mathbb{R}^K$ (or the relative topology in $[-M,M]^K$) and denoting it by G, one has that G, as a closed subset of a compact set, is again compact. G will also inherit the uniform equicontinuity from F . For every ε > 0, let

\[ A := \{ k \in \mathbb{R}^K : |k(x) - k(y)| \le \varepsilon \text{ for every } x, y \in K \}. \]

This set is closed in $\mathbb{R}^K$. Let

\[ A_{x,y} := \{ k \in \mathbb{R}^K : |k(x) - k(y)| \le \varepsilon \}. \]

If Ax,y is closed for every x, y ∈ K then A = ∩x,y∈K Ax,y will be too. We claim that Ax,y is closed for the product topology. Indeed, consider h ∈ cl(Ax,y), where the closure is taken for the topology of $\mathbb{R}^K$. Then Ax,y has a non–empty intersection with any (open) neighbourhood of h, by definition A.1.2. We need to show that

|h(x)− h(y)| ≤ ε.

In the product topology basic open sets are of the form $\cap_{i=1}^n \pi_{x_i}^{-1}(O_i)$ where Oi is open in R, see definition A.1.6. Take here the open neighbourhood

\[ V_{x,y,n} := \{ g \in \mathbb{R}^K : |g(x) - h(x)| < 1/(2n) \} \cap \{ g \in \mathbb{R}^K : |g(y) - h(y)| < 1/(2n) \}. \]

Then, since h ∈ cl(Ax,y), Vx,y,n ∩ Ax,y ≠ ∅; for each n choose gn ∈ Vx,y,n ∩ Ax,y, so

|h(x)− h(y)| ≤ |gn(x)− h(x)|+ |gn(y)− gn(x)|+ |gn(y)− h(y)| ≤ ε+ 1/n.

Letting n → ∞ gives |h(x)− h(y)| ≤ ε, i.e. h ∈ Ax,y.

F was assumed to be uniformly equicontinuous. So for ε > 0 there is a δ > 0 such that for all x, y ∈ K with e(x, y) < δ and all f ∈ F : |f(x) − f(y)| ≤ ε. As above the set

\[ \{ k \in \mathbb{R}^K : |k(x) - k(y)| \le \varepsilon \text{ for every } x, y \in K \text{ with } e(x, y) < \delta \} \]

is closed in the product topology, and includes F , thus includes G too. Hence, repeating the previous argument for every ε > 0, G is seen to be uniformly equicontinuous.

The last step will be to prove that G is compact for the uniform topology. So let U be an ultrafilter in G. Because G is compact for the product topology, U converges to some g ∈ G, i.e. U contains the neighbourhood filter (in product topology) of g. G was seen to be uniformly equicontinuous; for ε > 0, let δ > 0 be such that whenever

e(x, y) < δ then |f(x)− f(y)| ≤ ε/4 < ε/3

for all f ∈ G. K is compact, in particular totally bounded; let S be a finite subset of K such that for any y ∈ K : e(x, y) < δ for some x ∈ S. Let

\[ U := \{ k \in \mathbb{R}^K : |k(x) - g(x)| < \varepsilon/3 \text{ for all } x \in S \} = \bigcap_{x \in S} \pi_x^{-1}\big( ]g(x) - \varepsilon/3,\ g(x) + \varepsilon/3[ \big), \]

so the set U is open for the product topology on $\mathbb{R}^K$ and contains g. Hence this set is an (open) neighbourhood of g and therefore belongs to the ultrafilter. Take $B_{d_\infty}(g, \varepsilon)$. If $U \cap \mathcal{G} \subset B_{d_\infty}(g, \varepsilon)$, then $B_{d_\infty}(g, \varepsilon)$ belongs to the ultrafilter as well, meaning that the ultrafilter converges to g in the uniform topology. This in turn would imply compactness of G for the uniform topology. Compactness implies total boundedness, which is trivially inherited by subspaces, so F would be totally bounded. Thus it suffices to prove the inclusion $U \cap \mathcal{G} \subset B_{d_\infty}(g, \varepsilon)$ to finish the proof. Let k ∈ U ∩ G; for any y ∈ K pick x ∈ S with e(x, y) < δ, then:

|k(y)− g(y)| ≤ |k(y)− k(x)|+ |k(x)− g(x)|+ |g(x)− g(y)| ≤ ε/4 + ε/3 + ε/4 < ε.

The terms |k(y) − k(x)| and |g(x) − g(y)| are small due to the uniform equicontinuity of G, the middle one because k ∈ U . Since the above bound holds uniformly over all y ∈ K, the inclusion is proved.

Definition A.1.12. A set V is said to be a real vector space iff there are two operations, denoted + and .

+ : V × V → V : (v,w) ↦ v + w,

. : R× V → V : (r,v) ↦ r.v

on V such that (V,+) is an abelian group and

(ab).v = a(b.v) and 1.v = v

and

(a+ b).v = a.v + b.v

a.(v + w) = a.v + a.w

hold for all a, b, r ∈ R,v,w ∈ V .


A triple (R,+, ·), with R a set and two operations, + and ·, defined on it, is said to be a ring iff

+ : R×R→ R : (r, s) ↦ r + s,

· : R×R→ R : (r, s) ↦ r · s

are such that (R,+) is an abelian group, (R, ·) a semigroup with unity and

(r + s) · t = r · t+ s · t

t · (r + s) = t · r + t · s

for all r, s, t ∈ R. A quadruple (A,+, ·, .) is said to be an R–algebra iff (A,+, ·) is a ring, (A,+, .) is an R–vector space and

r.(a · b) = (r.a) · b

for all r ∈ R; a, b ∈ A.

A vector space V is said to be a vector lattice if for every v ∈ V

v+ := max(v,0) ∈ V.

The following lemma provides an explanation of the definition of vector lattice. Usually a lattice is a set together with a partial order in which any two elements have a unique supremum and infimum.

Lemma A.1.19. Let F be a vector lattice. Then max(f, g),min(f, g) ∈ F for all f, g ∈ F .

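Proof. Let f, g ∈ F , it is easy to see that:

\[ \max(f, g) = \tfrac{1}{2}\big[(f + g) + |f - g|\big], \qquad \min(f, g) = \tfrac{1}{2}\big[(f + g) - |f - g|\big]. \]

So it suffices to show that |f − g| ∈ F . Now |f − g| equals f − g where f ≥ g, and g − f where f ≤ g. Then

\[ (f - g)\, I_{\{f \ge g\}} = \max(f - g, 0) = (f - g)^+, \qquad (g - f)\, I_{\{f \le g\}} = \max(g - f, 0) = (g - f)^+, \]

so that $|f - g| = (f - g)^+ + (g - f)^+ \in \mathcal{F}$.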
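The identities used in the proof of lemma A.1.19 can be checked pointwise on real numbers. The following is a minimal sketch, in which the helper names (`pos_part`, `abs_via_pos`, `lattice_max`, `lattice_min`) are illustrative and not part of the text; it builds max and min using only vector space operations and the positive part.

```python
def pos_part(v):
    """Positive part v^+ = max(v, 0)."""
    return max(v, 0.0)

def abs_via_pos(u, v):
    """|u - v| written as (u - v)^+ + (v - u)^+."""
    return pos_part(u - v) + pos_part(v - u)

def lattice_max(u, v):
    """max(u, v) = ((u + v) + |u - v|) / 2."""
    return 0.5 * ((u + v) + abs_via_pos(u, v))

def lattice_min(u, v):
    """min(u, v) = ((u + v) - |u - v|) / 2."""
    return 0.5 * ((u + v) - abs_via_pos(u, v))
```

Applied pointwise to the values f(x) and g(x), this is exactly the reduction of max(f, g) and min(f, g) to the operation v ↦ v+.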


The following theorem, due to Stone and Weierstrass, where the latter proved it for C([0, 1]) and the former generalized it, states conditions on a subset F ⊂ C(K) = Cb(K), for K some compact topological space, such that it is dense in the uniform topology. Since weak convergence of variables is actually pointwise convergence of functionals on Cb(S), the Stone–Weierstrass theorem provides a way to find convergence determining sets.

Theorem A.1.20 (Stone–Weierstrass). Let (K, T ) be any compact Hausdorff topological space and F ⊂ C(K) with the uniform topology, i.e. the topology induced by $d_\infty(f, g) = \sup_{x \in K} |f(x) - g(x)|$.

If F is an algebra (definition A.1.12), separates points in K, i.e. for every x, y ∈ K with x ≠ y there exists an f ∈ F with f(x) ≠ f(y), and contains the constants; then F is dense in (C(K), d∞).

Proof. We refer to [Dud1] theorem 2.4.11 page 54 or [Will] chapter 10 theorem 44.5 page 291 for a proof.

Here is another version of the Stone–Weierstrass theorem, in which different restrictions are put on the class F .

Theorem A.1.21. Let (K, T ) be any compact Hausdorff topological space and F ⊂ C(K) with the uniform topology, i.e. the topology induced by $d_\infty(f, g) = \sup_{x \in K} |f(x) - g(x)|$.

If F is a vector lattice (definition A.1.12), separates points in K, i.e. for every x, y ∈ K with x ≠ y there exists an f ∈ F with f(x) ≠ f(y), and contains the constants; then F is dense in (C(K), d∞).

Proof. We refer to the book by J.O. Jameson, Topology and Normed Spaces, Chapman and Hall, London, 1974, page 263 for a proof.

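Lemma A.1.22. Let T be a metric space and $\mathcal{F} \subset C_b(\ell^\infty(T))$,

\[ \mathcal{F} := \{ h : \ell^\infty(T) \to \mathbb{R} : z \mapsto h(z) := G(z(t_1), \cdots, z(t_k)) \ :\ G \in C_b(\mathbb{R}^k);\ t_i \in T,\ i = 1, \cdots, k;\ k \in \mathbb{N} \}. \]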

Then F is an algebra, a lattice and a vector space.

Proof. This is easy to see.

The next lemma shows that one can not only approximate on compact sets, but that the approximation can be extended to some open neighborhood of K of the form Kδ := {s ∈ S : d(s,K) < δ}.

Lemma A.1.23. Let (S, d) be a metric space, K ⊂ S compact and F a subalgebra of Cb(S). Then for all f ∈ Cb(S) and any ε > 0 there exist a δ > 0 and F ∈ F such that |f(x)− F (x)| ≤ ε/3 for all x ∈ Kδ.

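Proof. By the Stone–Weierstrass theorem (A.1.20) f ∈ Cb(S) can be uniformly approximated on K by some F ∈ F .

Let

\[ K^\delta := \{ s \in S : d(s, K) < \delta \}. \]

Then for any δ > 0

\[ K \subset K^\delta \subset \bigcup_{x \in K} B_{(S,d)}(x, \delta), \]

so, since K is compact, there exists a finite subset $\{x_1^\delta, \cdots, x_{N_\delta}^\delta\} \subset K$ with

\[ K \subset \bigcup_{i=1}^{N_\delta} B_{(S,d)}(x_i^\delta, \delta). \]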

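Now we would like to have $K^\delta \subset \cup_{i=1}^{N_\delta} B_{(S,d)}(x_i^\delta, \eta)$ too for some η > 0. Let η = 2δ, then $K^\delta \subset \cup_{i=1}^{N_\delta} B_{(S,d)}(x_i^\delta, 2\delta)$. Indeed, note that

\[ K \subset \bigcup_{i=1}^{N_\delta} B_{(S,d)}(x_i^\delta, \delta) \subset \bigcup_{i=1}^{N_\delta} B_{(S,d)}(x_i^\delta, 2\delta) \]

and if $y \in K^\delta \setminus K$:

\[ d(x_i^\delta, y) \le d(k, y) + d(k, x_i^\delta) < 2\delta, \]

where we choose k ∈ K : d(y,K) = d(y, k) < δ and choose $x_i^\delta$ such that $k \in B_{(S,d)}(x_i^\delta, \delta) \subset B_{(S,d)}(x_i^\delta, 2\delta)$. Since f, F are both continuous functions on the compact set K, we know that f, F are uniformly continuous on K (corollary A.1.11). For ε/24 choose δ > 0 such that

|f(x)− f(y)| < ε/24 and |F (x)− F (y)| < ε/24,

for all x, y ∈ K : d(x, y) < 2δ. Then on K : |f(x)− F (x)| ≤ ε/4 < ε/3 and for $z \in K^\delta \setminus K$, if $d(x_i^\delta, z) < 2\delta$:

\[ |f(z) - F(z)| \le |f(z) - f(x_i^\delta)| + |f(x_i^\delta) - F(x_i^\delta)| + |F(x_i^\delta) - F(z)| \le \varepsilon/24 + \varepsilon/4 + \varepsilon/24 \le \varepsilon/3. \]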

Hence as claimed: |f(x)− F (x)| ≤ ε/3, x ∈ Kδ.


A.2 Measure Theory.

A.2.1 Rings, algebra’s, σ–algebra’s and (outer) measures.

Let us first recall some classes of sets, which are closed under certain operations.

Definition A.2.1. Let X be a set and A ⊂ P(X ), where P(X ) is the powerset of X . If A satisfies the following properties:

i) ∅ ∈ A;

ii) for every A,B ∈ A : A ∪B ∈ A;

iii) for every A,B ∈ A : A\B ∈ A;

then A is said to be a ring. If A satisfies (i), (ii), (iii) and if one also has X ∈ A, then A is called an algebra. And if A is an algebra such that it is closed under countable unions, then A is said to be a σ–algebra.

In some cases, starting from some class of sets, we can explicitly state a formula for the algebra generated by that class, i.e. the smallest algebra which contains that particular class.

Proposition A.2.1. Let X be a set and A1, · · · , An subsets of X . Let A be the smallest algebra containing Ai, 1 ≤ i ≤ n. Then

\[ \mathcal{A} = \Big\{ \bigcup_{j \in J} F(j) : J \subset \{0,1\}^n \Big\} =: \mathcal{F}, \]

where for $j \in \{0,1\}^n$, $F(j) := \cap_{i=1}^n A_i^{j_i}$, with $A_i^1 = A_i^c$ and $A_i^0 = A_i$.

Proof. Note that F ⊂ A, since elements of F are finite unions of finite intersections of the Ai or their complements. Next we prove F ⊃ A, by showing that F is an algebra containing each of the Ai.

X is contained in F , because $\mathcal{X} = \cup_{j \in \{0,1\}^n} F(j)$. Indeed, let $J_1 := \{ j \in \{0,1\}^n : j_n = 0 \}$ and $J_2 := \{ j \in \{0,1\}^n : j_n = 1 \}$, then J1 and J2 are disjoint and

\[ \bigcup_{j \in \{0,1\}^n} F(j) = \bigcup_{j \in J_1} F(j) \cup \bigcup_{j \in J_2} F(j). \]

For l ∈ J1 take $\tilde{l} := (0, \cdots, 0, 1) + l$ in J2. Then

\[ F(l) \cup F(\tilde{l}) = \bigcap_{i=1}^{n-1} A_i^{l_i}. \]


So $\cup_{j \in \{0,1\}^n} F(j) = \cup_{l \in \{0,1\}^{n-1}} F(l)$. If we continue, we end up with the union of the two sets A1 and $A_1^c$, which is X as claimed.

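That F is closed under finite unions is trivial.

We have to verify the second axiom: closed under complementation. Let $F := \cup_{j \in J} F(j)$ be an element of F . We claim that $F^c = (\cup_{j \in J} F(j))^c = \cup_{j \in J^c} F(j)$. Since

\[ F^c = \mathcal{X} \setminus F = \Big[ \Big(\bigcup_{j \in J} F(j)\Big) \cup \Big(\bigcup_{j \in J^c} F(j)\Big) \Big] \setminus F = \emptyset \cup \Big(\bigcup_{j \in J^c} F(j)\Big) \setminus F, \]

we have $F^c \subset \cup_{j \in J^c} F(j)$. Now the converse is also true; to see this let $x \in \cup_{j \in J^c} F(j)$, then x ∈ F (p) for some p ∈ J^c, which means that $x \in \cap_{i=1}^n A_i^{p_i}$. Moreover

\[ F^c = \bigcap_{l \in J} F(l)^c = \bigcap_{l \in J} \bigcup_{i=1}^n A_i^{1 - l_i}. \]

Let q ∈ J , then for some k ∈ {1, · · · , n} : qk ≠ pk, implying 1 − qk = pk. But since $x \in A_k^{p_k}$, then $x \in A_k^{1 - q_k}$, hence $x \in F(q)^c$, and this for any q ∈ J . Hence x ∈ F^c.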

Note that the minimal algebra containing a given finite number of sets, say n, has cardinality at most $2^{2^n}$, and is thus finite.
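The formula of proposition A.2.1 translates directly into a small computation on a finite set. The sketch below is only an illustration under the assumption that X is finite; the function names `atoms` and `generated_algebra` are hypothetical. It forms the non-empty sets F(j) and then all their unions.

```python
from itertools import combinations, product

def atoms(X, sets):
    """Non-empty sets F(j), the intersections of A_i^{j_i} with A_i^0 = A_i, A_i^1 = complement."""
    X = frozenset(X)
    result = set()
    for j in product((0, 1), repeat=len(sets)):
        atom = X
        for ji, A in zip(j, sets):
            A = frozenset(A)
            atom = atom & (A if ji == 0 else X - A)
        if atom:
            result.add(atom)
    return result

def generated_algebra(X, sets):
    """All unions of the sets F(j): the smallest algebra containing the given sets."""
    at = list(atoms(X, sets))
    algebra = set()
    for r in range(len(at) + 1):
        for combo in combinations(at, r):
            algebra.add(frozenset().union(*combo))
    return algebra
```

For instance, with X = {0, ..., 5}, A1 = {0, 1, 2} and A2 = {2, 3} all four sets F(j) are non-empty, so the generated algebra attains the bound of 2^(2^2) = 16 elements.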

Elements of a σ–algebra are called measurable sets. If f : (X1,A1) → (X2,A2) is a function such that f−1(A2) ∈ A1 for all A2 ∈ A2, then f is said to be measurable A1/A2. But sometimes it may happen that the σ–algebra on the codomain is too large to have measurability of the map. One way to obtain measurability is to allow a slightly bigger σ–algebra on the domain. But first we recall the notions of measure, outer measure and completion of a measure.

Here is a proposition about continuous functions and Borel measurability. Recall that the Borel σ–algebra is the σ–algebra generated by the open sets (the topology).

Proposition A.2.2. Let (R,Z), (S, T ) be topological spaces and B1 := B(Z), B2 := B(T ) their respective Borel σ–algebras. If f : (R,Z)→ (S, T ) is continuous, then it is also B1/B2 measurable.

Proof. It suffices to note that the class

Q := {B ∈ B2 : f−1(B) ∈ B1}

is a σ–algebra and by continuity of f contains T . Hence

Q ⊂ B2 = σ(T ) ⊂ σ(Q) = Q.

In other words f is Borel measurable.

Definition A.2.2. Let (X ,A) be a measurable space. If µ is a measure on A, i.e. µ(·) ≥ 0 on A and µ is σ–additive (if Ai ∈ A, i = 1, 2, · · · are disjoint sets, then $\mu(\cup_i A_i) = \sum_{i \ge 1} \mu(A_i)$), then we define the outer measure of µ as:

\[ \mu^*(E) := \inf\Big\{ \sum_{n=1}^{\infty} \mu(A_n) : A_n \in \mathcal{A},\ E \subset \bigcup_{n \ge 1} A_n \Big\} \]

or +∞ if no such sequence An exists.

Let (X ,A, µ) be a measure space. The measure theoretic completion of the measure µ is defined as follows: one extends µ to the smallest σ–algebra containing A and the null sets of µ, i.e. the sets E ⊂ X : µ∗(E) = 0. By proposition 3.3.2 page 102 in [Dud1] it is the same as adding to A all sets E ⊂ X which differ only by a set of measure zero from some set in A. More formally one adds all E ⊂ X such that for some B ∈ A the symmetric difference E∆B := (E\B) ∪ (B\E) is negligible (i.e. of measure zero).

A measure µ is said to be σ–finite iff there exists {An}n≥1 ⊂ A with ∪n≥1An = X and µ(An) < +∞ for all n ≥ 1.

Definition A.2.3. A function f from a measurable space (X ,B) into another measurable space (Z,A) is called universally measurable (u.m.) iff f−1(A) is universally measurable in (X ,B) for every A ∈ A.

The class of all universally measurable sets is a σ–algebra.

Lemma A.2.3. Let (X ,A) be a measurable space. The universally measurable sets form a σ–algebra.

Proof.

i) The first proof.

It is trivial that the class of universally measurable sets contains the whole space. It contains also complements: let U ⊂ X be a u.m. set, then for any probability measure P on (X ,A) there are sets A ⊂ U ⊂ B; A,B ∈ A and $P(B \setminus A) = P(B \cap A^C) = 0$. Then one obviously has $A^C \supset U^C \supset B^C$ with $A^C, B^C \in \mathcal{A}$ and $P(A^C \setminus B^C) = P(A^C \cap (B^C)^C) = 0$. And if Ui, i = 1, 2, · · · are u.m. sets, then for any P there are Ai, Bi ∈ A : Ai ⊂ Ui ⊂ Bi with P (Bi\Ai) = 0. Then ∪iAi ⊂ ∪iUi ⊂ ∪iBi, and ∪iAi,∪iBi ∈ A. Finally note that

\[ P\Big( \bigcup_i B_i \setminus \bigcup_j A_j \Big) \le \sum_{i=1}^{\infty} P\Big( B_i \setminus \bigcup_j A_j \Big) \le \sum_{i=1}^{\infty} P(B_i \setminus A_i) = 0. \]


ii) The second proof.

By definition, universally measurable sets are sets contained in the completion of every probability measure on A. Still by definition (A.2.2), universally measurable sets are those sets which lie in

\[ \bigcap_{\mu} \sigma(\mathcal{A} \cup \mathcal{N}_\mu), \]

where µ ranges over the probability measures on A and $\mathcal{N}_\mu$ are the null sets of µ. An intersection of σ–algebra’s remains a σ–algebra.

The next lemma shows that u.m. sets are preserved through inverse images of measurable functions.

Lemma A.2.4. Let (D,D) and (G,G) be two measurable spaces and g a measurable function from D into G. If U is a u.m. set of G, then g−1(U) is a u.m. set of D.

Proof. Let P be a probability measure on (D,D), then the image measure of P under g, denoted by $P_g := P \circ g^{-1}$, is also a probability measure. Thus by the definition of universal measurability there are

A,B ∈ G : A ⊂ U ⊂ B and Pg(B\A) = 0.

By measurability of g:

g−1(A), g−1(B) ∈ D and also g−1(A) ⊂ g−1(U) ⊂ g−1(B).

Finally P (g−1(B)\g−1(A)) = P (g−1(B\A)) = Pg(B\A) = 0.

A σ–algebra is closed under countable intersections, unions, symmetric differences and so on. So starting with a class A of subsets of a set X , it is not surprising that each element of σ(A) is constructed out of countably many operations (with countably many sets of A). In the following lemma this is made rigorous.

Lemma A.2.5. Let X be a set and A a class of subsets of X . For each B ∈ σ(A) there exists a countable subclass AB of A such that B ∈ σ(AB).


Proof. Define the following class of sets

C := {B ∈ σ(A) : there is a countable subclass AB ⊂ A : B ∈ σ(AB)}

For B ∈ A clearly B ∈ σ({B}). So A ⊂ C. To finish the proof it will be enough to show that C is a σ–algebra, because then, by definition, C ⊂ σ(A) and, by definition of σ(A) (the smallest σ–algebra that contains A), C ⊃ σ(A).

X is always contained in any σ–algebra by definition, thus X ∈ C, because X ∈ σ({A}) for any A ∈ A. If B ∈ C, then there is a countable AB : B ∈ σ(AB), but then Bc ∈ σ(AB) and Bc ∈ C. Finally if Bn, n = 1, 2, · · · are elements of C then ∪nBn ∈ σ(∪nABn), where ∪nABn is again countable, and so C is preserved under countable unions. Thus C is a σ–algebra.

Lemma A.2.6. Let (R, T ) and (S,U) be two topological spaces. Denote their respective Borel σ–algebras by B(R, T ) and B(S,U). Then the Borel σ–algebra B(R × S, T × U) on the product space (R × S, T × U) contains the product σ–algebra B(R, T )⊗ B(S,U). Moreover if both topological spaces are second–countable, then the two σ–algebras on R× S coincide.

Proof. We refer to [Bill2] theorem M.10 on page 243 or [Dud1] proposition 4.1.7 page 119.

Here is a lemma needed in chapter 4 of our exposition.

Lemma A.2.7. Let (X ,B) be a separable measurable space, i.e. {x} ∈ B for all x ∈ X and B is generated by countably many sets {Aj}j≥1. Let I := [0, 1], then

\[ f : \mathcal{X} \to 2^\infty : x \mapsto \{ I_{A_j}(x) \}_{j \ge 1}, \]

is a 1–1 function onto its range f(X ) =: Z.

Proof. We present two proofs, one more elementary and the other based on determining classes for measures on σ–algebra’s.

i) Let A := {Aj}j≥1 be a countable set of generators for B. Consider the map

\[ f : \mathcal{X} \to 2^\infty : x \mapsto \{ I_{A_j}(x) \}_{j \ge 1}. \]

Suppose, by contradiction, that f would not be injective. Then for some x ≠ y ∈ X we would have x ∈ Ai ⇐⇒ y ∈ Ai for all i ∈ N.

We define the class

{A ∈ B : {x, y} ⊂ A or {x, y} ∩ A = ∅},

i.e. the sets of B which either contain both x and y, or are such that both x and y are in the complement. It contains, by assumption, the generating sets. It then suffices to prove that the above class is a σ–algebra, since then it would equal B, which is a contradiction because {x} ∈ B does not belong to this class.

It certainly contains the whole space X . It is also closed under complements. Because either {x, y} ⊂ A, and then {x, y} ∩ Ac = ∅, or {x, y} ∩ A = ∅, but then {x, y} ⊂ Ac. Finally consider sets Bn in the above class. Two situations are possible: either {x, y} ⊂ Bk for some k, then {x, y} ⊂ ∪nBn; or for all n, {x, y} ∩ Bn = ∅, but then {x, y} ∩ (∪nBn) = ∅.

ii) Denote by Fn the algebra generated by A1, · · · , An. Proposition A.2.1 tells us that Fn is finite. So without loss of generality we may and do assume that B is generated by ∪nFn, which consists of at most countably many sets. Since Fn ⊂ Fn+m for m ≥ 1 and all Fn are algebra’s, it follows that ∪nFn is also an algebra, which will be denoted by A0.

Suppose again that f is not injective, so that IAn(x) = IAn(y) for all n ≥ 1 and some x ≠ y. As seen in the proof of A.2.1, the F (j) form a partition of X . Hence IC(x) = IC(y) for all C ∈ A0.

Now we define two measures µ1 and µ2 on B, where µ1 := δx and µ2 := δy are two Dirac measures. The previous discussion implies that µ1 and µ2 are equal on the generating algebra A0 of B. By the uniqueness of extension of measures, theorem A.2.15, µ1 and µ2 agree on B. Hence we can not differentiate x from y by sets of B, a contradiction with the separability of B.

The empirical measures are (normalized) sums of i.i.d. random variables of a special kind, namely projections. Because of that their σ–algebra enjoys a nice property: the sets it contains are invariant under permutation of the first n coordinates for Pn. Here is a more general lemma about the class of sets invariant under the permutations of the first n coordinates.

Lemma A.2.8. Let (X∞,A∞, P∞) denote the standard model and Pn the n–th empirical measure. Then the class C of all sets invariant under permutations of the first n coordinates is a σ–algebra, and thus contains σ(Pn(f)), the smallest σ–algebra which makes Pn(f) measurable, where f is a real–valued measurable function.

Proof. Permutations are in particular bijections so clearly X∞ ∈ C. If A ∈ C and Ac is not invariant under a permutation of the first n coordinates, then for some x ∈ Ac and Π ∈ Sym(n), where Sym(n) stands for the symmetric group on n elements, (xΠ(1), · · · , xΠ(n), xn+1, · · · ) ∈ A; but A is invariant, so x ∈ A, a contradiction.

If Am ∈ C,m ≥ 1, we obviously have ∪Am ∈ C.
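The invariance that makes Pn(f) measurable with respect to C can be seen in a small computation. This is a minimal sketch, assuming Pn(f)(x) = (1/n) Σ f(xi) over the first n coordinates; the helper names `empirical_mean` and `permute_first` are illustrative, and exact rational arithmetic is used so that equality can be tested literally.

```python
from fractions import Fraction
from itertools import permutations

def empirical_mean(f, x, n):
    """P_n(f)(x) = (1/n) * sum of f(x_i) over the first n coordinates of x."""
    return sum(Fraction(f(xi)) for xi in x[:n]) / n

def permute_first(x, perm):
    """Permute the first len(perm) coordinates of x, leaving the rest untouched."""
    return tuple(x[i] for i in perm) + tuple(x[len(perm):])

def is_permutation_invariant(f, x, n):
    """Check that P_n(f) takes the same value on every permutation of x_1, ..., x_n."""
    base = empirical_mean(f, x, n)
    return all(
        empirical_mean(f, permute_first(x, p), n) == base
        for p in permutations(range(n))
    )
```

Since Pn(f) is constant on the orbits of Sym(n), every set of the form {Pn(f) ∈ B} is invariant under permutations of the first n coordinates.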

The theorem below is about a property of the empirical measure: in some sense empirical measures contain all the information one needs in order to do statistics on them.

Theorem A.2.9. For any measurable space (X ,A) and for each n = 1, 2, · · · , the empirical measure Pn is sufficient for $\mathcal{P}^n$, where $\mathcal{P}$ is the class of all laws on (X ,A). In other words the set H of functions x ↦ Pn(B)(x) for all B ∈ A, is sufficient.

In fact the σ–algebra SH, i.e. the smallest σ–algebra making all functions of H measurable, is exactly Sn (the σ–algebra of all sets of $\mathcal{A}^n$ which are invariant under permutation of the n coordinates).

Proof. We refer to [Dud2] theorem 5.1.9 on page 177.

Next we recall the Monotone Class Theorem. It provides a way to prove that certain classes are σ–algebras, without having to verify all the axioms of a σ–algebra.

Definition A.2.4. Let X be a set, and M a family of subsets of X . M is said to be a monotone class iff for all {An}n≥1 ⊂ M: if An ↑ A, i.e. A1 ⊂ A2 ⊂ · · · and A = ∪nAn, or An ↓ A, i.e. A1 ⊃ A2 ⊃ · · · and A = ∩nAn, then A ∈M too.

Theorem A.2.10 (Monotone Class Theorem). Let (X ,A) be a measurable space. Let C be an algebra that generates A. If B is a monotone class, and if C is contained in B, then B also contains A.

Proof. Note that it is enough to prove that the smallest monotone class, say M, that contains C is a σ–algebra, namely σ(C) = A. Indeed, every σ–algebra is a monotone class, so M ⊂ A, and then A = M ⊂ N for N any monotone class containing C.

Let D := {E ∈ M : X\E ∈ M}. Then, because C is an algebra, one has C ⊂ D. And D is a monotone class. Indeed let An ∈ D, n = 1, 2, · · · and An ↑ A, then $A_n^C \in \mathcal{M}$ by definition of D and moreover $A^C = (\cup A_n)^C = \cap A_n^C$. Now because $A_n^C \downarrow A^C$ one has $A^C \in \mathcal{M}$, and then by definition A ∈ D. The same reasoning applies to a sequence An ↓ A. This proves that D ⊂ M is a monotone class that contains C. By minimality of M one has M ⊂ D, so both are equal and M is closed under taking complements.

Now for each set Y ⊂ X , let

MY := {E ∈M : E ∩ Y ∈M}.

Then for each C ∈ C, because C is an algebra, and in particular stable under intersections, C ⊂ MC . Also MC ⊂ M is a monotone class, so by minimality of M we have that M ⊂ MC , thus they are the same. That MC is a monotone class follows from the following considerations. Let An ∈ MC , n = 1, 2, · · · and An ↑ A (the case An ↓ A is quite analogous). But then An ∩ C ↑ A ∩ C, and since An ∩ C ∈ M we have by the monotone class property of M that A ∩ C ∈ M. So by definition of MC one has A ∈MC .

For every C ∈ C we have the important equality MC = M. Next take any B ∈ M (not necessarily contained in C), and note that MB contains C. Indeed by our previous considerations we have that MC = M for every C ∈ C, which means that C ∩ B ∈ M for every C ∈ C. Also, as in the previous step one can show that MB is a monotone class. So M = MB for any B ∈ M, meaning that M is closed under (finite) intersections.

Thus M is not only a monotone class, but also an algebra (it contains the whole space: X ∈ C ⊂ M, and it is closed under complements and under (finite) intersections). We claim now that M is a σ–algebra. It remains only to prove that M is closed under (arbitrary) countable intersections. So let An ∈ M (where not necessarily An ↓ A for some A) then:

\[ \bigcap_{n \ge 1} A_n = \bigcap_{n \ge 1} \Big( \bigcap_{i=1}^{n} A_i \Big) = \bigcap_{n \ge 1} B_n \]

with $B_n = \cap_{i=1}^n A_i \in \mathcal{M}$, since M is an algebra, and note that Bn ↓ ∩n≥1An; the last set belongs to M by the monotone class property.

As a consequence of the Monotone Class Theorem one shows:

Theorem A.2.11. Let (X ,A), (Y,B) be two measurable spaces. Let A be generated by the countable algebra {Ai : i ∈ N} and B by {Bi : i ∈ N}. Then the product σ–algebra A⊗ B is again countably generated, by {Ai ×Bj : i, j ∈ N}.

Proof. In order to see this, define, for fixed Ai and Bj , the following classes of sets:

DAi := {B ∈ B : Ai ×B ∈ σ(Ak ×Bl : k, l ≥ 1)},

DBj := {A ∈ A : A×Bj ∈ σ(Ak ×Bl : k, l ≥ 1)}.

Then DAi (respectively DBj ) contains the algebra {Bi : i ∈ N} (respectively {Ai : i ∈ N}) and is a monotone class, so it equals B (respectively A). Before moving on let us prove our last assertion. So let Cn ∈ DAi (⊂ B), n = 1, 2, · · · for some i ∈ N and Cn ↑ C (the case Cn ↓ C is treated in an analogous way). Then Ai × Cn ↑ Ai × C, and since each Ai × Cn belongs to the σ–algebra σ(Ak ×Bl : k, l ≥ 1), so does Ai × C; hence C ∈ DAi .

Next define

DA := {B ∈ B : A×B ∈ σ(Ak ×Bl : k, l ≥ 1)},

DB := {A ∈ A : A×B ∈ σ(Ak ×Bl : k, l ≥ 1)}.

By the previous step for each Ai; i = 1, 2, · · · : DAi = B, and if we fix any B ∈ B, then we know that:

Ai ×B ∈ σ(Ak ×Bl : k, l ≥ 1).

Hence DB contains again the A–generating algebra {Ai : i ∈ N}, and it is also a monotone class (this is done as in the previous step), so it equals A. (Repeating the argument one also obtains that for any A ∈ A : DA = B.) Thus A × B is contained in σ(Ak ×Bl : k, l ≥ 1) and thus the product σ–algebra

σ(A× B) = A⊗ B

is countably generated.

Proving theorems about measurable functions is often easier when one first proves them for simple functions and then passes to the limit, since measurable functions can be written as the everywhere pointwise limit of simple functions.

Theorem A.2.12. Let (X ,B) be a measurable space and f a real–valued function on X , measurable B/B(R). There exists a sequence {fn}n≥1 of simple functions, each measurable B/B(R), such that

0 ≤ fn(x) ↑ f(x) if f(x) ≥ 0,

and

0 ≥ fn(x) ↓ f(x) if f(x) ≤ 0.

Proof. We refer to [Bill1] chapter 2 theorem 13.5 on page 195 or [Dud1] theorem 4.1.5 on page 116 for a proof.
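One common way to obtain such a sequence is the dyadic construction fn = min(n, ⌊2^n f⌋/2^n) on {f ≥ 0}, mirrored on {f < 0}. The sketch below uses this construction as an assumption (the cited proofs may be organised differently) and evaluates it at a single value v = f(x); the name `simple_approx` is illustrative.

```python
import math

def simple_approx(value, n):
    """n-th dyadic approximation of a real value v = f(x).

    For v >= 0 this is min(n, floor(2^n * v) / 2^n), which takes only finitely
    many values, is non-decreasing in n and is within 2^-n of v once n > v.
    For v < 0 the construction is mirrored, giving a non-increasing sequence.
    """
    if value >= 0:
        return min(float(n), math.floor(value * 2 ** n) / 2 ** n)
    return -simple_approx(-value, n)
```

Composing `simple_approx(·, n)` with a measurable f gives a simple measurable function, because each level set is the preimage under f of a dyadic interval.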

The next theorem is due to Lebesgue. It relates each Borel class to a measurable function on the product I × X . Recall from chapter 5, section 1 that F0 is the class of all finite sums $\sum_{j=1}^n q_j I_{A_j}$ with rational qj, n = 1, 2, · · · and Aj ∈ A, the algebra generating B. W is still an uncountable well–ordered set where each segment Wa for a ∈ W is at most countable. For w ∈ W , Fw is the class of all functions f which are the everywhere limit of functions from Fv, v < w. And U := ∪v∈WFv. We will need the following definition.

Definition A.2.5. Let (L,<) be a linearly ordered set. We define the order topology on (L,<) as the topology with as base the sets of the form Ax := {a : a < x}, Bx := {a : a > x} or Cx,y := {a : x < a < y}, for all x, y ∈ L.

Theorem A.2.13 (Lebesgue). Let (X ,B) be a separable measurable space. For any w ∈ (W,≤) there exists a universal class w (or α) function G : I × X → R, i.e. for all f ∈ Fw, there is a t ∈ I such that G(t, ·) = f(·).

Proof. The proof will be based on transfinite induction.

Let w = 0, where 0 denotes the smallest element of (W,≤). We start by noting that F0 as defined above is countable. Let f ∈ F0, thus $f = \sum_{j=1}^n q_j I_{A_j}$. For fixed n we only have the choice out of $\mathbb{Q}^n \times \mathcal{A}^n$; since both Q and A are at most countable, $\mathbb{Q}^n \times \mathcal{A}^n$ is at most countable too, and $\cup_n \mathbb{Q}^n \times \mathcal{A}^n$ is still at most countable. Now we enumerate the functions of F0 by the positive integers as {fk}k≥1. Let

\[ G(t, x) := \begin{cases} f_k(x), & \text{if } t = 1/k \text{ for some } k \ge 1; \\ 0, & \text{otherwise.} \end{cases} \]

Then $G(t, x) = \sum_{k \ge 1} I_{\{1/k\}}(t) f_k(x)$.

Next, let (0 <) w ∈ (W,≤) and suppose the result holds for all v < w. Since there are at most countably many v’s smaller than w, we can number them with the positive integers, from the smallest to the greatest. We consider two possibilities, either the segment is finite or countable.

When the segment is finite define

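\[ G : I^\infty \times \mathcal{X} \to \mathbb{R} : (\{t_n\}_{n \ge 1}, x) \mapsto \begin{cases} \limsup_{n \to \infty} H(t_n, x), & \text{if the limit exists;} \\ 0, & \text{otherwise,} \end{cases} \]

where $H := G_{\max\{v : v < w\}}$.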

• As such G is (jointly) measurable, since it is the limit of the functions

\[ \sup_{k \ge n} \big( H \circ (\pi_k, id_{\mathcal{X}}) \big), \]

which are measurable, hence it is measurable. The function 0 is measurable too. And as seen in the proof of theorem 4.2.5 on page 127 in [Dud1], the set of points where a sequence of (real–valued) measurable functions fails to converge is measurable. So G is indeed measurable.

• Let f ∈ Fw, then f is the everywhere limit of a sequence fn, and without loss of generality all fn can be taken in $\mathcal{F}_{\max\{v : v < w\}}$. For each fn there exists a tn such that H(tn, x) = fn(x) for all x ∈ X ; then with t := {tn}n≥1 we have:

\[ f(x) = \lim_{n \to \infty} f_n(x) = \limsup_{n \to \infty} H(t_n, x) = G(t, x), \]

because the lim sup equals the limit when the limit exists, and this holds for all x ∈ X .

By the Borel isomorphism theorem C.1.1, and since I and I∞ are Polish spaces (lemma A.1.14), there exists a Borel measurable map φ : I → I∞ with Borel measurable inverse. The composition of G and (φ, idX ) remains measurable. So we replace G by G ∘ (φ, idX ), which is defined on I × X .

Consider now the w which have countable segment. As in the previous case, let

\[ G : I^\infty \times \mathcal{X} \to \mathbb{R} : (\{t_n\}_{n \ge 1}, x) \mapsto \limsup_{n \to \infty} G_{v_n}(t_n, x) \]

whenever the lim sup exists, and zero otherwise, where {vn} is a sequence converging to w for the order topology, definition A.2.5. Then as such Gw is jointly measurable, and the sections are the functions from Fw.

• We start with the measurability of Gw. I∞ is metrizable, proposition A.1.8, and separable, proposition A.1.9. Moreover its Borel σ–algebra equals the product σ–algebra, lemma A.2.14. In particular the projections

\[ (\pi_k, id_{\mathcal{X}}) : I^\infty \times \mathcal{X} \to I \times \mathcal{X} \]

are measurable functions. The functions $G_{v_k} : I \times \mathcal{X} \to \mathbb{R}$ are also (jointly) measurable, so that the composition $G_{v_k} \circ (\pi_k, id_{\mathcal{X}})$ is again (jointly) measurable. Since, by definition A.1.7,

\[ \limsup_{n \to \infty} G_{v_n}(t_n, x) = \lim_{n \to \infty} \Big( \sup_{k \ge n} \big( G_{v_k} \circ (\pi_k, id_{\mathcal{X}}) \big)(\{t_j\}_{j \ge 1}, x) \Big), \]

it follows that the LHS is measurable, because it is the limit, when it exists, of the measurable variables $\sup_{k \ge n} G_{v_k} \circ (\pi_k, id_{\mathcal{X}})$.

• For the second point, it is clear that the sections of G are by definition elements of $\mathcal{F}_w$.

Conversely, any element of $\mathcal{F}_w$ can be written as a section of G for some $t \in I^\infty$. Let $f \in \mathcal{F}_w$; by definition, there exists a sequence of measurable functions $\{f_n\}_{n\ge 1}$ such that f is the everywhere limit of $f_n$ and such that $f_n \in \mathcal{F}_{v_n}$ for some $v_{f_n} := v_n < w$. If all $v_n$ are strictly smaller than some $v < w$, then the assertion is trivial. Indeed, the segment of w is at most countable, so all its elements can be numbered by the positive integers. Because $f \in \mathcal{F}_z$ for all $v \le z < w$, by our induction hypothesis there exists for each such z a $t_z \in I$ with $G_z(t_z, \cdot) = f$. So

\[ f = \limsup_{z\to w} G_z(t_z, \cdot) =: G(\{t_z\}_{z<w}, \cdot), \]

where $\limsup_{z\to w}$ means $\limsup_{n\to\infty}$ along the numbering of the segment of w, which is at most countable.

Now the second case, where $f_n \in \mathcal{F}_{u_n}$ and $u_n$ is strictly increasing to w. Let u be the smallest among the $u_n$ (u exists, since (W,≤) is well ordered); u equals $v_k$ for some k. For each fixed m, the maximum $\max_{i=1}^{m} u_i$ exists and is not w. Indeed, w has a countably infinite segment and $\{u_i : 1 \le i \le m\}$ is finite, so $W_w \setminus \{u_i : 1 \le i \le m\}$ is non-empty. For $u_1$, take the smallest $v_n$ greater than or equal to $u_1$ and denote it by $v_{n(1)}$; this will be our starting point, and it doesn't affect the lim sup. So $f_1 \in \mathcal{F}_{u_1} \subset \mathcal{F}_{v_{n(1)}}$, hence by induction there exists a $t_1 \in I$ for which $f_1 = G_{v_{n(1)}}(t_1, \cdot)$. For $u_2$, again there is a smallest $v_k$ greater than or equal to $u_2$, so $f_2 \in \mathcal{F}_{u_2} \subset \mathcal{F}_{v_{n(2)}}$ and $f_2 = G_{v_{n(2)}}(t_{n(2)}, \cdot)$. For all l between 1 and n(2) we know that $u_1 < v_l$; hence for each such l, let $t_l$ be such that $f_1 = G_{v_l}(t_l, \cdot)$. If we continue by recursion, we obtain a sequence in which some terms are possibly repeated finitely many times; this doesn't change the convergence, and the limit will still be f. Moreover, for $t = \{t_i\}_{i\ge 1}$, where the $t_i$ are defined as above, we have

\[ f(x) = \lim_{n\to\infty} f_n(x) = \limsup_{n\to\infty} G_{v_n}(t_n, x) =: G(\{t_n\}_{n\ge 1}, x). \]

Repeat the operation from the previous case to find a jointly measurable function with domain $I \times \mathcal{X}$ and the same properties as G.

Lemma A.2.14. Let I = [0, 1], and $I^\infty = \prod_{n\ge 1} I_n$. Then the Borel σ–algebra on $I^\infty$ equals the product σ–algebra.

Proof. Recall that the product σ–algebra is the σ–algebra generated by sets $A \subset I^\infty$ of the form $\prod_{i\ge 1} A_i$, where $A_i \ne I$ for at most finitely many i, and if $A_i \ne I$, then $A_i \in \mathcal{B}(I)$. We first prove that $\sigma(\prod_n \mathcal{T}_n)$ includes the product σ–algebra, since this will hold for any index set and not just countable ones.

Recall that the product topology is the coarsest (or smallest) topology making all the projections

\[ \pi_j : \prod_{k\ge 1} (I_k, \mathcal{T}_{\mathrm{Eucl}}) \to (I_j, \mathcal{T}_{\mathrm{Eucl}}) \]

continuous. In particular, the projections are all $\mathcal{B}(I^\infty)/\mathcal{B}(I)$ measurable (proposition A.2.2). Also the product σ–algebra is the smallest σ–algebra making all the projections $\pi_j$ Borel measurable. Indeed, it's easily seen that the projections are measurable for the product σ–algebra, since for $E \in \mathcal{B}(I)$:

\[ \pi_j(\pi_k^{-1}(E)) = \begin{cases} E & \text{if } k = j, \\ [0,1] & \text{if } k \ne j. \end{cases} \]

If $\mathcal{C}$ is any σ–algebra on $I^\infty$ making the projections Borel measurable, then it has to contain the sets A of the form described above, and hence it has to contain the product σ–algebra. We conclude, by minimality of the product σ–algebra:

\[ \bigotimes_{k\ge 1} \mathcal{B}(I_k, \mathcal{T}_{\mathrm{Eucl}}) \subset \mathcal{B}(I^\infty, \mathcal{T}^\infty). \]

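It is for the converse that the notion of separability will be crucial. Since $I^\infty$ is separable (proposition A.1.9), each open set is an at most countable union of sets of the basis, which are of the form described in definition A.1.6. It is then easy to see that any open set $G \in \prod_{n\ge 1}\mathcal{T}_n$ belongs to the σ–algebra generated by $\bigcup_{k\ge 1}\big(\prod_{l=1}^{k} \mathcal{B}(I)_l \times \prod_{l\ge k+1} [0,1]_l\big)$. So

\[ \sigma\Big(\bigcup_{n\ge 1}\Big(\prod_{l=1}^{n} \mathcal{B}(I)_l \times \prod_{l\ge n+1} [0,1]_l\Big)\Big) \supset \sigma\Big(\prod_n \mathcal{T}_n\Big). \]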

The following two theorems are about the extension, and the uniqueness of that extension, of a (probability) measure defined on some class of sets satisfying certain properties to the σ–algebra generated by that class.

We first mention the extension theorem.

Theorem A.2.15. Let µ be a set function on a semiring C such that µ is nonnegative, µ(∅) = 0, and µ is finitely additive and countably subadditive. Then µ extends uniquely to a measure on σ(C).


Next comes the uniqueness theorem for probability measures.

Theorem A.2.16. Let µ1, µ2 be two probability measures on σ(P), where P is a π–system, i.e. for all A, B ∈ P: A ∩ B ∈ P. If µ1 and µ2 agree on P, then they agree on σ(P).


The theorem of Tonelli–Fubini tells us under what conditions an integral with respect to a product measure can be seen as an iterated integral w.r.t. the measures on the components of the product space.

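Theorem A.2.17 (Tonelli–Fubini). Let $(X_i, \mathcal{A}_i, \mu_i)$, i = 1, 2, be two measure spaces, where the $\mu_i$ are σ–finite measures. Let $f : X_1 \times X_2 \to [0,+\infty]$ be an $\mathcal{A}_1\otimes\mathcal{A}_2 / \mathcal{B}([0,+\infty])$ measurable function, or let $f \in L^1(X_1\times X_2, \mathcal{A}_1\otimes\mathcal{A}_2, \mu_1\times\mu_2)$. Then

\[ \int f \, d(\mu_1\times\mu_2) = \int\!\!\int f(x,y)\, d\mu_1(x)\, d\mu_2(y) = \int\!\!\int f(x,y)\, d\mu_2(y)\, d\mu_1(x) \]

whenever one of the integrals exists. In particular $\int f(x,\cdot)\, d\mu_1(x)$ is $\mathcal{A}_2$ measurable and $\int f(\cdot,y)\, d\mu_2(y)$ is $\mathcal{A}_1$ measurable.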

Proof. We refer to [Dud1] theorem 4.4.5. page 137 for a proof.

Limit theorems.

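Theorem A.2.18 (Monotone Convergence). Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space and $f_n : (\mathcal{X}, \mathcal{A}) \to [-\infty,+\infty]$, n = 1, 2, ..., measurable functions. If $f_n \uparrow f$ and $\int f_1\, d\mu > -\infty$, then

\[ \int f_n\, d\mu \uparrow \int f\, d\mu. \]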


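Theorem A.2.19 (Dominated convergence). Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space and $g, f_n \in L^1(\mathcal{X}, \mathcal{A}, \mu)$, n = 1, 2, .... If $|f_n(x)| \le g(x)$ and if for all $x \in \mathcal{X}$: $f_n(x) \to f(x)$ as $n \to \infty$, then $f \in L^1(\mathcal{X}, \mathcal{A}, \mu)$ and

\[ \int f_n\, d\mu \to \int f\, d\mu. \]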


Theorem A.2.20 (Strong Law of Large Numbers). Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space and $X_1, X_2, \dots$ independent and identically distributed (abbreviated i.i.d.) random variables. Let $S_n := \sum_{i=1}^{n} X_i$. Then $S_n/n \to E[X_1]$ P–a.s. iff $E[|X_1|] < +\infty$.


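Theorem A.2.21 (The Central Limit Theorem). Let $X_1, X_2, \dots$ be i.i.d. random variables with values in $\mathbb{R}^k$ such that $E[X_1] = 0$ and $E[\|X_1\|_{d_E}^2] \in\, ]0,+\infty[$. Then, as $n \to \infty$,

\[ S_n/\sqrt{n} \to Z \]

in distribution, where Z is a random variable on $\mathbb{R}^k$ with characteristic function

\[ f_Z(t) = \exp\Big(-\frac{1}{2}\sum_{r,s=1}^{k} C_{rs}\, t_r t_s\Big) \quad\text{and}\quad C_{rs} := E[X_1 X_1^t](r,s). \]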

Continuity of (outer) measures.

We recall continuity from below for measures. In the second lemma, continuity from below for outer measures is proved.

Lemma A.2.22. Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space. Then µ is continuous from below, i.e. for any $\{A_n\}_{n\ge 1} \subset \mathcal{A}$ with $A_1 \subset A_2 \subset \cdots$ (shortly $A_n \uparrow \bigcup_n A_n =: A$), we have $\mu(A_n) \uparrow \mu(A)$.

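Proof. Let $A_n \in \mathcal{A}$ with $A_n \uparrow \bigcup_n A_n =: A$. Define

\[ B_n := A_n \setminus \Big(\bigcup_{1\le k\le n-1} A_k\Big) \]

for n ≥ 1. The $B_n$ as constructed above have two important properties, namely

\[ \bigcup_{1\le k\le n} B_k = A_n \quad\text{and}\quad B_i \cap B_j = \emptyset \text{ for } i \ne j. \]

The second is easy, because for i ≠ j one may suppose that i < j and note that $B_i \subset A_i$ and $B_j = A_j \cap \big(\bigcap_{1\le k\le j-1} A_k^c\big) \subset A_i^c$. The former follows by induction.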

Thus by σ–additivity:

\[ \mu(A) = \mu\Big(\bigcup_{n\ge 1} A_n\Big) = \mu\Big(\bigcup_{n\ge 1} B_n\Big) = \sum_{n\ge 1}\mu(B_n) = \lim_{N\to\infty}\sum_{n=1}^{N}\mu(B_n) = \lim_{N\to\infty}\mu(A_N). \]
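The disjointification step used in this proof can be checked on a small example. The following is only a sketch: the increasing sets and the choice of counting measure in the role of µ are illustrative assumptions, and the helper name `disjointify` is hypothetical.

```python
from itertools import accumulate

def disjointify(sets):
    """Return B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}) for a list of sets."""
    seen = set()
    pieces = []
    for a in sets:
        pieces.append(set(a) - seen)  # part of A_n not covered by earlier sets
        seen |= set(a)
    return pieces

# An increasing sequence A_1 ⊂ A_2 ⊂ ... of finite sets (illustrative data).
A = [set(range(k * k + 1)) for k in range(1, 6)]
B = disjointify(A)

# With counting measure in the role of mu, the partial sums of mu(B_n)
# should reproduce mu(A_N), as in the last equality of the proof.
partial_sums = list(accumulate(len(b) for b in B))
```

Here `partial_sums` can be compared with `[len(a) for a in A]`, which mirrors the identity between the partial sums of µ(Bn) and µ(AN).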

Lemma A.2.23. Let µ be a σ–additive, nonnegative measure on an algebra $\mathcal{A}$ of subsets of a set $\mathcal{X}$. Denote its outer measure by $\mu^*$. Let $A_n$, n = 1, 2, ..., and A be arbitrary subsets of $\mathcal{X}$ such that $A_n \uparrow A$. Then $\mu^*(A_n) \uparrow \mu^*(A)$.

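Proof. Clearly $\mu^*(A) \ge \mu^*(A_n)$ by monotonicity of $\mu^*$. The sequence $\mu^*(A_n)$ is monotone and bounded above by $\mu^*(A)$, so it converges to some limit $c \le \mu^*(A)$. It remains to prove the converse inequality. First assume $\mu^*(A_n) \to +\infty$ as $n \to \infty$; then obviously $(+\infty =)\, c \ge \mu^*(A)$. So assume $c < +\infty$. Let ε > 0; then for every $A_n$, by definition of $\mu^*$, there is a sequence

\[ \{A_{nj}\}_{j\ge 1} \subset \mathcal{A} \text{ with } A_n \subset \bigcup_{j\ge 1} A_{nj} : \quad \mu^*(A_n) + \varepsilon \ge \sum_{j\ge 1}\mu(A_{nj}). \]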

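Let $B_{nk} := \bigcup_{1\le j<k} A_{nj}$ and let µ also denote the (unique) extension of µ to $\sigma(\mathcal{A})$, which exists by the Extension Theorem A.2.15. Furthermore let $B_n := \bigcup_{j\ge 1} A_{nj}$. Then clearly $B_{nk}, B_n \in \sigma(\mathcal{A})$ for any k, n ∈ N. Now note that for k ≥ 2: $B_{nk} = \bigcup_{1\le j<k}(A_{nj}\setminus B_{nj})$, which follows by induction on k. So $B_n = \bigcup_{j\ge 1} A_{nj} = A_{n1} \cup \big(\bigcup_{j\ge 2}(A_{nj}\setminus B_{nj})\big)$. Thus we wrote $B_n$ as a disjoint union: if r < s, then $A_{nr}\cap B_{nr}^c\cap A_{ns}\cap B_{ns}^c = \emptyset$, because $A_{nr} \subset B_{ns}$. Hence

\[ \mu(B_n) = \mu\Big(A_{n1}\cup\Big(\bigcup_{j\ge 2} A_{nj}\setminus B_{nj}\Big)\Big) = \mu(A_{n1}) + \sum_{j\ge 2}\mu(A_{nj}\setminus B_{nj}) \le \sum_{j\ge 1}\mu(A_{nj}) \le \mu^*(A_n) + \varepsilon. \]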

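Let $C_n := \bigcap_{r\ge n} B_r$; then $C_1 \subset C_2 \subset \cdots$, so $C_n \uparrow C := \bigcup_{n\ge 1} C_n$, which belongs to $\sigma(\mathcal{A})$. Also, because $A_n \uparrow A$ and by definition of $B_k$, one has

\[ A_n \subset A_{n+l} \quad\text{and}\quad B_k = \bigcup_{j\ge 1} A_{kj} \supset A_k \]

for all l ∈ N and k ≥ n. Thus

\[ A_n \subset C_n \Big(= \bigcap_{j\ge n} B_j\Big) \quad\text{and}\quad C_n \subset B_n. \]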

So from the above calculations one obtains:

µ∗(An) ≤ µ(Cn) ≤ µ(Bn) ≤ µ∗(An) + ε ≤ c+ ε.

But µ is a measure, and measures are continuous from below (lemma A.2.22), so $\mu(C_n) \uparrow \mu(C)$, which implies $\mu(C) \le c + \varepsilon$. Finally,

\[ A = \bigcup_{n\ge 1} A_n \subset \bigcup_{n\ge 1} C_n = C, \]

thus $\mu^*(A) \le \mu(C) \le c + \varepsilon$. Because ε > 0 is arbitrary, one has $\mu^*(A) \le c$.

A.2.2 (Sub)Martingales and reversed (sub)martingales.

Definitions.

Definition A.2.6. Let (X ,A, P ) be a probability space, T a set and (S, d) a metric space. A stochastic process is a map

X : T ×X → S : (t, ω) 7→ X(t, ω) = Xt(ω)

such that for each t ∈ T fixed, Xt : (X ,A) → (S,B(Td)) is measurable. Here Td is the topology generated by the metric d and B(Td) the Borel σ–algebra on S. Usually T = R or a subset of it (like N).

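If, moreover, (T,≤) is a linearly ordered set and $\{\mathcal{A}_t\}_{t\in T}$ is a family of σ–algebras on $\mathcal{X}$ with $\mathcal{A}_t \subset \mathcal{A}_u \subset \mathcal{A}$ for t ≤ u; t, u ∈ T, then such a family is often called a filtration on $(\mathcal{X}, \mathcal{A}, P)$. Then $\{X_t, \mathcal{A}_t\}_{t\in T}$ is said to be a martingale if: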

i) Xt is At measurable for each t ∈ T (or more generally measurable for the P–completion of At);

ii) E[|Xt|] <∞ for each t ∈ T ;

iii) E[Xu||At] = Xt for all t, u ∈ T : u ≥ t.

As usual equality means equality almost surely. If condition (iii) is replaced by:

Xt ≤ E[Xu||At] for all t, u ∈ T : t ≤ u,

then $\{X_t, \mathcal{A}_t\}_{t\in T}$ is said to be a submartingale.

Finally we define the concept of a reversed (sub)martingale. Let $\{M_n, \mathcal{B}_n\}_{n\ge 1}$ be a sequence of random variables $M_n$, each measurable for a Borel σ–algebra $\mathcal{B}_n$. Then change the index to the negative integers: $\{M_n, \mathcal{B}_n\}_{n\le -1}$. The sequence is said to be a reversed martingale if the reversed sequence is a martingale, i.e. $\cdots \subset \mathcal{B}_{-2} \subset \mathcal{B}_{-1}$ and

i) Mn is Bn measurable for each n ≤ −1 (or more generally measurable for the P–completion of Bn);

ii) E[|Mn|] < ∞ for each n ≤ −1;

iii) E[Mk||Bn] = Mn for all n, k ≤ −1 : k ≥ n.

In particular E[M−1||Bn] = Mn, n = −1,−2, · · · . If one has instead

E[Mk||Bn] ≥ Mn for all n, k ≤ −1 : k ≥ n,

then the sequence is said to be a reversed submartingale. A reversed (sub)martingale can be thought of as a backward (sub)martingale, where we start in the future and end in the present.


A convergence theorem for reversed (sub)martingales.

Here we state a convergence theorem without proof.

Theorem A.2.24 (Doob's Decomposition and Convergence for Reversed Submartingales). Let $\{M_n, \mathcal{B}_n\}_{n\le -1}$ be a reversed submartingale. Suppose that

\[ K := \inf_{n\le -1} E[M_n] > -\infty. \]

Then there is a decomposition $M_n \equiv N_n + Z_n$, where $Z_n$ is measurable for $\mathcal{B}_{n-1}$, $\{N_n, \mathcal{B}_n\}_{n\le -1}$ is a reversed martingale and

\[ Z_{-1} \ge Z_{-2} \ge \cdots \text{ almost surely, with } Z_n \downarrow 0 \text{ a.s. as } n \to -\infty. \]

Thus $M_n$ converges almost surely and in $L^1$ as $n \to -\infty$.

Proof. See [Dud1] theorem 10.6.4 page 373.

Appendix B

Gaussian Processes.

Definition B.0.7. Let $\{X_t\}_{t\in T}$ be a stochastic process, i.e. a collection of random variables, with T a set. Then $\{X_t\}_{t\in T}$ is called Gaussian if for all finite F ⊂ T, $(X_t)_{t\in F}$ has a multivariate normal distribution.

Recall that the (multivariate) normal distribution is entirely determined by its mean and its covariance matrix. The following theorem states (necessary and) sufficient conditions on functions m, C such that there exists a Gaussian process with mean and covariance matrix given by the function m, respectively the function C. This will be done using a theorem of Kolmogorov, which roughly states that if a given collection of finite dimensional distributions satisfies some "consistency" assumptions, then there exist a probability space and a stochastic process on that probability space with the given distributions as its finite dimensional distributions.

Definition B.0.8. Let T be a set and

\[ \{P_F : F \subset T \text{ finite},\ P_F \text{ a probability distribution on } \mathbb{R}^F\} \]

a collection of probability distributions. The collection is said to be consistent if for all finite subsets F, G of T with F ⊂ G, and $\pi_{GF}$ the natural projection from $\mathbb{R}^G$ onto $\mathbb{R}^F$, we have $P_G\circ\pi_{GF}^{-1} = P_F$.

The next theorem is due to Kolmogorov and gives conditions for a system of distributions to come from a stochastic process. It can be stated for more general range spaces, but for our purposes here R as range space will be enough.

Theorem B.0.25 (Kolmogorov). Let T be a set and µt1···tk a system of distributions satisfying


i) µt1···tk(H1×· · ·×Hk) = µtπ1···tπk(Hπ1×· · ·×Hπk) for π any permutation of(1, · · · , k) and any k ∈ N and any t1, · · · , tk ∈ T and Hi ∈ R, 1 ≤ i ≤ k;

ii) µt1···tk−1(H1× · · · ×Hk−1) = µt1···tk(H1× · · · ×Hk−1×R) for any k ∈ N, any choice t1, · · · , tk ∈ T and any Hi ∈ R, 1 ≤ i ≤ k − 1.

Then there exists a probability measure P on B(RT ) (the product σ–algebra on the product space RT ) such that the coordinate variable process [Zt : t ∈ T ] on (RT ,B(RT ), P ) has the µt1···tk as its finite dimensional distributions.

If x denotes an element of RT , and πt the projection from RT into R, then Zt(x) := xt := x(t) := πt(x).

Proof. We refer to [Bill1], Chapter 7, theorem 36.1 on page 517 for a proof.

Theorem B.0.26. Let T be a set, and m : T → R, C : T × T → R functions. Then C(s, t) = C(t, s) for every s, t ∈ T and $\{C(s,t)\}_{s,t\in F}$ is a nonnegative definite matrix for every finite F ⊂ T iff there exists a Gaussian process $\{X_t\}_{t\in T}$ with mean function m and covariance C.

Proof. We refer to [Dud1] theorem 12.1.3 on page 443 for a proof.

So let $G_P$ be a Gaussian process indexed by $f \in L^2(P)$, such that $G_P(f)$ has mean zero and covariance given by the covariance of f, g, i.e.

\[ \mathrm{Cov}(f, g) := \int \Big(f - \int f\, dP\Big)\Big(g - \int g\, dP\Big)\, dP \]

for all $f, g \in L^2(P)$. Using theorem B.0.26 about the existence of Gaussian processes, one sees that such a process exists, because the usual covariance of square–integrable functions is nonnegative definite.

Proof. Let $f_i$, i = 1, ..., n, be square–integrable functions and $a_i$, i = 1, ..., n, real numbers. Remark that

\[ a^t F a = \mathrm{Var}\Big(\sum_{i=1}^{n} a_i f_i\Big), \]

where F is the covariance matrix of $(f_i)_{i=1}^{n}$, and that the latter term is nonnegative.
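The identity between the quadratic form and the variance of the linear combination can be verified exactly on a finite probability space. This is a minimal sketch under assumptions that are not in the text: P is taken to be the uniform law on four points, the three functions and the coefficients are arbitrary illustrative choices, and the helper names `mean` and `cov` are hypothetical.

```python
from fractions import Fraction

def mean(values, probs):
    """Expectation of a function given by its values on a finite space."""
    return sum(p * v for v, p in zip(values, probs))

def cov(u, v, probs):
    """Covariance of two functions on a finite probability space."""
    mu_u, mu_v = mean(u, probs), mean(v, probs)
    return sum(p * (a - mu_u) * (b - mu_v) for a, b, p in zip(u, v, probs))

# Assumed setting: P is the uniform law on four points and each f_i is
# represented by its four values there.
probs = [Fraction(1, 4)] * 4
fs = [
    [Fraction(x) for x in (1, 2, 3, 4)],
    [Fraction(x) for x in (1, 4, 9, 16)],
    [Fraction(x) for x in (1, 0, 1, 0)],
]
a = [Fraction(2), Fraction(-1), Fraction(3)]

# Covariance matrix F and the quadratic form a^t F a.
F = [[cov(fi, fj, probs) for fj in fs] for fi in fs]
quad_form = sum(a[i] * F[i][j] * a[j] for i in range(3) for j in range(3))

# Variance of the linear combination sum_i a_i f_i.
combo = [sum(ai * fi[k] for ai, fi in zip(a, fs)) for k in range(4)]
var_combo = cov(combo, combo, probs)
```

Exact rational arithmetic is used so that `quad_form` and `var_combo` can be compared with `==` rather than up to rounding.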

Example B.0.1 (Brownian Bridge). A Brownian Bridge $\{Y_t\}_{t\in[0,1]}$ is a Gaussian process with zero mean and $\mathrm{Cov}(Y_t, Y_s) = s(1-t)$ if t ≥ s. Let P be Lebesgue measure on the unit interval; then $G_P(I_{[0,t]})$ has zero mean and covariance

\[ \int (I_{[0,t]} - t)(I_{[0,s]} - s)\, dP = \int I_{[0,t]} I_{[0,s]}\, dP - st = s\wedge t - st. \]
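The covariance computation for the indicator functions can be reproduced with a midpoint rule, which is exact here as long as s and t fall on the grid. This is only a sketch; the function name `bridge_cov` and the grid size are illustrative assumptions.

```python
from fractions import Fraction

def bridge_cov(s, t, cells=100):
    """Midpoint-rule value of the integral of (1[0,t] - t)(1[0,s] - s)
    with respect to Lebesgue measure on [0, 1].

    The rule is exact when s and t are multiples of 1/cells, because the
    integrand is then constant on every cell.
    """
    total = Fraction(0)
    for k in range(cells):
        x = Fraction(2 * k + 1, 2 * cells)  # midpoint of the k-th cell
        total += (int(x <= t) - t) * (int(x <= s) - s)
    return total / cells

s, t = Fraction(3, 10), Fraction(7, 10)
value = bridge_cov(s, t)
```

For s = 3/10 and t = 7/10 the value agrees with both s ∧ t − st and s(1 − t), the Brownian Bridge covariance.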


There is one major difference: there exists a version of the Brownian Bridge which has continuous sample paths ([Dud1] theorem 12.1.5 on page 446), but a general Gaussian process on L2(P ) cannot have all its sample paths continuous ([Dud2] chapter 2 section 6).

Definition B.0.9. A version of a random variable (stochastic process) is any other random variable (respectively stochastic process) on some (not necessarily the same) probability space such that their laws coincide.

Definition B.0.10. Let H be a separable Hilbert space.

i) The isonormal process L on H is the Gaussian process such that for each x ∈ H, L(x) is distributed according to the normal law with mean zero and variance $\|x\|^2$. The finite dimensional distributions of $(L(x_i))_{i=1}^{n}$ are multivariate normal with mean zero and covariance $\langle x_k, x_l\rangle$. Here $\langle\cdot,\cdot\rangle$ denotes the inner product on H. For any A ⊂ H define $L(A)^*$ as $\operatorname{ess.sup}_{x\in A} L(x)$, the smallest random variable Y with $Y \ge L(x)$ a.s. for all x ∈ A. Similarly $|L(A)|^* := \operatorname{ess.sup}_{x\in A}|L(x)|$.

Note that GP (c) = 0 a.s. for c some constant function. Also GP is linear, that is GP (αf + g) = αGP (f) +GP (g) a.s.

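Proof. Both sides have mean zero, and the difference $G := G_P(\alpha f + g) - \alpha G_P(f) - G_P(g)$ has variance equal to zero:

\begin{align*}
\mathrm{Var}(G) &= \mathrm{Var}(G_P(\alpha f + g)) + \mathrm{Var}(\alpha G_P(f)) + \mathrm{Var}(G_P(g)) \\
&\quad - 2\big[\mathrm{Cov}(G_P(\alpha f + g), \alpha G_P(f)) + \mathrm{Cov}(G_P(\alpha f + g), G_P(g)) - \mathrm{Cov}(\alpha G_P(f), G_P(g))\big] \\
&= \mathrm{Var}(\alpha f + g) + \mathrm{Var}(\alpha f) + \mathrm{Var}(g) \\
&\quad - 2\big[\mathrm{Cov}(\alpha f + g, \alpha f) + \mathrm{Cov}(\alpha f + g, g) - \mathrm{Cov}(\alpha f, g)\big] \\
&= \mathrm{Var}(\alpha f + g - \alpha f - g) \\
&= 0.
\end{align*}

Combining those two facts, we have

\[ 0 = \mathrm{Var}(G_P(\alpha f + g) - \alpha G_P(f) - G_P(g)) = E\big[(G_P(\alpha f + g) - \alpha G_P(f) - G_P(g))^2\big], \]

implying that $G_P(\alpha f + g) - \alpha G_P(f) - G_P(g) = 0$ a.s.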


The covariance w.r.t. P also induces a pseudometric on $L^2(P)$ as follows: $\rho_P(f, g) := E[(G_P(f) - G_P(g))^2]^{1/2}$. On $L^2_0 := \{f \in L^2 : \int f\, dP = 0\}$ this pseudometric coincides with the usual one, for

\[ \rho_P(f, g) = E[(G_P(f - g))^2]^{1/2} = \mathrm{Var}(f - g)^{1/2} = E[(f - g)^2]^{1/2}. \]

It plays an important role when proving tightness of the empirical process.

Appendix C

More about Suslin / Analytic Sets.

C.1 The Borel Isomorphism Theorem.

The main theorem of this part is not only of interest for its use throughout the text but is also of interest on its own. It tells us that for some spaces being Borel isomorphic is nothing more than having the same cardinality, and that that cardinality can only be finite, countable, or c (the cardinality of the continuum). This is remarkable: for the spaces about to be specified, the continuum hypothesis is a consequence of the other axioms of set theory, whereas its more general form is known to be independent of the usual ZFC axioms of set theory.

Theorem C.1.1. Let (R, d) and (S, e) be two separable metric spaces which are Borel subsets of their (metric) completions. Then R ∼ S, i.e. R and S are Borel isomorphic (there exists a Borel measurable bijection f from (R, d) onto (S, e) with measurable inverse), iff R and S have the same cardinality, which is moreover finite, countable or c (the cardinality of the continuum [0, 1]).

Proof. We refer to [Dud1] theorem 13.1.1 on pages 487–492 for a (quite lengthy) proof.

We give a short comment on the implications of the theorem. First of all, note that the theorem tells us that for separable metric spaces which are Borel sets of their metric completion, being Borel isomorphic is nothing else than having the same size as sets. This is remarkable: in general a bijection between two sets which are topological spaces need not be a Borel isomorphism, and it isn't clear how one could make that bijection and its inverse measurable for the Borel σ–algebras. But this theorem says there is such a function. Secondly, as mentioned before the theorem, we get the continuum hypothesis (CH) for free for such spaces, whereas in the general case CH is not decidable in ZFC. Recall that the CH states that c, the cardinality of the continuum, is the smallest cardinal strictly greater than the cardinality of the positive integers.

C.2 Definitions and properties of Analytical Sets.

We present some (well known) facts about Suslin sets and spaces that were used in the fifth chapter. We start with a characterization of Suslin, also called Analytic, sets.

Let us first recall the definition, given also in the second section of the fifth chapter, of a Suslin space.

Definition C.2.1. A separable measurable space (Y,S) will be called a Suslin space iff there exists a Polish space R and a Borel measurable map from R onto Y . If (Y,S) is a measurable space, a subset Z of Y will be called a Suslin set iff (Z, Z ⊓ S) is a Suslin space.

Theorem C.2.1. Let (S, d) be a Polish space and A a non–empty subset of S. The following are equivalent:

a) A = f [N∞], for f some continuous function;

a’) A = f [N∞], for f some Borel measurable function;

b) A = f [R], for f some continuous function and (R, e) Polish;

b’) A = f [R], for f some (Borel) measurable function and (R, e) Polish;

c) A = f [B], where f : R→ S is continuous, B Borel and (R, e) Polish;

c’) A = f [B], where f : R→ S Borel measurable, B Borel and (R, e) Polish.

Proof. The implications a → a′, b → b′ and c → c′ follow immediately. Moreover, N∞ is, as a countable product of Polish spaces, itself Polish, so a → b → c and a′ → b′ → c′ are straightforward too.

To finish the proof it remains to show c′ → a. We proceed in two steps. First we prove c′ → b′. So we are given a Borel measurable map, and we would like to extend the domain of the function. This is easily done as follows: let a ∈ A and set f(x) := a for all x ∈ R\B. This is possible because B is Borel, so that the relative Borel σ–algebra on B satisfies B ⊓ σ(Te) ⊂ σ(Te). Secondly, b′ → a. If we were able to prove that the graph of f is a Borel set in R × S we would be done, because a Borel set in a Polish space can be written as the image of N∞ through a continuous function g. This last statement will be shown in the next lemma (C.2.2).

Lemma C.2.2. Let R, S be Polish spaces and f from R into S a Borel measurable function. Then the graph of f , i.e. $\{(r, f(r)) : r \in R\}$, is a Borel set in R × S.

Proof. We make use of the Borel isomorphism theorem C.1.1: two separable metric spaces which are Borel sets of their completions are Borel isomorphic iff they have the same cardinality, and moreover this cardinality is either finite, countable or c, the cardinality of the continuum. So w.l.o.g. we can and do assume that S is a subset of $2^\infty$ ($2^\infty$ has the cardinality of the continuum and is complete and separable). Actually, if S is uncountable, then $S \sim 2^\infty$; if S is countable then e.g. $S \sim \mathbb{N}$, and N can be identified with a Borel set of $2^\infty$, e.g. via $n \mapsto \bigcap_{i\ne n}\pi_i^{-1}(0)\cap\pi_n^{-1}(1)$. So we can identify S with a Borel set of $2^\infty$, and a Borel set in S is also Borel in $2^\infty$. From now on we continue with $2^\infty$ rather than S. Consider the sets

\[ T(n) := \{s = \{s_j\}_{1\le j\le n} : s_j = 0 \text{ or } 1 \text{ for each } j\} = \{0,1\}^n. \]

For s ∈ T(n) let

\[ C_s := \{u = \{u_j\}_{j\ge 1} \in 2^\infty : u_j = s_j \text{ for } j = 1, \dots, n\} = \bigcap_{j=1}^{n}\pi_j^{-1}(s_j); \]

then $C_s$ is clopen, i.e. closed and open, in $2^\infty$. Further define $B_n := \bigcup_{s\in T(n)} f^{-1}(C_s)\times C_s$, which is a Borel set of $R \times 2^\infty$. Finally let $G := \bigcap_n B_n$. We claim that G is the graph of f. For x ∈ R, (x, f(x)) is an element of the graph of f. For any n ∈ N, define $s^n := (f(x)_1, \dots, f(x)_n)$; then clearly $f(x) \in C_{s^n}$ and $x \in f^{-1}(C_{s^n})$, so $(x, f(x)) \in B_n$ for every n. Conversely, let (x, y) be contained in G; then for each n there is some $s := s^n \in T(n)$ such that $(x, y) \in f^{-1}(C_s)\times C_s$. This means $(y_1, \dots, y_n) = (s^n_1, \dots, s^n_n)$ and $f(x) \in C_{s^n}$, i.e. $(f(x)_1, \dots, f(x)_n) = (s^n_1, \dots, s^n_n)$. Since this holds for every n, y = f(x).
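The construction of the graph as an intersection of the sets Bn can be illustrated on a finite truncation. The sketch below is not the setting of the lemma: it assumes R is the finite set {0, ..., 7}, replaces 2∞ by bit strings of length 3, and uses an arbitrary illustrative map `f`; the names `cylinder` and `B` are hypothetical helpers mirroring Cs and Bn.

```python
from itertools import product

DEPTH = 3
R = range(8)                                  # finite stand-in for R
S = list(product((0, 1), repeat=DEPTH))       # truncation of 2^infinity

def f(x):
    """Illustrative map from R into bit strings of length DEPTH."""
    y = (5 * x + 3) % 8
    return tuple((y >> i) & 1 for i in range(DEPTH))

def cylinder(s):
    """C_s: all strings whose first len(s) coordinates agree with s."""
    return {u for u in S if u[:len(s)] == s}

def B(n):
    """B_n: union over s in {0,1}^n of f^{-1}(C_s) x C_s."""
    out = set()
    for s in product((0, 1), repeat=n):
        c = cylinder(s)
        preimage = {x for x in R if f(x) in c}
        out |= {(x, u) for x in preimage for u in c}
    return out

# Intersecting B_1, ..., B_DEPTH pins down every coordinate of the second
# component, which leaves exactly the graph of f.
G = set.intersection(*(B(n) for n in range(1, DEPTH + 1)))
graph = {(x, f(x)) for x in R}
```

In this truncated setting `B(1)` alone is strictly larger than the graph, and only the full intersection `G` coincides with it.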

So examples of Suslin spaces include Polish spaces: these spaces are separable, so their Borel σ–algebra is countably generated, and they are metric, in particular Hausdorff, hence the singletons are closed. Borel sets of Polish spaces are also Suslin, and more generally separable metric spaces which are Borel sets of their completion are also Suslin.

If an analytic set is equipped with its Borel σ–algebra, then measurable subsets of it are again analytic.

Corollary C.2.2. Let (A,A) be an analytic set with Borel σ–algebra. Then any Z ∈ A is analytic again.

Proof. Since A is analytic, there exists some Polish space (S, T ) and a Borel measurable map f from S onto A. Let Z ∈ A; then f−1(Z) ∈ B(T ), so by the remark just above f−1(Z) is analytic. Since f is Borel measurable and onto, its restriction to f−1(Z) is also Borel measurable and maps onto Z, so its image f(f−1(Z)) = Z is analytic by definition C.2.1.

C.3 Universal measurability of Analytic Sets.

In what follows we will show the important property of universal measurability of analytic sets.

Theorem C.3.1. Let (S, d) be a Polish space. Then every analytic subset of S is universally measurable.

Proof. Let A be an analytic subset of S and P a probability measure on the Borel σ–algebra of S. One way to prove the assertion of the theorem is to construct a Borel set whose difference with A has measure zero for the completion of P. By $P^*$ denote the outer measure of P.

By the definition of analytic subset (definition/theorem C.2.1) we have a function f from $\mathbb{N}^\infty$ into S which is continuous and whose range is A. Let ε > 0; for k, M ∈ N let

\[ N(k, M) := \{\{n_j\}_{j\ge 1} \in \mathbb{N}^\infty : n_k \le M\}. \]

Then clearly one has $f[N(1, M)] \uparrow f[\mathbb{N}^\infty]$ as M → ∞. Because outer measures are continuous from below (lemma A.2.23), there is an $M_1$ such that

\[ P^*(f[N(1, M_1)]) \ge P^*(A) - \varepsilon/2. \]

Next we note that $N(1, M_1)\cap N(2, M) \uparrow N(1, M_1)$ as M → ∞. Again by continuity from below for outer measures, for some $M_2$:

\[ P^*(f[N(1, M_1)\cap N(2, M_2)]) \ge P^*(A) - \varepsilon/2 - \varepsilon/4. \]

Recursively define $M_k$ such that

\[ P^*\Big(f\Big[\bigcap_{j=1}^{k} N(j, M_j)\Big]\Big) \ge P^*(A) - \varepsilon\sum_{j=1}^{k} 2^{-j}. \]

Every $N(j, M_j)$ is closed in $\mathbb{N}^\infty$ for the product topology. So

\[ F_k := \bigcap_{j=1}^{k} N(j, M_j) \]

remains closed and $F_k \downarrow C$, where $C := \bigcap_{j\ge 1} N(j, M_j)$. By Tychonoff's theorem A.1.16, C, which equals $\prod_{j\ge 1}\{1, \dots, M_j\}$, is seen to be compact. We would like to find a Borel set that doesn't differ too much from A in probability; the above calculations indicate that a possible candidate could be $\bigcap_k f[F_k]$. But this last set could possibly not be Borel anymore, and could also differ too much from A. Taking the closure of $f[F_k]$ in A resolves the first problem. Now we will see that it is true that $f[F_k] \downarrow f[C]$. We will use lemma A.1.12.

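Let U be an open set of $\mathbb{N}^\infty$ such that C ⊂ U. Consider the usual base of the product topology (A.1.6),

\[ \mathcal{B} := \Big\{\bigcap_{j\in J}\pi_j^{-1}(U_j) : J \subset \mathbb{N} \text{ finite},\ U_j \text{ open subset of } \mathbb{N}\Big\}. \]

Every open set is a union of such basis sets, so write $U = \bigcup_{k\in K} U_k$ with $U_k \in \mathcal{B}$ and K an arbitrary index set (actually, because $\mathbb{N}^\infty$ is separable, we could take K as a subset of N). Since C is compact in $\mathbb{N}^\infty$ there is a finite L ⊂ K such that C is covered by $\bigcup_{l\in L} U_l$. Because the $U_l$ are sets from the basis, there is an index $N_l$ such that $\pi_n(U_l) = \mathbb{N}$ for all $n \ge N_l$. So if $N := \max_{l\in L} N_l$, then for all n ≥ N: $\pi_n(U_l) = \mathbb{N}$, l ∈ L. Now $F_n$ and C have the same projections on the coordinates 1, ..., n, while the $U_l$ impose no restriction on the coordinates n + k, $k \in \mathbb{N}_0$, once n ≥ N. Since $C \subset \bigcup_{l\in L} U_l$, it follows that $F_n \subset \bigcup_{l\in L} U_l$ for all n ≥ N.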

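Lemma A.1.12 applies and, as we hoped for, we obtain that $f[F_k] \downarrow f[C]$ as k → ∞. Therefore

\[ P(f[C]) = P\Big(\bigcap_k f[F_k]\Big) = P\Big(\bigcap_k \overline{f[F_k]}\Big) \ge P^*(A) - \varepsilon, \]

and we have found a Borel set, f[C], that is ε-close to A. Repeating that argument for ε = 1/n, n = 1, 2, ..., one obtains Borel sets $B_n$ with

\[ P(B_n) \ge P^*(A) - 1/n. \]

This finishes the proof, because by construction the sets $B_n$ are subsets of A, so that $B := \bigcup_n B_n$ is a Borel subset of A with $P(B) \le P^*(A)$ and $P(B) \ge P^*(A) - 1/n$ for every n.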

This result actually holds for σ–finite measures. Recall that a measure µ on Ω is called σ–finite if there exists a sequence of measurable sets Ωn with finite measure such that ∪nΩn = Ω. A σ–finite measure is equivalent (i.e. the two measures dominate each other, so they have the same null sets) to a probability measure of the form

\[ P(\cdot) = \sum_{j\ge 1}\frac{\mu(\cdot\cap\Omega_j)}{\mu(\Omega_j)}\, 2^{-j}. \]

That P is dominated by µ is easily seen. For the converse, if A is such that P(A) = 0 then $\mu(A\cap\Omega_j) = 0$ for all j, and because $\bigcup_{j\ge 1}\Omega_j = \Omega$ one has

\[ \mu(A) \le \sum_{j\ge 1}\mu(A\cap\Omega_j) = 0. \]

So P also dominates µ.

C.4 A selection theorem for Analytic Sets.

Next we state a selection theorem for analytic sets.

Theorem C.4.1. Let R and S be Polish spaces and let A be an analytic subset of the product space R × S. Let $C := \pi_S(A) = \{s \in S : (r, s) \in A \text{ for some } r \in R\}$. Then there is a function g from the analytic set C into R such that (g(s), s) ∈ A for all s ∈ C and such that g is measurable from the σ–algebra of universally measurable sets of S to the Borel sets of R.

Proof. A nice property of analytic sets is that they are conserved by projections, because those are continuous surjective maps, so C is analytic (definition/theorem C.2.1).

There is some link with the usual Axiom of Choice, which is equivalent to the well ordering principle. Here the proof will go along another, weaker, form of this well ordering principle.

We define an ordering on $\mathbb{N}^\infty$, the lexicographical one: $\{m_j\}_{j\ge 1} < \{n_j\}_{j\ge 1}$ iff for some i ∈ N: $m_j = n_j$ for all j < i and $m_i < n_i$. Next we claim that every non–empty closed subset F of $\mathbb{N}^\infty$ has a smallest element, i.e. there is an x ∈ F with x ≤ y for all y ∈ F. Indeed, let $m_1$ be the smallest positive integer that appears in the first coordinate of elements of F. This can be done because N is well ordered. Among all elements of F which have $m_1$ as first coordinate, choose $m_2$ as the smallest positive integer that appears as second coordinate. Continuing by induction we obtain $\{m_k\}_{k\ge 1}$; by construction it is a lower bound of F (one never has $\{n_k\}_{k\ge 1} < \{m_k\}_{k\ge 1}$ for any $\{n_k\}_{k\ge 1} \in F$). Moreover, because F is closed in $\mathbb{N}^\infty$, by picking an element from each set $F_l := \{\{n_k\}_{k\ge 1} \in F : m_j = n_j \text{ for } j = 1, \dots, l\}$ we have a sequence of elements in F that converges to $\{m_k\}_{k\ge 1}$, so $\{m_k\}_{k\ge 1} \in F$.
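The coordinate-by-coordinate choice of the smallest element can be sketched for sequences of a fixed finite length, where no closedness argument is needed. The function name `greedy_lex_min` and the sample set are illustrative assumptions; the sketch only mirrors the inductive choice of m1, m2, ... described above.

```python
def greedy_lex_min(F, length):
    """Build the lexicographically smallest element of a finite set F of
    integer tuples by fixing one coordinate at a time."""
    candidates = set(F)
    chosen = []
    for i in range(length):
        # Smallest value in coordinate i among the remaining candidates.
        m_i = min(x[i] for x in candidates)
        chosen.append(m_i)
        # Keep only the elements that agree with the choices made so far.
        candidates = {x for x in candidates if x[i] == m_i}
    return tuple(chosen)

F = {(3, 1, 4), (2, 7, 1), (2, 5, 9), (2, 5, 3), (6, 1, 1)}
smallest = greedy_lex_min(F, 3)
```

Python compares tuples lexicographically, so the result can be cross-checked against the built-in `min(F)`.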

Continuing with the proof, let f be a continuous map from $\mathbb{N}^\infty$ onto A (theorem C.2.1) and let $\gamma := \pi_S\circ f$. Then γ is a continuous function from $\mathbb{N}^\infty$ onto C. For s ∈ C, $\gamma^{-1}(\{s\})$ is closed in $\mathbb{N}^\infty$. By the above consideration there exists a smallest element in $\gamma^{-1}(\{s\})$. Let $h : C \to \mathbb{N}^\infty$ map s to the smallest member of $\gamma^{-1}(\{s\})$, and let $g := \pi_R\circ f\circ h$. If h is universally measurable, it will follow that g is too.

First note that if we set

\[ A(s) := \{r \in R : (r, s) \in A\}, \]

the section of A along s ∈ S, then

\[ \gamma^{-1}(C) = f^{-1}(\pi_S^{-1}(C)) = f^{-1}(A\cap(R\times C)) = f^{-1}\Big(\bigcup_{s\in C} A(s)\times\{s\}\Big). \]

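So $y = h(s) \in \gamma^{-1}(\{s\}) = f^{-1}(A(s)\times\{s\})$, and $f(h(s)) \in A(s)\times\{s\}$. Now let

\[ C_n := \{y \in C : \text{there is some } m = \{m_j\}_{j\ge 1} \in \mathbb{N}^\infty \text{ with } \gamma(m) = y,\ m_1 = n\} = \gamma[\pi_{1,\mathbb{N}^\infty}^{-1}(n)]. \]

Then $C_n$ is an analytic set. In order to see that $\gamma[\pi_{1,\mathbb{N}^\infty}^{-1}(n)]$ is analytic, we note that $\mathbb{N}^\infty$ is a Polish space and that $\pi_{1,\mathbb{N}^\infty}^{-1}(n)$ is an open set in $\mathbb{N}^\infty$, hence Borel. Finally γ is a continuous surjection, and according to definition/theorem C.2.1 continuous images of Borel sets of Polish spaces are analytic sets.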

Let $h_1 : C \to \mathbb{N} : y \mapsto h_1(y) = n$, where n is the least positive integer such that $y \in C_n$. This $h_1$ is measurable from the σ–algebra of universally measurable sets of C to the Borel σ–algebra on N, because $h_1^{-1}(n) = C_n \setminus \bigcup_{i=1}^{n-1} C_i$. (Recall also that analytic sets are universally measurable.) We continue recursively: if we have $h_1, \dots, h_k$, then $h_{k+1} : C \to \mathbb{N}$, with $h_{k+1}(y)$ the least n such that $y = \gamma(m)$ for some $m \in \mathbb{N}^\infty$ with $m_j = h_j(y)$, j = 1, ..., k, and $m_{k+1} = n$.

Our purpose is to show that h is a measurable function. Because its codomain is $\mathbb{N}^\infty$ we can write h as a sequence $\{h_j\}_{j\ge 1}$ as above. The space $\mathbb{N}^\infty$ is equipped with the Borel σ–algebra generated by the product topology. Here the Borel σ–algebra equals the product σ–algebra, so in order to prove measurability of h it is sufficient to prove it for each component function $h_j$.

Before proving measurability of an arbitrary $h_k$ we first consider the case k = 2, as it will show how to generalize to an arbitrary k ∈ N. Now $h_2(y) = n$ iff $y = \gamma(m)$ for some m in

\[ \big[\pi_2^{-1}(n)\cap\pi_1^{-1}(h_1(y))\big] \setminus \bigcup_{i=1}^{n-1}\big[\pi_2^{-1}(i)\cap\pi_1^{-1}(h_1(y))\big]. \]

This means that $h_2(y) = n = m_2$, $h_1(y) = m_1$, and for any other $m \in \mathbb{N}^\infty$ with $\gamma(m) = y$ and $m_1 = h_1(y)$ one has $m_2 \ge n$. Then $y \in h_2^{-1}(n)$ iff

\[ y \in \gamma\Big[\bigcup_{k\in K}\Big(\big[\pi_2^{-1}(n)\cap\pi_1^{-1}(k)\big] \setminus \bigcup_{i=1}^{n-1}\big[\pi_2^{-1}(i)\cap\pi_1^{-1}(k)\big]\Big)\Big] \]

because $h_1(y) = k$ for some (unique) k ∈ N and $h_1(C) = K \subset \mathbb{N}$. Actually, for $h_2(y)$ one is looking for the sequences of positive integers in the set $\pi_1^{-1}(h_1(y))\cap\gamma^{-1}(y)$ which have the smallest second coordinate. That set is u.m., because the set on which γ acts is a Borel set, see theorem

Now we can move on to the general case. So assume $h_1, \dots, h_k$ are measurable for the Borel σ–algebra on N and the σ–algebra of u.m. sets on C. Then let $h_{k+1}$ be as defined above. Denote the range of $h_j$ by $K_j$, j = 1, 2, ..., k. Then $y \in h_{k+1}^{-1}(n)$ iff y belongs to

\[ \gamma\Big[\bigcup_{j=1}^{k}\bigcup_{l_j\in K_j}\Big(\Big[\pi_{k+1}^{-1}(n)\cap\bigcap_{j=1}^{k}\pi_j^{-1}(l_j)\Big] \setminus \bigcup_{i=1}^{n-1}\Big[\pi_{k+1}^{-1}(i)\cap\bigcap_{j=1}^{k}\pi_j^{-1}(l_j)\Big]\Big)\Big]. \]

Again all operations are countable operations on open sets, so what is inside the brackets is a Borel set of $\mathbb{N}^\infty$, which is Polish. By definition/theorem C.2.1 about analytic sets, the image under γ of that Borel set is analytic, thus universally measurable by theorem C.3.1. As said before, $h = \{h_j\}_{j\ge 1}$ by definition, and thus h is measurable.

Appendix D

Entropy and useful inequalities.

D.1 Entropy.

We often modified our class of functions F to obtain a more tractable class. It is important to prove that those modified classes still enjoy the properties of the original one, such as a finite entropy number. This will be proven first, after the definition of entropy is recalled.

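Definition D.1.1. For $\mathcal{F}$ a class of measurable functions on $(\mathcal{X}, \mathcal{A})$ let

\[
F_{\mathcal{F}}(x) := \sup\{|f(x)| : f \in \mathcal{F}\} = \|\delta_x\|_{\mathcal{F}},
\]

where $\delta_x : \mathcal{F} \to \mathbb{R}$ is the (linear) evaluation functional $f \mapsto f(x)$. A measurable function $F \ge F_{\mathcal{F}}$ is called an envelope function for $\mathcal{F}$. If $F_{\mathcal{F}}$ is $\mathcal{A}$–measurable, then it is said to be the envelope function. Given a law $P$ on $(\mathcal{X}, \mathcal{A})$, we call $F^*_{\mathcal{F}}$, the essential infimum of $F_{\mathcal{F}}$, the envelope function of $\mathcal{F}$ for $P$.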

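Let $\Gamma$ be the set of all laws on $\mathcal{X}$ of the form $n^{-1} \sum_{j=1}^{n} \delta_{x_j}$ for some $x_j \in \mathcal{X}$, $j = 1, \dots, n$, and $n \in \mathbb{N}_0$. For $\varepsilon > 0$, $0 < p < \infty$ and $\gamma \in \Gamma$ let

\[
D^{(p)}_F(\varepsilon, \mathcal{F}, \gamma) := \sup\Bigl\{ m : \text{for some } f_1, \dots, f_m \in \mathcal{F} \text{ and all } i \neq j,\ \int |f_i - f_j|^p \, d\gamma > \varepsilon^p \int F^p \, d\gamma \Bigr\}.
\]

Furthermore let

\[
D^{(p)}_F(\varepsilon, \mathcal{F}) := \sup_{\gamma \in \Gamma} D^{(p)}_F(\varepsilon, \mathcal{F}, \gamma).
\]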

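Lemma D.1.1. Let $(\mathcal{X}, \mathcal{A}, P)$ be a probability space, $F \in L^p(\mathcal{X}, \mathcal{A}, P)$, and $\mathcal{F} \subset L^1(\mathcal{X}, \mathcal{A}, P)$, where $F$ is the envelope function of $\mathcal{F}$ and $p \in [1, +\infty[$. If

\[
D^{(p)}_F(\delta, \mathcal{F}) < \infty \quad \text{for all } \delta > 0,
\]

then also for $\mathcal{F}_M := \{ f I_{F \le M} : f \in \mathcal{F} \}$

\[
D^{(p)}_F(\varepsilon, \mathcal{F}_M) < +\infty \quad \text{for all } \varepsilon > 0.
\]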

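Proof. Indeed, let

\[
\gamma := n^{-1} \sum_{l=1}^{n} \delta_{x_l} \in \Gamma
\]

with $x_l \in \mathcal{X}$, $1 \le l \le n$, and let $g_1, \dots, g_m \in \mathcal{F} I_{F \le M}$ be such that, for all $i \neq j$,

\[
\int |g_i - g_j|^p \, d\gamma > \varepsilon^p \gamma(F_M^p), \qquad F_M := F I_{F \le M}.
\]

Now $\gamma(F_M^p) = n^{-1} \sum_{l \in A_M} (F(x_l))^p$, where $A_M := \{ k : F(x_k) \le M \}$. Let $\gamma_M := \operatorname{card}(A_M)^{-1} \sum_{l \in A_M} \delta_{x_l}$ and write $g_i = f_i I_{F \le M}$ with $f_i \in \mathcal{F}$. Then

\[
\int |(f_i - f_j) I_{F \le M}|^p \, d\gamma > \varepsilon^p \gamma(F_M^p),
\]

which, expressed in terms of $\gamma_M$, reads

\[
\frac{|A_M|}{n} \int |f_i - f_j|^p \, d\gamma_M > \varepsilon^p \frac{|A_M|}{n} \gamma_M(F^p).
\]

So $m \le D^{(p)}_F(\varepsilon, \mathcal{F}, \gamma_M) \le D^{(p)}_F(\varepsilon, \mathcal{F})$.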
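For a finite class and a fixed empirical law $\gamma$, the separation count used in the proof above can be computed by brute force. The following is a minimal sketch under assumptions that are not in the text: functions are represented by their lists of values at the points $x_1, \dots, x_n$, and the helper names `gamma_mean`, `separated` and `packing_number` are hypothetical.

```python
from itertools import combinations


def gamma_mean(values):
    """Integral with respect to the empirical law gamma = n^{-1} sum_l delta_{x_l}."""
    return sum(values) / len(values)


def separated(f, g, envelope, eps, p):
    """True if int |f - g|^p dgamma > eps^p * int F^p dgamma."""
    lhs = gamma_mean([abs(a - b) ** p for a, b in zip(f, g)])
    rhs = eps ** p * gamma_mean([big_f ** p for big_f in envelope])
    return lhs > rhs


def packing_number(functions, envelope, eps, p=2):
    """Largest m such that some f_1, ..., f_m are pairwise separated (exhaustive search)."""
    for m in range(len(functions), 1, -1):
        for subset in combinations(functions, m):
            if all(separated(f, g, envelope, eps, p)
                   for f, g in combinations(subset, 2)):
                return m
    return 1 if functions else 0


# Illustrative data: three functions evaluated at n = 4 points, envelope F = 1.
functions = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.9],
]
envelope = [1.0, 1.0, 1.0, 1.0]

coarse = packing_number(functions, envelope, eps=0.5)
fine = packing_number(functions, envelope, eps=0.01)
```

The exhaustive search is exponential in the size of the class and is only meant to make the definition concrete; the number grows as $\varepsilon$ decreases, since functions that are close in $L^p(\gamma)$ become distinguishable.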

In the next series of lemmas we will prove that certain classes $\mathcal{G}$ derived from $\mathcal{F}$, which is Suslin image admissible, are still Suslin image admissible.

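Lemma D.1.2. Let

\[
X_l : (\mathcal{X}^\infty, \mathcal{A}^\infty, P^\infty) \to (\mathcal{X}, \mathcal{A}, P) : \{x_n\}_{n \ge 1} \mapsto x_l, \qquad l \ge 1,
\]

be the standard model, with $(\mathcal{X}, \mathcal{A}, P)$ a probability space. Let $\mathcal{F} \subset L^2(\mathcal{X}, \mathcal{A}, P)$ and let $F \in L^2(\mathcal{X}, \mathcal{A}, P)$ be an envelope function for $\mathcal{F}$.

If $\mathcal{F}$ is Suslin image admissible through some $(Y, \mathcal{S}, T)$ and $D^{(2)}_F(\xi, \mathcal{F}) < +\infty$ for all $\xi > 0$, then, for any $\delta > 0$,

\[
\mathcal{F}_{j,\delta} := \Bigl\{ f - f_j : f \in \mathcal{F},\ \int (f - f_j)^2 \, dP < \delta^2 \Bigr\},
\]

with $f_j \in L^2(\mathcal{X}, \mathcal{A}, P)$, is also Suslin image admissible and has a finite entropy.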

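Proof. $f_j + \mathcal{F}_{j,\delta} = \mathcal{F} \cap B_{L^2}(f_j, \delta)$, thus $f_j + \mathcal{F}_{j,\delta}$ is relatively $d_{2,P}$–open and, according to theorem 4.1.4,

\[
Z := \{ y \in Y : T(y) \in f_j + \mathcal{F}_{j,\delta} \} = T^{-1}(f_j + \mathcal{F}_{j,\delta}) \in \mathcal{S}.
\]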


Since $(Y, \mathcal{S})$ is a Suslin space, $Z$ with the relative σ–algebra is also Suslin by corollary C.2.2. Moreover $(Z, \mathcal{S} \sqcap Z)$ is also a separable measurable space, since $(Y, \mathcal{S})$ is. Hence $f_j + \mathcal{F}_{j,\delta}$ is Suslin image admissible, and because $f_j$ is measurable, $\mathcal{F}_{j,\delta}$ is Suslin image admissible.

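Moreover $D^{(2)}_F(\xi, \mathcal{F}_{j,\delta}) < +\infty$. Indeed, let $\gamma \in \Gamma$ and let $h_1, \dots, h_m \in \mathcal{F}_{j,\delta}$ be such that $\int (h_k - h_l)^2 \, d\gamma > \xi^2 \gamma(F^2)$ for all $k \neq l$. Then $h_i + f_j \in \mathcal{F}$ and

\[
\int \bigl( (h_k + f_j) - (h_l + f_j) \bigr)^2 \, d\gamma = \int (h_k - h_l)^2 \, d\gamma > \xi^2 \gamma(F^2).
\]

The functions $h_i + f_j$, $1 \le i \le m$, thus form a separated subset of $\mathcal{F}$ (which need not exhaust the whole of $\mathcal{F}$), so $m \le D^{(2)}_F(\xi, \mathcal{F})$.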

The next lemma is needed in chapter 6, in theorems 5.3.2 and 5.4.2.

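Lemma D.1.3. Let $(\mathcal{X}, \mathcal{A})$ be a separable measurable space and $\mathcal{F}$ a class of real–valued measurable functions on $\mathcal{X}$. If $X_i$ denote the coordinates on $(\mathcal{X}^\infty, \mathcal{A}^\infty, P^\infty)$, then for any fixed $x = (x_1, \dots, x_{2n})$

\[
\mathcal{G} := (\mathcal{F} \times \mathcal{F}) \cap \Bigl\{ (f, g) : \int (f - g)^2 \, dP_{2n}(x) < \delta^2 \Bigr\}
\]

is Suslin image admissible and

\[
|\nu_n^0(f - g)| \quad \text{for } (f, g) \in \mathcal{G}
\]

is measurable.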

Proof. It is enough to have that

\[
\sup\Bigl\{ |\nu_n^0(f - g)| : f, g \in \mathcal{F},\ \int (f - g)^2 \, dP_{2n} < \delta^2 \Bigr\}
\]

is measurable.

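• The product $\mathcal{F} \times \mathcal{F}$ is certainly Suslin image admissible via $(Y^2, \mathcal{S} \otimes \mathcal{S}, (T, T))$, as in the proof of theorem 4.2.3. Now we show that

\[
A := \Bigl\{ (y, z) : \int (T(y) - T(z))^2 \, dP_{2n}(x) < \delta^2 \Bigr\}
\]

is measurable. Then $\mathcal{G}$ will be image admissible Suslin via $(Y^2, \mathcal{S} \otimes \mathcal{S}, (T, T) I_A)$.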

• Measurability of $A$. First of all consider the function

\[
h_1 : \mathcal{X}^{2n} \times Y \times Y \to \mathbb{R}^{4n} : (x, y, z) \mapsto \bigl( T(y)(x_1), T(z)(x_1), \dots, T(y)(x_{2n}), T(z)(x_{2n}) \bigr),
\]

which is measurable, because the coordinate functions, i.e.

\[
(x, y, z) \mapsto (\pi_1(x), y) \mapsto T(y)(\pi_1(x)),
\]

are measurable and $\mathbb{R}^{4n}$ is separable, implying $\mathcal{B}(\mathbb{R}^{4n}) = \bigotimes_{i=1}^{4n} \mathcal{B}(\mathbb{R})$, so that measurability of the coordinate functions is sufficient to have measurability of $h_1$. The function

\[
h_2 : \mathbb{R}^{4n} \to \mathbb{R} : r \mapsto (2n)^{-1} \sum_{l=1}^{2n} \bigl( \pi_{2l-1}(r) - \pi_{2l}(r) \bigr)^2
\]

is also measurable, since it is continuous. The composition $h := h_2 \circ h_1$, evaluated at $(x, y, z)$, equals by definition

\[
\int (T(y) - T(z))^2 \, dP_{2n}(x).
\]

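• Finally we come to the measurability of $\nu_n^0(f - g)$ for

\[
(f, g) \in (\mathcal{F} \times \mathcal{F}) \cap \Bigl\{ (f, g) : \int (f - g)^2 \, dP_{2n} < \delta^2 \Bigr\}.
\]

For the functions $h$, $h_1$ and $h_2$ as in the previous step, $\nu_n^0(f - g)$ is measurable as the product of $(x, y, z) \mapsto I_{h^{-1}([0, \delta^2[)}$ and

\[
n^{-1/2} \Bigl( \sum_{i=1}^{n} (T(y) - T(z))(X_{\sigma(i)}) - \sum_{i=1}^{n} (T(y) - T(z))(X_{\tau(i)}) \Bigr).
\]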

Then, as in corollary 4.2.2,

\[
\sup\Bigl\{ |\nu_n^0(f - g)| : f, g \in \mathcal{F},\ \int (f - g)^2 \, dP_{2n} < \delta^2 \Bigr\}
\]

is universally measurable. Indeed, noting that $(\mathcal{X}^{2n}, \mathcal{A}^{2n})$ is a separable measurable space (by theorem A.2.11 together with the fact that $(\mathcal{X}, \mathcal{A})$ is a separable measurable space), we look at the jointly measurable map

\[
Y^2 \times \mathcal{X}^{2n} \to \mathbb{R} : (y_1, y_2, x) \mapsto I_{A(x)}(y_1, y_2) \bigl( T(y_1)(x) - T(y_2)(x) \bigr)
\]

and we observe that

\[
\Psi : (\mathcal{F} \times \mathcal{F}) \times \mathcal{X}^{2n} \to \mathbb{R} : ((f_1, f_2), x) \mapsto I_B(f_1, f_2) \bigl( f_1(x) - f_2(x) \bigr)
\]

is jointly measurable, where $B := (T, T)(A)$. This holds since, by Aumann's theorem (4.1.3), admissibility is equivalent to image admissibility. Hence $\sup\{ \Psi(f_1, f_2, x) : (f_1, f_2) \in \mathcal{F} \times \mathcal{F} \}$ is a universally measurable function.

So $\|\nu_n^0(f - g)\|_{\mathcal{F} \times \mathcal{F}}$ is universally measurable as the composition of a universally measurable and a measurable function:

\[
\|\Psi\|_{\mathcal{F} \times \mathcal{F}} (X_{\sigma_1}, \dots, X_{\sigma_n}, X_{\tau_1}, \dots, X_{\tau_n}).
\]
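The maps $h_1$ and $h_2$ from the second step of the proof can be written out concretely. The sketch below is an illustration only: the parametrization `T` (indicators of half-lines indexed by a real parameter) and the sample points are assumptions chosen for the example, not objects from the text.

```python
def h1(x, y, z, T):
    """Interleave T(y)(x_l), T(z)(x_l) for l = 1, ..., 2n into a vector in R^{4n}."""
    r = []
    for point in x:
        r.append(T(y)(point))
        r.append(T(z)(point))
    return r


def h2(r):
    """(2n)^{-1} * sum over l of (r_{2l-1} - r_{2l})^2 for r in R^{4n}."""
    two_n = len(r) // 2
    return sum((r[2 * l] - r[2 * l + 1]) ** 2 for l in range(two_n)) / two_n


def h(x, y, z, T):
    """Composition h2 o h1: the integral of (T(y) - T(z))^2 with respect to P_2n(x)."""
    return h2(h1(x, y, z, T))


def in_A(x, y, z, T, delta):
    """Membership of (y, z) in the set A for the fixed sample x."""
    return h(x, y, z, T) < delta ** 2


def T(y):
    """Hypothetical parametrization: T(y) is the indicator of ]-inf, y]."""
    return lambda point: 1.0 if point <= y else 0.0


x = [0.1, 0.4, 0.6, 0.9]           # a sample with 2n = 4 points
distance_sq = h(x, 0.5, 0.7, T)    # the two indicators differ only at the point 0.6
```

Splitting the computation into an evaluation map followed by a continuous function of finitely many real coordinates mirrors the measurability argument: all the dependence on $(y, z)$ enters through `h1`.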

D.2 Exponential inequalities.

We first prove an auxiliary theorem that forms the starting point for a whole family of exponential inequalities.

Theorem D.2.1. Let $X$ be any real random variable and $t \in \mathbb{R}$, then

\[
\Pr\{X \ge t\} \le \inf_{u \ge 0} e^{-tu} E[e^{uX}].
\]

Proof. Let $u \ge 0$ be fixed, then $I_{\{X \ge t\}} \le e^{u(X - t)}$. Integrating gives $\Pr\{X \ge t\} \le e^{-tu} E[e^{uX}]$. So taking $\inf_{u \ge 0}$ finishes the proof.
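The bound of theorem D.2.1 can be checked numerically for a random variable with finitely many values. The sketch below is an illustration under stated assumptions: the distribution (`values`, `probs`) and the grid `u_grid` are arbitrary choices, and the infimum is replaced by a minimum over the grid, which is still a valid upper bound because every single $u \ge 0$ gives one.

```python
import math


def tail(values, probs, t):
    """Exact Pr{X >= t} for a discrete random variable."""
    return sum(p for v, p in zip(values, probs) if v >= t)


def mgf(values, probs, u):
    """Moment generating function E[exp(u X)]."""
    return sum(p * math.exp(u * v) for v, p in zip(values, probs))


def chernoff_bound(values, probs, t, u_grid):
    """Minimum over the grid of exp(-t u) * E[exp(u X)]."""
    return min(math.exp(-t * u) * mgf(values, probs, u) for u in u_grid)


# Illustrative distribution (an assumption, not from the text).
values = [-1.0, 0.0, 2.0]
probs = [0.5, 0.3, 0.2]
u_grid = [k / 100 for k in range(0, 501)]   # u in [0, 5]

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, tail(values, probs, t), chernoff_bound(values, probs, t, u_grid))
```

Since $u = 0$ is on the grid, the computed bound never exceeds $1$; for $t$ at the upper end of the support it approaches the true tail probability as the grid is extended.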

The next inequality concerns one of the simplest kinds of random variables there are, namely Rademacher random variables, and it is due to Hoeffding.

Definition D.2.1. A real random variable $X$ that takes a.s. the values $\pm 1$ with equal probability, i.e. $P(X = 1) = 1/2 = P(X = -1)$, is called a Rademacher random variable.

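Proposition D.2.2 (Hoeffding). Let $s_1, \dots, s_n$ be independent Rademacher random variables. For any $t \ge 0$ and $a_j \in \mathbb{R}$:

\[
\Pr\Bigl\{ \sum_{j=1}^{n} a_j s_j \ge t \Bigr\} \le \exp\Bigl( -t^2 \Big/ \Bigl( 2 \sum_{j=1}^{n} a_j^2 \Bigr) \Bigr).
\]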

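Proof. Note that

\[
(2n)! = (2n)(2n-1) \cdots (2n-(n-1)) \, n! > 2^n n!
\]

for $n \ge 2$, and equality holds for $n = 0, 1$. So $1/(2n)! \le 2^{-n}/n!$ for $n \ge 0$. Consider the function $\cosh(x) := (e^x + e^{-x})/2$; then by the inequality above $\cosh(x) \le \exp(x^2/2)$ for all $x$:

\begin{align*}
\cosh(x) &= \tfrac{1}{2} (e^x + e^{-x}) \\
&= \tfrac{1}{2} \Bigl( \sum_{n \ge 0} x^n / n! + \sum_{n \ge 0} (-1)^n x^n / n! \Bigr) \\
&= \tfrac{1}{2} \Bigl( \sum_{n \ge 0} 2 (x^2)^n / (2n)! \Bigr) \\
&\le \sum_{n \ge 0} (x^2/2)^n / n! = \exp(x^2/2).
\end{align*}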

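Applying theorem D.2.1 gives

\begin{align*}
\Pr\Bigl\{ \sum_{j=1}^{n} a_j s_j \ge t \Bigr\}
&\le \inf_{u \ge 0} e^{-tu} E\Bigl[ \exp\Bigl( u \sum_{j=1}^{n} a_j s_j \Bigr) \Bigr] \\
&= \inf_{u \ge 0} e^{-tu} E\Bigl[ \prod_{j=1}^{n} \exp(u a_j s_j) \Bigr] \\
&= \inf_{u \ge 0} e^{-tu} \prod_{j=1}^{n} E[\exp(u a_j s_j)] \\
&= \inf_{u \ge 0} e^{-tu} \prod_{j=1}^{n} \tfrac{1}{2} \bigl( \exp(u a_j) + \exp(-u a_j) \bigr) \\
&\le \inf_{u \ge 0} e^{-tu} \prod_{j=1}^{n} \exp\bigl( (u a_j)^2 / 2 \bigr) \\
&= \inf_{u \ge 0} \exp\Bigl( -tu + \sum_{j=1}^{n} (u a_j)^2 / 2 \Bigr)
\end{align*}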

where in the third step we used the independence of the $s_j$'s and in the fourth their special form to calculate the expectation. Finally, we only have to calculate the $\inf_{u \ge 0}$ using ordinary calculus. The derivative with respect to $u$ is given by

\[
\exp\Bigl( -tu + \sum_{j=1}^{n} (u a_j)^2 / 2 \Bigr) \Bigl[ -t + u \sum_{j=1}^{n} a_j^2 \Bigr]
\]

and switches sign from minus to plus at the point $u = t / \sum_{j=1}^{n} a_j^2$, which is thus the minimum. Plugging it in, one obtains

\[
\exp\Bigl( -t^2 \Big/ \Bigl( \sum_{j=1}^{n} a_j^2 \Bigr) + \Bigl( t \Big/ \sum_{j=1}^{n} a_j^2 \Bigr)^2 \sum_{j=1}^{n} a_j^2 / 2 \Bigr)
= \exp\Bigl( -t^2 \Big/ \Bigl( 2 \sum_{j=1}^{n} a_j^2 \Bigr) \Bigr).
\]
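For small $n$ the inequality can be verified exactly by enumerating all $2^n$ sign vectors. The following sketch assumes independent Rademacher signs as in the proof; the coefficient vector `a`, the thresholds and the function names are illustrative choices, not part of the text. It also evaluates the exponent at the minimizing $u = t / \sum_j a_j^2$ to confirm that it reproduces the bound.

```python
import math
from itertools import product


def exact_tail(a, t):
    """Pr{sum_j a_j s_j >= t} for independent Rademacher s_j, by full enumeration."""
    n = len(a)
    hits = sum(
        1
        for signs in product((-1, 1), repeat=n)
        if sum(aj * sj for aj, sj in zip(a, signs)) >= t
    )
    return hits / 2 ** n


def hoeffding_bound(a, t):
    """exp(-t^2 / (2 * sum_j a_j^2))."""
    return math.exp(-t ** 2 / (2 * sum(aj ** 2 for aj in a)))


def chernoff_objective(a, t, u):
    """exp(-t u + sum_j (u a_j)^2 / 2), the quantity minimized over u >= 0."""
    return math.exp(-t * u + sum((u * aj) ** 2 for aj in a) / 2)


a = [1.0, 0.5, 2.0, 1.5]   # illustrative coefficients
for t in (0.0, 1.0, 2.5, 5.0):
    u_star = t / sum(aj ** 2 for aj in a)
    print(t, exact_tail(a, t), hoeffding_bound(a, t), chernoff_objective(a, t, u_star))
```

The enumeration costs $2^n$ evaluations, so it is only practical for small $n$; the bound itself does not depend on $n$ except through $\sum_j a_j^2$.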


Remark. Hoeffding's inequality is not only useful for Rademacher random variables with fixed coefficients; one can also consider coefficients which are independent random variables (and independent of the $s_i$ too).
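One way to make the remark precise is to condition on the coefficients. The display below is a sketch under the assumption that the random coefficients, written here as $A_1, \dots, A_n$ (notation introduced only for this illustration), are independent of $(s_1, \dots, s_n)$; on the event where all $A_j$ vanish and $t > 0$ the exponential is read as $0$.

```latex
\begin{align*}
\Pr\Bigl\{ \sum_{j=1}^{n} A_j s_j \ge t \Bigr\}
&= E\Bigl[ \Pr\Bigl\{ \sum_{j=1}^{n} A_j s_j \ge t \Bigm| A_1, \dots, A_n \Bigr\} \Bigr] \\
&\le E\Bigl[ \exp\Bigl( -t^2 \Big/ \Bigl( 2 \sum_{j=1}^{n} A_j^2 \Bigr) \Bigr) \Bigr].
\end{align*}
```

The inequality applies Proposition D.2.2 conditionally on $A_1, \dots, A_n$: by independence, the $s_j$ remain independent Rademacher variables given the coefficients, which then act as fixed numbers.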

Bibliography

[Bill1] P. Billingsley, Probability and Measure, John Wiley and Sons, New York, 2012 (ISBN: 978-1-1181-2237-2)

[Bill2] P. Billingsley, Convergence of Probability Measures, John Wiley and Sons, New York, 1999 (ISBN: 978-0-471-19745-4)

[Cohn] Donald L. Cohn, Measure Theory, Birkhauser, Quinn-Woodbine, 1980

[Dud1] Richard M. Dudley, Real Analysis and Probability, Cambridge University Press, Cambridge, 2002 (ISBN: 0-521-80972-X)

[Dud2] Richard M. Dudley, Uniform Central Limit Theorems, Cambridge University Press, Cambridge, 2008 (ISBN: 972-0-521-05221-4)

[FreitagAndBusam] E. Freitag and R. Busam, Complex Analysis, Springer Verlag, Berlin Heidelberg, 2009 (ISBN: 978-3-540-93982-5)

[HewAndStrom] E. Hewitt and K. Stromberg, Real and Abstract Analysis, Springer Verlag, New York, 1965 (ISBN: 3-540-90138-8)

[Jech] T. Jech, Set Theory, Academic Press, London, 1978 (ISBN: 0-12-381950-4)

[Pol] D. Pollard, Convergence of Stochastic Processes, Springer Verlag, New York, 1984 (ISBN: 0-387-90990-7)

[vdVaartAndWell] Aad W. van der Vaart and Jon A. Wellner, Weak Convergence and Empirical Processes, Springer Verlag, Berlin Heidelberg, 2000 (ISBN: 0-387-94640-3)

[Will] S. Willard, General Topology, Dover Publications, New York, 2004 (ISBN: 978-0-486-43479-7)
