
A Basic Course in Measure and Probability

Originating from the authors’ own graduate course at the University of North Carolina, this material has been thoroughly tried and tested over many years, making the book perfect for a two-term course or for self-study. It provides a concise introduction that covers all of the measure theory and probability most useful for statisticians, including Lebesgue integration, limit theorems in probability, martingales, and some theory of stochastic processes. Readers can test their understanding of the material through the 300 exercises provided.

The book is especially useful for graduate students in statistics and related fields of application (biostatistics, econometrics, finance, meteorology, machine learning, etc.) who want to shore up their mathematical foundation. The authors establish common ground for students of varied interests, which will serve as a firm “take-off point” for them as they specialize in areas that exploit mathematical machinery.

ROSS LEADBETTER is Professor of Statistics and Operations Research at the University of North Carolina, Chapel Hill. His research involves stochastic process theory, point processes, particularly extreme value and risk theory for stationary sequences and processes, and applications to engineering, oceanography, and the environment.

STAMATIS CAMBANIS was a Professor at the University of North Carolina, Chapel Hill until his death in 1995. His research included fundamental contributions to stochastic process theory, especially stable processes. He taught a wide range of statistics and probability courses and contributed very significantly to the development of the measure and probability instruction and the lecture notes on which this volume is based.

VLADAS PIPIRAS has been with the University of North Carolina, Chapel Hill since 2002, and a full Professor since 2012. His main research interests focus on stochastic processes exhibiting long-range dependence, multifractality and other scaling phenomena, as well as on stable, extreme value and other distributions possessing heavy tails. He has also worked on statistical inference questions for reduced-rank models with applications to econometrics, and sampling issues for finite point processes with applications to data traffic modeling in computer networks.

A Basic Course in Measure and Probability: Theory for Applications

ROSS LEADBETTER, University of North Carolina, Chapel Hill

STAMATIS CAMBANIS, University of North Carolina, Chapel Hill

VLADAS PIPIRAS, University of North Carolina, Chapel Hill

University Printing House, Cambridge CB2 8BS, United Kingdom

Published in the United States of America by Cambridge University Press, New York

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107020405

© Ross Leadbetter and Vladas Pipiras 2014

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2014

Printed in the United Kingdom by TJ International Ltd., Padstow, Cornwall

A catalog record for this publication is available from the British Library

Library of Congress Cataloging-in-Publication Data

Leadbetter, Ross, author.
A basic course in measure and probability : theory for applications / Ross Leadbetter, Stamatis Cambanis, Vladas Pipiras.
pages cm

ISBN 978-1-107-02040-5 (hardback)
ISBN 978-1-107-65252-1 (paperback)

1. Measure theory. 2. Probabilities. I. Cambanis, Stamatis, 1943–1995, author. II. Pipiras, Vladas, author. III. Title.

QC20.7.M43L43 2013
515′.42–dc23
2013028841

ISBN 978-1-107-02040-5 Hardback
ISBN 978-1-107-65252-1 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface page ix
Acknowledgements xiii

1 Point sets and certain classes of sets 1
1.1 Points, sets and classes 1
1.2 Notation and set operations 2
1.3 Elementary set equalities 5
1.4 Limits of sequences of sets 6
1.5 Indicator (characteristic) functions 7
1.6 Rings, semirings, and fields 8
1.7 Generated rings and fields 11
1.8 σ-rings, σ-fields and related classes 13
1.9 The real line – Borel sets 16

Exercises 18

2 Measures: general properties and extension 21
2.1 Set functions, measure 21
2.2 Properties of measures 23
2.3 Extension of measures, stage 1: from semiring to ring 27
2.4 Measures from outer measures 29
2.5 Extension theorem 31
2.6 Completion and approximation 34
2.7 Lebesgue measure 37
2.8 Lebesgue–Stieltjes measures 39

Exercises 41

3 Measurable functions and transformations 44
3.1 Measurable and measure spaces, extended Borel sets 44
3.2 Transformations and functions 45
3.3 Measurable transformations and functions 47
3.4 Combining measurable functions 50
3.5 Simple functions 54
3.6 Measure spaces, “almost everywhere” 57
3.7 Measures induced by transformations 58
3.8 Borel and Lebesgue measurable functions 59

Exercises 60

4 The integral 62
4.1 Integration of nonnegative simple functions 62
4.2 Integration of nonnegative measurable functions 63
4.3 Integrability 68
4.4 Properties of the integral 69
4.5 Convergence of integrals 73
4.6 Transformation of integrals 77
4.7 Real line applications 78

Exercises 80

5 Absolute continuity and related topics 86
5.1 Signed and complex measures 86
5.2 Hahn and Jordan decompositions 87
5.3 Integral with respect to signed measures 92
5.4 Absolute continuity and singularity 94
5.5 Radon–Nikodym Theorem and the Lebesgue decomposition 96
5.6 Derivatives of measures 102
5.7 Real line applications 104

Exercises 112

6 Convergence of measurable functions, Lp-spaces 118
6.1 Modes of pointwise convergence 118
6.2 Convergence in measure 120
6.3 Banach spaces 124
6.4 The spaces Lp 127
6.5 Modes of convergence – a summary 134

Exercises 135

7 Product spaces 141
7.1 Measurability in Cartesian products 141
7.2 Mixtures of measures 143
7.3 Measure and integration on product spaces 146
7.4 Product measures and Fubini’s Theorem 149
7.5 Signed measures on product spaces 152
7.6 Real line applications 153
7.7 Finite-dimensional product spaces 155
7.8 Lebesgue–Stieltjes measures on Rn 158
7.9 The space (RT, BT) 163
7.10 Measures on RT, Kolmogorov’s Extension Theorem 167

Exercises 170

8 Integrating complex functions, Fourier theory and related topics 177
8.1 Integration of complex functions 177
8.2 Fourier–Stieltjes, and Fourier Transforms in L1 180
8.3 Inversion of Fourier–Stieltjes Transforms 182
8.4 “Local” inversion for Fourier Transforms 186

9 Foundations of probability 189
9.1 Probability space and random variables 189
9.2 Distribution function of a random variable 191
9.3 Random elements, vectors and joint distributions 195
9.4 Expectation and moments 199
9.5 Inequalities for moments and probabilities 200
9.6 Inverse functions and probability transforms 203

Exercises 204

10 Independence 208
10.1 Independent events and classes 208
10.2 Independent random elements 211
10.3 Independent random variables 213
10.4 Addition of independent random variables 216
10.5 Borel–Cantelli Lemma and zero-one law 217

Exercises 219

11 Convergence and related topics 223
11.1 Modes of probabilistic convergence 223
11.2 Convergence in distribution 227
11.3 Relationships between forms of convergence 235
11.4 Uniform integrability 238
11.5 Series of independent r.v.’s 241
11.6 Laws of large numbers 247

Exercises 249

12 Characteristic functions and central limit theorems 254
12.1 Definition and simple properties 254
12.2 Characteristic function and moments 258
12.3 Inversion and uniqueness 261
12.4 Continuity theorem for characteristic functions 263
12.5 Some applications 265
12.6 Array sums, Lindeberg–Feller Central Limit Theorem 268
12.7 Recognizing a c.f. – Bochner’s Theorem 271
12.8 Joint characteristic functions 277

Exercises 280

13 Conditioning 285
13.1 Motivation 285
13.2 Conditional expectation given a σ-field 287
13.3 Conditional probability given a σ-field 291
13.4 Regular conditioning 293
13.5 Conditioning on the value of a r.v. 300
13.6 Regular conditional densities 303
13.7 Summary 305

Exercises 306

14 Martingales 309
14.1 Definition and basic properties 309
14.2 Inequalities 314
14.3 Convergence 319
14.4 Centered sequences 325
14.5 Further applications 330

Exercises 337

15 Basic structure of stochastic processes 340
15.1 Random functions and stochastic processes 340
15.2 Construction of the Wiener process in R[0,1] 343
15.3 Processes on special subspaces of RT 344
15.4 Conditions for continuity of sample functions 345
15.5 The Wiener process on C and Wiener measure 346
15.6 Point processes and random measures 347
15.7 A purely measure-theoretic framework for r.m.’s 348
15.8 Example: The sample point process 350
15.9 Random element representation of a r.m. 351
15.10 Mixtures of random measures 351
15.11 The general Poisson process 353
15.12 Special cases and extensions 354

References 356
Index 357

Preface

This work arises from lecture notes for a two semester basic course sequence in Measure and Probability Theory given for first year Statistics graduate students at the University of North Carolina, evolving through many generations of handwritten, typed, mimeographed, and finally LaTeX editions. Their focus is to provide basic course material, tailored to the background of our students, and influenced very much by their reactions and the changing emphases of the years. We see this as one side of an avowed department educational mission to provide solid and diverse basic course training common to all our students, who will later specialize in diverse areas from the very theoretical to the very applied.

The notes originated in the 1960’s from a “Halmos style” measure theory course. As may be apparent (to those of sufficient age) the measure theory section has preserved that basic flavor with numerous obvious modernizations (beginning with the early use of the Sierpinski-type classes more suited than monotone class theorems for probabilistic applications), and exposition more tailored to the particular audience. Even the early “Halmos framework” of rings and σ-rings has been retained up to a point since these notions are useful in applications (e.g. point process theory) and their inclusion requires no significant further effort. Integration itself is discussed within the customary σ-field framework so the students have no difficulty in relating to other works.

Strong opinions abound as to how measure theory should be taught, or even if it should be taught: its existence was once described by a Danish statistical colleague as an “unfortunate historical accident” and by a local mathematician as an “unnatural way of approaching integration.” In particular he felt that the Caratheodory extension “was not natural” since, as he expressed it, “If Caratheodory had not thought of it, I wouldn’t have either!”

Perhaps more threatening is the “bottom line” climate in some of today’s universities suggesting that training in measure-theoretic probability and statistical theory belongs to the past and should be deemphasized in favor of concentrated computational training for modern project-oriented activity. In this respect we can point with great pride to the many of our graduates making substantial statistical contributions in applications ascribable in (excuse us) “significant measure” to a solid theoretical component in their training. Moreover we ourselves see rather dramatic enrollment increases in our graduate probability courses from students in other disciplines in our own university and beyond, in fields such as financial mathematics with basic probability prerequisite. These (at least local) factors suggest a continuing role for both basic and more advanced course offerings, with the opportunity for innovative selection of special topics to be included.

Our viewpoint regarding presentation, much less single minded than some, is that we would teach (even name) this subject differently according to the particular audience needs. Based on the typical “advanced calculus” and “operational probability” backgrounds of our own students we prefer an essentially non-topological measure theory course followed by one in basic probability theory. For those of a more mathematical bent, the beautiful interplay between measure, topology (and algebra) can be studied at a later stage and is not a substantial part of our standard training mission for first year statistics graduate students. This organization has the incidental advantage that those who do further study have gained an understanding of which arguments (such as the central “σ-ring game”) are measure theoretic in nature in contrast to being topological, or algebraic.

Our aim in the first semester is to provide a comprehensive account of general measure and integration theory. This we see as a quite well and naturally defined body of topics, generalizing much of standard real line Lebesgue integration theory to abstract spaces. Indeed a valuable by-product is that a student may automatically acquire an understanding of real line Lebesgue integration and its relationship to Riemann theory, made visible by a supply of exercises involving real line applications. We find it natural to first treat this body of (general measure) theory, giving advance glimpses from time to time of the probabilistic context. Some authors prefer the immediacy of probabilistic perspective attainable from a primary focus on probability in development ab initio, with extensions to general measures being indicated to the degree desired. This is primarily a question of purpose and taste with pros and cons. The only viewpoint we would strongly disagree with is that there exists a uniformly best didactic approach.

In the context of “measure theory” we view σ-finiteness as the “natural norm” for the statement of results, and finite measures as (albeit important) special cases. This, naturally, changes in the second part with primary focus on probability measures and more special resulting theory. In addition to the specialization of general measure theoretic results to yield the basic framework for probability theory there is, of course, an unlimited variety of results which may be explored in the purely probabilistic context and one may argue about which are truly central and a sine qua non for a one-semester treatment. There would probably be little disagreement with the topics we have included as being necessary and desirable knowledge, but they certainly cannot be regarded as sufficient for all students. Again our guiding principle has been to provide a course suited as common ground for our students of varied interests and serving as a “take-off point” for them as they specialize in areas ranging from applied statistics to stochastic analysis.

For a course one has to decide whether to emphasize basic ideas, details, or both. We have certainly attempted to strongly highlight the central ideas; if we have erred it is in the direction of including as complete details as possible, feeling that these should be seen at least once by the students. For example, detailed consideration of sets of measure zero, of possibly infinite function values and the specific identification of X × Y × Z with (X × Y) × Z are not necessarily issues of lasting emphasis in practice but we think it appropriate and desirable to deal with them carefully when introduced in a course. As will be clear, it has not been our intention to produce yet one more comprehensive book on this subject. Rather we have used the facilities of modern word processing as encouragement to give our lecture notes a better organized and repeatedly updated basic course form in the hope that they (and now this volume) will be the more useful to our own students, for whom they are designed, and to others who may share our educational perspectives.

Finally, it is with more than a twinge of sadness that this preface is written in the absence of coauthor Stamatis Cambanis, without whom the lecture notes would not have taken on any really comprehensive form. From the rough (mainly measure-theoretic) notes prepared by MRL in the 1960’s, SC and MRL worked together in developing the notes from the mid-1970’s as they taught the classes, until Stamatis’ untimely death in 1995.


Stamatis Cambanis was a wonderfully sensitive human being and friend, with unmatched concern to give help wherever and whatever the need. He was also The Master Craftsman in all that he did, his character echoing the words of Aristotle: “Είμαστε αυτό που πράττουμε επανειλημμένα. Έτσι, η τελειότητα δεν είναι πράξη αλλά συνήθεια.” (We are what we repeatedly do. Excellence then is not an act but a habit.)

M.R.L., V.P.

Acknowledgements

It is indeed hazardous to list acknowledgements in a work that has been used in developing form for almost half a century, and we apologize in advance for inevitable memory lapses that have caused omissions. It goes without saying that we are grateful to generations of questioning students, often indicating some lack of clarity of exposition in class or in the notes, and leading to needed revisions. Some have studied sections of special interest to them and not infrequently challenged details or phrasing of proofs – again leading to improvements in clarity. In particular Chihoon Lee undertook a quite unsolicited examination of the entire set of notes and pointed out many typographic and other blemishes at that time. Xuan Wang reviewed the entire manuscript in detail. We are especially grateful to Martin Heller who critically reviewed the entire set of book proofs and has prepared a solution set for many of the exercises.

Typing of original versions of the notes was creatively done by Peggy Ravitch and Harrison Williams, who grappled with the early mysteries of LaTeX, pioneered its use in the department, and constantly found imaginative ways to outwit its firm rules. Further residual typing was willingly done by Jiang Chen, James Wilson and Stefanos Kechagias, who also doubled as Greek linguistics advisor. It is a pleasure to record the encouragement and helpful comments of our colleague Amarjit Budhiraja who used the notes as supplementary material for his classes, and the repeated nagging of Climatologist Jerry Davis for publication as a book, as he used the notes as background in his research.

We are especially grateful to the Institute of Mathematical Statistics and the Editors of the IMS Lecture Note Series, Anirban DasGupta and the inimitable Susan Murphy, for their enthusiasm for production as a volume, and for the conversion of the entire manuscript from older LaTeX and hand-corrected pdf files into the new format, through Mattson Publishing Company, the ever patient and gracious Geri Mattson, and the magical group VTeX. In particular we thank IMS Executive Director Elyse Gustafson for her quiet efficiency, willing support and generously provided advice when needed, and Sir David Cox for his ready encouragement as coordinating editor of the new IMS Monograph and Textbook series, in cooperation with Cambridge University Press.

We shall, of course, be most grateful for any brief alert (e.g. [email protected] or [email protected]) regarding remaining errors, blemishes or inelegance (which will exist a.s. in spite of years of revision!) as well as general reactions or comments a reader may be willing to share.

1

Point sets and certain classes of sets

1.1 Points, sets and classes

We shall consider sets consisting of elements or points. The nature of the points will be left unspecified – examples are points in a Euclidean space, sequences of numbers, functions, elementary events, etc. Small letters will be used for points.

Sets are aggregates or collections of such points. Capital letters will be used for sets.

A set is defined by a property. That is, given a point, there is a criterion to decide whether it belongs to a given set, e.g. the set which is the open interval (–1, 1) on the real line is defined by the property that it contains a point x if and only if |x| < 1.

A set may be written as {x : P(x)} where P(x) is the property defining the set; e.g. {x : |x| < 1} is the above set consisting of all points x for which |x| < 1, i.e. (–1, 1).

In any given situation, all the points considered will belong to a fixed set called the whole space and usually denoted by X. This assumption avoids some difficulties which arise in the logical foundations of set theory.

Classes or collections of sets are just aggregates whose elements themselves are sets, e.g. the class of all intervals of the real line, the class of all circles in the plane whose centers are at the origin, and so on. Script capitals will be used for classes of sets.

Collections of classes are similarly defined to be aggregates whose elements are classes. Similarly, higher logical structures may be defined.

Note that a class of sets, or a collection of classes, is itself a set. The words “class of sets” are used simply to emphasize that the elements are themselves sets (in some fixed whole space X).



1.2 Notation and set operations

∈ x ∈ A means that the point x is an element of the set A. This symbol can also be used between sets and classes, e.g. A ∈ 𝒜 means the set A is a member of the class 𝒜. The symbol ∈ must be used between entities of different logical type, e.g. point ∈ set, set ∈ class of sets.

∉ The opposite of ∈; x ∉ A means that the point x is not an element of the set A.

⊂ A ⊂ B (or B ⊃ A) means that the set A is a subset of B. That is, every element of A is also an element of B, or x ∈ A ⇒ x ∈ B (using “⇒” for “implies”). Diagrammatically, one may think of sets in the plane:

A ⊂ B.

The symbol ⊂ is used between entities of the same logical type, such as sets (A ⊂ B) or classes of sets (𝒜 ⊂ ℬ, meaning every set in the class 𝒜 is also in the class ℬ; 𝒜 is then a subclass of ℬ).

Examples

A = {x : |x| ≤ 1/2} = [–1/2, 1/2],

B = {x : |x| < 1} = (–1, 1),

(A ⊂ B),

𝒜 = class of all intervals of the form (n, n + 1) for n = 1, 2, 3, . . . ,

ℬ = class of all intervals,

(𝒜 ⊂ ℬ).

Note that A ⊂ A, i.e. the symbol ⊂ does not preclude equality.


= Equals. If A ⊂ B and B ⊂ A we write A = B. That is, A and B consist of the same points.

∅ The empty set, i.e. the set with no points in it. Note by definition ∅ ⊂ A for any set A. Also if X denotes the whole space, A ⊂ X for any set A.

∪ The union (sum) of two sets A and B, written A ∪ B, is the set of all points in either A or B (or both). That is

A ∪ B = {x : x ∈ A or x ∈ B or both}.

A ∪ B is the entire shaded area.

∩ The intersection of two sets A and B, written A ∩ B, is the set of all points in both A and B.

A ∩ B (shaded area); A – B, B – A (unshaded areas).


Two sets A, B with no points in common (A ∩ B = ∅) are said to be disjoint. A class of sets is called disjoint if each pair of its members is disjoint. Sometimes AB is written for A ∩ B, and A + B for A ∪ B (though A + B is sometimes reserved for the case when A ∩ B = ∅).

The difference of two sets. A – B is the set of all points of A which are not in B, i.e. {x : x ∈ A and x ∉ B}. If B ⊂ A, A – B is called a proper difference. Note the need for care with algebraic laws, e.g. in general

(A – B) ∪ C ≠ (A ∪ C) – B.

The complement Ac of a set A consists of all points of the space X which are not in A, i.e. Ac = X – A.

The symmetric difference AΔB of A and B is the set of all points which are in either A or B but not both, i.e.

AΔB = (A – B) ∪ (B – A).

AΔB = shaded area.
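Identities such as AΔB = (A – B) ∪ (B – A) can be checked mechanically on small finite sets. A minimal Python sketch (the particular sets A and B are arbitrary choices for illustration):

```python
# Check the symmetric difference identity A Δ B = (A - B) ∪ (B - A)
# on small finite sets of integers.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

sym_diff = (A - B) | (B - A)   # (A – B) ∪ (B – A)
print(sym_diff)                # {1, 2, 5, 6}
assert sym_diff == A ^ B       # Python's built-in symmetric difference
```

Python's set operators `-`, `|`, `&`, `^` correspond directly to the difference, union, intersection, and symmetric difference defined above.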

Unions and intersections of arbitrary numbers of sets: If Aγ is a set for each γ in some index set Γ, ∪γ∈Γ Aγ is the set of all points which are members of at least one of the Aγ.

∪γ∈Γ Aγ = {x : x ∈ Aγ for some γ ∈ Γ},
∩γ∈Γ Aγ = {x : x ∈ Aγ for all γ ∈ Γ}.

If Γ is, for example, the set of positive integers, we write ∪∞n=1 for ∪n∈Γ, etc. For example, ∪∞n=1 [n, n + 1] = [1, ∞), where [ ] denotes a closed interval and [ ) a semiclosed one, etc., and ∩∞n=1 [0, 1/n) = {0}, the set consisting of the single point 0 only. Also ∩∞n=1 (0, 1/n) = ∅.


The set operations ∪, ∩, –, Δ have been defined for sets but of course they apply also to classes of sets; e.g. 𝒜 ∩ ℬ = {A : A ∈ 𝒜 and A ∈ ℬ} is the class of all those sets which are members of both the classes 𝒜 and ℬ. (Care should be taken – cf. Ex. 1.3!)

1.3 Elementary set equalities

To prove a set equality A = B, it is necessary by definition to show that A ⊂ B and B ⊂ A (i.e. that A and B consist of the same points). Thus we first take any point x ∈ A and show x ∈ B; then we take any point y ∈ B and show y ∈ A. The following result summarizes a number of simple set equalities.

Theorem 1.3.1 For any sets A, B, . . . ,

(i) A ∪ B = B ∪ A, A ∩ B = B ∩ A (commutative laws)
(ii) (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C) (associative laws)
(iii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (distributive law)
(iv) E ∩ ∅ = ∅, E ∪ ∅ = E
(v) E ∩ X = E, E ∪ X = X
(vi) If E ⊂ F then E ∩ F = E, and conversely
(vii) E – F = E ∩ Fc for all E, F
(viii) E – (F ∪ G) = (E – F) ∩ (E – G), E – (F ∩ G) = (E – F) ∪ (E – G)
(ix) (∪γ∈Γ Aγ)c = ∩γ∈Γ Acγ, (∩γ∈Γ Aγ)c = ∪γ∈Γ Acγ.

These are easily verified and we prove just two ((iii) and (ix)) by way of illustration. As already noted, the symbol ⇒ is used to denote “implies”, “LHS” for “left hand side”, etc.

Proof of (iii)

x ∈ LHS ⇒ x ∈ A and x ∈ B ∪ C

⇒ x ∈ A, and x ∈ B or x ∈ C

⇒ x ∈ A and B, or x ∈ A and C

⇒ x ∈ A ∩ B or x ∈ A ∩ C

⇒ x ∈ RHS.

Thus LHS ⊂ RHS. Similarly RHS ⊂ LHS, showing equality. Both inclusions may actually be obtained together by noting that each statement not only implies the next, but is equivalent to it, i.e. we may write “⇔” (“implies and is implied by” or “is equivalent to”) instead of the one way implication ⇒. From this we obtain x ∈ LHS ⇔ x ∈ RHS, giving inclusion both ways and hence equality. □

Proof of (ix) The same style of proof as above may be used here, of course. Instead it may be set out in a slightly different way using the notation {x : P(x)} defining a set by its property P. For the first equality

(∪Aγ)c = {x : x ∉ ∪Aγ}
= {x : x ∉ Aγ for any γ}
= {x : x ∈ Acγ, all γ}
= ∩Acγ.

The second equality follows similarly, or by replacing Aγ by Acγ in the first to obtain ∩Aγ = (∪Acγ)c and hence (∩Aγ)c = ∪Acγ. □

The equality (ii) may, of course, be extended to show that the terms of a union may be grouped in any way and taken in any order, and similarly for the terms of an intersection. (This is not always true for a mixture of unions and intersections, e.g. A ∩ (B ∪ C) ≠ (A ∩ B) ∪ C, in general, but rather laws such as (iii) hold.)

(viii) and (ix) are sometimes known as “De Morgan laws”. (ix) states that the “complement of a union is the intersection of the complements”, and the “complement of an intersection is the union of the complements”. (viii) is essentially just a simpler case of this with complements taken “relative to a fixed set E”. In fact (viii) follows from (ix) (and (vii)) e.g. by noting that

E – (F ∪ G) = E ∩ (F ∪ G)c = E ∩ Fc ∩ Gc = (E ∩ Fc) ∩ (E ∩ Gc) = (E – F) ∩ (E – G).
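For a finite index set the De Morgan laws of (ix) can be verified directly by computation. A minimal Python sketch (the whole space X and the sets A1, A2 are arbitrary choices for illustration):

```python
# Verify (A1 ∪ A2)^c = A1^c ∩ A2^c and (A1 ∩ A2)^c = A1^c ∪ A2^c,
# with complements taken within a small whole space X.
X = set(range(10))
A1 = {0, 1, 2, 3}
A2 = {2, 3, 4, 5}

def complement(E):
    """Complement E^c = X - E within the whole space X."""
    return X - E

assert complement(A1 | A2) == complement(A1) & complement(A2)
assert complement(A1 & A2) == complement(A1) | complement(A2)
print("De Morgan laws verified on X =", sorted(X))
```

The same check works for any finite family of sets, complementing relative to any fixed X that contains them all.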

1.4 Limits of sequences of sets

Let {En : n = 1, 2, . . .} be a sequence of subsets of X.

lim sup En (the upper limit of {En}) is the set of all points x which belong to En for infinitely many values of n. That is, given any m, there is some n ≥ m with x ∈ En (i.e. we may say x ∈ En “infinitely often” or “for arbitrarily large values of n”).

lim inf En (the lower limit of {En}) is the set of all points x such that x belongs to all but a finite number of the En. That is, x ∈ En for all n ≥ n0 where n0 is some integer (which will usually be different for different x). Equivalently, we say x ∈ En “for all sufficiently large values of n”.

Theorem 1.4.1 For any sequence {En} of sets

(i) lim sup En = ∩∞n=1 ∪∞m=n Em
(ii) lim inf En = ∪∞n=1 ∩∞m=n Em.

Proof To show (ii): x ∈ lim inf En ⇒ x ∈ En for all n ≥ some n0, and thus x ∈ ∩∞m=n0 Em. Hence x ∈ ∪∞n=1 (∩∞m=n Em). Conversely if x ∈ RHS of (ii) then, for some n0, x ∈ ∩∞m=n0 Em, and hence x ∈ Em for all m ≥ n0. Thus x ∈ lim inf En as required. Similarly for the proof of (i). □
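The formulas of Theorem 1.4.1 become directly computable when the sequence is eventually periodic, since the infinite unions and intersections then stabilize after finitely many terms. A Python sketch (the alternating sets are an arbitrary illustration): En alternates between two sets, so the upper limit is their union and the lower limit their intersection.

```python
# E_n alternates {1, 2} and {2, 3}: a point lies in infinitely many E_n
# iff it lies in at least one of the two sets, and in all but finitely
# many E_n iff it lies in both.
def E(n):
    return {1, 2} if n % 2 == 0 else {2, 3}

N = 10  # the sequence is 2-periodic, so the unions/intersections over
        # m >= n stabilize; truncating at N is exact here

def union(sets):
    out = set()
    for s in sets:
        out |= s
    return out

def intersection(sets):
    sets = list(sets)
    out = set(sets[0])
    for s in sets[1:]:
        out &= s
    return out

# Theorem 1.4.1: upper limit = ∩_n ∪_{m≥n} E_m, lower limit = ∪_n ∩_{m≥n} E_m
upper = intersection(union(E(m) for m in range(n, N)) for n in range(N // 2))
lower = union(intersection(E(m) for m in range(n, N)) for n in range(N // 2))

print(upper, lower)  # {1, 2, 3} {2}
```

Since {2} ≠ {1, 2, 3}, this alternating sequence is an example of a non-convergent sequence of sets.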

A sequence {En} is called convergent if lim sup En = lim inf En and we then write lim En for this set. Since clearly lim inf En ⊂ lim sup En, to show a sequence {En} is convergent it need only be shown that lim sup En ⊂ lim inf En.

A sequence {En} is called monotone increasing (decreasing) if En ⊂ En+1 (En ⊃ En+1) for all n. These are conveniently written respectively as En ↑, En ↓.

Theorem 1.4.2 A monotone increasing (decreasing) sequence {En} is convergent and lim En = ∪∞n=1 En (∩∞n=1 En).

Proof If En ↑ (i.e. monotone increasing),

lim sup En = ∩∞n=1 (∪∞m=n Em) = ∩∞n=1 (∪∞m=1 Em)

since ∪∞m=1 Em = ∪∞m=n Em (Em ↑). But ∪∞m=1 Em does not depend on n and thus

lim sup En = ∪∞m=1 Em.

But also lim inf En = ∪∞n=1 ∩∞m=n Em = ∪∞n=1 En since ∩∞m=n Em = En. Hence lim sup En = ∪∞n=1 En = lim inf En as required. Similarly for the case En ↓ (i.e. monotone decreasing). □

1.5 Indicator (characteristic) functions

If E is a set, its indicator (or characteristic) function χE(x) is defined by

χE(x) = 1 for x ∈ E

= 0 for x ∉ E.


This function determines E since E is the set of points x for which the value of the function is one, i.e. E = {x : χE(x) = 1}.

Simple properties:

χE(x) ≤ χF(x), all x ⇔ E ⊂ F
χE(x) = χF(x), all x ⇔ E = F
χ∅(x) ≡ 0, χX(x) ≡ 1
χEc(x) = 1 – χE(x), all x
χ∩n1 Ei(x) = ∏n1 χEi(x).

If the Ei are disjoint,

χ∪n1 Ei(x) = ∑n1 χEi(x).
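These identities are easy to test numerically. A small Python sketch (the whole space and the sets E, F are arbitrary choices for illustration) models χE as a 0/1-valued function and checks the complement and intersection rules pointwise:

```python
X = set(range(8))          # whole space
E = {0, 1, 2, 3}
F = {2, 3, 4}

def chi(S):
    """Indicator function of the set S, as a map x -> 0 or 1."""
    return lambda x: 1 if x in S else 0

for x in X:
    # complement rule: χ_{E^c}(x) = 1 - χ_E(x)
    assert chi(X - E)(x) == 1 - chi(E)(x)
    # intersection rule: χ_{E∩F}(x) = χ_E(x) * χ_F(x)
    assert chi(E & F)(x) == chi(E)(x) * chi(F)(x)

# E ⊂ F ⇔ χ_E ≤ χ_F pointwise; here E ⊄ F, so the inequality fails:
print(all(chi(E)(x) <= chi(F)(x) for x in X))  # False
```

The disjoint-union sum rule can be checked the same way, replacing the product by a sum over pairwise disjoint sets.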

1.6 Rings, semirings, and fields

One of the most basic concepts in measure theory is that of a ring of sets. Specifically a ring is a nonempty class R of subsets of the space X such that if E ∈ R, F ∈ R, then E ∪ F ∈ R and E – F ∈ R.

Put in another way, a ring is a nonempty class R which is closed under the formation of unions and differences (of any two of its sets).¹ The following result summarizes some simple properties of rings.

Theorem 1.6.1 Every ring contains the empty set ∅. A ring is closed under the formation of

(i) symmetric differences and intersections
(ii) finite unions and finite intersections (i.e. if E1, E2, . . . , En ∈ R, then ∪n1 Ei ∈ R and ∩n1 Ei ∈ R).

Proof Since R is nonempty it contains some set E and hence ∅ = E – E ∈ R. If E, F ∈ R, then

EΔF = (E – F) ∪ (F – E) ∈ R (since (E – F), (F – E) ∈ R)
E ∩ F = (E ∪ F) – (EΔF) ∈ R (since E ∪ F, EΔF ∈ R).

Thus (i) follows. (ii) follows by induction since e.g. ∪n1 Ei = (∪n–11 Ei) ∪ En. (See also Footnote 1.) □
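Theorem 1.6.1 can be illustrated by brute force: take a small class closed under unions and differences and confirm that symmetric differences and intersections never leave it. In this sketch the class chosen for illustration is the collection of all subsets of {0, 1, 2}, regarded as subsets of a larger space X; it is a ring but, since X itself is missing, not a field.

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

X = {0, 1, 2, 3}
# All subsets of {0, 1, 2} form a ring of subsets of X
# (closed under ∪ and –) that does not contain X.
R = set(power_set({0, 1, 2}))

for E in R:
    for F in R:
        assert E | F in R and E - F in R    # defining ring properties
        assert E ^ F in R and E & F in R    # Theorem 1.6.1 (i)

assert frozenset() in R        # every ring contains ∅
assert frozenset(X) not in R   # so R is a ring but not a field
print("ring axioms and Theorem 1.6.1 verified; |R| =", len(R))
```

Exhaustive checks like this are only feasible for tiny classes, but they make the closure properties concrete.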

The next result gives an alternative criterion for a class to be a ring.

¹ Whenever we say a class is “closed under unions” (or “closed under intersections”) it is meant that the union (or intersection) of any two (and hence, by induction as above, any finite number of) members of the class belongs to the class. If countable unions or intersections are involved, this will be expressly stated.


Theorem 1.6.2 Let R be a nonempty class of sets which is closed under the formation of either

(i) unions and proper differences, or
(ii) intersections, proper differences and disjoint unions.

Then R is a ring.

Proof Suppose (i) holds. Then if E, F ∈ R, E – F = (E ∪ F) – F ∈ R since this is a proper difference of sets of R. Hence R is a ring.

If now (ii) holds and E, F ∈ R, then

E ∪ F = (E – (E ∩ F)) ∪ F.

This expresses E ∪ F as a disjoint union of sets of R. Hence E ∪ F ∈ R. Thus (i) holds so that R is a ring. □

Trivial examples of rings are

(i) the class {∅} consisting of the empty set only
(ii) the class of all subsets of X.

More useful rings will be considered later.

The next result is a useful lemma which shows how a union of a sequence of sets of a ring R may be expressed either as a union of an increasing sequence, or of a disjoint sequence, of sets of R.

Lemma 1.6.3 Let {En} be a sequence of sets of a ring R, and E = ∪∞1 En (E is not necessarily in R). Then

(i) E = ∪∞1 Fn = lim Fn where Fn = ∪ni=1Ei are increasing sets in R

(ii) E = ∪∞1 Gn where Gn are disjoint sets of R, such that Gn ⊂ En.

Proof (i) is immediate.

(ii) follows from (i) by writing G1 = E1 and Gn = Fn – Fn–1 (⊂ En), for n > 1. Clearly the Gn are in R, are disjoint since Fn are increasing, and ∪∞1 Fn = ∪∞1 Gn, completing the proof. □
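The construction in the proof of Lemma 1.6.3 is easy to carry out concretely. A small Python sketch (function name hypothetical), using finite sets for the En:

```python
def disjointify(E_seq):
    """Given sets E1, E2, ..., return disjoint Gn with Gn a subset of En
    and the same union, via Gn = Fn - F(n-1) where Fn = E1 U ... U En."""
    F = set()
    G_seq = []
    for E in E_seq:
        G_seq.append(set(E) - F)   # Gn = Fn - F(n-1)  (contained in En)
        F |= E                     # Fn, the increasing sequence
    return G_seq

E_seq = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
G_seq = disjointify(E_seq)
assert G_seq == [{1, 2, 3}, {4}, {5}]
# each Gn lies inside En, and the union is unchanged
assert all(G <= E for G, E in zip(G_seq, E_seq))
```

This "disjointification" trick recurs constantly in measure theory, e.g. in the proofs of Theorems 2.2.2 and 2.2.7 below.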

Fields. A field (or algebra) is a nonempty class F of subsets of X such that if E ∈ F, then Ec ∈ F, and if E, F ∈ F then E ∪ F ∈ F. That is, a field is closed under the formation of unions and complements.

Theorem 1.6.4 A field is a ring of which the whole space X is a member, and conversely.


Proof Let F be a field, and let E ∈ F. Then Ec ∈ F and hence X = E ∪ Ec ∈ F.

Further, if E ∈ F, F ∈ F, then

E – F = E ∩ Fc = (Ec ∪ F)c ∈ F

(using the field axioms). Thus F is a ring and contains X.

Conversely, if F is a ring containing X and E ∈ F, we have Ec = X – E ∈ F. Thus F is a field. □

The next lemma shows that the intersection of an arbitrary collection of rings (or fields) is a ring (or field). In fact such a result applies much more widely to many (but not all!) classes defined by very general closure properties, and exactly the same method of proof may be used; this will be seen later in further important cases.

Lemma 1.6.5 Let Rγ be a ring, for each γ in an arbitrary index set Γ (which may be finite, countable or uncountable). Let R = ∩{Rγ : γ ∈ Γ}, i.e. R is the class of all sets E belonging to every Rγ for γ ∈ Γ. Then R is a ring.

Proof If E, F ∈ R then E, F ∈ Rγ for every γ ∈ Γ. Since Rγ is a ring it follows that E – F and E ∪ F belong to each Rγ and hence E – F ∈ R, E ∪ F ∈ R.

Finally the empty set ∅ belongs to every Rγ and hence to R, which is therefore a nonempty class, and hence is a ring. □

A useful class of sets which is less restrictive than a ring is a semiring. Specifically, a semiring is a nonempty class P of sets such that

(i) if E ∈ P, F ∈ P, then E ∩ F ∈ P,
(ii) if E ∈ P, F ∈ P, then E – F = ∪n1Ei, where n is some positive integer and E1, E2, . . . , En are disjoint sets of P.

Clearly the empty set ∅ belongs to any semiring P since there is some set E ∈ P and hence by (ii) ∅ = E – E = ∪n1Ei for some n, Ei ∈ P. But this implies that each Ei is empty so that ∅ = Ei ∈ P.

A ring is clearly a semiring. In the real line, the class of all semiclosed intervals of the form a < x ≤ b (i.e. (a, b]) is a semiring which is not a ring. However, the class of all finite unions of semiclosed intervals is a ring – as will be seen in the next section.
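Both semiring properties can be seen explicitly for the intervals (a, b]. In the sketch below (illustrative Python; an interval is coded as a pair (a, b) standing for (a, b], empty when a ≥ b), the intersection of two such intervals is again one, while a difference falls apart into at most two disjoint ones — the difference itself is generally not an interval, which is why the class is a semiring but not a ring:

```python
def intersect(I, J):
    """(a,b] ∩ (c,d] is again a semiclosed interval (possibly empty)."""
    (a, b), (c, d) = I, J
    return (max(a, c), min(b, d))

def difference(I, J):
    """(a,b] - (c,d] as a disjoint union of at most two semiclosed
    intervals: (a, min(b,c)] and (max(a,d), b]."""
    (a, b), (c, d) = I, J
    pieces = [(a, min(b, c)), (max(a, d), b)]
    return [(lo, hi) for lo, hi in pieces if lo < hi]  # drop empty pieces

assert intersect((0, 5), (3, 8)) == (3, 5)              # (0,5] ∩ (3,8] = (3,5]
assert difference((0, 10), (3, 7)) == [(0, 3), (7, 10)] # two disjoint pieces
assert difference((0, 10), (5, 20)) == [(0, 5)]         # one piece
assert difference((3, 7), (0, 10)) == []                # empty difference
```

The two pieces are disjoint since one lies in (–∞, c] and the other in (d, ∞), exactly as property (ii) of a semiring requires.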


1.7 Generated rings and fields

If E is any class of sets, one may ask the question, “Is there a smallest ring (or field) containing E?” This question is answered by the following important result.

Theorem 1.7.1 Let E be any class of sets. Then there exists a unique ring R0 so that R0 ⊃ E (i.e. every set of E is in R0) and such that if R is any other ring containing E, then R ⊃ R0. R0 is thus the smallest ring containing E and is called the ring generated by E, written R(E).

The corresponding result holds for fields – there is a unique smallest field F(E) containing a given class E.

Proof Let Rγ denote any ring containing E (and let Γ index all such rings). There is certainly one such ring, the class of all subsets of X. Write

R0 = ∩γ∈ΓRγ = ∩{R : R is a ring containing E}.

By Lemma 1.6.5, R0 is a ring. Further, if E ∈ E, then E ∈ Rγ for each γ and thus E ∈ R0. Thus E ⊂ R0.

R0 is thus a ring containing E. Further, if R is any ring containing E, R must be one of the Rγ, for some γ, Rγ0 say. Thus

R = Rγ0 ⊃ ∩γ∈ΓRγ = R0.

R0, then, is a smallest ring containing E. To show uniqueness, suppose R*0 is another ring with the properties of the theorem statement. Then, since R*0 ⊃ E, we have R*0 ⊃ R0. But R0 ⊃ E and hence R0 ⊃ R*0. Thus R*0 = R0 as required.

The same proof holds for fields with the replacement of “ring” by “field” throughout. □

It should be shown as an exercise that F(E) ⊃ R(E) and these classes need not coincide.
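For a finite generating class on a finite space, R(E) can actually be computed: keep adjoining unions and differences until nothing new appears. A brute-force sketch (illustrative only; it is Theorem 1.7.1 that guarantees the fixed point reached is the smallest ring):

```python
def generated_ring(generators):
    """Smallest class containing the generators and closed under
    union and difference, computed by iterating to a fixed point."""
    R = {frozenset(E) for E in generators}
    while True:
        new = ({E | F for E in R for F in R}
               | {E - F for E in R for F in R})
        if new <= R:          # nothing new: closure reached
            return R
        R |= new

R = generated_ring([{1, 2}, {2, 3}])
assert frozenset() in R                  # the empty set, as E - E
assert frozenset({2}) in R               # obtained via two differences
assert frozenset({1, 2, 3}) in R         # the union of the generators
assert all((E | F) in R and (E - F) in R for E in R for F in R)
```

Since each pass only adds sets forced by the ring axioms, every ring containing the generators contains everything produced, which is the minimality asserted by the theorem.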

The next result is important as an illustration of a method of proof which will be used over and over again. The situation is that the sets of E are known to have some property and one wishes to show that this property also holds for sets of R(E). The method is to denote the class of sets with this property by G, say, and to show (if possible) that G is a ring. Since G ⊃ E, it then follows that G ⊃ R(E) so that each set of R(E) has the desired property. Many variants of this technique will be used throughout.2 The following theorem provides a simple illustration.

Theorem 1.7.2 If E is any nonempty class of sets, any set in R(E) can be covered by a finite union of sets in E. That is, if F ∈ R(E), there exist n, Ei ∈ E with F ⊂ ∪n1Ei.

Proof Let G be the class of those sets that can each be covered by some finite union of sets of E. If E, F ∈ G, then E ∪ F can be covered by a finite union of sets of E, as also can E – F. Hence E ∪ F ∈ G, E – F ∈ G. Also any set of E is in G and thus G is nonempty, and hence is a ring. Thus G is a ring containing E and, by Theorem 1.7.1, G ⊃ R(E). That is, any set of R(E) can be covered by a finite union of sets of E, as required. □

The following result shows the nature of the ring generated by a semiring.

Theorem 1.7.3 Let P be a semiring. The ring R(P) generated by P is precisely the class of all sets of the form ∪n1Ei where E1, . . . , En are disjoint sets of P.

Proof Let L denote the class of all sets of this given form. If E ∈ L, then E = ∪n1Ei, Ei ∈ P, Ei disjoint. But Ei ∈ R(P) and thus E ∈ R(P). Hence L ⊂ R(P). To show the opposite inclusion, it is sufficient to show that L is a ring. For then, since trivially L ⊃ P, we would have L ⊃ R(P) as required.

To show that L is a ring:

(i) L is obviously closed under the formation of disjoint unions of any two of its sets.

(ii) L is closed under the formation of intersections. For if E, F ∈ L, E = ∪ni=1Ei, F = ∪m1 Fj, where Ei are disjoint sets in P and Fj are disjoint sets in P, then

E ∩ F = ∪ni=1 ∪mj=1 (Ei ∩ Fj).

Now Ei ∩ Fj ∈ P since P is a semiring. Further, the nm sets (Ei ∩ Fj) are disjoint. Thus E ∩ F ∈ L as required.

(iii) L is closed under the formation of (proper) differences. For let E ∈ L, F ∈ L, E = ∪n1Ei, F = ∪m1 Fj as in (ii). Then

E – F = ∪ni=1(Ei – ∪m1 Fj) = ∪ni=1 ∩mj=1 (Ei – Fj).

2 Referred to descriptively by the eminent mathematician B.J. Pettis as “the σ-ring game” when used for σ-rings (cf. Section 1.8).


Now the sets ∩mj=1(Ei – Fj) (⊂ Ei) are disjoint for i = 1, 2, . . . , n. Hence if we can show that Ei – Fj ∈ L for each i, j then it will follow by (ii) that the ∩mj=1(Ei – Fj) are disjoint sets of L, and hence by (i) that E – F ∈ L. But since Ei and Fj are sets of the semiring P, Ei – Fj is a disjoint union of sets of P, i.e. is in L, completing the proof of (iii).

Hence the conditions of Theorem 1.6.2 (ii) are satisfied, and L is thus a ring. □

Corollary A finite union of sets of a semiring P may be written as a finite disjoint union of sets of P. Hence the word “disjoint” may be omitted in the statement of the theorem.

Proof This is immediate since if Ei ∈ P, 1 ≤ i ≤ n, then Ei ∈ R(P) and ∪n1Ei ∈ R(P), so that ∪n1Ei is a finite disjoint union of sets of P by the theorem. □

For other results concerning construction of generated rings and fields, see Exs. 1.11, 1.12.

1.8 σ-rings, σ-fields and related classes

A σ-ring is a nonempty class S of sets such that

(i) if E, F ∈ S, then E – F ∈ S
(ii) if Ei ∈ S, i = 1, 2, . . . , then ∪∞1 Ei ∈ S.

As for rings, the empty set is a member of every σ-ring. Hence if E, F ∈ S, E ∪ F = E ∪ F ∪ ∅ ∪ ∅ ∪ . . . ∈ S by (ii). Thus a σ-ring is a ring which is closed under the formation of countable unions.3

A σ-field (or σ-algebra) is a nonempty class S of sets such that if E ∈ S, then Ec ∈ S, and if Ei ∈ S, i = 1, 2, . . . , then ∪∞1 Ei ∈ S. A σ-field is a field which is closed under the formation of countable unions (since if E, F ∈ S, E ∪ F = E ∪ F ∪ F ∪ F ∪ . . . ∈ S).

Theorem 1.8.1 A σ-field is a σ-ring containing X, and conversely.

Proof If S is a σ-ring containing X, it is clearly a σ-field. Conversely if S is a σ-field, it is a field (as above) and hence a ring containing X by Theorem 1.6.4. Since it is closed under the formation of countable unions, it is also a σ-ring containing X, as required. □

3 To be definite the word “countable” is used throughout to mean “countably infinite or finite”.


Note that a σ-ring (or σ-field) is closed under the formation of countable intersections. For if S is a σ-ring and Ei ∈ S, i = 1, 2, . . . , E = ∪∞1 Ei, then E ∈ S and

∩∞1 Ei = E ∩ ∩∞1 Ei = E – ∪∞1 (E – Ei) ∈ S.
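The identity used here is just De Morgan's law relativized to E, and can be spot-checked on finite sets (a throwaway Python check, with illustrative sets):

```python
# Spot-check of the identity  ∩Ei = E - ∪(E - Ei)  with E = ∪Ei.
Es = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
E = set().union(*Es)                           # E = union of the Ei
inter = set.intersection(*Es)                  # intersection of the Ei
assert inter == E - set().union(*(E - Ei for Ei in Es))
assert inter == {3, 4}
```

Since the right-hand side involves only differences and a countable union, it exhibits the countable intersection as a σ-ring operation, which is the point of the remark.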

It is easily checked that the intersection of an arbitrary collection of σ-rings (or σ-fields) is a σ-ring (or σ-field), in the same manner as for rings and fields (Lemma 1.6.5), and then the following result may be proved exactly along the same lines as Theorem 1.7.1.

Theorem 1.8.2 If E is any class of sets, there is a unique σ-ring S0 ⊃ E, such that if S is any σ-ring containing E, then S ⊃ S0. S0 will be written as S(E) and called the σ-ring generated by E. It is thus the (unique) smallest σ-ring containing E.

Similarly there is a unique smallest σ-field σ(E) containing a class E (and called the σ-field generated by E).

Lemma 1.8.3 (i) If E, F are classes of sets with E ⊂ F, then S(E) ⊂ S(F), σ(E) ⊂ σ(F).

(ii) If E is any class of sets, then S(R(E)) = S(E).

Proof (i) Since E ⊂ F ⊂ S(F), S(F) is a σ-ring containing E and hence S(F) ⊃ S(E). Similarly σ(F) ⊃ σ(E).

(ii) Since R(E) ⊃ E we have by (i) that S(R(E)) ⊃ S(E). For the reverse inclusion note that the σ-ring S(E) is also a ring containing E, so that S(E) ⊃ R(E). Thus S(E) is a σ-ring containing R(E), and hence S(E) ⊃ S(R(E)). □

It is sometimes useful to consider closure with respect to other set operations (or combinations of set operations), and correspondingly obtain the smallest class which contains a given class E and is closed with respect to these operations. For example, a monotone class is a nonempty class M of sets which is closed under formation of monotone limits (lim En ∈ M whenever {En} is a monotone (increasing or decreasing) sequence of sets in M). The monotone class M(E) generated by a class E is then the smallest class which contains E, and which is so closed. It is – by a now familiar pattern – the intersection of all monotone classes containing E.

The importance of monotone classes has derived from the fact that, if E is a ring, so is M(E), from which it follows easily that M(E) = S(E). This result (known as the “monotone class theorem” (Ex. 1.16)) provides an alternative way of obtaining S(E) when E is a ring, and this is convenient for some purposes. It will be more useful for us here, however, to consider different closure operations and obtain a theorem of Sierpinski (popularized by Dynkin) to be used for such purposes (since this will require fewer restrictions on E than the assumption that it is a ring).

Specifically we shall consider a nonempty class D which is closed under formation of both proper differences and countable disjoint unions4 (that is, if E, F ∈ D and E ⊃ F, then E – F ∈ D, and if Ei ∈ D, i = 1, 2, . . . for disjoint Ei, then ∪∞1 Ei ∈ D). Such a class will be called a “D-class” throughout. Clearly the empty set is a member of any D-class. If E is any class of sets, the familiar arguments show that there is a unique smallest D-class D(E) which contains E. The result which we shall find most useful is based on the following lemma.

Lemma 1.8.4 Let E be a nonempty class of sets which is closed under the formation of intersections (E ∩ F ∈ E whenever E, F ∈ E). Then D = D(E) is also closed under the formation of intersections.

Proof For any set E let DE = {F : F ∩ E ∈ D(E)}. Clearly if F ∈ DE then E ∈ DF. Now for a given fixed E, DE is a D-class. (For if F, G ∈ DE and F ⊃ G, then (F – G) ∩ E = (F ∩ E) – (G ∩ E), which is the proper difference of two sets of D(E) and hence belongs to D(E), so that F – G ∈ DE. Thus DE is closed under the formation of proper differences. It is similarly closed under the formation of countable disjoint unions. DE is not empty since it clearly contains ∅.)

Now if E ∈ E, it follows that E ⊂ DE (since F ∩ E ∈ E ⊂ D(E) for all F ∈ E). Thus DE is a D-class containing E so that DE ⊃ D(E), whenever E ∈ E.

Hence if E ∈ E and F ∈ D(E) we must have F ∈ DE, so that also E ∈ DF. But this means that E ⊂ DF whenever F ∈ D(E), and hence finally that D(E) ⊂ DF if F ∈ D(E). Restating this, if E, F ∈ D(E), then E ∈ DF so that E ∩ F ∈ D(E). That is, D(E) is closed under intersections, as required. □

The lemma shows that if E is closed under intersections, so is D(E). The following widely useful result follows simply from this.

Theorem 1.8.5 Let E be a nonempty class of sets which is closed under the formation of intersections. Then S(E) = D(E).

4 This includes finite disjoint unions (since clearly ∅ ∈ D) even if we initially assume only closure under countably infinite disjoint unions (and proper differences). This conforms with our use of “countable”.


Proof Since S(E) is a σ-ring it is closed in particular under the formation of proper differences and countable disjoint unions, i.e. is a D-class. Thus since S(E) ⊃ E it follows that S(E) ⊃ D(E).

To show the reverse inclusion, note that by Lemma 1.8.4 D(E) is closed under formation of intersections, as well as proper differences and countable disjoint unions. But it is easily checked (Ex. 1.17) that a class with these properties is a σ-ring. Hence D(E) is a σ-ring containing E, so that D(E) ⊃ S(E), as required. □

Finally it should be noted that if it is required that X ∈ D(E), in addition to the assumption that E is closed under intersections, then it follows that D(E) = σ(E). Other variants are also possible (cf. Ex. 1.18).

Corollary If D0 is a D-class containing E, where E is closed under intersections, then D0 ⊃ S(E).

Proof D0 ⊃ D(E) = S(E). □

1.9 The real line – Borel sets

Let X be the real line R = (–∞,∞), and P the class consisting of all bounded semiclosed intervals of the form (a, b] = {x : a < x ≤ b}, (–∞ < a ≤ b < ∞). P is clearly a semiring. The σ-ring S(P) generated by P is called the class of Borel sets of the real line (and will usually be denoted by B in the sequel). Since R = ∪∞n=–∞(n, n + 1] and (n, n + 1] ∈ P ⊂ B, it follows that B is also a σ-field, and B = S(P) = σ(P).

The Borel sets play a most important role in measure and probability theory. The first theorem lists some examples of Borel sets.

Theorem 1.9.1 The following are Borel sets:

(i) any one-point set
(ii) any countable set
(iii) any interval: open, closed, semiclosed, finite or infinite
(iv) any open or closed set.

Proof (i) A one-point set {a} can be written as ∩∞1 (a – 1/n, a] ∈ B since each term belongs to B.

(ii) A countable set is a countable union of one-point sets, and is thus in B.

(iii) If a, b are real,

(a, b) = (a, b] – {b} ∈ B,


[a, b] = (a, b] ∪ {a} ∈ B,

(a, ∞) = ∪∞n=1(a, a + n] ∈ B,

and so on.

(iv) An open set is a countable union of open intervals and hence is in B. A closed set is the complement of an open set and is thus in B (since B is a σ-field). □

Property (iv) will not be needed here. However, it is included since it shows that Borel sets can have quite a complicated structure. Not all sets are Borel sets, however. (See also Section 2.7.)

The class B of Borel sets was defined to be the σ-ring S(P), generated by the class P of bounded semiclosed intervals (a, b]. It is easy to see that B is also generated by the open intervals, or the closed intervals, or indeed by various classes of semi-infinite intervals (see Exs. 1.19–1.21 for details). Another class which generates B is the class of open sets. This (easily proved) fact provides the basis for generalizing the concept of Borel sets to quite abstract topological spaces – which, however, is not of concern here.

The final topic of this section is the effect on a Borel set of a linear transformation of all its points. Specifically, let T denote the “linear transformation” of the real line given by Tx = αx + β, where α ≠ 0. If E is any set, denote by TE the set of all images (under T) of the points of E. That is, TE = {Tx : x ∈ E}. It seems intuitively plausible that if E is a Borel set, then TE will also be one. (For TE is just a “scaled”, “translated” and possibly “reflected” (if α < 0) version of E.)

Theorem 1.9.2 With the above notation, TE is a Borel set if and only if E is a Borel set.

Proof Suppose α > 0. (The needed modifications for α < 0 will be obvious.) Clearly for any sequence {Ei} of sets we have T(∪∞1 Ei) = ∪∞1 TEi and for this (or in fact any (1-1) T), T(E1 – E2) = TE1 – TE2. (These should be checked!) Using these facts it is easy to see that the class G of all sets E such that TE ∈ B, is a σ-ring (e.g. if Ei ∈ G then T(∪∞1 Ei) = ∪∞1 TEi ∈ B, and hence ∪∞1 Ei ∈ G). But G ⊃ P since T(a, b] = (αa + β, αb + β] ∈ B. Hence G ⊃ S(P) = B. That is, if E ∈ B, TE ∈ B.

Conversely, the inverse (point) mapping T–1 given by T–1y = (y – β)/α is a transformation of the same kind as T, and thus also converts Borel sets into Borel sets. Hence if TE ∈ B we have T–1(TE) ∈ B. But T–1(TE) = E (this also needs checking – it is not true for general transformations!) and hence E ∈ B. Thus TE is a Borel set if and only if E is. □


Exercises

1.1 Prove the following set equalities.

E – F = (E ∪ F) – F = E – (E ∩ F) = E ∩ Fc

E ∩ (F – G) = (E ∩ F) – (E ∩ G)

(E – F) – G = E – (F ∪ G)

E – (F – G) = (E – F) ∪ (E ∩ G)

(E – F) ∩ (G – H) = (E ∩ G) – (F ∪ H)

EΔ(FΔG) = (EΔF)ΔG

E ∩ (FΔG) = (E ∩ F)Δ(E ∩ G)

EΔ∅ = E EΔX = Ec

EΔE = ∅ EΔEc = X

EΔF = (E ∪ F) – (E ∩ F)

1.2 Show that if EΔF = GΔH, then EΔG = FΔH, by considering GΔ(EΔF)ΔH.

1.3 Let the class A consist of the single set A and the class B consist of the single set B. What are A ∪ B and A ∩ B?

1.4 (i) Show that any disjoint sequence of sets converges to ∅.
(ii) If A and B are two sets and En = A or B according as n is even or odd, find lim sup En and lim inf En. When does {En} converge?

1.5 Show that

lim sup(F – En) = F – lim inf En, lim inf(F – En) = F – lim sup En.

1.6 If {En} is a sequence of sets and D1 = E1, Dn+1 = DnΔEn+1, n = 1, 2, . . . , show that lim Dn exists if and only if lim En = ∅.

1.7 (i) If {En} is a sequence of sets, show that

χ∪∞1 En = χE1 + (1 – χE1)χE2 + (1 – χE1)(1 – χE2)χE3 + . . . .

(ii) If E and F are two sets, evaluate χEΔF in terms of χE and χF.

1.8 Show that for a sequence {En} of sets

χlim sup En(x) = lim sup χEn(x), χlim inf En(x) = lim inf χEn(x),

where lim sup an and lim inf an are the upper and lower limits for a real number sequence {an}.

1.9 Let X be an uncountably infinite set and E1 the class of sets which are either countable or have countable complements. Is E1 a ring? A field? A σ-ring? Let E2 be the class of all countable subsets of X. Is E2 a ring? A field? A σ-ring?


1.10 What are the rings, fields, σ-rings and σ-fields generated by the following classes of sets?
(a) E = {E}, the class consisting of one fixed set E only
(b) E is the class of all subsets of a fixed set E
(c) E is the class of all sets containing exactly two points.

1.11 Let E be any nonempty class of sets and let P be the class of all possible finite intersections of the form E1 ∩ E2 ∩ . . . ∩ En, n = 1, 2, . . . , where E1 ∈ E and for each j = 2, . . . , n, either Ej ∈ E or Ecj ∈ E. Then show that P is a semiring, P ⊃ E, and R(P) = R(E).

1.12 Let E be any nonempty class of sets and P the class consisting of the whole space X, together with all possible finite intersections of the form E1 ∩ E2 ∩ . . . ∩ En, n = 1, 2, . . . , where for each j = 1, 2, . . . , n either Ej ∈ E or Ecj ∈ E. Then show that P is a semiring, P ⊃ E, and the field F(E) generated by E is given by F(E) = R(P) (= F(P) since X ∈ P).
Note that P includes intersections where all Ecj ∈ E, whereas in the previous exercise at least one Ej (E1) was required to be in E. Exercises 1.11 and 1.12 give constructive procedures for the generated ring or field, in view of Theorem 1.7.3.

1.13 If X is any nonempty set, show that the class P consisting of ∅ and all one-point sets is a semiring. Is it a ring? A field?

1.14 Show that if E is a nonempty class of sets, then every set in S(E) can be covered by a countable union of sets in E.

1.15 Let E be a class of sets. Is there a smallest semiring P(E) containing E?

1.16 Show the “monotone class theorem”, viz. the monotone class M(R) generated by a ring R is the same as the σ-ring S(R) generated by R.
(Hint: Show that M(R) is closed under unions and differences along the lines of Lemma 1.8.4, so that M(R) is a ring. Use the monotone property to deduce that it is a σ-ring by using Lemma 1.6.3.)

1.17 Show that a nonempty class which is closed under the formation of intersections, proper differences and countable disjoint unions, is a σ-ring.

1.18 If E is any class of sets, let D*(E) denote the smallest class containing E such that
(a) X ∈ D*(E) and
(b) D*(E) is closed under the formation of proper differences and limits of monotone increasing sequences (i.e. E – F ∈ D* = D*(E) if E, F ∈ D* and E ⊃ F, ∪∞1 En ∈ D* if {En} is an increasing sequence of sets in D*). Such a class is sometimes called a “λ-system” and is a variant of our “D-class”.
Show that if E is closed under intersections, then so is D*(E) and hence that D*(E) = σ(E).

1.19 Let I denote the class of all bounded open intervals (a, b) (–∞ < a < b < ∞) on the real line R. Show that I generates the Borel sets, i.e. S(I) = B. (Hint: Express (a, b] as ∩∞n=1(a, b + 1/n) to show P ⊂ S(I).)


1.20 Let I (J) be the class of bounded open (closed) intervals, I1 the class of all semi-infinite intervals of the form (–∞, a), and J1 the class of all semi-infinite intervals of the form (–∞, a]. Show that S(J) = S(I1) = S(J1) = B. That is, all the classes I, J, I1, J1 generate B.

1.21 Let I2 denote the class of all intervals of the form (–∞, r) where r is rational, and J2 the class of intervals of the form (–∞, r]. Show that S(I2) = S(J2) = B.

1.22 If E is any class of subsets of X and A a fixed subset of X, write E ∩ A for the class {E ∩ A : E ∈ E}. Show that S(E ∩ A) = S(E) ∩ A. (Hint: It is easy to show that S(E ∩ A) ⊂ S(E) ∩ A. To prove the reverse inclusion let G = {F : F ∩ A ∈ S(E ∩ A)} and show G ⊃ S(E).)

1.23 Let E, F be two subsets of X and E = {E, F}. Write down D(E) and show that D(E) = S(E) if and only if either
(i) E ∩ F = ∅ or
(ii) E ⊃ F or
(iii) F ⊃ E.
(Sufficiency may be shown even more quickly than by enumeration, by noting that D(E) = D(E, F, ∅) and considering when (E, F, ∅) is closed under intersections.)

2

Measures: general properties and extension

2.1 Set functions, measure

A set function is a function defined on a class of sets; that is, for every set in a given class, a (finite or infinite) function value is defined. The set function is finite, real-valued, if it takes real values, i.e. values in R = (–∞,∞). The sets of the class are mapped into R by the function.

For example, the class might consist of all bounded intervals and the set function might be their lengths.

It will be desirable to consider possibly infinite-valued set functions also (for example, lengths of intervals such as (0,∞)). To that end, it is convenient to adjoin two points ∞, –∞ to the real numbers and make the following algebraic conventions concerning these points.

For any real a, –∞ < a < ∞, –∞ ± a = –∞, ∞ ± a = ∞.
For 0 < a ≤ ∞, a(∞) = ∞, a(–∞) = –∞.
For –∞ ≤ a < 0, a(∞) = –∞, a(–∞) = ∞.



∞ + ∞ = ∞, –∞ – ∞ = –∞.
∞(0) = (–∞)(0) = 0.
We do not allow the operations ∞ – ∞, ∞ + (–∞).

It should be noted that there is nothing mysterious or improper in this procedure. This is emphasized since one is taught “not to regard the symbol ∞ as a number” in the theory of limits. Here we are simply concerned with adding the two points +∞, –∞ (which “compactify”, “complete” or “extend” the real line), preserving as many of the usual algebraic operations between them and the real numbers as possible. Note that all the conventions given are natural with the exception of the requirement ∞(0) = 0, which, however, will be very useful in allowing more generality in some statements and proofs. For example, the integral of a function with infinite values over a set of zero “Lebesgue measure” (e.g. a countable set) is zero, as will be seen.
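The product convention differs from IEEE floating-point arithmetic in exactly one place: in Python, float('inf') * 0.0 is nan, not 0. A small helper (hypothetical, not from the text) implementing the book's rule for products in R*:

```python
import math

INF = float('inf')

def ext_mul(a, b):
    """Product in R* = [-inf, inf] with the measure-theoretic convention
    that (+/-inf) * 0 = 0; IEEE arithmetic would give nan instead."""
    if a == 0 or b == 0:
        return 0
    return a * b          # otherwise the sign rules of the text apply

assert ext_mul(INF, 0) == 0          # the special convention inf*0 = 0
assert ext_mul(-INF, 0) == 0
assert ext_mul(2, INF) == INF        # 0 < a <= inf gives a*inf = inf
assert ext_mul(-3, INF) == -INF      # a < 0 gives a*inf = -inf
assert math.isnan(INF * 0.0)         # contrast: the IEEE result
```

The disallowed operations ∞ – ∞ and ∞ + (–∞) would correspondingly have to raise an error in a fuller implementation; they are omitted from this sketch.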

The symbol R* = [–∞,∞] will denote the real line (–∞,∞) together with the adjoined points +∞, –∞. A set function will be assumed to take values in R* (i.e. real or ±∞) unless otherwise stated.

A set function μ defined on a class E of sets is called additive if μ(E ∪ F) = μ(E) + μ(F) whenever E ∈ E, F ∈ E, E ∪ F ∈ E, E ∩ F = ∅.

μ defined on E is called finitely additive (countably additive) if μ(∪n1Ei) = ∑n1 μ(Ei) (μ(∪∞1 Ei) = ∑∞1 μ(Ei)) whenever Ei are disjoint sets of E for i = 1, 2, . . . , n (i = 1, 2, . . .), whose union ∪n1Ei (∪∞1 Ei) is also in E.

μ is called a finite set function on E if |μ(E)| < ∞ for each E ∈ E. μ is called σ-finite on E if, for each E ∈ E, there is a sequence {En} of sets of E with E ⊂ ∪∞n=1En and |μ(En)| < ∞; that is, if E can be “covered” by a sequence of sets En ∈ E with |μ(En)| < ∞.

It will also be useful to talk about extensions and restrictions of a set function μ on a class E, since one often needs either to “extend” the definition of μ to a class larger than E, or restrict attention to some subclass of E. Specifically, let μ, ν be two set functions defined on classes E, F respectively. Then if E ⊂ F and ν(E) = μ(E) for all E ∈ E, ν is said to be an extension of μ to F, or equivalently μ is the restriction of ν to E.

Measure. A measure on a class of sets E (which contains the empty set ∅) is a nonnegative, countably additive set function μ defined on E, such that μ(∅) = 0.

Note that the assumption μ(∅) = 0 follows from countable additivity except in the trivial case where μ(E) = ∞ for all E ∈ E. For if μ(E) < ∞ for some E ∈ E, E = E ∪ ∅ ∪ ∅ ∪ . . . so that μ(E) = μ(E) + μ(∅) + μ(∅) + . . . , and subtracting (the finite) μ(E) shows that μ(∅) = 0.


If E1, E2, . . . , En are disjoint sets of E whose union ∪n1Ei ∈ E, since ∪n1Ei = E1 ∪ E2 ∪ . . . ∪ En ∪ ∅ ∪ ∅ ∪ . . . , we have μ(∪n1Ei) = ∑n1 μ(Ei). Thus a measure is finitely additive also.

If a measure μ, as a set function on E, is finite (or σ-finite), μ is referred to as a finite (or σ-finite) measure.

As will be seen, the most interesting cases will be when the class of sets on which μ is defined is at least a semiring, ring, σ-ring or, most commonly, a σ-field. However, for development of the theory it is convenient to define the concept for general classes of sets.
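A concrete measure on the σ-field of all subsets of a countable space is given by point masses: μ(E) = Σ over x ∈ E of m(x) for nonnegative weights m. The sketch below (weights and names purely illustrative) checks μ(∅) = 0, additivity over a disjoint finite union, and the monotonicity that Theorem 2.2.1 below derives from these properties:

```python
def make_measure(weights):
    """Point-mass measure mu(E) = sum of the weights over E, defined on
    all subsets of the finite space carrying the weights."""
    def mu(E):
        return sum(weights[x] for x in E)
    return mu

mu = make_measure({'a': 0.5, 'b': 1.5, 'c': 0.0, 'd': 2.0})

assert mu(set()) == 0                          # mu of the empty set is 0
E1, E2, E3 = {'a'}, {'b', 'c'}, {'d'}          # disjoint sets
assert mu(E1 | E2 | E3) == mu(E1) + mu(E2) + mu(E3)   # finite additivity
assert mu({'a', 'b'}) <= mu({'a', 'b', 'd'})   # monotone, as in Thm 2.2.1
```

Counting measure is the special case where every weight is 1; Lebesgue measure on the interval semiring, constructed in the coming sections, is the substantive example the chapter is heading toward.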

2.2 Properties of measures

This section concerns some general properties of measures. Most are stated for rings (though they typically have natural semiring or more general versions) where greater generality is not needed later. First, two definitions are needed. A set function μ defined on a class E is monotone if μ(E) ≤ μ(F) whenever E ∈ E, F ∈ E and E ⊂ F. μ is called subtractive if whenever E ∈ E, F ∈ E, E ⊂ F, F – E ∈ E and |μ(E)| < ∞, we have μ(F – E) = μ(F) – μ(E).

Theorem 2.2.1 A nonnegative and finitely additive set function μ on a semiring P is monotone and subtractive. In particular this holds if μ is a measure on P.

Proof If E ∈ P, F ∈ P and E ⊂ F, then F – E = ∪n1Ei for disjoint sets Ei ∈ P. Hence F = E ∪ (∪n1Ei) and since E, Ei are all (disjoint) sets of P, with union F ∈ P,

μ(F) = μ(E) + ∑n1 μ(Ei) ≥ μ(E) (2.1)

since μ is nonnegative. Hence μ is monotone.

If also F – E ∈ P and μ(E) is finite, then F = E ∪ (F – E) and μ(F) = μ(E) + μ(F – E), so that μ(F) – μ(E) = μ(F – E), showing that μ is subtractive. □

Theorem 2.2.2 If μ is a measure on a ring R, if E ∈ R, and {Ei} is any sequence of sets of R such that E ⊂ ∪∞1 Ei, then μ(E) ≤ ∑∞1 μ(Ei). (Note that it is not assumed that ∪∞1 Ei ∈ R.)

Proof Write

E = ∪∞i=1(E ∩ Ei) = ∪∞1 Gi,

where Gi are disjoint sets of R such that Gi ⊂ E ∩ Ei for each i (Lemma 1.6.3). Thus

μ(E) = ∑∞1 μ(Gi) ≤ ∑∞1 μ(Ei)

since μ is monotone and Gi ⊂ E ∩ Ei ⊂ Ei. □

The next result establishes a reverse inequality for disjoint sequences.

Theorem 2.2.3 If μ is a measure on a ring R, if E ∈ R, and if {Ei} is a disjoint sequence of sets in R such that ∪∞1 Ei ⊂ E, then ∑∞1 μ(Ei) ≤ μ(E).

Proof ∪n1Ei ∈ R for any n since R is a ring, and ∪n1Ei ⊂ E. Hence ∑n1 μ(Ei) = μ(∪n1Ei) ≤ μ(E) by finite additivity and monotonicity of μ. This holds for all n, so that ∑∞1 μ(Ei) ≤ μ(E), as required. □

The next two important theorems concern the measure of limits of monotone sequences.

Theorem 2.2.4 If μ is a measure on a ring R, and {En} is a monotone increasing sequence of sets in R such that lim En ∈ R, then

μ(lim En) = limn→∞ μ(En).

Proof Write E0 = ∅. Then

μ(lim En) = μ(∪∞1 Ei)
= μ{∪∞1 (Ei – Ei–1)}
= ∑∞1 μ(Ei – Ei–1) (the sets (Ei – Ei–1) being disjoint and in R)
= limn→∞ ∑n1 μ(Ei – Ei–1)
= limn→∞ μ{∪n1(Ei – Ei–1)}
= limn→∞ μ(En),

as required. □

Theorem 2.2.5 If μ is a measure on a ring R, and {En} is a monotone decreasing sequence of sets in R, of which at least one has finite measure, and if lim En ∈ R, then

μ(lim En) = limn→∞ μ(En).

Proof If μ(Em) < ∞ then μ(En) < ∞ for n ≥ m and μ(lim En) < ∞ since lim En ⊂ Em. Now (Em – En) is monotone increasing in n, and

limn→∞(Em – En) = ∪n(Em – En) = Em – ∩nEn = Em – limn→∞ En ∈ R.

Thus, by Theorem 2.2.4,

μ(Em) – μ(lim En) = μ{limn→∞(Em – En)} = limn→∞ μ(Em – En)
= limn→∞{μ(Em) – μ(En)} (μ(En) < ∞, En ⊂ Em for n ≥ m)
= μ(Em) – limn→∞ μ(En).

Since μ(Em) is finite, subtracting it from each side yields the desired result. □

The two preceding theorems may be expressed in terms of notions of set function continuity. Specifically, a set function μ defined on a class E is said to be continuous from below at a set E ∈ E if for every increasing sequence of sets En ∈ E such that lim En = E, we have limn→∞ μ(En) = μ(E).

Similarly μ is continuous from above at E ∈ E if for every decreasing sequence {En} of sets in E for which lim En = E and such that |μ(Em)| < ∞ for some integer m, we have limn→∞ μ(En) = μ(E).
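Both continuity properties can be watched numerically for a point-mass measure with masses 2^(–k) at the positive integers k (a finite measure with total mass 1; code names illustrative). The sets En = {1, . . . , n} increase to the whole space with μ(En) → 1, while Fn = {n, n+1, . . .} decrease to ∅ with μ(Fn) = 2^(1–n) → 0:

```python
def mu(E):
    """Finite measure with point mass 2**(-k) at each k = 1, 2, ..."""
    return sum(2.0 ** -k for k in E)

# Continuity from below: En = {1,...,n} increase to the whole space,
# and mu(En) = 1 - 2**(-n) tends to the total mass 1.
below = [mu(range(1, n + 1)) for n in (1, 5, 30)]
assert below[0] == 0.5 and abs(below[-1] - 1.0) < 1e-8

# Continuity from above: Fn = {n, n+1, ...} decrease to the empty set,
# and mu(Fn) = 2**(1-n) tends to 0 (tails truncated at 60 for the demo).
above = [mu(range(n, 60)) for n in (1, 5, 30)]
assert abs(above[1] - 2.0 ** -4) < 1e-8 and above[-1] < 1e-6
```

The finiteness hypothesis in Theorem 2.2.5 is essential: for counting measure the same decreasing tails Fn all have infinite measure even though their limit ∅ has measure zero.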

Hence by the previous theorems, a measure on a ring is continuous from above and below at every set of the ring. The following converse result is sometimes useful in showing that certain set functions known to be finitely additive, are in fact measures.

Theorem 2.2.6 Let μ be a finite, nonnegative, additive set function on a ring R. If

(i) μ is continuous from below at every E ∈ R, or
(ii) μ is continuous from above at ∅,

then μ is a measure on R. (Note μ(∅) = 0 by additivity.)

Proof μ is finitely additive (by induction) since it is additive and R is a ring. Let {En} be a disjoint sequence of sets in R whose union E = ∪∞1 En is also in R. Write

Fn = ∪n1Ei, Gn = E – Fn.

If (i) holds, since {Fn} is increasing and lim Fn = E,

μ(E) = lim μ(Fn) = lim ∑n1 μ(Ei) = ∑∞1 μ(Ei) (2.2)

as required. On the other hand, if (ii) holds, since {Gn} is decreasing and lim Gn = ∅, and since μ is finite,

limn→∞(μ(E) – μ(Fn)) = limn→∞ μ(Gn) = μ(∅) = 0

so that

μ(E) = limn→∞ μ(Fn)

from which the desired result follows as in (2.2). □

As noted, more general versions of some of these results may be obtained similarly. Also the statements of some of the above theorems simplify a little in more special cases – e.g. if stated for σ-rings rather than rings, an assumption such as that ∪∞1 Ei belongs to the σ-ring (when each Ei does) can be omitted.

Finally, we obtain a result of general use, which will be applied first in the coming sections, giving conditions under which a measure on a generated σ-ring S(E) is determined by its values on the generating class E.

Theorem 2.2.7 Let E be a class (containing ∅) which is closed under intersections, and write S = S(E). Let μ be a measure on S which is σ-finite on E. Then μ is σ-finite on S. If μ1 is another measure on S with μ1(E) = μ(E) for all E ∈ E, then μ1(E) = μ(E) for all E ∈ S.

Proof Let A be any fixed set in E such that μ(A) < ∞. Write

D = {E ∈ S : μ1(A ∩ E) = μ(A ∩ E)}.

If E, F ∈ D and E ⊃ F then

μ1{(E – F) ∩ A} = μ1(E ∩ A) – μ1(F ∩ A) (μ1(F ∩ A) ≤ μ1(A) < ∞)

= μ(E ∩ A) – μ(F ∩ A)

= μ{(E – F) ∩ A}

so that E – F ∈ D, i.e. D is closed under formation of proper differences. Similarly D is closed under the formation of countable disjoint unions, so that D is a D-class. Since clearly D ⊃ E (closed under intersections), Theorem 1.8.5 (Corollary) shows that D ⊃ S(E) = S. Hence μ1(E ∩ A) = μ(E ∩ A) if E ∈ S, A ∈ E, μ(A) < ∞.

Now any set in S(E) may be covered by some countable union of sets of finite μ-measure in E. That is, if E ∈ S(E) there are sets En ∈ E such that μ(En) < ∞ and E ⊂ ∪∞1 En. (For the class of sets which may be so covered is a σ-ring which contains E, since μ is σ-finite on E.) Hence μ is σ-finite on S, i.e. the first conclusion holds. Further, since E = ∪∞1 (E ∩ En) it follows from Lemma 1.6.3 (ii) that E = ∪∞1 Gn where the Gn are disjoint sets in S with Gn ⊂ E ∩ En and hence Gn = En ∩ (E ∩ Gn). Thus (with En for A above)

μ1(Gn) = μ1(En ∩ (E ∩ Gn)) = μ(En ∩ (E ∩ Gn)) = μ(Gn)

so that

μ1(E) = ∑∞1 μ1(Gn) = ∑∞1 μ(Gn) = μ(E),

as required. □

2.3 Extension of measures, stage 1: from semiring to ring

It is often convenient to define a measure on a small class of sets, and extend it to obtain one on a much larger class (ring or σ-ring). As an example one may (as in Section 2.7) start with μ defined for each bounded interval of the real line as its length, and extend this to obtain what is called "Lebesgue measure" on the σ-field B of Borel sets (and even on a slightly larger σ-field – the "Lebesgue measurable" sets).

It is natural to begin with a measure μ on a semiring P, and show that it can be extended to a measure μ defined on the σ-ring S(P). This will be done in two stages, first extending μ to R(P) – in this section – and then from R(P) to S(P) in subsequent sections (using the fact that S(R(P)) = S(P)). It is possible to omit the first extension to R(P) at the expense of requiring the somewhat more complicated semiring versions of the preceding results, but it is simpler (and natural) to include it. The following theorem contains the extension to R(P).

Theorem 2.3.1 Let μ be a nonnegative, finitely additive set function on a semiring P, such that μ(∅) = 0. Then (i) there is a unique finitely additive (nonnegative) extension ν of μ to R = R(P). (ii) If μ is countably additive (and thus a measure) on P, ν is a measure on R (and hence is the unique measure extending μ to R). (iii) Finally if μ is finite (or σ-finite) on P, then ν is finite (or σ-finite) on R.

Proof (i) Suppose that μ is finitely additive on P, and let E ∈ R. Then by Theorem 1.7.3, E = ∪n1 Ej where the Ej are disjoint sets of P. Define ν(E) = ∑n1 μ(Ej).

We must check that ν is well defined. That is, if E can also be written as ∪m1 Fk for disjoint sets Fk ∈ P, it must be verified that ∑mk=1 μ(Fk) = ∑nj=1 μ(Ej). To see this, write Hjk = Ej ∩ Fk. The Hjk are all disjoint sets of P and

∪mk=1 Hjk = ∪mk=1 (Ej ∩ Fk) = Ej ∩ E = Ej,

whereas similarly ∪nj=1 Hjk = Fk. Thus, since μ is finitely additive on P,

∑j μ(Ej) = ∑j ∑k μ(Hjk) = ∑k ∑j μ(Hjk) = ∑k μ(Fk),

as required. In particular ν(E) = μ(E) when E ∈ P, so that ν extends μ.

To see that ν is finitely additive, let E, F be disjoint sets of R, E = ∪n1 Ej, F = ∪m1 Fk, the Ej being disjoint sets of P, and similarly for the Fk. Also Ej ∩ Fk = ∅ for any j and k since E ∩ F = ∅. Since E ∪ F = (∪Ej) ∪ (∪Fk), the definition of ν gives

ν(E ∪ F) = ∑ μ(Ej) + ∑ μ(Fk) = ν(E) + ν(F).

Thus ν is additive. Since R is a ring, it follows at once by induction that ν is finitely additive. Finally, to show that ν is the unique finitely additive extension of μ to R, suppose that ν* is another such extension. Then for E ∈ R, E = ∪n1 Ek for disjoint sets Ek ∈ P and since ν* is finitely additive

ν*(E) = ∑n1 ν*(Ek) = ∑n1 μ(Ek)

since ν* = μ on P. But this sum is just ν(E) so that ν* = ν on R and hence ν is unique. From its definition ν(E) is nonnegative for E ∈ R.

(ii) Suppose that μ is countably additive on P. To show that ν is a measure on R its countable additivity must be demonstrated. Let, then, Ek be disjoint sets of R, and E = ∪∞1 Ek be such that E ∈ R. We must show that ν(E) = ∑∞1 ν(Ek).

Assume first that E ∈ P. Then since Ek ∈ R there are disjoint sets Eki ∈ P (1 ≤ i ≤ nk, say) such that Ek = ∪nk i=1 Eki. Hence

E = ∪∞k=1 ∪nk i=1 Eki

expressing E ∈ P as a countable union of disjoint sets Eki ∈ P so that¹

ν(E) = μ(E) = ∑∞k=1 ∑nk i=1 μ(Eki) = ∑∞k=1 ν(Ek).

On the other hand if E ∉ P, E = ∪n1 Fj for some n, where F1, . . . , Fn are disjoint sets of P (since E ∈ R). Since Fj = ∪∞k=1 (Ek ∩ Fj) (a union of disjoint sets of R), the above result implies

μ(Fj) = ∑∞k=1 ν(Fj ∩ Ek).

Hence

ν(E) = ∑nj=1 μ(Fj) = ∑∞k=1 ∑nj=1 ν(Fj ∩ Ek) = ∑∞k=1 ν(Ek)

(Ek = ∪nj=1 (Fj ∩ Ek) and ν is finitely additive on R) so that countable additivity follows.

¹ Strictly this step involves writing the double union as a single one, and rearranging the order of the double series of positive terms, which may always be done, e.g. summing "by diagonals".

(iii) If μ is finite, ν clearly is also. If μ is σ-finite and E ∈ R, then E = ∪n1 Fi for some Fi ∈ P. Each Fi may be covered by a countable sequence of sets of P (⊂ R) with finite μ-values. The combined (countable) sequence of all these n sequences covers E and thus ν is σ-finite. □
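The well-definedness step in part (i) of the proof can be mimicked concretely; the following is a sketch of ours (not the book's), with P the semiring of half-open intervals and μ = length:

```python
# Sketch (not from the text) of the stage-1 extension of Theorem 2.3.1:
# P = half-open intervals (a, b] with mu((a, b]) = b - a; a set E in R(P)
# is represented as a list of disjoint intervals, and nu sums mu over them.

def mu(interval):
    a, b = interval
    return b - a

def nu(decomposition):
    """Extension of mu to R(P): sum of mu over a disjoint decomposition."""
    return sum(mu(I) for I in decomposition)

# The same set E = (0, 3] written as two different disjoint unions Ej, Fk:
E_as_Ej = [(0, 1), (1, 3)]
E_as_Fk = [(0, 2), (2, 2.5), (2.5, 3)]

# Well-definedness: both decompositions give nu(E) = 3, as guaranteed by
# the refinement argument with Hjk = Ej ∩ Fk in the proof above.
assert nu(E_as_Ej) == nu(E_as_Fk) == 3.0
```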

2.4 Measures from outer measures

In this section we discuss the notion of an "outer measure" and show how an outer measure may be used to construct a measure. This will lead (in the next section) to the extension of a measure from a ring to its generated σ-ring, and thus complete the extension procedure.

By an outer measure we mean a nonnegative, monotone set function μ*, defined for all subsets of X, with μ*(∅) = 0 and such that, if {Ei} is any sequence of sets, then μ*(∪∞1 Ei) ≤ ∑∞1 μ*(Ei). (This last property is called countable subadditivity. μ* may, of course, take finite values, or the value +∞.)

The basic idea of this section may be expressed as follows. Given an outer measure μ*, find a (large) σ-ring S* such that (the restriction to S* of) μ* is actually a measure on S*.

To be specific, a set E will be called μ*-measurable if, for every set A,

μ*(A) = μ*(A ∩ E) + μ*(A ∩ Ec).

That is, E is μ*-measurable if it "splits every set additively" as far as μ* is concerned. S* will denote the class of all μ*-measurable sets. Note that to test whether a set E is μ*-measurable, it need only be shown that

μ*(A) ≥ μ*(A ∩ E) + μ*(A ∩ Ec)

for each A, since the reverse inequality always holds, by subadditivity of μ*. The aim of the next two results is to show that S* is a σ-field and that μ* gives a measure when restricted to S*.

Lemma 2.4.1 For any E, F ∈ S*, A ⊂ X,

(i) μ*(A) = μ*(A ∩ E ∩ F) + μ*(A ∩ E ∩ Fc) + μ*(A ∩ Ec ∩ F) + μ*(A ∩ Ec ∩ Fc)
(ii) μ*[A ∩ (E ∪ F)] = μ*(A ∩ E ∩ F) + μ*(A ∩ Ec ∩ F) + μ*(A ∩ E ∩ Fc)
(iii) If E, F are also disjoint then

μ*[A ∩ (E ∪ F)] = μ*(A ∩ E) + μ*(A ∩ F).

Proof Since E is μ*-measurable,

μ*(A) = μ*(A ∩ E) + μ*(A ∩ Ec). (2.3)

But F is also μ*-measurable and hence (writing A ∩ E, A ∩ Ec in turn in place of A),

μ*(A ∩ E) = μ*(A ∩ E ∩ F) + μ*(A ∩ E ∩ Fc)
μ*(A ∩ Ec) = μ*(A ∩ Ec ∩ F) + μ*(A ∩ Ec ∩ Fc).

Substitution of these two latter equations in (2.3) gives (i).

(ii) follows from (i) by writing A ∩ (E ∪ F) in place of A and noting identities such as A ∩ (E ∪ F) ∩ E ∩ F = A ∩ E ∩ F, A ∩ (E ∪ F) ∩ Ec ∩ Fc = ∅.

(iii) follows at once from (ii) when E ∩ F = ∅ (then F ⊂ Ec, E ⊂ Fc). □

Theorem 2.4.2 If μ* is an outer measure, the class S* of all μ*-measurable sets is a σ-field. If {En} is a disjoint sequence of sets of S* and E = ∪∞n=1 En, then μ*(E) = ∑∞n=1 μ*(En). Thus the restriction of μ* to S* is a measure on S*.

Proof We show first that S* is a field. From the definition, it is clear that Ec is μ*-measurable whenever E is, and thus S* is closed under complementation.

If E ∈ S*, F ∈ S*, A ⊂ X, it follows from (i) and (ii) of Lemma 2.4.1 that

μ*(A) = μ*[A ∩ (E ∪ F)] + μ*(A ∩ Ec ∩ Fc) = μ*[A ∩ (E ∪ F)] + μ*[A ∩ (E ∪ F)c].

Hence E ∪ F ∈ S* and thus S* is a field. (S* is nonempty since it obviously contains X.)

The proof that S* is a σ-field is completed by showing that the union of any countable sequence of sets in S* is also in S*. But S* is a field (and hence a ring), so that by Lemma 1.6.3, any countable union of sets in S* may be written as a countable union of disjoint sets in S*. Hence to show that S* is a σ-ring, it need only be shown that if {En} is a sequence of disjoint sets of S*, then E = ∪∞1 En ∈ S*. By induction from (iii) of Lemma 2.4.1 it follows at once that

μ*(A ∩ ∪n1 Ei) = ∑n1 μ*(A ∩ Ei).

Writing Fn = ∪n1 Ei we have Fn ∈ S* (S* is a field), and thus for any A,

μ*(A) = μ*(A ∩ Fn) + μ*(A ∩ (Fn)c)
      = ∑n1 μ*(A ∩ Ei) + μ*(A ∩ (Fn)c)
      ≥ ∑n1 μ*(A ∩ Ei) + μ*(A ∩ Ec)

since (Fn)c ⊃ Ec and μ* is monotone. This is true for all n, and hence

μ*(A) ≥ ∑∞1 μ*(A ∩ Ei) + μ*(A ∩ Ec) (2.4)
      ≥ μ*(A ∩ E) + μ*(A ∩ Ec)

since A ∩ E = ∪∞1 (A ∩ Ei) and μ* is countably subadditive. Thus, by the remark following the definition of μ*-measurability, it follows that E ∈ S*, as was to be shown; that is, S* is a σ-field.

To see that μ* is a measure, note that since μ*(A) = μ*(A ∩ E) + μ*(A ∩ Ec), the inequalities in (2.4) are in fact equalities and thus for any disjoint sequence {En} of sets in S* with E = ∪∞1 En,

μ*(A) = ∑∞1 μ*(A ∩ En) + μ*(A ∩ Ec).

On putting A = E, the last term vanishes so that countable additivity is evident and the final conclusions of the theorem follow. □
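On a finite space the μ*-measurability criterion can be checked by brute force, which makes it easy to see that S* may be much smaller than the class of all sets. The following sketch (ours, not the text's) uses the outer measure μ*(A) = min(|A|, 2) on a four-point space; only ∅ and X split every set additively.

```python
from itertools import chain, combinations

# Brute-force check (not from the text) of the Caratheodory criterion on a
# finite space. mu*(A) = min(|A|, 2) is an outer measure (nonnegative,
# monotone, countably subadditive, mu*(empty) = 0), yet here S* = {empty, X}.

X = frozenset({0, 1, 2, 3})

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mu_star(A):
    return min(len(A), 2)

def is_measurable(E):
    """E is mu*-measurable iff mu*(A) = mu*(A ∩ E) + mu*(A ∩ Ec) for all A."""
    return all(mu_star(A) == mu_star(A & E) + mu_star(A - E)
               for A in subsets(X))

S_star = [E for E in subsets(X) if is_measurable(E)]
assert set(S_star) == {frozenset(), X}   # only the trivial sets qualify
```

For example E = {0} fails with the test set A = {0, 1, 2}: μ*(A) = 2 while the split gives 1 + 2 = 3. By contrast, if μ* were counting measure, every subset would pass the criterion.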

2.5 Extension theorem

In this section we first show how a measure on a ring R may be extended to an outer measure μ*, whose restriction to the class S* of μ*-measurable sets is thus a measure on S*. It will then be shown that S(R) ⊂ S*, so that the further restriction of μ* to S(R) is a measure on S(R), extending μ. Finally this may be combined with the extension of Section 2.3 from a semiring P to R(P), to give the complete extension from P to S(R(P)) = S(P).

Suppose then that μ is a measure on a ring R and E ⊂ X. Define

μ*(E) = inf{∑∞n=1 μ(En) : ∪∞n=1 En ⊃ E, En ∈ R, n = 1, 2, . . .}

when this makes sense; i.e. for any set E which can be covered (E ⊂ ∪∞n=1 En) by at least one countable sequence of sets En ∈ R. If E cannot be covered by any such sequence, write μ*(E) = +∞.
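Anticipating the Lebesgue case of Section 2.7 (where R is generated by the intervals and μ is length), this covering definition already shows why a countable set such as Q ∩ [0, 1] receives outer measure 0: cover its k-th point by an interval of length ε/2^(k+1). A sketch of ours (not the book's), done in exact rational arithmetic:

```python
from fractions import Fraction

# Sketch (not from the text): covering an enumeration q_0, q_1, ... of the
# rationals in [0, 1] by intervals (q_k - eps/2^(k+1), q_k] of total length
# < eps, so the infimum in the definition forces mu*(Q ∩ [0, 1]) = 0.

def rationals_01(max_den):
    """Distinct rationals in [0, 1] with denominator at most max_den."""
    seen = set()
    for den in range(1, max_den + 1):
        for num in range(den + 1):
            seen.add(Fraction(num, den))
    return sorted(seen)

def cover_length(points, eps):
    """Total length of the cover: sum of eps/2^(k+1), k = 0, 1, ..."""
    return sum(Fraction(eps) / 2 ** (k + 1) for k in range(len(points)))

qs = rationals_01(50)                    # a finite portion of the enumeration
for eps in (Fraction(1, 10), Fraction(1, 100)):
    assert cover_length(qs, eps) < eps   # so mu*(Q ∩ [0,1]) <= eps for every eps
```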

Theorem 2.5.1 The set function μ*, defined as above, is an outer measure, and extends μ on R (i.e. μ*(E) = μ(E) when E ∈ R).

Proof First, if E ∈ R, since E ⊂ E ∪ ∅ ∪ ∅ ∪ . . ., we have μ*(E) ≤ μ(E) + 0 + 0 + . . . = μ(E). On the other hand, if E ∈ R, En ∈ R, E ⊂ ∪∞1 En, then by Theorem 2.2.2, μ(E) ≤ ∑∞1 μ(En), and hence μ(E) ≤ μ*(E). Thus μ*(E) = μ(E) when E ∈ R (thus μ* extends μ) and, in particular, μ*(∅) = 0.

It is immediate that μ* is monotone, since if E ⊂ F, any sequence of sets in R which covers F also covers E, and hence μ*(E) ≤ μ*(F). The result is trivial, of course, if F cannot be covered by any sequence of sets in R (μ*(F) = +∞).

To see that μ* is countably subadditive, consider a sequence {En} of sets, with μ*(En) < ∞ for each n. Then, by definition of μ*, given ε > 0, corresponding to each n there is a sequence of sets Enm ∈ R, m = 1, 2, . . . such that ∪∞m=1 Enm ⊃ En and ∑∞m=1 μ(Enm) ≤ μ*(En) + ε/2^n. Now the sets {Enm : n = 1, 2, . . . , m = 1, 2, . . .} may be written as a sequence covering E = ∪∞1 En. Hence²

μ*(E) ≤ ∑∞n=1 ∑∞m=1 μ(Enm) ≤ ∑∞n=1 (μ*(En) + ε/2^n) = ∑∞n=1 μ*(En) + ε.

Since ε > 0 is arbitrary, μ*(E) ≤ ∑∞n=1 μ*(En). On the other hand this is trivially true if μ*(En) = ∞ for one or more values of n. Thus μ* is an outer measure, as required. □

It is seen from Theorem 2.4.2 that the restriction of the above μ* to the class S* of μ*-measurable sets is a measure on S* (extending μ on R by Theorem 2.5.1). However, we are primarily interested in obtaining a measure on S(R). This may be done by restricting μ* further to S(R) (a subclass of S* by the next lemma). Then the set function μ̄ on S(R), defined by μ̄(E) = μ*(E), will be a measure on S(R), again extending μ on R.

Lemma 2.5.2 With the above notation, S(R) ⊂ S*.

Proof Since S* is a σ-ring, it is sufficient to show that R ⊂ S*. To see this, let E ∈ R, A ⊂ X. It is sufficient to show that μ*(A) ≥ μ*(A ∩ E) + μ*(A ∩ Ec) when μ*(A) < ∞, since this holds trivially when μ*(A) = ∞.

If then μ*(A) < ∞, and ε > 0 is given, there is a sequence {En} of sets of R such that ∪∞n=1 En ⊃ A and ∑∞1 μ(En) ≤ μ*(A) + ε. Thus

μ*(A) + ε ≥ ∑∞1 μ(En ∩ E) + ∑∞1 μ(En ∩ Ec)   (En ∩ E ∈ R, En ∩ Ec = En – E ∈ R)
          ≥ μ*(A ∩ E) + μ*(A ∩ Ec)

² Again see the footnote to Theorem 2.3.1.

since {En ∩ E}, {En ∩ Ec} are sequences of sets of R whose unions contain A ∩ E, A ∩ Ec respectively. But since ε is arbitrary we have μ*(A) ≥ μ*(A ∩ E) + μ*(A ∩ Ec) for all A, showing that E ∈ S* as required. □

For E ∈ R, μ*(E) = μ(E) (Theorem 2.5.1), and hence μ̄(E) = μ*(E) = μ(E). Thus μ̄ is a measure on S(R) extending μ on R. This holds whatever measure μ is on R. It is important to know whether such an extension is unique, i.e. whether μ̄ is the only measure on S(R) such that μ̄(E) = μ(E) when E ∈ R. It follows immediately from Theorem 2.2.7 that this is the case if μ is σ-finite on R. This is shown, and the results thus far summarized, in the following theorem.

Theorem 2.5.3 (Carathéodory Extension Theorem) Let μ be a measure on a ring R. Then there exists a measure μ̄ on S(R) extending μ on R (i.e. μ̄(E) = μ(E) if E ∈ R). If μ is σ-finite on R, μ̄ is then the unique such extension of μ to S(R), and is itself σ-finite on S(R).

Proof The existence of μ̄ has just been shown. Suppose now that μ is σ-finite on R, and that μ1 is another measure on S(R), extending μ on R (i.e. μ1(E) = μ̄(E) = μ(E) for all E ∈ R). Then it follows from Theorem 2.2.7, identifying E with R (closed under intersections), that μ1(E) = μ̄(E) for E ∈ S(R). Thus μ̄ is unique and (as also follows from Theorem 2.2.7) σ-finite on S(R). □

This result can now be combined with Theorem 2.3.1. That is, starting from a measure μ on a semiring P, an extension may be obtained to a measure ν on R(P). ν may then be extended to a measure μ̄ on S(R(P)) = S(P) by Theorem 2.5.3. The extension of μ to ν is unique (Theorem 2.3.1). The extension of ν to μ̄ will be unique provided ν is σ-finite on R. This will be so (Theorem 2.3.1) if μ is σ-finite on P. This is summarized in the following theorem.

Theorem 2.5.4 Let μ be a measure on a semiring P. Then there exists a measure μ̄ on S(P), extending μ on P (μ̄(E) = μ(E) if E ∈ P). If μ is σ-finite on P, then μ̄ is the unique such extension to S(P) and is itself σ-finite on S(P).


[Diagram: nested classes of sets, from the semiring P up through R(P), S(R(P)) = S(P) and S* to the class of all subsets of X.]

The diagram above indicates the relationships between the various classes of sets used in the extension procedure. (Each point on the page represents a set.) A measure μ on P is extended to a measure on R(P), thence to an outer measure μ* on all subsets of X. μ* is restricted to a measure on S* and thence to a measure on S(R(P)) = S(P).

Note that some authors do not define μ* for all sets E, but only those which can be covered by countably many sets of R = R(P). This leads to a potentially smaller class S* but, of course, the same extension of μ* to S(R). (See Ex. 2.13.)

In the sequel we shall not usually employ different letters for a set function on one domain, and its extension to another. For example, the symbol μ will be used to refer to a measure on a semiring P, or its extension to one on R(P) or S(P).

2.6 Completion and approximation

If μ is a measure on a σ-ring S and E ∈ S with μ(E) = 0 then μ(F) = 0 for every F ∈ S with F ⊂ E. However, if F ⊂ E but F ∉ S, μ(F) is not defined. This somewhat inesthetic property can be avoided by assuming that the measure μ is complete in the sense that for any set E ∈ S with μ(E) = 0, it is necessarily the case that F ∈ S for every F ⊂ E (and hence μ(F) = 0). It will be shown in this section that a measure on a σ-ring S may be completed by slightly enlarging the σ-ring – "adding" all subsets of zero measure sets and simply extending μ to the enlarged σ-ring. This is often a convenient process which avoids what J.L. Doob termed "fussy details" in applications, and is especially relevant to Lebesgue measure, considered in the next section.

Theorem 2.6.1 Let μ be a measure on a σ-ring S. Then the class S̄ of all sets of the form E ∪ N, where E ∈ S and N is a subset of some set A ∈ S such that μ(A) = 0, is a σ-ring. A measure μ̄ may be unambiguously defined on S̄ by the equation

μ̄(E ∪ N) = μ(E), E ∈ S, N ⊂ A ∈ S, μ(A) = 0.

μ̄ is then a complete measure on S̄, extending μ on S.

The σ-ring S is thus "slightly" enlarged by adjoining subsets of zero measure sets to the sets of S.

Proof We show first that μ̄ is well defined. That is, if E1 ∪ N1 = E2 ∪ N2, where E1, E2 ∈ S, N1 ⊂ A1 ∈ S, N2 ⊂ A2 ∈ S and μ(A1) = μ(A2) = 0, then we must show that μ(E1) = μ(E2). To see this, note that E1 – E2 is clearly a subset of N2, hence of A2, and thus μ(E1 – E2) = 0. Similarly, μ(E2 – E1) = 0. Thus μ(E1) = μ(E1 ∩ E2) = μ(E2), as required.

It is next shown that S̄ is a σ-ring. S̄ is clearly closed under the formation of countable unions since if Ei ∪ Ni, i = 1, 2, . . . are any members of S̄ (Ei ∈ S, Ni ⊂ Ai ∈ S, μ(Ai) = 0) then ∪∞1 (Ei ∪ Ni) = (∪∞1 Ei) ∪ (∪∞1 Ni). But ∪∞1 Ei ∈ S and ∪∞1 Ni ⊂ ∪∞1 Ai ∈ S where μ(∪∞1 Ai) ≤ ∑∞1 μ(Ai) = 0. Thus ∪∞1 (Ei ∪ Ni) ∈ S̄.

To see that S̄ is a σ-ring, it thus need only be shown that the difference of two sets in S̄ is in S̄. Let E1 ∪ N1, E2 ∪ N2 be members of S̄, E1, E2 ∈ S, N1 ⊂ A1 ∈ S, N2 ⊂ A2 ∈ S, μ(A1) = μ(A2) = 0. Then it may easily be checked that

(E1 ∪ N1) – (E2 ∪ N2) = (E1 ∪ N1) ∩ (E2)c ∩ (N2)c = (E1 ∩ (E2)c ∩ (N2)c) ∪ (N1 ∩ (E2)c ∩ (N2)c)
    = (E1 ∩ (E2)c ∩ (A2)c) ∪ (E1 ∩ (E2)c ∩ (N2)c ∩ A2) ∪ (N1 ∩ (E2)c ∩ (N2)c).

The first of the sets on the right (= E1 – (E2 ∪ A2)) is a member of S. The union of the remaining two is a subset of A1 ∪ A2, which is a member of S and has measure zero since μ(A1 ∪ A2) ≤ μ(A1) + μ(A2) = 0. Thus the difference of two sets of S̄ is in S̄, completing the proof that S̄ is a σ-ring.

To see that μ̄ is a measure on S̄, let {Ei ∪ Ni} be a sequence of disjoint sets in S̄ where as usual Ei ∈ S, Ni ⊂ Ai ∈ S, μ(Ai) = 0. Then

μ̄{∪∞1 (Ei ∪ Ni)} = μ̄{(∪∞1 Ei) ∪ (∪∞1 Ni)} = μ(∪∞1 Ei)

since ∪∞1 Ni ⊂ ∪∞1 Ai and μ(∪∞1 Ai) ≤ ∑∞1 μ(Ai) = 0. Further, the sets Ei are clearly disjoint sets of S and thus countable additivity of μ̄ follows since

μ̄{∪∞1 (Ei ∪ Ni)} = ∑∞1 μ(Ei) = ∑∞1 μ̄(Ei ∪ Ni).

Finally, to see that μ̄ is complete, let F be a subset of a zero measure set in S̄, E ∪ N say, where E ∈ S, N ⊂ A ∈ S, μ(A) = 0, and μ(E) = 0 since μ̄(E ∪ N) = 0.

Then F = ∅ ∪ F, showing that F ∈ S̄ since ∅ ∈ S and F ⊂ E ∪ N ⊂ E ∪ A, E ∪ A being a zero measure set of S. Thus μ̄ is complete, as is the proof. □

Thus a measure μ on a σ-ring may be extended to the "slightly larger" σ-ring S̄ to give a complete measure, called the completion of μ. It is easily seen (Ex. 2.14) that this completion is unique on S̄. A case where completion is often advantageous is that considered in the previous section – where μ is formed by extension from a semiring or ring. The extended measure on S(P) or S(R) is not usually complete.

The final result of this section shows how, in the case of a measure σ-finite on a ring R, any set of finite measure in S(R) may be approximated "in measure" by a set of R.

Theorem 2.6.2 Let R be a ring and μ a measure on S(R) which is σ-finite on R. Then for E ∈ S(R) with μ(E) < ∞, and ε > 0, there exists a set F ∈ R such that μ(EΔF) < ε.

That is, E ∈ S(R) (with μ(E) < ∞) can be approximated by some F ∈ R arbitrarily closely in this measure-theoretic sense of requiring EΔF to have small measure.

Proof By the results of Sections 2.4–2.5, the value of μ(E) is also μ*(E), where μ* is the outer measure extending μ from R. Thus there are sets En ∈ R, n = 1, 2, . . . such that ∪∞1 En ⊃ E and ∑∞1 μ(En) ≤ μ(E) + ε/2.

Now, by Theorem 2.2.4, limn→∞ μ(E1 ∪ · · · ∪ En) = μ(∪∞1 En) and hence for some n0, F = E1 ∪ · · · ∪ En0 (∈ R) satisfies μ(F) ≥ μ(∪∞1 En) – ε/2, so that

μ(E – F) ≤ μ(∪∞1 En) – μ(F) ≤ ε/2

(μ(F) ≤ μ(E1) + · · · + μ(En0) < ∞). Also F – E ⊂ ∪∞1 En – E and hence

μ(F – E) ≤ μ(∪∞1 En) – μ(E) ≤ ∑∞1 μ(En) – μ(E) ≤ ε/2.

The desired result follows, since μ(EΔF) = μ(E – F) + μ(F – E). □
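As a numerical illustration of Theorem 2.6.2 (ours, not the text's), take the Borel set E = ∪k≥1 (1/(2k+1), 1/(2k)], an infinite disjoint union of intervals with m(E) = 1 – ln 2; truncating the union at a suitable n gives a finite union F of intervals (a set of the generating ring) with m(EΔF) < ε:

```python
import math

# Illustration (not from the text) of Theorem 2.6.2 for the Borel set
# E = union over k >= 1 of (1/(2k+1), 1/(2k)], with m(E) = 1 - ln 2.
# F = union of the first n pieces lies in the ring, F ⊂ E, and
# m(E Δ F) = m(E) - m(F) is the tail sum, which we drive below eps/2.

def piece(k):
    """m{(1/(2k+1), 1/(2k)]} = 1/(2k) - 1/(2k+1)."""
    return 1.0 / (2 * k) - 1.0 / (2 * k + 1)

m_E = 1.0 - math.log(2.0)        # = sum of piece(k) over k >= 1

eps = 1e-4
partial, n = 0.0, 0
while m_E - partial >= eps / 2:  # m(E Δ F) for F = first n pieces
    n += 1
    partial += piece(n)

assert m_E - partial < eps / 2   # the finite union F approximates E
```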


2.7 Lebesgue measure

Consider again the real line (with the notation of Section 1.9). Define a set function μ on the semiring P of bounded semiclosed intervals (a, b] by μ{(a, b]} = b – a. μ is finite on P, and we shall show that μ is also countably additive, and hence is a measure on P. It will then follow that μ has a unique extension to a measure on the class B = S(P) of Borel sets. This measure will be called Lebesgue measure on the Borel sets. Three simple lemmas are required:

Lemma 2.7.1 Let E0 ∈ P, and let {Ei} be a sequence of disjoint intervals in P such that Ei ⊂ E0 for i = 1, 2, . . .. Then ∑∞1 μ(Ei) ≤ μ(E0).

Proof For fixed n, trivial algebra shows that ∑n1 μ(Ei) ≤ μ(E0). The result then follows by letting n → ∞. □

Lemma 2.7.2 If a bounded closed interval F0 = [a0, b0] is contained in the union of a finite number of open intervals U1, U2, . . . , Un, Ui = (ai, bi), then b0 – a0 ≤ ∑ni=1 (bi – ai).

The proof of this is clear from simple algebra.

Lemma 2.7.3 If E0, E1, E2, . . . are sets in P such that E0 ⊂ ∪∞1 Ei, then μ(E0) ≤ ∑∞i=1 μ(Ei).

Proof Let Ei = (ai, bi], i = 0, 1, 2, . . .. Choose 0 < ε < b0 – a0 (assuming b0 > a0). Then (a0, b0] ⊂ ∪∞i=1 (ai, bi] so that clearly

[a0 + ε, b0] ⊂ ∪∞i=1 (ai, bi + ε/2^i).

By the Heine–Borel Theorem (i.e. compactness), the bounded closed interval on the left is contained in a finite number of the open intervals on the right, and hence for some n, [a0 + ε, b0] ⊂ ∪ni=1 (ai, bi + ε/2^i). By Lemma 2.7.2,

b0 – a0 – ε ≤ ∑ni=1 (bi – ai + ε/2^i) ≤ ∑∞i=1 (bi – ai) + ε.

Since ε is arbitrary, b0 – a0 ≤ ∑∞i=1 (bi – ai), as required. □

The main result is now simply obtained:

Theorem 2.7.4 There is a unique measure μ on the σ-field B of Borel sets such that μ{(a, b]} = b – a for all real a < b. μ is σ-finite and is called Lebesgue measure on B.

Proof Define μ on P by μ{(a, b]} = b – a. If Ei are disjoint members of P and if ∪∞1 Ei = E0 ∈ P, it follows from Lemmas 2.7.1 and 2.7.3 that μ(E0) = ∑∞1 μ(Ei) and hence that μ is a measure on P. Thus μ has a unique (σ-finite) extension to a measure on S(P) by Theorem 2.5.4, as asserted. □

If {a} is a one-point set, Theorem 2.2.5 shows that μ{a} = limn→∞ μ{(a – 1/n, a]} = 0. Consequently any countable set has Lebesgue measure zero. Also, the Lebesgue measure of any closed or open interval is its length (e.g. μ{[a, b]} = μ{(a, b]} + μ({a}) = b – a). Lebesgue measure on B provides a generalized notion of "length", for sets of B which need not be intervals.

The measure μ is not, in fact, complete on B, but may be completed as in Theorem 2.6.1 to obtain μ̄ on a σ-field B̄ ⊃ B. B̄ consists of sets of the form B ∪ N where B ∈ B and N ⊂ A for some A ∈ B, μ(A) = 0. B̄ is called the σ-field of Lebesgue measurable sets, and the completion μ̄ on B̄ is called Lebesgue measure on the class (B̄) of Lebesgue measurable sets. The symbol L will be used (instead of B̄) for the Lebesgue measurable sets. Further, m will be used from here on instead of μ for Lebesgue measure on the Borel sets B, and for the completed measure on L. No confusion should arise from the dual use.

Thus "Lebesgue measure" refers to either the uncompleted measure on the Borel sets B, or the completed measure on the Lebesgue measurable sets L. One may ask whether there are in fact (a) any Lebesgue measurable sets which are not Borel sets, and (b) any sets at all which are not Lebesgue measurable. The answer is in fact affirmative in both cases (the former may be proved by a cardinality argument and the latter by using the "axiom of choice"), but we shall not pursue the matter here. See also Section 1.9.

It is worth noting that both Borel and Lebesgue measurable sets of finite measure may be approximated by finite unions of intervals. That is, if E ∈ B or E ∈ L and m(E) < ∞ there are, given ε > 0, intervals I1, I2, . . . , In such that m(EΔ ∪n1 Ij) < ε. This follows at once from Theorem 2.6.2 if E ∈ B, and from the definition E = F ∪ N if E ∈ L (where F ∈ B and N ⊂ A ∈ B, m(A) = 0). The details of this should be checked.

In Section 1.9 we considered the linear mapping Tx = αx + β and showed that the set TE of images of E is a Borel set if E is. This can also be shown for Lebesgue measurable sets (and also if E ∈ L then m(TE) = |α|m(E), as expected).

Theorem 2.7.5 Let T be the transformation Tx = αx + β (α ≠ 0). Then TE is Lebesgue measurable if and only if E is. Also m(TE) = |α|m(E).

Proof Note first that m(TE) = |α|m(E) for all E ∈ B. For ν1(E) = m(TE) and ν2(E) = |α|m(E) are clearly both measures on B (check!) and equal (and finite-valued) on the semiring P, so that by Theorem 2.5.4 they are equal on B.

If E ∈ L then E = F ∪ N where F ∈ B and N ⊂ A ∈ B, m(A) = 0. Thus TE = TF ∪ TN with TN ⊂ TA ∈ B (Theorem 1.9.2) and by the above m(TA) = |α|m(A) = 0. Since TF ∈ B it follows that TE ∈ L. The converse follows by considering T⁻¹.

Finally, if E ∈ L, E = F ∪ N as above and, by definition of the completion, m(E) = m(F), m(TE) = m(TF) (since E = F ∪ N, TE = TF ∪ TN). But as shown above m(TF) = |α|m(F) since F ∈ B, so that m(TE) = m(TF) = |α|m(F) = |α|m(E) as required. □
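A direct check of the scaling identity on intervals (our sketch, not the book's): for Tx = αx + β the image of (a, b] is an interval with endpoints αa + β and αb + β (in some order), of length |α|(b – a).

```python
# Check (not from the text) of m(TE) = |alpha| m(E) for Tx = alpha*x + beta
# on a half-open interval E = (a, b]; for alpha < 0 the endpoints swap.

def image_interval(a, b, alpha, beta):
    """Endpoints (lo, hi) of the image of (a, b] under Tx = alpha*x + beta."""
    lo, hi = sorted((alpha * a + beta, alpha * b + beta))
    return lo, hi

def length(lo, hi):
    return hi - lo

for alpha, beta in [(2.0, 5.0), (-3.0, 1.0), (0.5, -4.0)]:
    a, b = -1.0, 2.5
    lo, hi = image_interval(a, b, alpha, beta)
    assert abs(length(lo, hi) - abs(alpha) * (b - a)) < 1e-12
```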

2.8 Lebesgue–Stieltjes measures

We use the notation of the two previous sections. Lebesgue measure m was defined on the Borel sets B by the requirement m{(a, b]} = b – a for all (a, b] ∈ P. That is, m{(a, b]} = F(b) – F(a) where F(x) = x. More generally, now, let F be any finite-valued, nondecreasing function on R, such that F is right-continuous at all points (i.e. F(x + 0) = F(x), where F(x + 0) = limh↓0 F(x + h) – a limit which exists by monotonicity of F). Define a set function μF on P by μF{(a, b]} = F(b) – F(a). We shall show – by the same pattern of proof as for Lebesgue measure – that μF may be extended to a measure on B. Moreover, it will be seen that every measure on B which is finite on P can be written as μF for some such nondecreasing F. Such a measure μF is called the Lebesgue–Stieltjes measure on B corresponding to the function F.

Theorem 2.8.1 Let F(x) be a nondecreasing real-valued function which is right-continuous for all x. Then there is a unique (σ-finite) measure μF on the class B of Borel sets such that μF{(a, b]} = F(b) – F(a) whenever –∞ < a < b < ∞. Conversely, if ν is a measure on B such that ν{(a, b]} < ∞ whenever –∞ < a < b < ∞, then there exists a nondecreasing, right-continuous F such that ν = μF. F is unique up to an additive constant.

Proof (i) Suppose that F is a nondecreasing, right-continuous function and define μF on P as above by μF{(a, b]} = F(b) – F(a). It is easy to show that μF is countably additive on P, by the same arguments as in Section 2.7. In fact, Lemmas 2.7.1 and 2.7.2 hold for μF, if (bi – ai) is replaced by F(bi) – F(ai). A small modification is needed to the proof of Lemma 2.7.3. Specifically, assume (a0, b0] ⊂ ∪∞1 (ai, bi], choose 0 < ε < b0 – a0 and

(by right-continuity) δi > 0 such that F(bi + δi) < F(bi) + ε/2^i, i = 1, 2, . . .. Then since [a0 + ε, b0] ⊂ ∪∞i=1 (ai, bi + δi),

F(b0) – F(a0 + ε) ≤ ∑∞i=1 [F(bi + δi) – F(ai)]
    = ∑∞i=1 μF{(ai, bi]} + ∑∞i=1 [F(bi + δi) – F(bi)]
    ≤ ∑∞i=1 μF(Ei) + ε,

where Ei = (ai, bi]. The desired conclusion, μF(E0) = F(b0) – F(a0) ≤ ∑∞i=1 μF(Ei), now follows by letting ε → 0, and using the right-continuity of F again.

Countable additivity of μF on P now follows at once by combining these lemmas in exactly the same way as in Theorem 2.7.4 for Lebesgue measure. It again also follows from Theorem 2.5.4 that μF has a unique (σ-finite) extension to B = S(P).

(ii) Conversely let ν be a measure on B such that ν(E) < ∞ for all E ∈ P. Define F(x) = ν{(0, x]} or –ν{(x, 0]} according as x ≥ 0 or x < 0. It is obvious that F is nondecreasing and easily checked that it is continuous to the right (e.g. if x ≥ 0 and {hn} is any sequence which decreases to zero, {(0, x + hn]} is a decreasing sequence of sets with limit (0, x], so that ν(0, x] = lim ν(0, x + hn]. Thus F(x + h) → F(x) as h ↓ 0 through any sequence and hence as h ↓ 0 generally).

The measure μF corresponding to F clearly equals ν for sets (a, b] of P (μF{(a, b]} = F(b) – F(a) = ν{(a, b]}) and hence ν = μF on B. Finally if G is another such function with μG = ν we have G(x) – G(0) = F(x) – F(0) (being ν(0, x] or –ν(x, 0] according as x > 0 or x < 0). Hence G differs from F by an additive constant, so that F is unique up to an additive constant. □

Note that in defining μF, the assumption of right-continuity of F is made for convenience only. (If F were not right-continuous, μF could be defined by μF{(a, b]} = F(b + 0) – F(a + 0).)

In contrast to Lebesgue measure, it is not necessarily the case that μF{a} = 0 for a single-point set {a}, i.e. μF may have an atom at a. In fact, μF{a} = limn→∞ {F(a) – F(a – 1/n)} = F(a) – F(a – 0). Thus μF({a}) is zero if F is continuous at a, and otherwise its value is the magnitude of the jump of F at a. We see also that for open and closed intervals,

μF{(a, b)} = F(b – 0) – F(a), μF{[a, b]} = F(b) – F(a – 0)

(writing (a, b) = (a, b] – {b}, [a, b] = (a, b] ∪ {a}).
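These formulas are easy to verify on a concrete F. The sketch below is ours (the particular F is hypothetical, chosen only for illustration): a right-continuous nondecreasing F with a single jump of size 1 at x = 1, with F(x – 0) approximated numerically.

```python
# Sketch (not from the text): atoms and interval measures for the
# Lebesgue-Stieltjes measure mu_F of a concrete right-continuous F.

def F(x):
    if x < 0:
        return 0.0
    if x < 1:
        return x            # continuous on [0, 1)
    return x + 1.0          # jump of size 1 at x = 1 (right-continuous there)

def muF_halfopen(a, b):
    """mu_F{(a, b]} = F(b) - F(a)."""
    return F(b) - F(a)

def F_left(x, h=1e-9):
    """Numerical stand-in for the left limit F(x - 0)."""
    return F(x - h)

# Atoms: mu_F{a} = F(a) - F(a - 0).
assert abs((F(1) - F_left(1)) - 1.0) < 1e-6    # atom of mass 1 at x = 1
assert abs(F(0.5) - F_left(0.5)) < 1e-6        # no atom where F is continuous

# Open vs half-open intervals: mu_F{(a, b)} = F(b - 0) - F(a).
assert abs((F_left(1) - F(0)) - 1.0) < 1e-6    # mu_F{(0, 1)} = 1
assert muF_halfopen(0, 1) == 2.0               # mu_F{(0, 1]} = 2, includes atom
```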

As noted in Theorem 2.8.1, if F, G are two nondecreasing (right-continuous) functions defining the same measure, i.e. μF = μG, then F – G is constant. The converse of this is clear – if F and G differ by a constant then certainly μF = μG on P and hence on B. This means that any fixed constant can be added to or subtracted from F to give the same measure μF. In particular if F is a bounded function (i.e. μF is a finite measure), then F(–∞) = limx→–∞ F(x) is finite and F(x) – F(–∞) may be used instead of F itself. That is, we may take F(–∞) = 0. In this case, F(∞) is also finite and equal to μF(R).

Finally the following result shows that μF has at most countably many atoms.

Lemma 2.8.2 A nondecreasing (right-continuous) function F has at most countably many discontinuities. Equivalently, the corresponding Lebesgue–Stieltjes measure μF has at most countably many atoms.

Proof Since an atom of μF is a discontinuity point of F and conversely, the equivalence of the two statements is clear. If for fixed a, b, Dm denotes the set of atoms of size at least 1/m in (a, b], then since ∞ > F(b) – F(a) ≥ μF(Dm) ≥ #(Dm)/m, the number of points in Dm, #(Dm), is finite. But the set of all atoms of μF in (a, b] is ∪∞m=1 Dm and is therefore countable. Finally, the set of all atoms of μF in R is the union of those in the sets (n, n + 1] (n = 0, ±1, . . .) and is also countable. □

Finally note that μF may be completed in the usual way. However, the σ-field on which the completion of μF is defined will depend on F, and will not in general coincide with the Lebesgue measurable sets.

Exercises

2.1 Let μ be a measure defined on a ring R. Show that the class of sets E ∈ R with μ(E) finite forms a ring.

2.2 Let R consist of all finite subsets of X. For a given nonnegative function f on X define μ on R by

μ({x1, . . . , xn}) = ∑ni=1 f(xi), μ(∅) = 0.

Show that μ is a measure on the ring R. (If f(x) ≡ 1, μ is called counting measure on R. Why?)

2.3 Let E be a class of sets and μ be a measure on R(E) such that μ(E) < ∞ for all E ∈ E. Show that μ is a finite measure on R(E).

2.4 Let X be the set {1, 2, 3, 4, 5} and let P be the class of sets ∅, X, {1}, {2, 3}, {1, 2, 3}, {4, 5}. Show that P is a semiring. Define μ on P by the values (in the order of the sets given) 0, 3, 1, 1, 2, 1. Show that μ is finitely additive on P. What is R(P)? Find the finitely additive extension of μ to R(P). Is it a measure on R(P)?


2.5 Is the class of rectangles in the plane of the form {(x, y) : a < x ≤ b, c < y ≤ d} a semiring? Suggest how Borel sets, Lebesgue measurable sets, and Lebesgue measure, might be defined in the plane, and in n-dimensional Euclidean space Rn.

2.6 If μ is a measure on a ring R and E, F ∈ R, show that μ(E ∪ F) + μ(E ∩ F) = μ(E) + μ(F). (Remember μ can take the value +∞.) If E, F, G are sets in R, show that μ(E) + μ(F) + μ(G) + μ(E ∩ F ∩ G) = μ(E ∪ F ∪ G) + μ(E ∩ F) + μ(F ∩ G) + μ(G ∩ E). Generalize to an arbitrary finite union. In the case where μ is a finite measure, show that

μ(∪n1Ei) = ∑n1 μ(Ei) – ∑i<j μ(Ei ∩ Ej) + · · · + (–1)n–1 μ(E1 ∩ E2 ∩ . . . ∩ En).

2.7 Let μ be a measure on a σ-ring S, and Ei ∈ S for i = 1, 2, . . . . Show that

μ(lim inf En) ≤ lim infn→∞ μ(En).

If μ(∪∞1 En) < ∞ show that

μ(lim sup En) ≥ lim supn→∞ μ(En).

(Note: lim inf En = limn(∩∞m=nEm), lim sup En = limn(∪∞m=nEm).)

2.8 Let X be the set of all positive integers and R the class of all finite subsets of X and their complements. For E ∈ R, let μ(E) = 0 or ∞ according as E is a finite or infinite set. Show that μ is continuous from above at ∅ but is not a measure. What does this say about Theorem 2.2.6?

2.9 Let X be any space with two or more points. Write μ(∅) = 0 and μ(E) = 1 for E ≠ ∅. Is μ an outer measure, a measure?

2.10 If μ* is an outer measure and E, F are two sets, E being μ*-measurable, show that

μ*(E) + μ*(F) = μ*(E ∪ F) + μ*(E ∩ F).

2.11 Let x0 be a fixed point of space X. Is μ*(E) = χE(x0) an outer measure?

2.12 Let R be a ring of subsets of a countable set X with the property that every nonempty set in R is infinite and such that S(R) is the class of all subsets of X (give an example of such an (X, R)). Then if for every E ⊂ X, μ1(E) is the number of points in E, and μ2(E) = 2μ1(E), show that μ1 = μ2 on R but not on S(R). What does this show about the extension of measures from R to S(R)?

2.13 In some treatments of the extension procedure (starting with a measure μ on a ring R), μ* is defined not for every subset of the space X, but just for the class H of subsets E which can be covered by some countable union of sets of R. E is then called μ*-measurable if E ∈ H and μ*(A) = μ*(A ∩ E) + μ*(A ∩ Ec) for all A ∈ H. Check that the class (S*H, say) of sets which are μ*-measurable in this sense is a σ-ring and, in fact S*H = S* ∩ H where S* is defined as in Section 2.4. If μ is σ-finite on R the sets of S*H are precisely those of S* which have σ-finite measure (i.e. the sets E ∈ S* such that E = ∪∞1 Ei for some Ei ∈ S* and μ*(Ei) < ∞). It may also be shown that μ*, as a measure on S*H, is precisely the completion (of the extension) of μ on S(R).

2.14 Let μ be a measure on a σ-ring S, and μ̄ the completion of μ on the σ-ring S̄. Show that if ν is an extension of μ to a complete measure on a σ-ring T (⊃ S), then T ⊃ S̄ and ν extends μ̄ (i.e. ν(E) = μ̄(E) when E ∈ S̄).

2.15 Let F(x) = 0 for x < 0 and, for x ≥ 0, F(x) = r if r ≤ x < r + 1, r = 0, 1, 2, . . . . Consider the corresponding Lebesgue–Stieltjes measure μF. What is μF(a, b] (if 0 < a < b)? Describe μF(E) for any Borel set E in simple terms.

2.16 Let X be the positive integers {1, 2, 3, . . .}, and let S be all subsets of X. Let μ(E) be the number of points in E (for any E ∈ S). Show that μ is a measure (μ is a "counting measure" again). How is μ related to μF of Ex. 2.15?

2.17 Prove the analog of Theorem 2.2.2 which applies if μ is just finitely additive, rather than countably additive on R (replacing the infinite union and sums by finite ones). Note that Theorem 2.2.1 holds for such a μ.

2.18 Let μ be a nonnegative, finitely additive set function on a semiring P such that μ(∅) = 0. As shown in Theorem 2.2.1, μ is monotone on P. Let E0, E1, . . . , En ∈ P. Show that

(i) If E0 ⊂ ∪n1Ei then μ(E0) ≤ ∑n1 μ(Ei).
(ii) If E1, . . . , En are disjoint and ∪n1Ei ⊂ E0 then ∑n1 μ(Ei) ≤ μ(E0). (Hint: Use Theorem 2.3.1 and Ex. 2.17.)

2.19 Let F be a nondecreasing, continuous function on the real line such that

limx→∞ F(x) = ∞ and limx→–∞ F(x) = –∞.

Prove that for every Borel set E

μF{F–1(E)} = μ(E)

where μ is Lebesgue measure, μF is the Lebesgue–Stieltjes measure induced by F, and F–1(E) = {x ∈ R : F(x) ∈ E}. (Hint: First show that if E is a Borel set, then so is F–1(E) (in fact the converse is also true). Then show that

ν(E) = μF{F–1(E)}

defines a measure on the Borel sets, and that ν and μ are equal on intervals.)

3

Measurable functions and transformations

3.1 Measurable and measure spaces, extended Borel sets

The discussion up to now has been primarily concerned with the construction and properties of measures on σ-rings. There was some advantage (with a little added complication) in preserving the generality of consideration of σ-rings, rather than σ-fields during this construction process (cf. preface). In this chapter we prepare to use the results obtained so far to develop the theory of integration of functions on abstract spaces. From this point it will usually be convenient to assume that the basic σ-ring on which the measure is defined is, in fact, a σ-field. This will avoid a number of rather fussy details, and will involve negligible loss of generality for integration.

The basic framework for integration will be a space X, a σ-field S of subsets of X, and a measure μ on S. The triple (X, S, μ) will be referred to as a measure space. When μ(X) = 1, μ will be called a probability measure. Probabilities are studied in depth from Chapter 9 on, though they also appear occasionally in earlier chapters as special cases.

In most of this chapter we shall not be concerned at all with the measure μ, but just with properties of functions and transformations defined on X, in relation to S. To emphasize this absence of μ from consideration, the pair (X, S) will be referred to as a measurable space. In these combinations S will always be a σ-field and in "stand-alone" cases it will be clearly stated whether a σ-ring or σ-field is assumed. Generated σ-rings and σ-fields will continue to be denoted by S(E), σ(E) respectively.

Following normal usage, any set E which belongs to the σ-field S of a measurable space (X, S) will be called measurable (or S-measurable if there is any possible ambiguity).

A measurable space of particular interest is the real line, where S is either the class of Borel sets, Lebesgue measurable sets, or occasionally some other σ-field. It will also be important to consider the extended real line R* = [–∞, ∞] (cf. Section 2.1) as our basic space. The σ-field of primary concern in R* will be the smallest σ-field (or equivalently σ-ring) containing (a) all the Borel sets B of the (unextended) real line R and (b) each of the one-point sets {∞}, {–∞} of R*. This σ-field will be denoted by B*, and called the class of extended Borel sets. It is very easy to see that B* consists precisely of all sets of the form

B, B ∪ {∞}, B ∪ {–∞}, B ∪ {∞} ∪ {–∞}

where B is any (ordinary) Borel set.

The following result shows that B* may be generated from intervals (and the points {±∞}) and has obvious variants using bounded intervals, rational end points etc. (cf. Exs. 1.19–1.21).

Lemma 3.1.1

B* = S{{∞}, {–∞}, (–∞, a], –∞ < a < ∞} (= S1, say)
   = S{R*, [–∞, a], –∞ < a < ∞} (= S2, say).

Proof Clearly B* ⊃ S1 and B* ⊃ S2. But also S1 ⊃ S{(–∞, a], a real} = B (cf. Ex. 1.20) and hence S1 ⊃ S{{∞}, {–∞}, B} = B*. This gives the first equality. Now {–∞} = ∩∞n=1[–∞, –n] ∈ S2, {∞} = R* – ∪∞n=1[–∞, n] ∈ S2, and (–∞, a] = [–∞, a] – {–∞} ∈ S2 for real a, so that S2 ⊃ S1 = B*, completing the proof. □

3.2 Transformations and functions

While the main concern (e.g. for integration) is with real-valued functions (on a space X), it will be very useful to consider a more general framework. Specifically if X and Y are two spaces, a mapping T defined on some subset D of X and taking values in Y will be here called a transformation from a subset of X into Y. That is, to every point x ∈ D there corresponds an image point Tx ∈ Y. If the domain D is (all of) X we say simply that T is a transformation from X into Y.

In the special case where Y is the extended real line R*, a transformation will be referred to as an extended real-valued function, or simply a function, defined on (a subset of) X. Functions, of course, will generally be written with letters such as f, g rather than T. They may have infinite values. When occasionally the values of a function f are assumed to be finite (i.e. in R), f will be specifically referred to as a real function.


The remainder of this section will be concerned with general transformations (and hence the results will apply in particular to functions). Special results pertaining only to functions will be obtained in subsequent sections.

Let, then, T be a transformation from a subset D of a space X into a space Y. The inverse image T–1G of a subset G ⊂ Y is defined to be the set of all x "which map into G", i.e.

T–1G = {x : Tx ∈ G} (i.e. {x ∈ D : Tx ∈ G}).

Note that while T is a "point mapping", T–1 is a "set mapping" – converting subsets of Y into subsets of X. The following result shows how T–1 commutes pleasantly with set operations (see also Ex. 3.6).

Lemma 3.2.1 Let T be a transformation from a subset D of a space X into a space Y, and let G, H, Gi, i = 1, 2, . . . , be subsets of Y. Then

(i) T–1(G – H) = T–1G – T–1H,
(ii) T–1(∪∞1 Gi) = ∪∞1 T–1Gi,
(iii) T–1(∩∞1 Gi) = ∩∞1 T–1Gi,
(iv) T–1Gc = D – T–1G. In particular if D = X then T–1Gc = (T–1G)c.

Proof (i) x ∈ T–1(G – H) if and only if Tx ∈ G – H; that is, if and only if Tx ∈ G, Tx ∉ H, or x ∈ T–1G – T–1H, as required. The remaining proofs are similar. □
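On finite sets the lemma is easy to check mechanically; the sketch below (our own toy example, not from the text) represents T as a dictionary and verifies (i)–(iii) for particular G, H:

```python
# Toy example: D is the domain of T inside some larger X, and T maps D into Y.
D = {0, 1, 2, 3}
T = {0: 'a', 1: 'a', 2: 'b', 3: 'c'}

def inv(G):
    # T^{-1}G = {x in D : Tx in G}
    return {x for x in D if T[x] in G}

G, H = {'a', 'b'}, {'b', 'c'}
assert inv(G - H) == inv(G) - inv(H)   # (i)
assert inv(G | H) == inv(G) | inv(H)   # (ii), finite case
assert inv(G & H) == inv(G) & inv(H)   # (iii), finite case
print(inv(G), inv(H))
```

Note that the corresponding identities for forward images TE generally fail (e.g. T(E ∩ F) need only be contained in TE ∩ TF), which is why inverse images are the natural tool here.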

The following simple, but quite useful, result may now be obtained as an immediate corollary. In this and subsequently, if T is a class of subsets of Y, T–1T will denote the class of all subsets of X of the form T–1G for G ∈ T, i.e. T–1T = {T–1G : G ∈ T }. Note that since T–1 is a set function this notation T–1T is consistent with the usage TE = {Tx : x ∈ E} for the point function T.

Theorem 3.2.2 Let T be a transformation from a subset of a space X into a space Y, and let T be a σ-ring of subsets of Y. Then T–1T is a σ-ring in X. T–1T is a σ-field if T is, provided T is defined on (all of) X.

Proof If Ei ∈ T–1T, then Ei = T–1Gi for some Gi ∈ T (i = 1, 2, . . .). Then also

∪∞1 Ei = ∪∞1 T–1Gi = T–1(∪∞1 Gi)

by Lemma 3.2.1. Since ∪∞1 Gi ∈ T it follows that ∪∞1 Ei ∈ T–1T. Similarly, it is easy to show that if E, F ∈ T–1T, then E – F ∈ T–1T, from which it follows at once that T–1T is a σ-ring. Finally, if T is defined on X and T is a σ-field, then Y ∈ T and hence X = T–1Y ∈ T–1T, as required. □
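A finite sketch of the theorem (again a made-up example, not from the text): taking T to be all subsets of a two-point Y, the class T–1T of inverse images is closed under complements and unions, i.e. is a σ-field in X.

```python
from itertools import chain, combinations

X = {1, 2, 3, 4}
Y = {'even', 'odd'}
T = {1: 'odd', 2: 'even', 3: 'odd', 4: 'even'}

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

sigma_Y = powerset(Y)                                              # a sigma-field on Y
induced = {frozenset(x for x in X if T[x] in G) for G in sigma_Y}  # T^{-1}T

# closure under complements and (finite) unions
assert all(frozenset(X) - E in induced for E in induced)
assert all(E | F in induced for E in induced for F in induced)
print(sorted(len(E) for E in induced))
```

Here the induced σ-field is {∅, {1, 3}, {2, 4}, X}: exactly the sets of X that T cannot "split".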


This result demonstrates the use of T for "inducing" a σ-ring (or σ-field) in X from one in Y. Thus if (Y, T) is a measurable space and T a transformation from a subset of X into Y, then T–1T is a σ-ring in X whereas σ(T–1T) is a σ-field of subsets of X. This σ-field will be denoted by σ(T) and termed the σ-field in X generated (or induced) by T from T. As noted, if T is a σ-field and T is defined at all points of X then σ(T) = T–1T. In any case (X, σ(T)) is a measurable space. Note that σ(T) depends on the σ-field T, here assumed fixed.

Finally a transformation T may also be used to go in the other direction to obtain a σ-ring in Y from one in X, as the following result shows.

Theorem 3.2.3 Let S be a σ-ring on a space X and T a transformation from a subset of X into Y. Then the class T of subsets G of Y such that T–1G ∈ S, i.e. T = {G : T–1G ∈ S}, is a σ-ring of subsets of Y.

Proof Similar to the previous result. □

Corollary Let T be a transformation from a (subset of a) space X into a space Y and G a class of subsets of Y. Then S(T–1G) = T–1S(G).

Proof Since T–1G ⊂ T–1S(G), a σ-ring, it is immediate that S(T–1G) ⊂ T–1S(G). Conversely, by the theorem the class of sets G ⊂ Y such that T–1G ∈ S(T–1G) is a σ-ring. Since this (trivially) contains G it contains S(G), so that T–1S(G) ⊂ S(T–1G), completing the proof. □

3.3 Measurable transformations and functions

Suppose now that (X, S) and (Y, T) are measurable spaces, and that T is a transformation from a subset of X into Y. Then T is called a measurable (or S|T -measurable) transformation (or mapping) if T–1G ∈ S whenever G ∈ T, i.e. if the inverse image under T of each T -measurable set is S-measurable. This may obviously be rephrased as T–1T ⊂ S, from which it follows at once that σ(T) = σ(T–1T) ⊂ S. Hence it follows that S-measurability of T may be equivalently defined as σ(T) ⊂ S. The simple details should be checked (Ex. 3.1).

While a measurable transformation T need not be defined on the whole space X, its domain of definition D must be a measurable set (since D = T–1Y). Also, there may be many σ-fields S on X for which T is S|T -measurable. Clearly one such σ-field is σ(T) itself, and if S is another, S ⊃ σ(T). Hence (Ex. 3.1) σ(T) is the smallest σ-field S for which T is S|T -measurable.


An important special case occurs when Y is R* = [–∞, ∞] and T is the class B* of extended Borel sets. The transformation is then a function f defined on a subset of X. If measurable in the sense described, f is called a measurable (or S-measurable) function. Thus an S-measurable function is a function from a subset of X with values in R*, such that f –1B ∈ S for each B ∈ B*.

Before studying the measurability of functions in detail, we give a simple general result concerning measurability of the composition of two measurable transformations, and a very useful measurability criterion.

Theorem 3.3.1 Let (X, S), (Y, T), (Z, W) be measurable spaces. Let T1 be an S|T -measurable transformation from a subset D1 of X into Y, and T2 a T |W-measurable transformation from a subset D2 of Y into Z. Then the composition T2T1, defined for x ∈ D1 such that T1x ∈ D2 into Z by (T2T1)(x) = T2(T1x), is S|W-measurable.

Proof It is easily checked that for any G ⊂ Z,

(T2T1)–1G = T1–1(T2–1G).

If G ∈ W, it follows that T2–1G ∈ T and hence T1–1(T2–1G) ∈ S, or (T2T1)–1G ∈ S, as required to show measurability of T2T1. □

According to the definition of measurability, a transformation T from a subset of X into Y is S|T -measurable if the inverse image under T of each T -measurable set is S-measurable. The following simple result shows that T is S|T -measurable provided the inverse image under T of each set in a class G generating T is S-measurable. Since G may be a much simpler class than T, this result can be very helpful in proving measurability of transformations and functions.

Theorem 3.3.2 Let (X, S) and (Y, T) be measurable spaces, T a transformation from a subset of X into Y, and G a class of subsets of Y such that S(G) = T. Then T is S|T -measurable if and only if T–1G ∈ S for all G ∈ G, i.e. if and only if T–1G ⊂ S.

Proof The "only if" part is immediate since G ⊂ T. The "if" part follows simply since if T–1G ⊂ S then by Theorem 3.2.3 (Corollary) T–1T = T–1S(G) = S(T–1G) ⊂ S, and since S is also a σ-field, S ⊃ σ(T–1T) = σ(T), showing that T is S|T -measurable, as required. □

Corollary With the notation of the theorem, if S(G) = T, then σ(T) = σ(T–1G).


Proof By the theorem T is σ(T–1G)|T -measurable since T–1E ∈ σ(T–1G) for all E ∈ G and S(G) = T, so that σ(T–1G) ⊃ σ(T). The reverse inclusion is immediate, giving the desired conclusion. □

Variant An S|T -measurable transformation T is necessarily defined on a measurable subset of X. If this were included as an assumption in Theorem 3.3.2, then S(G) = T may be replaced by σ(G) = T in the theorem.

Application of Theorem 3.3.2 to functions gives the following criteria for measurability.

Theorem 3.3.3 Let (X, S) be a measurable space and f a function defined on a subset D of X. Then the following are equivalent.

(i) f is measurable.
(ii) f –1{∞} ∈ S, f –1{–∞} ∈ S and either
(a) f –1B ∈ S for all B ∈ B or
(b) {x : –∞ < f (x) ≤ a} = f –1(–∞, a] ∈ S for every real a.
The set (–∞, a] may be replaced by (–∞, a), [a, ∞) or (a, ∞) and "real a" may be replaced by "rational a".
(iii) D ∈ S and {x : f (x) ≤ a} = f –1[–∞, a] ∈ S for every real a.
The set [–∞, a] may be replaced by [–∞, a), [a, ∞] or (a, ∞] and "real a" may be replaced by "rational a".

These follow at once from Theorem 3.3.2, using Lemma 3.1.1 and its obvious variants. For example if (iii) holds, f –1(G) ∈ S for G ∈ G = {R*, [–∞, a]; a real} (f –1(R*) = D ∈ S) and S(G) = B* by Lemma 3.1.1, so that Theorem 3.3.2 gives measurability of f.

Note that conditions (ii) separate the finite and infinite values of f and that in conditions (iii), where these values are no longer separated, it is necessary to have D ∈ S (if D ∈ S is deleted from (iii) then it is no longer equivalent to (i)).

Finally note that the simplest example of a measurable function defined on (X, S) is the indicator function χE of a measurable set E ∈ S. In fact it is quite clear that if E ⊂ X, then χE is measurable if and only if E ∈ S. Thus if there is a nonmeasurable set, there is a nonmeasurable function. Similarly if E ∈ S and a is any real number, aχE is measurable, and by taking E = X, the constant functions on X are measurable. It will be shown in Section 3.5 that every measurable function can be simply obtained (as made specific in Theorem 3.5.2) from the class of indicator functions.


3.4 Combining measurable functions

Measurability of various combinations of measurable functions such as sums, products and limits will be shown in this section. Throughout the remainder of this chapter (X, S) will denote a fixed measurable space and all functions to be considered will be functions defined on subsets of X.

First note that it is sometimes desirable to define a function "piecewise" – equating it with each one of, say, n given measurable functions, on each one of n given (disjoint) measurable sets. The following simple lemma shows that a measurable function is thus obtained.

Lemma 3.4.1 (i) Let f1, . . . , fn be measurable functions defined on sets D1, . . . , Dn respectively. Let h be defined on H = H1 ∪ H2 ∪ . . . ∪ Hn, where Hi are disjoint measurable sets and Hi ⊂ Di, by h(x) = fi(x) for x ∈ Hi. Then h is measurable.

(ii) In particular if f is a measurable function defined on D and h is its restriction to a measurable subset H ⊂ D (i.e. h is defined on H and h(x) = f (x) for x ∈ H), then h is measurable.

Proof (i) For any B ∈ B*

h–1B = ∪n1 {(h–1B) ∩ Hi} = ∪n1 {(fi–1B) ∩ Hi}

which is clearly measurable, since each fi is measurable.

(ii) follows at once from (i). □

We now consider sums of measurable functions, recalling that if f, g are defined on subsets of X, then f + g is defined by (f + g)(x) = f (x) + g(x) at all points x for which this sum makes sense. That is, f + g is not defined at any point x for which f (x) = ∞, g(x) = –∞ or f (x) = –∞, g(x) = ∞, nor, of course, at any point at which one of f, g is undefined.

Theorem 3.4.2 Let f, g be measurable functions. Then f + g is a measurable function, as also is af, for any real number a. Hence finite linear combinations of measurable functions are measurable (i.e. if fi is measurable and ai real for i = 1, . . . , n, then ∑n1 aifi is measurable).

Proof Let f be defined on the subset D of X. Then af is also defined on D, and since f is measurable, D ∈ S and {x : af (x) ≤ c} ∈ S for all real c, since this set is {x : f (x) ≤ c/a} if a > 0, {x : f (x) ≥ c/a} if a < 0, D if a = 0 and c ≥ 0, and ∅ if a = 0 and c < 0. The measurability of af follows now from Theorem 3.3.3.


Define now h1(x) on the set D1 where f and g are both finite (D1 = (f –1R) ∩ (g–1R)) by h1(x) = f (x) + g(x). D1 is clearly measurable and h1 is a measurable function since for any real c

{x : h1(x) < c} = D1 ∩ {x : f (x) < c – g(x)}
 = D1 ∩ ∪r rational ({x : f (x) < r} ∩ {x : g(x) < c – r})

(since if f (x) < c – g(x) there is some rational between these two numbers) and the union involves a countable number of measurable sets.

Define h2(x) on the set D2 where f + g is +∞ by h2(x) = +∞, and h3(x) on D3 where f + g is –∞ by h3(x) = –∞. h2 and h3 are measurable (e.g. h2 is the restriction of the function identically equal to +∞, to the measurable set D2 = {f –1(R) ∩ g–1(∞)} ∪ {g–1(R) ∩ f –1(∞)} ∪ {f –1(∞) ∩ g–1(∞)}). f + g is defined precisely on D1 ∪ D2 ∪ D3 and (f + g)(x) = hi(x) for x ∈ Di so that f + g is measurable by Lemma 3.4.1. □

Corollary If f , g are measurable functions, the sets

{x : f (x) = g(x)}, {x : f (x) < g(x)}, {x : f (x) ≤ g(x)}

are all measurable.

Proof It is seen at once that e.g.

{x : f (x) = g(x)} = {x : (f – g)(x) = 0} ∪ {f –1(∞) ∩ g–1(∞)} ∪ {f –1(–∞) ∩ g–1(–∞)}.

The first set on the right is measurable by the theorem, so that the entire right hand side is measurable. The other two cases are similarly treated. □

The next results specialize Theorem 3.3.1 in two stages – first concerning the composition of a transformation and a function, and then for two functions.

Theorem 3.4.3 (i) Let (X, S), (Y, T) be measurable spaces, T an S|T -measurable transformation from a subset of X into Y, and g a T -measurable function from a subset of Y. Then the composition gT ((gT)(x) = g(Tx)) is an S-measurable function.

(ii) Let (X, S) be a measurable space, f an S-measurable function, and g a B*-measurable function defined on a subset of R*. Then the composition (written g ◦ f when f, g are both functions, since gf will denote their product) is S-measurable.


Note that a useful "converse" result to Theorem 3.4.3 (i) is given later (Theorem 3.5.3). Note also that Theorem 3.4.3 (ii) requires that g be measurable with respect to the extended Borel sets. It says that an "extended Borel" measurable function of a measurable function is measurable. It is not always true that, e.g., a "Lebesgue measurable" function (see Section 3.8) of a measurable function is measurable.

Corollary If f (x) is a measurable function, then, for any real a, |f (x)|a is measurable, and so is f n(x), n = 1, 2, . . . .

Proof This follows since it is easy to show directly that |t|a is a measurable function on R* (use Theorem 3.3.3), and so is tn. □

The next result shows that products and ratios of measurable functions are measurable. Of course if f, g are defined on subsets of X, then their product fg is defined by (fg)(x) = f (x)g(x) at all points x at which both f and g are defined. Their ratio f /g is defined by (f /g)(x) = f (x)/g(x) at those points x at which f, g are both defined but g is neither 0 nor ±∞. (f /g could be defined at other points, but note that under this definition f /g = f · (1/g).)

Theorem 3.4.4 If f, g are measurable functions then fg, f /g are measurable.

Proof First consider the product fg. Let D1 be the (measurable) set on which f and g are both finite. Then h1 = (1/4)[(f + g)2 – (f – g)2] is defined precisely on D1 and h1(x) = f (x)g(x) for x ∈ D1. By Theorem 3.4.2, f + g, f – g are measurable and hence so are (f + g)2, (f – g)2 by the corollary to Theorem 3.4.3, and also h1 by Theorem 3.4.2.

It is easily checked that the sets D2 = (fg)–1(∞), D3 = (fg)–1(–∞), D4 = [f –1(±∞) ∩ g–1(0)] ∪ [g–1(±∞) ∩ f –1(0)], are measurable, and hence the functions h2, h3, h4 defined on these respective sets as ∞, –∞, 0, are measurable. Further (fg)(x) = hi(x) for x ∈ Di (i = 1, 2, 3, 4) so that by Lemma 3.4.1, fg is measurable.

For the measurability of f /g only the case f ≡ 1 need be considered, by the result just proved (since f /g = f · (1/g) as noted above). If h = 1/g is defined on the set {x : g(x) ≠ 0, ±∞} and c is any real number then it is easily checked that

{x : h(x) ≤ c} = ({x : cg(x) ≥ 1} ∩ g–1(0, ∞)) ∪ ({x : cg(x) ≤ 1} ∩ g–1(–∞, 0))

demonstrating measurability of h (since cg is a measurable function). □


The next result concerns measurability of the maximum and minimum of two measurable functions and of the "positive and negative parts" of a measurable function. Specifically, consider max(f (x), g(x)), min(f (x), g(x)) defined on the measurable set on which f and g are both defined. Write also

f+(x) = max(f (x), 0)

f–(x) = – min(f (x), 0)

and then

f (x) = f+(x) – f–(x), |f (x)| = f+(x) + f–(x)

(note that for each x, at least one of f+(x), f–(x) is zero). f+ and f– are called the positive and negative parts of f, respectively.
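The identities f = f+ – f– and |f | = f+ + f– amount to a pointwise check on real values; a minimal sketch (the sample points are arbitrary):

```python
# f_plus and f_minus act pointwise on values of f; both are nonnegative.
f_plus = lambda t: max(t, 0.0)
f_minus = lambda t: -min(t, 0.0)

for t in [-3.5, 0.0, 2.25]:
    assert f_plus(t) - f_minus(t) == t            # f = f_+ - f_-
    assert f_plus(t) + f_minus(t) == abs(t)       # |f| = f_+ + f_-
    assert f_plus(t) == 0.0 or f_minus(t) == 0.0  # at least one vanishes
print(f_plus(-3.5), f_minus(-3.5))
```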

Theorem 3.4.5 Let f, g be measurable functions. Then max(f, g), min(f, g), f+, f–, |f | are all measurable functions.

Proof For any real c,

{x : max(f (x), g(x)) < c} = {x : f (x) < c} ∩ {x : g(x) < c}

which is measurable, showing measurability of max(f, g). Also min(f, g) = – max(–f, –g) is measurable. Since a constant function (and in particular the zero function) is measurable, it follows that f+ and f– are measurable and so is |f | = f+ + f–. □

We now consider sequences of measurable functions.

Theorem 3.4.6 Let {fn} be a sequence of measurable functions. Then the functions supn fn(x), infn fn(x), lim supn→∞ fn(x), lim infn→∞ fn(x) (each defined on the set D = ∩∞n=1{x : fn(x) is defined}), are all measurable.

Proof For any real c,

{x : infn fn(x) < c} = ∪∞n=1{x : fn(x) < c} ∩ D

which is measurable, and hence infn fn(x) is measurable, as thus also is supn fn(x) = – infn{–fn(x)}. Hence also lim supn→∞ fn(x) = infn≥1{supm≥n fm(x)} is measurable, and similarly so is lim infn→∞ fn(x). □

The next result shows in particular that if a sequence of measurable functions converges on a set D then the limit (defined on D) is a measurable function.


Theorem 3.4.7 Let {fn} be a sequence of measurable functions. Let D denote the set of all x for which the fn(x) are all defined and fn(x) converges (to a finite or infinite value). Then D is a measurable set and the function f defined on D by f (x) = limn→∞ fn(x) is measurable.

Proof Define g(x) = lim supn→∞ fn(x) and h(x) = lim infn→∞ fn(x) on the subset of X where each fn is defined. Since fn(x) converges (to a finite or infinite value) if and only if g(x) = h(x), D = {x : g(x) = h(x)}. Since g, h are measurable by Theorem 3.4.6, it follows from the corollary to Theorem 3.4.2 that D is measurable.

Further, for any real c,

{x : f (x) < c} = D ∩ {x : g(x) < c}

which is measurable since D ∈ S by the above, and g is measurable. Hence f is measurable. □

3.5 Simple functions

The so-called simple functions to be introduced in this section are easy to manipulate, and can be used to approximate measurable functions in a very useful way. Again throughout, (X, S) will be a fixed measurable space in which all functions will be defined.

A real-valued function f defined on (all of) X is called simple if it is measurable and assumes only a finite number of (finite, real) values. The simplest of all simple functions is clearly the indicator function of a measurable set. The basic properties of simple functions are collected in the following result.

Theorem 3.5.1 (i) Finite linear combinations and products of simple functions are simple functions.

(ii) f is a simple function if and only if for every x ∈ X,

f (x) = ∑ni=1 ai χEi(x)

where the sets E1, . . . , En are disjoint measurable sets such that ∪ni=1Ei = X, and a1, . . . , an are real numbers.

Proof (i) is obvious and so is the "if" part of (ii), in view of Theorem 3.4.2. For the "only if" part of (ii) let a1, . . . , an be the distinct real values of f and define Ei = {x : f (x) = ai}, i = 1, . . . , n. Since f is measurable and a1, . . . , an distinct, the sets E1, . . . , En are disjoint and measurable and ∪ni=1Ei = X since f is defined on (all of) X. □


The representation of a simple function given in (ii) will be used in the following without further explanation. This representation is obviously not unique, unless a1, . . . , an are required to be distinct, or, equivalently, Ei = {x : f (x) = ai}, i = 1, . . . , n.
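The canonical representation with distinct values can be built mechanically. In the sketch below (a made-up f on a ten-point space, for illustration) the sets Ei = {x : f (x) = ai} partition X and reproduce f:

```python
# A hypothetical simple function on a small space X.
X = list(range(10))
f = {x: (1.0 if x < 4 else (2.5 if x < 7 else 0.0)) for x in X}

values = sorted(set(f.values()))                      # the distinct a_i
E = {a: {x for x in X if f[x] == a} for a in values}  # E_i = {x : f(x) = a_i}

assert set().union(*E.values()) == set(X)             # the E_i cover X (and are disjoint)
for x in X:
    # f(x) = sum_i a_i * chi_{E_i}(x)
    assert f[x] == sum(a * (1 if x in E[a] else 0) for a in values)
print(values)
```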

A sequence {fn} of functions defined on X will be called an increasing sequence if for every x ∈ X, fn(x) ≤ fn+1(x), n = 1, 2, . . . . Such a sequence of functions has a (pointwise) limit f (x); i.e. fn(x) → f (x) for each x. (f (x) may, of course, be infinite – even if all fn(x) are finite.) The next (very useful) result shows that any nonnegative measurable function may be expressed as the limit of an increasing sequence of simple functions.

Theorem 3.5.2 Let f be a nonnegative measurable function defined on (all of) X. Then there exists an increasing sequence {fn} of nonnegative simple functions such that fn(x) → f (x) for each x ∈ X.

Proof Define

fn(x) = (i – 1)/2ⁿ if (i – 1)/2ⁿ ≤ f (x) < i/2ⁿ, i = 1, 2, . . . , n2ⁿ,
fn(x) = n if f (x) ≥ n.

Then

{x : fn(x) = (i – 1)/2ⁿ} = f –1[(i – 1)/2ⁿ, i/2ⁿ) ∈ S,
{x : fn(x) = n} = f –1[n, ∞] ∈ S.

Thus, for each n, fn(x) is a nonnegative simple function. It is easy to see that fn(x) is nondecreasing in n for each x (since, e.g. if fn(x) = (i – 1)/2ⁿ then

(2i – 2)/2ⁿ⁺¹ ≤ f (x) < (2i)/2ⁿ⁺¹

showing that fn+1(x) is either (2i – 2)/2ⁿ⁺¹ = fn(x) or (2i – 1)/2ⁿ⁺¹ > fn(x)). If f (x) < ∞, choose n0 > f (x). Then for n ≥ n0, 0 ≤ f (x) – fn(x) ≤ 2⁻ⁿ, showing that fn(x) → f (x) as n → ∞. If f (x) = ∞, fn(x) = n → ∞ and hence fn(x) → f (x) for all x and the proof is complete. □
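The dyadic construction in this proof can be transcribed directly (the particular f below is our own choice, for illustration): fn takes the value (i – 1)/2ⁿ where (i – 1)/2ⁿ ≤ f (x) < i/2ⁿ, and the value n where f (x) ≥ n.

```python
import math

def f_n(fx, n):
    # value of the n-th approximating simple function at a point where f = fx >= 0
    if fx >= n:
        return float(n)
    return math.floor(fx * 2 ** n) / 2 ** n   # = (i - 1)/2^n on the appropriate set

f = lambda x: x * x          # an arbitrary nonnegative function
fx = f(1.37)
approx = [f_n(fx, n) for n in range(1, 30)]

assert all(a <= b for a, b in zip(approx, approx[1:]))  # increasing in n
assert 0 <= fx - approx[-1] <= 2 ** -28                 # within 2^{-n} once n > f(x)
print(approx[:4])
```

Note how the first few terms are coarse because of the cap fn ≤ n, then refine dyadically once n exceeds f (x).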

The next result follows by writing f = f+ – f– and applying the theorem to f+ and f– separately.

Corollary Let f be a measurable function defined on (all of) X. Then there exists a sequence {fn} of simple functions such that fn(x) → f (x) for each x ∈ X. In fact {fn} may be taken so that {|fn|} is an increasing sequence.


This corollary (along with Theorem 3.4.7) shows that a function defined on X is measurable if and only if it is the (pointwise everywhere) limit of a sequence of simple functions. This is sometimes used as the definition of measurability (for functions defined on X).

Theorem 3.5.2 and its corollary are very useful in extending properties valid for simple functions to measurable functions. Typically a property is proved or a concept defined (e.g. the integral in the next chapter) for simple functions and then extended to measurable functions by using these results. This useful method of establishing results will be used repeatedly in the following chapters. The first application is a result in the converse direction to Theorem 3.4.3 (i).

Theorem 3.5.3 Let X be a space, (Y, T) a measurable space, T a transformation from X into Y, and T–1T the σ-field of subsets of X induced by T. Then a function f defined on X is T–1T -measurable if and only if there is a T -measurable function g defined on Y such that f = gT (i.e. f (x) = g(Tx) for all x ∈ X).

Proof The "if" part follows from Theorem 3.4.3 (i) since T is T–1T |T -measurable. For the "only if" part assume first that f is a simple function, f (x) = ∑ni=1 ai χEi(x) say. Then since Ei ∈ T–1T, Ei = T–1Gi for each i, where Gi ∈ T. Hence

f (x) = ∑ni=1 ai χT–1Gi(x) = ∑ni=1 ai χGi(Tx).

The result then follows (when f is simple) by writing g(y) for the measurable function ∑ni=1 ai χGi(y).

If f is not necessarily simple, but just T–1T -measurable, Theorem 3.5.2 (Corollary) may be used to express f (x) as a limit of a sequence of simple functions fn(x) (where fn(x) → f (x) for each x). By the above result for simple functions there is a T -measurable (simple) function gn(y) such that fn(x) = gn(Tx). Write g(y) = lim gn(y) when this limit exists and g(y) = 0 otherwise. Then g is clearly T -measurable (Ex. 3.3) and for x ∈ X, gn converges at Tx and hence

(gT)(x) = g(Tx) = limn→∞ gn(Tx) = limn→∞ fn(x) = f (x)

as required. □

Note that the function g in the theorem need not be unique (unless T maps “onto” Y – Ex. 3.8). A function f may be called “measurable with respect to T” if it is T⁻¹T-measurable. This theorem then says that f is measurable with respect to T if and only if it has the form gT for some T-measurable function g; i.e. if and only if f is a “T-measurable function of T”.
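The “only if” direction (a function measurable with respect to T is a function of T) can be made concrete on a finite space, where the induced σ-field can be enumerated directly. The following sketch is purely illustrative: the space X, the transformation T and the functions f and h below are invented, not taken from the text.

```python
from itertools import chain, combinations

# Hypothetical finite illustration of Theorem 3.5.3. On a finite space
# the induced sigma-field T^{-1}T can be listed explicitly as the set of
# inverse images of all subsets of Y.
X = [0, 1, 2, 3, 4, 5]
Y = [0, 1, 2]
T = lambda x: x % 3                      # transformation X -> Y

def powerset(s):
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

induced = [{x for x in X if T(x) in G} for G in powerset(Y)]   # T^{-1}T

def measurable(func):
    # func is T^{-1}T-measurable iff each level set {x : func(x) = a}
    # belongs to the induced sigma-field
    return all({x for x in X if func(x) == a} in induced
               for a in {func(x) for x in X})

f = lambda x: (x % 3) ** 2               # constant on each fiber of T
h = lambda x: x                          # separates points within a fiber

assert measurable(f) and not measurable(h)

# factorization f = gT: define g on Y through any preimage point
g = {y: f(next(x for x in X if T(x) == y)) for y in Y}
assert all(f(x) == g[T(x)] for x in X)
```

Here h separates points inside a fiber T⁻¹{y}, so no function g on Y can satisfy h = gT, while f is constant on each fiber and factors through T exactly as the theorem asserts.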

3.6 Measure spaces, “almost everywhere”

The existence of a measure on the measurable space (X,S) has not been relevant in this chapter up to this point. This section will be more specifically concerned with a measure space (X,S, μ) and introduces some useful terminology.

Suppose then that (X,S, μ) is a fixed measure space. Suppose that some property holds at all points of A ∈ S where μ(Aᶜ) = 0. Then this property is said to hold almost everywhere (abbreviated “a.e.” or “a.e. (μ)”). For example, if f is a function on X the statement f ≥ 0 a.e. means that there is a set A ∈ S, μ(Aᶜ) = 0, such that f(x) ≥ 0 for all x ∈ A. Note that the set where f(x) < 0 is to be a subset of the set Aᶜ. The precise set where the property does not hold is not necessarily measurable unless, of course, μ is a complete measure. Some authors require this set to be measurable, but we do not do so here.

Thus, as defined above, to say that a property holds a.e. means that it holds at all points of A, where A is a measurable set with μ(Aᶜ) = 0. Whether the property holds at any points of Aᶜ is not relevant. With slight inconsistency Aᶜ will nevertheless be referred to as “the exceptional set”.

As a further example, to say that a function f is defined a.e. on X means that f is defined for all x ∈ A where A ∈ S, μ(Aᶜ) = 0. To say that two functions f, g are equal a.e. on X means that f(x) = g(x) for all x ∈ A (∈ S) where μ(Aᶜ) = 0, and so on.

This terminology will be used a great deal in subsequent chapters. For the moment we make a few comments relative to the measurability discussions of the present chapter, and looking ahead to later usage.

First, one often has several properties which each hold a.e., and it is desired to say that they hold a.e. as a group. That is, one seeks one exceptional set, rather than several. This is clearly possible for a finite or countably infinite set of properties since countably many zero measure sets may be combined to get a zero measure set. For example, if {fn} is a sequence of functions and fn ≥ 0 a.e. for each n, there is a set An ∈ S, μ(Anᶜ) = 0, such that fn(x) ≥ 0 for x ∈ An. Writing A = ∩_{n=1}^∞ An, it follows that A ∈ S, μ(Aᶜ) = μ(∪_{n=1}^∞ Anᶜ) = 0, and fn(x) ≥ 0 for x ∈ A and all n. That is, a single zero measure “exceptional set” Aᶜ is obtained. This, of course, cannot be done in general if there are uncountably many conditions. (Why not?)


Next suppose that f, g are functions defined on subsets of X, and such that f = g a.e. This means that there is a set A ∈ S, μ(Aᶜ) = 0, such that f, g are both defined and equal on A. Each may be defined or not at any point of Aᶜ, of course, and if both are defined, their values may or may not coincide. Suppose f is known to be measurable (with respect to S). It is then not necessarily true that g is measurable (example?). If μ is complete, however, then g must be measurable as shown in the following theorem.

Theorem 3.6.1 Let (X,S, μ) be a measure space, and f, g functions defined on subsets of X. If f is measurable and μ is complete, and f = g a.e., then g is measurable.

Proof Let g be defined on G ⊂ X, and let A ∈ S, μ(Aᶜ) = 0, be such that f(x) = g(x) for all x ∈ A. Then A ⊂ G and G = A ∪ (G – A). Since G – A is a subset of the measurable set Aᶜ which has zero measure, and μ is complete, G – A ∈ S. Hence G ∈ S. Now for each real a

{x : g(x) ≤ a} = (A ∩ {x : g(x) ≤ a}) ∪ (Aᶜ ∩ {x : g(x) ≤ a}).

The second set is a subset of Aᶜ and is measurable since Aᶜ ∈ S, μ(Aᶜ) = 0, and μ is complete. The first set is just A ∩ {x : f(x) ≤ a} which is measurable since f is measurable and A ∈ S. It follows now from Theorem 3.3.3 that g is measurable. □

Pursuing this line a little further, suppose that (X,S, μ) is a measure space and μ̄ is the completion of μ, on the “completion σ-field” S̄ (cf. Theorem 2.6.1). Suppose that f is S̄-measurable. Then it can be shown that there is an S-measurable function g such that f = g a.e. (μ̄). A sketch of the proof of this is contained in Ex. 3.9.

Finally, note the important notion of convergence a.e. Specifically, “fn → f a.e.” means of course that fn(x) → f(x) for all x ∈ A, where A ∈ S, μ(Aᶜ) = 0. (This implies in particular that each fn and f are defined a.e.) This does not necessarily imply that f is measurable, even though the function lim_{n→∞} fn(x) is a measurable function (Theorem 3.4.7). Note that f(x) = lim_n fn(x) a.e. but measurability of the right hand side does not necessarily imply that of the left – unless μ is complete (Theorem 3.6.1).

3.7 Measures induced by transformations

The following result concerns the use of a measurable transformation to “induce” a measure on a measurable space from a measure space.


Theorem 3.7.1 Let (X,S, μ) be a measure space, (Y,T) a measurable space, and T a measurable transformation from a subset of X into Y. Then the set function μT⁻¹ defined on T by

(μT⁻¹)(G) = μ(T⁻¹G), G ∈ T

is a measure on T.

μT⁻¹ is called the measure induced on T from μ on S by the measurable transformation T.

Proof Since T is S|T-measurable, T⁻¹G ∈ S for each G ∈ T and thus μT⁻¹ is defined. Clearly μT⁻¹ is a nonnegative-valued set function and μT⁻¹(∅) = μ(T⁻¹∅) = μ(∅) = 0. Further, μT⁻¹ is countably additive since if {Gi} are disjoint sets of T (i = 1, 2, . . .) then clearly {T⁻¹Gi} are disjoint and

(μT⁻¹)(∪_1^∞ Gi) = μ(T⁻¹ ∪_1^∞ Gi)
                = μ(∪_1^∞ T⁻¹Gi)   (Lemma 3.2.1)
                = ∑_1^∞ μ(T⁻¹Gi)
                = ∑_1^∞ (μT⁻¹)(Gi)

as required. Hence μT⁻¹ is a measure on T. □

This theorem will have important implications in the transformation of integrals and in probability theory where a transformation is a “random element” and the induced measure is its distribution.
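On a finite space the induced measure can be computed exhaustively. The sketch below (the point masses, spaces and T are all invented for illustration) checks the defining formula (μT⁻¹)(G) = μ(T⁻¹G) together with its additivity; in the probabilistic language just mentioned, μT⁻¹ is the distribution of the random element T.

```python
# Hypothetical finite sketch of Theorem 3.7.1: mu is given by point
# masses on X, T maps X into Y = {0, 1}, and the induced measure is
# (mu T^{-1})(G) = mu(T^{-1} G).
mu_points = {"a": 0.5, "b": 0.25, "c": 0.25}        # measure on X
T = {"a": 0, "b": 1, "c": 1}                        # transformation X -> Y

def mu(E):                                          # mu on subsets of X
    return sum(mu_points[x] for x in E)

def mu_T_inv(G):                                    # induced measure on subsets of Y
    return mu({x for x in mu_points if T[x] in G})

# (finite) additivity on the disjoint sets {0} and {1}
assert mu_T_inv({0}) + mu_T_inv({1}) == mu_T_inv({0, 1})
# total mass is preserved since T is defined on all of X here
assert mu_T_inv({0, 1}) == mu(set(mu_points))
```

If μ is a probability measure, as here, the same computation reads: the distribution of T assigns to G the probability that T falls in G.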

3.8 Borel and Lebesgue measurable functions

So far measurable functions have been defined on arbitrary measurable spaces (X,S). When X is the real line R, and f is a measurable function with respect to the σ-field B of Borel sets on X, then f is called a Borel measurable function. On the other hand, if f is measurable with respect to the σ-field of Lebesgue measurable sets, it is called a Lebesgue measurable function. Exercise 3.9 shows in particular that a Lebesgue measurable function is equal a.e. to some Borel measurable function.

A useful subclass of simple functions on the real line are what we here call step functions. These are functions of the form f(x) = ∑_{i=1}^n aiχIi(x), where I1, . . . , In are disjoint intervals such that ∪_{i=1}^n Ii = R and a1, . . . , an are real numbers.

Most of the usual functions defined on the real line, or on Borel subsets of it, are Borel measurable. Specifically, continuous functions on the real line are Borel measurable (see Ex. 3.10) and so are monotone functions (Ex. 3.11); the same is of course true if such functions are defined on an interval of the real line. It turns out that every Borel measurable function defined on a closed and bounded interval is nearly continuous in the following measure-theoretic sense.

Theorem 3.8.1 Let f be an extended real-valued Borel measurable function defined on the bounded closed interval [a, b], –∞ < a < b < ∞, and assume that f takes the values ±∞ only on a set of Lebesgue measure zero. Then given any ε > 0 there is a step function g and a continuous function h (both of course depending on ε) such that

m{x ∈ [a, b] : |f(x) – g(x)| ≥ ε} < ε,   m{x ∈ [a, b] : f(x) ≠ h(x)} < ε,

where m is Lebesgue measure. If in addition c ≤ f(x) ≤ d for all x ∈ [a, b], g and h can be chosen so that c ≤ g(x) ≤ d and c ≤ h(x) ≤ d for all x ∈ [a, b].

The proof of this result is outlined in Ex. 3.12 and Ex. 3.13.
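For a continuous f the step-function half of the theorem can even be checked with an exceptional set of measure zero, using a dyadic step function in the spirit of the hint to Ex. 3.10. The choice f(x) = x² on [0, 1] and all numerical constants below are invented for this sketch.

```python
# Numerical sketch of the step-function part of Theorem 3.8.1 for the
# (invented) continuous choice f(x) = x^2 on [0, 1]: a step function
# constant on dyadic cells satisfies |f - g| < eps everywhere once the
# mesh 2^-n is fine enough, so m{x : |f(x) - g(x)| >= eps} = 0 < eps.
f = lambda x: x * x
eps = 0.01
n = 8                                   # mesh 2^-8; |f(x)-f(y)| <= 2|x-y| on [0,1]

def g(x):                               # step function, value f(k/2^n) on the
    k = min(int(x * 2**n), 2**n - 1)    # dyadic cell containing x
    return f(k / 2**n)

# check the uniform bound on a fine sampling grid of [0, 1]
samples = [i / 10**4 for i in range(10**4 + 1)]
assert max(abs(f(x) - g(x)) for x in samples) < eps
```

For a general Borel measurable f the exceptional set need not be empty, which is exactly why the theorem only asks that it have measure less than ε.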

Exercises

3.1 Fill in the details of the first paragraphs of Section 3.3 to show that measurability of a transformation T from (X,S) to (Y,T) is equivalent to σ(T) ⊂ S and hence that σ(T) is the smallest σ-field on X such that T is measurable into (Y,T).

3.2 If |f| is a measurable function on (X,S), is f measurable? (Give proof or counterexample.)

3.3 Let fn, n = 1, 2, . . . , be measurable functions. Set f(x) = lim_{n→∞} fn(x) where this limit exists and f(x) = 0 otherwise. Show that f is measurable.

3.4 Let (X,S) be a measurable space, E a subset of X, and SE = S ∩ E (see Ex. 1.22). Suppose f is a function defined on E. Then f may be viewed as a function in either of the measurable spaces (X,S), (E,SE). Show that if f is S-measurable then it is SE-measurable and find a necessary and sufficient condition for the converse to be true.

3.5 Let X be the real line (R), and S the σ-field consisting of X, ∅, (–∞, 0], (0,∞). What functions defined on X are S-measurable?

3.6 Let T be a transformation defined from a space X into a space Y. For any E ⊂ X write TE for the set of images {Tx : x ∈ E}. Thus T may be regarded as operating on sets. Do any of the results of Lemma 3.2.1 hold when T⁻¹ is replaced by T (and subsets of Y by subsets of X)? Compare with the proof of Theorem 1.9.2.


3.7 Suppose T is a 1-1 transformation from X onto Y. How is the “set inverse” T⁻¹ (T⁻¹G = {x : Tx ∈ G}) related to the “point inverse” T⁻¹ (T⁻¹y = x where Tx = y)?

3.8 Show that the function g of Theorem 3.5.3 is unique if T maps onto Y.

3.9 Let (X,S, μ) be a measure space and μ̄ the completion of μ, on the σ-field S̄. Let f be S̄-measurable. Show that there exists an S-measurable function g such that f = g a.e. (μ̄). (Hint: This clearly holds for the indicator of a set in S̄ and hence for an S̄-measurable simple function. A general S̄-measurable function is the limit of such simple functions.)

3.10 Show that every continuous function on the real line is Borel measurable. (Hint: Use the property of continuous functions f that if B is open so is f⁻¹(B), or else verify that f(x) = lim_{n→∞} fn(x) where for each n = 1, 2, . . . , fn is defined by fn(x) = f(k/2ⁿ) if k/2ⁿ < x ≤ (k+1)/2ⁿ, k = 0, ±1, ±2, . . . .)

3.11 If a real-valued function f defined on the real line is monotone nondecreasing or nonincreasing, show that f is Borel measurable.

3.12 Prove the part of Theorem 3.8.1 involving the step function g using the following steps.
(a) Show that there is M, 0 < M < ∞, such that |f(x)| ≤ M except on a Borel set of Lebesgue measure less than ε/2.
(b) Given any M, 0 < M < ∞, there is a simple function φ such that |f(x) – φ(x)| < ε for x ∈ [a, b] except where |f(x)| ≥ M. If c ≤ f(x) ≤ d on [a, b] then φ can be chosen so that c ≤ φ(x) ≤ d on [a, b]. (This step follows immediately from the construction in the proof of Theorem 3.5.2 and its corollary.)
(c) Given a simple function φ there is a step function g such that m{x ∈ [a, b] : φ(x) ≠ g(x)} < ε/2. If c ≤ φ(x) ≤ d on [a, b] then g can be chosen so that c ≤ g(x) ≤ d on [a, b]. (Use Theorem 2.6.2.)

3.13 Prove the part of Theorem 3.8.1 involving the continuous function h as follows: First assume the Borel measurable function f is bounded, |f| < M for some M > 0. Without loss of generality, further assume that M = 1 (e.g. normalize f to f/M). Using the construction of Theorem 3.5.2 and its corollary write f = lim_{n→∞} fn = ∑_{n=1}^∞ (fn – fn–1) = ∑_{n=1}^∞ pn where f0 ≡ 0 and each pn = fn – fn–1 is a simple function with values 0, ±2⁻ⁿ (|f| < 1). Then show (using Theorem 2.6.2) that for each n there is a continuous function hn on [a, b] such that

m{x ∈ [a, b] : pn(x) ≠ hn(x)} < ε/2ⁿ⁺¹.

Show that the series ∑_{n=1}^∞ hn(x) converges uniformly on [a, b] so that (by a well known result in analysis) it is a continuous function h(x) on [a, b], and that

m{x ∈ [a, b] : f(x) ≠ h(x)} < ε.

Finally use (a) of Ex. 3.12 to show that the result holds for a general Borel measurable function.

4

The integral

The purpose of this chapter is to define and develop properties of the integral ∫_X f dμ for a suitable class of functions f on a measure space (X,S, μ). This will be done in stages in the first three sections and further properties of the integral studied in the remainder of the chapter. To emphasize the previous convention, the statement “f is defined on X” means that f is defined at all points of X. Such functions will be considered first (including, of course, simple functions) before generalizing to functions which may be defined only a.e.

4.1 Integration of nonnegative simple functions

It is natural to define the integral of a nonnegative simple function f = ∑_{i=1}^n aiχEi(x) with respect to μ over X by

∫ f dμ (= ∫_X f dμ) = ∑_{i=1}^n aiμ(Ei).

The first result shows that this definition is unambiguous.

Lemma 4.1.1 Let f be a nonnegative simple function, f(x) = ∑_{i=1}^n aiχEi(x), where E1, . . . , En are disjoint sets in S with union X and ai ≥ 0 (cf. Theorem 3.5.1 (ii)). Then the extended nonnegative real number

∑_{i=1}^n aiμ(Ei)

does not depend on the particular representation of f.

Proof Let f(x) = ∑_{j=1}^m bjχFj(x) also represent f, where Fj are disjoint measurable sets whose union is X and bj ≥ 0. We must show that

∑_{j=1}^m bjμ(Fj) = ∑_{i=1}^n aiμ(Ei).

Since f(x) = ai for x ∈ Ei and f(x) = bj for x ∈ Fj, it follows that if Ei ∩ Fj is not empty then ai = bj. That is, for given i, j, either Ei ∩ Fj = ∅ or else ai = bj. Now

∑_i aiμ(Ei) = ∑_i aiμ(∪_j Ei ∩ Fj)   (∪_j Fj = X)
            = ∑_i ∑_j aiμ(Ei ∩ Fj)   (μ is finitely additive)
            = ∑_i ∑_j bjμ(Ei ∩ Fj)

since ai = bj whenever μ(Ei ∩ Fj) ≠ 0. Similarly ∑_j bjμ(Fj) is also given as this double sum and hence ∑_i aiμ(Ei) = ∑_j bjμ(Fj) as required. □

Note that the value of ∫ f dμ is either a finite nonnegative number or +∞, and that it is defined even if one or more of the μ(Ei) is +∞, since each ai ≥ 0. Note also that there is zero contribution to the sum from any term for which ai = 0 and μ(Ei) = ∞ (in view of the convention that ∞ · 0 = 0). Elementary properties of integrals of simple functions will be given now for later extension.
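As a quick numerical check of Lemma 4.1.1, the following sketch computes ∑ aiμ(Ei) for two different disjoint-set representations of the same simple function. The finite measure space and the weights are invented for illustration; exact arithmetic via fractions avoids rounding questions.

```python
from fractions import Fraction as F

# Invented finite measure space X = {0,...,5} with mu given by equal
# point masses 1/6.
mu_points = {x: F(1, 6) for x in range(6)}

def mu(E):
    return sum(mu_points[x] for x in E)

def integral(rep):
    # rep is a list of (a_i, E_i) pairs: disjoint sets E_i covering X
    return sum(a * mu(E) for a, E in rep)

# Two partitions representing the same simple function:
# f = 2 on {0,1,2}, 5 on {3,4}, 0 on {5}
rep1 = [(2, {0, 1, 2}), (5, {3, 4}), (0, {5})]
rep2 = [(2, {0}), (2, {1, 2}), (5, {3}), (5, {4}), (0, {5})]

assert integral(rep1) == integral(rep2) == F(8, 3)
```

Splitting a level set Ei into pieces, as rep2 does, changes the representation but not the sum — which is exactly the content of the lemma.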

Lemma 4.1.2 (i) Two (or finitely many) simple functions may be represented as f = ∑ aiχEi, g = ∑ biχEi with the same Ei.

(ii) If f and g are nonnegative simple functions and a, b are nonnegative real numbers, then

∫ (af + bg) dμ = a ∫ f dμ + b ∫ g dμ.

(iii) If f and g are nonnegative simple functions such that f(x) ≥ g(x) for all x, then

∫ f dμ ≥ ∫ g dμ.

Proof (i) If f = ∑ aiχFi, g = ∑ bjχGj then f = ∑_{i,j} aiχFi∩Gj, g = ∑_{i,j} bjχFi∩Gj.

(ii) By (i) write f = ∑_{i=1}^n aiχEi, g = ∑_{i=1}^n biχEi. Then ∫ (af + bg) dμ = ∑ (aai + bbi)μ(Ei) = a ∫ f dμ + b ∫ g dμ.

(iii) follows at once since ai ≥ bi for each i. □

4.2 Integration of nonnegative measurable functions

The definition of the integral will now be extended from nonnegative simple functions to nonnegative measurable functions defined on (all of) X (and later just a.e.) by using the fact that each nonnegative measurable function f is the limit of an increasing sequence {fn} of nonnegative simple functions. Specifically it will be shown (Theorem 4.2.2) that the integral of f may be unambiguously defined by ∫ f dμ = lim ∫ fn dμ. The following lemma will be used in proving the theorem, and also later in discussing convergence properties of the integral.


Lemma 4.2.1 If {fn} is an increasing sequence of nonnegative simple functions and lim_{n→∞} fn(x) ≥ g(x) for all x ∈ X, where g is a nonnegative simple function, then

lim_{n→∞} ∫ fn dμ ≥ ∫ g dμ.

Proof Write g(x) = ∑_{i=1}^m aiχEi where, as usual, Ei are disjoint measurable sets whose union is X, and ai ≥ 0, i = 1, . . . , m. Then ∫ g dμ = ∑_{i=1}^m aiμ(Ei).

(i) Suppose ∫ g dμ = +∞. Then for some p (1 ≤ p ≤ m), ap > 0 and μ(Ep) = ∞. Given ε such that 0 < ε < ap define

An = {x : fn(x) > g(x) – ε}.

{An} is a monotone nondecreasing sequence of sets with lim An = X so that lim_n An ∩ Ep = Ep and thus by Theorem 2.2.4, lim_{n→∞} μ(An ∩ Ep) = μ(Ep) = ∞. But by Lemma 4.1.2 (iii), since fn ≥ fnχAn∩Ep ≥ (ap – ε)χAn∩Ep,

∫ fn dμ ≥ (ap – ε)μ(An ∩ Ep) → ∞ as n → ∞,

showing that lim_{n→∞} ∫ fn dμ = ∞ as required.

(ii) Suppose that ∫ g dμ is finite. Write

A = {x : g(x) > 0} = ∪{Ei : ai > 0}.

Let a be the minimum nonzero ai. (Assume not all ai are zero, since otherwise the result is trivial.) Now ∫ g dμ < ∞ implies that μ(Ei) < ∞ for each i such that ai > 0, so that μ(A) = ∑_{ai>0} μ(Ei) < ∞. Define An again as above and let ε be such that 0 < ε < a. Then again by Lemma 4.1.2 (iii)

fn ≥ fnχAn∩A ≥ (g – ε)χAn∩A (≥ 0)

implies that ∫ fn dμ ≥ ∫ (g – ε)χAn∩A dμ. But by Lemma 4.1.2 (ii),

∫ gχAn∩A dμ = ∫ (g – ε)χAn∩A dμ + ε ∫ χAn∩A dμ

and hence

∫ fn dμ ≥ ∫ gχAn∩A dμ – εμ(An ∩ A)
        ≥ ∫ gχAn∩A dμ – εμ(A)   (An ∩ A ⊂ A)
        = ∑_{i=1}^m aiμ(An ∩ Ei) – εμ(A)

since gχAn∩A = ∑_{i=1}^m aiχEiχAn∩A = ∑_{i=1}^m aiχAn∩Ei (Ei ⊂ A if ai ≠ 0). Thus

lim_{n→∞} ∫ fn dμ ≥ ∑_{i=1}^m aiμ(Ei) – εμ(A) = ∫ g dμ – εμ(A),

since An ∩ Ei increases to Ei as n → ∞, and hence μ(An ∩ Ei) → μ(Ei). Since ε is arbitrary the result follows. □


Theorem 4.2.2 Let f be a nonnegative measurable function defined on X, and let {fn} be an increasing sequence of nonnegative simple functions such that fn(x) → f(x) for all x ∈ X. Then the extended nonnegative real number lim_{n→∞} ∫ fn dμ does not depend on the particular sequence {fn}.

Proof Let {gn} be another increasing sequence of nonnegative simple functions with lim_{n→∞} gn(x) = f(x) for all x ∈ X. Then since lim_{n→∞} fn(x) ≥ gm(x) for any fixed m, it follows from Lemma 4.2.1 that lim_{n→∞} ∫ fn dμ ≥ ∫ gm dμ for each m and hence that

lim_{n→∞} ∫ fn dμ ≥ lim_{m→∞} ∫ gm dμ.

The opposite inequality follows by interchanging the roles of the fn and gn, showing that lim_{n→∞} ∫ fn dμ = lim_{n→∞} ∫ gn dμ, so that the value of the limit does not depend on the particular sequence {fn}. □

Note that by Lemma 4.1.2 (iii), {∫ fn dμ} is a nondecreasing sequence of extended nonnegative real numbers which thus always has a limit (a finite nonnegative real number or ∞). We then define the integral of f with respect to μ over X by

∫ f dμ = lim_{n→∞} ∫ fn dμ.
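On a finite measure space every nonnegative function is itself simple, so the limit defining ∫ f dμ can be watched directly. The sketch below (the space, weights and f are invented for illustration) builds the dyadic truncations fn = min(⌊2ⁿf⌋/2ⁿ, n) of the kind used in Theorem 3.5.2 and checks that ∫ fn dμ increases to ∫ f dμ.

```python
import math
from fractions import Fraction as F

# Invented finite measure space: equal point masses 1/8 on X = {0,...,7}.
mu_points = {x: F(1, 8) for x in range(8)}

def integral(func):
    return sum(func(x) * w for x, w in mu_points.items())

f = lambda x: F(x * x, 4)                # nonnegative function, here simple

def f_n(n):
    # dyadic truncation: f_n = min(floor(2^n f)/2^n, n), increasing in n
    return lambda x: min(F(math.floor(f(x) * 2**n), 2**n), F(n))

vals = [integral(f_n(n)) for n in range(1, 30)]
assert all(a <= b for a, b in zip(vals, vals[1:]))   # ∫f_n dμ nondecreasing
assert vals[-1] == integral(f)                       # limit attained here
```

Because the values of this particular f are dyadic and bounded, the approximation becomes exact for large n; in general the convergence ∫ fn dμ → ∫ f dμ is only a limit.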

This definition clearly extends the definition of the integral given in Section 4.1 for nonnegative simple functions; that is, if f is a nonnegative simple function, then its integral defined as for a nonnegative measurable function is the same as its integral defined as for a nonnegative simple function. Here and subsequently ∫ f dμ will be shortened to ∫ f when just one measure is considered and there is no danger of confusion. However, the notation ∫ f dμ will be retained whenever it seems clearer to do so.

The integral of nonnegative measurable functions inherits the properties of the integral of nonnegative simple functions given in Lemma 4.1.2.

Lemma 4.2.3 Let f and g be nonnegative measurable functions on X.

(i) If a ≥ 0, b ≥ 0, then ∫ (af + bg) dμ = a ∫ f dμ + b ∫ g dμ.

(ii) If f(x) ≥ g(x) for all x ∈ X, then ∫ f dμ ≥ ∫ g dμ.

Proof If {fn}, {gn} are increasing sequences of nonnegative simple functions such that fn(x) → f(x), gn(x) → g(x) for each x ∈ X, then {afn + bgn} is an increasing sequence of nonnegative simple functions converging to af + bg at each x. Thus by definition,

∫ (af + bg) dμ = lim_{n→∞} ∫ (afn + bgn) dμ
              = lim_{n→∞} (a ∫ fn dμ + b ∫ gn dμ)   (Lemma 4.1.2 (ii))
              = a ∫ f dμ + b ∫ g dμ

whether the limits are finite or infinite (nonnegative terms). Hence (i) follows.

If further f(x) ≥ g(x) for each x, then lim_{n→∞} fn(x) = f(x) ≥ g(x) ≥ gm(x) for each m and thus by Lemma 4.2.1

∫ f dμ = lim_{n→∞} ∫ fn dμ ≥ ∫ gm dμ.

Since this is true for all m,

∫ f dμ ≥ lim_{m→∞} ∫ gm dμ = ∫ g dμ

and thus (ii) holds. □

If f is a nonnegative measurable function on X and E is a measurable set, the integral of f over E is defined by

∫_E f dμ = ∫ fχE dμ.

This set function (defined for E ∈ S) is referred to as the indefinite integral of f. Note that even if ∫ f dμ = ∞, ∫_E f dμ may be finite. The following result will be useful in the sequel.

Theorem 4.2.4 (i) If f is a nonnegative measurable function on X and E is a measurable set such that μ(E) = 0, then ∫_E f dμ = 0.

(ii) If f, g are nonnegative measurable functions on X with f = g a.e. then ∫ f dμ = ∫ g dμ.

Proof (i) Let {fn} be an increasing sequence of nonnegative simple functions such that fn(x) → f(x) for each x ∈ X. Then {fnχE} is an increasing sequence of nonnegative simple functions such that fn(x)χE(x) → f(x)χE(x) for all x ∈ X. Further, if fn = ∑_i aiχEi then fnχE = ∑_i aiχEi∩E. Hence ∫ fnχE dμ = ∑ aiμ(E ∩ Ei) = 0, and

∫_E f dμ = ∫ fχE dμ = lim_{n→∞} ∫ fnχE dμ = 0

as required.


(ii) If f(x) = g(x) for x ∈ E, μ(Eᶜ) = 0 then by Lemma 4.2.3 and (i),

∫ f = ∫ χE f + ∫ χEᶜ f = ∫ χE f = ∫ χE g = ∫ χE g + ∫ χEᶜ g = ∫ g,

completing the proof. □

Note that the integral ∫ f dμ has been defined for any nonnegative measurable function f defined on X. The value of ∫ f dμ is a nonnegative real number, or +∞. If ∫ f dμ is finite, f is said to be a nonnegative integrable function. Thus, the integral ∫ f dμ is defined for any nonnegative measurable function f defined on X but the adjective integrable is reserved for the case when its integral is finite. If a nonnegative measurable f is not integrable, there is an increasing sequence {fn} of nonnegative simple functions such that fn(x) → f(x) for all x ∈ X and ∫ fn dμ → +∞, so that ∫ f dμ = ∞.

If f is a nonnegative measurable function defined on X and E ∈ S, μ(Eᶜ) = 0 then as in the above proof ∫ f dμ = ∫ χE f dμ so that the values of f on the zero measure set Eᶜ do not affect the value of the integral. Since this is so, it should not matter whether f is even defined on the set Eᶜ with μ(Eᶜ) = 0, in order to define ∫ f dμ. It is thus natural to define the integral for such functions which may be defined (and also nonnegative) only a.e. The following lemma formalizes the natural definition of ∫ f dμ for such f.

Lemma 4.2.5 Let f be a measurable function defined and nonnegative a.e., i.e. (at least) on a set D ∈ S where μ(Dᶜ) = 0. Then the integral of f is unambiguously defined by ∫ f dμ = ∫ g dμ where g is any nonnegative measurable function on X with g = f a.e.

Proof There is certainly one such function (g(x) = f(x) for x ∈ D, g(x) = 0 for x ∈ Dᶜ) and if h is another such function, h = f = g a.e. so that ∫ h dμ = ∫ g dμ by Theorem 4.2.4 (ii). □

If f is a measurable function defined and nonnegative a.e., then so also is fχE for each E ∈ S and the indefinite integral is defined as ∫_E f dμ = ∫ χE f dμ.

Lemma 4.2.6 Let f, g be measurable, defined and nonnegative a.e. Then

(i) for a ≥ 0, b ≥ 0, af + bg is also measurable, defined and nonnegative a.e. and ∫ (af + bg) dμ = a ∫ f dμ + b ∫ g dμ.
(ii) If f ≥ g a.e. then ∫ f dμ ≥ ∫ g dμ.
(iii) If f = g a.e., then ∫ f dμ = ∫ g dμ.
(iv) If E ∈ S, μ(E) = 0 then ∫_E f dμ = 0.


Proof (i) Let f′, g′ be nonnegative measurable functions defined on X with f′ = f a.e., g′ = g a.e. Then af′ + bg′ = af + bg a.e. so that ∫ (af + bg) dμ = ∫ (af′ + bg′) dμ = a ∫ f′ dμ + b ∫ g′ dμ = a ∫ f dμ + b ∫ g dμ by Lemmas 4.2.3 and 4.2.5, showing (i).

(ii) If f ≥ g a.e. then the functions f′, g′ used in (i) satisfy f′ ≥ g′ a.e. and adjustment of values at exceptional points (e.g. f′(x) = g′(x) = 0) gives f′(x) ≥ g′(x) for all x. Then ∫ f dμ = ∫ f′ dμ ≥ ∫ g′ dμ = ∫ g dμ, by Lemma 4.2.3 (ii).

(iii) follows from (ii) by interchanging f, g.

The final part (iv) is immediate since fχE = f′χE a.e. (with f′ as above) and ∫_E f dμ = ∫_E f′ dμ = 0 by Theorem 4.2.4 (i). □

Again, a nonnegative measurable function f defined a.e. will be termed integrable if ∫ f dμ < ∞.

4.3 Integrability

The concept of integrability was defined in the previous section for nonnegative measurable functions defined a.e. The definition will now be extended to functions which can take either sign (or values ±∞) by the obvious means of splitting a function into its positive and negative parts. As noted before, ∫ f dμ will be shortened to ∫ f as convenient when there is no danger of confusion.

Specifically, a measurable function f defined a.e. on (X,S, μ) is termed integrable if its positive and negative parts f+, f– are integrable (as nonnegative functions), i.e. if ∫ f+ < ∞, ∫ f– < ∞. The integral of f is then naturally defined as

∫ f = ∫ f+ – ∫ f–.

The value of the integral of an integrable function is a finite real number. If f is not integrable but one of ∫ f+, ∫ f– is finite, the integral of f may still be defined by this equality, taking the appropriate one of the values ±∞. On the other hand, the integral is not defined if ∫ f+ = ∫ f– = ∞.

This extension of integrability to functions which are not necessarily positive is clearly consistent with its use for a.e. nonnegative f, where f– = 0 (a.e.) so that ∫ f– = 0. By the same token the definition of ∫ f for nonintegrable f also reduces to that given previously when f is nonnegative (again since ∫ f– = 0).

The indefinite integral ∫_E f dμ is again defined as ∫ χE f dμ where this latter integral is defined, i.e. when one or both of (χE f)+ (= χE f+), (χE f)– (= χE f–) are integrable. This may occur with ∫_E f dμ defined (finite or infinite) even though ∫ f dμ is not defined, and of course, ∫_E f dμ may be defined and finite-valued when ∫ f dμ = ±∞.

In summary the integral has been defined for

(a) all nonnegative measurable functions defined a.e. and then 0 ≤ ∫ f ≤ ∞. If ∫ f < ∞, f is termed integrable, and otherwise we say that ∫ f is defined (having the value +∞);

(b) a measurable function f defined a.e. for which at least one of ∫ f+, ∫ f– is finite. The integral is then defined as ∫ f = ∫ f+ – ∫ f–, which can be finite or one of the values ±∞. If both ∫ f+, ∫ f– are finite, f is termed integrable and otherwise we just say that ∫ f is defined, with the value +∞ if ∫ f+ = ∞, ∫ f– < ∞ and –∞ if ∫ f+ < ∞, ∫ f– = ∞.

Finally we note that for added clarity we will sometimes write ∫ f(x) dμ(x) for ∫ f dμ especially if integrals over different spaces are being considered (cf. Theorem 4.6.1). Another popular notation is to write ∫ f(x)μ(dx) which can be helpful in some special contexts.

4.4 Properties of the integral

This section concerns the basic properties of the integral. Some of these properties have been obtained already in special cases as part of the defining process used. First we show the intuitively obvious facts that an integrable function must be finite a.e., and that integrals over zero measure sets are zero.

Theorem 4.4.1 (i) If f is integrable, it is finite a.e.

(ii) If f is measurable, defined a.e., and E ∈ S, μ(E) = 0, then fχE is integrable and ∫_E f dμ = 0.

Proof (i) If E = f⁻¹(∞) = (f+)⁻¹(∞) then f+ ≥ nχE a.e. (i.e. at all points where f is defined) so that ∫ f+ ≥ nμ(E) by Lemma 4.2.6 (ii). Thus μ(E) ≤ n⁻¹ ∫ f+ (< ∞) and n → ∞ gives μ(E) = 0. That is μ(f⁻¹(∞)) = 0 and similarly μ(f⁻¹(–∞)) = 0 so that f is finite a.e.

(ii) By Lemma 4.2.6 (iv), ∫ f+χE = 0 = ∫ f–χE so that fχE is integrable and ∫ fχE = 0 as required. □

Theorem 4.4.2 Let f, g be measurable, defined a.e., and f = g a.e. on (X,S, μ). Then the following hold:

(i) If f is integrable, so is g and ∫ g = ∫ f.


(ii) If f is not integrable but ∫ f is defined, then g is not integrable but ∫ g is defined and ∫ g = ∫ f (i.e. ±∞).
(iii) If ∫ f is not defined then ∫ g is not defined.

Further,

(iv) If f is an integrable function there exists a finite-valued integrable function h defined on X with h = f a.e. (and hence ∫ f = ∫ h).

Proof If f = g a.e. then f+ = g+, f– = g– a.e., and ∫ g+ = ∫ f+ ≤ ∞, ∫ g– = ∫ f– ≤ ∞ by Lemma 4.2.6 (iii). If f is integrable these four integrals are finite so that g is integrable and ∫ g = ∫ g+ – ∫ g– = ∫ f+ – ∫ f– = ∫ f and hence (i) holds. On the other hand if f is not integrable but ∫ f is defined then ∫ f = ±∞. If ∫ f = ∞, then ∫ f+ = ∞, ∫ f– < ∞ and ∫ g+ = ∫ f+ = ∞, ∫ g– = ∫ f– < ∞ so that g is not integrable but ∫ g is defined and ∫ g = ∞ = ∫ f. Similarly, ∫ g = –∞ if ∫ f = –∞, giving (ii). (iii) is immediate from (ii) since if ∫ g were defined, ∫ f would be also.

(iv) Since f is finite a.e., by Theorem 4.4.1 (i) it is defined and has finite values on D ∈ S, with μ(Dᶜ) = 0. The function h defined to be equal to f on D and zero on Dᶜ is finite, equal to f a.e. (thus integrable by (i)) and satisfies the conditions of (iv). □

The next result establishes the linearity of the integral.

Theorem 4.4.3 (i) If f, g are integrable so is f + g and ∫ (f + g) = ∫ f + ∫ g.

(ii) If f is integrable and a is a real number, then af is integrable and ∫ af = a ∫ f.

Hence if f1, f2, . . . , fn are integrable, a1, a2, . . . , an real, then ∑_{i=1}^n aifi is integrable and

∫ (∑_{i=1}^n aifi) dμ = ∑_{i=1}^n ai ∫ fi dμ.

Proof (i) f and g are both finite a.e. by Theorem 4.4.1 so that (f + g) is certainly defined and finite a.e. Further, (f + g)+ ≤ f+ + g+ a.e. and hence ∫ (f + g)+ ≤ ∫ f+ + ∫ g+ < ∞, by Lemma 4.2.6 (ii) and (i). Similarly ∫ (f + g)– < ∞ so that (f + g) is integrable. Now clearly (f + g)+ – (f + g)– = f + g = f+ – f– + g+ – g– a.e. so that using a.e. finiteness, (f + g)+ + f– + g– = (f + g)– + f+ + g+ a.e. and by Lemma 4.2.6 (i) (for the nonnegative functions involved)

∫ (f + g)+ + ∫ f– + ∫ g– = ∫ (f + g)– + ∫ f+ + ∫ g+.

Since all terms are finite we have

∫ (f + g) = ∫ (f + g)+ – ∫ (f + g)– = ∫ f+ + ∫ g+ – ∫ f– – ∫ g– = ∫ f + ∫ g

as required.


(ii) If f is integrable and a > 0, (af)+ = af+, (af)– = af– and by Lemma 4.2.6 (i) (b = 0), ∫ af+ = a ∫ f+, ∫ af– = a ∫ f– so that ∫ af = ∫ (af)+ – ∫ (af)– = a(∫ f+ – ∫ f–) = a ∫ f (the terms being finite) as required. The changes needed for a < 0 are obvious. □

The next result shows the monotonicity property of the integral in generality and provides the basis of important integrability criteria to follow.

Theorem 4.4.4 Let f, g be measurable functions defined a.e., with f ≥ g a.e. and such that ∫ f dμ, ∫ g dμ are defined. Then ∫ f dμ ≥ ∫ g dμ.

Proof Clearly f+ ≥ g+ a.e., f– ≤ g– a.e. so that ∫ f+ ≥ ∫ g+, ∫ f– ≤ ∫ g– by Lemma 4.2.6 (ii). Since ∫ f, ∫ g are defined, at least one of ∫ f+, ∫ f– is finite as is at least one of ∫ g+, ∫ g–, which together with the above inequalities clearly imply that ∫ f = ∫ f+ – ∫ f– ≥ ∫ g+ – ∫ g– = ∫ g. □

It will be natural at this point to introduce the standard terminology of writing L1 or L1(X,S, μ) for the class of integrable functions. In later chapters L1 will be developed as a linear space but here the statement “f ∈ L1” will simply be a compact and natural alternative to writing “f is integrable”.

The next result gives the important property that a measurable function f is integrable if and only if |f| is. Note that the assumption that f be measurable is necessary in this statement since |f| can be measurable when f itself is not (cf. Ex. 3.2).

Theorem 4.4.5 Let f be a measurable function defined a.e. Then the following conditions are equivalent.

(i) f ∈ L1,
(ii) f+ ∈ L1, f– ∈ L1,
(iii) |f| ∈ L1.

Further, if f ∈ L1, |∫ f dμ| ≤ ∫ |f| dμ.

Proof The equivalence of (i) and (ii) is simply the definition of integrability of f as integrability of both f+ and f–. If (ii) holds then so does (iii) by Theorem 4.4.3 (i) since |f| = f+ + f–. The proof of equivalence will be completed by showing that (iii) implies (ii). In fact, if (iii) holds then since 0 ≤ f+ ≤ |f| it follows from Theorem 4.4.4 that ∫ f+ ≤ ∫ |f| < ∞ so that f+ ∈ L1. Similarly, f– ∈ L1 and (ii) holds.

Finally if f ∈ L1, then |f| ∈ L1 as shown. Since f ≤ |f|, it follows that ∫ f dμ ≤ ∫ |f| dμ by Theorem 4.4.4. But also –f ≤ |f| and hence – ∫ f dμ = ∫ (–f) dμ ≤ ∫ |f| dμ. Thus |∫ f dμ| ≤ ∫ |f| dμ and the proof of the theorem is complete. □

The following result gives a useful test for integrability akin to (and indeed generalizing) the “Comparison Theorem” for testing convergence of series.

Theorem 4.4.6 Let f ∈ L1 and let g be a measurable function defined a.e. and such that |g| ≤ |f| a.e. Then g ∈ L1.

Proof By Theorem 4.4.5, |f| ∈ L1 and hence ∫|g| ≤ ∫|f| < ∞ by Lemma 4.2.6 (ii). Hence |g| ∈ L1 and g ∈ L1, again by Theorem 4.4.5. □

If f is measurable and f = 0 a.e. then it is clear that f ∈ L1 and ∫f dμ = 0. The converse is, of course, not true. However, it is intuitively clear that if f is nonnegative and has zero integral, then f = 0 a.e. Specifically, the following result holds.

Theorem 4.4.7 If f is a measurable function, defined and nonnegative a.e., and such that ∫f dμ = 0, then f = 0 a.e.

Proof Define the following sets (measurable since f is measurable):
E = {x : f(x) > 0}, En = {x : f(x) ≥ 1/n}, n = 1, 2, . . . .
Now {En} is an increasing sequence whose limit is E, so that μ(E) = lim_{n→∞} μ(En). Since f ≥ fχEn ≥ (1/n)χEn a.e., it then follows from Theorem 4.4.4 that
(1/n)μ(En) ≤ ∫f dμ = 0.
Hence μ(En) = 0 for all n so that μ(E) = 0 and f = 0 a.e. □

A useful variant of this result is the following (see also Exs. 4.13, 4.14).

Theorem 4.4.8 If f ∈ L1 and ∫_E f dμ = 0 for all E ∈ S, then f = 0 a.e.

Proof Let E = {x : f(x) > 0}. Then E ∈ S and by assumption ∫fχE dμ = ∫_E f dμ = 0. Since fχE ≥ 0 a.e. it follows by Theorem 4.4.7 that fχE = 0 a.e. But fχE > 0 on E, so that μ(E) = 0. Similarly μ{x : f(x) < 0} = 0 and thus f = 0 a.e. □

Corollary If f, g are L1-functions and ∫_E f dμ = ∫_E g dμ for all E ∈ S, then f = g a.e.


Proof By Theorem 4.4.3, f – g ∈ L1 and ∫_E(f – g) = ∫_E f – ∫_E g = 0 for any E ∈ S. Thus f – g = 0 a.e. and this is easily seen to imply that f = g a.e. (f and g are each finite a.e.). □

The set of points at which an integrable function f is infinite has measure zero (Theorem 4.4.1). Further, if f is simple and integrable, the set (Nf say) of points where f ≠ 0 has finite measure. This latter property is no longer necessarily true for general integrable f. However, it is true that the set of points where |f| exceeds any fixed ε > 0 has finite measure and that Nf has σ-finite measure in the sense that Nf ⊂ ∪_{i=1}^∞ Ei for some Ei ∈ S with μ(Ei) < ∞. This is shown by the following result.

Theorem 4.4.9 If f ∈ L1 then μ{x : |f(x)| ≥ ε} < ∞ for every ε > 0, and the set Nf = {x : f(x) ≠ 0} has σ-finite measure.

Proof Write E = {x : |f(x)| ≥ ε}. Since |f| ∈ L1 and |f| ≥ |f|χE ≥ εχE a.e. (in fact this holds at all points where f is defined), we have εμ(E) ≤ ∫|f| dμ < ∞ by Theorem 4.4.4, so that μ(E) < ∞, as required.

Also Nf = {x : f(x) ≠ 0} = ∪_{n=1}^∞ {x : |f(x)| ≥ 1/n}. Since μ{x : |f(x)| ≥ 1/n} < ∞ by the above, Nf has σ-finite measure. □

4.5 Convergence of integrals

This section considers questions relating to the convergence of sequences of integrals ∫fn dμ (on a basic measure space (X,S, μ), as before). In particular, conditions are obtained under which ∫fn dμ → ∫f dμ when fn(x) → f(x) for all x (or a.e.). Put another way, we seek conditions under which lim_{n→∞} ∫fn dμ = ∫(lim_{n→∞} fn) dμ, i.e. conditions under which the order of “limit” and “integral” may be reversed. (Writing lim_{n→∞} an = a means throughout that the limit of the sequence of real numbers {an} exists and is equal to a.) Some celebrated results in this connection will now be obtained, the first of these being the very important Monotone Convergence Theorem, stated first in a more limited context and then generally.

Lemma 4.5.1 Let {fn} be an increasing sequence of nonnegative measurable functions defined on X, and f a nonnegative measurable function on X such that fn(x) → f(x) for each x (f can take infinite values). Then
∫fn dμ → ∫f dμ as n → ∞.

Note that this means that if ∫f dμ is finite, then ∫fn dμ is finite for each n and ∫fn dμ converges to the finite limit ∫f dμ. However, if ∫f dμ = ∞, then either each ∫fn dμ is finite and ∫fn dμ → ∞, or ∫fn dμ = ∞ for all n ≥ some N0.

Proof For each n, there is an increasing sequence {fn,k}_{k=1}^∞ of nonnegative simple functions such that lim_{k→∞} fn,k(x) = fn(x) for all x ∈ X. Since the maximum of a finite number of simple functions is simple, it follows that gk(x) = max_{n≤k} fn,k(x) is a simple function. Further, {gk} is an increasing sequence of functions since gk(x) ≤ max_{n≤k} fn,k+1(x) ≤ gk+1(x). Since {fk} is an increasing sequence and fk(x) → f(x), it follows that for all x and all n ≤ k,
fn,k(x) ≤ gk(x) = max_{m≤k} fm,k(x) ≤ max_{m≤k} fm(x) = fk(x) ≤ f(x).

Letting k → ∞, we have fn(x) ≤ lim_{k→∞} gk(x) ≤ f(x) for all x and n. Hence f ≤ lim_{k→∞} gk ≤ f (letting n → ∞), and thus {gk} is also an increasing sequence of simple functions converging to f.

Further, since fn,k ≤ gk ≤ fk for all k ≥ n, it follows from Theorem 4.4.4 that
∫fn,k ≤ ∫gk ≤ ∫fk.
Letting k → ∞ (and using the definition of ∫fn = lim ∫fn,k and ∫f = lim ∫gk) we see that for all n
∫fn ≤ ∫f ≤ lim_{k→∞} ∫fk.
Now letting n → ∞ gives the desired result
lim_{n→∞} ∫fn = ∫f. □

The conditions assumed to hold “everywhere” in this lemma may be relaxed to conditions holding only a.e., as follows, to give the general result.

Theorem 4.5.2 (Monotone Convergence Theorem) Let {fn} be a sequence of a.e. nonnegative measurable functions each defined a.e. and such that fn(x) ≤ fn+1(x) a.e. for each n. Let f be a measurable function defined a.e. and nonnegative a.e. on X, and such that fn(x) → f(x) a.e. Then
∫fn dμ → ∫f dμ.

Proof By combining zero measure sets in the usual way, a set E ∈ S with μ(Ec) = 0 may be found such that for x ∈ E, fn(x) ≥ 0, fn(x) ≤ fn+1(x), n = 1, 2, . . . , and fn(x) → f(x) ≥ 0.

Define measurable functions f′n, f′ (cf. Lemma 3.4.1) by f′n(x) = fn(x), f′(x) = f(x) when x ∈ E and f′n(x) = f′(x) = 0 for x ∈ Ec. The functions f′n, f′ satisfy the conditions of Lemma 4.5.1, and hence ∫f′n → ∫f′. But f′n = fn a.e., f′ = f a.e., giving ∫fn = ∫f′n, ∫f = ∫f′ (Theorem 4.4.2 (i) and (ii)), whence the desired result follows. □
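Monotone convergence can be sketched numerically; the midpoint Riemann-sum helper `integral` below is an assumed stand-in for Lebesgue integration on (0, 1), not anything from the text. Taking f(x) = x^(–1/2), which is integrable on (0, 1) with ∫f = 2 despite being unbounded, the truncations fn = min(f, n) increase to f and their integrals increase to 2:

```python
def integral(f, a, b, steps=100000):
    # midpoint Riemann sum approximation of the integral over (a, b)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: x ** -0.5  # nonnegative, unbounded near 0, but ∫_0^1 f = 2

# fn = min(f, n): an increasing sequence of bounded functions with fn → f
vals = [integral(lambda x, n=n: min(f(x), n), 0.0, 1.0) for n in (1, 10, 100, 1000)]
print(vals)  # increasing, approaching 2
```

The integrals can be computed exactly by calculus (e.g. ∫min(f, 10) = 1.9), which matches the numeric values above.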

An important corollary of monotone convergence concerns the inversion of order of summation and integration for nonnegative integrands.

Corollary Let {fn} be a sequence of (a.e.) nonnegative measurable functions defined (a.e.) on X. Then Σ_{n=1}^∞ fn is an a.e. nonnegative measurable function (defined a.e. on X) and
∫(Σ_{n=1}^∞ fn) dμ = Σ_{n=1}^∞ ∫fn dμ (≤ ∞).

Proof It is easily checked that the functions Σ_{i=1}^n fi are a.e. nonnegative, nondecreasing, and converge to f = Σ_{n=1}^∞ fn, 0 ≤ f (≤ ∞) a.e. It thus follows from Theorem 4.5.2 and Lemma 4.2.6 (i) that
∫f = lim_{n→∞} ∫Σ_{i=1}^n fi = lim_{n→∞} Σ_{i=1}^n ∫fi = Σ_{i=1}^∞ ∫fi. □
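The corollary can be illustrated numerically with fn(x) = x^n/n! on (0, 1); the Riemann-sum helper and the truncation at 19 terms are assumptions of this sketch. The series sums to e^x – 1, so both sides should come out to e – 2:

```python
import math

def integral(f, a, b, steps=20000):
    # midpoint Riemann sum approximation of the integral over (a, b)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Σ_n ∫ fn dμ: each ∫_0^1 x^n/n! dx = 1/(n+1)!, summed over a truncated range
lhs = sum(integral(lambda x, c=math.factorial(n), n=n: x ** n / c, 0.0, 1.0)
          for n in range(1, 20))
# ∫ (Σ_n fn) dμ: the series Σ_{n≥1} x^n/n! sums to e^x - 1
rhs = integral(lambda x: math.exp(x) - 1.0, 0.0, 1.0)
print(lhs, rhs)  # both ≈ e - 2 ≈ 0.71828
```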

A corresponding result holds for series whose terms can take positive and negative values, under appropriate convergence conditions. This is given as Ex. 4.19 (see also Ex. 7.19).

The indefinite integral ∫_E f dμ is zero when μ(E) = 0 (for any measurable f – Theorem 4.4.1). This property, to be studied in the next chapter, asserts that the indefinite integral is absolutely continuous with respect to μ. The following result gives an equivalent criterion for absolute continuity of the indefinite integral, which will later be extended (Theorem 5.5.3, Corollary) to more general set functions. Its proof makes an interesting application of monotone convergence.

Theorem 4.5.3 If f ∈ L1 then, given any ε > 0, δ > 0 can be found such that |∫_E f dμ| < ε whenever E ∈ S and μ(E) < δ. In particular ∫_{En} f dμ → 0 if μ(En) → 0 as n → ∞.

Proof Write fn = |f| if |f| ≤ n, and fn = n otherwise. Then {fn} is an (a.e.) increasing sequence of nonnegative measurable functions (cf. Lemma 3.4.1) with lim_{n→∞} fn = |f| a.e. By monotone convergence, lim_{n→∞} ∫fn dμ = ∫|f| dμ and hence, given ε > 0, there exists N such that ∫fN ≥ ∫|f| – ε/2. Choose δ = ε/(2N). Then if E ∈ S, μ(E) < δ,
|∫_E f| ≤ ∫_E |f| = ∫_E fN + ∫_E (|f| – fN).
The first term on the right does not exceed Nμ(E) < ε/2, and the second term is dominated by ∫(|f| – fN) ≤ ε/2. Hence the result follows. □


The next theorem is another famous and very useful result (perhaps contrary to appearances), known as Fatou’s Lemma.

Theorem 4.5.4 (Fatou’s Lemma) Let {fn} be a sequence of a.e. nonnegative measurable functions each defined a.e. on X. Then
lim inf_{n→∞} ∫fn dμ ≥ ∫(lim inf_{n→∞} fn) dμ.

Proof Define gn(x) = inf_{k≥n} fk(x). Then {gn} is an a.e. increasing sequence of a.e. nonnegative measurable functions, defined a.e., and lim_{n→∞} gn(x) = lim inf_{n→∞} fn(x) a.e. Also gn ≤ fk a.e. for all k ≥ n, so that by Theorem 4.4.4 ∫gn dμ ≤ ∫fk dμ, and thus ∫gn dμ ≤ inf_{k≥n} ∫fk dμ. Hence
∫(lim inf_{n→∞} fn) dμ = ∫lim_{n→∞} gn dμ
= lim_{n→∞} ∫gn dμ (monotone convergence)
≤ lim_{n→∞} inf_{k≥n} ∫fk dμ = lim inf_{n→∞} ∫fn dμ. □

The following example shows that equality does not always hold in Fatou’s Lemma. Let m be Lebesgue measure on the real line and fn = χ(n,n+1). Then lim_{n→∞} fn(x) = 0 for all x and ∫fn dm = m{(n, n + 1)} = 1 for all n, so that
∫(lim inf_{n→∞} fn) dm = 0 < 1 = lim inf_{n→∞} ∫fn dm,
where in both cases lim inf = lim.

The final result of this section is again a celebrated and extremely useful one, known as Lebesgue’s Dominated Convergence Theorem.
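The escaping-mass example can be checked by direct computation; the finite window (0, 50) and the grid helper below are assumptions of this sketch, not part of the text:

```python
# fn = indicator of (n, n+1): each integral is m((n, n+1)) = 1,
# yet fn(x) → 0 for every fixed x.
def fn(n, x):
    return 1.0 if n < x < n + 1 else 0.0

def integral(f, a, b, steps=20000):
    # midpoint Riemann sum; effectively exact here since f is piecewise constant
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

ints = [integral(lambda x, n=n: fn(n, x), 0.0, 50.0) for n in range(5)]
print(ints)               # each ≈ 1, so lim inf ∫ fn dm = 1
print(fn(10 ** 6, 3.7))   # 0.0: for fixed x, fn(x) = 0 once n > x, so ∫ lim inf fn dm = 0
```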

Theorem 4.5.5 (Dominated Convergence Theorem) Let {fn} be a sequence of L1-functions on a measure space (X,S, μ) and g ∈ L1, such that |fn| ≤ |g| a.e. for each n = 1, 2, . . . . Let f be measurable and such that fn(x) → f(x) a.e. Then f ∈ L1 and
∫|fn – f| dμ → 0 as n → ∞.

Since |∫fn dμ – ∫f dμ| = |∫(fn – f) dμ| ≤ ∫|fn – f| dμ, it also follows that ∫fn dμ → ∫f dμ.

Proof Since fn → f a.e. and |fn| ≤ |g| a.e. we see simply that |f| ≤ |g| a.e. Hence f ∈ L1 by Theorem 4.4.6. Since |fn – f| ≤ 2|g| a.e. it follows that for each n, (2|g| – |fn – f|) is defined and nonnegative a.e. Thus, by Fatou’s Lemma,
∫2|g| = ∫lim inf_{n→∞} (2|g| – |fn – f|) ≤ lim inf_{n→∞} ∫(2|g| – |fn – f|)
since |fn – f| → 0 a.e. Hence
∫2|g| ≤ ∫2|g| + lim inf_{n→∞} {–∫|fn – f|}.
Since g ∈ L1, i.e. ∫|g| is finite, we have lim inf_{n→∞} {–∫|fn – f|} ≥ 0, so that lim sup_{n→∞} {∫|fn – f|} ≤ 0 and hence lim_{n→∞} ∫|fn – f| = 0 as required. □
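As a numerical sketch of the theorem (the helper and the particular choices are assumptions, not from the text): on (0, 1) take fn(x) = x^n, dominated by g ≡ 1 ∈ L1(0, 1), with fn(x) → f(x) = 0 for x < 1. Then ∫|fn – f| dm = 1/(n + 1) → 0:

```python
def integral(f, a, b, steps=100000):
    # midpoint Riemann sum approximation of the integral over (a, b)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# ∫ |fn - f| dm on (0, 1) with f ≡ 0: the exact value is 1/(n+1)
errs = [integral(lambda x, n=n: x ** n, 0.0, 1.0) for n in (1, 5, 25, 125)]
print(errs)  # ≈ [1/2, 1/6, 1/26, 1/126], decreasing to 0
```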

The same real line example fn(x) = χ(n,n+1)(x) as for Fatou’s Lemma shows that the conclusion of the dominated convergence theorem is not necessarily true in the absence of the L1-bound g. Then f(x) = lim_{n→∞} fn(x) = 0 for all x and, writing m for Lebesgue measure,
lim_{n→∞} ∫|fn – f| dm = 1 ≠ 0, lim_{n→∞} ∫fn dm = 1 ≠ 0 = ∫f dm.
In this case, of course, any g such that |fn| ≤ |g| a.e. for each n satisfies χ(1,∞) ≤ |g| a.e. and hence is not in L1.

4.6 Transformation of integrals

This is a natural point to demonstrate a general transformation theorem for integrals. Let (X,S, μ) be a measure space, (Y,T) a measurable space, T a measurable transformation from a subset of X into Y, and μT–1 the measure induced on T by μ and T as in Section 3.7, i.e. (μT–1)(E) = μ(T–1E) for all E ∈ T. Suppose also that f is a T-measurable function defined on Y. Then the composition fT ((fT)(x) = f(Tx)) is a measurable function on X (Theorem 3.4.3), and it is natural to ask whether there is any relationship between the two integrals ∫_X fT dμ, ∫_Y f dμT–1. The following important transformation theorem shows that these integrals are either both defined, or neither is, and if defined they are equal.

Theorem 4.6.1 (Transformation Theorem) Let (X,S, μ) be a measure space, (Y,T) a measurable space, T a measurable transformation defined a.e. (μ) on X into Y, and f a measurable function defined on Y. Then
∫_Y f dμT–1 = ∫_X fT dμ
whenever f is nonnegative (a.e.), or μT–1-integrable, or fT is μ-integrable.


Proof If f is the indicator function χE(y) of E ∈ T, then fT(x) = χE(Tx) = χT–1E(x) and hence
∫_Y f(y) dμT–1(y) = μT–1(E) = ∫_X χT–1E(x) dμ(x) = ∫_X fT(x) dμ(x).
The result is thus true for indicator functions. It follows for nonnegative simple functions by addition, and for nonnegative T-measurable functions f by considering an increasing sequence {fn} of nonnegative simple functions converging to f and using the definition of the integral. Finally the result follows if f is μT–1-integrable or fT is μ-integrable by writing f = f+ – f– and noting that (fT)+ = f+T, (fT)– = f–T. □

Corollary The theorem remains true if f is defined only a.e. (μT–1), or equivalently if fT is defined just a.e. (μ). In fact if either of the two integrals is defined (finite or infinite) so is the other, and equality holds. (See Ex. 4.24.)

Note that the theorem and its corollary imply that f ∈ L1(Y,T, μT–1) if and only if fT ∈ L1(X,S, μ). Some interesting applications of the transformation theorem will be given in the exercises of Chapter 5 in connection with the result concerning a “change of measure”. It is also very important in probability theory (see Chapter 9), where it expresses the expected value of a function f of a random element as the integral of f with respect to the distribution of the random element.

4.7 Real line applications

This section contains some comments concerning Lebesgue and Lebesgue–Stieltjes integrals on the real line R. As usual, let B denote the Borel sets of R. Let μF be the Lebesgue–Stieltjes measure on B corresponding to a nondecreasing right-continuous function F defined on R (cf. Section 2.8). If g is a Borel measurable function such that ∫g dμF is defined, write
∫_{–∞}^{∞} g(x) dF(x) = ∫_R g dF = ∫_R g dμF.

That is, the Lebesgue–Stieltjes integral ∫_R g dF is defined as ∫_R g dμF. For such a g we have also ∫_R g dF = ∫_R g dμ̄F (cf. Ex. 4.10), where μ̄F is the completion of μF, on its σ-field B̄F say. (Note that g is B̄F-measurable since B̄F ⊃ B.) On the other hand, if g is just B̄F-measurable the latter definition ∫_R g dμ̄F may still be used for ∫_R g dF.

In particular, if F(x) = x, write
∫_{–∞}^{∞} g(x) dx = ∫_R g dm


where m is Lebesgue measure on the Borel sets B or the Lebesgue measurable sets L, as appropriate.

Suppose now that g is a Lebesgue measurable function and m is Lebesgue measure. For any –∞ < a ≤ b < ∞ write
∫_a^b g(x) dx = ∫_(a,b) g dm = ∫_R χ(a,b) g dm
when this is defined. Note that this has the same value if the open interval (a, b) is closed at either end, since m({a}) = m({b}) = 0. Equivalently, ∫_a^b g(x) dx may be defined by integrating g over the space (a, b) with respect to Lebesgue measure on the Lebesgue measurable subsets of (a, b). We write L1 for L1(R,L, m) and L1(a, b) for the Lebesgue measurable functions g such that gχ(a,b) ∈ L1. Note that if g ∈ L1, then g ∈ L1(a, b) for every –∞ < a ≤ b < +∞. (The converse is not true – Ex. 4.28.) Further, if g ∈ L1 then dominated convergence with gn = gχ(–n,n) gives
∫_{–∞}^{∞} g(x) dx = lim_{n→∞} ∫_{–n}^{n} g(x) dx.
On the other hand, for all Lebesgue measurable functions g, monotone convergence gives
∫_{–∞}^{∞} |g(x)| dx = lim_{n→∞} ∫_{–n}^{n} |g(x)| dx.
Hence a Lebesgue measurable g belongs to L1 if and only if
lim_{n→∞} ∫_{–n}^{n} |g(x)| dx < ∞.
Thus if g is Lebesgue measurable, we may determine whether it is in L1 by the finiteness (or otherwise) of lim_n ∫_{–n}^{n} |g(x)| dx, and then, if g ∈ L1, evaluate ∫_{–∞}^{∞} g(x) dx by lim_{n→∞} ∫_{–n}^{n} g(x) dx.

In practical cases, one often deals with a function g which is Riemann integrable on every finite interval. It then follows (Exs. 4.25, 4.26) that g is Lebesgue measurable on R. It also follows that g ∈ L1(a, b) and ∫_a^b g(x) dx is the same as the Riemann integral of g over (a, b) (Ex. 4.26) if a, b are finite. Thus, in such a case, ∫_{–n}^{n} |g(x)| dx and ∫_{–n}^{n} g(x) dx may be evaluated as Riemann integrals and their limits used to determine whether g ∈ L1, and if so to obtain the value of ∫_{–∞}^{∞} g dx.

The point is that it is usually easiest to evaluate an integral by Riemann procedures (e.g. inversion of differentiation) when possible. There are, of course, functions which are Lebesgue- but not Riemann-integrable on a finite range (such as the indicator function of the rationals in (0, 1)), but these are not usually encountered in practice.


As an example, suppose g(x) = 1/x² for x ≥ 1, and g(x) = 0 otherwise. Then g is Borel, hence also Lebesgue, measurable (cf. Lemma 3.4.1) and Riemann integrable on every finite range. Further, ∫_{–n}^{n} |g(x)| dx may be evaluated as a Riemann integral – viz. 1 – 1/n. Since this tends to 1 as n → ∞, we see that g ∈ L1 and, in fact, ∫_{–∞}^{∞} g(x) dx = lim_{n→∞} ∫_{–n}^{n} g(x) dx = 1. On the other hand, if 1/x² is replaced by 1/x, it is seen at once that g ∉ L1.

The “comparison theorem” (Theorem 4.4.6) is also very useful in determining integrability. For example, let g(x) = 1/(1 + x²) for all x. Since g is continuous it is Borel and also Lebesgue measurable. Further, |g(x)| ≤ 1 for |x| ≤ 1 and |g(x)| < 1/x² for |x| > 1. Since ∫_{–1}^{1} 1 dx < ∞ and ∫_1^∞ (1/x²) dx < ∞ we have g ∈ L1. (The simple details are left as an exercise.)
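Both examples can be checked numerically; the Riemann-sum helper below is an assumed stand-in for “evaluate as a Riemann integral”:

```python
import math

def integral(f, a, b, steps=100000):
    # midpoint Riemann sum approximation of the integral over (a, b)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

g1 = lambda x: 1.0 / x ** 2 if x >= 1.0 else 0.0
# ∫_{-n}^{n} g1 dx = 1 - 1/n → 1, so g1 ∈ L1 with integral 1
print([integral(g1, 1.0, n) for n in (2.0, 10.0, 100.0)])  # ≈ [0.5, 0.9, 0.99]

g2 = lambda x: 1.0 / (1.0 + x ** 2)
# comparison test: |g2| ≤ 1 on [-1, 1] and |g2| < 1/x^2 for |x| > 1, so g2 ∈ L1
print(integral(g2, -100.0, 100.0))  # ≈ 2·arctan(100), i.e. within 0.02 of π
```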

The (“proper”) Riemann integrals considered apply to bounded functions on finite ranges. These requirements may be relaxed by taking limits over increasing integration ranges to give “improper Riemann integrals”, and corresponding Lebesgue integrals may or may not exist. Exercise 4.27 provides a useful illustration of this.

Finally, note that if Tx = αx + β, x ∈ R, α ≠ 0, then T is a measurable transformation from (R,L, m) onto (R,L) and mT–1 = (1/|α|)m (cf. Theorem 2.7.5). It then follows from the transformation theorem (Theorem 4.6.1 and its corollary) that if g is nonnegative a.e. or if g ∈ L1, then
∫_{–∞}^{∞} g(αx + β) dx = (1/|α|) ∫_{–∞}^{∞} g(y) dy.
Similarly, if g is nonnegative a.e. on (a, b), –∞ < a ≤ b < +∞, or if g ∈ L1(a, b), then
∫_a^b g(αx + β) dx = (1/α) ∫_{αa+β}^{αb+β} g(y) dy,
where the notation ∫_c^d g(y) dy = –∫_d^c g(y) dy is used for d ≤ c. This is easily seen by noting e.g. that χ(a,b)(x) = χ(αa+β,αb+β)(Tx) when α > 0, so that the left hand side is
∫χ(αa+β,αb+β)(Tx) g(Tx) dx = (1/α) ∫χ(αa+β,αb+β)(y) g(y) dy.
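The affine change of variable can be verified on a concrete case; the Riemann-sum helper and the choices g = cos, α = 2, β = 0.5 are assumptions of this sketch:

```python
import math

def integral(f, a, b, steps=100000):
    # midpoint Riemann sum approximation of the integral over (a, b)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

g = math.cos
alpha, beta = 2.0, 0.5
lhs = integral(lambda x: g(alpha * x + beta), 0.0, 1.0)
rhs = integral(g, beta, alpha + beta) / alpha  # (1/α) ∫_{αa+β}^{αb+β} g(y) dy with a=0, b=1
print(lhs, rhs)  # both ≈ (sin 2.5 - sin 0.5)/2
```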

Exercises

4.1 If f, g are nonnegative simple functions, and g is integrable, show that the product fg is integrable.

4.2 Let μ be a finite measure on a measurable space (X,S), and f a measurable function which is bounded a.e. (i.e. |f| ≤ M a.e. for some finite M). Show that f ∈ L1.


4.3 Let μ be a finite measure on a measurable space (X,S) and let E1, . . . , En be sets in S. Show that
χ_{∪_{i=1}^n Ei} = Σ_{i=1}^n χ_{Ei} – Σ_{i<j} χ_{Ei∩Ej} + · · · + (–1)^{n–1} χ_{E1∩E2∩···∩En}.
Hence provide a simple proof of Ex. 2.6.

4.4 If f, g are measurable functions defined on the measure space (X,S, μ) and such that a ≤ f(x) ≤ b a.e. and g ∈ L1, show the mean value theorem for integrals, i.e. show that
∫_X f|g| dμ = c ∫_X |g| dμ
for some real c such that a ≤ c ≤ b.

4.5 Let (X,S, μ) be a measure space and suppose f ∈ L1, g ∈ L1. Show that min(f, g) ∈ L1 and
min(∫f dμ, ∫g dμ) ≥ ∫min(f, g) dμ.
If equality holds, what may be deduced about the relation between f, g?

4.6 Let (X,S, μ) be a measure space, f an integrable function defined on X and En = {x ∈ X : |f(x)| ≥ n}, n = 1, 2, . . . . Show that if E is the set where f is not finite, then
μ(E) = lim_{n→∞} μ(En) = 0.
Show also the following stronger property:
lim_{n→∞} nμ(En) = 0.

4.7 Let X be the set of positive integers, and S the σ-field of all subsets of X. Let μ be “counting measure” on X (i.e. μ(E) is the number of points in E). A function f is defined on X by f(n) = an, n = 1, 2, . . . . Show that f is integrable if and only if Σ_{n=1}^∞ |an| < ∞, and then ∫f dμ = Σ_{n=1}^∞ an.
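For Exercise 4.7, integration against counting measure is just summation, which can be sketched directly (the particular sequence an = (–1)^n/n² and the truncation point are assumptions of this illustration):

```python
import math

# a_n = (-1)^n / n^2: Σ |a_n| = π²/6 < ∞, so f(n) = a_n is integrable for
# counting measure, and ∫ f dμ = Σ a_n = -π²/12.
a = lambda n: (-1) ** n / n ** 2
N = 100000  # truncation of the infinite sums
abs_sum = sum(abs(a(n)) for n in range(1, N + 1))
total = sum(a(n) for n in range(1, N + 1))
print(abs_sum, total)  # ≈ π²/6 ≈ 1.64493 and ≈ -π²/12 ≈ -0.82247
```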

4.8 Let μ be a finite measure on a measurable space (X,S), and f a measurable function defined on X. Show that f is integrable if and only if Σ_{n=1}^∞ μ{x : |f(x)| ≥ n} converges.

4.9 Let E0 be a fixed measurable subset of a measure space (X,S, μ), and define a measure μ0 on S by μ0(E) = μ(E ∩ E0) for E ∈ S. Show that ∫f dμ0 = ∫_{E0} f dμ for any f for which ∫f dμ0 is defined.

4.10 Let (X,S, μ) be a measure space. Let T be a σ-field such that T ⊃ S (i.e. S is a “sub-σ-field” of T) and let ν be a measure on T such that ν(E) = μ(E) when E ∈ S (i.e. ν is an extension of μ to T). Suppose f is an S-measurable function. Show that it is T-measurable and that ∫_X f dμ = ∫_X f dν where the latter is defined. (In the former integral f is regarded as S-measurable, and T-measurable in the latter.) In particular if ν is the completion μ̄ of μ (and T is the “completion σ-field” S̄) then ∫f dμ = ∫f dμ̄.


4.11 Suppose μ1 and μ2 are two measures defined on a σ-field S of subsets of X and μ(E) = μ1(E) + μ2(E) for every E ∈ S. Show that μ is a measure on S and, if f is nonnegative measurable, or integrable with respect to both μ1 and μ2,
∫f dμ = ∫f dμ1 + ∫f dμ2.
In the latter case f is then integrable with respect to μ.

4.12 Let {μn}_{n=1}^∞ be a sequence of probability measures on (X,S) (i.e. μn(X) = 1) and define the set function μ on S by
μ(E) = Σ_{n=1}^∞ (1/2^n) μn(E) for all E ∈ S.
Show that μ is a probability measure on S and that for all nonnegative measurable or μ-integrable functions f defined on X
∫f dμ = Σ_{n=1}^∞ (1/2^n) ∫f dμn.

4.13 Let (X,S, μ) be a measure space, E a class of subsets of X which is closed under the formation of intersections, and such that S(E) = S, and either f ∈ L1 or f is a measurable function defined and nonnegative a.e. If
∫_E f dμ = 0 for all E ∈ E
then show that f = 0 a.e.

4.14 Let f be an integrable function defined on (X,S, μ).
(i) Show that if ∫_E f dμ ≥ 0 for all E ∈ S, then f ≥ 0 a.e.
(ii) If E is a field of subsets of X such that S(E) = S, and if
∫_E f dμ ≥ 0 for all E ∈ E
then show that f ≥ 0 a.e.

4.15 Let f be a finite-valued nonnegative measurable function defined on a measure space (X,S, μ). Write
Sn = Σ_{k=0}^∞ (k/2^n) μ{x : k/2^n < f(x) ≤ (k + 1)/2^n}, n = 1, 2, . . . .
Show that Sn → ∫f dμ as n → ∞. (Write fn(x) = k/2^n if k/2^n < f(x) ≤ (k + 1)/2^n, fn(x) = 0 if f(x) = 0.) This result may be generalized to include functions taking positive and negative values.
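For Exercise 4.15 the sums Sn can be computed in closed form when X = (0, 1), μ is Lebesgue measure and f(x) = x² (these choices are assumptions of this sketch), since the level set {x : k/2^n < x² ≤ (k + 1)/2^n} has measure √((k+1)/2^n) – √(k/2^n):

```python
import math

def S(n):
    # Sn = Σ_k (k/2^n) μ{k/2^n < f ≤ (k+1)/2^n} for f(x) = x**2 on (0, 1)
    N = 2 ** n
    return sum((k / N) * (math.sqrt((k + 1) / N) - math.sqrt(k / N)) for k in range(N))

vals = [S(n) for n in (2, 6, 12)]
print(vals)  # increases towards ∫_0^1 x^2 dx = 1/3
```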

4.16 Let {fn} be a sequence of measurable functions on (X,S, μ) and g ∈ L1. Show that if |fn| ≤ 1/2^n a.e. for each n, then
∫Σ_{n=1}^∞ fng dμ = Σ_{n=1}^∞ ∫fng dμ.

4.17 Let g, fn, n = 1, 2, . . . , be L1-functions on a measure space (X,S, μ) such that |fn(x)| ≤ g(x) a.e. for each n. Show that
∫(lim sup_{n→∞} fn) dμ ≥ lim sup_{n→∞} ∫fn dμ.


(Apply Fatou’s Lemma to g – fn.) Note that this result and Fatou’s Lemma may be combined to give a statement of dominated convergence (directly, at least for nonnegative fn’s). This sheds light on where the “dominated” assumption is relevant in that theorem.

4.18 Let μ be Lebesgue measure on the real line. Let
fn(x) = –n² for 0 < x < 1/n, and fn(x) = 0 otherwise.
Evaluate lim inf_n ∫fn dμ and ∫(lim inf_n fn) dμ, and comment concerning Fatou’s Lemma and Dominated Convergence.

4.19 Let {fn} be a sequence of L1-functions on a measure space (X,S, μ) such that either
Σ_{n=1}^∞ ∫|fn| dμ < ∞ or ∫(Σ_{n=1}^∞ |fn|) dμ < ∞.
Show that Σ_{n=1}^∞ fn(x) converges a.e. to an L1-function f and that ∫f dμ = Σ_{n=1}^∞ ∫fn dμ. (Hint: Compare and use Theorem 4.5.2, Corollary.)

4.20 Let (X,S, μ) be a measure space and f, fn, n = 1, 2, . . . , measurable functions on X. If
Σ_{n=1}^∞ ∫_X |fn – f| dμ < +∞
show that
lim_{n→∞} fn(x) = f(x) a.e.

4.21 For each 0 ≤ t ≤ 1 let f(x, t) be a measurable function of x defined on the measure space (X,S, μ). If |f(x, t)| ≤ g(x) for all x ∈ X and 0 ≤ t ≤ 1, where g ∈ L1(X,S, μ), and if for each x ∈ X the function f(x, t) is continuous in t, show that the function h defined on [0, 1] by
h(t) = ∫_X f(x, t) dμ(x), 0 ≤ t ≤ 1,
is continuous in t.

4.22 Let f be a Lebesgue integrable function defined on the real line and define g by
g(x) = ∫_x^{x+1} f(t) dt
for all real x. Show that g is a uniformly continuous function and that g(x) → 0 as |x| → ∞.

4.23 Let {fn}_{n=1}^∞ be a sequence of Borel measurable functions defined on the real line R and such that
0 ≤ fn+1(x) ≤ fn(x) for all n = 1, 2, . . . , and x ∈ R.
(i) If lim_{n→∞} fn(x) = 0 for all x ∈ R, is lim_{n→∞} ∫_R fn(x) dx = 0?
(ii) If lim_{n→∞} ∫_R fn(x) dx = 0, is lim_{n→∞} fn(x) = 0 a.e.?
Justify your answers with proofs or counterexamples.


4.24 Prove the corollary to Theorem 4.6.1. (E.g. if f is defined only a.e. (μT–1), choose g defined on Y with g = f a.e. (μT–1) and show that this implies gT = fT a.e. (μ).)

4.25 Let L be the class of Lebesgue measurable sets of the real line R. Let f be a function, defined on R, and Riemann integrable on a finite interval (a, b). Then show that fχ(a,b) is L-measurable.
(Hints:
(i) Divide (a, b] into 2^n (semiclosed) subintervals In,j (j = 1, . . . , 2^n) each of length (b – a)/2^n. If x ∈ In,j write f_n^*(x) = sup{f(y) : y ∈ In,j}, f_{n*}(x) = inf{f(y) : y ∈ In,j}, and f_n^*(x) = f_{n*}(x) = 0 if x ∉ (a, b]. Then f_{n*} is increasing, f_n^* is decreasing, and f_{n*} ≤ fχ(a,b] ≤ f_n^*.
(ii) Write gn = f_n^* – f_{n*}. Show that the (Lebesgue) integrals ∫gn dx → 0 as n → ∞. (Use the definition of Riemann integrability.)
(iii) Use (ii) of Ex. 4.23 to show that lim_{n→∞} gn(x) = 0 a.e. and hence fχ(a,b] = lim_{n→∞} f_n^* a.e. Then fχ(a,b] is L-measurable.
(iv) Use the converse of Ex. 3.8.)

4.26 With the notation of Ex. 4.25, suppose f is Riemann integrable on every finite interval (a, b). Then f is L-measurable. (Write f = lim fχ(–n,n].) Also show that f ∈ L1(a, b) for every finite a, b, and that the Lebesgue integral ∫_a^b f dx equals the Riemann integral in value. (To show f ∈ L1(a, b) note that f is bounded on (a, b), a set of finite measure. The Riemann integral of f over (a, b) may be expressed as lim_{n→∞} ∫f_n^* dx.)

4.27 A function f which is Riemann integrable over every interval (0, T) for T > 0, and such that the (Riemann) integrals ∫_0^T f(x) dx converge to a finite limit as T → ∞, is called improperly Riemann integrable over (0, ∞). (The value of the improper integral is then defined to be lim_{T→∞} ∫_0^T f(x) dx.) The example f(x) = (sin x)/x may be used to show that a function can be improperly Riemann integrable over (0, ∞) without belonging to L1(0, ∞) (|(sin x)/x| ∉ L1(0, ∞)).

4.28 Show that it is possible to have f ∈ L1(a, b) for every finite (a, b) but yet f ∉ L1. In fact, as noted in Section 4.7, if f ∈ L1(a, b) for all a, b then f ∈ L1 iff lim_{n→∞} ∫_{–n}^{n} |f(x)| dx < ∞.

4.29 Let f be a Lebesgue measurable function on the real line with |f(x)| ≤ an for n < x ≤ n + 1, n = 0, ±1, ±2, . . . , where Σ_n |an| < ∞. Show that f is Lebesgue integrable. Determine whether
1/[x(log x)^α] ∈ L1(a, ∞), α ≥ 1, a > 1,
1/x^α ∈ L1(0, 1), α > 0.


4.30 Let the function F be defined on the real line R by
F(x) = 0 for x ≤ 0, F(x) = x for 0 < x < 1, F(x) = 1 for 1 ≤ x.
Let μF be the Lebesgue–Stieltjes measure on the Borel sets B induced by F, B̄F the completion of B with respect to μF, μ̄F the completion of μF (defined on B̄F), and m Lebesgue measure. Show that
μF(B) = m{B ∩ (0, 1)} for all B ∈ B,
describe B̄F, and prove that for all μ̄F-integrable functions f,
∫_R f dμ̄F = ∫_0^1 f(x) dx.

5

Absolute continuity and related topics

5.1 Signed and complex measures

Relaxation of the requirement of a measure that it be nonnegative yields what is usually called a signed measure. Specifically, this is an extended real-valued, countably additive set function μ on a class E (containing ∅), such that μ(∅) = 0, and such that μ assumes at most one of the values +∞ and –∞ on E. As for measures, a signed measure μ defined on a class E is called finite on E if |μ(E)| < ∞ for each E ∈ E, and σ-finite if for each E ∈ E there is a sequence {En}_{n=1}^∞ of sets in E with E ⊂ ∪_{n=1}^∞ En and |μ(En)| < ∞, that is, if E can be covered by the union of a sequence of sets with finite (signed) measure. It will usually be assumed that the class on which μ is defined is a σ-ring or σ-field.

Some of the important properties of measures (see Section 2.2) hold also for signed measures. In particular, a signed measure is subtractive and continuous from below and above. The basic properties of signed measures are given in the following theorem.

Theorem 5.1.1 Let μ be a signed measure on a σ-ring S.
(i) If E, F ∈ S, E ⊂ F and |μ(F)| < ∞ then |μ(E)| < ∞.
(ii) If E, F ∈ S, E ⊂ F and |μ(E)| < ∞ then μ(F – E) = μ(F) – μ(E).
(iii) If {En}_{n=1}^∞ is a disjoint sequence of sets in S such that |μ(∪_{n=1}^∞ En)| < ∞ then the series Σ_{n=1}^∞ μ(En) converges absolutely.
(iv) If {En}_{n=1}^∞ is a monotone sequence of sets in S, and if |μ(En)| < ∞ for some integer n in the case when {En} is a decreasing sequence, then
μ(lim_n En) = lim_n μ(En).

Proof If E, F ∈ S, E ⊂ F then F = E ∪ (F – E), a union of two disjoint sets, and from the countable (and hence also finite) additivity of μ,
μ(F) = μ(E) + μ(F – E).
Hence (i) follows since if μ(F) is finite, so are (both) μ(E) and μ(F – E). On the other hand, if μ(E) is assumed finite it can be subtracted from both sides to give (ii).

(iii) Let En+ = En or ∅, and En– = ∅ or En, according as μ(En) ≥ 0 or μ(En) < 0 respectively. Then
Σ_{n=1}^∞ μ(En+) = μ(∪_{n=1}^∞ En+) and Σ_{n=1}^∞ μ(En–) = μ(∪_{n=1}^∞ En–)
imply by (i) that Σ_{n=1}^∞ μ(En+) and Σ_{n=1}^∞ μ(En–) are both finite. Hence
Σ_{n=1}^∞ |μ(En)| = Σ_{n=1}^∞ (μ(En+) – μ(En–)) = Σ_{n=1}^∞ μ(En+) – Σ_{n=1}^∞ μ(En–)
is finite as required.

(iv) is shown as for measures (Theorems 2.2.4 and 2.2.5). □

While not needed here, it is worth noting that the requirement that μ be (extended) real may also be altered to allow complex values. That is, a complex measure is a complex-valued, countably additive set function μ defined on a class E (containing ∅) and such that μ(∅) = 0. Thus if En are disjoint sets of E with ∪_{n=1}^∞ En = E ∈ E, we have μ(E) = Σ_{n=1}^∞ μ(En). Since the convergence of a complex sequence requires convergence of its real and imaginary parts, it follows that the real and imaginary parts of μ are countably additive. That is, a complex measure μ may be written in the form μ = λ + iν where λ and ν are finite signed measures. Conversely, of course, if λ and ν are finite signed measures then λ + iν is a complex measure. Thus the complex measures are precisely the set functions of the form λ + iν where λ and ν are finite signed measures. Some of the properties of complex measures are given in Ex. 5.29.

5.2 Hahn and Jordan decompositions

If μ1, μ2 are two measures on a σ-field S, their sum μ1 + μ2 (defined for E ∈ S as μ1(E) + μ2(E)) is clearly a measure on S. The difference μ1(E) – μ2(E) is not necessarily defined for all E ∈ S (i.e. if μ1(E) = μ2(E) = ∞). However, if at least one of the measures μ1 and μ2 is finite, μ1 – μ2 is defined for every E ∈ S and is a signed measure on S. It will be shown in this section that every signed measure can be written as a difference of two measures of which at least one is finite (Theorem 5.2.2).

If μ is a signed measure on a measurable space (X,S), a set E ∈ S will be called positive (resp. negative, null) if μ(F) ≥ 0 (resp. μ(F) ≤ 0, μ(F) = 0) for all F ∈ S with F ⊂ E. Notice that measurable subsets of positive sets are positive sets. Further, the union of a sequence {An} of positive sets is clearly positive (if F ∈ S, F ⊂ ∪_{n=1}^∞ An, then F = ∪_{n=1}^∞ (F ∩ An) = ∪_{n=1}^∞ Fn where the Fn are disjoint sets of S with Fn ⊂ F ∩ An (Lemma 1.6.3), so that μ(Fn) ≥ 0 and μ(F) = Σμ(Fn) ≥ 0). Similar statements are true for negative and null sets.

Theorem 5.2.1 (Hahn Decomposition) If μ is a signed measure on the measurable space (X,S), then there exist two disjoint sets A, B such that A is positive, B is negative, and A ∪ B = X.

Proof Since μ assumes at most one of the values +∞, –∞, assume for definiteness that –∞ < μ(E) ≤ +∞ for all E ∈ S. Define
λ = inf{μ(E) : E negative}.
Since the empty set ∅ is negative, λ ≤ 0. Let {Bn}_{n=1}^∞ be a sequence of negative sets such that λ = lim_{n→∞} μ(Bn) and let B = ∪_{n=1}^∞ Bn. The theorem will be proved in steps as follows:

(i) B is negative since, as noted above, the countable union of negative sets is negative.

(ii) μ(B) = λ, and thus –∞ < λ ≤ 0. For certainly λ ≤ μ(B) by (i) and the definition of λ. Also for each n, B = (B – Bn) ∪ Bn and hence
μ(B) = μ(B – Bn) + μ(Bn) ≤ μ(Bn)
since B – Bn ⊂ B (negative). It follows that μ(B) ≤ lim_{n→∞} μ(Bn) = λ, so that μ(B) = λ as stated.

(iii) Let A = X – B. If F ⊂ A is negative, then F is null. For let F ⊂ A be negative and G ∈ S, G ⊂ F. Then G is negative and E = B ∪ G is negative. Hence, by the definition of λ and (ii), λ ≤ μ(E) = μ(B) + μ(G) = λ + μ(G). Thus μ(G) ≥ 0, but since F is negative, μ(G) ≤ 0, so that μ(G) = 0. Thus F is null.

(iv) A = X – B is positive. Assume on the contrary that there exists E0 ⊂ A, E0 ∈ S, with μ(E0) < 0. Since E0 is not null, by (iii) it is not negative. Let k1 be the smallest positive integer such that there is a measurable set E1 ⊂ E0 with μ(E1) ≥ 1/k1. Since μ(E0) is finite (–∞ < μ(E0) < 0) and E1 ⊂ E0, Theorem 5.1.1 (i) and (ii) give μ(E0 – E1) = μ(E0) – μ(E1) < 0, since μ(E0) < 0, μ(E1) > 0. Thus the same argument now applies to E0 – E1. Let k2 be the smallest positive integer such that there is a measurable set E2 ⊂ E0 – E1 with μ(E2) ≥ 1/k2. Proceeding inductively, let kn be the smallest positive integer such that there is a measurable set En ⊂ E0 – ∪n–1i=1 Ei with μ(En) ≥ 1/kn.

5.2 Hahn and Jordan decompositions 89

Write F0 = E0 – ∪∞i=1 Ei. Now ∪∞1 En ⊂ E0, |μ(E0)| < ∞ so that ∑∞1 μ(En) (= μ(∪∞1 En)) converges and hence μ(En) → 0, so that kn → ∞. Now for each n, F0 ⊂ E0 – ∪n–1i=1 Ei. Hence for all F ∈ S, F ⊂ F0, we have μ(F) < 1/(kn – 1), so that μ(F) ≤ 0 since kn → ∞. Thus F0 is negative and by (iii) F0 is null. But

μ(F0) = μ(E0) – ∑∞i=1 μ(Ei) < 0

since μ(E0) < 0, μ(Ei) > 0, i = 1, 2, . . . . But μ(F0) < 0 contradicts the fact that F0 is null.

Hence the assumption that A is not positive leads to a contradiction, so that A is positive, as stated. �

A representation of X as a disjoint union of a positive set A and a negative set B is called a Hahn decomposition of X with respect to μ. Thus, by the theorem, a Hahn decomposition always exists, but is clearly not unique (since a null set may be attached to either A or B – see the example after Theorem 5.2.3). Even though a Hahn decomposition of X with respect to the signed measure μ is not unique, it does provide a representation of μ as the difference of two measures which does not depend on the particular Hahn decomposition used. This is seen in the following theorem.

Theorem 5.2.2 (Jordan Decomposition) Let μ be a signed measure on a measurable space (X,S). If X = A ∪ B is a Hahn decomposition of X for μ, then the set functions μ+, μ– defined on S by μ+(E) = μ(E ∩ A), μ–(E) = –μ(E ∩ B) for each E ∈ S, are measures on S, at least one of which is finite, and μ = μ+ – μ–. The measures μ+, μ– do not depend on the particular Hahn decomposition chosen. The expression μ = μ+ – μ– is called the Jordan decomposition of the signed measure μ.


Proof Since A ∩ E ⊂ A (positive) and B ∩ E ⊂ B (negative), the set functions μ+ and μ– are nonnegative, and thus are clearly measures on S. Since μ assumes at most one of the values ±∞, at least one of μ+, μ– is finite. Also, for every E ∈ S,

μ(E) = μ(E ∩ A) + μ(E ∩ B) = μ+(E) – μ–(E)

and thus μ = μ+ – μ–.

In order to prove that μ+, μ– do not depend on the particular Hahn decomposition chosen, we consider two Hahn decompositions X = A1 ∪ B1 = A2 ∪ B2 of X with respect to μ and show that for each E ∈ S,

μ(E ∩ A1) = μ(E ∩ A2) and μ(E ∩ B1) = μ(E ∩ B2).

Notice that the set E ∩ (A1 – A2) is a subset of the positive set A1, and thus μ{E ∩ (A1 – A2)} ≥ 0, as well as of the negative set B2, so that μ{E ∩ (A1 – A2)} ≤ 0. Hence μ{E ∩ (A1 – A2)} = 0 for each E ∈ S. Similarly μ{E ∩ (A2 – A1)} = 0 and it follows that

μ(E ∩ A1) = μ(E ∩ A1 ∩ A2) = μ(E ∩ A2)

as desired. It follows in the same way that μ(E ∩ B1) = μ(E ∩ B2) and thus the proof is complete. �

It is clear that a signed measure may be written as a difference of two measures in many ways; e.g. μ = (μ+ + λ) – (μ– + λ) where λ is an arbitrary finite measure. However, among all possible decompositions of a signed measure as a difference of two measures, the Jordan decomposition is characterized by a certain uniqueness property and also by a “minimal property”, given in Ex. 5.6.

The set function |μ| defined on S by |μ|(E) = μ+(E) + μ–(E) is clearly a measure (see Ex. 4.11) and is called the total variation of μ. Note that a set E ∈ S is positive if and only if μ–(E) = 0. For if E is positive, E ∩ B is a subset of both the positive set E and the negative set B, so that μ(E ∩ B) = 0 and hence μ–(E) = 0. Conversely if μ–(E) = 0 and F ∈ S, F ⊂ E, then μ–(F) = 0 and μ(F) = μ+(F) ≥ 0, showing that E is positive. Similarly E is negative if and only if μ+(E) = 0. Also

|μ(E)| ≤ |μ|(E)

with equality only if E is positive or negative. Finally note that |μ|(E) = 0 implies that E is a null set with respect to |μ|, μ+, μ– and μ.
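The Hahn and Jordan decompositions are easy to visualize for a signed measure on a finite set, where the positive set may simply be taken to be the points of nonnegative mass. The following Python sketch is not from the text; the finite-set setting and all names are illustrative assumptions.

```python
# Hypothetical sketch: a signed measure on a finite set X, given by its
# point masses.  The Hahn decomposition is A = {x : mass(x) >= 0},
# B = {x : mass(x) < 0}; the Jordan parts and total variation follow.

def jordan_parts(mass):
    """mass: dict point -> signed mass.  Returns set functions
    mu, mu_plus, mu_minus, |mu| (each taking an iterable of points)."""
    A = {x for x, m in mass.items() if m >= 0}   # positive set of the Hahn decomposition
    B = set(mass) - A                            # negative set

    def mu(E):        return sum(mass[x] for x in E)
    def mu_plus(E):   return mu(set(E) & A)      # mu+(E) = mu(E ∩ A)
    def mu_minus(E):  return -mu(set(E) & B)     # mu-(E) = -mu(E ∩ B)
    def total_var(E): return mu_plus(E) + mu_minus(E)
    return mu, mu_plus, mu_minus, total_var

mass = {"a": 2.0, "b": -1.0, "c": 0.5, "d": -0.25}
mu, mu_plus, mu_minus, tv = jordan_parts(mass)
E = {"a", "b", "d"}
assert mu(E) == mu_plus(E) - mu_minus(E)   # mu = mu+ - mu-
assert abs(mu(E)) <= tv(E)                 # |mu(E)| <= |mu|(E)
```

The assertions check the identities μ = μ+ – μ– and |μ(E)| ≤ |μ|(E) from the text on this toy example.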

A useful example of a signed measure is provided by the indefinite integral of a function whose integral can be defined, as shown in the following result.


Theorem 5.2.3 Let (X,S, μ) be a measure space and f a measurable function defined a.e. on X and such that either f+ ∈ L1(X,S, μ) or f– ∈ L1(X,S, μ). Then the set function ν defined for each E ∈ S by

ν(E) = ∫E f dμ

is a signed measure on S; and if f ∈ L1(X,S, μ) then ν is a finite signed measure.

Proof Clearly ν(∅) = 0 and if f ∈ L1(X,S, μ) then ν is finite. The proof will be completed by checking countable additivity of ν. Let {En}∞n=1 be a sequence of disjoint measurable sets, E = ∪∞n=1 En. Then f+χE = ∑∞n=1 f+χEn a.e. (i.e. for all x for which f is defined) and by the corollary to Theorem 4.5.2

∫E f+ dμ = ∫ f+χE dμ = ∑∞n=1 ∫ f+χEn dμ = ∑∞n=1 ∫En f+ dμ.

Hence ∫E f+ dμ = ∑∞n=1 ∫En f+ dμ and similarly ∫E f– dμ = ∑∞n=1 ∫En f– dμ. Since either f+ ∈ L1(μ) or f– ∈ L1(μ), at least one of the two positive series converges to a finite number and thus

ν(E) = ∫E f+ dμ – ∫E f– dμ = ∑∞n=1 (∫En f+ dμ – ∫En f– dμ) = ∑∞n=1 ∫En f dμ = ∑∞n=1 ν(En)

as required. �

It is clear that a Hahn decomposition of X with respect to ν is A ∪ B where A = {x : f(x) ≥ 0} and B = Ac (i.e. {x : f(x) < 0} if f is defined on X). If the set {x : f(x) = 0} is nonempty then another Hahn decomposition is A1 ∪ B1 where A1 = {x : f(x) > 0} and B1 = Ac1. The Jordan decomposition ν = ν+ – ν– of ν is given in both cases by

ν+(E) = ∫E f+ dμ, ν–(E) = ∫E f– dμ

for each E ∈ S, and the total variation |ν| of ν is

|ν|(E) = ν+(E) + ν–(E) = ∫E f+ dμ + ∫E f– dμ = ∫E |f| dμ.

Finally, the following simple application of the Jordan decomposition shows that extensions of σ-finite signed measures have a uniqueness property corresponding to that for measures. This will be useful later.

Lemma 5.2.4 Let μ, ν be signed measures on the σ-field S which are equal on a semiring P such that S(P) = S. If μ is σ-finite on P then μ = ν on S.


Proof Write μ = μ+ – μ–, ν = ν+ – ν–. For E ∈ P

μ+(E) – μ–(E) = ν+(E) – ν–(E)

and hence μ+(E) + ν–(E) = ν+(E) + μ–(E) when all four terms are finite. But if e.g. μ+(E) = ∞ then clearly ν+(E) = ∞ (and μ–(E), ν–(E) are finite) so that the same rearrangement holds, i.e. μ+ + ν– = ν+ + μ– on P. Since these two σ-finite measures are equal on P, they are equal on S(P) = S, from which μ = ν on S follows by the reverse rearrangement. �

5.3 Integral with respect to signed measures

If μ is a signed measure on (X,S) with Jordan decomposition μ = μ+ – μ–, the integral with respect to μ over X of any f which belongs to both L1(X,S, μ+) and L1(X,S, μ–) may be defined by

∫ f dμ = ∫ f dμ+ – ∫ f dμ– = ∫ f+ dμ+ – ∫ f– dμ+ – ∫ f+ dμ– + ∫ f– dμ–.

Notice that since |μ| = μ+ + μ– we have, for every measurable f defined a.e. (|μ|) on X,

∫ |f| d|μ| = ∫ |f| dμ+ + ∫ |f| dμ–

(see Ex. 4.11) and thus f belongs to both L1(X,S, μ+) and L1(X,S, μ–) if and only if f ∈ L1(X,S, |μ|). Further, as at the end of Section 4.3, if f is a measurable function defined a.e. (|μ|) on X but f ∉ L1(X,S, |μ|) we may define ∫ f dμ = +∞ when the two negative terms in the above defining expression for ∫ f dμ are finite and one of the positive terms is +∞. That is, ∫ f dμ = +∞ when f– ∈ L1(μ+), f+ ∈ L1(μ–) and f+ ∉ L1(μ+) or f– ∉ L1(μ–). Similarly ∫ f dμ is defined as –∞ when f+ ∈ L1(μ+), f– ∈ L1(μ–) and f– ∉ L1(μ+) or f+ ∉ L1(μ–).

This integral has many of the basic properties of the integral with respect to a measure described in Chapter 4. A few of these are collected here, more as examples and for reference than for detailed study.

Theorem 5.3.1 (i) If μ is a signed measure and f ∈ L1(|μ|), then

|∫ f dμ| ≤ ∫ |f| d|μ|.

(ii) (Dominated Convergence) Let μ be a signed measure, {fn} a sequence of functions in L1(|μ|) and g ∈ L1(|μ|) such that |fn| ≤ |g| a.e. (|μ|) for each n = 1, 2, . . . . If f is a measurable function such that fn → f a.e. (|μ|) then f ∈ L1(|μ|) and

∫ |fn – f| d|μ| → 0, ∫ fn dμ → ∫ f dμ as n → ∞.

Proof (i) By using the corresponding property for measures (Theorem 4.4.5) and Ex. 4.11, we have by the definition ∫ f dμ = ∫ f dμ+ – ∫ f dμ–,

|∫ f dμ| ≤ |∫ f dμ+| + |∫ f dμ–| ≤ ∫ |f| dμ+ + ∫ |f| dμ– = ∫ |f| d|μ|.

(ii) The first limit is just dominated convergence for the measure |μ| (Theorem 4.5.5), and the second limit follows from the first and the inequality in (i). �

The next result is the transformation theorem for signed measures. As for measures, it may be extended to nonintegrable cases where integrals are defined.

Theorem 5.3.2 Let (X,S) and (Y,T) be measurable spaces, μ a signed measure on S and T a measurable transformation defined a.e. (|μ|) on X into Y. Then the set function μT–1 defined on T by (μT–1)(E) = μ(T–1E), E ∈ T, is a signed measure on T, and if f is a T-measurable function defined a.e. (μT–1) on Y and such that f T ∈ L1(|μ|), then f ∈ L1(|μT–1|) and

∫Y f dμT–1 = ∫X f T dμ.

Proof Exactly as when μ is a measure, it is seen that μT–1 is countably additive (Theorem 3.7.1) and that μT–1(∅) = 0. Also, since μ assumes at most one of the values ±∞, so does μT–1. Thus μT–1 is a signed measure on T.

Now assume first for simplicity that T is defined on X. Then T–1T is a σ-field (Theorem 3.2.2); let λ denote the restriction of μ from S to T–1T ⊂ S. Clearly λT–1 = μT–1. Let Y = A ∪ B be a Hahn decomposition of Y for λT–1, with A positive and B negative. We now show that X = (T–1A) ∪ (T–1B) is a Hahn decomposition of X for λ. Indeed T–1A and T–1B are disjoint sets in T–1T with union X. Now if E is a T–1T-measurable subset of T–1A, then E = T–1G for some G ∈ T. Since E = T–1G ⊂ T–1A we have E = T–1(G ∩ A) and thus λ(E) = λT–1(G ∩ A) ≥ 0 since A is positive for λT–1. It follows that T–1A is positive for λ and similarly T–1B is negative for λ.


Now let λ = λ+ – λ– be the Jordan decomposition of λ. We show that λT–1 = (λ+ – λ–)T–1 = λ+T–1 – λ–T–1 is the Jordan decomposition of λT–1. Indeed for each E ∈ T,

(λ+T–1)(E) = λ(T–1E ∩ T–1A) = λ{T–1(E ∩ A)} = (λT–1)(E ∩ A) = (λT–1)+(E)

since Y = A ∪ B is a Hahn decomposition of Y for λT–1. Hence λ+T–1 = (λT–1)+ and similarly λ–T–1 = (λT–1)–. It thus follows that λT–1 = λ+T–1 – λ–T–1 is the Jordan decomposition of λT–1, and

|λT–1| = λ+T–1 + λ–T–1 = (λ+ + λ–)T–1 = |λ|T–1.

Notice that |λ|(E) ≤ |μ|(E) for each E ∈ T–1T since

|λ|(E) = λ+(E) + λ–(E) = λ(E ∩ T–1A) – λ(E ∩ T–1B) = μ(E ∩ T–1A) – μ(E ∩ T–1B) ≤ |μ|(E ∩ T–1A) + |μ|(E ∩ T–1B) = |μ|(E).

Thus by Theorem 4.6.1

∫Y |f| d|μT–1| = ∫Y |f| d|λT–1| = ∫Y |f| d|λ|T–1 = ∫X |f T| d|λ| ≤ ∫X |f T| d|μ|

(the inequality being an easy exercise whose details are left to the interested reader). Hence f T ∈ L1(|μ|) implies f ∈ L1(|μT–1|) and, again by Theorem 4.6.1,

∫Y f dμT–1 = ∫Y f dλT–1 = ∫Y f dλ+T–1 – ∫Y f dλ–T–1 = ∫X f T dλ+ – ∫X f T dλ– = ∫X f T dλ = ∫X f T dμ

with the last equality from Ex. 4.10. Thus the theorem follows when T is defined on X.

The requirement that T is defined on X may then be weakened to T defined a.e. (|μ|) on X in the usual straightforward way (i.e. if T is defined on E ∈ S with |μ|(Ec) = 0, apply the previous result to the transformation T′ which is defined on X by T′x = Tx, x ∈ E, and T′x = y0, x ∈ Ec, where y0 is any fixed point in Y). This completes the proof of the theorem. �

5.4 Absolute continuity and singularity

In this section (X,S) will be a fixed measurable space and μ, ν two signed measures on S (in particular one or both of μ and ν may be measures). Then ν is said to be absolutely continuous with respect to μ, written ν ≪ μ, if ν(E) = 0 for all E ∈ S such that |μ|(E) = 0. Of course when μ is a measure |μ| = μ and ν ≪ μ if all measurable sets with μ-measure zero have also ν-measure zero. In any case, the involvement of |μ| in the definition implies trivially that ν ≪ μ if and only if ν ≪ |μ|. If μ and ν are mutually absolutely continuous, that is if ν ≪ μ and μ ≪ ν, then μ and ν are said to be equivalent, written μ ∼ ν. When both μ and ν are measures, they are equivalent if and only if they have the same zero measure sets.

Theorem 5.2.3 provides an example of a signed measure ν which is absolutely continuous with respect to a measure μ: the indefinite μ-integral defined by ν(E) = ∫E f dμ where f is such that f+ ∈ L1(μ) or f– ∈ L1(μ). In fact the celebrated Radon–Nikodym Theorem of the next section (Theorem 5.5.3) shows that when μ is a σ-finite measure then all σ-finite signed measures ν with ν ≪ μ are indefinite μ-integrals.

For two signed measures we now show that ν ≪ μ if and only if |ν| ≪ |μ|, i.e. ν ≪ μ whenever all measurable sets with total μ-variation zero have also total ν-variation zero. It follows that μ ∼ ν if and only if the total variations |μ| and |ν| give zero measure to the same class of measurable sets.

Theorem 5.4.1 If μ and ν are signed measures on the measurable space (X,S) then the following are equivalent:

(i) ν ≪ μ
(ii) ν+ ≪ μ and ν– ≪ μ
(iii) |ν| ≪ |μ|.

Proof To see that (i) implies (ii), fix E ∈ S with |μ|(E) = 0, and let X = A ∪ B be a Hahn decomposition of X with respect to ν. Then since |μ| is a measure, |μ|(E) = 0 implies |μ|(E ∩ A) = |μ|(E ∩ B) = 0. Since ν ≪ μ, ν(E ∩ A) = ν(E ∩ B) = 0 and thus ν+(E) = ν–(E) = 0. It follows that ν+ ≪ μ, ν– ≪ μ, and |ν| ≪ μ, giving (ii). Clearly (ii) implies (iii) since |ν|(E) = ν+(E) + ν–(E) = 0 if |μ|(E) = 0.

Finally, to show that (iii) implies (i), let E ∈ S with |μ|(E) = 0. By (iii) |ν|(E) = 0, so that |ν(E)| ≤ |ν|(E) = 0, showing ν(E) = 0 and hence (i). �

Notice that, by Theorem 5.4.1, ν ≪ μ if and only if |ν| ≪ |μ| and thus if and only if |ν|(E) = 0 whenever |μ|(E) = 0, or equivalently, |μ|(E) > 0 whenever |ν|(E) > 0. In particular μ ∼ ν if and only if |μ| ∼ |ν| and thus if and only if |μ| and |ν| assign strictly positive measure to the same class of sets. A notion “opposite” to equivalence (∼), and thus also to absolute continuity (≪), would therefore be one under which |μ| and |ν| are concentrated on disjoint sets, so that they have essentially distinct classes of sets of strictly positive measure. Specifically, two signed measures μ, ν defined on S are called singular, written μ ⊥ ν, if and only if there is a set E ∈ S such that |μ|(E) = 0 = |ν|(Ec). It then follows that for every F ∈ S, |μ|(F ∩ E) = 0 and |ν|(F ∩ Ec) = 0 and thus

μ(F) = μ(F ∩ Ec) and ν(F) = ν(F ∩ E),

i.e. the measure μ is concentrated on the set Ec and the measure ν is concentrated on the set E.
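For measures given by point masses on a finite set, both relations can be decided by inspecting supports: ν ≪ μ fails exactly when ν charges a point that μ does not, and ν ⊥ μ holds when ν lives entirely on the set where μ vanishes. A small illustrative Python sketch (the data and helper names are assumptions, not from the text):

```python
# Hypothetical finite-set sketch with measures as dicts of point masses.

def abs_continuous(nu, mu):
    """nu << mu: every point of zero mu-mass has zero nu-mass."""
    return all(nu[x] == 0 for x in nu if mu.get(x, 0) == 0)

def singular(nu, mu):
    """nu ⊥ mu: take E = support of nu; then nu(E^c) = 0 trivially,
    so it suffices that mu(E) = 0."""
    E = {x for x in nu if nu[x] != 0}
    return all(mu.get(x, 0) == 0 for x in E)

mu  = {"a": 1.0, "b": 0.0, "c": 2.0}
nu1 = {"a": 0.5, "c": 1.0}   # charges only points mu charges: nu1 << mu
nu2 = {"b": 3.0}             # concentrated where mu vanishes: nu2 ⊥ mu
assert abs_continuous(nu1, mu) and not abs_continuous(nu2, mu)
assert singular(nu2, mu) and not singular(nu1, mu)
```

Note that nu2 is both singular with respect to mu and nonzero, matching the separating-set definition above with E = {"b"}.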

Important implications of the notions of absolute continuity and singularity are contained in the Lebesgue decomposition and the Radon–Nikodym Theorem given in the following section.

5.5 Radon–Nikodym Theorem and the Lebesgue decomposition

The Lebesgue–Radon–Nikodym Theorem asserts that every σ-finite signed measure ν may be written as the sum of two signed measures, of which the first is an indefinite integral with respect to a given σ-finite measure μ and the second is singular with respect to μ. We establish this result first for finite measures, and then extend it to the σ-finite and signed cases. A function f satisfying a certain property is said to be essentially unique if, whenever g is any other function with this property, f = g a.e.

Lemma 5.5.1 Let (X,S, μ) be a finite measure space and ν a finite measure on S. Then there exist two uniquely determined finite measures ν1 and ν2 on S such that

ν = ν1 + ν2, ν1 ≪ μ, ν2 ⊥ μ,

and an essentially unique μ-integrable function f such that for all E ∈ S,

ν1(E) = ∫E f dμ.

The function f may be taken nonnegative.

Proof Uniqueness is most readily shown. For suppose ν = ν1 + ν2 = ν3 + ν4 where ν1 ≪ μ, ν2 ⊥ μ, ν3 ≪ μ, ν4 ⊥ μ. Then λ = ν1 – ν3 = ν4 – ν2 is a finite signed measure which is both absolutely continuous and singular with respect to μ (Ex. 5.11) and hence must be zero (Ex. 5.12). That is, ν1 = ν3 and ν2 = ν4 as required for uniqueness of the decomposition ν = ν1 + ν2. Further if ν1(E) = ∫E f dμ = ∫E g dμ for all E ∈ S, it follows from Theorem 4.4.8 (Corollary) that f = g a.e. (μ). Hence the uniqueness statements are proved.


Turning now to the existence of ν1, ν2 and f, let K denote the class of all nonnegative measurable functions f on X such that

∫E f dμ ≤ ν(E) for all E ∈ S.

The method of proof is to find f ∈ K maximizing ∫ f dμ and thus “extracting as much of ν as is possible by ν1(E) = ∫E f dμ”, the remainder ν2 = ν – ν1 being shown to be singular.

Note that K is nonempty since it contains the function which is identically zero. Write

α = sup{∫X f dμ : f ∈ K},

and let {fn} be a sequence of functions in K such that ∫X fn dμ → α.

Write gn(x) = max{f1(x), . . . , fn(x)} ≥ 0. Then if E ∈ S, for fixed n, E can be written as ∪ni=1 Ei where the Ei are disjoint measurable sets and gn(x) = fi(x) for x ∈ Ei. (Write E1 = {x : gn(x) = f1(x)}, E2 = {x : gn(x) = f2(x)} – E1, etc.) Thus

∫E gn dμ = ∑ni=1 ∫Ei gn dμ = ∑ni=1 ∫Ei fi dμ ≤ ∑ni=1 ν(Ei) = ν(E),

showing that gn ∈ K. Since {gn} is an increasing sequence it has a limit f(x) = limn→∞ gn(x) and by monotone convergence

∫E f dμ = limn→∞ ∫E gn dμ ≤ ν(E).

It follows that f ∈ K and

∫X f dμ = limn→∞ ∫X gn dμ ≥ limn→∞ ∫X fn dμ = α

so that ∫X f dμ = α.

Write now

ν1(E) = ∫E f dμ and ν2(E) = ν(E) – ν1(E) for all E ∈ S.

Then ν1 is clearly a finite measure (Theorem 5.2.3) with f ≥ 0, f ∈ L1(μ) and ν1 ≪ μ. Further ν2 is finite, countably additive, and ν2(E) ≥ 0 for all E ∈ S since f ∈ K implies that ν1(E) = ∫E f dμ ≤ ν(E). Hence ν2 is a finite measure, and it only remains to show that ν2 ⊥ μ.

To see this, consider the finite signed measure λn = ν2 – n–1μ (n = 1, 2, . . .) and let X = An ∪ Bn be a Hahn decomposition of X for λn (An positive, Bn negative). If hn = f + n–1χAn, then for all E ∈ S,

∫E hn dμ = ∫E f dμ + n–1μ(An ∩ E) = ν(E) – ν2(E) + n–1μ(An ∩ E) = ν(E) – ν2(E ∩ Bn) – λn(An ∩ E) ≤ ν(E)

since ν2 is a measure and An is positive for λn. Thus hn ∈ K so that

α ≥ ∫X hn dμ = ∫X f dμ + n–1μ(An) = α + n–1μ(An)

which implies that μ(An) = 0. If A = ∪∞n=1 An, then μ(A) = 0. Since Ac ⊂ Acn = Bn we have λn(Ac) ≤ 0 and thus ν2(Ac) ≤ n–1μ(Ac) for each n. Thus ν2(Ac) = 0 = μ(A) showing that ν2 ⊥ μ, and thus completing the proof. �

We next establish the Lebesgue Decomposition Theorem in its general form.

Theorem 5.5.2 (Lebesgue Decomposition Theorem) If (X,S, μ) is a σ-finite measure space and ν is a σ-finite signed measure on S, then there exist two uniquely determined σ-finite signed measures ν1 and ν2 such that

ν = ν1 + ν2, ν1 ≪ μ, ν2 ⊥ μ.

If ν is a measure, so are ν1 and ν2. ν = ν1 + ν2 is called the Lebesgue decomposition of ν with respect to μ.

Proof The existence of ν1 and ν2 will first be shown when both μ and ν are σ-finite measures. Then clearly X = ∪∞n=1 Xn, where Xn are disjoint measurable sets with 0 ≤ μ(Xn) < ∞, 0 ≤ ν(Xn) < ∞. For each n = 1, 2, . . . , define

μ(n)(E) = μ(E ∩ Xn) and ν(n)(E) = ν(E ∩ Xn) for all E ∈ S.

Then μ(n), ν(n) are finite measures and by Lemma 5.5.1,

ν(n) = ν(n)1 + ν(n)2 where ν(n)1 ≪ μ(n), ν(n)2 ⊥ μ(n).

Now define the set functions ν1, ν2 for E ∈ S by (writing ∑n for ∑∞n=1)

ν1(E) = ∑n ν(n)1(E), ν2(E) = ∑n ν(n)2(E).

Then ν = ν1 + ν2 since ν(E) = ∑n ν(n)(E) = ∑n (ν(n)1(E) + ν(n)2(E)). Also ν1 and ν2 are readily seen to be σ-finite measures. For countable additivity, if E = ∪∞k=1 Ek where Ek are disjoint sets of S then

ν1(E) = ∑n ν(n)1(E) = ∑n ∑k ν(n)1(Ek) = ∑k ∑n ν(n)1(Ek) = ∑k ν1(Ek)

by interchanging the order of summation of the double series whose terms are nonnegative. Hence ν1 is a measure, and similarly so is ν2. σ-finiteness follows since X (and hence each set of S) may be covered by ∪∞n=1 Xn, where

νi(Xn) = ∑m ν(m)i(Xn) ≤ ∑m ν(m)(Xn) = ν(Xn) < ∞, i = 1, 2.


To show that ν1 ≪ μ, fix E ∈ S with μ(E) = 0. Then μ(n)(E) = μ(E ∩ Xn) = 0 and since ν(n)1 ≪ μ(n) we have ν(n)1(E) = 0. It follows that ν1(E) = ∑n ν(n)1(E) = 0 and hence ν1 ≪ μ.

The proof (when ν is a σ-finite measure) is completed by showing that ν2 ⊥ μ. Since for each n = 1, 2, . . . , ν(n)2 ⊥ μ(n), there is a set En ∈ S such that

μ(n)(En) = 0 and ν(n)2(Ecn) = 0.

Let Fn = En ∩ Xn, F = ∪∞1 Fn. Then the sets Fn are disjoint and

μ(F) = ∑n μ(Fn) = ∑n μ(n)(En) = 0.

On the other hand ν(n)(Xcn) = ν(Xn ∩ Xcn) = 0 implies ν(n)2(Xcn) = 0 and since Fcn = Ecn ∪ Xcn it follows that ν(n)2(Fcn) = 0. Now

ν2(Fc) = ∑n ν(n)2(Fc) ≤ ∑n ν(n)2(Fcn) = 0

since Fc ⊂ Fcn. Hence μ(F) = 0 = ν2(Fc) and thus ν2 ⊥ μ as desired. Thus the result follows when ν is a σ-finite measure.

When ν is a σ-finite signed measure it has the Jordan decomposition ν = ν+ – ν–, where at least one of the measures ν+, ν– is finite and the other σ-finite. Using the theorem for σ-finite measures, write ν+ = ν+,1 + ν+,2 and ν– = ν–,1 + ν–,2 where ν+,1, ν–,1 ≪ μ and ν+,2, ν–,2 ⊥ μ. If, for instance, ν– is finite, then so are the measures ν–,1, ν–,2, and hence ν = (ν+,1 – ν–,1) + (ν+,2 – ν–,2) = ν1 + ν2 with ν1 = ν+,1 – ν–,1 ≪ μ and ν2 = ν+,2 – ν–,2 ⊥ μ (Ex. 5.11).

Thus existence of the Lebesgue decomposition follows when ν is a σ-finite signed measure. To show uniqueness, suppose first that ν is a σ-finite measure and ν = ν1 + ν2 = ν3 + ν4 where ν1, ν3 ≪ μ and ν2, ν4 ⊥ μ. Since both μ and ν are σ-finite we again write X = ∪∞n=1 Xn where Xn are disjoint measurable sets with both μ(Xn), ν(Xn) finite. For each n = 1, 2, . . . define the finite measures μ(n), ν(n)i, i = 1, 2, 3, 4, by μ(n)(E) = μ(E ∩ Xn) and ν(n)i(E) = νi(E ∩ Xn) for all E ∈ S. Then clearly

ν(n)1 + ν(n)2 = ν(n)3 + ν(n)4; ν(n)1, ν(n)3 ≪ μ(n); ν(n)2, ν(n)4 ⊥ μ(n).

By the uniqueness part of Lemma 5.5.1, ν(n)1 = ν(n)3 and ν(n)2 = ν(n)4 for all n = 1, 2, . . . , so that

ν1 = ∑n ν(n)1 = ∑n ν(n)3 = ν3

and similarly ν2 = ν4. Thus uniqueness follows when ν is a σ-finite measure. If ν is a σ-finite signed measure with two decompositions ν1 + ν2 = ν3 + ν4, uniqueness follows by using the Jordan decomposition for each νi, rearranging the equation so that each side is positive, and applying the result for measures. �

We now prove the general form of the Radon–Nikodym Theorem.

Theorem 5.5.3 (Radon–Nikodym Theorem) Let (X,S, μ) be a σ-finite measure space and ν a σ-finite signed measure on S. If ν ≪ μ then there is an essentially unique finite-valued measurable function f on X such that for all E ∈ S,

ν(E) = ∫E f dμ.

f is μ-integrable if and only if ν is finite. In general at least one of f+, f– is μ-integrable, according as ν+ or ν– is finite. If ν is a measure then f is nonnegative.

Proof The existence of f follows from Lemma 5.5.1 if μ, ν are finite measures. For by the uniqueness of the Lebesgue decomposition of ν = ν1 + ν2 = ν + 0 (regarding zero as a measure) we must have ν1 = ν and thus ν(E) = ν1(E) = ∫E f dμ, E ∈ S, for some nonnegative μ-integrable f which (by Theorem 4.4.2 (iv)) may be taken to be finite-valued.

Assume now that μ, ν are σ-finite measures. As in previous proofs write X = ∪∞n=1 Xn where Xn are disjoint measurable sets with μ(Xn) < ∞, ν(Xn) < ∞, and define μ(n)(E) = μ(E ∩ Xn), ν(n)(E) = ν(E ∩ Xn). Then μ(n), ν(n) are finite measures on S with ν(n) ≪ μ(n), and by the result just shown for finite measures, ν(n)(E) = ∫E fn dμ(n), all E ∈ S, for some nonnegative, finite-valued, measurable fn. Thus (using Ex. 4.9)

ν(E ∩ Xn) = ν(n)(E) = ∫ χE fn dμ(n) = ∫Xn χE fn dμ = ∫ χEχXn fn dμ.

Hence, writing f = ∑∞n=1 χXn fn and using monotone convergence,

ν(E) = ∑∞n=1 ν(E ∩ Xn) = ∑∞n=1 ∫ χEχXn fn dμ = ∫ χE f dμ = ∫E f dμ.

f is a nonnegative measurable function and is finite-valued (Xn are disjoint and thus f(x) = fn(x) on each Xn). Thus the existence of f follows when μ, ν are σ-finite measures.

When ν is a σ-finite signed measure, it has Jordan decomposition ν = ν+ – ν–, where at least one of the measures ν+, ν– is finite and the other σ-finite. Using the results just shown for finite and σ-finite measures we have ν+(E) = ∫E f+ dμ, ν–(E) = ∫E f– dμ, E ∈ S, for some nonnegative finite-valued measurable functions f+, f–, at least one of which is μ-integrable. Notice that if X = A ∪ B is a Hahn decomposition of X for ν, ν+(B) = 0 = ν–(A) and thus we may take f+ = 0 on B and f– = 0 on A. Then clearly


ν(E) = ∫E f dμ, all E ∈ S, where f = f+ – f– (and f+, f– are the positive and negative parts of f) has all properties stated in the theorem.

Thus the existence of f is shown. To show its essential uniqueness, let g be another function with the same properties as f. Write X = ∪∞n=1 Xn, where Xn are disjoint measurable sets with μ(Xn) and ν(Xn) finite. Then for each fixed n,

ν(n)(E) = ν(E ∩ Xn) = ∫E fχXn dμ = ∫E gχXn dμ for all E ∈ S.

Since ν(n) is a finite signed measure, fχXn and gχXn are μ-integrable (see Theorem 5.2.3 and the discussion following its proof) and by Theorem 4.4.8 (Corollary), fχXn = gχXn a.e. (μ) for all n. Thus f = g a.e. (μ) on X. It follows that f is essentially unique and the proof of the theorem is complete. �

The following result provides an informative equivalent definition of absolute continuity for finite signed measures. This may be given a straightforward direct proof, but as shown here it follows neatly as a corollary to the above theorem, from the result for the indefinite integral of an L1-function shown in Theorem 4.5.3.

Corollary Let (X,S, μ) be a σ-finite measure space and ν a finite signed measure on S. Then ν ≪ μ if and only if given any ε > 0 there exists δ = δ(ε) > 0 such that |ν(E)| < ε whenever E ∈ S and μ(E) < δ.

Proof If the stated condition holds and μ(E) = 0, then |ν(E)| < ε for any ε > 0 and thus ν(E) = 0, i.e. ν ≪ μ. Conversely, a finite signed measure ν with ν ≪ μ may be written as ν(E) = ∫E f dμ for some f ∈ L1 by the theorem, and hence the result just restates Theorem 4.5.3. �

The Lebesgue decomposition and Radon–Nikodym Theorem may be combined into the following single statement which provides a useful representation of a measure in terms of another. This generalizes the more limited statement of Lemma 5.5.1.

Theorem 5.5.4 (Lebesgue–Radon–Nikodym Theorem) Let (X,S, μ) be a σ-finite measure space and ν a σ-finite signed measure on S. Then there exist two uniquely determined σ-finite signed measures ν1 and ν2 such that

ν = ν1 + ν2, ν1 ≪ μ, ν2 ⊥ μ,

and an essentially unique finite-valued measurable function f on X such that f+ or f– is μ-integrable and for all E ∈ S,

ν1(E) = ∫E f dμ.


Thus for some E0 ∈ S with μ(E0) = 0 we have for all E ∈ S,

ν(E) = ∫E f dμ + ν2(E ∩ E0) = ∫E f dμ + ν(E ∩ E0)

since μ(E0) = 0 ⇒ ν1(E ∩ E0) = 0. f is μ-integrable if and only if ν1 is finite. ν ≪ μ if and only if ν(E0) = 0. If ν is a measure so are ν1, ν2 and f is nonnegative.

Note that both the Lebesgue decomposition theorem and the Radon–Nikodym Theorem may fail in the absence of σ-finiteness. For a simple example see Ex. 5.20.

5.6 Derivatives of measures

If μ is a σ-finite measure and ν a σ-finite signed measure on (X,S) such that ν ≪ μ, then the function f appearing in the relation ν(E) = ∫E f dμ is called the Radon–Nikodym derivative of ν with respect to μ, and written dν/dμ. It is not defined uniquely for every point x, since any measurable g equal to f a.e. (μ) will satisfy ν(E) = ∫E g dμ for all E ∈ S. However, dν/dμ is essentially unique, in the sense already described. (f and g may be regarded as “versions” of dν/dμ.)

An important use of the Radon–Nikodym Theorem concerns a change of measure in an integral. If μ, ν are two σ-finite measures, and if ν ≪ μ, the following result shows that ∫ f dν = ∫ f (dν/dμ) dμ (as if the dμ were cancelled). This and other properties of the Radon–Nikodym derivative justify the quite suggestive symbol used to denote it.

Theorem 5.6.1 Let μ, ν be σ-finite measures on the measurable space (X,S), with ν ≪ μ. If f is a measurable function defined on X and is either nonnegative or ν-integrable, then

∫ f dν = ∫ f (dν/dμ) dμ.

Proof Write dν/dμ = g. If E ∈ S then ∫ χE g dμ = ∫E g dμ = ν(E) = ∫ χE dν. Thus the desired result holds whenever f is the indicator function of a measurable set E. Hence, it also holds for a nonnegative simple function f and, by monotone convergence, for a nonnegative measurable function f (in the usual way, let fn be an increasing sequence of nonnegative simple functions converging to f at each point x. Note that g ≥ 0 a.e. (μ), hence fng increases to fg a.e. and thus Theorem 4.5.2 applies). Finally, by expressing any ν-integrable f as f+ – f– we see that the result holds for such an f also. �


A comment on the requirement that f be defined for all x may be helpful. If f ∈ L1(X,S, ν), the set where f is not defined has ν-measure zero, but not necessarily zero μ-measure. However, the result is true if f is defined a.e. (μ). It is, indeed, true if f ∈ L1(X,S, ν) even if f is not defined a.e. (μ), provided the definition of f is extended in any way (preserving measurability) to all or almost all (μ-measure) points x. (See Ex. 5.21.)

Theorem 5.6.1 expresses the integral with respect to ν as an integral with respect to μ when ν ≪ μ. If moreover μ{x : dν/dμ = 0} = 0 then μ ≪ ν, so that μ ∼ ν. For if f = dν/dμ then

∫E (1/f) dν = ∫E (1/f)(dν/dμ) dμ = μ(E)

so that μ ≪ ν and dμ/dν = (dν/dμ)–1 a.e. (ν). Hence μ-integrals can be expressed as ν-integrals as well (see Ex. 5.18). In general (when no absolute continuity assumptions are made) one can still express ν-integrals in terms of μ-integrals and a “remainder” term. This is an immediate corollary of the Lebesgue–Radon–Nikodym Theorem 5.5.4, the change of measure rule of Theorem 5.6.1 and Ex. 4.9.

Corollary Let μ, ν, f and E0 be as in Theorem 5.5.4 (μ(E0) = 0). If g is a measurable function defined on X, and either nonnegative or ν-integrable, then

∫ g dν = ∫ gf dμ + ∫E0 g dν.

Radon–Nikodym derivatives may in some ways be manipulated like ordinary derivatives of functions. For example it is obvious that d(λ+ν)/dμ = dλ/dμ + dν/dμ a.e. (μ) if λ ≪ μ and ν ≪ μ. A “chain rule” also follows as a corollary of the previous theorem.

Theorem 5.6.2 Let μ, ν be σ-finite measures on the measurable space (X,S) and λ a σ-finite signed measure on S. Then if λ ≪ ν ≪ μ,

dλ/dμ = (dλ/dν) · (dν/dμ) a.e. (μ).

Proof Assume that λ is a measure (the signed measure case can be obtained from this by the Jordan decomposition). For each E ∈ S,

∫E (dλ/dμ) dμ = λ(E) = ∫E (dλ/dν) dν = ∫E (dλ/dν) · (dν/dμ) dμ

by Theorem 5.6.1. Now the essential uniqueness of the Radon–Nikodym derivative (Theorem 5.5.3) implies that dλ/dμ = (dλ/dν) · (dν/dμ) a.e. (μ). �


5.7 Real line applications

This section concerns some applications of the previous results to the real line, as well as some further results valid only on the real line. As usual R will denote the real line, B the Borel sets of R, and m Lebesgue measure on B.

We begin with a refinement of the Lebesgue decomposition for a Lebesgue–Stieltjes measure with respect to Lebesgue measure. A measure ν on B is called discrete or atomic if there is a countable set C such that ν(Cc) = 0, i.e. if the measure ν has all its mass concentrated on a countable set of points. This means, if ν ≠ 0, that ν({x}) > 0 for some (or all) x ∈ C. Since countable sets have zero Lebesgue measure, discrete measures are singular with respect to Lebesgue measure. Recall that a measure ν on B is a Lebesgue–Stieltjes measure if and only if ν{(a, b]} < ∞ for all –∞ < a < b < ∞, or equivalently if and only if ν = μF, the Lebesgue–Stieltjes measure corresponding to a finite-valued, nondecreasing, right-continuous function F on R (Theorem 2.8.1). Since such a measure ν is σ-finite it has, by Theorem 5.5.2, a Lebesgue decomposition with respect to Lebesgue measure m which we will here write as ν = ν0 + ν1, where ν0 ⊥ m and ν1 ≪ m. It will be shown that the singular part ν0 of ν may be further decomposed into two parts, one of which is discrete and the other of which is singular with respect to m and has no mass “at any one point”, i.e. has no atoms.

Theorem 5.7.1 If ν is a Lebesgue–Stieltjes measure on B, then there are three uniquely determined measures ν1, ν2, ν3 on B such that ν = ν1 + ν2 + ν3 and such that ν1 ≪ m, ν2 is discrete, and ν3 ⊥ m with ν3({x}) = 0 for all x ∈ R.

Proof As noted above we may write ν = ν0 + ν1 where ν0 ⊥ m and ν1 ≪ m. Now let C = {x : ν0({x}) > 0}. Then since ν0({x}) ≤ ν({x}) for each x and the atoms of ν are countable (Lemma 2.8.2), it follows that C is a countable set. Write ν2(B) = ν0(B ∩ C), ν3(B) = ν0(B ∩ Cc) for B ∈ B. Then ν0 = ν2 + ν3 and hence ν = ν1 + ν2 + ν3. Now ν2 is discrete since ν2(Cc) = 0; and ν3 ⊥ m since ν0 ⊥ m implies ν0(G) = m(Gc) = 0 for some G, and hence ν3(G) ≤ ν0(G) = 0 = m(Gc). Further, for any x ∈ R, by definition of C,

    ν3({x}) = ν0({x} ∩ Cc) = ν0(∅) = 0 if x ∈ C,  and  ν3({x}) = ν0({x}) = 0 if x ∉ C.

To prove uniqueness suppose that ν = ν1 + ν2 + ν3 = ν′1 + ν′2 + ν′3, where ν′i has the same properties as νi. Since (ν2 + ν3) and (ν′2 + ν′3) are both singular with respect to m, the uniqueness of the Lebesgue decomposition gives ν1 = ν′1 and ν2 + ν3 = ν′2 + ν′3 = ν0, say. Then clearly there is a countable set C such that ν2(Cc) = ν′2(Cc) = 0 (the union of the countable sets supporting ν2 and ν′2), so that for B ∈ B,

    ν2(B) = ν2(B ∩ C) = Σ_{x∈B∩C} ν2({x}) = Σ_{x∈B∩C} ν0({x}).

Similarly this is also ν′2(B), so that ν2 = ν′2 and ν3 = ν′3. □

ν1 is called the absolutely continuous part of ν, ν2 the discrete singular part of ν (usually called just the "discrete part"), and ν3 the continuous singular part of ν (usually called just the "singular part"). From Theorem 5.7.1 we can obtain a corresponding decomposition of F if ν = μF, and thus of any nondecreasing right-continuous function F. Before stating this decomposition the following terminology is needed.

Let F be a nondecreasing right-continuous function defined on R and μF its corresponding Lebesgue–Stieltjes measure. If μF ≪ m, F is said to be absolutely continuous with density function f = dμF/dm. Since μF{(a, b]} < ∞ for all –∞ < a < b < ∞, it follows from the Radon–Nikodym Theorem that f ∈ L1(a, b) and that

    F(b) – F(a) = μF{(a, b]} = ∫_(a,b] f(t) dt = ∫_a^b f(t) dt.

Thus for each a and all x,

    F(x) = F(a) + ∫_a^x f(t) dt

where we write ∫_a^x f(t) dt = –∫_x^a f(t) dt when x < a. Also by Theorem 5.6.1,

    ∫ g(x) dF(x) = ∫ g(x)f(x) dx

whenever g is a nonnegative measurable function on R or is μF-integrable.

If F is continuous and μF ⊥ m, F is said to be (continuous) singular. Recall that F is continuous if and only if μF({x}) = 0 for all x ∈ R. Thus "F is singular" means that μF ⊥ m and μF({x}) = 0 for all x ∈ R.
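The relation F(b) – F(a) = ∫_a^b f(t) dt can be checked numerically in a standard hypothetical example (not from the text): F(x) = 1 – e^{–x}, whose density is f(t) = e^{–t}.

```python
# Hypothetical check that an absolutely continuous F satisfies
# F(b) - F(a) = ∫_a^b f(t) dt with f = dμF/dm.
# Example (an assumption for illustration): F(x) = 1 - e^{-x}, f(t) = e^{-t}.
import math

def integrate(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt

F = lambda x: 1.0 - math.exp(-x)   # the distribution function
f = lambda t: math.exp(-t)         # its density

for a, b in [(0.0, 1.0), (0.5, 2.5), (1.0, 4.0)]:
    assert abs((F(b) - F(a)) - integrate(f, a, b)) < 1e-6
print("F(b) - F(a) = integral of f over (a, b) verified")
```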

If μF is atomic (discrete), F is called discrete. Then μF(Cc) = 0 for some countable set C = {xn}, n = 1, 2, . . . , and for –∞ < a < b < ∞,

    F(b) – F(a) = μF{(a, b]} = μF{(a, b] ∩ C} = Σ_{a<xn≤b} μF({xn}).

Thus if pn = μF({xn}), then F(x) = F(a) + Σ_{a<xn≤x} pn for all x ≥ a. Note that if the xn may be put in increasing order of size, F(x) may be usefully visualized as an increasing "step" function. This is not possible if there is no ordering by size (such as for the countable set {rn} of rational numbers).


Corollary to Theorem 5.7.1 Every nondecreasing and right-continuous function F defined on R has a decomposition

    F(x) = F1(x) + F2(x) + F3(x), x ∈ R,

where F1, F2, F3 are nondecreasing and right-continuous, F1 is absolutely continuous, F2 is discrete, and F3 is singular. Each of F1, F2, F3 is unique up to an additive constant. F has at most countably many discontinuities, arising solely from possible jumps in the discrete component F2.

Proof Let μF = ν1 + ν2 + ν3 be the decomposition of μF into its three components. Write Fi(x) = νi{(0, x]} for x ≥ 0, and –νi{(x, 0]} for x < 0 (as in the proof of Theorem 2.8.1). Then the corollary follows immediately from Theorem 5.7.1 by noting that F(x) – F(0) = F1(x) + F2(x) + F3(x) and by adding the constant F(0) to any one of the Fi's. Since each νi (i = 1, 2, 3) is unique, each Fi is unique up to an additive constant by Theorem 2.8.1.

Lemma 2.8.2 showed that F has at most countably many (jump) discontinuities. This also follows from the above decomposition, since the absolutely continuous and singular components of a Lebesgue–Stieltjes measure have no atoms; hence the only atoms arise from the discrete component. □

We introduced the notion of an absolutely continuous nondecreasing function F defined on R (or on [a, b]) and showed that for any –∞ < a < b < ∞, there exists an essentially unique nonnegative function f ∈ L1(a, b) such that for all a ≤ x ≤ b

    F(x) = F(a) + ∫_a^x f(t) dt.

This definition can be extended by allowing f to take negative as well as positive values, but still of course requiring f ∈ L1(a, b). The resulting functions F are also said to be absolutely continuous. As will be seen later in this section, the set function μF{(x, y]} = F(y) – F(x), a ≤ x < y ≤ b, can be extended to a finite signed (Lebesgue–Stieltjes) measure on B[a, b] which is such that μF ≪ m with dμF/dm = f. This property justifies the terminology used. F is also clearly continuous (an immediate application of dominated convergence), and in fact is differentiable with derivative f a.e. (Theorem 5.7.3).

This a.e. differentiability suggests that it should be possible to use an absolutely continuous function F for substitution of variables in integration, i.e. to evaluate ∫ g(x) dx as ∫ g(F(t))f(t) dt (formally writing x = F(t) and regarding f(t) as the derivative F′(t)). This is readily seen to be true for nondecreasing (absolutely continuous) F, for which it is simply checked (Ex. 2.19) that μF F⁻¹ = m, Lebesgue measure, and hence by Theorem 4.6.1, for appropriate functions g,

    ∫ g(y) dy = ∫ g dμF F⁻¹ = ∫ (g ∘ F) dμF = ∫ (g ∘ F)(dμF/dm) dm = ∫ g(F(x))f(x) dx

by Theorem 5.6.1.

When F is not monotone the proof still relies on the above simple argument but requires the splitting of the interval of integration into parts, as seen in the figure in Theorem 5.7.2. The proof is straightforward but more tedious, and is given here for reference.

Theorem 5.7.2 Let F be an absolutely continuous function on [a, b], –∞ < a < b < +∞, with F(x) = F(a) + ∫_a^x f(t) dt, f ∈ L1(a, b), and g a Borel measurable function defined on R. If g(F(t))f(t) ∈ L1(a, b) then g(x) ∈ L1(F(a), F(b)) or g(x) ∈ L1(F(b), F(a)) according as F(a) < F(b) or F(b) < F(a) respectively, and

    ∫_{F(a)}^{F(b)} g(x) dx = ∫_a^b g(F(t))f(t) dt

(where ∫_α^β g(x) dx = –∫_β^α g(x) dx for β < α).

Proof For E ∈ B denote the Borel subsets of E by B(E) and write m for Lebesgue measure on B(E). Define ν for E ∈ B(a, b) by

    ν(E) = ∫_E f(t) dt.

Since f ∈ L1(a, b), ν is a finite signed measure by Theorem 5.2.3. Also ν ≪ m and, by the Radon–Nikodym Theorem 5.5.3, dν/dm = f.

Consider the function F as a transformation from ((a, b), B(a, b), ν) into (R, B). Since F is continuous it is measurable (Ex. 3.10) and induces the signed measure νF⁻¹ on B. We will show that if F(a) < F(b) then

    νF⁻¹(B) = m{B ∩ (F(a), F(b))} for all B ∈ B,    (5.1)

and if F(b) < F(a) then

    νF⁻¹(B) = –m{B ∩ (F(b), F(a))} for all B ∈ B.    (5.2)

(For F nondecreasing this was shown in Ex. 2.19.)

Let m, M be the minimum and maximum values of (the continuous function) F on [a, b]. Assume first that F(a) < F(b). Let I = (c, d) be an open interval of R. Since F is continuous, F⁻¹I is an open subset of (a, b) and as such it may be written as a countable union of open intervals; these are facts of elementary real line topology. Clearly F⁻¹I is nonempty if and only if I ∩ [m, M] is nonempty, and this is henceforth assumed without loss of generality.

Consider first the case where I contains neither F(a) nor F(b), i.e. I either is a subset of or is disjoint from (F(a), F(b)). Then, by the continuity of F, open intervals J = (α, β) can be found such that F(x) ∈ I = (c, d) for all x ∈ J, and

    F(α) = c, F(β) = d or F(α) = d, F(β) = c  (interval of type 1)
    or F(α) = F(β) = c                        (interval of type 2)
    or F(α) = F(β) = d                        (interval of type 3)

(see figure below). It follows that F⁻¹I = ∪_k J1k ∪_p J2p ∪_q J3q, where for each i = 1, 2, 3 the Jik, k = 1, 2, 3, . . . , are the distinct intervals of type i. Since ν(Jik) = ∫_{Jik} f(t) dt, which is d – c = m(I) or c – d = –m(I) for i = 1 and is zero for i = 2, 3,

    (νF⁻¹)(I) = ν(F⁻¹I) = Σ_k ν(J1k).

Also |ν(J1k)| = m(I) for all k, which implies |ν|(J1k) ≥ |ν(J1k)| = m(I). However, since ν is a finite signed measure, |ν| is finite and Σ_k |ν|(J1k) = |ν|(∪_k J1k) < ∞; it follows that the number of nonempty J1k's is finite. They may therefore be ordered as {J11, J12, . . . , J1s}.

Now it is quite clear from the continuity of F that ν(J1k) + ν(J1,k+1) = 0, since if ν(J1k) = m(I) then F is "increasing overall" on J1k, hence overall decreasing on the next interval J1,k+1, and thus ν(J1,k+1) = –m(I); similarly if ν(J1k) = –m(I) then ν(J1,k+1) = m(I). Since (νF⁻¹)(I) = ν(J11) + · · · + ν(J1s), it follows that (νF⁻¹)(I) = 0 when s is even, and (νF⁻¹)(I) = m(I) when s is odd. If I ⊂ (F(a), F(b)) it is clear that s is odd and thus (νF⁻¹)(I) = m(I). On the other hand if I ⊂ R – (F(a), F(b)) then s is even and (νF⁻¹)(I) = 0. In either case (νF⁻¹)(I) = m{I ∩ (F(a), F(b))}.

Now consider the case where I contains F(b) but not F(a). We can then write

    F⁻¹I = ∪_k J1k ∪_p J2p ∪_q J3q ∪ (b′, b)

where (b′, b) is disjoint from all intervals Jik and F(b′) = c. It is again clear that the number s of nonempty J1k is even, and thus

    (νF⁻¹)(I) = ν{(b′, b)} = ∫_{b′}^b f(t) dt = F(b) – F(b′) = F(b) – c = m{I ∩ (F(a), F(b))}.

The same result is obtained similarly when I contains F(a) but not F(b).


It then follows that for every open interval I in R

    (νF⁻¹)(I) = m{I ∩ (F(a), F(b))}.

Hence the same is true for semiclosed intervals and then, by Lemma 5.2.4, for all Borel sets. Thus (5.1) is established, i.e. νF⁻¹ is Lebesgue measure m on the Borel subsets of (F(a), F(b)), and the zero measure on the Borel subsets of R – (F(a), F(b)). Similarly, when F(b) < F(a), (5.2) is established, i.e. νF⁻¹ is negative Lebesgue measure (–m) on the Borel subsets of (F(b), F(a)) and the zero measure on the Borel subsets of R – (F(b), F(a)).

Now g(F(t))f(t) ∈ L1(a, b) implies ∫_a^b |g(F(t))||f(t)| dt < ∞. By the discussion following Theorem 5.2.3 we have |ν|(E) = ∫_E |f(t)| dt, and thus by the Radon–Nikodym Theorem and Theorem 5.6.1, ∫_a^b |g(F(t))| d|ν|(t) < ∞, i.e. g ∘ F ∈ L1(|ν|). Hence g ∈ L1(|ν|F⁻¹), i.e. g ∈ L1(F(a), F(b)) if F(a) < F(b), g ∈ L1(F(b), F(a)) if F(b) < F(a), and by the transformation theorem for signed measures (Theorem 5.3.2),

    ∫_{–∞}^{∞} g dνF⁻¹ = ∫_a^b (g ∘ F) dν = ∫_a^b g(F(t))f(t) dt

by the Radon–Nikodym Theorem and Theorem 5.6.1. Also, by what has been shown, when F(a) < F(b),

    ∫_{–∞}^{∞} g dνF⁻¹ = ∫_{F(a)}^{F(b)} g(x) dx.


When F(b) < F(a),

    ∫_{–∞}^{∞} g dνF⁻¹ = –∫_{F(b)}^{F(a)} g(x) dx = ∫_{F(a)}^{F(b)} g(x) dx,

and hence

    ∫_{F(a)}^{F(b)} g(x) dx = ∫_a^b g(F(t))f(t) dt

in all cases, which completes the proof of the theorem. □
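The change-of-variables formula of Theorem 5.7.2 can be checked numerically for a non-monotone F; the following sketch uses the hypothetical choices F(t) = sin t on [0, 3π/2] (so f(t) = cos t changes sign) and g(x) = x², for which both sides equal –1/3.

```python
# Hypothetical numerical check of ∫_{F(a)}^{F(b)} g(x) dx = ∫_a^b g(F(t)) f(t) dt
# with a non-monotone F.  F, f, g below are illustrative assumptions.
import math

def integrate(h, a, b, n=20000):
    """Midpoint-rule approximation of the integral of h from a to b
    (returns the negated integral when b < a, matching the convention
    in Theorem 5.7.2)."""
    dt = (b - a) / n
    return sum(h(a + (i + 0.5) * dt) for i in range(n)) * dt

a, b = 0.0, 1.5 * math.pi
F, f = math.sin, math.cos       # F(t) = sin t, f = F' = cos t
g = lambda x: x ** 2

lhs = integrate(g, F(a), F(b))                    # ∫ from F(a)=0 to F(b)=-1
rhs = integrate(lambda t: g(F(t)) * f(t), a, b)   # ∫_a^b g(F(t)) f(t) dt
assert abs(lhs - rhs) < 1e-6
print(lhs, rhs)   # both ≈ -1/3
```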

Absolutely continuous functions have many important properties, some of which we now state. Their proofs may be found in standard texts on real analysis. First, there is an equivalent definition of absolute continuity more in line with the definition of continuity (in fact of uniform continuity), as follows. A function F is absolutely continuous on [a, b] if and only if for every ε > 0 there is a δ = δ(ε) > 0 such that

    Σ_{i=1}^n |F(x′i) – F(xi)| < ε

for every finite collection {(xi, x′i)}, i = 1, . . . , n, of disjoint intervals in [a, b] with Σ_{i=1}^n |x′i – xi| < δ. An important property of absolutely continuous functions is their differentiability a.e.

Theorem 5.7.3 Every absolutely continuous function is differentiable a.e. (m). In particular if F is absolutely continuous on [a, b] and F(x) = F(a) + ∫_a^x f(t) dt, a ≤ x ≤ b, f ∈ L1(a, b), then F′(x) = f(x) a.e. (m) on [a, b]. If moreover f is continuous, then F′(x) = f(x) for all a ≤ x ≤ b.

This property makes precise the sense in which integration is the inverse of differentiation, and vice versa. Thus if f ∈ L1(a, b) we have

    (d/dx) ∫_a^x f(t) dt = f(x) a.e. (m),

and if F is absolutely continuous on [a, b], then

    ∫_a^b F′(t) dt = F(b) – F(a).
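The first of these relations can be illustrated numerically with a continuous integrand; the example below is a hypothetical choice (f(t) = cos t, so that ∫_0^x f = sin x) and approximates the derivative by a central difference quotient.

```python
# Hypothetical numerical illustration of (d/dx) ∫_a^x f(t) dt = f(x) for a
# continuous f.  Here f(t) = cos t (illustrative assumption), so the
# indefinite integral F(x) = ∫_0^x cos t dt equals sin x.
import math

def F(x, n=20000):
    """Midpoint-rule approximation of ∫_0^x cos t dt."""
    dt = x / n
    return sum(math.cos((i + 0.5) * dt) for i in range(n)) * dt

h = 1e-4
for x in (0.3, 1.0, 2.0):
    deriv = (F(x + h) - F(x - h)) / (2 * h)   # central difference quotient
    assert abs(deriv - math.cos(x)) < 1e-6
print("F'(x) = f(x) verified at sample points")
```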

A further important class of functions is that of functions of bounded variation. A real-valued function F defined on [a, b], –∞ < a < b < +∞, is said to be of bounded variation if it is the difference of two nondecreasing functions defined on [a, b] (the term "bounded variation" will be justified below and in Ex. 5.26). Since nondecreasing functions have at most a countable number of points of discontinuity (which must be jumps), the same is true for functions of bounded variation. Hence it is easily seen that if the function F of bounded variation is right-continuous, then F = F1 – F2 where the functions F1 and F2 are nondecreasing and may be taken to be both right-continuous, e.g. by replacing F1(x), F2(x) by F1(x + 0), F2(x + 0) – cf. Ex. 5.27. The relationship between nondecreasing functions and (Lebesgue–Stieltjes) measures given in Theorem 2.8.1 provides a corresponding relationship between functions of bounded variation and signed measures.

Theorem 5.7.4 (i) If F is a right-continuous function of bounded variation on [a, b], –∞ < a < b < +∞, then there is a unique finite signed measure μF on the Borel subsets of (a, b] such that μF{(x, y]} = F(y) – F(x) whenever a ≤ x < y ≤ b.

(ii) Conversely, if ν is a finite signed measure on the Borel subsets of (a, b], –∞ < a < b < +∞, then there exists a right-continuous function F of bounded variation on [a, b] such that ν = μF. F is unique up to an additive constant.

Proof (i) Let F = F1 – F2, where F1 and F2 are nondecreasing and right-continuous functions on [a, b]. Let μF1 and μF2 be the Lebesgue–Stieltjes measures corresponding to F1 and F2, and define μF = μF1 – μF2. Clearly μF is a finite signed measure on the Borel subsets of (a, b] and, whenever a ≤ x < y ≤ b,

    μF{(x, y]} = μF1{(x, y]} – μF2{(x, y]}
               = F1(y) – F1(x) – {F2(y) – F2(x)}
               = {F1(y) – F2(y)} – {F1(x) – F2(x)} = F(y) – F(x).

Hence μF{(x, y]} depends on F but not on its particular representation as F1 – F2. The uniqueness of μF now follows from the fact that if two finite signed measures ν1, ν2 agree on the semiring P(a, b] of intervals (x, y], a ≤ x ≤ y ≤ b, then they agree on B(a, b] = S(P(a, b]) (Lemma 5.2.4).

(ii) Conversely, if ν is a finite signed measure on B(a, b], let ν = ν+ – ν– be its Jordan decomposition and define F1(x) = ν+(a, x], F2(x) = ν–(a, x], a ≤ x ≤ b. Clearly F1 and F2 are nondecreasing and right-continuous, and if F = F1 – F2, then F is a right-continuous function of bounded variation on [a, b]. Clearly μF and ν are equal on P(a, b] and hence also on B(a, b] (Lemma 5.2.4), i.e. ν = μF. Finally if G is another right-continuous function of bounded variation such that μG = ν = μF, we have for all a ≤ x ≤ b, G(x) – G(a) = μG(a, x] = μF(a, x] = F(x) – F(a). Hence G(x) = F(x) + G(a) – F(a), which shows that F is unique up to an additive constant. □

If F is a right-continuous function of bounded variation on [a, b] and g a Borel measurable function such that the integral ∫_(a,b] g dμF is defined, we write

    ∫_(a,b] g(x) dF(x) = ∫_(a,b] g dF = ∫_(a,b] g dμF,

and thus define the Lebesgue–Stieltjes integral ∫_(a,b] g dF as ∫_(a,b] g dμF.


Absolutely continuous functions on [a, b] are of bounded variation, and in fact their Lebesgue–Stieltjes signed measures are absolutely continuous with respect to Lebesgue measure. Indeed if F is absolutely continuous on [a, b] then F(x) = F(a) + ∫_a^x f(t) dt, a ≤ x ≤ b, f ∈ L1[a, b]. Writing f = f+ – f– gives

    F(x) = F(a) + ∫_a^x f+(t) dt – ∫_a^x f–(t) dt.

Since f+(t) ≥ 0 and f–(t) ≥ 0, their integrals are nondecreasing functions of x and thus F is of bounded variation. Clearly whenever a ≤ x ≤ y ≤ b,

    μF{(x, y]} = F(y) – F(x) = ∫_x^y f(t) dt = ∫_(x,y] f(t) dt

and hence

    μF(B) = ∫_B f(t) dt for all B ∈ B(a, b]

since the two finite signed measures agree on P(a, b]. Thus μF ≪ m and dμF/dm = f.

We finally mention that, as shown in Ex. 5.26, a function F is of bounded variation on [a, b] if and only if

    sup Σ_{n=1}^N |F(xn) – F(xn–1)| < ∞,

where the supremum is taken over all N and all subdivisions a = x0 < x1 < · · · < xN = b. This justifies the use of the term bounded variation; in fact the sup is called the total variation of F on [a, b].

One can similarly consider functions F of bounded variation on R, in which case the corresponding Lebesgue–Stieltjes measure μF is a finite signed measure on B.
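The supremum defining the total variation can be approximated over finite subdivisions; the sketch below uses the hypothetical choice F(x) = sin x on [0, 3π/2], which rises by 1 and then falls by 2, so its total variation is 3.

```python
# Hypothetical numerical illustration of total variation: partition sums
# Σ |F(xn) - F(xn-1)| increase toward the total variation as the
# subdivision is refined.  F(x) = sin x on [0, 3π/2] is an illustrative
# choice with total variation 1 + 2 = 3.
import math

def variation(F, a, b, N):
    """Partition sum of |F(xn) - F(xn-1)| over the uniform N-interval
    subdivision of [a, b]."""
    xs = [a + (b - a) * n / N for n in range(N + 1)]
    return sum(abs(F(xs[n]) - F(xs[n - 1])) for n in range(1, N + 1))

a, b = 0.0, 1.5 * math.pi
v = [variation(math.sin, a, b, N) for N in (2, 10, 1000)]
assert v[0] <= v[1] <= v[2] <= 3.0 + 1e-9   # all sums bounded by V = 3
assert abs(v[2] - 3.0) < 1e-3               # fine subdivisions approach V
print(v)
```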

Exercises

5.1 Give an example of a signed measure μ on a measurable space (X,S) for which there is a measurable set E with μ(E) = 0 and a measurable subset F of E with μ(F) > 0.

5.2 If μi are measures define μ(E) = Σ_{i=1}^∞ μi(E). Is μ a measure? If the μi are finite, is μ necessarily either finite or σ-finite? If each μi is a finite signed measure, is μ a signed measure?

5.3 If ν is a finite signed measure on the measurable space (X,S), show that there exists a finite constant M such that |ν(E)| ≤ M for all E ∈ S.

5.4 If λ, ν are finite signed measures, show that so is aλ + bν, where a, b are real numbers. If λ, ν are signed measures, show that so is aλ + bν provided that ab > 0 if λ and ν assume the same infinite value, and ab < 0 if one of λ, ν assumes the value +∞ and the other –∞.


5.5 If λ and ν are finite signed measures, or signed measures assuming the same infinite value +∞ or –∞ (if at all), show that |λ + ν| ≤ |λ| + |ν|, i.e. that for each measurable set E,

    |λ + ν|(E) ≤ |λ|(E) + |ν|(E).

5.6 Let μ be a signed measure on (X,S) and μ = μ+ – μ– its Jordan decomposition. (i) Show that μ+ ⊥ μ– and that (μ+, μ–) is the unique pair of singular measures on S whose difference is μ (this is a uniqueness property of the Jordan decomposition). (ii) If μ = λ1 – λ2 where λ1, λ2 are measures on S, show that

    μ+ ≤ λ1 and μ– ≤ λ2

(this is a "minimal property" of the Jordan decomposition).

5.7 Let μ be a finite signed measure on a measurable space (X,S). Show that for all E ∈ S,

    |μ|(E) = sup Σ_{i=1}^n |μ(Ei)|

where the sup is taken over all finite partitions of E into disjoint measurable sets Ei, E = ∪_{i=1}^n Ei, and also

    |μ|(E) = sup |∫_E f dμ|

where the sup is taken over all measurable functions f such that |f| ≤ 1 a.e. (|μ|) on X.

5.8 Let (X,S, μ) be a measure space and ν a signed measure on S. Show that ν ⊥ μ if and only if both ν+ ⊥ μ and ν– ⊥ μ.

5.9 If (X,S, μ) is a measure space and ν is a signed measure on S, show that ν ⊥ μ if and only if there is a set G ∈ S with μ(G) = 0 and such that ν(E) = 0 for every measurable subset E of Gc.

5.10 Let (X,S, μ) be a measure space, and let λ, ν each be a signed measure on S such that |λ(E)| ≤ |ν|(E) for all E ∈ S. (In particular this holds if |λ(E)| ≤ |ν(E)| for all E ∈ S.) Show that

    (i) if ν ≪ μ then λ ≪ μ;
    (ii) if ν ⊥ μ then λ ⊥ μ.

5.11 Let μ be a measure and let λ, ν be signed measures on a measurable space (X,S) such that both λ and ν assume the same infinite value +∞ or –∞. Show that

    (i) if λ ≪ μ, ν ≪ μ then λ + ν ≪ μ;
    (ii) if λ ⊥ μ, ν ⊥ μ then λ + ν ⊥ μ.

Note: To show (ii), find a set G such that μ(G) = 0 and both |λ|(Gc) = |ν|(Gc) = 0, and use Ex. 5.4.

5.12 If μ is a measure on (X,S) and ν is a signed measure on S such that both ν ≪ μ and ν ⊥ μ, show that ν = 0 (i.e. ν(E) = 0 for all measurable E). Note: It is simplest to show that |ν|(X) = 0, using Theorem 5.4.1.


5.13 If ν is a signed measure, show that ν+ ⊥ ν– and that ν ≪ |ν|.

5.14 Let (X,S) and (Y,T) be measurable spaces, let T be a measurable transformation from (X,S) into (Y,T), and let μ, ν be two measures on S. Show that

    (i) if ν ≪ μ, then νT⁻¹ ≪ μT⁻¹;
    (ii) if ν ∼ μ, then νT⁻¹ ∼ μT⁻¹;
    (iii) if νT⁻¹ ⊥ μT⁻¹, then ν ⊥ μ.

(The converse statements are not true in general.)

5.15 Let μ and ν be two σ-finite measures on the measurable space (X,S) such that ν(E) ≤ μ(E) for all E in S. Show that ν is absolutely continuous with respect to μ and that the Radon–Nikodym derivative f = dν/dμ satisfies 0 ≤ f ≤ 1 a.e. (μ).

5.16 If μ is a σ-finite measure and ν a σ-finite signed measure on (X,S) such that ν ≪ μ, show that

    |ν|{x : (dν/dμ)(x) = 0} = 0.

5.17 Let μ, ν be σ-finite measures on a measurable space (X,S). Show that ν ≪ μ + ν and

    0 ≤ dν/d(μ + ν) ≤ 1 a.e. (μ + ν).

If also ν ≪ μ, show that one of the inequalities is strict.

5.18 All measures considered here are σ-finite measures on the measurable space (X,S).

    (i) If ν ≪ μ and dν/dμ = f, show that ν ∼ μ if and only if μ{x ∈ X : f(x) = 0} = 0, and then dμ/dν = 1/f.
    (ii) If νi ∼ μ and dνi/dμ = fi, i = 1, 2, show that ν1 ∼ ν2 and dν2/dν1 = f2/f1 a.e. (μ).
    (iii) On the measurable space (R,B) (R = the real line, B = the Borel sets of R) give the following examples:
        (a) a finite measure equivalent to Lebesgue measure,
        (b) two (mutually) singular measures each of which is absolutely continuous with respect to Lebesgue measure.

5.19 Let μ, ν and f be as in Theorem 5.5.4. Show that

    (i) μ{x : f(x) > 0} = 0 if and only if μ ⊥ ν;
    (ii) μ{x : f(x) = 0} = 0 if and only if μ ≪ ν.

5.20 Let X = [0, 1], S the class of Lebesgue measurable subsets of X, m Lebesgue measure on S, and ν counting measure on S (i.e. if E ∈ S is a finite set of points, ν(E) is the number of points in E; otherwise ν(E) = +∞).

    (i) Show that ν has no Lebesgue decomposition with respect to m.
    (ii) Show that m ≪ ν but that there is no nonnegative, ν-integrable function f on X such that m(E) = ∫_E f dν for all E ∈ S.

Note that ν is not σ-finite, and thus σ-finiteness cannot be dropped in the Lebesgue decomposition theorem and the Radon–Nikodym Theorem.

5.21 With the notation of Theorem 5.6.1 suppose that f is nonnegative, measurable, and defined a.e. (ν). Let f* be defined for all x in such a way that f*(x) = f(x) when x ∈ A, the set where f is defined, and so that f* is measurable. Show that

    ∫ f dν = ∫ f* (dν/dμ) dμ.

(Note that the right hand side is ∫_A f* (dν/dμ) dμ since dν/dμ = 0 a.e. (μ) on Ac.) Show a corresponding result if f ∈ L1(X,S, ν).

5.22 Let 0 = x0 < x1 < · · · < xn < +∞, let a0, a1, . . . , an be positive numbers and let F be defined on the real line by

    F(x) = 0                              for x < 0,
    F(x) = Σ_{i=0}^k ai + 1 – e^{–x}      for xk ≤ x < xk+1, k = 0, 1, . . . , n – 1,
    F(x) = Σ_{i=0}^n ai + 1 – e^{–x}      for x ≥ xn.

If μF is the Lebesgue–Stieltjes measure corresponding to F, find:

    (i) a Hahn decomposition for μF,
    (ii) the Lebesgue decomposition of μF with respect to Lebesgue measure,
    (iii) the Radon–Nikodym derivative of the absolutely continuous part of μF with respect to Lebesgue measure,
    (iv) the discrete and the continuous singular parts of μF.

5.23 Let R be the real line, R+ = (0, +∞), B the Borel sets of R, B+ the Borel sets of R+ (i.e. the σ-field generated by P = {(a, b] : 0 < a ≤ b < +∞}), and m Lebesgue measure. Let the transformation T from (R,B, m) into (R+,B+) be defined by Tx = e^x for all x ∈ R. Show that T is measurable and that the measure mT⁻¹ it induces on B+ is absolutely continuous with respect to Lebesgue measure, with Radon–Nikodym derivative (dmT⁻¹/dm)(x) = 1/x. (Hint: Use the property ∫_a^b (1/x) dx = log b – log a for 0 < a ≤ b < +∞, and the extension theorem.)

5.24 Let R be the real line, L the σ-field of Lebesgue measurable sets, and μ a σ-finite measure on L. For every a in R, let Ta be the transformation from (R,L, μ) to (R,L) defined by Ta(x) = x + a for all x ∈ R, and let μa = μTa⁻¹. Then a is called an admissible translation of μ if μa is absolutely continuous with respect to μ. If a is an admissible translation of μ, write fa = dμa/dμ. Prove that if a and b are admissible translations then so is a + b, and that fa+b(x) = fa(x)fb(x – a) a.e. (μ).

5.25 Let R be the real line, B the Borel sets of R, m Lebesgue measure on B, I a bounded interval, BI the Borel subsets of I, and mI Lebesgue measure on BI (i.e. the restriction of m to BI). Let f be a real-valued, Borel measurable function defined on I. Then the induced measure ν = mI f⁻¹ on B is called the occupation time measure of f; ν(E) is the "amount of time" in I spent by f at values in E ∈ B. Also, if ν is absolutely continuous with respect to m, its Radon–Nikodym derivative φ is called the local time of f. Denote by fA the restriction of f to A ∈ BI and by νA the occupation time measure of fA. Show the following:

    (a) If f has local time φ then for every A ∈ BI, fA has local time, denoted by φA, and φA ≤ φ a.e. (m).
    (b) For every A, B ∈ BI,

        ∫_A φB(f(t)) dt = ∫_{–∞}^{∞} φA(x)φB(x) dx = ∫_B φA(f(t)) dt.

    (c) φ(f(t)) > 0 a.e. (mI). (Hint: Let A = {t ∈ I : φ(f(t)) = 0} and show that φA = 0 a.e. (m) by using (a) and (b).)

5.26 Let F be a real-valued function on [a, b] and define the extended real-valued function V(x) on [a, b] by

    V(x) = sup Σ_{n=1}^N |F(xn) – F(xn–1)|, a ≤ x ≤ b,

where the supremum is taken over all N and all subdivisions a = x0 < x1 < · · · < xN = x. Clearly 0 ≤ V(x) ≤ V(y) ≤ ∞ whenever a ≤ x < y ≤ b. Show by the following steps that F is of bounded variation on [a, b] (Section 5.7) if and only if V(b) < ∞, thus justifying the term used.

    (i) If F is of bounded variation, show that V(b) < ∞. (Write F = F1 – F2 with F1, F2 nondecreasing, and show that V(b) ≤ F1(b) – F1(a) + F2(b) – F2(a).)
    (ii) If V(b) < ∞, show that F is of bounded variation as follows. First show that |F(y) – F(x)| ≤ V(y) – V(x) whenever a ≤ x < y ≤ b. Then define

        F1(x) = (V(x) + F(x))/2, F2(x) = (V(x) – F(x))/2, a ≤ x ≤ b,

    and show that F1, F2 are nondecreasing functions with F = F1 – F2.
    (iii) If F is a right-continuous function of bounded variation on [a, b], show that |μF|(a, x] = V(x), a ≤ x ≤ b. (V(x) ≤ |μF|(a, x] follows directly from the definition of V. For the reverse inequality notice that by (ii), |μF(x, y]| ≤ μV(x, y], hence |μF(B)| ≤ μV(B) for all B ∈ B[a, b], and |μF|(B) ≤ μV(B).)

5.27 Show that if a function F(x) of bounded variation is right-continuous, then the nondecreasing functions F1(x), F2(x) in the representation F = F1 – F2 may each be taken to be right-continuous.

5.28 State the change of variable of integration result (Theorem 5.7.2) for a function F of bounded variation. Are any adjustments needed in the proof of Theorem 5.7.2 in this case?


5.29 Let μ be a complex measure on the measurable space (X,S). Then μ may be written as μ = μ1 + iμ2 where μ1, μ2 are finite signed measures. Write ν = |μ1| + |μ2|. Then by Ex. 5.17, further write g1 = dμ1/dν, g2 = dμ2/dν and define the total variation of the complex measure μ by, for all E ∈ S,

    |μ|(E) = ∫_E √(g1² + g2²) dν.

Show that |μ| is a finite measure on (X,S), and that there is a complex-valued measurable function f (i.e. f = f1 + if2 where f1, f2 are measurable) such that |f| = 1 and, for all E ∈ S, μ(E) = ∫_E f d|μ|.

(This may be written f = dμ/d|μ|, and is called the polar representation or decomposition of μ. This definition of the total variation of a complex measure μ is equivalent to the more intuitive definition

    |μ|(E) = sup Σ_{k=1}^n |μ(Ek)|,

where the sup is taken over all n and over all partitions of E into disjoint sets with E = ∪_{k=1}^n Ek.)

6

Convergence of measurable functions, Lp-spaces

6.1 Modes of pointwise convergence

Throughout this chapter (X,S, μ) will denote a fixed measure space. Consider a sequence {fn} of functions defined on E ⊂ X and taking values in R*. If f is a function on E (to R*) and fn(x) → f(x) for all x ∈ E, then fn converges pointwise on E to f. If E ∈ S and μ(Ec) = 0 then fn → f (pointwise) a.e. (as in Chapter 4). It is clear that if fn → f and fn → g a.e. then f = g a.e., since the limit is unique where it exists.

If fn is finite-valued on E and, given any ε > 0 and x ∈ E, there exists N = N(x, ε) such that |fn(x) – fm(x)| < ε for all n, m > N, then {fn} is said to be a (pointwise) Cauchy sequence on E. If E ∈ S and μ(Ec) = 0, {fn} is called Cauchy a.e. Since each Cauchy sequence of real numbers has a finite limit, if {fn} is Cauchy on E (or Cauchy a.e.) there is a finite-valued function f such that fn → f on E (or fn → f a.e.).

If {fn} is a sequence of finite-valued functions on a set E and f is finite-valued on E, we say that fn converges to f uniformly on E if, given any ε > 0, there exists N = N(ε) such that |fn(x) – f(x)| < ε for all n ≥ N, x ∈ E. If E ∈ S and μ(Ec) = 0, we say that fn → f uniformly a.e. Similarly, if given any ε > 0 there exists N = N(ε) such that |fn(x) – fm(x)| < ε whenever n, m > N, x ∈ E, then {fn} is called a uniformly Cauchy sequence on E. Such a sequence is pointwise Cauchy on E and thus has a pointwise limit f(x) on E. By letting m → ∞ in the definition just given, it follows that |fn(x) – f(x)| ≤ ε for all n > N, x ∈ E; that is, fn → f uniformly on E.

One may also talk about a sequence which is convergent or Cauchy (pointwise or uniformly) a.e. on a set E ∈ S. (For example fn → f a.e. on E if fn(x) → f(x) on E – F for some F ∈ S, μ(F) = 0.) The above remarks all hold for such sequences (e.g. if fn is Cauchy a.e. on E then fn converges a.e. on E to some f).

In addition to pointwise convergence (a.e.) and uniform convergence (a.e.), a third (technically useful) concept is that of "almost uniform convergence". Specifically, if {fn} and f are functions defined on E ∈ S and taking values in R*, fn is said to converge to f almost uniformly on E if, given any ε > 0, there is a measurable set F = Fε with μ(F) < ε and such that fn → f uniformly on E – F. (In particular, this requires fn and f to be finite-valued on E – Fε for any ε > 0, and it is easily seen that this requires fn and f to be finite-valued a.e. on E.) Similarly a sequence {fn} of (a.e. finite-valued) functions on E is said to be almost uniformly Cauchy on E if, given any ε > 0, there is a measurable subset F = Fε with μ(F) < ε such that fn is uniformly Cauchy on E – F. We abbreviate "almost uniformly" to a.u. It is worth remarking that while uniform convergence a.e. clearly implies convergence almost uniformly, the converse is not true (Ex. 6.1). The following result shows that, as would be expected, almost uniform convergence implies convergence a.e.

Theorem 6.1.1 If {fn} is a sequence of functions on E ∈ S to R*, and fn is almost uniformly Cauchy on E (or fn → f almost uniformly on E), then fn is Cauchy a.e. on E (or fn → f a.e. on E).

Proof Suppose {fn} is a.u. Cauchy on E. Then given any integer p ≥ 1 there exists a measurable set Fp such that μ(Fp) < 1/p and {fn} is uniformly Cauchy, and hence pointwise Cauchy, on E – Fp. Let F = ∩_{p=1}^∞ Fp. Then μ(F) ≤ μ(Fp) < 1/p for every p, and hence μ(F) = 0. If x ∈ E – F then x ∈ E – Fp for some p and hence {fn(x)} is a Cauchy sequence. That is, {fn} is pointwise Cauchy on E – F. This proves the first assertion. The second follows similarly. □

This result will be used to show that a sequence which is almost uniformly Cauchy converges almost uniformly.

Theorem 6.1.2 If {fn} is almost uniformly Cauchy on E ∈ S, then there exists a function f such that fn → f almost uniformly on E.

Proof If {fn} is Cauchy a.u., it is Cauchy a.e. on E by Theorem 6.1.1, and hence there is a function f on E such that fn → f a.e. on E. Since fn is a.u. Cauchy, given ε > 0 there is a measurable set F = Fε, μ(F) < ε, such that fn is uniformly Cauchy on E – F. The set of points of E where fn ↛ f may be included in F without increasing its measure. But fn is uniformly Cauchy and hence converges uniformly to a function g on E – F. Since uniform convergence implies convergence at each x, it follows that fn converges to both f and g on E – F. Thus f = g there and fn → f uniformly on E – F. But this shows that fn → f a.u. on E, as required. □


One would not necessarily expect convergence a.e. to imply almost uniform convergence, i.e. the converse to Theorem 6.1.1 to hold. This converse does in fact hold, however, for measurable functions on sets of finite measure.

Theorem 6.1.3 (Egoroff’s Theorem) Let E ∈ S, with μ(E) < ∞, and let {fn} and f be measurable functions defined and finite a.e. on E and such that fn → f a.e. on E. Then fn → f almost uniformly on E.

Proof By excluding the zero measure subset of E where fn or f is not defined, or infinite, or where fn(x) does not converge to f (x), it is seen that no generality is lost in assuming that fn(x), f (x) are defined and finite and that fn(x) → f (x) for all x ∈ E. Write, for m, n = 1, 2, . . . ,

Emn = ∩∞i=n{x ∈ E : | fi(x) – f (x)| < 1/m}.

Then Emn ∈ S, and for each fixed m, {Emn} is monotone increasing in n with limn Emn = E (since fn → f on E). Thus E – Emn is decreasing in n and limn(E – Emn) = ∅. Since μ(E) < ∞ it follows that μ(E – Emn) → 0 as n → ∞. Hence, given ε > 0 there is an integer Nm = Nm(ε) such that μ(E – Emn) < ε/2m for n ≥ Nm. Write F = Fε = ∪∞m=1(E – EmNm). Then clearly F ⊂ E, F ∈ S and

μ(F) ≤ ∑∞m=1 μ(E – EmNm) < ∑∞m=1 ε/2m = ε.

We now show that fn → f uniformly on E – F. If x ∈ E – F, then x ∈ EmNm, m = 1, 2, . . . , and thus

| fi(x) – f (x)| < 1/m for all i ≥ Nm.

Hence given any δ > 0, m may be chosen such that 1/m < δ giving | fi(x) – f (x)| < δ for all i ≥ Nm and all x ∈ E – F. (Note Nm does not depend on x.) It follows that fn → f uniformly on E – F, and thus fn → f a.u. on E. □
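Egoroff’s theorem is easy to illustrate numerically. The sketch below (an added illustration, not part of the text) uses fn(x) = x^n on [0, 1] with Lebesgue measure: fn → 0 a.e. but not uniformly, while after removing the exceptional set F = (1 – ε, 1] of measure ε the convergence on the remainder [0, 1 – ε] is uniform.

```python
# Illustration of Egoroff's theorem (a numerical sketch, not from the text):
# f_n(x) = x**n -> 0 a.e. on [0,1] under Lebesgue measure, but not uniformly
# (the sup over [0,1) equals 1 for every n). Removing F = (1 - eps, 1], a set
# of measure eps, the convergence becomes uniform on E - F = [0, 1 - eps].

def sup_off_exceptional_set(n, eps):
    # sup of |f_n(x) - 0| over [0, 1 - eps]; x**n is increasing,
    # so the sup is attained at the right endpoint
    return (1.0 - eps) ** n

eps = 0.01
sups = [sup_off_exceptional_set(n, eps) for n in (10, 100, 1000)]
print(sups)  # decreases to 0: uniform convergence off a set of measure eps
```

The single ε works for the whole tail of the sequence, which is exactly the content of the theorem on this example.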

6.2 Convergence in measure

We turn now to another form of convergence (particularly important in applications to probability theory). Consider a measurable set E and a sequence of measurable functions { fn} defined on E, and finite a.e. on E. Then if f is a measurable function defined and finite a.e. on E we say that fn → f in measure on E if for any given ε > 0,

μ{x ∈ E : | fn(x) – f (x)| ≥ ε} → 0 as n→ ∞.


That is, the emphasis is not on the difference between fn and f at each point, but rather on the measure of the set where the difference is at least ε. Similarly fn is a Cauchy sequence in measure on E if for each ε > 0,

μ{x ∈ E : | fn(x) – fm(x)| ≥ ε} → 0 as n, m→ ∞.

The set E will be regarded as the precise set of definition of the fn and f (even if some of these functions have been defined on larger sets). Then E may be omitted in the above expressions.

Finally, if μ(Ec) = 0 and fn → f in measure on E (or { fn} is Cauchy in measure on E) we say that fn → f in measure (or { fn} is Cauchy in measure) without reference to a set.

It will be seen next that a sequence which converges in measure is Cauchy in measure, and the limits in measure are essentially unique.

Theorem 6.2.1 (i) If { fn} converges in measure (to f , say) on E ∈ S, then { fn} is Cauchy in measure on E.

(ii) If { fn} converges in measure on E to both f and g, then f = g a.e. on E,i.e. limits in measure are “essentially unique”.

Proof Since | fn – fm| ≤ | fn – f | + | f – fm|, it follows that for any ε > 0

{x : | fn(x) – fm(x)| ≥ ε} ⊂ {x : | fn(x) – f (x)| ≥ ε/2} ∪ {x : | fm(x) – f (x)| ≥ ε/2}

(for if x is not in the right hand side, then | fn(x) – fm(x)| < ε). The measure of each set on the right tends to zero as n, m → ∞ since fn → f in measure on E. Hence so does the measure of the set on the left hand side, showing that { fn} is Cauchy in measure on E.

To prove (ii) note that it follows in an exactly analogous way that for any ε > 0,

μ{x : | f (x) – g(x)| ≥ ε} ≤ μ{x : | f (x) – fn(x)| ≥ ε/2} + μ{x : | fn(x) – g(x)| ≥ ε/2} → 0 as n → ∞.

Hence μ{x : | f (x) – g(x)| ≥ ε} = 0 for each ε > 0 and thus

μ{x : f (x) ≠ g(x)} = μ[∪∞n=1{x : | f (x) – g(x)| ≥ 1/n}] = 0,

so that f = g a.e. on E, as required. □

We now turn to the relationship between convergence in measure, and almost uniform (and hence also a.e.) convergence. It will first be shown that


almost uniform convergence of measurable functions implies convergence in measure.

Theorem 6.2.2 Let { fn}, f be measurable functions defined on E ∈ S and finite a.e. on E.

(i) If { fn} is Cauchy almost uniformly on E, it is Cauchy in measure on E.
(ii) If fn → f almost uniformly on E, then fn → f in measure on E.

Proof If { fn} is Cauchy a.u. on E, given any δ > 0 there is a measurable set Fδ ⊂ E such that μ(Fδ) < δ and fn – fm → 0 uniformly on E – Fδ as n, m → ∞. Hence if ε > 0, there exists N = N(ε, δ) such that | fn(x) – fm(x)| < ε for all n, m ≥ N, and all x ∈ E – Fδ. Thus

μ{x : | fn(x) – fm(x)| ≥ ε} ≤ μ(Fδ) < δ for m, n ≥ N,

or μ{x : | fn(x) – fm(x)| ≥ ε} → 0 as n, m → ∞. Hence (i) follows and the proof of (ii) is virtually the same. □

As a corollary, convergence of measurable functions a.e. on sets of finite measure implies convergence in measure.

Corollary If μ(E) < ∞ and fn → f a.e. on E, then fn → f in measure on E.

Proof By Egoroff’s Theorem (Theorem 6.1.3) fn → f a.u. on E and thus by Theorem 6.2.2 (ii), fn → f in measure on E. □

In the converse direction we show that convergence in measure implies almost uniform (and hence also a.e.) convergence of a subsequence of the original sequence. This is a corollary of the following result which shows that if a sequence is Cauchy in measure, it has a limit in measure (a property, i.e. completeness, of all modes of convergence considered previously).

Theorem 6.2.3 Let { fn} be a sequence of measurable functions on a set E ∈ S which is Cauchy in measure on E. Then

(i) There is a subsequence { fnk } which is Cauchy almost uniformly on E, and

(ii) There is a measurable function f on E such that fn → f in measure on E. By Theorem 6.2.1 (ii) f is essentially unique on E.

Proof (i) For each integer k there exists an integer nk such that for n, m ≥ nk

μ{x : | fn(x) – fm(x)| ≥ 2–k} ≤ 2–k.


Further we may take n1 < n2 < n3 < · · · . Write

Ek = {x : | fnk (x) – fnk+1 (x)| ≥ 2–k}, k = 1, 2, . . .

Fk = ∪∞m=kEm.

Then μ(Ek) ≤ 2–k and μ(Fk) ≤ ∑∞m=k μ(Em) ≤ 2–k+1. Now given ε > 0, choose k such that 2–k+1 < ε and hence μ(Fk) < ε. Also for all x ∈ E – Fk, x ∈ E – Em for m ≥ k and hence | fnm(x) – fnm+1(x)| < 2–m for all m ≥ k, and thus for all ℓ ≥ m ≥ k,

| fnm(x) – fnℓ(x)| ≤ ∑ℓ–1i=m | fni(x) – fni+1(x)| < 2–m+1 → 0 as m → ∞.

Hence { fnm} is uniformly Cauchy on E – Fk where μ(Fk) < ε. Thus { fnm} is Cauchy a.u., as required.

(ii) By (i) there is a subsequence { fnk } of { fn} which is Cauchy a.u. and thus converges a.u. to a measurable f on E (Theorem 6.1.2). Given ε > 0,

{x : | fk(x) – f (x)| ≥ ε} ⊂ {x : | fk(x) – fnk (x)| ≥ ε/2} ∪ {x : | fnk (x) – f (x)| ≥ ε/2}.

Since { fn} is Cauchy in measure (and nk → ∞ as k → ∞) the measure of the first set on the right tends to zero as k → ∞. But the measure of the second set also tends to zero, since fnk → f a.u. and hence by Theorem 6.2.2, in measure. Thus μ{x : | fk(x) – f (x)| ≥ ε} → 0 as k → ∞, showing that fn → f in measure. □

Corollary If fn → f in measure on E then there is a subsequence { fnk } such that fnk → f almost uniformly, and hence also a.e.

Proof By Theorem 6.2.1 (i), { fn} is Cauchy in measure on E, and by (i) of Theorem 6.2.3 it has a subsequence { fnk } which is Cauchy a.u. on E, and hence convergent a.u. on E to some function g (Theorem 6.1.2). Then by Theorem 6.2.2, fnk → g in measure also and hence f = g a.e. on E by Theorem 6.2.1. Thus fnk → f a.u. on E, and hence also fnk → f a.e. on E (Theorem 6.1.1). □
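The corollary is sharp: convergence in measure by itself gives pointwise behavior only along a subsequence. A numerical sketch (an added illustration anticipating Ex. 6.7; the enumeration below is our own):

```python
# The "typewriter" sequence on [0,1] (cf. Ex. 6.7): f_k is the indicator of
# [(i-1)/n, i/n], enumerated as n = 1, 2, ..., i = 1, ..., n. Here
# mu{|f_k| >= eps} = 1/n -> 0, so f_k -> 0 in measure, yet at each fixed x
# the values f_k(x) equal 1 infinitely often, so there is no pointwise limit;
# the subsequence of indicators of [0, 1/n] does converge a.e.

def block_index(k):
    # map k = 0, 1, 2, ... to the pair (n, i)
    n = 1
    while k >= n:
        k -= n
        n += 1
    return n, k + 1

def f(k, x):
    n, i = block_index(k)
    return 1 if (i - 1) / n <= x <= i / n else 0

vals = [f(k, 0.3) for k in range(15)]
print(vals)  # both 0s and 1s keep occurring at x = 0.3
```

The subsequence { f_{n(n+1)/2} } consists of the indicators of [0, 1/n], which converge to 0 at every x > 0, matching the corollary.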

The final theorem of this section gives a necessary and sufficient condition (akin to the definition of convergence in measure) for convergence a.e. on a set of finite measure. This result is interesting in applications to probability.

Theorem 6.2.4 Let { fn}, f be measurable functions defined and a.e. finite-valued on E ∈ S, where μ(E) < ∞. Write, for ε > 0 and


n = 1, 2, . . . , En(ε) = {x : | fn(x) – f (x)| ≥ ε}. Then fn → f a.e. on E if and only if for every ε > 0,

limn→∞ μ{∪∞m=nEm(ε)} = 0.

Proof fn may fail to converge to f at points x ∈ E for which f (x) has infinite values – assumed to be a zero measure set. Aside from these points fn(x) does not converge to f (x) if and only if x ∈ D = ∪∞k=1 lim supn En(1/k), since x ∈ D if and only if for some k, | fn(x) – f (x)| ≥ 1/k for infinitely many n. Since lim supn En(1/k) is clearly monotone nondecreasing in k,

μ(D) = limk→∞ μ{lim supn En(1/k)} = limk→∞ limn→∞ μ{Fn(1/k)},

where Fn(ε) = ∪∞m=nEm(ε) (μ(E) being finite). If limn→∞ μ{Fn(ε)} = 0 for each ε > 0, it thus follows that μ(D) = 0 and

hence fn → f a.e. on E. Conversely, if fn → f a.e. on E, then μ(D) = 0. But this means limn→∞ μ{Fn(1/k)} = 0 for each k since this quantity is nonnegative and nondecreasing in k. Given ε > 0 choose k with 1/k < ε. Then

0 ≤ limn→∞ μ{Fn(ε)} ≤ limn→∞ μ{Fn(1/k)} = 0

which yields the desired conclusion limn→∞ μ{Fn(ε)} = 0. □

Note that the corollary to Theorem 6.2.2 also follows simply from the present theorem.

The principal relationships between the forms of convergence considered for measurable functions are illustrated diagrammatically in Section 6.5.

6.3 Banach spaces

In this section we introduce the notion of a Banach space, which will be referred to in the following sections. Although the results of the next section may be developed without it, the framework and language of Banach spaces will be helpful and useful. The discussion is kept here to the bare minimum necessary for stating the results of Section 6.4. It is first useful to define a metric space and some related concepts.

A set L is called a metric space if there is a real-valued function d(f , g) defined for f , g ∈ L and called a distance function or metric such that for all f , g, h in L,

(i) d(f , g) ≥ 0 and d(f , g) = 0 if and only if f = g
(ii) d(f , g) = d(g, f )

(iii) d(f , g) ≤ d(f , h) + d(h, g).


Since by definition a metric space consists of a set L together with a metric d, we will denote it by (L, d) (clearly one may be able to define several metrics on a set).

The simplest example of a metric space is the real line L = R, with d(f , g) = | f – g|; or the finite-dimensional space L = Rn, with the Euclidean metric d(f , g) = {∑nk=1(xk – yk)2}1/2 where f = (x1, . . . , xn), g = (y1, . . . , yn).

Once an appropriate measure of distance is introduced one can define the notion of convergence. A sequence { fn} in a metric space (L, d) will be said to converge to f ∈ L (fn → f or limn fn = f ), if d(fn, f ) → 0 as n → ∞. A simple property of convergence for later use is the following.

Lemma 6.3.1 Let (L, d) be a metric space and fn, f , g elements of L. Then

(i) The limit of a convergent sequence is unique, i.e. if fn → f and fn → g, then f = g.

(ii) If fn → f , gn → g, then d(fn, gn)→ d(f , g).

Proof (i) Assume that fn → f and fn → g. For each n

0 ≤ d(f , g) ≤ d(f , fn) + d(fn, g)

and since both terms on the right hand side converge to zero as n → ∞, it follows that d(f , g) = 0 and thus f = g.

(ii) Applying properties (iii) and (ii) of a distance function twice it follows that

d(fn, gn) ≤ d(fn, f ) + d(f , g) + d(gn, g)

d(f , g) ≤ d(f , fn) + d(fn, gn) + d(gn, g)

and thus,

|d(fn, gn) – d(f , g)| ≤ d(fn, f ) + d(gn, g).

Hence fn → f , gn → g implies d(fn, gn) → d(f , g). □

A sequence { fn} in a metric space (L, d) is called Cauchy if d(fn, fm) → 0 as n, m → ∞. Note that if fn → f , then it follows from the inequality

d(fn, fm) ≤ d(fn, f ) + d(f , fm)

that { fn} is Cauchy. Thus a sequence in a metric space which converges to an element of the metric space is Cauchy. However, the converse is not always true, i.e. a Cauchy sequence does not necessarily converge in a metric space. Whenever every Cauchy sequence in a metric space converges to an element of the metric space, the metric space is called complete. The real line with d(x, y) = |x – y| is of course a complete metric space.

Let (L, d) be a metric space. A subset E of L is said to be dense in L if for every f ∈ L and every ε > 0 there is g ∈ E with d(f , g) < ε. A metric space is called separable if it has a countable dense subset. Again the real line with d(f , g) = | f – g| is separable, since the set of rational numbers forms a countable dense subset of R.

Another useful concept is that of a linear space. Specifically, a set L is called a linear space (over the real numbers) if there is

(i) a map, called addition, which assigns to each f and g in L an element of L denoted by f + g, with the following properties

(1) f + g = g + f , for all f , g ∈ L,
(2) f + (g + h) = (f + g) + h, for all f , g, h ∈ L,
(3) there is an element of L, denoted by 0, such that f + 0 = 0 + f = f for all f ∈ L,
(4) for each f ∈ L there exists an element of L (denoted by –f ) such that f + (–f ) = 0. One naturally then writes g – f for g + (–f ).

(ii) a map, called scalar multiplication, which assigns to each real a and f ∈ L an element of L denoted simply by af with the properties that for all a, b ∈ R and f , g ∈ L,

(1) a(f + g) = af + ag
(2) (a + b)f = af + bf
(3) a(bf ) = (ab)f
(4) 0f = 0, 1f = f .

The simplest example of a linear space is the set of real numbers R, or Rn. Also the set of all finite-valued measurable functions defined on a measurable space (X,S) (or defined a.e. on a measure space (X,S, μ)) is a linear space with addition and scalar multiplication defined in the usual way: (f + g)(x) = f (x) + g(x) and (af )(x) = af (x). Finally L1(X,S, μ) is also a linear space.

A linear space L is called a normed linear space, if there is a real-valued function defined on L, called a norm and denoted by ‖ · ‖, such that for all f , g ∈ L, and a ∈ R,

(i) ‖ f ‖ ≥ 0 and ‖ f ‖ = 0 if and only if f = 0
(ii) ‖af ‖ = |a| ‖ f ‖

(iii) ‖ f + g‖ ≤ ‖ f ‖ + ‖g‖.


It is straightforward to verify that the following are all normed linear spaces. Rn is a normed linear space with ‖ f ‖ = {∑nk=1 xk2}1/2 where f = (x1, . . . , xn). The set C[0, 1] of all continuous real-valued functions on [0, 1] is a normed linear space with ‖ f ‖ = sup0≤t≤1 | f (t)|. L1(X,S, μ) is a normed linear space with ‖ f ‖ = ∫| f | dμ, if we put f = g in the space L1 whenever f = g a.e.

A normed linear space clearly becomes a metric space with distance function

d(f , g) = ‖ f – g‖.

A complete normed linear space is called a Banach space (the completeness is of course meant with respect to the distance induced by the norm as above). Again the simplest example of a Banach space is the real line R, or Rn. Also C[0, 1] with norm ‖ f ‖ = sup0≤t≤1 | f (t)| can be easily seen to be a Banach space. It will be shown in Section 6.4 that L1(X,S, μ) is a Banach space. Of course there are normed linear spaces that are not Banach spaces. As an example, it may be easily seen that ‖ f ‖ = (∫10 | f (t)|2 dt)1/2 defines a norm on C[0, 1], but this normed linear space is not complete, as the following Cauchy sequence { fn} shows, where fn(t) = 0 for 0 ≤ t ≤ 1/2, fn(t) = 1 for 1/2 + 1/n ≤ t ≤ 1, and fn(t) = n(t – 1/2) for 1/2 ≤ t ≤ 1/2 + 1/n (in fact its “completion” is the space L2[0, 1] defined in Section 6.4).
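The ramp sequence just described can be checked numerically (an added sketch; the Riemann step count is arbitrary): the pairwise distances in the integral norm shrink, while the pointwise limit is a discontinuous step function, so no continuous limit exists.

```python
# The ramp functions f_n from the text: 0 on [0, 1/2], the line n(t - 1/2)
# on [1/2, 1/2 + 1/n], and 1 afterwards. Their pairwise distances in the
# norm ||f|| = (integral of |f|^2 dt)^(1/2) go to 0, so {f_n} is Cauchy,
# but the pointwise limit is a step function, hence not in C[0,1].

def f(n, t):
    if t <= 0.5:
        return 0.0
    if t >= 0.5 + 1.0 / n:
        return 1.0
    return n * (t - 0.5)

def dist(n, m, steps=20000):
    # midpoint Riemann approximation of ||f_n - f_m||
    h = 1.0 / steps
    s = sum((f(n, (k + 0.5) * h) - f(m, (k + 0.5) * h)) ** 2
            for k in range(steps))
    return (s * h) ** 0.5

print(dist(10, 20), dist(100, 200))  # distances shrink as n grows
```

An exact computation gives dist(n, 2n) = 1/(12n)^(1/2), consistent with the decreasing numerical values.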

6.4 The spaces Lp

In this section the class L1 of functions is generalized in an obvious way and the properties of the resulting class are studied. (X,S, μ) will be a fixed measure space throughout.

For each real p > 0 and measurable f defined a.e., write

‖ f ‖p = (∫| f |p dμ)1/p

(= ∞ if ∫| f |p dμ = ∞). The subclass of all such f for which ‖ f ‖p < ∞ is denoted by Lp = Lp(X,S, μ). Equivalently Lp is clearly the class of all measurable functions f such that | f |p ∈ L1. It is convenient and useful to define the class L∞ = L∞(X,S, μ) as the set of all measurable functions defined a.e. which are essentially bounded in the sense that | f (x)| ≤ M a.e. for some finite M. For each f ∈ L∞, ‖ f ‖∞ will denote the essential supremum of f , that is the least such M, i.e.

‖ f ‖∞ = ess sup | f | = inf{M > 0 : μ{x : | f (x)| > M} = 0}.


In the following we concentrate on the classes of functions Lp for 0 < p ≤ ∞. With addition of functions and scalar multiplication defined in the usual way (i.e. (f + g)(x) = f (x) + g(x) at all points x for which the sum makes sense, and (af )(x) = af (x) at all points x where f is defined) it is simply shown that each Lp, 0 < p ≤ ∞, is a linear space. Of course for p = 1 this was already established in Theorem 4.4.3.

Theorem 6.4.1 Each Lp, 0 < p ≤ ∞, is a linear space. In particular if f1, . . . , fn are in Lp and a1, . . . , an real numbers then a1f1 + · · · + anfn ∈ Lp.

Proof If f ∈ Lp and a is a real number it is clear that af ∈ Lp. That f , g ∈ Lp implies f + g ∈ Lp is again clear when p = ∞, and for 0 < p < ∞ we have

| f (x) + g(x)| ≤ | f (x)| + |g(x)|,
| f (x) + g(x)|p ≤ 2p max(| f (x)|p, |g(x)|p) ≤ 2p(| f (x)|p + |g(x)|p)

at all points for which f + g is defined, and hence a.e. Since the right hand side is in L1, so is | f + g|p (Theorem 4.4.6), showing that f + g ∈ Lp, as required. It is now quite clear that all properties of addition and scalar multiplication are satisfied so that each Lp is a linear space. □

Further properties of Lp-spaces are based on the following important classical inequalities.

Theorem 6.4.2 (Hölder’s Inequality) Let 1 ≤ p, q ≤ ∞ be such that 1/p + 1/q = 1 (with q = ∞ when p = 1). If f ∈ Lp and g ∈ Lq then fg ∈ L1

and

‖ fg‖1 ≤ ‖ f ‖p ‖g‖q .

For 1 < p, q < ∞ equality holds if and only if f = 0 a.e. or g = 0 a.e. or | f |p = c|g|q a.e. for some c > 0. If p = q = 2 the last equality of course becomes | f | = c|g|, some c > 0.

Proof For p = 1, q = ∞ we have |g(x)| ≤ ‖g‖∞ a.e. and thus

‖ fg‖1 = ∫| fg| dμ ≤ ‖g‖∞ ∫| f | dμ = ‖ f ‖1 ‖g‖∞ (< ∞),

and similarly for p = ∞, q = 1.

Now assume that 1 < p, q < ∞. If 0 < α < 1, then

tα – 1 ≤ α(t – 1)


for all t ≥ 1, with equality only when t = 1. (This is easily seen from the equality at t = 1 and the fact that the derivative of the left side is strictly less than that of the right side for t > 1.) Putting t = a/b we thus have for a ≥ b > 0,

aαb1–α ≤ αa + (1 – α)b, 0 < α < 1. (6.1)

This inequality holds for a ≥ b > 0 and thus for a ≥ b ≥ 0 with equality only if a = b (≥ 0). But by symmetry it holds also if b ≥ a ≥ 0, and thus for all a ≥ 0, b ≥ 0, with equality only when a = b.

If f = 0 a.e. or g = 0 a.e., the conclusions of the theorem are clearly true. It may therefore be assumed that neither f nor g is zero a.e.; that is we assume ‖ f ‖pp = ∫| f |p dμ > 0, ‖g‖qq = ∫|g|q dμ > 0 (Theorem 4.4.7). Then by (6.1), writing a = | f (x)|p/‖ f ‖pp, b = |g(x)|q/‖g‖qq, α = 1/p, 1 – α = 1/q, it follows that

| f (x)| |g(x)|/(‖ f ‖p ‖g‖q) ≤ | f (x)|p/(p ‖ f ‖pp) + |g(x)|q/(q ‖g‖qq) (6.2)

for all x for which f and g are both defined and finite, and hence a.e. Since the right hand side is in L1 (| f |p ∈ L1, |g|q ∈ L1), it follows from Theorem 4.4.6 that | fg| ∈ L1, and by Theorem 4.4.4, the integral of the left hand side of (6.2) does not exceed that on the right, i.e.

∫| fg| dμ/(‖ f ‖p ‖g‖q) ≤ ∫| f |p dμ/(p ‖ f ‖pp) + ∫|g|q dμ/(q ‖g‖qq) = 1/p + 1/q = 1.

Hence fg ∈ L1 and ‖ fg‖1 = ∫| fg| dμ ≤ ‖ f ‖p ‖g‖q. Finally if equality holds,

∫ {| f (x)|p/(p ‖ f ‖pp) + |g(x)|q/(q ‖g‖qq) – | f (x)g(x)|/(‖ f ‖p ‖g‖q)} dμ(x) = 0

and since by (6.2) the integrand is nonnegative, it must be zero a.e. by Theorem 4.4.7. But since equality holds in (6.1) only when a = b, we must thus have | f (x)|p/‖ f ‖pp = |g(x)|q/‖g‖qq a.e. from which the final conclusion of the theorem follows. □

In the special case when p = q = 2 Hölder’s Inequality is usually called the Schwarz Inequality. When 0 < p < 1 and 1/p + 1/q = 1 (hence q < 0) a reverse Hölder’s Inequality holds for nonnegative functions (see Ex. 6.18).
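Under counting measure on {1, . . . , n}, Hölder’s Inequality reads ∑| fk gk| ≤ (∑| fk|p)1/p (∑|gk|q)1/q, which is easy to check numerically (an added sketch; the data are arbitrary):

```python
import random

# Hölder's Inequality under counting measure on {1,...,n}:
# sum |f*g| <= (sum |f|^p)^(1/p) * (sum |g|^q)^(1/q) with 1/p + 1/q = 1.

def norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(50)]
g = [random.uniform(-1, 1) for _ in range(50)]

p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 2/3 = 1
lhs = sum(abs(a * b) for a, b in zip(f, g))
assert lhs <= norm(f, p) * norm(g, q) + 1e-12

# equality case |f|^p = c |g|^q: take h = |f|**(p/q), so |h|^q = |f|^p
h = [abs(a) ** (p / q) for a in f]
assert abs(sum(abs(a * b) for a, b in zip(f, h))
           - norm(f, p) * norm(h, q)) < 1e-9
```

The second assertion exhibits the equality case of the theorem: when | f |p is proportional to |g|q the two sides agree.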

Theorem 6.4.3 (Minkowski’s Inequality) If 1 ≤ p ≤ ∞ and f , g ∈ Lp then f + g ∈ Lp and

‖ f + g‖p ≤ ‖ f ‖p + ‖g‖p .


For 1 < p < ∞ equality holds if and only if f = 0 a.e. or g = 0 a.e. or f = cg a.e. for some c > 0. For p = 1 equality holds if and only if fg ≥ 0 a.e.

Proof Theorem 6.4.1 shows that f + g ∈ Lp. Since | f (x) + g(x)| ≤ | f (x)| + |g(x)| for all x where both f and g are defined and finite, and thus a.e., the inequality clearly follows for p = 1 and p = ∞. When p = 1 equality holds if and only if | f + g| = | f | + |g| a.e., which is equivalent to fg ≥ 0 a.e.

Assume now that 1 < p < ∞. Then the following holds a.e.

| f + g|p = | f + g| · | f + g|p–1 ≤ | f | · | f + g|p–1 + |g| · | f + g|p–1. (6.3)

Since p > 1 there exists q > 1 such that 1/p + 1/q = 1. Further (p – 1)q = p, so that | f + g|(p–1)q = | f + g|p ∈ L1 and hence | f + g|p–1 ∈ Lq. Thus by Hölder’s Inequality,

∫| f | | f + g|p–1 dμ ≤ ‖ f ‖p (∫| f + g|(p–1)q dμ)1/q = ‖ f ‖p ‖ f + g‖p/qp (6.4)

and similarly for |g| | f + g|p–1. It then follows that

‖ f + g‖pp = ∫| f + g|p dμ ≤ (‖ f ‖p + ‖g‖p) ‖ f + g‖p/qp

and since p – p/q = 1, ‖ f + g‖p ≤ ‖ f ‖p + ‖g‖p as required.

Equality holds if and only if equality holds a.e. in (6.3), and in both (6.4) as stated and with f , g interchanged. That is if and only if fg ≥ 0 and (by Theorem 6.4.2)

f = 0 or f + g = 0 or | f + g|p = c1| f |p, c1 > 0

and

g = 0 or f + g = 0 or | f + g|p = c2|g|p, c2 > 0

where each relationship is meant a.e. This is easily seen to be equivalent to f = 0 a.e. or g = 0 a.e. or f = cg a.e. for some c > 0. □

When 0 < p < 1 a reverse Minkowski Inequality holds for nonnegative functions in Lp (see Ex. 6.18). However, the following inequality also holds.

Theorem 6.4.4 If 0 < p < 1 and f , g ∈ Lp then f + g ∈ Lp and

‖ f + g‖pp = ∫| f + g|p dμ ≤ ∫| f |p dμ + ∫|g|p dμ = ‖ f ‖pp + ‖g‖pp

with equality if and only if fg = 0 a.e.

Proof Since 0 < p < 1 we have (1 + t)p ≤ 1 + tp for all t ≥ 0 with equality only when t = 0. (This is easily seen again from the equality at t = 0 and the fact that the derivative of the left side is strictly less than that of the right side for t > 0.) Putting t = a/b we thus have for a ≥ 0, b > 0,

(a + b)p ≤ ap + bp. (6.5)

This inequality holds for a ≥ 0, b > 0, and thus also for a, b ≥ 0 with equality only when a = 0 or b = 0, i.e. ab = 0.

Now f + g ∈ Lp by Theorem 6.4.1. By (6.5), | f + g|p ≤ (| f | + |g|)p ≤ | f |p + |g|p a.e. and the result follows by integrating both sides (Theorem 4.4.4). Also the equality holds if and only if | f + g|p = | f |p + |g|p a.e. i.e. fg = 0 a.e., since there is equality in (6.5) only when ab = 0. □
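Both triangle inequalities can be checked numerically under counting measure (an added sketch with arbitrary data): Minkowski on ‖ · ‖p for p ≥ 1, and Theorem 6.4.4 on ‖ · ‖pp for 0 < p < 1.

```python
# Minkowski (p >= 1): ||f+g||_p <= ||f||_p + ||g||_p, and Theorem 6.4.4
# (0 < p < 1): ||f+g||_p^p <= ||f||_p^p + ||g||_p^p, under counting measure.

def pth_power_sum(v, p):
    # sum |x|^p, i.e. ||v||_p^p under counting measure
    return sum(abs(x) ** p for x in v)

f = [0.3, -1.2, 0.7, 2.0]
g = [1.1, 0.4, -0.5, -0.9]
s = [a + b for a, b in zip(f, g)]

p = 2.5  # Minkowski: compare p-th roots
assert (pth_power_sum(s, p) ** (1 / p)
        <= pth_power_sum(f, p) ** (1 / p)
        + pth_power_sum(g, p) ** (1 / p) + 1e-12)

p = 0.5  # Theorem 6.4.4: compare the p-th power sums directly
assert pth_power_sum(s, p) <= pth_power_sum(f, p) + pth_power_sum(g, p) + 1e-12
```

Note that for p = 0.5 the inequality is between the pth power sums, not their roots, which is exactly why dp below is defined differently for p < 1.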

It is next shown that ‖ · ‖p may be used to introduce a metric on each Lp, 0 < p ≤ ∞, provided we do not distinguish between two functions in Lp which are equal a.e. That is, equality of two elements f , g in Lp (written f = g) is taken to mean that f (x) = g(x) a.e. (More precisely Lp could be defined as the set of all equivalence classes of measurable functions f with | f |p ∈ L1 under the equivalence relation f ∼ g if f = g a.e.) This metric turns out to be different for 0 < p < 1 and for 1 ≤ p ≤ ∞.

Theorem 6.4.5 (i) For 1 ≤ p ≤ ∞, Lp is a normed linear space with norm ‖ f ‖p and hence metric dp(f , g) = ‖ f – g‖p.

(ii) For 0 < p < 1, Lp is a metric space with metric dp(f , g) = ‖ f – g‖pp.

Proof (i) Assume 1 ≤ p ≤ ∞ and f , g ∈ Lp. Then ‖ f ‖p ≥ 0 and ‖ f ‖p = 0 if and only if f = 0 a.e., and thus f = 0 as an element of Lp. Also for 1 ≤ p < ∞,

‖af ‖p = (∫|af |p dμ)1/p = |a| ‖ f ‖p,

and quite clearly ‖af ‖∞ = |a| ‖ f ‖∞. Finally by Minkowski’s Inequality, ‖ f + g‖p ≤ ‖ f ‖p + ‖g‖p. Hence ‖ f ‖p is a norm on Lp, which thus is a normed linear space, proving (i).

(ii) Assume 0 < p < 1. As in (i) it is quite clear that dp(f , g) ≥ 0 with dp(f , g) = 0 if and only if f = g, and that dp(f , g) = dp(g, f ). The last (triangle) property follows from Theorem 6.4.4,

dp(f , g) = ‖ f – g‖pp = ‖ f – h + h – g‖pp ≤ ‖ f – h‖pp + ‖h – g‖pp = dp(f , h) + dp(h, g).

Hence Lp is a metric space with distance function dp, for 0 < p < 1. □

Thus each Lp, 0 < p ≤ ∞, is a metric space with distance function

dp(f , g) = ‖ f – g‖pp for 0 < p < 1, and dp(f , g) = ‖ f – g‖p for 1 ≤ p ≤ ∞.


From now on all properties of each Lp as a metric space will be meant with respect to this distance function dp. For instance fn → f in Lp will mean that dp(fn, f ) → 0, or equivalently ‖ fn – f ‖p → 0, and thus for 0 < p < ∞, ∫| fn – f |p dμ → 0 and for p = ∞, ess sup | fn – f | → 0.

The next result shows that convergence in Lp implies convergence in measure as well as convergence of the integrals of the pth absolute powers.

Theorem 6.4.6 Let 0 < p ≤ ∞ and fn, f be elements in Lp.

(i) If { fn} is Cauchy in Lp, then it is Cauchy in measure if p < ∞, and for p = ∞ uniformly Cauchy a.e. (hence also Cauchy a.u. and in measure).

(ii) If fn → f in Lp, then fn → f in measure if p < ∞, and for p = ∞ uniformly a.e. (hence also a.u. and in measure), and ‖ fn‖p → ‖ f ‖p. Thus for 0 < p < ∞, ∫| fn|p dμ → ∫| f |p dμ.

Proof (ii) Assume that fn → f in Lp. Since the zero function belongs to Lp, Lemma 6.3.1 shows that dp(fn, 0) → dp(f , 0), where dp is defined in the discussion preceding the theorem. It follows, for all 0 < p ≤ ∞, that ‖ fn‖p → ‖ f ‖p.

We now show that fn → f in measure when 0 < p < ∞. Since fn, f ∈ Lp, each fn and f are defined and finite a.e. For every ε > 0 write En(ε) = {x : | fn(x) – f (x)| ≥ ε}. Then

| fn – f |p ≥ | fn – f |pχEn(ε) ≥ εpχEn(ε) a.e.

Thus ‖ fn – f ‖pp ≥ εpμ{En(ε)}, showing that μ{En(ε)} → 0 since ‖ fn – f ‖p → 0. Hence fn → f in measure as required.

For p = ∞, it follows from the facts that | fn(x) – f (x)| ≤ ‖ fn – f ‖∞ a.e. and ‖ fn – f ‖∞ → 0 that fn → f uniformly a.e.

(i) is shown similarly. □
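The converse fails: convergence in measure need not give convergence in Lp. An added sketch (anticipating Ex. 6.11) with fn = n · χ[0,1/n] on [0, 1] with Lebesgue measure:

```python
# f_n = n on [0, 1/n] and 0 elsewhere, on [0,1] with Lebesgue measure:
# mu{x : f_n(x) >= eps} = 1/n -> 0, so f_n -> 0 in measure, but
# ||f_n||_1 = n * (1/n) = 1 for all n, so f_n does not -> 0 in L1.

def measure_where_large(n, eps):
    # mu{x : f_n(x) >= eps}; the whole support [0, 1/n] once n >= eps
    return 1.0 / n if n >= eps else 0.0

def l1_norm(n):
    # integral of the constant n over a set of measure 1/n
    return n * (1.0 / n)

assert measure_where_large(10**6, 0.5) < 1e-5
assert all(abs(l1_norm(n) - 1.0) < 1e-12 for n in (1, 10, 1000))
```

The mass of fn escapes through ever taller, thinner spikes: the measure of the bad set shrinks while the integral stays fixed at 1.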

The next theorem is the main result of this section showing that each Lp, 0 < p ≤ ∞, is complete as a metric space, i.e. whenever { fn} is a Cauchy sequence in Lp, there exists f ∈ Lp such that fn → f in Lp. For 1 ≤ p ≤ ∞ this means that Lp is a Banach space. As before we put f = g if f = g a.e.

Theorem 6.4.7 (i) For 1 ≤ p ≤ ∞, Lp is a Banach space with norm ‖ f ‖p.

(ii) For 0 < p < 1, Lp is a complete metric space with metric dp(f , g) = ‖ f – g‖pp.


Proof Since by Theorem 6.4.5 each Lp, 0 < p ≤ ∞, is a metric space with metric dp (defined as in (i) or (ii)) it suffices to show that it is complete, i.e. that each Cauchy sequence in Lp converges to an element of Lp.

First assume that 0 < p < ∞ and let { fn} be a Cauchy sequence in Lp. By Theorem 6.4.6 (i), { fn} is Cauchy in measure and by Theorem 6.2.3 (ii), there is a measurable f (defined a.e.) such that fn → f in measure. By the corollary to Theorem 6.2.3, there is a subsequence { fnk } converging to f a.e. Hence for all k,

‖ fnk – f ‖pp = ∫| fnk – f |p dμ = ∫(limj | fnk – fnj |p) dμ ≤ lim infj ∫| fnk – fnj |p dμ (Fatou’s Lemma) = lim infj ‖ fnk – fnj‖pp

and thus for all p > 0,

dp(fnk , f ) ≤ lim infj dp(fnk , fnj ).

But since { fn} is Cauchy in Lp, given ε > 0, there exists N = N(ε) such that dp(fn, fm) < ε/2 when n, m ≥ N. Thus if nk, nj ≥ N it follows that dp(fnk , fnj ) < ε/2 and hence lim infj dp(fnk , fnj ) ≤ ε/2, so that dp(fnk , f ) ≤ ε/2 for nk ≥ N. In particular this implies that ‖ fnk – f ‖p < ∞ and thus (fnk – f ) ∈ Lp and also f = (f – fnk ) + fnk ∈ Lp, since Lp is a linear space (Theorem 6.4.1). Furthermore for all k ≥ N (requiring nk to be strictly increasing so that nk ≥ k ≥ N)

dp(fk, f ) ≤ dp(fk, fnk ) + dp(fnk , f ) < ε

from which it follows that dp(fk, f ) → 0 giving fk → f in Lp.

Now let p = ∞ and let { fn} be a Cauchy sequence in L∞. By combining

a countable number of zero measure sets a set E ∈ S with μ(Ec) = 0 can be found such that for all x ∈ E and all n, m

| fn(x) – fm(x)| ≤ ‖ fn – fm‖∞.

Since ‖ fn – fm‖∞ → 0 as n, m → ∞, { fn} is uniformly Cauchy on E. Hence there is a function f defined on E such that fn → f uniformly on E. By Theorem 3.4.7, f is measurable and thus may be extended to a measurable function defined on the entire space X by putting f (x) = 0 for x ∈ Ec. Since fn → f uniformly on E, supx∈E | fn(x) – f (x)| → 0. Hence given ε > 0, there exists N = N(ε) such that supx∈E | fn(x) – f (x)| < ε when n ≥ N. Then


| f (x)| ≤ | f (x) – fn(x)| + | fn(x)|, x ∈ E, implies that for n ≥ N,

supx∈E | f (x)| ≤ supx∈E | f (x) – fn(x)| + supx∈E | fn(x)| < ε + ‖ fn‖∞.

Since μ(Ec) = 0, it follows that f ∈ L∞. Also for n ≥ N we have | fn – f | < ε a.e. which implies ‖ fn – f ‖∞ < ε. Hence ‖ fn – f ‖∞ → 0 and thus fn → f in L∞. □

The final result of this section shows that the spaces Lp, 0 < p ≤ ∞, are ordered by inclusion when the underlying measure space is finite, a result especially important in probability theory.

Theorem 6.4.8 If (X,S, μ) is a finite measure space (μ(X) < ∞) and 0 < q ≤ p ≤ ∞ then Lp ⊂ Lq and for f ∈ Lp:

‖ f ‖q ≤ ‖ f ‖p {μ(X)}1/q–1/p.

Proof Assume first that p = ∞ and f ∈ L∞. Then | f (x)| ≤ ‖ f ‖∞ a.e. and thus

∫| f (x)|q dμ(x) ≤ ‖ f ‖q∞ μ(X) < ∞

which implies that f ∈ Lq and ‖ f ‖q ≤ ‖ f ‖∞ {μ(X)}1/q, as required.

Now assume that 0 < q < p < ∞ and let f ∈ Lp. Put r = p/q ≥ 1. Then ∫(| f |q)r dμ = ∫| f |p dμ < ∞ implies that | f |q ∈ Lr. Define r′ by 1/r + 1/r′ = 1. Since μ(X) < ∞, the constant function 1 ∈ Lr′ and by Hölder’s Inequality | f |q · 1 ∈ L1. Hence f ∈ Lq. Again by Hölder’s Inequality,

‖ f ‖qq = ∫| f |q dμ ≤ (∫(| f |q)r dμ)1/r (∫1r′ dμ)1/r′ = (∫| f |p dμ)q/p {μ(X)}1–q/p = ‖ f ‖qp {μ(X)}1–q/p

and the desired inequality follows by taking qth roots. □

Corollary If (X,S, μ) is a finite measure space and 0 < q < p ≤ ∞, convergence in Lp implies convergence in Lq.
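On a finite measure space the inequality of Theorem 6.4.8 can be checked directly. Under counting measure on n points, μ(X) = n, and the bound becomes ‖ f ‖q ≤ ‖ f ‖p n^(1/q – 1/p) (an added numerical sketch with arbitrary data):

```python
# Theorem 6.4.8 under counting measure on {1,...,n}, so mu(X) = n:
# ||f||_q <= ||f||_p * n**(1/q - 1/p) for 0 < q <= p.

def norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

f = [0.2, 1.5, -0.7, 3.0, 0.0]
n = len(f)           # total measure of the space
p, q = 4.0, 2.0
lhs = norm(f, q)
rhs = norm(f, p) * n ** (1.0 / q - 1.0 / p)
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```

Note the factor n^(1/q – 1/p) ≥ 1 is what makes the inclusion Lp ⊂ Lq depend on μ(X) < ∞; under full counting measure on the integers the inclusion reverses.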

6.5 Modes of convergence – a summary

This chapter has concerned a variety of convergence modes including convergence (pointwise) a.e., almost uniform, in measure, and in Lp. The diagram below indicates some of the important relationships between these forms of convergence (which have been shown to hold in this chapter). The arrows indicate that one form of convergence implies another. The word “finite” indicates that the corresponding implication holds when μ is finite, but not in general. The word “subsequence” indicates that one mode of convergence for { fn} implies another for some subsequence { fnk }.

Examples showing that no further relationships hold in general are given in the exercises (Exs. 6.2, 6.7 and 6.11).

Exercises

6.1 Consider the unit interval with Lebesgue measure. Let

fn(x) = 1 for 0 ≤ x ≤ 1/n, fn(x) = 0 for 1/n < x ≤ 1, and f (x) = 0 for 0 ≤ x ≤ 1.

Does { fn} converge to f

(a) for all x?
(b) a.e.?
(c) uniformly on [0,1]?
(d) uniformly a.e. on [0,1]?
(e) almost uniformly?
(f) in measure?
(g) in Lp?


6.2 Let X = {1, 2, 3, . . .}, S = all subsets of X, and let μ be counting measure on X. Define fn(x) = χ{1,2,...,n}(x). Does fn converge

(a) pointwise?
(b) almost uniformly?
(c) in measure?

Comment concerning Theorem 6.1.1, and the corollary to Theorem 6.2.2.

6.3 Let { fn} be a Cauchy sequence a.e. on (X,S, μ) and E ∈ S with 0 < μ(E) < ∞. Show that there exists a real number C and a measurable set F ⊂ E such that μ(F) > 0 and | fn(x)| ≤ C for all x ∈ F, n = 1, 2, . . . . (Show in fact that given any ε > 0, F ⊂ E may be chosen so that μ(E – F) < ε.)

6.4 Let { fn}, {gn} be a.e. finite measurable functions on (X,S, μ). If fn → f in measure and gn → g in measure, show that

(i) afn → af in measure, for any real a
(ii) fn + gn → f + g in measure, and hence
(iii) afn + bgn → af + bg in measure for any real a, b.

6.5 If fn → f in measure, show that | fn| → | f | in measure.

6.6 Let (X,S, μ) be a finite measure space. Let { fn}, f , {gn}, g (n = 1, 2, . . .) be a.e. finite measurable functions on X.

(i) Show that given any ε > 0 there exists E ∈ S, μ(Ec) < ε and a constant C such that |g(x)| ≤ C for all x ∈ E.
(ii) If fn → 0 in measure, show that fn² → 0 in measure.
(iii) If fn → f in measure, show that fng → fg in measure (use (i)).
(iv) If fn → f in measure, show that fn² → f² in measure (apply (ii) to fn – f and use (iii) with g = f ).
(v) If fn → f in measure, gn → g in measure, show that fngn → fg in measure (fngn = (1/4){(fn + gn)² – (fn – gn)²} a.e.).

6.7 Let (X,S, μ) be the unit interval [0, 1] with the Borel sets and Lebesgue measure. For n = 1, 2, . . . let

Ein = [(i – 1)/n, i/n], i = 1, . . . , n

with indicator function χin. Show that the sequence {χ1

1,χ12,χ2

2,χ13,χ2

3,χ3

3, . . .} converges in measure to zero but does not converge at any point ofX.
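A small script (the enumeration helper and the cutoff of 40 blocks are our own choices) makes the point of this "typewriter" sequence concrete: the supports shrink, forcing convergence in measure, while any fixed point such as x = 1/2 keeps being covered, so no pointwise limit exists:

```python
# The "typewriter" sequence of Ex. 6.7: indicators chi_{i,n} of [(i-1)/n, i/n],
# enumerated (1,1), (1,2), (2,2), (1,3), ... (a sketch with our own cutoff).
def blocks(n_max):
    for n in range(1, n_max + 1):
        for i in range(1, n + 1):
            yield i, n

supports = []      # m{chi_{i,n} = 1} = 1/n -> 0: convergence in measure to 0
hits_at_half = 0   # how many terms equal 1 at the fixed point x = 1/2
for i, n in blocks(40):
    supports.append(1.0 / n)
    if (i - 1) / n <= 0.5 <= i / n:
        hits_at_half += 1   # happens for every n: no convergence at x = 1/2

print(supports[-1], hits_at_half)
```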

6.8 Let {fn} be a sequence of measurable functions on (X,S, μ), which is Cauchy in measure. Suppose {fnk}, {fmk} are two subsequences converging a.e. to f, g respectively. Show that f = g a.e.

6.9 Let (X,S, μ) be a finite measure space and F a field generating S. If f is an S-measurable function defined and finite a.e., show that given any ε, δ > 0 there is a simple F-measurable function g (i.e. g = ∑n1 aiχEi where Ei ∈ F) such that

μ{x : |f(x) – g(x)| > ε} < δ.

Hence every S-measurable finite a.e. function can be approximated "in measure" by a simple F-measurable function. (Hint: Use Theorem 3.5.2 and its corollary and Theorem 2.6.2.) The result remains true if f is measurable with respect to the σ-field obtained by completing the measure μ.

6.10 Let (X,S, μ) be a finite measure space and L the set of all measurable functions defined and finite a.e. on X. For any f, g ∈ L define

d(f, g) = ∫X |f – g|/(1 + |f – g|) dμ.

Show that (L, d) is a metric space (identifying f and g if f = g a.e.). Prove that convergence with respect to d is equivalent to convergence in measure. Is (L, d) complete?

6.11 Give an example of a sequence converging in measure but not in Lp, for an arbitrary but fixed 0 < p ≤ ∞. (Hint: Modify appropriately fn of Ex. 6.1.)

6.12 Let {fn} and f be in Lp, 0 < p < ∞. If fn → f a.e. and ‖fn‖p → ‖f‖p, then show that fn → f in Lp. (Hint: Apply Fatou's Lemma to {2^p(|fn|^p + |f|^p) – |fn – f|^p}.) In Chapter 11 (Theorem 11.4.2) it is shown that a.e. convergence may be replaced by convergence in measure, when the measure space is finite.

6.13 Let p ≥ 1, 1/p + 1/q = 1, and fn, f ∈ Lp and gn, g ∈ Lq, n = 1, 2, . . . . If fn → f in Lp and gn → g in Lq, show that fngn → fg in L1.

6.14 If 0 < p < r < q < ∞ show that Lp ∩ Lq ⊂ Lr and that if f ∈ Lp ∩ Lq then

‖f‖r ≤ max{‖f‖p, ‖f‖q}.
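The interpolation bound is easy to check on an example whose norms have a closed form. For f(x) = x^{–1/3} on (0, 1) (our own choice of function), ‖f‖p = (1 – p/3)^{–1/p} for p < 3:

```python
# Ex. 6.14 checked for f(x) = x^{-1/3} on (0,1), where ||f||_p = (1 - p/3)^{-1/p}
# for p < 3 (closed form, so no quadrature is needed).
def lp_norm(p):
    return (1.0 / (1.0 - p / 3.0)) ** (1.0 / p)

n1, n15, n2 = lp_norm(1.0), lp_norm(1.5), lp_norm(2.0)   # p < r < q
print(n1, n15, n2)   # the middle norm stays below the larger outer one
```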

6.15 Suppose p > 1, q > 1, r > 1, 1/p + 1/q + 1/r = 1 and let f ∈ Lp, g ∈ Lq, h ∈ Lr. Show that fgh ∈ L1 and ‖fgh‖1 ≤ ‖f‖p ‖g‖q ‖h‖r. (Show fg ∈ Ls, i.e. |f|^s|g|^s ∈ L1, where 1/s = 1 – 1/r.) The Hölder Inequality may thus be generalized to apply to the product of n > 2 functions.

6.16 Let (X,S, μ) be the unit interval (0, 1) with the Borel sets and Lebesgue measure and let f(x) = x^{–a}, a > 0. Show that f ∈ Lp for all 0 < p < p0, and f ∉ Lp for all p ≥ p0, and find p0 in terms of a.

6.17 If (X,S, μ) is a finite measure space, show that for all f ∈ L∞

limp→∞ ‖f‖p = ‖f‖∞.

(Hint: Use the fact that for a > 0, limp→∞ a^{1/p} = 1 to show that for each ε > 0

(1 – ε)‖f‖∞ ≤ lim infp→∞ ‖f‖p ≤ lim supp→∞ ‖f‖p ≤ ‖f‖∞.)

6.18 Let (X,S, μ) be a measure space and 0 < p < 1.

(i) If f ∈ Lp and g ∈ Lq where 1/p + 1/q = 1 (hence q < 0) show that

‖fg‖1 ≥ ‖f‖p‖g‖q

provided ∫|g|^q dμ > 0. (Notice that fg may not belong to L1.) (Hint: Let r = 1/p > 1, 1/r + 1/r′ = 1, φ = |fg|^p, ψ = |g|^{–p}, and use Hölder's Inequality for φ and ψ with r and r′.)

(ii) If f, g ∈ Lp and fg ≥ 0 a.e. show that

‖f + g‖p ≥ ‖f‖p + ‖g‖p.

(Hint: Proceed as in the proof of Minkowski's Inequality and use (i).)

(iii) If X contains two disjoint measurable sets each having a finite positive measure, show that ‖f‖p is not a norm by constructing two functions f, g ∈ Lp such that ‖f + g‖p > ‖f‖p + ‖g‖p. (Hint: If E, F are the two disjoint sets take f = aχE, g = bχF, and determine a, b using (1 + t)^p < 1 + t^p for t > 0.)

(iv) If the assumption of (iii) is not satisfied determine all elements of Lp and show that it is a Banach space with norm ‖f‖p, but a trivial one. In fact this is true for all 0 < p < ∞. (Hint: If there are no sets of finite positive measure, show that Lp = {0}, i.e. Lp consists of only the zero function. If there is a measurable set E of finite positive measure, show that Lp consists of all multiples of the indicator function of E.)

6.19 Let 0 < p < ∞ and ℓp be the set of all real sequences {an}∞n=1 such that ∑∞n=1 |an|^p < ∞. Let also ℓ∞ be the set of all bounded real sequences {an}∞n=1, i.e. |an| ≤ M for all n and some 0 < M < ∞.

(i) Show that ℓp = Lp(X,S, μ), 0 < p ≤ ∞, where X is the set of positive integers, S the class of all subsets of X, and μ is counting measure on S.

(ii) Show that ℓp, 1 ≤ p ≤ ∞, is a Banach space, and write down its norm; show that ℓp, 0 < p < 1, is a complete metric space, and write down its distance function; show that if 1 < p < ∞, 1/p + 1/q = 1, and {an}∞n=1 ∈ ℓp, {bn}∞n=1 ∈ ℓq, then {anbn}∞n=1 ∈ ℓ1 and

|∑∞n=1 anbn| ≤ ∑∞n=1 |anbn| ≤ (∑∞n=1 |an|^p)^{1/p} (∑∞m=1 |bm|^q)^{1/q};

and that if 1 ≤ p < ∞ and {an}∞n=1, {bn}∞n=1 ∈ ℓp then

(∑∞n=1 |an + bn|^p)^{1/p} ≤ (∑∞n=1 |an|^p)^{1/p} + (∑∞n=1 |bn|^p)^{1/p}.

(iii) If 0 < p < q < ∞ show that ℓp ⊂ ℓq ⊂ ℓ∞.

6.20 Let (X,S, μ) be a measure space and S the class of all simple functions φ on X such that μ{x ∈ X : φ(x) ≠ 0} < +∞. If 0 < p < +∞ then prove that S is dense in Lp.


6.21 Let (X,S, μ) be the real line with the Borel sets and Lebesgue measure. Then show that for 0 < p < +∞:

(i) Lp = Lp(X,S, μ) is separable,
(ii) the set of all continuous functions that vanish outside a bounded closed interval is dense in Lp.

(Hints:
(i) Use Ex. 6.20 and the approximation of every measurable set of finite Lebesgue measure by a finite union of intervals, and of an interval by an interval with rational end points (the class of all intervals with rational end points is countable).
(ii) Use Ex. 6.20, part (c) of Ex. 3.12, and a natural approximation of a step function by a continuous function.)

6.22 Let (X,S, μ) be the real line with the Borel sets and Lebesgue measure. If f is a function on X and t ∈ X define the translate ft of f by t as the function given by ft(x) = f(x – t). Let 1 ≤ p < ∞ and f ∈ Lp.

(i) Show that for all t ∈ X, ft ∈ Lp and ‖ft‖p = ‖f‖p.
(ii) Show that if t → s in X, then ft → fs uniformly in Lp, i.e. given any ε > 0 there exists δ > 0 such that ‖ft – fs‖p < ε whenever |t – s| < δ. In particular ft → f in Lp and

limt→0 ∫∞–∞ |f(x – t) – f(x)|^p dx = 0.

(Hint: Prove this first for a continuous function which vanishes outside a bounded closed interval and then use Ex. 6.21 (ii).)

6.23 Let (X,S, μ) be the unit interval [0, 1] with the Borel sets and Lebesgue measure, let g ∈ Lp, 1 ≤ p ≤ +∞, and define f on [0, 1] by

f(x) = ∫x0 g(u) du for all x ∈ [0, 1].

(i) Show that f is uniformly continuous on [0, 1].
(ii) Show that for 1 < p < +∞

sup ∑Nn=1 |f(yn) – f(xn)|^p/(yn – xn)^{p–1} ≤ ‖g‖p^p < ∞

where the supremum is taken over all positive integers N and all nonoverlapping intervals {(xn, yn)}Nn=1 in [0, 1].

6.24 Let (X,S) be a measurable space and μ1, μ2 two probability measures on S. If λ is a measure on S such that μ1 ≪ λ and μ2 ≪ λ (for example μ1 + μ2 is such a measure) and if fi is the Radon–Nikodym derivative of μi with respect to λ, i = 1, 2, define

hλ(μ1, μ2) = ∫ (f1 f2)^{1/2} dλ.

(i) Prove that hλ does not depend on the measure λ used in its definition, and thus we write h(μ1, μ2) for hλ(μ1, μ2). (Hint: If λ′ is another measure on S such that μ1 ≪ λ′ and μ2 ≪ λ′, put ν = λ + λ′ and show that hλ(μ1, μ2) = hν(μ1, μ2) = hλ′(μ1, μ2).)

(ii) Show that

0 ≤ h(μ1, μ2) ≤ 1

and that in particular h(μ1, μ2) = 0 if and only if μ1 ⊥ μ2 and that h(μ1, μ2) = 1 if and only if μ1 = μ2.

(iii) Here take X to be the real line, S the Borel sets and μ the measure on S which is absolutely continuous with respect to Lebesgue measure on S with Radon–Nikodym derivative (1/√(2π)) e^{–x²/2}. For every a ∈ X let Ta be the transformation from (X,S, μ) to (X,S) defined by Ta(x) = x – a for all x ∈ X, and let μa = μTa⁻¹. Find h(μ, μa) as a function of a, and use this expression to conclude that for mutually absolutely continuous probability measures μ1 and μ2 (μ1 ∼ μ2), h(μ1, μ2) can take any value in the interval (0, 1].

7

Product spaces

7.1 Measurability in Cartesian products

Up to this point, our attention has focussed on just one fixed space X. Consider now two (later more than two) such spaces X, Y, and their Cartesian product X × Y, defined to be the set of all ordered pairs (x, y) with x ∈ X, y ∈ Y. The most familiar example is, of course, the Euclidean plane where X and Y are both (copies of) the real line R.

Our main interest will be in defining a natural measure-theoretic structure in X × Y (i.e. a σ-field and a measure) in the case where both X and Y are measure spaces. However, for slightly more generality it is useful to first consider σ-rings S, T in X, Y, respectively, and define a natural "product" σ-ring in X × Y.

First, a rectangle in X × Y (with sides A ⊂ X, B ⊂ Y) is defined to be a set of the form A × B = {(x, y) : x ∈ A, y ∈ B}. Rectangles may be regarded as the simplest subsets of X × Y and have the following property.

Lemma 7.1.1 If S, T are semirings in X, Y respectively, then the class P of all rectangles A × B such that A ∈ S, B ∈ T, is a semiring in X × Y.

Proof P is clearly nonempty. If Ei ∈ P, i = 1, 2, then Ei = Ai × Bi where Ai ∈ S, Bi ∈ T. It is easy to verify that

E1 ∩ E2 = (A1 ∩ A2) × (B1 ∩ B2)

and hence E1 ∩ E2 ∈ P since A1 ∩ A2 ∈ S, B1 ∩ B2 ∈ T.

It is also easily checked (draw a picture!) that

E1 – E2 = [(A1 ∩ A2) × (B1 – B2)] ∪ [(A1 – A2) × B1].

The two sets forming the union on the right are clearly finite disjoint unions of sets of P, and are disjoint since (A1 – A2) is disjoint from A1 ∩ A2. Thus E1 – E2 is expressed as a finite disjoint union of sets of P. Hence P is a semiring. □



If S, T are σ-rings, the σ-ring in X × Y generated by this semiring P is called the product σ-ring of S and T, and is denoted by S × T. It is clear that if S and T are both σ-fields, so is S × T, which is also then called the product σ-field of S and T. Thus if (X,S) and (Y,T) are measurable spaces then so is (X × Y, S × T). The sets of P may be called measurable rectangles (cf. Ex. 7.1).

An important notion is that of sections of sets in the product space. If E ⊂ X × Y is a subset of X × Y, then for each x ∈ X, and y ∈ Y, the sets Ex ⊂ Y and Ey ⊂ X defined by

Ex = {y : (x, y) ∈ E} and Ey = {x : (x, y) ∈ E}

are called the x-section of E and the y-section of E, respectively. Note that if A ⊂ X and B ⊂ Y, (A × B)x = B or ∅ according as x ∈ A or x ∈ Ac, and (A × B)y = A or ∅ according as y ∈ B or y ∈ Bc.

It is convenient to introduce (for each fixed x ∈ X) the transformation Tx from Y into X × Y defined by Tx y = (x, y), and for each fixed y ∈ Y the transformation Ty from X into X × Y defined by Ty x = (x, y). Then if E ⊂ X × Y its sections are simply given by Ex = Tx⁻¹E and Ey = (Ty)⁻¹E.

Lemma 7.1.2 If E, F are subsets of X × Y and x ∈ X, then (E – F)x = Ex – Fx. If Ei are subsets of X × Y for i = 1, 2, . . . , and x ∈ X, then (∪∞1 Ei)x = ∪∞1 (Ei)x and (∩∞1 Ei)x = ∩∞1 (Ei)x. Corresponding conclusions hold for y-sections.

Proof These are easily shown directly, or follow immediately using the transformation Tx by, e.g. (using Lemma 3.2.1)

(E – F)x = Tx⁻¹(E – F) = Tx⁻¹E – Tx⁻¹F = Ex – Fx. □

It also follows easily in the next result that Tx, Ty are measurable, and that sections of measurable sets are measurable:

Theorem 7.1.3 If (X,S), (Y,T) are measurable spaces then the transformations Tx and Ty are measurable transformations from (Y,T) and (X,S) respectively into (X × Y, S × T). Thus Ex ∈ T and Ey ∈ S for every E ∈ S × T, x ∈ X, y ∈ Y.

Proof For each x ∈ X, A ∈ S, B ∈ T, Tx⁻¹(A × B) = (A × B)x = B or ∅ ∈ T, and it follows that Tx⁻¹E ∈ T for each E in the semiring P of rectangles A × B with A ∈ S, B ∈ T. Since S(P) = S × T the measurability of Tx follows from Theorem 3.3.2. Measurability of Ty follows similarly. □


It also follows that measurable functions on the product space have measurable "sections", just as measurable sets on the product space do. Let f(x, y) be a function defined on a subset E of X × Y. For each x ∈ X, the x-section of f is the function fx defined on Ex ⊂ Y by fx(y) = f(Tx y) = f(x, y), y ∈ Ex; i.e. fx is the function on a subset of Y resulting by holding x fixed in f(x, y). Similarly for each y ∈ Y, the y-section of f is the function f y defined on Ey ⊂ X by f y(x) = f(Ty x) = f(x, y), x ∈ Ey.

Theorem 7.1.4 Let (X,S) and (Y,T) be measurable spaces and let f be an S × T-measurable function defined on a subset of X × Y. Then every x-section fx is T-measurable and every y-section f y is S-measurable.

Proof For each x ∈ X, fx is the composition f Tx of the measurable function f and measurable transformation Tx (Theorem 7.1.3). Hence each fx is T-measurable and similarly each f y is S-measurable. □

7.2 Mixtures of measures

In this section it will be shown that under appropriate conditions, a family of measures may be simply "mixed" to form a new measure. This will not only give an immediate definition of an appropriate "product measure" (as will be seen in the next section) but is important for a variety of e.g. probabilistic applications.

It is easily seen (cf. Ex. 5.2) that if λi is a measure on a measurable space (X,S) for each i = 1, 2, . . . , then λ defined for E ∈ S by λ(E) = ∑∞1 λi(E) is also a measure on S. λ may be regarded as a simple kind of mixture of the measures λi. More general mixtures may be defined as shown in the following result.

Theorem 7.2.1 Let (X,S, μ) be a measure space, and (W,W) a measurable space. Suppose that for every x ∈ X, λx is a measure on W, such that for every fixed E ∈ W, λx(E) is S-measurable in x, and for E ∈ W, define

λ(E) = ∫X λx(E) dμ(x).

Then λ is a measure on W. Further λ(E) = 0 if and only if λx(E) = 0 a.e. (μ).

Proof If Ei are disjoint sets in W and E = ∪∞1 Ei,

λ(E) = ∫X λx(∪∞1 Ei) dμ(x) = ∫X ∑∞1 λx(Ei) dμ(x) = ∑∞1 ∫X λx(Ei) dμ(x) = ∑∞1 λ(Ei)

using the corollary to Theorem 4.5.2. Thus λ is countably additive and hence a measure, since λ(∅) = 0. The final statement follows at once from Theorem 4.4.7. □
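A toy finite case (entirely our own numbers, with finite sums standing in for the integrals) shows the mechanics of Theorem 7.2.1: mixing the measures λx over a weight measure μ yields a set function that is additive because each λx is:

```python
# A toy mixture in the spirit of Theorem 7.2.1 (hypothetical data):
# X = {0,1,2} with weights mu, and lambda_x a measure on W = {'a','b'} per x.
mu = {0: 0.5, 1: 0.3, 2: 0.2}
lam = {
    0: {'a': 1.0, 'b': 0.0},
    1: {'a': 0.0, 'b': 2.0},
    2: {'a': 0.5, 'b': 0.5},
}

def mixture(E):
    """lambda(E) = integral of lambda_x(E) d mu(x); here a finite sum."""
    return sum(mu[x] * sum(lam[x][w] for w in E) for x in mu)

# additivity over the disjoint sets {'a'} and {'b'}:
total = mixture({'a', 'b'})
parts = mixture({'a'}) + mixture({'b'})
print(total, parts)
```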

For obvious reasons λ will be termed a mixture of the measures λx, with respect to the measure μ. Note that in the example λ = ∑∞1 λi given prior to the theorem, μ is simply counting measure on X = {1, 2, 3, . . .}.

The next task is to show that integration with respect to λ may be done in two stages, as a "repeated" integral, first with respect to λx and then with respect to μ; i.e. that

∫W f dλ = ∫X {∫W f dλx} dμ(x), for any suitable f on W.

For clarity this is split into two parts, first showing the result when f is nonnegative and defined at all points of W.

Lemma 7.2.2 Let f be a nonnegative W-measurable function defined at all points of W and let λ be as in Theorem 7.2.1. Then ∫W f dλx is a nonnegative, S-measurable function of x and

∫X {∫W f dλx} dμ(x) = ∫W f dλ.

Proof If f is a nonnegative simple function, f(w) = ∑n1 aiχEi(w), say (Ei disjoint sets in W), then

∫W f dλx = ∑n1 aiλx(Ei)

which is nonnegative and S-measurable since λx(Ei) is measurable for each Ei. Further

∫X {∫W f dλx} dμ(x) = ∑n1 ai ∫X λx(Ei) dμ(x) = ∑n1 aiλ(Ei) = ∫W f dλ.

Thus the result holds for nonnegative simple functions. If f is a nonnegative measurable function defined on all of W, write f = limn→∞ fn where {fn} is an increasing sequence of nonnegative simple functions. By monotone convergence (or simply definition)

∫W f dλx = limn→∞ ∫W fn dλx

so that ∫W f dλx is a limit of nonnegative measurable functions and hence is nonnegative and measurable. Also

∫X {∫W f dλx} dμ(x) = ∫X {limn→∞ ∫W fn dλx} dμ(x) = limn→∞ ∫X {∫W fn dλx} dμ(x)

by monotone convergence, since ∫W fn dλx is nonnegative and nondecreasing in n. But the final expression above is (since fn is simple)

limn→∞ ∫W fn dλ = ∫W f dλ

again using monotone convergence, so that the result follows. □

This result will now be generalized as the main theorem of the section.

Theorem 7.2.3 Let (X,S, μ) be a measure space, (W,W) a measurable space and λx a measure on W for each x ∈ X, such that λx(E) is S-measurable as a function of x for each E ∈ W. Let λ be the mixture of the λx as defined above, and f be a W-measurable function defined a.e. (λ) on W. Then

(i) If f is nonnegative a.e. (λ) on W, then ∫W f dλx is a nonnegative S-measurable function defined a.e. (μ) on X, and

∫W f dλ = ∫X {∫W f dλx} dμ(x). (7.1)

(ii) If ∫W |f| dλ < ∞ (i.e. f ∈ L1(W,W, λ)) or if ∫X {∫W |f| dλx} dμ(x) < ∞, then f ∈ L1(W,W, λx) for a.e. x (μ), ∫W f dλx ∈ L1(X,S, μ) and (7.1) holds.

Proof (i) Let E (∈ W) be the set where f is defined and nonnegative, and write f*(w) = f(w) for w ∈ E, f*(w) = 0 otherwise. Thus f* = f a.e. (λ) and f* is defined everywhere. Now since f is defined a.e. (λ), λ(Ec) = 0 and hence λx(Ec) = 0 a.e. (μ) by Theorem 7.2.1. That is, if A = {x : λx(Ec) = 0} we have A ∈ S (since λx(Ec) is S-measurable), and μ(Ac) = 0.

Now f = f* on E and if x ∈ A, λx(Ec) = 0 so that f = f* a.e. (λx) and ∫ f dλx = ∫ f* dλx, which is S-measurable by Lemma 7.2.2. Thus ∫ f dλx, defined precisely on A ∈ S, is S-measurable (Lemma 3.4.1) and defined a.e. since μ(Ac) = 0.

Finally ∫W f dλx = ∫W f* dλx for x ∈ A and hence a.e. (μ) since μ(Ac) = 0, so that

∫X {∫W f dλx} dμ(x) = ∫X {∫W f* dλx} dμ(x) = ∫W f* dλ = ∫W f dλ

since f* = f a.e. (λ), as required.

(ii) Note first that by (i) with |f| for f we have

∫W |f| dλ = ∫X {∫W |f| dλx} dμ(x)

so that finiteness of one side implies that of the other, and the two finiteness conditions in the statement of (ii) are equivalent. For brevity write L1(λ) for L1(W,W, λ), L1(λx) for L1(W,W, λx), and L1(μ) for L1(X,S, μ). Then assuming f ∈ L1(λ) we have f+ ∈ L1(λ), f– ∈ L1(λ) (Theorem 4.4.5). Now ∫W f+ dλx is S-measurable by (i) and

∫X {∫W f+ dλx} dμ(x) = ∫W f+ dλ < ∞. (7.2)

Hence ∫W f+ dλx < ∞ a.e. (μ) so that f+ ∈ L1(λx) a.e. (μ). The same is true with f– instead of f+ and hence f = f+ – f– ∈ L1(λx) a.e. (μ), which proves the first statement of (ii). Further

∫W f dλx = ∫W f+ dλx – ∫W f– dλx a.e. (μ)

and since by (7.2) ∫W f+ dλx ∈ L1(μ) (and correspondingly ∫W f– dλx ∈ L1(μ)) we have ∫W f dλx ∈ L1(μ) (which is the second statement of (ii)) and

∫X {∫W f dλx} dμ(x) = ∫X {∫W f+ dλx} dμ(x) – ∫X {∫W f– dλx} dμ(x) = ∫W f+ dλ – ∫W f– dλ

(again using (7.2) and its counterpart for f–). But this latter expression is just ∫W f dλ so that the final statement of (ii) follows. □

7.3 Measure and integration on product spaces

If (X,S), (Y,T) are measurable spaces, the product measurable space is simply (X × Y, S × T) where S × T is defined as in Section 7.1. This product space will be identified with the space (W,W) of the previous section, and a mixed measure thus defined on S × T from "component measures" μ on S and νx defined on T for each x ∈ X. These will be assumed to be uniformly σ-finite for x ∈ X, in the sense that there are sets Bn ∈ T, ∪nBn = Y, such that νx(Bn) < ∞ for all x ∈ X. Clearly the sets Bn can (and will) be taken to be disjoint. The results thus obtained have important uses e.g. in probability theory. In the next section the measures νx will be taken to be independent of x, leading to traditional "product measures".

Theorem 7.3.1 Let (X,S, μ) be a measure space, (Y,T) a measurable space, and let νx be a measure on T for each x ∈ X. Suppose that νx(B) is S-measurable in x for each fixed B ∈ T and that {νx : x ∈ X} is a uniformly σ-finite family. Then

(i) νx(Ex) is S-measurable for each E ∈ S × T, and λ defined on S × T by

λ(E) = ∫X νx(Ex) dμ(x) for E ∈ S × T,

is a measure on S × T satisfying

λ(A × B) = ∫A νx(B) dμ(x) for A ∈ S, B ∈ T.

(ii) λ is the unique measure on S × T with this latter property if also ∫An νx(Bm) dμ(x) < ∞, m, n = 1, 2, . . . , for some sequence of sets An ∈ S with ∪∞1 An = X.

Proof (i) Write W = X × Y, W = S × T and for each x ∈ X, E ∈ W, define λx(E) = νx(Ex) (= νxTx⁻¹E where Tx again denotes the measurable transformation Tx y = (x, y)). It is clear that λx is a measure on W. That λ may be defined as in (i) and is a measure will follow at once from Theorem 7.2.1 provided we show that νx(Ex) is S-measurable for each E ∈ W = S × T.

To see this let C be a set in T such that νx(C) < ∞ for all x ∈ X. Write

D = {E ∈ S × T : νx(Ex ∩ C) is S-measurable}.

Since for E, F ∈ D with E ⊃ F, νx{(E – F)x ∩ C} = νx(Ex ∩ C) – νx(Fx ∩ C) (νx(Fx ∩ C) ≤ νx(C) < ∞), and νx{(∪∞1 Ei)x ∩ C} = ∑∞1 νx(Ei,x ∩ C) for disjoint sets Ei ∈ D, it is clear that D is a D-class. If E is a measurable rectangle (E = A × B, A ∈ S, B ∈ T), then νx(Ex ∩ C) = νx(B ∩ C)χA(x) which is measurable since νx(B ∩ C) is measurable by assumption, and A ∈ S, so that νx(Ex ∩ C) is S-measurable for measurable rectangles E. Since D thus contains the semiring of measurable rectangles, it contains the generated σ-ring S × T.

Hence νx(Ex ∩ C) is S-measurable for any E ∈ S × T. Replacing C by Bm where Bm are as in the theorem statement we have for E ∈ S × T,

νx(Ex) = ∑∞m=1 νx(Ex ∩ Bm)

which is a countable sum of S-measurable functions and hence is measurable as required. The final statement of (i) follows simply since, as noted above, νx((A × B)x) = νx(B)χA(x) for A ∈ S, B ∈ T.

(ii) will follow immediately from the uniqueness part of Theorem 2.5.4 provided λ is σ-finite on the semiring P of measurable rectangles A × B, A ∈ S, B ∈ T. But under the assumptions of (ii)

X × Y = ∪∞n=1 ∪∞m=1 (An × Bm)

where λ(An × Bm) = ∫An νx(Bm) dμ(x) < ∞. The double union may be written as a single union, to show that λ has the required σ-finiteness property. □


Notice that if μ and each νx are probability measures, and if for each fixed B ∈ T, νx(B) is S-measurable in x, then Theorem 7.3.1 is applicable and λ is also a probability measure.

Theorem 7.2.3 may now be applied to give the following result for integration with respect to the measure λ on S × T.

Theorem 7.3.2 With the notation and conditions of Theorem 7.3.1 for the existence of the measure λ on S × T given by λ(E) = ∫ νx(Ex) dμ(x), let f be a measurable function defined a.e. (λ) on X × Y (with x-section fx as usual).

(i) If f ≥ 0 a.e. (λ) then ∫ fx dνx is defined a.e. (μ) on X, S-measurable, and

∫X×Y f dλ = ∫X {∫Y fx dνx} dμ(x).

(ii) If ∫ |f| dλ < ∞, i.e. f ∈ L1(X × Y, S × T, λ), or if ∫X {∫Y |fx| dνx} dμ(x) < ∞, then ∫Y fx dνx ∈ L1(X,S, μ) and

∫X×Y f dλ = ∫X {∫Y fx dνx} dμ(x).

Proof As in Theorem 7.3.1 define the measure λx on S × T by

λx(E) = νx(Ex) = νxTx⁻¹(E), where Tx y = (x, y).

Then if e.g. f ≥ 0 a.e. (λ) we have

∫X×Y f dλx = ∫X×Y f dνxTx⁻¹ = ∫Y (f Tx) dνx = ∫Y fx dνx

by the transformation theorem (Theorem 4.6.1). Hence (i) follows at once from Theorem 7.2.3 by identifying (W,W) with (X × Y, S × T) (noting that λ(E) = ∫ νx(Ex) dμ(x) = ∫ λx(E) dμ(x)) and hence

∫ f dλ = ∫X {∫X×Y f dλx} dμ = ∫X {∫Y fx dνx} dμ.

(ii) follows in almost precisely the same way. □

It is sometimes convenient to refer to ∫X×Y f dλ as a double integral (emphasizing the fact that the integration is over a product space X × Y, even though only one integration is involved). Correspondingly we may call ∫X {∫Y fx dνx} dμ(x) a repeated or iterated integral. Theorem 7.3.2 thus gives conditions under which a double integral may be evaluated as a repeated integral.

The case of most immediate concern, that when νx is independent of x, will be considered in the next section.


7.4 Product measures and Fubini’s Theorem

As noted, this section specializes the results of the previous one to the case where νx = ν, independent of x. Then the measure λ is a true "product measure" in that the measure λ of a rectangle A × B is (as will be seen) the product μ(A)ν(B) of the measures of its sides.

Theorem 7.4.1 Let (X,S, μ) be a measure space and (Y,T, ν) a σ-finite measure space. Then

(i) λ defined for E ∈ S × T by λ(E) = ∫X ν(Ex) dμ(x) is a measure on S × T satisfying λ(A × B) = μ(A) · ν(B) when A ∈ S, B ∈ T.

(ii) If further μ is σ-finite, then also λ(E) = ∫Y μ(Ey) dν(y) for E ∈ S × T. Then λ is σ-finite and is the unique measure on S × T satisfying λ(A × B) = μ(A) · ν(B) for A ∈ S, B ∈ T.

Proof (i) follows immediately from Theorem 7.3.1 by noting that the constant ν(B) is S-measurable for each B ∈ T, and ν is σ-finite, uniformity not being an issue.

The first statement of (ii) follows by interchanging the roles of X and Y, and the remainder follows simply from Theorem 7.3.1. □

If (X,S, μ), (Y,T, ν) are σ-finite measure spaces the measure λ defined as above on S × T has (as noted) the property that λ(A × B) = μ(A)ν(B) for A ∈ S, B ∈ T. For this reason it is referred to as the product measure and is written as μ × ν. (X × Y, S × T, μ × ν) is then called the product measure space, and by Theorem 7.4.1 the product measure of a set E ∈ S × T is expressed in terms of the measures of its sections by

(μ × ν)(E) = ∫X ν(Ex) dμ(x) = ∫Y μ(Ey) dν(y).

This is a general version of the customary way of calculating areas in calculus and as an immediate corollary gives a useful criterion for a set E ∈ S × T to have zero product measure.

Corollary Let (X,S, μ), (Y,T, ν) be σ-finite measure spaces. Then for any fixed E ∈ S × T, (μ × ν)(E) = 0 if and only if ν(Ex) = 0 a.e. (μ), or equivalently if and only if μ(Ey) = 0 a.e. (ν).

The above corollary is sometimes referred to as (a part of) Fubini's Theorem. However, the main part of Fubini's Theorem is the following counterpart of Theorem 7.3.2 when νx is independent of x.


Theorem 7.4.2 (Fubini's Theorem) Let (X,S, μ), (Y,T, ν) be σ-finite measure spaces and let f be an S × T-measurable function defined a.e. (λ = μ × ν) on X × Y.

(i) If f ≥ 0 a.e. (λ), then ∫Y fx dν and ∫X f y dμ are respectively S- and T-measurable (defined a.e. (μ), (ν) respectively) and

∫X×Y f dλ = ∫X {∫Y fx dν} dμ(x) = ∫Y {∫X f y dμ} dν(y). (7.3)

(ii) The three conditions

∫X×Y |f| dλ < ∞, ∫X {∫Y |fx| dν} dμ(x) < ∞, ∫Y {∫X |f y| dμ} dν(y) < ∞

are equivalent and each guarantees that fx ∈ L1(Y,T, ν) a.e. (μ), f y ∈ L1(X,S, μ) a.e. (ν), ∫Y fx dν ∈ L1(X,S, μ), ∫X f y dμ ∈ L1(Y,T, ν) and that (7.3) holds.

Proof This follows at once from Theorem 7.3.2 – in part directly, and in part by interchanging the roles of X and Y in an obvious way. □
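Fubini's Theorem is easy to watch in action numerically. The sketch below (midpoint rule on the unit square; the grid size and the integrand f(x, y) = x e^{–xy} are our own choices) computes the double integral and both iterated integrals and finds them equal, matching the exact value ∫∫ x e^{–xy} dy dx = e^{–1}:

```python
# Fubini's Theorem checked numerically for f(x,y) = x*exp(-x*y) on [0,1]^2.
import math

N = 400
pts = [(k + 0.5) / N for k in range(N)]   # midpoints of an N-point grid

def f(x, y):
    return x * math.exp(-x * y)

double = sum(f(x, y) for x in pts for y in pts) / (N * N)
iter_xy = sum(sum(f(x, y) for y in pts) / N for x in pts) / N   # dν then dμ
iter_yx = sum(sum(f(x, y) for x in pts) / N for y in pts) / N   # dμ then dν
print(double, iter_xy, iter_yx)
```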

It is convenient to write ∫∫ f dν dμ and ∫∫ f dμ dν respectively for the repeated integrals ∫X {∫Y fx dν} dμ(x) and ∫Y {∫X f y dμ} dν(y). The main use of Theorem 7.4.2 is to invert the order of such repeated integrals, e.g. of ∫∫ f dν dμ to obtain ∫∫ f dμ dν. By the theorem, this may be done whenever the (S × T-measurable) function f is nonnegative, or, if f can take both positive and negative values, whenever one of ∫∫ |f| dν dμ, ∫∫ |f| dμ dν can be shown to be finite.

It should also be noted that commonly one wishes to invert the order of integration of ∫X {∫Ex fx dν} dμ(x) where E ∈ S × T. Replacing f by fχE one sees that this integral is simply ∫E f d(μ × ν) or ∫Y {∫Ey f y dμ} dν(y) under the appropriate conditions from Theorem 7.4.2.

The product measure space (X × Y, S × T, μ × ν) is not generally complete even if both spaces (X,S, μ) and (Y,T, ν) are complete (cf. Ex. 7.5). Sometimes one wishes to use Fubini's Theorem on the completed space (X × Y, (S × T)*, (μ × ν)*), where (S × T)* denotes the completion of S × T with respect to μ × ν, and (μ × ν)* the extension of μ × ν from S × T to (S × T)* (see Section 2.6). The results of Theorem 7.4.2 hold for the completed product space as we show now, the only difference being that almost all, rather than all, sections of f are measurable in this case.

Theorem 7.4.3 Let (X,S, μ) and (Y,T, ν) be two complete σ-finite measure spaces and let f be defined a.e. ((μ × ν)*) on X × Y, and (S × T)*-measurable.

7.4 Product measures and Fubini’s Theorem 151

(i) If f is nonnegative a.e. ((μ × ν)*), then fx is T-measurable for a.e. x (μ), f y is S-measurable for a.e. y (ν), the functions ∫ fx dν and ∫ f y dμ are defined for a.e. x, y, are S- and T-measurable respectively, and

∫ f d((μ × ν)*) = ∫∫ f dμ dν = ∫∫ f dν dμ. (7.4)

(ii) If f ∈ L1(X × Y, (S × T)*, (μ × ν)*) then fx ∈ L1(Y,T, ν) for a.e. x (μ), f y ∈ L1(X,S, μ) for a.e. y (ν), ∫ fx dν ∈ L1(X,S, μ), ∫ f y dμ ∈ L1(Y,T, ν), and (7.4) holds.

Proof (i) Since f is (S × T)*-measurable, there is an S × T-measurable function g defined on (all of) X × Y such that f = g a.e. ((μ × ν)*) (Ex. 3.9) and it may be assumed that g ≥ 0 on X × Y since f ≥ 0 a.e. ((μ × ν)*). We will show that for a.e. x (μ) we have fx = gx a.e. (ν). Let

E = {(x, y) : f(x, y) = g(x, y)}.

Then E ∈ (S × T)* and (μ × ν)*(Ec) = 0, and by the corollary to Theorem 7.4.1 ν(Ecx) = 0 for a.e. x (μ). But Ex = {y : fx(y) = gx(y)} and thus for a.e. x (μ) we have fx = gx a.e. (ν). Since each gx is T-measurable (by Theorem 7.1.4) and (Y,T, ν) is complete, it follows from Theorem 3.6.1 that fx is T-measurable for a.e. x (μ). Hence

∫ fx dν = ∫ gx dν for a.e. x (μ)

and since (X,S, μ) is also complete, again by Theorem 3.6.1, ∫ fx dν is S-measurable. Finally

∫∫ f dν dμ = ∫ {∫ fx(y) dν(y)} dμ(x)
= ∫ {∫ gx(y) dν(y)} dμ(x)
= ∫ g d(μ × ν) (Theorem 7.4.2 (i))
= ∫ g d((μ × ν)*) (Ex. 4.10)
= ∫ f d((μ × ν)*)

the last equality holding since f = g a.e. ((μ × ν)*). It is shown similarly that f y is S-measurable for a.e. y (ν), that ∫ f y dμ is T-measurable and that ∫∫ f dμ dν = ∫ f d((μ × ν)*), completing the proof of (i).

(ii) is shown as (i): the details should be furnished by the reader as an exercise. □


7.5 Signed measures on product spaces

It is of interest to note that products of signed (or even complex) measures may also be quite simply defined. In this section we briefly consider the most useful case of finite signed measures.

Theorem 7.5.1 Let (X,S) and (Y,T) be measurable spaces and μ and ν finite signed measures on S and T respectively. There is a unique finite signed measure μ × ν on S × T such that for all A ∈ S and B ∈ T,

(μ × ν)(A × B) = μ(A)ν(B).

Moreover (μ × ν)+ = μ+ × ν+ + μ– × ν– and (μ × ν)– = μ+ × ν– + μ– × ν+, and thus |μ × ν| = |μ| × |ν| and for all E ∈ S × T,

(μ × ν)(E) = ∫X ν(Ex) dμ(x) = ∫Y μ(Ey) dν(y).

Proof Let μ = μ+ – μ– and ν = ν+ – ν– be the Jordan decompositions of μ and ν and define μ × ν by

μ × ν = [(μ+ × ν+) + (μ– × ν–)] – [(μ+ × ν–) + (μ– × ν+)].

Since μ+, μ–, ν+, ν– are measures, it follows immediately from Theorem 7.4.1 that (μ × ν)(A × B) = μ(A)ν(B) and (μ × ν)(E) = ∫X ν(Ex) dμ(x) = ∫Y μ(Ey) dν(y).

Now let X = A ∪ B, with A positive and B negative, be a Hahn decomposition of (X,S, μ) and Y = C ∪ D, with C positive and D negative, a Hahn decomposition of (Y,T, ν). Notice that if E × F ∈ S × T, E × F ⊂ A × C, then (μ × ν)(E × F) ≥ 0. Hence (μ × ν)(G) ≥ 0 for all finite disjoint unions G of such measurable rectangles. But given ε > 0 it is readily shown from Theorem 2.6.2 that a measurable set G ⊂ A × C may be approximated by such a union H of measurable rectangles in the sense that |μ × ν|(GΔH) < ε. Since (μ × ν)(H) ≥ 0 it follows that (μ × ν)(G) ≥ –ε and hence (μ × ν)(G) ≥ 0, ε being arbitrary.

Thus any measurable subset of A × C has nonnegative μ × ν-measure so that A × C is positive for μ × ν. Similarly B × D is positive for μ × ν, whereas A × D and B × C are negative sets for μ × ν. Hence X × Y = {(A × C) ∪ (B × D)} ∪ {(A × D) ∪ (B × C)} is a Hahn decomposition for (X × Y, S × T, μ × ν). It is then clear that (μ × ν)+, the restriction of μ × ν to (A × C) ∪ (B × D), equals μ+ × ν+ + μ– × ν–, since the two finite measures agree on the measurable rectangles. Similarly (μ × ν)– = μ+ × ν– + μ– × ν+. Finally the uniqueness of μ × ν follows from the uniqueness of its restriction to each of the subsets A × C, A × D, B × C, B × D, i.e. from the uniqueness of μ+ × ν+, μ+ × ν–, μ– × ν+, μ– × ν–, which is guaranteed by Theorem 7.4.1. □

Fubini's Theorem holds for finite signed measures as well. In view of Theorem 7.5.1, this is an immediate consequence of Fubini's Theorem for measures (Theorem 7.4.2) and we now state it, leaving the simple details to the reader.

Theorem 7.5.2 Let (X, S) and (Y, T) be measurable spaces, and μ, ν finite signed measures on S, T respectively. If f ∈ L1(X × Y, S × T, |μ| × |ν|), then fx ∈ L1(Y, T, |ν|) for a.e. x (|μ|), f^y ∈ L1(X, S, |μ|) for a.e. y (|ν|), the functions ∫ fx dν and ∫ f^y dμ which are thus defined a.e. (|μ|) on X and a.e. (|ν|) on Y are in L1(X, S, |μ|) and L1(Y, T, |ν|) respectively, and

∫ f d(μ × ν) = ∫∫ f dμ dν = ∫∫ f dν dμ.

7.6 Real line applications

This section concerns some applications to the real line R = (–∞, +∞). As usual B denotes the Borel sets of R and m Lebesgue measure on B. Write R2 for the plane R × R, B × B = B2 for the class of two-dimensional Borel sets (or simply the Borel sets of R2), and m2 = m × m for two-dimensional Lebesgue measure (Lebesgue measure on R2). The completion of B × B with respect to m × m is called the class of two-dimensional Lebesgue measurable sets, or the Lebesgue measurable sets of R2, and is denoted by L2. Notice that L2 ≠ L × L, i.e. the completion of B × B is not the product of the completions, as shown in Ex. 7.5.

In the sequel we will write L1(R) for L1(R, B, m), and L1(R2) for L1(R2, B2, m × m). Note that f, g ∈ L1(R) does not (in general) imply fg ∈ L1(R), as the example f(x) = g(x) = x^{–1/2}χ_(0,1)(x) demonstrates. However, the following remarkable and useful result follows as a first application of Fubini's Theorem.

Theorem 7.6.1 Let f, g be functions defined on R. If f, g ∈ L1(R) then for a.e. x ∈ R the function of y, f(x – y)g(y), belongs to L1(R), and if for these x's we define

h(x) = ∫_{–∞}^∞ f(x – y)g(y) dy,

then h ∈ L1(R) and ‖h‖1 ≤ ‖f‖1 ‖g‖1. h is called the convolution of f and g and is here denoted by f ∗ g.


Proof Define the function F(x, y) on R2 by F(x, y) = f(x – y)g(y) and assume for the moment that F is B2-measurable. Then by Fubini's Theorem for nonnegative functions (Theorem 7.4.2),

∫_{R2} |F| d(m × m) = ∫_{–∞}^∞ ∫_{–∞}^∞ |f(x – y)g(y)| dx dy
  = ∫_{–∞}^∞ |g(y)| (∫_{–∞}^∞ |f(x – y)| dx) dy
  = ‖f‖1 ∫_{–∞}^∞ |g(y)| dy = ‖f‖1 ‖g‖1

since ∫_{–∞}^∞ |f(x – y)| dx = ∫_{–∞}^∞ |f(x)| dx by the translation invariance of Lebesgue measure (see last paragraph of Section 4.7). Thus F ∈ L1(R2) and by Fubini's Theorem for integrable functions Fx ∈ L1(R) for a.e. x ∈ R (m), and h(x) = ∫_{–∞}^∞ Fx(y) dy, which is thus defined a.e. on R, belongs to L1(R).

Applying again Fubini's Theorem for nonnegative functions it follows as before that

‖h‖1 = ∫_{–∞}^∞ |h(x)| dx ≤ ∫_{–∞}^∞ (∫_{–∞}^∞ |f(x – y)g(y)| dx) dy = ‖f‖1 ‖g‖1.

It thus only remains to show that F is B2-measurable for the proof of the theorem to be complete. Consider the functions F1, F2 defined on R2 by F1(x, y) = x and F2(x, y) = y. Clearly F1 and F2 are B2-measurable. Since f and g are B-measurable, by Theorem 3.3.1 the compositions f(x – y) = f{F1(x, y) – F2(x, y)} = (f ◦ (F1 – F2))(x, y) and g(y) = g{F2(x, y)} = (g ◦ F2)(x, y) are B2-measurable, and hence so also is their product F(x, y) = f(x – y)g(y) (Theorem 3.4.4). □
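The inequality ‖f ∗ g‖1 ≤ ‖f‖1 ‖g‖1 can be seen numerically. The following sketch (not from the text; the particular f, g and the grid are our own illustrative choices) approximates the convolution by Riemann sums on a truncated grid and checks the inequality.

```python
import math

# Approximate h = f * g by Riemann sums and check ||h||_1 <= ||f||_1 ||g||_1.
# The functions f, g and the grid parameters below are illustrative choices.

def l1_norm(values, dx):
    return sum(abs(v) for v in values) * dx

def convolve(f, g, lo=-8.0, hi=8.0, n=400):
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]  # midpoints of the grid cells
    # h(x) = integral of f(x - y) g(y) dy, approximated on the grid
    h = [sum(f(x - y) * g(y) for y in xs) * dx for x in xs]
    return xs, h, dx

f = lambda x: 1.0 if 0.0 < x <= 1.0 else 0.0   # indicator of (0, 1]
g = lambda x: math.exp(-abs(x))                 # integrable, ||g||_1 ~ 2

xs, h, dx = convolve(f, g)
norm_h = l1_norm(h, dx)
norm_f = l1_norm([f(x) for x in xs], dx)
norm_g = l1_norm([g(x) for x in xs], dx)
print(norm_h <= norm_f * norm_g)  # True
```

The inequality is strict here only because of the truncation of the domain: for nonnegative f, g the proof above shows that ∫h equals ‖f‖1 ‖g‖1 exactly.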

The notion of convolution of two integrable functions has an immediate, and useful, generalization to the convolution of two finite signed measures given in Ex. 7.24.

The next application of Fubini's Theorem gives the formula for integration by parts in a general form.

Theorem 7.6.2 If F and G are right-continuous functions of bounded variation on [a, b], –∞ < a < b < ∞, then

∫_(a,b] G(x) dF(x) = F(b)G(b) – F(a)G(a) – ∫_(a,b] F(x – 0) dG(x).

Proof Let E = {(x, y) ∈ (a, b] × (a, b] : y ≤ x}. Then E ∈ B2 since the functions F1(x, y) = x, F2(x, y) = y are B2-measurable and E = {(a, b] × (a, b]} ∩ {(x, y) : F2(x, y) ≤ F1(x, y)}. If μF and μG are the finite signed Lebesgue–Stieltjes measures on B(a, b] corresponding to F and G (see Theorem 5.7.4) then by Theorem 7.5.1,

(μF × μG)(E) = ∫_(a,b] μG(Ex) dμF(x) = ∫_(a,b] μF(E^y) dμG(y).

Since Ex = (a, x] and E^y = [y, b] this is written

∫_(a,b] {G(x) – G(a)} dF(x) = ∫_(a,b] {F(b) – F(y – 0)} dG(y)

so that

∫_(a,b] G(x) dF(x) – G(a){F(b) – F(a)} = F(b){G(b) – G(a)} – ∫_(a,b] F(y – 0) dG(y)

and the desired expression follows by cancelling the terms F(b)G(a). □

For absolutely continuous functions integration by parts has a simpler form.

Corollary If F and G are absolutely continuous functions on [a, b], –∞ < a < b < ∞, with F(x) = F(a) + ∫_a^x f(t) dt, G(x) = G(a) + ∫_a^x g(t) dt, f, g ∈ L1(a, b), then

∫_a^b G(x)f(x) dx + ∫_a^b F(x)g(x) dx = F(b)G(b) – F(a)G(a).

Proof The result follows immediately from the theorem since F is continuous and dμF/dm = f, and similarly for G. □
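The corollary is easy to verify numerically for a concrete pair of absolutely continuous functions. The sketch below (the particular F, G are our own choices, not from the text) takes F(x) = x² and G(x) = sin x on [0, 1], so that f(x) = 2x and g(x) = cos x, and checks the identity with the trapezoid rule.

```python
import math

# Check: integral(G f) + integral(F g) = F(b)G(b) - F(a)G(a)
# for F(x) = x^2, G(x) = sin x on [0, 1] (illustrative choices).

def trapezoid(fn, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * fn(a) + 0.5 * fn(b) + sum(fn(a + i * h) for i in range(1, n)))

a, b = 0.0, 1.0
F, G = lambda x: x * x, math.sin
f, g = lambda x: 2 * x, math.cos   # the a.e. derivatives of F and G

lhs = trapezoid(lambda x: G(x) * f(x), a, b) + trapezoid(lambda x: F(x) * g(x), a, b)
rhs = F(b) * G(b) - F(a) * G(a)    # = sin 1
print(abs(lhs - rhs) < 1e-6)  # True
```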

Further real line applications are given in the exercises.

7.7 Finite-dimensional product spaces

The results of Sections 7.1, 7.3–7.5 may be generalized to include the product of a finite number of factor spaces. To see this, first let X1, . . . , Xn be spaces and ∏_1^n Xi = X1 × X2 × . . . × Xn their Cartesian product, i.e. {(x1, . . . , xn) : xi ∈ Xi, i = 1, . . . , n}.

If Si are semirings of subsets of Xi, i = 1, . . . , n, the class Pn of all rectangles A1 × A2 × . . . × An such that Ai ∈ Si for each i, is again a semiring. In fact the proof of Lemma 7.1.1 generalizes at once by noting that (A1 × A2 × . . . × An) – (B1 × B2 × . . . × Bn) may be expressed as the finite disjoint union ∪_1^n Er where

Er = (A1 ∩ B1) × (A2 ∩ B2) × . . . × (Ar–1 ∩ Br–1) × (Ar – Br) × Ar+1 × . . . × An.

(Note that if r < s, Er ⊂ A1 × A2 × . . . × (Ar – Br) × Ar+1 × . . . × An whereas Es ⊂ A1 × A2 × . . . × (Ar ∩ Br) × Ar+1 × . . . × An and hence Er ∩ Es = ∅.)

For σ-rings S1, S2, . . . , Sn the product σ-ring ∏_1^n Si = S1 × S2 × . . . × Sn is simply defined to be the σ-ring generated by this semiring Pn. We assume now that the Si are σ-fields, so that (X1, S1), . . . , (Xn, Sn) are measurable spaces, and (X1 × X2 × . . . × Xn, S1 × S2 × . . . × Sn) is a measurable space, the "product measurable space" (∏_1^n Xi, ∏_1^n Si).


If E is a subset of X1 × X2 × . . . × Xn, a section may be defined by fixing any number of x1, x2, . . . , xn (xi ∈ Xi) to be a subset of the product of the remaining spaces Xi. For example

E_{x1,x2,...,xr} = {(xr+1, xr+2, . . . , xn) : (x1, x2, . . . , xn) ∈ E} = Tx^{–1}E ⊂ Xr+1 × Xr+2 × . . . × Xn

where Tx, for x = (x1, x2, . . . , xr), is the mapping of Xr+1 × . . . × Xn into X1 × X2 × . . . × Xn given by Tx(xr+1, xr+2, . . . , xn) = (x1, x2, . . . , xn).

It is easily seen that Theorem 7.1.3 generalizes so that each Tx is measurable and if E ∈ S1 × S2 × . . . × Sn then any section is a member of the appropriate σ-field (Sr+1 × Sr+2 × . . . × Sn in the example given).

Suppose now that μ1, . . . , μn are σ-finite measures on S1, . . . , Sn. Write Yn = X1 × X2 × . . . × Xn and Tn = S1 × S2 × . . . × Sn. Then a product measure λn, denoted by μ1 × μ2 × . . . × μn, may be defined (e.g. inductively) on Tn, with the property that

λn(A1 × A2 × . . . × An) = μ1(A1)μ2(A2) . . . μn(An)

where Ai ∈ Si, i = 1, . . . , n. To see this more precisely, we suppose that λn–1 has been defined on Tn–1 with this product property. We may "identify" Yn with the product space Yn–1 × Xn in a natural way by the mapping T((x1, . . . , xn–1), xn) = (x1, . . . , xn) from Yn–1 × Xn to Yn. That is, while Yn is the product of n factor spaces, it may be regarded as the product of two spaces (of which one is itself a product) in this way. It may be shown that if E ∈ Tn then T^{–1}E ∈ Tn–1 × Sn (Ex. 7.30) and thus λn is naturally defined by λn = (λn–1 × μn)T^{–1}. If E = A1 × A2 × . . . × An (Ai ∈ Si, i = 1, . . . , n) then T^{–1}E = (A1 × A2 × . . . × An–1) × An and hence

λn(E) = λn–1(A1 × A2 × . . . × An–1)μn(An) = μ1(A1)μ2(A2) . . . μn(An)

as required. λn is the unique measure on Tn with this property since any other such measure must coincide with λn on the semiring Pn and hence on S1 × S2 × . . . × Sn (σ-finiteness on Pn is clear). λn is also thus σ-finite. Thus in summary the following result holds.

Theorem 7.7.1 Let (Xi, Si, μi) be σ-finite measure spaces for i = 1, 2, . . . , n. Then there exists a unique measure λn (written μ1 × μ2 × . . . × μn) on the σ-field S1 × S2 × . . . × Sn such that

λn(A1 × A2 × . . . × An) = ∏_{i=1}^n μi(Ai)

for each such rectangle with Ai ∈ Si, i = 1, . . . , n. λn is σ-finite.


The results of Section 7.4 also generalize to apply to a product of n > 2 measure spaces using the same "identification" of Yn with Yn–1 × Xn as above. For example, suppose that the function f(x1, . . . , xn) defined on Yn is S1 × S2 × . . . × Sn- (i.e. Tn-) measurable and, say, nonnegative. It is usually convenient to evaluate ∫ f dλn as a repeated integral ∫∫ . . . ∫ f dμ1 dμ2 . . . dμn, say. It is clear what is meant by such a repeated integral. First for fixed x2, x3, . . . , xn the "section" f_{x2,...,xn}(x1) = f(x1, . . . , xn) is integrated over X1, giving a function f^(2)(x2, . . . , xn), say, on X2 × . . . × Xn. Then f^(2)_{x3,...,xn}(x2) is integrated over X2 to give f^(3)(x3, . . . , xn), and so on. That is, the repeated integral may be precisely defined by

∫ . . . ∫ f dμ1 dμ2 . . . dμn = ∫_{Xn} f^(n)(xn) dμn(xn)

where f^(1) = f and the f^(i) are defined inductively on Xi × . . . × Xn by

f^(i+1)(xi+1, . . . , xn) = ∫_{Xi} f^(i)_{xi+1,...,xn}(xi) dμi(xi).

To show the equality of ∫ f dλn and the repeated integral we regard f as a function f* on Yn–1 × Xn by writing f*{(x1, . . . , xn–1), xn} = f(x1, . . . , xn); i.e. f* = fT where T denotes the mapping used above. T is a measurable transformation (Ex. 7.30) and thus by Theorem 4.6.1 and the fact that λn = (λn–1 × μn)T^{–1},

∫_{Yn} f dλn = ∫_{Yn} f d(λn–1 × μn)T^{–1} = ∫_{Yn–1×Xn} fT d(λn–1 × μn)
  = ∫_{Yn–1×Xn} f* d(λn–1 × μn) = ∫_{Xn} {∫_{Yn–1} f*_{xn} dλn–1} dμn(xn)

by Fubini's Theorem for positive functions. But f*_{xn} is a function on Yn–1 whose value at (x1, . . . , xn–1) is f(x1, . . . , xn) and hence f*_{xn} = f_{xn}. Thus

∫_{Yn} f dλn = ∫_{Xn} {∫_{Yn–1} f_{xn} dλn–1} dμn(xn).

The inner integral on the right (with respect to λn–1) may clearly be reduced in the same way, and so on, leading to the repeated integral. (The precise notational details are indicated as Ex. 7.31.)

Thus ∫ f dλn may be evaluated as a repeated integral in the indicated order. Similarly, any other order may be used (see e.g. Ex. 7.32). Fubini's Theorem for L1-functions also generalizes in the obvious way to the case of a product of n measure spaces. We state this together with a summary of the above discussion as a theorem.


Theorem 7.7.2 (Fubini, n factors) Let (Xi, Si, μi) be σ-finite measure spaces for i = 1, . . . , n, and denote their product by (Yn, Tn, λn). Let f be a Tn-measurable function defined on Yn.

(i) If f is nonnegative then ∫ f dλn may be expressed as a repeated integral in any chosen order (e.g. ∫∫ . . . ∫ f dμ1 dμ2 . . . dμn). In particular the repeated integrals taken in any two distinct orders have the same value.

(ii) The same conclusions hold if f ∈ L1(Yn, Tn, λn). This latter condition is equivalent (by (i)) to the finiteness of any repeated integral of |f|, e.g. ∫ . . . ∫ |f| dμ1 . . . dμn < ∞.
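The order-independence in (i) can be illustrated with a discrete analog of a repeated integral. The sketch below (our own toy example, not from the text) approximates the triple integral of f(x, y, z) = xy + z over [0, 1]³ as a repeated midpoint-rule sum in two different orders and compares the results.

```python
# Discrete illustration of Theorem 7.7.2: a repeated sum over [0,1]^3
# computed in two different orders of the variables gives the same value.
# The integrand and grid size are illustrative choices.

def repeated_sum(order, n=40):
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]   # midpoint rule in each factor
    f = lambda x, y, z: x * y + z
    total = 0.0
    for a in pts:                              # outermost variable = order[0]
        for b in pts:
            for c in pts:
                coords = dict(zip(order, (a, b, c)))
                total += f(coords['x'], coords['y'], coords['z']) * h ** 3
    return total

i1 = repeated_sum(('x', 'y', 'z'))
i2 = repeated_sum(('z', 'y', 'x'))
# Exact value: (1/2)(1/2) + 1/2 = 0.75; midpoint rule is exact for
# integrands linear in each variable, up to rounding.
print(abs(i1 - i2) < 1e-9, abs(i1 - 0.75) < 1e-6)
```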

For each i = 1, 2, . . . , n, let Xi = R the real line, Si = B the Borel sets of R, and mi = m Lebesgue measure. Write Rn for the n-dimensional Euclidean space X1 × X2 × . . . × Xn, Bn for S1 × S2 × . . . × Sn, the class of n-dimensional Borel sets, or the Borel sets of Rn, and mn for m1 × m2 × . . . × mn, called n-dimensional Lebesgue measure, or Lebesgue measure on Rn. The completion of Bn with respect to mn is called the class of n-dimensional Lebesgue measurable sets, or Lebesgue measurable sets of Rn, and is denoted by Ln. (As for n = 2, Ln ≠ L × L × . . . × L.)

7.8 Lebesgue–Stieltjes measures on Rn

The previous section concerned product measures, where the measure of a rectangle is the product of the measures of its sides. It is natural to consider more general measures (useful in particular for probability applications involving dependence) and we do so in this section in the context of the measurable space (Rn, Bn) where Rn is the n-dimensional Euclidean space and Bn is the class of Borel sets of Rn. As defined, Bn is the σ-field generated by the semiring of measurable rectangles E1 × E2 × . . . × En where each Ei is a Borel set of R. It is also generated by an even simpler semiring: if a = (a1, a2, . . . , an), b = (b1, b2, . . . , bn), a ≤ b (i.e. ai ≤ bi for each i), let (a, b] denote the "bounded semiclosed interval" of Rn defined by (a, b] = (a1, b1] × (a2, b2] × . . . × (an, bn]. It is not difficult to check that the class Pn of all such bounded semiclosed intervals is a semiring, and that its generated σ-ring is Bn (Ex. 7.33).

In Section 2.8 it was shown how a nondecreasing right-continuous function F(x) can be used to define a Lebesgue–Stieltjes measure on B, and conversely. In this section the procedure will be generalized to define measures on Bn. Such measures are of fundamental importance in the theory of probability and stochastic processes.


The measures on B obtained in Section 2.8 did not have to be finite, provided they took finite values on bounded intervals (and hence were σ-finite, of course). Here we consider, for simplicity, only finite measures (which will be sufficient for all our applications). The main result, an analog of Theorem 2.8.1, is as follows.

Theorem 7.8.1 (i) Let ν be a finite measure on Bn. Then there is a unique function F(x1, . . . , xn) on Rn which is bounded, nondecreasing and right-continuous in each xi, tends to zero as any xi → –∞, and is such that

ν{(a, b]} = ∑* (–)^{n–r} F(c1, . . . , cn)

for all a = (a1, . . . , an), b = (b1, . . . , bn) with a ≤ b (ai ≤ bi, 1 ≤ i ≤ n), where the * denotes that the sum is taken over all 2^n distinct terms with ci = ai or bi, i = 1, . . . , n, and r is the number of ci equal to bi.

(ii) Conversely, let F(x1, . . . , xn) be a function on Rn which is bounded, nondecreasing and right-continuous in each xi, tends to zero as any xi → –∞, and satisfies the condition

∑* (–)^{n–r} F(c1, . . . , cn) ≥ 0

for all a ≤ b in Rn with the notation as in (i). Then there is a unique finite measure μF on Bn such that

μF{(a, b]} = ∑* (–)^{n–r} F(c1, . . . , cn)

for all a ≤ b. In particular for all x = (x1, . . . , xn),

μF{(–∞, x]} = F(x1, . . . , xn)

where (–∞, x] = (–∞, x1] × . . . × (–∞, xn].

Proof (i) Define F on Rn by F(x1, . . . , xn) = ν{(–∞, x]}, x = (x1, . . . , xn) ∈ Rn. It is easily verified that F is bounded, nondecreasing, right-continuous, and that F(x1, . . . , xn) → 0 as any xi → –∞. In order to express ν{(a, b]} in terms of F note that if Ai = (–∞, ai] and Bi = (–∞, bi] then for each x = (x1, . . . , xn) ∈ Rn,

χ_(a,b](x) = ∏_{i=1}^n {χ_{Bi}(xi) – χ_{Ai}(xi)} = ∑* (–)^{n–r} χ_{C1}(x1) . . . χ_{Cn}(xn)

where Ci = (–∞, ci] = Ai or Bi and the notation for * and r is as in (i) of the theorem statement. It follows that

ν{(a, b]} = ∫_{Rn} χ_(a,b] dν = ∑* (–)^{n–r} F(c1, . . . , cn).


Since letting a → –∞ in the last expression shows that ν{(–∞, b]} = F(b1, . . . , bn), it follows that F is uniquely determined by ν.

(ii) Define the nonnegative set function μF on the semiring Pn of intervals (a, b], a ≤ b, by

μF{(a, b]} = ∑* (–)^{n–r} F(c1, . . . , cn).

Notice that when a = b this gives μF(∅) = 0. It is shown in Lemma 7.8.2 (below) that μF is finitely additive on Pn. Now let I = ∪_{k=1}^∞ Ik where I, Ik ∈ Pn and the Ik's are disjoint. Then it is shown in Lemma 7.8.3 that μF(I) ≤ ∑_{k=1}^∞ μF(Ik), and it is easily seen (Ex. 2.18) that for each K, ∑_{k=1}^K μF(Ik) ≤ μF(I) and hence ∑_{k=1}^∞ μF(Ik) ≤ μF(I). Thus μF(I) = ∑_{k=1}^∞ μF(Ik) and μF is countably additive on Pn. Since μF is clearly finite on Pn, by the extension theorem (Theorem 2.5.4) μF has a unique extension to a finite measure on S(Pn) = Bn. □

The following two lemmas were used in the proof of the theorem.

Lemma 7.8.2 Let F be as in (ii) of Theorem 7.8.1, and define the set function μF on Pn by μF(∅) = 0 and μF(a, b] = ∑* (–)^{n–r} F(c1, c2, . . . , cn) for all a ≤ b. Then μF is a (nonnegative) finitely additive set function on Pn.

Proof For simplicity of notation consider the two-dimensional case – the general one follows inductively. Let I0 ∈ P2, I0 = ∪_{k=1}^K Ik where the Ik are disjoint sets of P2.

Suppose first that the rectangles Ik occur in "regular stacks", i.e. form an M × N grid of subrectangles. Specifically this means that the union may be written as I0 = ∪_{i=1}^M ∪_{j=1}^N Eij where I0 = (a0, aM] × (b0, bN], Eij = (ai–1, ai] × (bj–1, bj], each Ik being one of the terms Eij in the union. Then for fixed i,

∑_{j=1}^N μF(Eij) = ∑_{j=1}^N [F(ai, bj) – F(ai, bj–1)] – ∑_{j=1}^N [F(ai–1, bj) – F(ai–1, bj–1)]
  = F(ai, bN) – F(ai, b0) – [F(ai–1, bN) – F(ai–1, b0)]

so that

∑_{i=1}^M ∑_{j=1}^N μF(Eij) = ∑_{i=1}^M [F(ai, bN) – F(ai–1, bN)] – ∑_{i=1}^M [F(ai, b0) – F(ai–1, b0)]
  = F(aM, bN) – F(a0, bN) – F(aM, b0) + F(a0, b0)

which gives μF(I0) = ∑_{ij} μF(Eij) = ∑_{k=1}^K μF(Ik) for this "stacked rectangle" case.

The general case may be reduced to the stacked one as follows. If Ik = (αk, α′k] × (βk, β′k], denote the distinct ordered values of α1, α′1, α2, α′2, . . . , αK, α′K (in increasing order of size) by a0, a1, . . . , aM, and those of β1, β′1, . . . , βK, β′K by b0, b1, . . . , bN. Then I0 is the union of the disjoint intervals (ai–1, ai] × (bj–1, bj] and by the above μF(I0) = ∑_{i=1}^M ∑_{j=1}^N μF{(ai–1, ai] × (bj–1, bj]}.

But each Ik is a disjoint union of a certain stacked group of these intervals and μF(Ik) is therefore just the sum of the corresponding terms μF{(ai–1, ai] × (bj–1, bj]}. Hence μF(I0) = ∑_{k=1}^K μF(Ik), as required. □

Lemma 7.8.3 Under the same conditions and notation as Lemma 7.8.2, if I ∈ Pn, Ik ∈ Pn, k = 1, 2, . . . and I ⊂ ∪_{k=1}^∞ Ik, then μF(I) ≤ ∑_{k=1}^∞ μF(Ik).

Proof Write Ik = (ak, bk], I = (a0, b0], h = (h, h, . . . , h). The right-continuity of F implies that μF{(a, b + h]} ↓ μF{(a, b]} as h ↓ 0. Hence for each k, hk > 0 may be chosen so that

μF{(ak, bk + hk]} ≤ μF(Ik) + ε/2^k

where ε > 0 is given. Now for any h > 0, [a0 + h, b0] ⊂ ∪_{k=1}^∞ (ak, bk + hk) and hence by the Heine–Borel Theorem, for some K,

(a0 + h, b0] ⊂ [a0 + h, b0] ⊂ ∪_{k=1}^K (ak, bk + hk) ⊂ ∪_{k=1}^K (ak, bk + hk].

It is easy to see from this and Lemma 7.8.2 (cf. Ex. 2.18) that

μF{(a0 + h, b0]} ≤ ∑_{k=1}^K μF{(ak, bk + hk]} ≤ ∑_{k=1}^∞ μF(Ik) + ε

from which the desired conclusion follows simply by letting first ε ↓ 0 and then h ↓ 0, since the right-continuity of F implies that μF{(a0 + h, b0]} → μF{(a0, b0]}. □


The measure μF constructed in Theorem 7.8.1 (ii) is called the Lebesgue–Stieltjes measure on Bn corresponding to the function F. The expression of μF{(a, b]} in terms of F becomes quite involved for large n but may be described as the sum of the values of F at the vertices (c1, . . . , cn) of the interval (a, b], with alternating signs (this is easily seen pictorially for n = 2). μF{(a, b]} may also be expressed as a generalized difference of values of F (see Ex. 7.34).
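The alternating vertex sum is mechanical to compute. The sketch below (our own illustration, not from the text) evaluates ∑* (–)^{n–r} F(c1, . . . , cn) for an arbitrary dimension n, and checks it in the case n = 2 with F(x, y) = G(x)H(y), where the measure of a rectangle must reduce to a product of one-dimensional increments.

```python
from itertools import product

def rect_measure(F, a, b):
    """Alternating-sign vertex sum: mu_F((a, b]) for an n-dimensional
    distribution function F, as in Theorem 7.8.1. The sign is (-1)^(n-r)
    where r is the number of coordinates taken from b."""
    n = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=n):   # 0 -> a_i, 1 -> b_i
        c = [b[i] if pick else a[i] for i, pick in enumerate(choice)]
        r = sum(choice)
        total += (-1) ** (n - r) * F(c)
    return total

# Illustrative F: F(x, y) = G(x)G(y) with G the uniform CDF on (0, 1].
G = lambda t: min(max(t, 0.0), 1.0)
F = lambda c: G(c[0]) * G(c[1])

a, b = (0.2, 0.1), (0.7, 0.4)
# For such a product F the rectangle measure is the product of increments.
print(abs(rect_measure(F, a, b) - (G(0.7) - G(0.2)) * (G(0.4) - G(0.1))) < 1e-12)
```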

Note that while the function F has been assumed bounded, the discussion may be generalized to the case where F is not bounded, but real-valued, yielding a σ-finite measure μF. As noted before, however, the case where μF is finite will be the most useful one in applications to probability. Further, a common special case of the above discussion occurs when F(x1, x2, . . . , xn) = G1(x1)G2(x2) . . . Gn(xn) where each Gi is a nondecreasing, bounded, right-continuous function on R with Gi(–∞) = 0. It should be verified (Ex. 7.35) that μF = μG1 × μG2 × . . . × μGn, i.e. the n-fold product of the Lebesgue–Stieltjes measures μGi determined by each Gi on B. This measure is useful in probability theory for dealing with independent random variables.

The final result of this section (used in the next) establishes "regularity" of finite measures on Bn, closely approximating a set B ∈ Bn in measure "from without" by an open set, and "from within" by a bounded closed (i.e. compact) set. While this is topological in nature (and capable of substantial generalization) only the very simplest and most familiar concepts of open and closed sets in Rn will be needed for the current context.

Lemma 7.8.4 (Regularity) Let μ be a finite measure on (Rn, Bn). Given B ∈ Bn and ε > 0 there is an open set G and a bounded closed set F such that F ⊂ B ⊂ G and μ(B – F) < ε, μ(G – B) < ε.

Proof Since the semiring of rectangles (a1, b1] × (a2, b2] × . . . × (an, bn] generates Bn (Ex. 7.33) it follows from the extension procedure of Section 2.5 that rectangles Bi, i = 1, 2, . . . of this form exist with ∪_1^∞ Bi ⊃ B and ∑_1^∞ μ(Bi) < μ(B) + ε/2. The sides of the rectangles may clearly be extended to give rectangles Ei ⊃ Bi with open sides and such that μ(Ei) < μ(Bi) + ε/2^{i+1}. Hence G = ∪_1^∞ Ei is an open set with G ⊃ B, μ(G) ≤ ∑ μ(Ei) ≤ ∑ μ(Bi) + ε/2 < μ(B) + ε.

To define the bounded closed set F, note that the above result may be applied to B^c to give an open set U ⊃ B^c, μ(U) < μ(B^c) + ε, so that clearly μ(U^c) > μ(B) – ε (e.g. μ(U^c) = μ(Rn) – μ(U)). If Ir = [–r, r] × [–r, r] × . . . × [–r, r] (= [–r, r]^n), then Ir ↑ Rn as r → ∞ so that U^c ∩ Ir ↑ U^c and hence μ(U^c ∩ Ir) → μ(U^c). Thus for some N, μ(IN ∩ U^c) > μ(B) – ε and the proof is completed on writing F for the bounded closed set IN ∩ U^c. □

7.9 The space (RT, BT)

In previous sections, product and other Lebesgue–Stieltjes measures on finite-dimensional product spaces were investigated. We now consider infinite product spaces in this section and corresponding product measures in the next, as well as general (not necessarily product) measures on them. For simplicity we will deal with the case where all component measurable spaces are copies of the real line with its Borel sets. This is the most interesting case in connection with the theory of probability and stochastic processes. However, all results of these sections are also valid for more general component measurable spaces (which, incidentally, need not be copies of the same measurable space) satisfying certain topological conditions.

Let T be an arbitrary (index) set. It may be convenient to think of T as time, i.e. a subset of R, and draw pictures – but no conditions will be imposed on T throughout this section. For each t ∈ T let the measurable space (Xt, St) be a copy of the real line R with its Borel sets, i.e.

(Xt, St) = (R, B) for all t ∈ T.

Recall that the finite-dimensional (Cartesian) product ∏_{i=1}^n Xti = Xt1 × . . . × Xtn is the set {(x(t1), . . . , x(tn)) : x(ti) ∈ R, i = 1, . . . , n}, in other words the set of all real-valued functions on the set (t1, . . . , tn). Similarly the product of the spaces Xt, t ∈ T, is defined to be the set of all real-valued functions on T, denoted by

RT = ∏_{t∈T} Xt

and called the function space on T. Each element x in RT is a real-valued function x(t) defined on T, and each x(t) is called a coordinate of x, or the t-coordinate of x.

The first task is to define the product σ-field of the σ-fields St, t ∈ T, for which the following notation will be used.

Let u = (t1, t2, . . . , tn) denote an ordered n-tuple of distinct points ti ∈ T (with "order" denoting only that t1 is the first element, t2 the second – not a size ordering, since the set T may not be "size ordered" in any sense). In particular, for distinct t1, t2, the pairs (t1, t2) and (t2, t1) are different 2-tuples.


For u = (t1, t2, . . . , tn) write

Ru = ∏_{i=1}^n Xti = Xt1 × . . . × Xtn (= Rn)
Bu = ∏_{i=1}^n Sti = St1 × . . . × Stn (= Bn).

The projection map πu from RT onto Ru is defined by

πu(x) = (x(t1), . . . , x(tn)) for all x ∈ RT.

If v = (s1, s2, . . . , sk) is another such k-tuple, and k ≤ n, define v ⊂ u to mean that each element sj of v is one of the ti in u (not necessarily in the same order), i.e. sj = t_{τj} say, 1 ≤ j ≤ k. Then we define the "projection mapping" πu,v from Ru to Rv by

πu,v(x(t1), x(t2), . . . , x(tn)) = (x(s1), x(s2), . . . , x(sk))

noting that this involves both evaluation of x(t) at a subset of values of the ti and a possible permutation of their order. It is apparent that πu,v is a measurable mapping.

If as above v = (s1, s2, . . . , sk) ⊂ u = (t1, t2, . . . , tn) and sj = t_{τj}, 1 ≤ j ≤ k, then for x ∈ RT

πu,v πu x = πu,v(x(t1), . . . , x(tn)) = (x(s1), . . . , x(sk)) = πv x

so that πu,v πu = πv. To fix ideas, if u = (t1, t2, t3), v = (t1, t2) then πu,v(x(t1), x(t2), x(t3)) = (x(t1), x(t2)), and if u = (t1, t2), v = (t2, t1) then πu,v(x(t1), x(t2)) = (x(t2), x(t1)).
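The relation πu,v ∘ πu = πv, including the possible permutation of coordinates, is easy to model concretely. The sketch below (the dict-based representation of a "function on T" is our own device, not from the text) checks it on a three-point index set.

```python
# A finite toy model of the projection maps: a "function" x on T is a
# dict t -> x(t); tuples u, v are ordered index tuples.

def pi(u):
    """Projection pi_u: x -> (x(t1), ..., x(tn)) for the ordered tuple u."""
    return lambda x: tuple(x[t] for t in u)

def pi_uv(u, v):
    """Projection pi_{u,v}: selects (and possibly permutes) the v-coordinates
    out of the u-coordinates; requires every element of v to occur in u."""
    idx = [u.index(s) for s in v]
    return lambda coords: tuple(coords[i] for i in idx)

x = {'t1': 1.0, 't2': 2.0, 't3': 3.0}
u, v = ('t1', 't2', 't3'), ('t2', 't1')    # v ⊂ u, in permuted order
print(pi_uv(u, v)(pi(u)(x)), pi(v)(x))     # both (2.0, 1.0): pi_{u,v} ∘ pi_u = pi_v
```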

Now for fixed u = (t1, . . . , tn) ⊂ T and B ∈ Bu the subset of RT

C = {x ∈ RT : (x(t1), . . . , x(tn)) ∈ B} = {x ∈ RT : πu x ∈ B} = πu^{–1}B

is called a cylinder set with base B at u = (t1, . . . , tn). A cylinder with base at u is also a cylinder with base at any w ⊃ u, since if u = (t1, . . . , tn), w = (s1, . . . , sn+1) (with tj = s_{τj}, 1 ≤ j ≤ n) and B ∈ Bu, then the cylinder with base B ∈ Bu is

πu^{–1}B = πw^{–1}(πw,u^{–1}B), where πw,u^{–1}B ∈ Bw, and hence it is a cylinder with base at w.

The class of all cylinder sets with base at a given u is denoted by

C(u) = C(t1, . . . , tn) = {πu^{–1}B : B ∈ Bu} = πu^{–1}Bu = π_{t1,...,tn}^{–1}Bu

and each C(u) is a σ-field (by Theorem 3.2.2). The class of all cylinder sets is denoted by C, and each set in C is called a cylinder set in RT. Thus

C = ∪_{u⊂T: u finite} C(u) = ∪_{n; t1,...,tn∈T} C(t1, . . . , tn).


Lemma 7.9.1 C is a field.

Proof Let E1, E2 ∈ C. Then by the definition of C, we have Ei ∈ C(ui), i = 1, 2, where u1, u2 are ordered finite subsets of T. Let u = u1 ∪ u2, consisting of all the distinct elements of u1 and u2 in some arbitrary but fixed order. Then E1, E2 ∈ C(u), and since C(u) is a σ-field it follows that E1 ∪ E2 and E1^c belong to C(u) and hence to C, so that C is a field. □

The σ-field generated by the field C is called the product σ-field of St, t ∈ T, or the product σ-field in RT, and is denoted by

BT = ∏_{t∈T} St = S(C).

Note that for each ordered finite subset u = (t1, . . . , tn) of T the projection map πu is a measurable transformation from (RT, BT) onto (Ru, Bu), since for each B ∈ Bu we have πu^{–1}B ∈ C(u) ⊂ C ⊂ BT. When u consists of a single point, u = {t}, πu = πt is called the evaluation function at t since πt(x) = x(t) for all x ∈ RT. It can easily be seen that BT is the σ-field of subsets of RT generated by the evaluation functions πt, t ∈ T, i.e. BT is the smallest σ-field of subsets of RT with respect to which all evaluation functions are measurable (Ex. 7.36).

When T is a countably infinite set, for example the set of positive integers T = {1, 2, . . .}, RT becomes the set of all real sequences and we use instead the more suggestive notation R∞, B∞. R∞ is also called the (real) sequence space.

Even though, when T is an uncountable set, the function space (RT, BT) is clearly much larger than the sequence space (R∞, B∞), each measurable set in (RT, BT) essentially belongs to some (R∞, B∞) (Theorem 7.9.2). A corresponding statement holds for measurable functions on (RT, BT), and this property is often very useful in dealing with such functions. The projection maps and cylinder sets have been defined for ordered finite subsets u of T. The same definitions apply quite clearly when u is an ordered countable subset of T, u = (t1, t2, . . .). Then the projection map πu from RT to Ru (= R∞) is defined by

πu(x) = (x(t1), x(t2), . . .) for all x ∈ RT,

a cylinder set with base B ∈ Bu at u is the subset πu^{–1}B of RT, the class of all cylinder sets at u is again denoted by C(u), and is given by C(u) = πu^{–1}Bu. For every ordered subset v of u the map πu,v from Ru to Rv is defined similarly and by definition (i.e. applying the definition of BT to Bu),

Bu = σ(∪_{v⊂u: v finite} πu,v^{–1}Bv)


since πu,v^{–1}Bv are the cylinder sets at v in Ru. The following result is not needed in the sequel but provides the useful characterization of measurable sets as cylinders with base in countably many dimensions referred to above.

Theorem 7.9.2 With the above notation

BT = ∪_{u⊂T: u countable} C(u).

Hence if E ∈ BT there is a countable subset S of T (depending on E) such that E ∈ C(S). Further, if f is a BT-measurable function there is a countable subset S of T (depending on f) such that f is C(S)-measurable.

Proof For each ordered countable u ⊂ T,

C(u) = πu^{–1}Bu = πu^{–1}σ(∪_{v⊂u: v finite} πu,v^{–1}Bv)
  = σ(∪_{v⊂u: v finite} πu^{–1}πu,v^{–1}Bv)
  = σ(∪_{v⊂u: v finite} C(v))

since πu^{–1}πu,v^{–1}Bv = πv^{–1}Bv = C(v). Since for each finite v, C(v) ⊂ BT, it follows that C(u) ⊂ BT and thus

E = ∪_{u⊂T: u countable} C(u) ⊂ BT.

In order to show the reverse inclusion BT ⊂ E it suffices to show that E is a σ-field containing C (since BT = S(C)). Each set in C is in some C(t1, . . . , tn) and hence of the form π_{(t1,...,tn)}^{–1}(B) for some B ∈ Bn. But this set may also be written as π_{(t1,...,tn,...)}^{–1}(B × R × R × . . .) for any choice of tn+1, tn+2, . . . , and thus it belongs to C(t1, . . . , tn, . . .) and also to E, since B × R × R × . . . ∈ B∞. It follows that E contains C. We now show that E is a σ-field. For n = 1, 2, . . . , let En ∈ E. Then En ∈ C(un) for some countable subset un of T. If u = ∪_{n=1}^∞ un then u is also a countable subset of T and En ∈ C(u) for all n. Hence En = πu^{–1}(Bn) for some Bn ∈ B∞ and ∪_{n=1}^∞ En = πu^{–1}(∪_{n=1}^∞ Bn) implies that ∪_{n=1}^∞ En belongs to C(u), and thus also to E, so that E is closed under the formation of countable unions. Similarly, E is closed under complementation.

Now let f be a BT-measurable function defined on RT. Then f^{–1}{–∞} belongs to C(u∞) and, for each rational r, {x : f(x) ≤ r} belongs to C(ur), where u∞ and the ur are countable subsets of T. Then u = u∞ ∪ (∪r ur) is also a countable subset of T and f^{–1}{–∞} ∈ C(u), {x : f(x) ≤ r} ∈ C(u) for each rational r, i.e. f is C(u)-measurable. □

Theorem 7.9.2 shows that each set E ∈ BT is of the form

E = πS^{–1}B = {x ∈ RT : (x(s1), x(s2), . . .) ∈ B}

for some countable subset S = (s1, s2, . . .) of T and some B ∈ B∞, i.e. it can be described by conditions on a countable number of coordinates. Hence each BT-measurable set, as well as each BT-measurable function, depends on only a countable number of coordinates.

7.10 Measures on RT, Kolmogorov's Extension Theorem

This section concerns the construction of (probability) measures on the space (RT, BT) from probability measures on "finite-dimensional" subspaces. For each u = (t1, . . . , tn) ⊂ T, πu (as defined above) is a measurable transformation from (RT, BT) onto (Ru, Bu). Hence if μ is a probability measure on (RT, BT), each

νu = ν(t1,...,tn) = μπu^{–1} = μπ_{(t1,...,tn)}^{–1}

is a probability measure on (Ru, Bu) = (Rn, Bn). The converse question is of interest in the theory of probability and stochastic processes: given for each ordered finite (nonempty) subset (t1, . . . , tn) of T a probability measure ν(t1,...,tn) on (Rn, Bn), is there a probability measure μ on (RT, BT) such that μπ_{(t1,...,tn)}^{–1} = ν(t1,...,tn)? Note that if v ⊂ u and B ∈ Bv then

νu(πu,v^{–1}B) = μ(πu^{–1}πu,v^{–1}B) = μ(πv^{–1}B) = νv(B)

and thus νu πu,v^{–1} = νv. This necessary ("consistency") condition turns out to be sufficient as well, which is the main result of this section. For clarity the result will be shown in two parts and combined as Theorem 7.10.3.
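For discrete finite-dimensional distributions the consistency condition νu πu,v^{–1} = νv is just the statement that marginalizing the u-distribution yields the v-distribution, permutations included. The sketch below (a toy model of our own, not from the text) checks this for a dependent two-coordinate distribution.

```python
# Toy check of the consistency condition for discrete measures on tuples.

def marginal(joint, keep):
    """Push a discrete measure on tuples forward under the projection that
    keeps the listed coordinate positions (in the listed order)."""
    out = {}
    for point, p in joint.items():
        image = tuple(point[i] for i in keep)
        out[image] = out.get(image, 0.0) + p
    return out

# nu_{(t1,t2)}: an arbitrary (dependent) probability measure on {0,1}^2.
nu_12 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# The only consistent choices for nu_{(t1)} and nu_{(t2)} are the marginals:
nu_1 = marginal(nu_12, [0])
nu_2 = marginal(nu_12, [1])

# The permuted tuple (t2, t1) must carry the correspondingly permuted measure,
# and marginalizing it back onto t1 must recover nu_1:
nu_21 = marginal(nu_12, [1, 0])
print(marginal(nu_21, [1]) == nu_1)  # True
```

Kolmogorov's theorem says that such compatibility of the finite-dimensional distributions, here a bookkeeping triviality, is all that is needed for a measure on (RT, BT) to exist.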

Lemma 7.10.1 With the above notation let νu be a probability measure on (Ru, Bu) for each ordered finite subset u ⊂ T, assumed consistent as defined above. Then a set function μ may be defined unambiguously on the field C of cylinder sets by μ(E) = νu(B) when E ∈ C(u), E = πu^{–1}(B). μ is a measure on each C(u) and is finitely additive on C.

Proof If E ∈ C, then E ∈ C(u) for some finite subset u of T and hence E = πu^{–1}(B), B ∈ Bu. To show that μ is uniquely defined by μ(E) = νu(B) it is necessary to check that different representations for E give the same value for μ(E).

Thus let E ∈ C and suppose that E = πu^{–1}B = πv^{–1}C where B ∈ Bu, C ∈ Bv and u, v are finite subsets of T. Let w = u ∪ v. Then E ∈ C(w) so that E = πw^{–1}D for some D ∈ Bw. Now πw maps onto Rw and it is simply shown that

D = πw πw^{–1}D = πw E = πw πu^{–1}B = πw,u^{–1}B,

since u ⊂ w implies πw,u πw = πu, and by the consistency condition

νw(D) = νw πw,u^{–1}(B) = νu(B).

Similarly it can be shown that νw(D) = νv(C). Hence νu(B) = νv(C) and μ is uniquely defined on C by μ(E) = νu(B). Now if Ei are disjoint sets of C(u), then Ei = πu^{–1}Bi where the Bi are disjoint sets of Bu. Hence ∪Ei = πu^{–1}(∪Bi) and

μ(∪_1^∞ Ei) = νu(∪_1^∞ Bi) = ∑_1^∞ νu(Bi) = ∑_1^∞ μ(Ei).

Hence μ is a measure on C(u), for each finite u ⊂ T. Finally, to show finite additivity of μ on C it is sufficient to show additivity since C is a field. If E, F are disjoint sets of C, E ∈ C(u), F ∈ C(v) say, then both E and F belong to C(w) for w = u ∪ v. Since μ is a measure on C(w) it follows that μ(E ∪ F) = μ(E) + μ(F) as desired. □

The above result uses the given consistent measures on the classes B^u to define an additive set function μ on C which is a measure on each C(u). This will be combined with the following result, which shows that such a set function μ is actually a measure on the field C and hence may be extended to S(C). The proof may be recognized as a thinly disguised variant of that of Tychonoff's Theorem for compactness of product spaces.

Theorem 7.10.2  Let μ be a finitely additive set function on C such that μ is a probability measure on C(u) for each finite set u ⊂ T. Then μ is a probability measure on C and hence may be extended to a probability measure on S(C) = B^T.

Proof  Since μ is finitely additive, to show countable additivity it is sufficient by Theorem 2.2.6 to show that μ is continuous from above at ∅, i.e. that μ(E_n) → 0 for any decreasing sequence of sets E_n ∈ C with ∩_1^∞ E_n = ∅. Equivalently it is sufficient to assume (as we now do) that the E_n are decreasing sets of C with μ(E_n) ≥ h for some h > 0 and show that ∩_1^∞ E_n ≠ ∅.

Now E_n ∈ C(u_n) where (replacing u_n by ∪_{k=1}^n u_k) it may be assumed that u_1 ⊂ u_2 ⊂ u_3 ⊂ ..., u_j = (t_1, t_2, ..., t_{n_j}) say, and ∪u_j = (t_1, t_2, ...). By Lemma 7.8.4 the base of the cylinder E_n contains a bounded closed subset approximating it in ν_{u_n} (= μπ_{u_n}^{-1})-measure. Thus a cylinder F_n ⊂ E_n may be constructed with bounded closed base in R^{u_n}, and such that μ(E_n – F_n) < h/2^{n+1}. The (decreasing) cylinders C_n = ∩_{r=1}^n F_r have bounded closed bases B_n in R^{u_n} and

E_n – C_n = ∪_{r=1}^n (E_n – F_r) ⊂ ∪_{r=1}^n (E_r – F_r)


so that (since μ is additive and thus also monotone) μ(E_n – C_n) ≤ Σ_{r=1}^n μ(E_r – F_r) ≤ h/2, giving

μ(C_n) = μ(E_n) – μ(E_n – C_n) ≥ h/2 > 0

from which it follows that no C_n is empty. Thus for each j, C_j contains a point x_j say, so that the point (x_j(t_1), ..., x_j(t_{n_j})) of R^{u_j} belongs to the bounded closed base B_j of the cylinder C_j ⊂ E_j.

If Σ denotes a subsequence {j_r} of the positive integers (with j_1 < j_2 < j_3 < ...) and a_j is a sequence of real numbers we shall write "{a_j : j ∈ Σ} converges" to mean that a_{j_r} converges as r → ∞.

Now the sequence {x_j(t_1)}_{j=1}^∞ of bounded (since x_j ∈ C_1) real numbers has a convergent subsequence. That is, there is a subsequence Σ_1 of the positive integers such that {x_j(t_1) : j ∈ Σ_1} converges. Similarly a subsequence of {x_j(t_2) : j ∈ Σ_1} converges and hence Σ_1 has a subsequence Σ_2 such that {x_j(t_2) : j ∈ Σ_2} converges. Proceeding in this way we obtain subsequences Σ_s of the positive integers such that Σ_1 ⊃ Σ_2 ⊃ Σ_3 ⊃ ... and {x_j(t_s) : j ∈ Σ_s} converges. Form now the "diagonal subsequence" Σ of positive integers consisting of the first member of Σ_1, the second of Σ_2, and so on. Clearly {x_j(t_s) : j ∈ Σ} converges for each s. Writing Σ = {r_k} this means that x_{r_k}(t_s) converges to a limit, y_s say, as k → ∞, for each s. Let y be any element of R^T such that y(t_s) = y_s, s = 1, 2, ....

Since (x_j(t_1), ..., x_j(t_{n_1})) belongs to the base B_1 of C_1 for every j and B_1 is closed, it follows that (y(t_1), ..., y(t_{n_1})) = (y_1, ..., y_{n_1}) ∈ B_1 and hence y ∈ C_1. In a similar way we may show that y ∈ C_2, y ∈ C_3 and so on. That is, y ∈ ∩_{j=1}^∞ C_j ⊂ ∩_{j=1}^∞ F_j ⊂ ∩_{j=1}^∞ E_j, showing that ∩_{j=1}^∞ E_j ≠ ∅ and thus completing the proof. □

The main theorem now follows by combining the last two results.

Theorem 7.10.3 (Kolmogorov's Extension Theorem)  Let T be an arbitrary set and for each ordered finite subset u of T let ν_u be a probability measure on (R^u, B^u). If the family {ν_u : u an ordered finite subset of T} is consistent, in the sense that ν_u π_{u,v}^{-1} = ν_v whenever v ⊂ u, then there is a unique probability measure μ on (R^T, B^T) such that for all finite subsets u of T, μπ_u^{-1} = ν_u.

Proof  The set function μ defined as in Lemma 7.10.1 satisfies the conditions of Theorem 7.10.2 and hence is a probability measure on the field C, so that it has an extension to a probability measure on S(C) = B^T. If λ is another probability measure on C with λπ_u^{-1} = ν_u then λ = μ on C(u) for each finite u, so that λ = μ on C and hence on S(C) = B^T by the uniqueness of the extension from C to S(C). □


Corollary  If for each t ∈ T, μ_t is a probability measure on (X_t, S_t) = (R, B), there is a unique probability measure μ on (R^T, B^T) such that for each u = (t_1, ..., t_n) ⊂ T

μπ_u^{-1} = μ_{t_1} × ... × μ_{t_n}.

Proof  Define

ν_u = μ_{t_1} × ... × μ_{t_n} on (R^u, B^u).

Let v ⊂ u and assume for simplicity of notation that v = (t_1, ..., t_k), 1 ≤ k ≤ n. Then for each B ∈ B^v, π_{u,v}^{-1}B = B × X_{t_{k+1}} × ... × X_{t_n} and

(ν_u π_{u,v}^{-1})(B) = ν_u(π_{u,v}^{-1}B)
  = (μ_{t_1} × ... × μ_{t_n})(B × X_{t_{k+1}} × ... × X_{t_n})
  = (μ_{t_1} × ... × μ_{t_k})(B) μ_{t_{k+1}}(X_{t_{k+1}}) ... μ_{t_n}(X_{t_n})
  = ν_v(B).

Thus the family of probability measures {ν_u : u an ordered finite subset of T} is consistent, and the conclusion follows from Kolmogorov's Extension Theorem. □

The measure μ in this corollary is denoted by

μ = Π_{t∈T} μ_t.

In fact this corollary holds if (X_t, S_t) is an arbitrary measurable space for each t, in contrast to the topological nature of Theorem 7.10.3, where the product space and product σ-field definitions extend those for the above real line cases in obvious ways, e.g. as stated in the following theorem. (For proof see e.g. [Halmos, Theorem 38 B].)

Theorem 7.10.4  Let (X_i, S_i, μ_i) be a sequence of measure spaces with μ_i(X_i) = 1 for all i. Then there exists a unique measure μ on the σ-field S = Π_{i=1}^∞ S_i such that for every measurable set E of the form A × Π_{i=n+1}^∞ X_i,

μ(E) = (μ_1 × μ_2 × ... × μ_n)(A).
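The consistency condition and the product-measure construction of the corollary are easy to check numerically in the fully discrete case. The following Python sketch (the index labels, two-point state space and probabilities are invented for illustration) builds the finite-dimensional product laws ν_u and verifies that pushing ν_u forward under the projection onto a subset v of coordinates recovers ν_v, i.e. ν_u π_{u,v}^{-1} = ν_v.

```python
from itertools import product as cartesian

# Hypothetical one-dimensional laws mu_t on the two-point space {0, 1}.
mu = {
    "t1": {0: 0.5, 1: 0.5},
    "t2": {0: 0.2, 1: 0.8},
    "t3": {0: 0.9, 1: 0.1},
}

def nu(u):
    """Finite-dimensional product law nu_u = mu_{t1} x ... x mu_{tn} (corollary)."""
    dists = [mu[t] for t in u]
    law = {}
    for xs in cartesian(*[list(d) for d in dists]):
        p = 1.0
        for d, x in zip(dists, xs):
            p *= d[x]
        law[xs] = p
    return law

def pushforward(nu_u, u, v):
    """nu_u pi_{u,v}^{-1}: marginalize nu_u onto the coordinates listed in v."""
    idx = [u.index(t) for t in v]
    out = {}
    for xs, p in nu_u.items():
        key = tuple(xs[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

u, v = ("t1", "t2", "t3"), ("t1", "t3")
lhs, rhs = pushforward(nu(u), u, v), nu(v)
assert all(abs(lhs[k] - rhs[k]) < 1e-12 for k in rhs)  # nu_u pi_{u,v}^{-1} = nu_v
assert abs(sum(nu(u).values()) - 1.0) < 1e-12          # nu_u is a probability measure
```

Kolmogorov's theorem guarantees that any such consistent family (not only product families) extends to a measure on the full sequence space.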

Exercises

7.1  If S, T are σ-rings on spaces X, Y respectively and A, B are nonempty subsets of X, Y respectively, show that A × B ∈ S × T if and only if A ∈ S, B ∈ T (i.e. a rectangle A × B belongs to S × T if and only if it is a member of the semiring P (cf. Lemma 7.1.1)).


7.2  Let X = Y be the same uncountable set and let the σ-rings S = T each be the class of all countable subsets of X, Y respectively. What is S × T?

7.3  In Ex. 7.2 let D denote the "diagonal" in X × Y; i.e. D = {(x, y) : x = y}. Show that D_x ∈ T, D^y ∈ S if x ∈ X, y ∈ Y, but that D ∉ S × T (cf. Theorem 7.1.3).

7.4  Show that the functions f(x, y) = x, g(x, y) = y defined on the plane R² are B²-measurable. Hence show that the "diagonal" D = {(x, y) : x = y} is a Borel set of the plane.

7.5  Let R be the real line, B the Borel sets of R and L the Lebesgue measurable sets of R, i.e. L = B̄, the completion of B with respect to Lebesgue measure. Assuming that there is a Lebesgue measurable set which is not a Borel set (cf. Halmos, Exs. 15.6, 19.4), show that B × B ⊂ L × L but B × B ≠ L × L. Is L × L the class of two-dimensional Lebesgue measurable sets defined in Section 7.6, i.e. is L × L the completion of B × B with respect to two-dimensional Lebesgue measure? (Assume that there is a set E ⊂ R which is not Lebesgue measurable (cf. Halmos, Theorem 16.D) and use Ex. 7.1 applied to the set {x} × E for some fixed x.)

7.6  Let f be a real-valued function defined on R² such that each f_x is Borel measurable on R, and each f^y is continuous on R. Show that f is Borel measurable on R². (Hint: For n = 1, 2, ..., define f_n(x, y) = f(k/2^n, y) for k/2^n < x ≤ (k+1)/2^n, k = 0, ±1, ±2, ..., and show that f_n → f on R².)

7.7  Let E ⊂ R² be such that each E^y is a Lebesgue measurable set in R and {E^y, –∞ < y < ∞} form a monotone increasing (or decreasing) family, i.e. E^y ⊂ E^{y′} whenever y < y′. Show that E is a Lebesgue measurable set in R². (Hint: Fix any I = [a, b], –∞ < a < b < ∞, and define the Lebesgue measurable sets F_n, G_n, n = 1, 2, ..., of R² by their sections

F_n^y = E^{y_{k,n}} ∩ I for y_{k,n} ≤ y < y_{k+1,n},
G_n^y = E^{y_{k+1,n}} ∩ I for y_{k,n} < y ≤ y_{k+1,n}

(so that F_n, G_n ⊂ I × I), for k = 0, 1, ..., 2^n – 1, where y_{k,n} = a + (b – a)k2^{-n}, and show that F_n ↑ F, G_n ↓ G, F ⊂ E ∩ (I × I) ⊂ G and (G – F) has Lebesgue measure zero.)

7.8  Let f be a real-valued function defined on R² such that each f_x is Lebesgue measurable on R, and each f^y is monotone on R. Show that f is Lebesgue measurable on R². (Hint: If all the f^y are increasing (or decreasing) the result follows from Ex. 7.7. The general case follows by showing that A = {y : f^y is increasing} and B = {y : f^y is decreasing} are Lebesgue measurable sets in R.)

7.9  Let f be a Borel measurable function on R² and g a Borel measurable function on R. Show that f(x, g(x)) is Borel measurable on R.

7.10  Let (X, S, μ), (Y, T, ν) and (X × Y, S × T, λ) be finite measure spaces. If

λ(E × F) = ∫_{E×F} f d(μ × ν)

for all E ∈ S, F ∈ T, for some nonnegative S × T-measurable function f on X × Y, then prove that λ is absolutely continuous with respect to μ × ν with Radon–Nikodym derivative f.


7.11  Let (X, S, μ) and (Y, T, ν) be σ-finite measure spaces. If E, F ∈ S × T and ν(E_x) = ν(F_x) for a.e. x (μ), show that (μ × ν)(E) = (μ × ν)(F).

7.12  Let (X, S, μ) and (Y, T, ν) be σ-finite measure spaces. If a subset E of X × Y is S × T-measurable and such that for every x ∈ X

either ν(E_x) = 0 or ν(E_x^c) = 0,

then prove that μ(E^y) is constant a.e. (ν). (Hint: Show that μ(E^y Δ A) = 0 a.e. (ν), where A = {x : ν(E_x^c) = 0}.)

7.13  Let (X, S, μ) be a σ-finite measure space, let (Y, T, ν) be the real line R with Borel sets and Lebesgue measure, and let f_1 and f_2 be measurable functions on X. Prove that the set

E = {(x, y) ∈ X × Y : f_1(x) < y < f_2(x)}

is product measurable, i.e. E ∈ S × T, and that

(μ × ν)(E) = ∫_A (f_2 – f_1) dμ

where A = {x ∈ X : f_1(x) < f_2(x)}. In particular if f is a nonnegative measurable function on X then

(μ × ν){(x, y) ∈ X × Y : 0 < y < f(x)} = ∫_X f dμ.

What happens if "<" in the definition of E is replaced by "≤"?

7.14  Let (X, S, μ) be a σ-finite measure space, f a finite-valued nonnegative measurable function defined on X and for each t ≥ 0, E_t = {x : f(x) > t}. Let g be a nonnegative function defined on (0, ∞) such that g ∈ L_1(0, a) for all a > 0, and define G(x) = ∫_0^x g(t) dt, x ≥ 0. Show that

∫_X G{f(x)} dμ(x) = ∫_0^∞ μ(E_t)g(t) dt

(applying Theorem 7.4.1 to E = {(x, t) ∈ X × [0, ∞) : 0 < t < f(x)}) and that, in particular,

∫_X f dμ = ∫_0^∞ μ(E_t) dt

(which may serve as a definition of the abstract Lebesgue integral ∫_X f dμ if the Lebesgue integral over (0, ∞) is defined), and for p > 1,

∫_X f^p dμ = p ∫_0^∞ μ(E_t)t^{p–1} dt.
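The "layer-cake" identities of Ex. 7.14 can be sanity-checked numerically. The sketch below uses the invented test case f(x) = x² on [0, 1] with Lebesgue measure, for which μ(E_t) = 1 – √t and the exact values are ∫f dμ = 1/3 and ∫f² dμ = 1/5; both sides are approximated by averages over uniform grids.

```python
import numpy as np

# f(x) = x^2 on [0, 1] with Lebesgue measure; E_t = {x : f(x) > t}.
xs = np.linspace(0.0, 1.0, 20001)
f = xs ** 2

lhs1 = f.mean()                  # ≈ ∫_X f dμ (grid average on a unit-length interval)
ts = np.linspace(0.0, 1.0, 2001)
mu_Et = np.array([(f > t).mean() for t in ts])   # ≈ μ(E_t) = 1 - sqrt(t)
rhs1 = mu_Et.mean()              # ≈ ∫_0^∞ μ(E_t) dt  (μ(E_t) = 0 for t ≥ 1)

lhs2 = (f ** 2).mean()           # ≈ ∫_X f^2 dμ
rhs2 = 2 * (mu_Et * ts).mean()   # ≈ 2 ∫_0^∞ μ(E_t) t dt

assert abs(lhs1 - 1/3) < 1e-3 and abs(rhs1 - 1/3) < 1e-3
assert abs(lhs2 - 1/5) < 2e-3 and abs(rhs2 - 1/5) < 2e-3
```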

7.15  Let (X, S, μ) and (Y, T, ν) be two finite measure spaces and {f_n}_{n=1}^∞, f be S × T-measurable functions defined on X × Y. If for a.e. y (ν)

f_n^y(x) → f^y(x) in μ-measure as n → ∞,

show the following.
(i) f_n → f in μ × ν-measure.
(ii) There is a subsequence {f_{n_k}}_{k=1}^∞ such that for a.e. x (μ)

f_{n_k,x}(y) → f_x(y) a.e. (ν) as k → ∞.


7.16  Let μ be Lebesgue measure on (R, B), ν "counting measure" on (R, B) (ν(E) is the number of points in the set E ∈ B), D the diagonal of R² defined in Ex. 7.4, and f = χ_D. Evaluate ∫∫ f dμ dν and ∫∫ f dν dμ. What conclusion can you draw concerning Fubini's Theorem?

7.17  Let (X, S, μ), (Y, T, ν) be σ-finite measure spaces, let f(x) and g(y) be integrable functions on (X, S, μ) and (Y, T, ν) respectively, and define h on X × Y by h(x, y) = f(x)g(y). Show that h is integrable on (X × Y, S × T, μ × ν) and that

∫_{X×Y} h d(μ × ν) = ∫_X f dμ · ∫_Y g dν.

7.18  With the notation and assumptions of Ex. 4.22, show that g is Lebesgue integrable on the real line.

7.19  Let (X, S, μ) be a σ-finite measure space. Let Y be the set of positive integers, T the class of all subsets of Y, and ν counting measure on Y. If {f_n} is a sequence of nonnegative measurable functions on X, show by Fubini's Theorem that

∫_X (Σ_{n=1}^∞ f_n) dμ = Σ_{n=1}^∞ ∫_X f_n dμ (≤ ∞).

(Define g(n, x) = f_n(x) on Y × X and note that

{(n, x) : g(n, x) < c} = ∪_{m=1}^∞ ({m} × {x : f_m(x) < c}).)

This provides an alternative proof for the corollary to Theorem 4.5.2, but only when μ is σ-finite; a similar proof for Ex. 4.20 may be constructed.

7.20  Let {a_{n,m}}_{n,m=1}^∞ be a double sequence of real numbers. Show that the relations

Σ_n Σ_m a_{n,m} = Σ_m Σ_n a_{n,m},

valid whenever a_{n,m} ≥ 0 for all n, m = 1, 2, ..., or Σ_n Σ_m |a_{n,m}| < ∞, are special cases of Fubini's Theorem.
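For a finitely supported double sequence the interchange in Ex. 7.20 is just summing a matrix by rows versus by columns; a trivial numeric illustration (the array is invented):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((50, 40))        # nonnegative a_{n,m}, finitely supported
row_first = a.sum(axis=1).sum() # sum over m, then over n
col_first = a.sum(axis=0).sum() # sum over n, then over m
assert abs(row_first - col_first) < 1e-9
```

The content of the exercise is that Fubini's Theorem extends this to genuinely infinite double sums under nonnegativity or absolute summability.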

7.21  Continuing Theorem 7.2.3, assume that ν is a measure on W. Show that if λ_x ≪ ν a.e. (μ) then λ ≪ ν. Is the converse true? If λ and ν are σ-finite, λ_x ≪ ν a.e. (μ) and the Radon–Nikodym derivative (dλ_x/dν)(w) is measurable in (x, w), what additional assumption is needed in order to show that

(dλ/dν)(w) = ∫_X (dλ_x/dν)(w) dμ(x)?

7.22  Let (X, S) and (Y, T) be measurable spaces, μ and μ′ σ-finite measures on S, and ν and ν′ σ-finite measures on T. Show the following.

(i) If μ′ ≪ μ and ν′ ≪ ν, then μ′ × ν′ ≪ μ × ν and

[d(μ′ × ν′)/d(μ × ν)](x, y) = (dμ′/dμ)(x) · (dν′/dν)(y).

(ii) If μ′ ⊥ μ or ν′ ⊥ ν, then μ′ × ν′ ⊥ μ × ν.


(iii) If the subscripts 1 and 2 denote the absolutely continuous and the singular parts in the Lebesgue decomposition of μ′ (ν′, μ′ × ν′) with respect to μ (ν, μ × ν), then

(μ′ × ν′)_1 = μ′_1 × ν′_1  and  (μ′ × ν′)_2 = μ′_1 × ν′_2 + μ′_2 × ν′_1 + μ′_2 × ν′_2.

7.23  Let f and g be functions defined on R and 1 ≤ p ≤ ∞. If f ∈ L_1(R) and g ∈ L_p(R), show that the integral defining the convolution (f ∗ g)(x) exists for a.e. x ∈ R. Show that f ∗ g ∈ L_p and

‖f ∗ g‖_p ≤ ‖f‖_1 ‖g‖_p.
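The inequality of Ex. 7.23 (a form of Young's convolution inequality) can be checked on a grid: the discrete convolution below, scaled by the mesh dx, approximates f ∗ g, and the discrete analogue of the inequality then holds. The particular f and g are invented test functions.

```python
import numpy as np

dx = 0.01
x = np.arange(-5.0, 5.0, dx)
f = np.exp(-np.abs(x))                    # f ∈ L_1
g = np.where(np.abs(x) < 1.0, 1.0, 0.0)  # g ∈ L_p for every p

conv = np.convolve(f, g) * dx             # grid approximation of (f * g)

def lp_norm(h, p):
    """Grid approximation of the L_p norm (∫|h|^p dx)^{1/p}."""
    return (np.sum(np.abs(h) ** p) * dx) ** (1.0 / p)

for p in (1.0, 2.0, 4.0):
    assert lp_norm(conv, p) <= lp_norm(f, 1.0) * lp_norm(g, p) + 1e-9
```

For p = 1 and nonnegative f, g the inequality is in fact an equality, which is why a small rounding tolerance is included.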

7.24  Let M be the set of all finite signed measures on (R, B).

(i) Show that M is a Banach space with respect to the norm ‖ν‖ = |ν|(R), ν ∈ M.

(ii) Let ν, λ ∈ M and define the set function ν ∗ λ on B by

(ν ∗ λ)(B) = ∫_{-∞}^∞ ν(B – y) dλ(y)

for all B ∈ B, where B – y = {x – y : x ∈ B}. Show that ν ∗ λ ∈ M, ν ∗ λ = λ ∗ ν,

‖ν ∗ λ‖ ≤ ‖ν‖ · ‖λ‖,

and that

∫_{-∞}^∞ f d(ν ∗ λ) = ∫∫_{-∞}^∞ f(x + y) dν(x) dλ(y)

whenever either integral exists. (Hint: (ν ∗ λ)(B) = (ν × λ)(E) where E = {(x, y) : x + y ∈ B}.) If δ ∈ M denotes the measure with total mass 1 at 0 (i.e. δ({0}) = 1 and δ(B) = δ(B ∩ {0}), B ∈ B), show that for all ν ∈ M

ν ∗ δ = ν = δ ∗ ν.

(iii) If ν, λ ∈ M and m is Lebesgue measure, show the following. If ν ≪ m then ν ∗ λ ≪ m and

[d(ν ∗ λ)/dm](x) = ∫_{-∞}^∞ (dν/dm)(x – y) dλ(y).

If ν, λ ≪ m then

d(ν ∗ λ)/dm = (dν/dm) ∗ (dλ/dm).

If ν and λ are discrete (see Section 5.7) then so is ν ∗ λ.
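For purely atomic ν and λ the convolution of Ex. 7.24(ii) simply adds atom locations and multiplies masses, and δ acts as the identity. A minimal sketch with invented atoms:

```python
def convolve(nu, lam):
    """Convolution of two atomic (signed) measures given as {atom: mass} dicts."""
    out = {}
    for x, p in nu.items():
        for y, q in lam.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

nu = {0.0: 0.5, 1.0: -0.25, 2.0: 0.75}   # a finite signed measure
delta = {0.0: 1.0}                       # unit mass at 0
assert convolve(nu, delta) == nu         # ν * δ = ν

# Total masses multiply: (ν * ν)(R) = ν(R)^2.
total = sum(convolve(nu, nu).values())
assert abs(total - sum(nu.values()) ** 2) < 1e-12
```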

7.25  Prove the following form of the formula for integration by parts. If F and G are right-continuous functions of bounded variation on [a, b], –∞ < a < c < d < b < ∞, then

∫_{[c,d]} G(x) dF(x) + ∫_{[c,d]} F(x – 0) dG(x) = F(d)G(d) – F(c – 0)G(c – 0).


7.26  If f ∈ L_1(a, b) and G is a right-continuous function of bounded variation on [a, b], show that fG ∈ L_1(a, b) and

∫_a^b f(x)G(x) dx = F(b)G(b) – ∫_{(a,b]} F(x) dG(x)

where F(x) = ∫_a^x f(t) dt.

7.27  Let f, g ∈ L_1(R),

F(x) = ∫_{-∞}^x f(t) dt,  G(x) = ∫_{-∞}^x g(t) dt,  –∞ < x < ∞,

and F(∞) = lim_{x→∞} F(x), G(∞) = lim_{x→∞} G(x). Show that

∫_{-∞}^∞ F(x)g(x) dx + ∫_{-∞}^∞ G(x)f(x) dx = F(∞)G(∞).

7.28  Let –∞ < a < b < ∞, F be a continuous nondecreasing function on [a, b], and G a continuous function of bounded variation on [a, b]. Show that there is a u, a ≤ u ≤ b, such that

∫_{[a,b]} F(x) dG(x) = F(a){G(u) – G(a)} + F(b){G(b) – G(u)}.

(Hint: Use Theorem 7.6.2 and the first mean value theorem for integrals, Ex. 4.4.) This is called the second mean value theorem for integrals. In particular, if F is as above and g ∈ L_1(a, b), then there is a u, a ≤ u ≤ b, such that

∫_a^b F(x)g(x) dx = F(a) ∫_a^u g(x) dx + F(b) ∫_u^b g(x) dx.

7.29  Let S, T be σ-rings of subsets of spaces X, Y respectively and let μ, ν be σ-finite measures on S, T. Use Theorem 7.2.1 to show that there exists a unique (σ-finite) measure λ on the σ-ring S × T such that λ(A × B) = μ(A)ν(B) for all A ∈ S, B ∈ T. (Hint: It is sufficient to show that if λ is defined on the semiring P of measurable rectangles A × B, A ∈ S, B ∈ T, by λ(A × B) = μ(A)ν(B) and if A × B = ∪_1^∞ E_i for disjoint, nonempty E_i ∈ P, then λ(A × B) = Σ_1^∞ λ(E_i). This follows very simply from the theorem by considering the spaces (A, S_0, μ_0), (B, T_0, ν_0) where S_0 is the σ-field S ∩ A = {F ∩ A : F ∈ S} of subsets of A, T_0 = T ∩ B and μ_0 = μ, ν_0 = ν on S_0, T_0 respectively.)

7.30  With the notation of Section 7.7 show that the mapping T((x_1, ..., x_{n–1}), x_n) = (x_1, x_2, ..., x_n) is a measurable transformation from (Y_{n–1} × X_n, T_{n–1} × S_n) to (Y_n, T_n); i.e. that T^{-1}E ∈ T_{n–1} × S_n if E ∈ T_n.

7.31  In Section 7.7 (with the notation used there) it was shown that ∫_{Y_n} f dλ_n = ∫_{X_n} {∫_{Y_{n–1}} f_{x_n} dλ_{n–1}} dμ_n(x_n). Then the identity

∫_{Y_n} f dλ_n = ∫ ... ∫ f dμ_1 ... dμ_n

can be shown as follows.

(i) Assume inductively that the result is true for integrals of functions of (n – 1) variables. Hence show that

∫_{Y_n} f dλ_n = ∫_{X_n} {∫ ... ∫ f_{x_n} dμ_1 ... dμ_{n–1}} dμ_n(x_n).


(ii) Check (from the precise definition of repeated integrals) that the right hand side is ∫ ... ∫ f dμ_1 ... dμ_n. Show inductively that

∫ ... ∫ f_{x_i,...,x_n}(x_1, ..., x_{i–1}) dμ_1 ... dμ_{i–1} = f^{(i)}(x_i, ..., x_n) = f^{(i)}_{x_{i+1},...,x_n}(x_i).

7.32  Let (X_i, S_i, μ_i) be σ-finite measure spaces, i = 1, 2, 3. Let f be a nonnegative measurable function on (X_1 × X_2 × X_3, S_1 × S_2 × S_3). If λ = μ_1 × μ_2 × μ_3, show that

∫ f dλ = ∫∫∫ f dμ_2 dμ_1 dμ_3.

(Consider the transformation T of X_1 × X_2 × X_3 to X_2 × X_1 × X_3 given by T(x_1, x_2, x_3) = (x_2, x_1, x_3) and write f = f*T where f* is a certain function on X_2 × X_1 × X_3.)

7.33  Show that the class P^n of bounded semiclosed intervals (a, b] of R^n is a semiring which generates the σ-field of Borel sets of R^n.

7.34  Let μ be a finite measure on the σ-field B^n of Borel sets of R^n and F(x_1, x_2, ..., x_n) = μ{(–∞, x_1] × (–∞, x_2] × ... × (–∞, x_n]}. Show that the measure of an interval (a, b] may be written as

μ{(a, b]} = Δ_1^{h_1} Δ_2^{h_2} ... Δ_n^{h_n} F(a_1, a_2, ..., a_n)

where a = (a_1, a_2, ..., a_n), b = (b_1, b_2, ..., b_n), h_i = b_i – a_i and Δ_i^h is the difference operator defined by

Δ_i^h F(x_1, ..., x_n) = F(x_1, ..., x_{i–1}, x_i + h, x_{i+1}, ..., x_n) – F(x_1, ..., x_n).
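For n = 2 the identity of Ex. 7.34 is the familiar inclusion–exclusion formula μ{(a, b]} = F(b_1, b_2) – F(a_1, b_2) – F(b_1, a_2) + F(a_1, a_2). A minimal sketch using the invented example of the uniform distribution on the unit square:

```python
def F(x1, x2):
    """Distribution function of the uniform measure on the unit square [0,1]^2."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x1) * clip(x2)

def box_measure(a, b):
    """Δ_1^{h_1} Δ_2^{h_2} F(a_1, a_2) with h_i = b_i - a_i (inclusion-exclusion)."""
    (a1, a2), (b1, b2) = a, b
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

# For a box inside the unit square the result is its area.
assert abs(box_measure((0.1, 0.2), (0.4, 0.9)) - 0.3 * 0.7) < 1e-12
```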

7.35  For each i = 1, 2, ..., n, let G_i(x) be a bounded nondecreasing function on R which is right-continuous and such that lim_{x→–∞} G_i(x) = 0. If F(x_1, x_2, ..., x_n) = G_1(x_1)G_2(x_2) ... G_n(x_n), show that μ_F = μ_{G_1} × μ_{G_2} × ... × μ_{G_n}.

7.36  Show that B^T is the smallest σ-field of subsets of R^T with respect to which all evaluation functions π_t, t ∈ T, are measurable.

7.37  Let μ be a measure on (R^T, B^T) and let B̄^T be the completion of B^T with respect to μ. Show that if E ∈ B̄^T (respectively, f is a B̄^T-measurable function) there is a countable subset S of T such that E ∈ C̄(S) (respectively, f is C̄(S)-measurable), where C̄(S) is the completion of the σ-field C(S) with respect to the restriction of μ to C(S).

8

Integrating complex functions, Fourier theory and related topics

The intent of this short chapter is to indicate how the previous theory may be extended in an obvious way to include the integration of complex-valued functions with respect to a measure (or signed measure) μ on a measurable space (X, S). The primary purpose of this is to discuss Fourier and related transforms which are important in a wide variety of contexts – and in particular the Chapter 12 discussion of characteristic functions of random variables, which provide a standard and useful tool in summarizing their probabilistic properties.

Some standard inversion theorems will be proved here to help avoid overload of the Chapter 12 material. However, the methods of this chapter also apply to other diverse applications, e.g. to Laplace and related transforms used in fields such as physics as well as in probabilistic areas such as stochastic modeling, and may be useful for reference.

Finally it might be emphasized (as noted later) that the integrals considered here involve complex functions as integrands and, as for the preceding development, form a "Lebesgue-style" theory. This is in contrast to what is termed "complex variable" methodology, which is a "Riemann-style" theory in which integrals are considered with respect to a complex variable z along some curve in the complex plane. The latter methods – not considered here – can be especially useful in providing means for evaluation of integrals such as characteristic functions which may resist simple real variable techniques.

8.1 Integration of complex functions

Let (X, S, μ) be a measure space and f a complex-valued function defined on X with real and imaginary parts u, v:

f(x) = u(x) + iv(x).

f is said to be measurable if u and v are measurable functions.

177


We say f ∈ L_1(X, S, μ) if u and v both belong to L_1(X, S, μ) and write

∫ f dμ = ∫ u dμ + i ∫ v dμ.

As noted above this is not integration with respect to a complex variable here, i.e. we are not considering contour integrals. The integral involves a complex-valued function, integrated with respect to a (real) measure on (X, S).

Many properties of integrals of real functions hold in the complex case also. Some of the most elementary and obvious ones are given in the following theorem.

Theorem 8.1.1  Let (X, S, μ) be a measure space and write L_1 = L_1(X, S, μ). Let f be a complex measurable function on X, f = u + iv. Then

(i) f ∈ L_1 if and only if |f| = (u² + v²)^{1/2} ∈ L_1.
(ii) If f, g ∈ L_1 and α, β are complex, then αf + βg ∈ L_1 and ∫ (αf + βg) dμ = α ∫ f dμ + β ∫ g dμ.
(iii) If f ∈ L_1 then |∫ f dμ| ≤ ∫ |f| dμ.

Proof  (i) Measurability of |f| follows from that of u, v. Also it is easily checked that |u|, |v| ≤ |f| = (u² + v²)^{1/2} ≤ |u| + |v|, from which (i) follows in both directions.

(ii) is easily checked by expressing f, g, α, β in terms of their real and imaginary parts and applying the corresponding result for real functions.

(iii) is perhaps slightly more involved to show directly than one might imagine. Write z = ∫ f dμ and z = re^{iθ}. Then

|∫ f dμ| = r = e^{-iθ}z = e^{-iθ} ∫ f dμ = ∫ (e^{-iθ}f) dμ.

But since this is real, the imaginary part of the integral must vanish, giving

|∫ f dμ| = ∫ R[e^{-iθ}f] dμ  (R denoting "real part")
  ≤ ∫ |e^{-iθ}f| dμ
  = ∫ |f| dμ

as required. □
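Part (iii) and the rotation trick in its proof can be illustrated for a discrete measure (the atoms, masses and integrand below are invented):

```python
import numpy as np

# Discrete measure: atoms x_k with masses m_k; f a complex integrand.
x = np.array([0.0, 1.0, 2.0, 3.0])
m = np.array([0.1, 0.4, 0.3, 0.2])       # μ({x_k}), total mass 1
f = (1.0 + x) * np.exp(1j * x)           # f = u + iv

integral = np.sum(f * m)                 # ∫ f dμ = ∫ u dμ + i ∫ v dμ
assert abs(integral) <= np.sum(np.abs(f) * m) + 1e-12   # |∫ f dμ| ≤ ∫ |f| dμ

# Rotation step of the proof: e^{-iθ} ∫ f dμ is real and equals |∫ f dμ|.
theta = np.angle(integral)
rotated = np.exp(-1j * theta) * integral
assert abs(rotated.imag) < 1e-12 and abs(rotated.real - abs(integral)) < 1e-12
```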

Many of the simple results for real functions will be used for complex functions with little if any comment, in view of their obvious nature – e.g. Theorems 4.4.3, 4.4.6, 4.4.8, 4.4.9. Of course some results (e.g. Theorem 4.4.4) simply have no immediate generalization to complex functions.

For the most part the more important and sophisticated theorems also generalize in cases where the generalized statements have meaning. This is the case for Fubini's Theorem for L_1-functions (Theorem 7.4.2 (ii)), the "Transformation Theorem" (Theorem 4.6.1), Dominated Convergence (Theorem 4.5.5) and the uses of the Radon–Nikodym Theorem such as Theorem 5.6.1 (for complex integrable functions). It may be checked that these results follow from the real counterparts. As an example we prove the dominated convergence theorem in the complex setting.

Theorem 8.1.2 (Dominated Convergence for complex sequences)  Let {f_n} be a sequence of complex-valued functions in L_1(X, S, μ) such that |f_n| ≤ |g| a.e. where g ∈ L_1. Let f be a complex measurable function such that f_n → f a.e. Then f ∈ L_1 and ∫ |f_n – f| dμ → 0. In particular ∫ f_n dμ → ∫ f dμ.

Proof  Write f_n = u_n + iv_n, f = u + iv. Since f_n → f a.e. it follows that u_n → u, v_n → v a.e. Also |u_n| ≤ |g|, |v_n| ≤ |g|. Hence u, v ∈ L_1 by Theorem 4.5.5 (hence f ∈ L_1), and

∫ |u_n – u| dμ → 0,  ∫ |v_n – v| dμ → 0.

Thus

∫ |(u_n + iv_n) – (u + iv)| dμ ≤ ∫ (|u_n – u| + |v_n – v|) dμ → 0

or ∫ |f_n – f| dμ → 0 as required. Finally

|∫ f_n dμ – ∫ f dμ| = |∫ (f_n – f) dμ| ≤ ∫ |f_n – f| dμ

by Theorem 8.1.1, and thus the final statement follows. □

We conclude this section with some comments concerning L_p-spaces of complex functions, and the Hölder and Minkowski Inequalities.

As for real functions, if f is complex and measurable we define ‖f‖_p = (∫ |f|^p dμ)^{1/p} for p > 0 and say that f ∈ L_p if ‖f‖_p < ∞. Clearly such (complex, measurable) f ∈ L_p if and only if |f| ∈ L_p, i.e. |f|^p ∈ L_1. It is also easily checked that if f = u + iv, then f ∈ L_p if and only if each of u, v is in L_p. (For if f ∈ L_p, |u|^p ≤ |f|^p ∈ L_1, whereas if u, v ∈ L_p then |u| + |v| ∈ L_p and |f|^p ≤ (|u| + |v|)^p ∈ L_1.)

Further, if f, g are complex functions in L_p, it is readily seen that f + g ∈ L_p and hence αf + βg ∈ L_p for any complex α, β. For |f|, |g| are real functions in L_p and hence |f| + |g| ∈ L_p, so that |f + g| ≤ |f| + |g| ∈ L_p, showing that |f + g|^p ∈ L_1 and hence f + g ∈ L_p.

Holder’s Inequality generalizes verbatim for complex integrands, sinceif f ∈ Lp, g ∈ Lq for some p ≥ 1, q ≥ 1, 1/p + 1/q = 1, then |f | ∈ Lp, |g| ∈ Lq

so that |fg| ∈ L1 by Theorem 6.4.2 and∫|fg| dμ =

∫|f ||g| dμ ≤ (

∫|f |p dμ)1/p(

∫|g|q dμ)1/q.


Armed with Holder’s Inequality, Minkowski’s Inequality follows by thesame proof as in the real case.

The complex L_p-space may be discussed in the same manner as the real L_p-space (cf. Section 6.4). This is a linear space (over the complex field) and is normed by ‖f‖_p = (∫ |f|^p dμ)^{1/p} (p ≥ 1). It is easily checked that if f_n → f in L_p (i.e. ‖f_n – f‖_p → 0) and if f_n = u_n + iv_n, f = u + iv, then u_n → u, v_n → v in L_p, and conversely (e.g. |u_n – u|^p ≤ |f_n – f|^p and hence ‖u_n – u‖_p ≤ ‖f_n – f‖_p, whereas also ‖f_n – f‖_p ≤ ‖u_n – u‖_p + ‖v_n – v‖_p). Using these facts, completeness of L_p follows from the results for the real case. As for the real case, L_p is a complete metric space for 0 < p < 1 (Theorem 6.4.7).

8.2 Fourier–Stieltjes, and Fourier Transforms in L1

Suppose that F is a real bounded, nondecreasing function (assumed right-continuous, for convenience) on the real line R, defining the measure μ_F. The Fourier–Stieltjes Transform F*(t) of F is defined as a complex function on R by

F*(t) = ∫_{-∞}^∞ e^{itx} dF(x)  (= ∫ e^{itx} dμ_F).

This integral exists since |e^{itx}| = 1 and μ_F(R) < ∞.

A function F on R is of bounded variation (b.v.) on R (cf. Section 5.7 for finite ranges) if it can be expressed as the difference of two bounded nondecreasing functions, F = F_1 – F_2 (again assume F_1, F_2 to be right-continuous for convenience). If F is b.v. its Fourier–Stieltjes Transform is defined as

F*(t) = F_1*(t) – F_2*(t).

(Note that this definition is unambiguous since if also F = G_1 – G_2 then G_1 + F_2 = G_2 + F_1, and it is readily checked that G_1* + F_2* = G_2* + F_1*, giving G_1* – G_2* = F_1* – F_2*.)

Theorem 8.2.1  If F is b.v., its Fourier–Stieltjes Transform F*(t) is uniformly continuous on R.

Proof  Suppose F is nondecreasing. For any real t, s with t – s = h,

|F*(t) – F*(s)| = |∫ (e^{itx} – e^{isx}) dF(x)|
  ≤ ∫ |e^{isx}(e^{ihx} – 1)| dF(x)
  = ∫ |e^{ihx} – 1| dF(x).

As h → 0, |e^{ihx} – 1| → 0 and is bounded by |e^{ihx}| + 1 = 2, which is dF-integrable. Hence by Dominated Convergence (Theorem 8.1.2) ∫ |e^{ihx} – 1| dF(x) → 0 as h → 0 (through any sequence and hence generally). Thus given ε > 0 there exists δ > 0 such that ∫ |e^{ihx} – 1| dF(x) < ε if |h| < δ. Then |F*(t) – F*(s)| < ε for all t, s such that |t – s| < δ, which proves uniform continuity. If F is b.v. the result follows by writing F = F_1 – F_2. □

Suppose now that f is a real Lebesgue measurable function on R and f ∈ L_1 = L_1(–∞, ∞) (Lebesgue measure). Then f(x)e^{itx} ∈ L_1 for all real t, and we define the L_1 Fourier Transform f†(t) of f by

f†(t) = ∫_{-∞}^∞ e^{itx} f(x) dx.

First note that if f, g ∈ L_1 then (αf + βg)† = αf† + βg† for any real constants α, β.

It is also immediate that f†(t) = F*(t) where F(x) = ∫_{-∞}^x f(u) du. For if f is nonnegative, F is then nondecreasing and

F*(t) = ∫ e^{itx} dF(x) = ∫ e^{itx} f(x) dx

by Theorem 5.6.1. The general case follows by writing f = f_+ – f_–, F_1(x) = ∫_{-∞}^x f_+(u) du, F_2(x) = ∫_{-∞}^x f_–(u) du.

If f ∈ L_1 it follows from the above fact and Theorem 8.2.1 that f†(t) is uniformly continuous on R.

It is clear that a general Fourier–Stieltjes Transform F*(t) does not have to tend to zero as t → ±∞. For example, if F(x) has a single jump of size α at x = λ, then F*(t) = αe^{iλt}. However, the Fourier Transform f†(t) of an L_1-function f does tend to zero as t → ±∞, as the important Theorem 8.2.3 shows. This depends on the following useful lemma.

Lemma 8.2.2  Let f ∈ L_1(–∞, ∞) (Lebesgue measure). Then given ε > 0 there exists a function h of the form h(x) = Σ_1^n α_j χ_{I_j}(x), where I_1, ..., I_n are (disjoint) bounded intervals, such that

∫_{-∞}^∞ |h – f| dx < ε.

Proof  Since f ∈ L_1, there exists A < ∞ such that ∫_{|x|>A} |f(x)| dx < ε/3, and hence ∫ |g – f| dx < ε/3 where g(x) = f(x) for |x| < A, and g(x) = 0 for |x| ≥ A. By the definition of the integral, g(x) may be approximated by a simple function k(x) = Σ_{j=1}^n α_j χ_{B_j}(x), where the B_j are bounded Borel sets and where ∫ |g – k| dx < ε/3, so that ∫ |f – k| dx < 2ε/3. Finally for each j there is a finite union I_j of bounded intervals such that m(B_j Δ I_j) < ε/(3n max |α_j|), where m denotes Lebesgue measure (Theorem 2.6.2), so that writing h(x) = Σ_1^n α_j χ_{I_j} we have

∫ |k – h| dx ≤ Σ |α_j| ∫ |χ_{I_j} – χ_{B_j}| dx = Σ |α_j| m(I_j Δ B_j) < ε/3

and hence ∫ |f – h| dx < ε. The given form of h may now be achieved by a simple change of notation – replacing each I_j by the intervals of which it is composed. □

Theorem 8.2.3 (Riemann–Lebesgue Lemma)  Let f ∈ L_1(–∞, ∞) (i.e. f is Lebesgue integrable). Then its Fourier Transform f†(t) → 0 as t → ±∞.

Proof  Let g be any function of the form cχ_{(a,b]} for finite constants a, b, c. Then g†(t) = c ∫_a^b e^{itx} dx = c[e^{itb} – e^{ita}]/(it), which tends to zero as t → ±∞. If h(x) = Σ_{j=1}^n α_j g_j(x) where each g_j is of the above type, then clearly h†(t) → 0 as t → ±∞.

Now given ε > 0 there is (by Lemma 8.2.2) a function h of the above type such that ∫ |h(x) – f(x)| dx < ε. Hence

|f†(t)| = |∫ e^{itx}(f(x) – h(x)) dx + h†(t)|
  ≤ ∫ |f(x) – h(x)| dx + |h†(t)|
  < ε + |h†(t)|.

Since h†(t) → 0 it follows that |f†(t)| can be made arbitrarily small for t sufficiently large (positive or negative) and hence f†(t) → 0 as t → ±∞, as required. □
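For the single interval f = χ_{(0,1]} used in the proof, the transform has the closed form f†(t) = (e^{it} – 1)/(it), and its decay is easy to observe numerically (a sketch with invented sample points):

```python
import numpy as np

def f_hat(t):
    """Fourier transform of f = χ_(0,1]: f†(t) = (e^{it} - 1)/(it)."""
    return (np.exp(1j * t) - 1.0) / (1j * t)

ts = np.array([10.0, 100.0, 1000.0])
vals = np.abs(f_hat(ts))
assert np.all(vals <= 2.0 / ts)   # |f†(t)| = 2|sin(t/2)|/|t| ≤ 2/|t| → 0
assert vals[2] < vals[1] < vals[0]
```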

8.3 Inversion of Fourier–Stieltjes Transforms

The main result of this section is an inversion formula by which F may be "recovered" from a knowledge of its Fourier–Stieltjes Transform. In fact the formula gives not F itself but F̄(x) = ½[F(x + 0) + F(x – 0)] = ½[F(x) + F(x – 0)], assuming right-continuity. F itself is easily obtained from F̄ since F = F̄ at continuity points, and at discontinuities F(x) = F̄(x + 0).

Theorem 8.3.1 (Inversion for Fourier–Stieltjes Transforms)  Let F be b.v. with Fourier–Stieltjes Transform F*. Then for all real a, b (a < b say), with the above notation,

F̄(b) – F̄(a) = lim_{T→∞} (1/2π) ∫_{-T}^T [(e^{-ibt} – e^{-iat})/(-it)] F*(t) dt.

Also, for any real a, the jump of F at a is

F(a + 0) – F(a – 0) = lim_{T→∞} (1/2T) ∫_{-T}^T e^{-iat} F*(t) dt

(which will be zero if F is continuous at a).


Proof If the result holds for bounded nondecreasing functions, it clearly holds for a b.v. function. Hence we assume that F is nondecreasing and bounded (and right-continuous for convenience). Now

$$\frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ibt}-e^{-iat}}{-it}\,F^*(t)\,dt = \frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ibt}-e^{-iat}}{-it}\int_{-\infty}^{\infty} e^{itx}\,dF(x)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Bigl(\int_{-T}^{T} \frac{e^{it(x-b)}-e^{it(x-a)}}{-it}\,dt\Bigr)\,dF(x)$$

by an application of Fubini's Theorem (noting that the integrand may be written as $\int_{x-b}^{x-a} e^{itu}\,du$ and its modulus therefore does not exceed the constant (b – a), which is integrable with respect to the product of Lebesgue measure on (–T, T) and F-measure). Now the inner integral above is

$$\int_{-T}^{T}\int_{x-b}^{x-a} e^{itu}\,du\,dt = \int_{x-b}^{x-a}\int_{-T}^{T} e^{itu}\,dt\,du = 2\int_{x-b}^{x-a} \frac{\sin Tu}{u}\,du = 2\int_{T(x-b)}^{T(x-a)} \frac{\sin u}{u}\,du = 2\,\{H[T(x-a)] - H[T(x-b)]\}$$

where $H(x) = \int_0^x \frac{\sin u}{u}\,du$. As is well known, H is a bounded, odd function which converges to π/2 as x → ∞. Hence $\lim_{T\to\infty} H[T(x-a)] = -\pi/2$, 0 or π/2 according as x < a, x = a, or x > a. Thus (with the corresponding limit for H[T(x – b)]),

$$\lim_{T\to\infty}\,\{H[T(x-a)] - H[T(x-b)]\} = \begin{cases} 0 & x < a \text{ or } x > b \\ \pi/2 & x = a \text{ or } x = b \\ \pi & a < x < b. \end{cases}$$

Further {H[T(x – a)] – H[T(x – b)]} is dominated in absolute value by a constant (which is dF-integrable) and hence, by dominated convergence,

$$\lim_{T\to\infty} \frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-ibt}-e^{-iat}}{-it}\,F^*(t)\,dt = \frac{1}{\pi}\Bigl[\frac{\pi}{2}\bigl(F(a)-F(a-0)\bigr) + \pi\bigl(F(b-0)-F(a)\bigr) + \frac{\pi}{2}\bigl(F(b)-F(b-0)\bigr)\Bigr]$$

which reduces to $\bar F(b) - \bar F(a)$, as required.

The second expression is obtained similarly. Specifically

$$\frac{1}{2T}\int_{-T}^{T} e^{-iat}F^*(t)\,dt = \frac{1}{2T}\int_{-T}^{T} e^{-iat}\int_{-\infty}^{\infty} e^{itx}\,dF(x)\,dt = \frac{1}{2T}\int_{-\infty}^{\infty}\int_{-T}^{T} e^{it(x-a)}\,dt\,dF(x) = \int_{-\infty}^{\infty} \frac{\sin T(x-a)}{T(x-a)}\,dF(x)$$

184 Integrating complex functions, Fourier theory and related topics

(using Fubini) where the value of the integrand at x = a is unity. The integrand tends to zero as T → ∞ for all x ≠ a and is bounded by one (dF-integrable). Hence the integral converges as T → ∞, by dominated convergence, to the value

$$\mu_F(\{a\}) = F(a) - F(a-0) = F(a+0) - F(a-0)$$

as required. □
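The jump formula can be illustrated numerically. The following sketch is an addition of this note, not from the text: it takes the two-point ("coin-toss") d.f. with μ = ½δ₀ + ½δ₁, whose transform is F*(t) = (1 + e^{it})/2, and approximates (1/2T)∫_{−T}^{T} e^{−iat}F*(t) dt by a midpoint rule (the truncation point T and step count are arbitrary choices).

```python
import cmath

def F_star(t):
    # Fourier-Stieltjes transform of mu = (1/2)delta_0 + (1/2)delta_1:
    # F*(t) = integral of e^{itx} dF(x) = (1 + e^{it})/2
    return 0.5 * (1 + cmath.exp(1j * t))

def jump_estimate(a, T=200.0, n=100000):
    # (1/2T) * integral_{-T}^{T} e^{-iat} F*(t) dt, midpoint rule
    h = 2.0 * T / n
    total = 0.0 + 0.0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += cmath.exp(-1j * a * t) * F_star(t)
    return (total * h / (2.0 * T)).real

print(jump_estimate(0.0))   # near 1/2: the atom of F at 0
print(jump_estimate(0.5))   # near 0: F is continuous at 1/2
```

At an atom (a = 0 or a = 1) the average tends to the jump ½; at a continuity point such as a = ½ it tends to 0, as the theorem asserts.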

A most interesting case occurs when the (complex) function F*(t) is itself in L1(–∞, ∞). First of all it is then immediate that F must be continuous, since dominated convergence gives

$$\lim_{T\to\infty}\int_{-T}^{T} e^{-iat}F^*(t)\,dt = \int_{-\infty}^{\infty} e^{-iat}F^*(t)\,dt$$

and hence it follows from the second formula of Theorem 8.3.1 that F(a + 0) – F(a – 0) = 0. Similarly, the limit in the first inversion may be written as $\int_{-\infty}^{\infty}$ instead of $\lim \int_{-T}^{T}$ (again by dominated convergence) and $\bar F = F$ (since F is continuous), giving

$$F(b) - F(a) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-ibt}-e^{-iat}}{-it}\,F^*(t)\,dt.$$

In fact even more is true and can be shown using the following obvious lemma.

Lemma 8.3.2 Let F = F1 – F2 be a b.v. function on R (F1, F2 bounded nondecreasing) and g a real function in L1(–K, K) for any finite K, and such that $F(b) - F(a) = \int_a^b g(x)\,dx$ for all real a < b. Then g ∈ L1(–∞, ∞) and $\mu_F(E) = \int_E g(x)\,dx$ for all Borel sets E (μF is defined to be μF1 – μF2).

Proof Fix K and define the finite signed measures

$$\mu(E) = \mu_F(E\cap(-K,K)), \qquad \nu(E) = \int_{E\cap(-K,K)} g(x)\,dx.$$

Clearly μ = ν for all sets of the form (a, b] and hence for all Borel sets (Lemma 5.2.4). Thus the "total variations" |μ|, |ν| are equal, giving

$$\int_{(-K,K)} |g(x)|\,dx = |\nu|(-K,K) = |\mu|(-K,K) \le (\mu_{F_1}+\mu_{F_2})(-K,K) \le (\mu_{F_1}+\mu_{F_2})(R) < \infty.$$

Hence g ∈ L1(–∞, ∞) by monotone convergence (K → ∞). Thus μF(E) and $\int_E g\,dx$ are two finite signed measures which are equal on sets (a, b] and thus on B, as required. □


Theorem 8.3.3 Let F be b.v. on R, with Fourier–Stieltjes Transform F*, and assume F* ∈ L1(–∞, ∞). Then F is absolutely continuous, and specifically

$$F(x) = F(-\infty) + \int_{-\infty}^{x} g(u)\,du$$

where $g(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iut}F^*(t)\,dt$ is real and in L1(–∞, ∞).

Proof The formula just prior to Lemma 8.3.2 gives

$$F(b) - F(a) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_a^b e^{-iut}F^*(t)\,du\,dt = \int_a^b g(u)\,du$$

by Fubini's Theorem (since F* ∈ L1) and the definition of g.

To see that g is real, note that the integral of its imaginary part over any finite interval is zero, and it follows that the imaginary part of g has zero integral over any Borel set E, and is thus zero a.e. (Theorem 4.4.8). But a function which is continuous and zero a.e. is everywhere zero (as is easily checked), and thus g is real.

The result now follows at once by applying Lemma 8.3.2 to F and g. □

We may now obtain an important inversion theorem for L1 Fourier Transforms when the transform is also in L1.

Theorem 8.3.4 Let f ∈ L1(–∞, ∞). Then if its Fourier Transform f†(t) is in L1(–∞, ∞), we have the inversion

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f^{\dagger}(t)\,dt \quad \text{a.e. (Lebesgue measure).}$$

Proof Write $F(x) = \int_{-\infty}^{x} f(u)\,du$. Then by Theorem 8.3.3, for all a, b,

$$\int_a^b f(u)\,du = F(b) - F(a) = \int_a^b g(u)\,du$$

where $g(x) = \frac{1}{2\pi}\int e^{-ixt}f^{\dagger}(t)\,dt$ is real and in L1(–∞, ∞). The finite signed measures $\int_E f\,dx$, $\int_E g\,dx$ are thus equal for all E of the form (a, b] and hence for all E ∈ B (and finally for all Lebesgue measurable sets E). Hence f = g a.e. by the corollary to Theorem 4.4.8, as required. □

Note that the expression $f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f^{\dagger}(t)\,dt$ a.e. may be regarded as displaying f as an "inverse Fourier Transform". For (apart from the factor 1/2π and the negative sign in the exponent) this has the form of the Fourier Transform of the (assumed L1) function f†. Of course we have defined Fourier Transforms of real functions since that is our primary interest (and f† may be complex), but one could also define the transform of a complex L1-function. The "inverse transform" thus is an ordinary Fourier Transform with a negative sign in the exponent and the factor 1/2π.
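Theorem 8.3.4 can be checked numerically. The following sketch is an addition of this note (the density, its transform, and the quadrature parameters are all choices made here, not taken from the text): the two-sided exponential density f(x) = e^{−|x|}/2 has Fourier Transform f†(t) = 1/(1 + t²), which is in L1, so the inversion should recover f at every point.

```python
import math

def f(x):
    # two-sided exponential density f(x) = e^{-|x|}/2, whose Fourier
    # Transform is f_dagger(t) = 1/(1 + t^2) -- a function in L1
    return 0.5 * math.exp(-abs(x))

def f_dagger(t):
    return 1.0 / (1.0 + t * t)

def inverse_transform(x, T=1000.0, n=200000):
    # (1/2pi) * integral_{-T}^{T} e^{-ixt} f_dagger(t) dt, midpoint rule;
    # f_dagger is real and even, so only the cosine part contributes
    h = 2.0 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += math.cos(x * t) * f_dagger(t)
    return total * h / (2.0 * math.pi)

for x in (0.0, 1.0, -2.0):
    print(x, inverse_transform(x), f(x))
```

Each pair of printed values should agree to within the truncation and discretization error of the quadrature.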

8.4 “Local” inversion for Fourier Transforms

In the last section it was shown that the inversion

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f^{\dagger}(t)\,dt \quad \text{a.e.}$$

holds when the transform f†(t) ∈ L1. There are important cases when f† does not belong to L1 but where an inversion is still possible. For example suppose f(x) = 0 for x < 0 and f(x) = e^{–x} for x > 0. Then

$$f^{\dagger}(t) = \int_0^{\infty} e^{-x}e^{ixt}\,dx = \int_0^{\infty} e^{-x}\cos xt\,dx + i\int_0^{\infty} e^{-x}\sin xt\,dx = \frac{1}{1+t^2} + \frac{it}{1+t^2} = \frac{1}{1-it}.$$

Clearly f†(t) ∉ L1 since |f†(t)| = (1 + t²)^{–1/2}.

To obtain an appropriate inversion the following limit is needed.
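As an aside, the closed form just computed can be confirmed numerically. This sketch is an addition of this note (the truncation point and step count are arbitrary choices): it compares a midpoint-rule evaluation of ∫₀^∞ e^{−x}e^{ixt} dx against 1/(1 − it).

```python
import cmath

def f_dagger_numeric(t, upper=60.0, n=60000):
    # integral_0^upper e^{-x} e^{ixt} dx by midpoint rule; the tail
    # beyond upper = 60 contributes at most about e^{-60}
    h = upper / n
    total = 0.0 + 0.0j
    for k in range(n):
        x = (k + 0.5) * h
        total += cmath.exp((-1.0 + 1j * t) * x)
    return total * h

for t in (0.0, 1.0, -3.0):
    exact = 1.0 / (1.0 - 1j * t)
    print(t, abs(f_dagger_numeric(t) - exact))  # small quadrature error
```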

Lemma 8.4.1 (Dirichlet Limit) If for some δ > 0, g(x) is a bounded nondecreasing function of x in (0, δ), then

$$\int_0^{\delta} \frac{\sin Tx}{x}\,g(x)\,dx \to \frac{\pi}{2}\,g(0+)$$

as T → ∞.

Proof $\int_0^{\delta} \frac{\sin Tx}{x}\,dx = \int_0^{T\delta} \frac{\sin u}{u}\,du \to \frac{\pi}{2}$ as T → ∞ (cf. proof of Theorem 8.3.1). Thus it will be sufficient to show that

$$\int_0^{\delta} \frac{\sin Tx}{x}\,(g(x) - g(0+))\,dx \to 0.$$

Given ε > 0 there exists η > 0 such that g(η) – g(0+) < ε. Then

$$\int_0^{\eta} \frac{\sin Tx}{x}\,(g(x) - g(0+))\,dx = [g(\eta-0) - g(0+)]\int_{\xi}^{\eta} \frac{\sin Tx}{x}\,dx$$

for some ξ ∈ [0, η] by the second mean value theorem for integrals. The last expression may be written as

$$(g(\eta-0) - g(0+))\int_{\xi T}^{\eta T} \frac{\sin x}{x}\,dx.$$

But since $\int_0^{T} (\sin u/u)\,du$ is bounded, $\bigl|\int_{T_1}^{T_2} (\sin u/u)\,du\bigr| < A$ for some A and all T₁, T₂ ≥ 0. Thus for all T

$$\Bigl|\int_0^{\eta} \frac{\sin Tx}{x}\,(g(x) - g(0+))\,dx\Bigr| \le \varepsilon A.$$

Now (g(x) – g(0+))/x ∈ L1([η, δ]) (g being bounded and η > 0). The Riemann–Lebesgue Lemma (Theorem 8.2.3) applies equally well to a finite range of integration (or the function may be extended to be zero outside such a range). Considering the imaginary part of the integral we see that $\int_{\eta}^{\delta} (g(x) - g(0+))\frac{\sin Tx}{x}\,dx \to 0$ as T → ∞. Hence

$$\limsup_{T\to\infty}\Bigl|\int_0^{\delta} \frac{\sin Tx}{x}\,(g(x) - g(0+))\,dx\Bigr| \le \varepsilon A$$

for any ε > 0, from which the required result follows. □

Recall from Section 5.7 that a function f is b.v. in a finite range if it can be written as the difference of two bounded nondecreasing functions in that range. The Dirichlet Limit clearly holds for such b.v. functions (in (0, δ)) also.

The desired inversion may now be obtained.

Theorem 8.4.2 (Local Inversion Theorem for L1 Transforms) If f ∈ L1, and f is b.v. in (x – δ, x + δ) for a fixed given x and for some δ > 0, then

$$\frac12\,\{f(x+0) + f(x-0)\} = \lim_{T\to\infty} \frac{1}{2\pi}\int_{-T}^{T} e^{-itx}f^{\dagger}(t)\,dt.$$

Proof

$$\frac{1}{2\pi}\int_{-T}^{T} e^{-itx}f^{\dagger}(t)\,dt = \frac{1}{2\pi}\int_{-T}^{T}\int_{-\infty}^{\infty} e^{-it(x-y)}f(y)\,dy\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Bigl(\int_{-T}^{T} e^{-it(x-y)}\,dt\Bigr)f(y)\,dy \quad \text{(Fubini)}$$

$$= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin T(x-y)}{x-y}\,f(y)\,dy = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin Tu}{u}\,f(x+u)\,du.$$

Now for x fixed, f(x + u)/u is in L1(δ, ∞) and L1(–∞, –δ) for δ > 0, so that

$$\int_{|u|>\delta} \frac{\sin Tu}{u}\,f(x+u)\,du \to 0 \quad \text{as } T\to\infty$$

by the Riemann–Lebesgue Lemma. Thus we need consider only the range [–δ, δ] for the integral. Now f(x + u) is b.v. in (0, δ) and by the Dirichlet Limit $\frac{1}{\pi}\int_0^{\delta} \frac{\sin Tu}{u}\,f(x+u)\,du \to \frac12 f(x+0)$. Similarly $\frac{1}{\pi}\int_{-\delta}^{0} \frac{\sin Tu}{u}\,f(x+u)\,du \to \frac12 f(x-0)$, and hence

$$\frac{1}{\pi}\int_{-\delta}^{\delta} \frac{\sin Tu}{u}\,f(x+u)\,du \to \frac12\,\bigl(f(x+0) + f(x-0)\bigr)$$

giving the desired conclusion of the theorem. □

Corollary If f is continuous at x the stated inversion formula gives f(x). If also f† ∈ L1, $f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f^{\dagger}(t)\,dt$.

In contrast to the previous inversion formula, that considered here applies to the value of f at a given point x rather than holding a.e. It is often convenient to use complex variable methods (i.e. contour integrals) to evaluate the formula. For example in the case $f^{\dagger}(t) = \frac{1}{1-it}$ one may consider $\frac{1}{2\pi}\int_C \frac{e^{-izx}}{1-iz}\,dz$ around upper and lower semicircles to recover f(x) = 0 for x < 0 and f(x) = e^{–x} for x > 0. (The limit as T → ∞ occurs naturally, making the semicircle larger.) The case x = 0 is easily checked directly, giving the value ½ (= (f(0+) + f(0–))/2).
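The behaviour at the jump can also be seen numerically for this same example. The sketch below is an addition of this note (the truncation point T and step count are arbitrary choices): with f†(t) = 1/(1 − it), the symmetric partial integrals approach e^{−x} for x > 0, 0 for x < 0, and ½ at the discontinuity x = 0.

```python
import math
import cmath

def f_dagger(t):
    # transform of f(x) = e^{-x} (x > 0), f(x) = 0 (x < 0); not in L1
    return 1.0 / (1.0 - 1j * t)

def local_inverse(x, T=500.0, n=100000):
    # symmetric partial integral (1/2pi) int_{-T}^{T} e^{-itx} f_dagger(t) dt
    h = 2.0 * T / n
    total = 0.0 + 0.0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += cmath.exp(-1j * t * x) * f_dagger(t)
    return (total * h / (2.0 * math.pi)).real

print(local_inverse(0.0))   # near 1/2 = (f(0+) + f(0-))/2
print(local_inverse(1.0))   # near e^{-1}
print(local_inverse(-1.0))  # near 0
```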

9

Foundations of probability

9.1 Probability space and random variables

By a probability space we mean simply a measure space for which the measure of the whole space is unity. It is customary to denote a probability space by (Ω, F, P), rather than the (X, S, μ) used in previous chapters for general measure spaces. That is, P is a measure on a σ-field F of subsets of a space Ω, such that P(Ω) = 1 (and P is thus called a probability measure).

It will be familiar to the reader that this framework is used to provide a mathematical ("probabilistic") model for physical situations involving randomness, i.e. a random experiment E – which may be very simple, such as the tossing of coins or dice, or quite complex, such as the recording of an entire noise waveform. In this model, each point ω ∈ Ω represents a possible outcome that E may have. The measurable sets E ∈ F are termed events. An event E represents that "physical event" which occurs when the experiment E is conducted if the actual outcome obtained corresponds to one of the points of E.

It will also be familiar that the complement Ec of an event E represents another physical event – which occurs precisely when E does not occur if E is conducted. Further, for two events E, F, E ∪ F represents that event which occurs if either or both of E, F occur, whereas E ∩ F represents occurrence of both these events simultaneously. If E ∩ F = ∅, the events E and F cannot occur together when E is performed. Similar interpretations hold for other set operations such as –, Δ, $\cup_1^\infty$ and so on.

The probability measure P(E) (sometimes written also as Pr(E)) of an event E is referred to as the "probability that the event E occurs" when E is conducted. As is intuitively reasonable, its values lie between zero and one (P being monotone). If E, F are events which cannot occur together (i.e. disjoint events – E ∩ F = ∅), it is also intuitively plausible that the probability P(E ∪ F) of one or other of E, F occurring should be equal to P(E) + P(F). This is true since the measure P is additive. (Of course, the countable additivity of P implies a corresponding statement for a sequence of disjoint events.)

It is worth recalling that these properties are also intuitively desirable from a consideration of the "frequency interpretation" of P(E) as the proportion of times E occurs in very many repetitions of E. Thus the requirements which make P a probability measure are consistent with intuitive properties which probability should have.

We turn now to random variables. To conform to the notion of a random variable as a "numerical outcome of a random experiment", it is intuitively reasonable to consider a function on Ω (i.e. an assignment of a numerical value to each possible outcome ω). For example for two tosses of a coin we may write Ω = (HH, HT, TH, TT) and the number of heads ξ(ω) taking the respective values 2, 1, 1, 0. It will be convenient to allow infinite values on occasions. Precisely, the following definitions will apply.

By an extended (real) random variable we shall mean a measurable function (Section 3.3) ξ = ξ(ω) defined a.e. on (Ω, F, P). If the values of ξ are finite a.e., we shall simply refer to ξ as a random variable (r.v.).

Note that the precise usage of the term random variable is not uniform among different authors. Sometimes it is required that a r.v. be defined and finite for all ω, and sometimes defined for all ω and finite a.e. The latter definition is inesthetic since the sum of two such "r.v.'s" need not be defined for all ω, and hence not a r.v. The former can be equally as good as the definition above since a redefinition of an a.e. finite function will lead to one which is everywhere finite, with the "same properties except on a zero measure set" (a fact which will be used from time to time anyway). Which definition is chosen is largely a matter of personal preference since there are compensating advantages and disadvantages of each, and in any case the differences are of no real consequence.
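The two-coin illustration above can be made concrete. A small sketch, added in this note (representing outcomes as the strings "HH", "HT", "TH", "TT" is a convention chosen here, not the text's):

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: Omega = {HH, HT, TH, TT}, each outcome
# carrying probability 1/4
Omega = [''.join(w) for w in product('HT', repeat=2)]
P = {w: Fraction(1, 4) for w in Omega}

def xi(w):
    # the random variable "number of heads"
    return w.count('H')

assert sum(P.values()) == 1                    # P(Omega) = 1
assert [xi(w) for w in Omega] == [2, 1, 1, 0]  # values on HH, HT, TH, TT

# the induced distribution P xi^{-1} on the values {0, 1, 2}
dist = {k: sum(P[w] for w in Omega if xi(w) == k) for k in (0, 1, 2)}
print(dist)  # probabilities 1/4, 1/2, 1/4
```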

As in previous chapters, B (B∗) will be used to denote the σ-field of Borel sets (extended Borel sets – Section 3.1) on the real line R (extended real line R∗). By a Borel function f on R (R∗) we mean that f (either real or extended real) is measurable with respect to B (B∗).

An extended r.v. ξ, viewed as a mapping (transformation) from Ω to R∗, induces the probability measure Pξ⁻¹ on B∗ (Section 3.7). As discussed in the next section this is the distribution of ξ, using the notation (for B ∈ B∗)

$$P\{\xi\in B\} = P(\xi^{-1}B).$$

Similarly other obvious notation (such as P{ξ ≤ a} for Pξ⁻¹(–∞, a]) will be clear and used even if not formally defined.

A further convenient notation is the use of the abbreviation "a.s." ("almost surely"), which is usually preferred over "a.e." when the measure involved is a probability measure. This is especially useful when another measure (e.g. Lebesgue) is considered simultaneously with P, since then "a.s." will refer to P, and "a.e." to the other measure. It is also not uncommon to use the phrase "with probability one" instead of "a.s.". Thus statements (for a Borel set B) such as

"ξ ∈ B a.e. (P)", "ξ ∈ B a.s.", "ξ ∈ B with probability one", P{ξ ∈ B} = 1

are equivalent.

Finally the measures P, Pξ⁻¹ may or may not be complete (Section 2.6). Completeness may, of course, be simply achieved where needed or desired by the completion procedure of Theorem 2.6.1.

9.2 Distribution function of a random variable

As above a r.v. ξ on (Ω, F, P) induces the distribution Pξ⁻¹ on (R∗, B∗) and also, by restriction, on (R, B). Further if A denotes the (measurable) set of points ω where ξ is either not defined or ξ(ω) = ±∞, then P(A) = 0 and Pξ⁻¹(R) = P(Ω) – P(A) = 1, so that Pξ⁻¹ is a probability measure on B, and, since Pξ⁻¹(R∗) = 1, also on B∗.

Now Pξ⁻¹ as a measure on (R, B) is a Lebesgue–Stieltjes measure, corresponding to the point function (Theorem 2.8.1)

$$F(x) = P\xi^{-1}\{(-\infty, x]\} = P\{\xi\le x\},$$

i.e. Pξ⁻¹ = μF in the notation of Section 2.8. F is called the distribution function (d.f.) of ξ. According to Theorem 2.8.1 F(x) is nondecreasing and continuous to the right. Further it is easily checked, writing F(–∞) = lim_{x→–∞} F(x), F(∞) = lim_{x→∞} F(x), that F(–∞) = 0, F(∞) = 1. In fact these properties are also sufficient for a function F to be the d.f. of some r.v. ξ, as concluded in the following theorem.

Theorem 9.2.1 (i) For a function F on R to be the d.f. (P{ξ ≤ x}) of some r.v. ξ, it is necessary and sufficient that F be nondecreasing, continuous to the right and that lim_{x→–∞} F(x) = 0, lim_{x→∞} F(x) = 1.

(ii) Two r.v.'s ξ, η (on the same or different probability spaces) have the same distribution (i.e. Pξ⁻¹B = Pη⁻¹B for all B ∈ B∗) if and only if they have the same d.f. F.

Proof The necessity of the conditions in (i) has been shown by the remarks above. Conversely if F is a nondecreasing function with the properties stated in (i), we may define a probability space (R, B, μF) where μF is the measure defined by F (as in Theorem 2.8.1). Since

$$\mu_F(R) = \lim_{n\to\infty}\mu_F\{(-n, n]\} = \lim_{n\to\infty}\{F(n) - F(-n)\} = 1,$$

it follows that μF is a probability measure. If ξ denotes the "identity r.v." on (R, B, μF) given (for real ω) by ξ(ω) = ω, its d.f. is

$$\mu_F\xi^{-1}\{(-\infty, x]\} = \mu_F\{(-\infty, x]\} = F(x),$$

so that F is the d.f. of a r.v. ξ as required.

To prove (ii), note that clearly if ξ, η have the same distribution (on either B∗ or B) they have the same d.f. (Take B = (–∞, x].) Conversely if ξ, η have the same d.f., then by the uniqueness part of Theorem 2.8.1, Pξ⁻¹ and Pη⁻¹ are equal on B (being measures on (R, B) corresponding to the same function F), i.e. Pξ⁻¹(B) = Pη⁻¹(B) for all B ∈ B. But this also holds if B is replaced by B ∪ {∞}, B ∪ {–∞} or B ∪ {∞} ∪ {–∞} (since e.g. Pξ⁻¹(B ∪ {∞}) = Pξ⁻¹(B) = Pη⁻¹(B) = Pη⁻¹(B ∪ {∞})). That is Pξ⁻¹ = Pη⁻¹ on B∗ also. □

If two r.v.'s ξ, η (on the same or different probability spaces) have the same distribution (Pξ⁻¹B = Pη⁻¹B for all B ∈ B, or equivalently for all B ∈ B∗) we say that they are identically distributed, and write $\xi \overset{d}{=} \eta$. By the theorem it is necessary and sufficient for this that they have the same d.f. It is, incidentally, usually "distributional properties" of a r.v. which are important in probability theory. If ξ is a r.v. on some (Ω, F, P), we can always find an identically distributed r.v. on the real line. For if F is the d.f. of ξ, a r.v. η may be constructed on (R, B, μF) as above (η(x) = x). η has the same d.f. F as ξ, and hence the same distribution as ξ, by Theorem 9.2.1.

As noted, if F is the d.f. of ξ, Pξ⁻¹ is the Lebesgue–Stieltjes measure μF defined by F as in Section 2.8. However, in addition to being everywhere finite, as required in Section 2.8, a d.f. is bounded (with values between zero and one).

A d.f. F may have discontinuities, but as noted above it is continuous to the right. Also since F is monotone the limit F(x – 0) = lim_{h↓0} F(x – h) exists for every x. The measure of a single point is clearly the jump μF({x}) = F(x) – F(x – 0). The following useful result follows from Lemma 2.8.2.

Lemma 9.2.2 Let F be a d.f. (with corresponding probability measure μF on B). Then μF has at most countably many "atoms" (i.e. points x with μF({x}) > 0). Correspondingly F has at most countably many discontinuity points.


Two extreme kinds of distribution and d.f. are of special interest. The first corresponds to r.v.'s ξ whose distribution Pξ⁻¹ on B is discrete. That is (cf. Section 5.7) there is a countable set C such that Pξ⁻¹(Cᶜ) = 0. If C = {x₁, x₂, ...} and Pξ⁻¹{xᵢ} = pᵢ, we have for any B ∈ B

$$P\xi^{-1}(B) = P\xi^{-1}(B\cap C) = \sum_{\{x_i\in B\}} P\xi^{-1}\{x_i\} = \sum_{\{x_i\in B\}} p_i$$

and thus for the d.f.

$$F(x) = P\xi^{-1}(-\infty, x] = \sum_{\{x_i\le x\}} p_i.$$

F increases by jumps of size pᵢ at the points xᵢ and is called a discrete d.f. The r.v. ξ with such a d.f. is also said to be a discrete r.v. Note that such a d.f. may often be visualized as an increasing "step function" with successive stairs of heights pᵢ. This is the case (cf. Section 5.7) if the xᵢ can be written as a sequence in increasing order of size. However, such size ordering is not always possible – as when the set of xᵢ consists of all rational numbers.

Two standard examples of discrete r.v.'s are

(i) Binomial, where C = {0, 1, 2, ..., n} and

$$p_r = \binom{n}{r} p^r (1-p)^{n-r}, \quad r = 0, 1, \ldots, n \;\; (0 \le p \le 1),$$

(ii) Poisson, where C = {0, 1, 2, ...} and

$$p_r = e^{-m} m^r / r!, \quad r = 0, 1, 2, \ldots \;\; (m > 0).$$
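Both families can be checked directly. The sketch below is an addition of this note (the particular parameter values n = 10, p = 0.3, m = 2.5 are arbitrary choices): the jump sizes pᵣ sum to one, and the resulting d.f. is a step function with jump pᵣ at r.

```python
import math

def binomial_pmf(n, p, r):
    # p_r = C(n, r) p^r (1 - p)^{n-r}
    return math.comb(n, r) * p**r * (1.0 - p)**(n - r)

def poisson_pmf(m, r):
    # p_r = e^{-m} m^r / r!
    return math.exp(-m) * m**r / math.factorial(r)

def binomial_df(n, p, x):
    # F(x) = sum of p_r over r <= x: a right-continuous step function
    return sum(binomial_pmf(n, p, r) for r in range(n + 1) if r <= x)

print(sum(binomial_pmf(10, 0.3, r) for r in range(11)))   # total mass 1
print(sum(poisson_pmf(2.5, r) for r in range(100)))        # total mass 1
print(binomial_df(10, 0.3, 2.0) - binomial_df(10, 0.3, 1.999))  # jump at 2
```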

At the "other extreme" the distribution Pξ⁻¹ (= μF) of ξ may be absolutely continuous with respect to Lebesgue measure. Then for any B ∈ B

$$P\xi^{-1}(B) = \int_B f(x)\,dx$$

where the Radon–Nikodym derivative f (of Pξ⁻¹ with respect to Lebesgue measure) is nonnegative a.e. and hence may be taken as everywhere nonnegative (by writing e.g. zero instead of negative values). f is in L1(–∞, ∞) and its integral is unity. It is called the probability density function (p.d.f.) for ξ, and the d.f. is given by

$$F(x) = P\xi^{-1}(-\infty, x] = \int_{-\infty}^{x} f(u)\,du.$$

(F is thus an absolutely continuous function – cf. Section 5.7.) We then say that ξ has an absolutely continuous distribution or simply that ξ is an absolutely continuous r.v. Common examples are

(i) the normal distribution N(μ, σ²), where

$$f(x) = (\sigma\sqrt{2\pi})^{-1}\exp\{-(x-\mu)^2/2\sigma^2\} \quad (\mu \text{ real}, \; \sigma > 0),$$

(ii) the gamma distribution with parameters α > 0, β > 0, where $f(x) = \alpha^{\beta}(\Gamma(\beta))^{-1}e^{-\alpha x}x^{\beta-1}$ (x > 0). The case β = 1 gives the exponential distribution.

There is a third "extreme type" of r.v. which is not typically encountered in classical statistics but has received significant recent attention in connection with the use of fractals in important applied sciences. This is a r.v. ξ whose distribution is singular with respect to Lebesgue measure (Section 5.4) and such that Pξ⁻¹{x} = 0 for every singleton set {x}. That is, Pξ⁻¹ has mass confined to a set B of Lebesgue measure zero, but unlike a discrete r.v. Pξ⁻¹ has no atoms in B (or Bᶜ, of course). The corresponding d.f. F is everywhere continuous, but clearly by no means absolutely continuous. Such a d.f. (and the r.v.) will be called singular (though continuous singular would perhaps be a better name).
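The standard concrete example of such a d.f. is the Cantor function ("devil's staircase"). The sketch below is an illustration added in this note, not part of the text's development: it evaluates the Cantor d.f. by following ternary digits of x, which is continuous and climbs from 0 to 1 although its derivative is 0 a.e.

```python
def cantor_df(x, depth=50):
    # Cantor function: a continuous d.f. whose Lebesgue-Stieltjes measure
    # sits on the Cantor set (Lebesgue measure zero) yet has no atoms
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        if x < 1.0 / 3.0:
            x = 3.0 * x              # left third: recurse, no mass below yet
        elif x <= 2.0 / 3.0:
            return value + scale     # middle third: F is flat here
        else:
            value += scale           # right third: half the remaining mass is below
            x = 3.0 * x - 2.0
        scale /= 2.0
    return value

# flat on the removed middle third, yet continuous from 0 up to 1
print(cantor_df(0.4), cantor_df(0.5), cantor_df(0.6))
```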

It is readily seen from Section 5.7 that any d.f. whatsoever may be represented in terms of the three special types considered above, as the following celebrated result shows.

Theorem 9.2.3 (Lebesgue Decomposition for d.f.'s) Any d.f. F may be written as a "convex combination"

$$F(x) = \alpha_1 F_1(x) + \alpha_2 F_2(x) + \alpha_3 F_3(x)$$

where F₁, F₂, F₃ are d.f.'s, F₁ being absolutely continuous, F₂ discrete, F₃ singular, and where α₁, α₂, α₃ are nonnegative with α₁ + α₂ + α₃ = 1. The constants α₁, α₂, α₃ are unique, and so is the Fᵢ corresponding to any αᵢ > 0 (hence the term αᵢFᵢ is unique for each i).

Proof By Theorem 5.7.1 (Corollary) we may write F(x) = F₁*(x) + F₂*(x) + F₃*(x), where the Fᵢ*(x) are nondecreasing functions defining measures μ_{Fᵢ*} which are respectively absolutely continuous, discrete and singular (for i = 1, 2, 3). Further, noting that $\sum_{i=1}^{3} F_i^*(-\infty) = 0$, we may replace Fᵢ* by Fᵢ* – Fᵢ*(–∞) and hence take Fᵢ*(–∞) = 0 for each i. Write now αᵢ = Fᵢ*(∞) and Fᵢ(x) = Fᵢ*(x)/αᵢ if αᵢ > 0 (and an arbitrary d.f. of "type i" if αᵢ = 0). Then Fᵢ is a d.f. and the desired decomposition F(x) = α₁F₁(x) + α₂F₂(x) + α₃F₃(x) follows. Letting x → ∞ we see that α₁ + α₂ + α₃ = 1.

If there is another such decomposition, F = β₁G₁ + β₂G₂ + β₃G₃ say, then

$$\mu_{\alpha_1 F_1} + \mu_{\alpha_2 F_2} + \mu_{\alpha_3 F_3} = \mu_{\beta_1 G_1} + \mu_{\beta_2 G_2} + \mu_{\beta_3 G_3}$$

and hence by Theorem 5.7.1, μ_{αᵢFᵢ} = μ_{βᵢGᵢ}. Hence αᵢFᵢ differs from βᵢGᵢ at most by an additive constant, which must be zero since Fᵢ and Gᵢ vanish at –∞. Since Fᵢ(∞) = Gᵢ(∞) = 1 we thus have αᵢ = βᵢ and hence also Fᵢ = Gᵢ (provided αᵢ > 0). □

9.3 Random elements, vectors and joint distributions

It is natural to extend the concept of a r.v. by considering more general mappings rather than just "measurable functions". These will be precisely "measurable transformations" as discussed in Chapter 3, but the term "measurable mapping" will be more natural (and thus used) in the present context. Specifically let ξ be a measurable mapping defined a.s. on a probability space (Ω, F, P), to a measurable space (X, S) (i.e. ξ⁻¹E ∈ F for all E ∈ S). Then ξ will be called a random element (r.e.) on (Ω, F, P) with values in X (or in (X, S)). An extended r.v. is thus a r.e. with values in (R∗, B∗). Another case of importance is when (X, S) = (R∗ⁿ, B∗ⁿ) and ξ(ω) = (ξ₁(ω), ..., ξₙ(ω)). A r.e. of this form and such that each ξᵢ is finite a.s. will be called a random vector or vector random variable. Yet more generally a stochastic process may be defined as a r.e. of (X, S) = (R^T, B^T) (cf. Section 7.9) for e.g. an index set T = {1, 2, 3, ...} or T = (0, ∞). As will be briefly indicated in Chapter 15 this is alternatively described as an infinite (countable or uncountable) family of r.v.'s.

Before pursuing probabilistic properties of random elements it will be convenient to develop some notation and obvious measurability results in the slightly more general framework in which ξ is a mapping defined on a space Ω, not necessarily a probability space, with values in a measurable space (X, S). Apart from notation this is precisely the framework of Section 3.2, replacing X by Ω and (Y, T) by (X, S), and identifying ξ with the transformation T. It will be more natural in the present context to refer to ξ as a mapping rather than a transformation, but the results of Section 3.2 apply. For such a mapping ξ the σ-field σ(ξ) generated by ξ is defined on Ω (cf. Section 3.2, identifying ξ with T) by

$$\sigma(\xi) = \sigma(\xi^{-1}S) = \sigma(\xi^{-1}E : E\in S).$$

As noted in Section 3.3, σ(ξ) is the smallest σ-field G on Ω making ξ G|S-measurable. Further if ξ(ω) is defined for every ω then the σ-ring ξ⁻¹(S) contains ξ⁻¹(X) = Ω and hence is itself the σ-field σ(ξ). Note that σ(ξ) depends on the "range" σ-field S.


More generally if C is any family of mappings on the same space Ω, but with values in possibly different measurable spaces, we write

$$\sigma(C) = \sigma\bigl(\cup_{\xi\in C}\,\sigma(\xi)\bigr).$$

If the family is written as an indexed set C = {ξλ : λ ∈ Λ}, where ξλ maps Ω into (Xλ, Sλ), we write

$$\sigma(C) = \sigma\{\xi_\lambda : \lambda\in\Lambda\} = \sigma\bigl(\cup_{\lambda\in\Lambda}\,\sigma(\xi_\lambda)\bigr).$$

For Λ = {1, 2, ..., n} write σ(C) = σ(ξ₁, ξ₂, ..., ξₙ).

The following lemma, stated for reference, should be proved as an exercise (Ex. 9.7).

Lemma 9.3.1 (i) If C is any family of mappings on the space Ω, σ(C) is then the unique smallest σ-field on Ω with respect to which every ξ ∈ C is measurable. (σ(C) is called the σ-field generated by C.)

(ii) If C = {ξλ : λ ∈ Λ}, ξλ taking values in (Xλ, Sλ), then σ(C) = σ{ξλ⁻¹Bλ : Bλ ∈ Sλ, λ ∈ Λ}.

(iii) If Cλ is a family of mappings on the space Ω for each λ in an index set Λ, then

$$\sigma\bigl(\cup_{\lambda\in\Lambda} C_\lambda\bigr) = \sigma\bigl(\cup_{\lambda\in\Lambda}\sigma(C_\lambda)\bigr).$$

As indicated above, we shall be especially interested in the case where (X, S) = (R∗ⁿ, B∗ⁿ), leading to random vectors. The following lemma will be applied to show the equivalence of a random vector and its component r.v.'s.

Lemma 9.3.2 Let ξ be a mapping defined on a space Ω with values in (R∗ⁿ, B∗ⁿ), so that ξ = (ξ₁, ξ₂, ..., ξₙ) where ξᵢ maps Ω into (R∗, B∗). Then σ(ξ) = σ(ξ₁, ξ₂, ..., ξₙ). That is, the σ-field generated on Ω by the mapping ξ into (R∗ⁿ, B∗ⁿ) is identical to that generated by the family of its components ξᵢ, each mapping Ω into (R∗, B∗).

Proof If Bᵢ ∈ B∗ for each i, then ξ⁻¹(B₁ × B₂ × ... × Bₙ) = ∩₁ⁿ ξᵢ⁻¹Bᵢ. Since the rectangles B₁ × B₂ × ... × Bₙ generate B∗ⁿ, the corollary to Theorem 3.3.2 gives

$$\sigma(\xi) = \sigma\{\cap_1^n \xi_i^{-1}B_i : B_i\in\mathcal B^*\} = \sigma\{\xi_i^{-1}B_i : B_i\in\mathcal B^*,\ 1\le i\le n\}$$

as is easily checked. But this is just σ(ξ₁, ξ₂, ..., ξₙ) by Lemma 9.3.1 (ii). □

We proceed now to consider random vectors – measurable mappings ξ = (ξ₁, ξ₂, ..., ξₙ) defined a.s. on a probability space (Ω, F, P) with values in (R∗ⁿ, B∗ⁿ), its components ξᵢ being finite a.s. (i.e. ξ ∈ Rⁿ a.s.).


The following result shows that a random vector ξ is, equivalently, just a family of n r.v.'s (ξ₁, ..., ξₙ) (with σ(ξ) = σ(ξ₁, ..., ξₙ) as shown above).

Theorem 9.3.3 Let ξ be a mapping defined a.s. on a probability space (Ω, F, P), with values in R∗ⁿ. Write ξ = (ξ₁, ξ₂, ..., ξₙ). Then σ(ξ) = σ(ξ₁, ξ₂, ..., ξₙ). Further, ξ is a random element in (R∗ⁿ, B∗ⁿ) (i.e. F|B∗ⁿ-measurable) if and only if each ξᵢ is an extended r.v. (i.e. F|B∗-measurable). Hence ξ is a random vector (r.e. of (Rⁿ, Bⁿ)) if and only if each ξᵢ is a r.v.

Proof That σ(ξ) = σ(ξ₁, ξ₂, ..., ξₙ) restates Lemma 9.3.2. The mapping ξ is a r.e. on (Ω, F, P) with values in (R∗ⁿ, B∗ⁿ) iff it is F-measurable, i.e. σ(ξ) ⊂ F. But this is precisely σ(ξ₁, ξ₂, ..., ξₙ) ⊂ F, which holds iff all ξᵢ are extended r.v.'s. The final statement also follows immediately. □

The distribution of a r.e. ξ on (Ω, F, P) with values in (X, S) is defined to be the probability measure Pξ⁻¹ on S – directly generalizing the distribution of a r.v. Note that a corresponding point function (d.f.) is not defined as before except in special cases where e.g. X = Rⁿ (or at least has some "order structure"). The distribution Pξ⁻¹ of a random vector ξ = (ξ₁, ..., ξₙ) is a probability measure on B∗ⁿ, and its restriction to Bⁿ is a probability measure on (Rⁿ, Bⁿ), as in the case n = 1 considered previously. The corresponding point function (cf. Section 7.8) F(x₁, ..., xₙ) = P{ξᵢ ≤ xᵢ, 1 ≤ i ≤ n} = Pξ⁻¹{(–∞, x]} (x = (x₁, ..., xₙ)) is the joint distribution function of ξ₁, ..., ξₙ. As shown in Theorem 7.8.1, such a function has the following properties:

(i) F is bounded, nondecreasing and continuous to the right in each xᵢ.

(ii) For any a = (a₁, ..., aₙ), b = (b₁, ..., bₙ), aᵢ < bᵢ, we have

$$\sum{}^{*}\, (-1)^{n-r} F(c_1, c_2, \ldots, c_n) \ge 0$$

where Σ* denotes summation over the 2ⁿ distinct terms with cᵢ = aᵢ or bᵢ, and r is the number of cᵢ which are bᵢ's.

In addition, since Pξ⁻¹ is a probability measure it is easy to check that the following also hold:

(iii) 0 ≤ F(x₁, ..., xₙ) ≤ 1 for all x₁, ..., xₙ; lim_{xᵢ→–∞} F(x₁, ..., xₙ) = 0 (for any fixed i), and

$$\lim_{(x_1,\ldots,x_n)\to(\infty,\ldots,\infty)} F(x_1,\ldots,x_n) = 1.$$

In fact these conditions are also sufficient for F to be the joint d.f. of some set of r.v.'s, as stated in the following theorem.


Theorem 9.3.4 A function F on Rⁿ is the joint d.f. of some r.v.'s ξ₁, ..., ξₙ if and only if it satisfies Conditions (i)–(iii) above. Then for aᵢ ≤ bᵢ, 1 ≤ i ≤ n, P{aᵢ < ξᵢ ≤ bᵢ, 1 ≤ i ≤ n} is given by the sum in (ii) above.

Sketch of Proof The necessity of the conditions has been noted. The sufficiency follows simply from the fact (Theorem 7.8.1) that F defines a measure μF on (Rⁿ, Bⁿ). It is easily checked that μF is a probability measure. If Ω = Rⁿ, F = Bⁿ, P = μF and ξᵢ(x₁, x₂, ..., xₙ) = xᵢ, then ξ₁, ..., ξₙ are r.v.'s on Ω with the joint d.f. F. (The details should be worked through as an exercise.) □

As in the previous section, it is of particular interest to consider the case when Pξ⁻¹ is absolutely continuous with respect to n-dimensional Lebesgue measure, i.e. for every E ∈ Bⁿ,

$$P\xi^{-1}(E) = \int_E f(u_1,\ldots,u_n)\,du_1\cdots du_n$$

for some Lebesgue integrable f which is thus (Radon–Nikodym Theorem) nonnegative a.e. (hence may be taken everywhere nonnegative) and integrates over Rⁿ to unity. Equivalently, this holds if and only if

$$F(x_1,\ldots,x_n) = \int_{-\infty}^{x_n}\cdots\int_{-\infty}^{x_1} f(u_1,\ldots,u_n)\,du_1\cdots du_n$$

for all choices of x₁, ..., xₙ. We say that f is the joint p.d.f. of the r.v.'s ξ₁, ..., ξₙ whose d.f. is F. As noted above its integral over any set E ∈ Bⁿ gives Pξ⁻¹(E), which is the probability P{ξ ∈ E} that the value of the vector (ξ₁(ω), ..., ξₙ(ω)) lies in the set E.

Next note that if the r.v.'s ξ₁, ..., ξₙ have joint d.f. F, the joint d.f. of any subset, say ξ₁, ..., ξₖ, of the ξ's may be obtained by letting the remaining x's (xₖ₊₁, ..., xₙ) tend to +∞; e.g. F(x₁, ..., xₙ₋₁, ∞) = lim_{xₙ→∞} F(x₁, ..., xₙ₋₁, xₙ) is the joint d.f. of ξ₁, ..., ξₙ₋₁. This is easily checked. If F is absolutely continuous, the joint density for ξ₁, ..., ξₖ may be obtained by integrating the density f(x₁, ..., xₙ) (corresponding to F) over xₖ₊₁, ..., xₙ. Again this is easily checked (Ex. 9.9). Of course, if we "put" x₂ = x₃ = ... = xₙ = ∞ in the joint d.f. (or integrate the joint density over these variables in the absolutely continuous case) we obtain just the d.f. (or p.d.f.) of ξ₁. Accordingly the d.f. (or p.d.f.) of ξ₁ is called a marginal d.f. (or p.d.f.), obtained from the joint d.f. (or p.d.f.) in this way.
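Integrating out variables to obtain a marginal density can be illustrated numerically. The sketch below is an addition of this note, with an assumed product-exponential joint density (the rates lam, mu and the quadrature parameters are arbitrary choices, not an example from the text):

```python
import math

lam, mu = 1.5, 0.7

def joint_pdf(x, y):
    # assumed joint p.d.f. of two independent exponentials (illustrative)
    if x <= 0.0 or y <= 0.0:
        return 0.0
    return lam * mu * math.exp(-lam * x - mu * y)

def marginal_pdf_x(x, upper=80.0, n=40000):
    # marginal p.d.f. of xi_1: integrate the joint density over x_2
    h = upper / n
    return sum(joint_pdf(x, (k + 0.5) * h) for k in range(n)) * h

for x in (0.5, 2.0):
    print(marginal_pdf_x(x), lam * math.exp(-lam * x))  # should agree
```

The numerically integrated marginal matches the exponential density lam·e^{−lam·x}, as the factorized form of the joint density predicts.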

Finally, note that if ξ₁, ..., ξₙ, ξ₁*, ..., ξₙ* are r.v.'s such that ξᵢ* = ξᵢ a.s. for each i, then the joint d.f.'s of the two families (ξ₁, ..., ξₙ), (ξ₁*, ..., ξₙ*) are equal. This is obvious, but should be checked.


9.4 Expectation and moments

Let (Ω, F, P) be a probability space. If ξ is a r.v. or extended r.v. on this space, we write Eξ to denote $\int \xi(\omega)\,dP(\omega)$ whenever this integral is defined, e.g. if ξ is a.s. nonnegative or ξ ∈ L1(Ω, F, P). E thus simply denotes the operation of integration with respect to P, and Eξ is termed the mean or expectation of ξ. In the case where ξ ∈ L1(Ω, F, P) (and hence in particular ξ is a.s. finite and thus a r.v.) Eξ and E|ξ| are finite (since |ξ| ∈ L1 also). It is then customary to say that the mean of ξ exists, or that ξ has a finite mean. Since E denotes integration, any theorem of integration theory will be used with this notation without comment.

Suppose now that ξ is finite a.s. (i.e. is a r.v.) with d.f. F. Let g(x) = |x|, so that g(ξ(ω)) is defined a.s., and then

$$E|\xi| = \int_\Omega g(\xi(\omega))\,dP(\omega) = \int_{R^*} g(x)\,dP\xi^{-1}(x)$$

viewing ξ as a transformation from Ω to R∗ (Theorem 4.6.1). But this latter integral is just $\int_R g(x)\,dP\xi^{-1}(x) = \int |x|\,dF(x)$ (since Pξ⁻¹ = μF – see Section 4.7) and hence

$$E|\xi| = \int |x|\,dF(x) \le \infty.$$

E|ξ| is thus finite if and only if ∫ |x| dF(x) < ∞, and in this case the same argument but with g(x) = x gives

Eξ = ∫ x dF(x).

If also ξ has an absolutely continuous distribution, with p.d.f. f, then (Theorem 5.6.1)

Eξ = ∫ x f(x) dx.
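Both formulas lend themselves to direct numerical checks. The sketch below (the particular four-point distribution and the exponential density f(x) = e^{–x} are illustrative assumptions, not from the text) computes Eξ = ∑ pn xn exactly and approximates Eξ = ∫ x f(x) dx by a Riemann sum:

```python
import math

# Discrete case: P{xi = x_n} = p_n, so E xi = sum p_n x_n (cf. Ex. 9.12).
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]
mean_discrete = sum(p * x for p, x in zip(ps, xs))   # 0.2 + 0.6 + 1.2 = 2.0

# Absolutely continuous case: E xi = integral of x f(x) dx, here with the
# (hypothetical) exponential density f(x) = e^{-x} on (0, infinity), whose
# mean is 1; a crude right-endpoint Riemann sum on (0, 20] approximates it.
steps = 200000
dx = 20.0 / steps
mean_cont = sum((i * dx) * math.exp(-(i * dx)) * dx for i in range(1, steps + 1))

assert abs(mean_discrete - 2.0) < 1e-12
assert abs(mean_cont - 1.0) < 1e-3
```

The crude Riemann sum is enough for an illustration; any quadrature rule could replace it.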

On the other hand, if ξ is discrete with P{ξ = xn} = pn, it is easily checked (Ex. 9.12) that E|ξ| = ∑ pn|xn| and, when E|ξ| < ∞, that Eξ = ∑ pn xn.

Suppose now that ξ is a r.v. on (Ω, F, P) and that g is a real-valued measurable function on R. Then g(ξ(ω)) is clearly a r.v. (Theorem 3.4.3) and an argument along precisely the same lines as that given above at once demonstrates the truth of the following result.

Theorem 9.4.1 If ξ is a r.v. and g is a finite real-valued measurable function on R, then E|g(ξ)| < ∞ if and only if ∫ |g(x)| dF(x) < ∞. Then

Eg(ξ) = ∫ g(x) dF(x).

In particular consider g(x) = x^p for p = 1, 2, 3, . . . . We call E|ξ|^p the pth absolute moment of ξ and when it is finite, say that the pth moment of ξ exists, given by Eξ^p. This holds equivalently if ξ ∈ Lp(Ω, F, P) and the theorem shows that Eξ^p = ∫ x^p dF(x).

If p > 0 but p is not an integer then x^p is not real-valued for x < 0 and thus ξ^p(ω) is not necessarily defined a.s. However, if ξ is a nonnegative r.v. (a.s.), ξ^p(ω) is defined a.s. and the above remarks hold. In any case one can still consider E|ξ|^p for all p > 0 regardless of the signs of the values of ξ.

It will be seen in the next section that if ξ ∈ Lp = Lp(Ω, F, P) for some p > 1 (i.e. E|ξ|^p < ∞) then ξ ∈ Lq for 1 ≤ q ≤ p. (This fact applies since P is a finite measure – it does not apply to Lp classes for general measures.) Thus in this case the mean of ξ exists in particular, and (since any constant belongs to Lp on account of the finiteness of P) if p is a positive integer, ξ – Eξ ∈ Lp or E|ξ – Eξ|^p < ∞. This quantity is called the pth absolute central moment of ξ, and E(ξ – Eξ)^p the pth central moment, p = 1, 2, . . . .

If p = 2, the quantity E(ξ – Eξ)² is the variance of ξ (denoted by var(ξ) or σ²_ξ). It is readily checked (Ex. 9.13) that a central moment may be expressed in terms of ordinary moments (and conversely) and in particular that var(ξ) = Eξ² – (Eξ)².
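The identity var(ξ) = Eξ² – (Eξ)² can be verified mechanically; the following sketch compares E(ξ – Eξ)² with Eξ² – (Eξ)² on an arbitrary three-point distribution (chosen only for illustration):

```python
# Check of var(xi) = E(xi^2) - (E xi)^2 for a simple discrete distribution
# (the atoms and masses are illustrative assumptions).
xs = [-1.0, 0.0, 2.0]
ps = [0.25, 0.25, 0.5]

mean = sum(p * x for p, x in zip(ps, xs))                    # E xi
second_moment = sum(p * x * x for p, x in zip(ps, xs))       # E xi^2
central = sum(p * (x - mean) ** 2 for p, x in zip(ps, xs))   # E(xi - E xi)^2

assert abs(central - (second_moment - mean ** 2)) < 1e-12
```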

Joint moments of two or more r.v.’s are also commonly used. For example if ξ, η have finite second moments (ξ, η ∈ L2) then as will be seen in Theorems 9.5.2, 9.5.1 they are both in L1 and (ξ – Eξ)(η – Eη) ∈ L1. The expectation γ = E{(ξ – Eξ)(η – Eη)} is termed the covariance (cov(ξ, η)) of ξ and η, and ρ = γ/(σ_ξ σ_η) is their correlation, where σ²_ξ = var(ξ) and σ²_η = var(η). See Ex. 9.20 for some useful interpretations and properties which should be checked.

A most important family of r.v.’s in statistical theory and practice, arising from Theorem 9.3.4, is that of multivariate normal r.v.’s ξ1, ξ2, . . . , ξn whose joint distribution is specified by their means, variances and covariances (or correlations). For the nonsingular case they have the joint p.d.f.

f(x1, x2, . . . , xn) = (2π)^{–n/2} |Λ|^{–1/2} exp{– ½ (x – μ)′Λ⁻¹(x – μ)}

where x = (x1, x2, . . . , xn)′, μ = (μ1, μ2, . . . , μn)′ (μi = Eξi), and Λ is the covariance matrix with (i, j)th element γij = cov(ξi, ξj), assumed nonsingular (that is, its determinant |Λ| is not zero). See Exs. 9.21, 9.22 for further details, properties and comments.
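As a sanity check on the displayed density, the sketch below evaluates the bivariate (n = 2) case directly and confirms that for a diagonal Λ (zero covariances) the joint p.d.f. factors into the product of the two one-dimensional normal densities; the particular means, variances and evaluation point are illustrative assumptions:

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """Bivariate case of the density
    (2*pi)^{-n/2} |Lambda|^{-1/2} exp{-(1/2)(x-mu)' Lambda^{-1} (x-mu)}."""
    (a, b), (c, d) = cov
    det = a * d - b * c                       # |Lambda|
    inv = [[d / det, -b / det], [-c / det, a / det]]
    u = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return (2 * math.pi) ** -1 * det ** -0.5 * math.exp(-0.5 * quad)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# With a diagonal covariance matrix (zero covariances) the joint density
# factors into the product of the one-dimensional normal densities.
val = mvn_pdf_2d([0.3, -1.2], [0.0, 0.5], [[2.0, 0.0], [0.0, 0.5]])
prod = normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, 0.5, 0.5)
assert abs(val - prod) < 1e-12
```

For jointly normal r.v.’s this factorization corresponds to the independence discussed in Chapter 10.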

9.5 Inequalities for moments and probabilities

There are a number of standard and useful inequalities concerning moments of a r.v., and probabilities of exceeding a given value. A few of these will be given now, starting with a “translation” of the Hölder and Minkowski Inequalities (Theorems 6.4.2, 6.4.3) into the expectation notation.

Theorem 9.5.1 Suppose that ξ, η are r.v.’s on (Ω, F, P).

(i) (Hölder’s Inequality) If E|ξ|^p < ∞, E|η|^q < ∞ where 1 < p, q < ∞, 1/p + 1/q = 1, then E|ξη| < ∞ and

|Eξη| ≤ E|ξη| ≤ (E|ξ|^p)^{1/p} (E|η|^q)^{1/q}

with equality in the second inequality only if one of ξ, η is zero a.s. or if |ξ|^p = c|η|^q a.s. for some constant c > 0.

(ii) (Minkowski’s Inequality) If E|ξ|^p < ∞, E|η|^p < ∞ for some p ≥ 1 then E|ξ + η|^p < ∞ and

(E|ξ + η|^p)^{1/p} ≤ (E|ξ|^p)^{1/p} + (E|η|^p)^{1/p}

with equality (if p > 1) only if one of ξ, η is zero a.s. or if ξ = cη a.s. for some constant c > 0. For p = 1 equality holds if and only if ξη ≥ 0 a.s.

(iii) If 0 < p < 1 and E|ξ|^p < ∞, E|η|^p < ∞, then E|ξ + η|^p < ∞ and E|ξ + η|^p ≤ E|ξ|^p + E|η|^p, with equality iff ξη = 0 a.s. (see also Ex. 9.19).

The norm notation – writing ||ξ||_p = (E|ξ|^p)^{1/p} – gives the neatest statements of the inequalities as in Section 6.4, in the case p ≥ 1. For Hölder’s Inequality may be written as ||ξη||_1 ≤ ||ξ||_p ||η||_q and Minkowski’s Inequality as ||ξ + η||_p ≤ ||ξ||_p + ||η||_p.
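In norm form the inequalities are easy to spot-check numerically. The sketch below treats four equally likely sample points as a finite probability space (an illustrative assumption) and verifies both bounds for the conjugate pair p = 3, q = 3/2:

```python
# Spot-check of the Holder and Minkowski Inequalities in norm form on a
# finite probability space of four equally likely points (illustrative data).
vals_xi  = [1.0, -2.0, 0.5, 3.0]
vals_eta = [0.5,  1.0, -1.5, 2.0]
n = len(vals_xi)

def norm(vals, p):
    # ||.||_p = (E|.|^p)^{1/p} under the equally-likely measure
    return (sum(abs(v) ** p for v in vals) / n) ** (1 / p)

p, q = 3.0, 1.5                       # conjugate exponents: 1/p + 1/q = 1
prod = [a * b for a, b in zip(vals_xi, vals_eta)]
summ = [a + b for a, b in zip(vals_xi, vals_eta)]

assert norm(prod, 1.0) <= norm(vals_xi, p) * norm(vals_eta, q) + 1e-12
assert norm(summ, p) <= norm(vals_xi, p) + norm(vals_eta, p) + 1e-12
```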

The following result, mentioned in the previous section, is an immediate corollary of (i), and restates Theorem 6.4.8 (with μ(X) = 1).

Theorem 9.5.2 If ξ is a r.v. on (Ω, F, P) and E|ξ|^p < ∞ for some p > 0, then E|ξ|^q < ∞ for 0 < q ≤ p, and (E|ξ|^q)^{1/q} ≤ (E|ξ|^p)^{1/p}, i.e. ||ξ||_q ≤ ||ξ||_p.

In particular it follows that if Eξ² < ∞ then E|ξ| < ∞ and (Eξ)² ≤ (E|ξ|)² ≤ Eξ² (which, of course, may be readily shown directly from E(|ξ| – E|ξ|)² ≥ 0).

Another very simple class of (“Markov type”) inequalities relates probabilities such as P{ξ ≥ a}, P{|ξ| ≥ a} etc., to moments of ξ. The following result gives typical examples of such inequalities.

Theorem 9.5.3 Let g be a nonnegative, real-valued function on R, and let ξ be a r.v.


(i) If g(x) is even, and nondecreasing for 0 ≤ x < ∞, then for all a ≥ 0 with g(a) ≠ 0,

P{|ξ| ≥ a} ≤ E{g(ξ)}/g(a).

(ii) If g is nondecreasing on –∞ < x < ∞ then for all a with g(a) ≠ 0,

P{ξ ≥ a} ≤ E{g(ξ)}/g(a).

Proof Note first that the monotonicity of g in each case implies its (Borel) measurability (cf. Ex. 3.11). With g as in (i) it is clear that g(ξ(ω)) is defined and finite a.s. and is thus a (nonnegative) r.v. and

Eg(ξ) = ∫_Ω g(ξ(ω)) dP(ω) ≥ ∫_{ω:|ξ(ω)|≥a} g(ξ(ω)) dP(ω) ≥ g(a) P{|ξ| ≥ a},

since g(ξ(ω)) ≥ g(a) if |ξ(ω)| ≥ a. Hence (i) is proved, and the proof of (ii) is similar. □

For an inequality in the opposite direction see Ex. 9.18.

Corollary (i) If ξ is any r.v. and 0 < p < ∞, a > 0, then

P{|ξ| ≥ a} ≤ E|ξ|^p / a^p.

(ii) If ξ is a r.v. with Eξ² < ∞, then for all a > 0,

P{|ξ – Eξ| ≥ a} ≤ var(ξ)/a².

The inequality in (i) (which follows by taking g(x) = |x|^p) is called “the” Markov Inequality. The case p = 2 in (i) is the well known Chebychev Inequality ((ii) follows by applying it to ξ – Eξ).
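A Monte Carlo sketch confirms the Markov bound on simulated data (the uniform distribution on (–1, 2) is an arbitrary illustrative choice; any distribution would do):

```python
import random

# Monte Carlo spot-check of P{|xi| >= a} <= E|xi|^p / a^p with p = 2,
# for xi uniform on (-1, 2) -- an illustrative assumption.
random.seed(0)
sample = [random.uniform(-1.0, 2.0) for _ in range(100000)]
a, p = 1.5, 2

lhs = sum(abs(x) >= a for x in sample) / len(sample)            # P{|xi| >= a}
rhs = sum(abs(x) ** p for x in sample) / len(sample) / a ** p   # E|xi|^p / a^p

assert lhs <= rhs + 1e-9
```

Note that the bound holds for the empirical measure of any sample, so the check succeeds regardless of the seed; the simulation merely makes the slack visible (here lhs is roughly 0.17 against a bound of roughly 0.44).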

The final inequality, which is sometimes very useful, concerns convex functions of a r.v. We recall that a function g defined on the real line is convex if g(λx + (1 – λ)y) ≤ λg(x) + (1 – λ)g(y) for any x, y, 0 ≤ λ ≤ 1. A convex function is known to be continuous and thus Borel measurable.

Theorem 9.5.4 (Jensen’s Inequality) If ξ is a r.v. with E|ξ| < ∞ and g is a convex function on R such that E|g(ξ)| < ∞, then

g(Eξ) ≤ Eg(ξ).

Proof Since g is convex it is known that given any x0 there is a real number h = h(x0) such that g(x) – g(x0) ≥ (x – x0)h for all x. (This may be proved for example by showing that for all x < x0 < y we have (g(x0) – g(x))/(x0 – x) ≤ (g(y) – g(x0))/(y – x0) and taking h = sup_{x<x0} (g(x0) – g(x))/(x0 – x).) Hence, putting x = ξ, x0 = Eξ we have, a.s.,

g(ξ) – g(Eξ) ≥ (ξ – Eξ)h (h = h(Eξ)).

The desired conclusion follows at once by taking expectations of both sides since the expectation of the right hand side is zero. □
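Jensen’s Inequality is likewise easy to check numerically; the sketch below uses the convex function g(x) = x² and an arbitrary three-point distribution (both illustrative assumptions):

```python
# Check of g(E xi) <= E g(xi) for the convex g(x) = x^2 on a small
# discrete distribution (illustrative data).
xs = [-2.0, 1.0, 4.0]
ps = [0.3, 0.5, 0.2]

g = lambda x: x * x
mean = sum(p * x for p, x in zip(ps, xs))        # E xi = 0.7
mean_g = sum(p * g(x) for p, x in zip(ps, xs))   # E g(xi) = 4.9

assert g(mean) <= mean_g
```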

9.6 Inverse functions and probability transforms

If F is a strictly increasing continuous function on the real line (or a subinterval thereof) and a = inf F(x), b = sup F(x), then its inverse function F⁻¹ is immediately defined for y ∈ (a, b) by F⁻¹(y) = x, where x is the unique value such that F(x) = y. Then F⁻¹(F(x)) = x for all x in the domain of F and F(F⁻¹(y)) = y for all y in the domain (a, b) of F⁻¹.

If F is strictly increasing but not everywhere continuous, F⁻¹(y) is not defined in this way for all y ∈ (a, b); e.g. if x0 is a discontinuity point of F with, say, F(x0) > F(x0 – 0), there is no x for which F(x) = y if y ∈ (F(x0 – 0), F(x0)). On the other hand, if F is continuous and nondecreasing but not strictly increasing, there is an interval (x1, x2) on which F is constant, i.e. F(x) = y, say, for x1 < x < x2. Hence there is no unique x for which F(x) = y.

It is, however, useful to define an inverse function F⁻¹ when F is nondecreasing (or nonincreasing) but not necessarily strictly monotone or continuous, and this may be done in various equally natural ways to retain some of the useful properties valid for the strictly monotone continuous case. We employ the following (commonly used) form of definition.

Let F be a nondecreasing function defined on an interval and for y ∈ (inf F(x), sup F(x)) define F⁻¹(y) by

F⁻¹(y) = inf{x : F(x) ≥ y}.

To see the meaning of this definition it is helpful to visualize its value at points y ∈ (F(x0 – 0), F(x0 + 0)) where F is discontinuous at x0, or at points y = F(x) for x such that F is constant in some neighborhood (x – ε, x + ε). It is also helpful to determine the points x for which F⁻¹(F(x)) ≠ x, and y such that F(F⁻¹(y)) ≠ y. The following results are examples of many useful properties of this form of the inverse function, the proofs of which may be supplied as exercises by an interested reader.1

1 Or see e.g. [Resnick, Section 0.2] for an excellent detailed treatment.


Lemma 9.6.1 If F is a nondecreasing function on R with inverse F⁻¹ defined as above, then

(i) (a) F⁻¹ is nondecreasing and left-continuous (F⁻¹(y – 0) = F⁻¹(y))
    (b) F⁻¹(F(x)) ≤ x
    (c) If F is strictly increasing from the left at x, in the sense that F(a) < F(x) whenever a < x, then F⁻¹(F(x)) = x.
(ii) If F is right-continuous then
    (a) {x : F(x) ≥ y} is closed for each y
    (b) F(F⁻¹(y)) ≥ y
    (c) F⁻¹(y) ≤ x if and only if y ≤ F(x)
    (d) x < F⁻¹(y) if and only if F(x) < y.
(iii) If for a given y, F is continuous at F⁻¹(y) then F(F⁻¹(y)) = y. Hence if F is everywhere continuous then F(F⁻¹(y)) = y for all y.
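For a d.f. that is a step function the definition F⁻¹(y) = inf{x : F(x) ≥ y} can be implemented directly; the sketch below (atoms 0, 1, 3 with masses 0.2, 0.5, 0.3 are illustrative assumptions) also exercises property (ii)(c) of the lemma:

```python
import bisect

# Generalized inverse F^{-1}(y) = inf{x : F(x) >= y} for the (right-
# continuous) d.f. of a discrete r.v. with atoms 0, 1, 3 of masses
# 0.2, 0.5, 0.3, so F takes the cumulative values below.
atoms = [0.0, 1.0, 3.0]
cum = [0.2, 0.7, 1.0]

def F(x):
    # right-continuous step d.f.
    return cum[bisect.bisect_right(atoms, x) - 1] if x >= atoms[0] else 0.0

def F_inv(y):
    # inf{x : F(x) >= y}, defined for 0 < y <= 1
    return atoms[bisect.bisect_left(cum, y)]

# Lemma 9.6.1(ii)(c): F^{-1}(y) <= x iff y <= F(x), F being right-continuous.
for y in (0.1, 0.2, 0.21, 0.7, 0.9, 1.0):
    for x in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
        assert (F_inv(y) <= x) == (y <= F(x))
```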

Results of this type are useful for transformation of r.v.’s to standard distributions (“Probability transformations”). For example, it should be shown as an exercise (Ex. 9.4) that if ξ has a continuous distribution function F, then F(ξ) is a uniform r.v. and (Ex. 9.5) that if ξ is a uniform r.v. and F some d.f., then η = F⁻¹(ξ) is a r.v. with d.f. F. Such results can be useful for simulation and sometimes allow the proof of properties of general r.v.’s to be done just under special assumptions such as uniformity, normality, etc.
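A minimal simulation sketch of the transform in Ex. 9.5, taking F to be the standard exponential d.f. 1 – e^{–x} (an illustrative choice) so that F⁻¹(u) = –log(1 – u):

```python
import math, random

# Inverse transform sampling: if xi is uniform on (0,1) and F is a d.f.,
# then F^{-1}(xi) has d.f. F (Ex. 9.5). Here F(x) = 1 - e^{-x}, so
# F^{-1}(u) = -log(1 - u).
random.seed(1)
sample = [-math.log(1.0 - random.random()) for _ in range(200000)]

# The empirical d.f. of the transformed sample should be close to F.
x = 1.0
empirical = sum(s <= x for s in sample) / len(sample)
assert abs(empirical - (1.0 - math.exp(-x))) < 0.01
```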

We shall be interested later in the topic of “convergence in distribution” involving the convergence of d.f.’s Fn to a d.f. F at continuity points of the latter. The following result (which may be proved as an exercise, or reference made to e.g. [Resnick]) involves the more general framework where the Fn’s need not be d.f.’s (and convergence at continuity points is then commonly referred to as vague convergence – cf. Section 11.3).

Lemma 9.6.2 If Fn, n ≥ 1, F are nondecreasing and Fn(x) → F(x) at all continuity points x of F, then Fn⁻¹(y) → F⁻¹(y) at all continuity points y of F⁻¹.

Exercises

9.1 Let pj ≥ 0, ∑_{j=1}^∞ pj = 1, xj real, F(x) = ∑_{xj≤x} pj. Show that ν(E) = ∑_{xj∈E} pj defines a measure on the Borel sets B and ν(E) = μF(E) for E ∈ B. (If E = ∪_{k=1}^∞ Ek write χj = χE(xj), χjk = χ_{Ek}(xj) so that ν(E) = ∑_j χj pj, ν(Ek) = ∑_j χjk pj.) Thus for given pj ≥ 0, ∑ pj = 1, there is a discrete r.v. ξ with P{ξ = xj} = pj and P{ξ ∈ E} = ∑_{xj∈E} pj.


9.2 Let F be a d.f. and F(x) = ∫_{–∞}^x f(t) dt where f ∈ L1(–∞, ∞). (It is not initially assumed that f ≥ 0.) Define the finite signed measure ν(E) = ∫_E f dx. Show that ν(E) = μF(E) on the Borel sets B. (Hint: Use Lemma 5.2.4.) Hence show that f ≥ 0 a.e.

9.3 Let Ω be the unit interval, F its Borel subsets, and P Lebesgue measure on F. Let ξ(ω) = ω, η(ω) = 1 – ω. Show that ξ, η have the same distribution but are not identical. In fact P(ξ ≠ η) = 1.

9.4 Let ξ be a r.v. whose d.f. F is continuous. Let η = F(ξ) (i.e. η(ω) = F(ξ(ω))). Show that η is uniformly distributed on (0, 1), i.e. that its d.f. G is given by G(x) = 0 for x < 0, G(x) = x for 0 ≤ x ≤ 1 and G(x) = 1 for x > 1. What if F is not continuous? (For simplicity assume F has just one jump.)

9.5 Let F be any d.f. and define its inverse F⁻¹ as in Section 9.6. Show that if ξ is uniformly distributed over (0, 1), then η = F⁻¹(ξ) has d.f. F.

9.6 If ξ, η are discrete r.v.’s, is ξ + η discrete? What about ξη and ξ/η? What happens to these combinations if ξ is discrete and η continuous?

9.7 Prove Lemma 9.3.1. (Hints: For (i) it may be noted that (a) every ξ ∈ C is σ(C)-measurable and (b) if every ξ ∈ C is G-measurable (for some fixed σ-field G) then G ⊃ σ(ξ), each ξ ∈ C. Clearly in (ii) the σ-field on the left contains that on the right. However, each ξλ is measurable with respect to the σ-field on the right, which therefore contains the smallest σ-field yielding measurability of all ξλ, viz. σ(C).)

9.8 In Theorem 9.3.3, the ξi are all defined on the same subset of Ω (i.e. where ξ is defined). If we start with mappings ξ1, . . . , ξn defined (and finite a.s.) on possibly different subsets D1, . . . , Dn (with P(Di) = 1) we may define ξ = (ξ1, . . . , ξn) on D = ∩_{i=1}^n Di. If ξ1, . . . , ξn are each r.v.’s then ξ is a random vector, as in the theorem. Show that the converse may not be true, that is, if ξ is a random vector, it is not necessarily true that the ξi are r.v.’s (it is true if the Di are measurable – e.g. if P is complete).

9.9 Let F be an absolutely continuous d.f. on Rn (with density f(x1, . . . , xn)) for r.v.’s ξ1, . . . , ξn. Show that the r.v.’s ξ1, . . . , ξk (k < n) have an absolutely continuous distribution and find their joint p.d.f.

9.10 The concept of a “continuous singular” d.f. or probability measure in R2 is more common than in R. For example, let F be any continuous d.f. on R. For any Borel set B in R2 define μ(B) = μF(B0) where B0 is the section of B defined by y = 0. Show that μ has no point atoms but is singular with respect to two-dimensional Lebesgue measure.

9.11 More generally suppose that C is a simple curve in the plane given parametrically as x = x(s), y = y(s), where x and y are (Borel) measurable 1-1 functions of s. If μ is a probability measure on (R, B) we may define a probability measure on (R2, B2) by ν(E) = μT⁻¹(E) where T is the measurable transformation Ts = (x(s), y(s)). The measure ν is singular with respect to Lebesgue measure and has no atoms if μ has no atoms. If s is distance along the curve, ν(E) may be regarded as the μ-measure of E ∩ C considered as a linear set with origin at s = 0. For example, if C is the diagonal x = y we have x(s) = s/√2 = y(s). Write down the two-dimensional d.f. F(x, y) (= P((–∞, x] × (–∞, y])) corresponding to ν in terms of the d.f. G corresponding to μ. Note that F(x, y) is continuous (but μF is not absolutely continuous with respect to Lebesgue measure).

9.12 Let ξ be discrete with P{ξ = xn} = pn. Show that E|ξ| = ∑ pn|xn| and if E|ξ| < ∞ then Eξ = ∑ pn xn.

9.13 Let ξ be a r.v. with E|ξ|^n < ∞ for some positive integer n. Express the nth central moment for ξ in terms of the first n ordinary moments, and conversely.

9.14 Let ξ be a r.v. with E|ξ| < ∞ and let En be any sequence of sets with P(En) → 0. Show that E(ξχ_{En}) → 0 (cf. Theorem 4.5.3). Show in particular that E(ξχ_{(|ξ|>n)}) → 0.

9.15 Let ξ be a r.v. on (Ω, F, P) and define En = {ω : |ξ(ω)| ≥ n}. Show that

∑_{n=1}^∞ P(En) ≤ E|ξ| ≤ 1 + ∑_{n=1}^∞ P(En)

and hence that E|ξ| < ∞ if and only if ∑_{n=1}^∞ P(En) < ∞. If ξ takes only positive integer values, show that Eξ = ∑_{n=1}^∞ P(En). (Hint: Let Fn = {ω : n ≤ |ξ(ω)| < n + 1} and note that ∑_{n=1}^∞ nP(Fn) = ∑_{n=1}^∞ P(En).)

9.16 If ξ is a nonnegative r.v. with d.f. F show that

Eξ = ∫_0^∞ [1 – F(x)] dx.

(Hint: Use Fubini’s Theorem.) If ξ is a real-valued r.v. with d.f. F show that

E|ξ| = ∫_{–∞}^0 F(x) dx + ∫_0^∞ [1 – F(x)] dx

and thus E|ξ| < ∞ if and only if ∫_{–∞}^0 F(x) dx < ∞ and ∫_0^∞ [1 – F(x)] dx < ∞, in which case

Eξ = ∫_0^∞ [1 – F(x)] dx – ∫_{–∞}^0 F(x) dx.

9.17 Let F be any d.f. Show that, for any h > 0, ∫_{–∞}^∞ (F(x + h) – F(x)) dx = h. Why does this not contradict the obvious statement that ∫_{–∞}^∞ F(x + h) dx = ∫_{–∞}^∞ F(x) dx?

9.18 Let g be a nonnegative bounded function on R, and ξ a r.v. If g is even and nondecreasing on 0 < x < ∞, show that

P{|ξ| ≥ a} ≥ E{g(ξ) – g(a)}/M

for any M < ∞ such that g(ξ(ω)) ≤ M a.s. (e.g. M = sup g(x)). If g is instead nondecreasing on (–∞, ∞) show that the same inequality holds with ξ instead of |ξ| on the left.


9.19 Let ξ, η be r.v.’s with E|ξ|^p < ∞, E|η|^p < ∞. Show that for p > 0, E|ξ + η|^p ≤ cp{E|ξ|^p + E|η|^p} where cp = 1 if 0 < p ≤ 1, cp = 2^{p–1} if p > 1. (Hint: (1 + x)^p ≤ cp(1 + x^p) for x ≥ 0. Note equality when x = 0 for p ≤ 1, and x = 1 for p > 1, and consider derivatives.)

9.20 Show that the covariance γ of two r.v.’s ξ1, ξ2 satisfies |γ| ≤ σ1σ2 where σi is the standard deviation of ξi, i = 1, 2, and hence that the correlation ρ satisfies |ρ| ≤ 1. The parameters γ and especially ρ are regarded as simple measures of dependence of ξ1, ξ2. What is the value of ρ if ξ1 = aξ2 (a) for some a > 0, (b) for a < 0?

9.21 Write down the covariance matrix Λ for a pair of r.v.’s ξ1, ξ2 in terms of their means μ1, μ2, standard deviations σ1, σ2 and correlation ρ. Show that Λ is nonsingular if |ρ| < 1 and then obtain its inverse. Hence write down the joint p.d.f. of ξ1 and ξ2 in terms of μi, σi, i = 1, 2, ρ, when ξ1 and ξ2 are assumed to be jointly normal.

9.22 If ξ1, ξ2, . . . , ξn are jointly normal, with means μi, 1 ≤ i ≤ n, and nonsingular covariance matrix Λ, show that the members of any subgroup (e.g. ξ1, ξ2, . . . , ξk, k ≤ n) are jointly normal, writing down their covariance matrix in terms of Λ.

10

Independence

10.1 Independent events and classes

Two events A, B are termed independent if P(A ∩ B) = P(A) · P(B). Physically this means (as can be checked by interpreting probabilities as long term frequencies) that in many repetitions of the experiment E, the proportion of those times A occurs for which B also occurs is ultimately the same as the proportion of times B occurs in all. That is, roughly, “knowledge of the occurrence or not of A does not affect the probability of B” (and conversely). We are, of course, interested primarily in the mathematical definition given, and its consequences.

The definition of independence can be usefully extended to a class of events. We say that A is a class of independent events (or that the events of a class A are independent) if for every finite subclass of distinct events A1, A2, . . . , An of A, we have P(∩_1^n Ai) = ∏_1^n P(Ai). Note that it is not, in general, sufficient for this that the events of A be pairwise independent (see Ex. 10.1).
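The standard two-coin example behind Ex. 10.1 can be checked by enumeration; the events named A, B, C below are the usual illustrative choices:

```python
from itertools import product

# Two fair coin tosses; A = "first is heads", B = "second is heads",
# C = "both tosses agree". These are pairwise independent but not
# independent as a class.
omega = list(product("HT", repeat=2))       # 4 equally likely outcomes

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == "H"
B = lambda w: w[1] == "H"
C = lambda w: w[0] == w[1]

# Pairwise: P(X and Y) = P(X) P(Y) for each pair.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# But the triple product relation fails: P(A and B and C) = 1/4, not 1/8.
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
```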

A more general notion concerns a family of independent classes. If Aλ is a class of events for each λ in some index set Λ, {Aλ : λ ∈ Λ} is said to be a family of independent classes of events (or the classes {Aλ : λ ∈ Λ} are said to be independent), if for every choice of one member Aλ from each Aλ, the events {Aλ : λ ∈ Λ} are independent.

Note that a class A of independent events may be regarded as a family of independent classes of events, where the classes of the family each consist of just one event of A. This viewpoint is sometimes useful. Note also that while the index set Λ may be infinite (of any order) a family A = {Aλ : λ ∈ Λ} is independent if and only if every finite subfamily {Aλ1, . . . , Aλn} is independent (for distinct λi). Thus it usually suffices to consider finite families.

Remark If A1, . . . , An are classes of events such that each Ai contains a set Ci with P(Ci) = 1 (e.g. Ci = Ω) then to show that A1, A2, . . . , An are independent classes it is only necessary to show that P(∩_1^n Ai) = ∏_1^n P(Ai) for this one n, and all choices of Ai ∈ Ai, 1 ≤ i ≤ n. For this relation then follows at once for subfamilies – e.g.

∏_1^{n–1} P(Ai) = ∏_1^{n–1} P(Ai) P(Cn) = P((∩_1^{n–1} Ai) ∩ Cn)
= P(∩_1^{n–1} Ai) – P((∩_1^{n–1} Ai) ∩ Cn^c) = P(∩_1^{n–1} Ai)

since P(Cn^c) = 0.

A family of independent classes may often be enlarged without losing independence. The following is a small result in this direction – its proof is left as an easy exercise (cf. Ex. 10.3).

Lemma 10.1.1 Let {Aλ : λ ∈ Λ} be independent classes of events, and A*λ = Aλ ∪ Gλ where, for each λ, Gλ is any class of sets E such that P(E) = 0 or 1. Then {A*λ : λ ∈ Λ} are independent classes.

The next result is somewhat more sophisticated and very useful.

Theorem 10.1.2 Let {Aλ : λ ∈ Λ} be independent classes of events, and such that each Aλ is closed under finite intersections. Let Bλ be the σ-field generated by Aλ, Bλ = σ(Aλ). Then {Bλ : λ ∈ Λ} are also independent classes.

Proof Define A*λ = Aλ ∪ {Ω}. Then by Lemma 10.1.1 {A*λ : λ ∈ Λ} are independent classes, and clearly Bλ is also the σ-field generated by A*λ. Thus we assume without loss of generality that Ω ∈ Aλ for each λ. In accordance with a remark above, it is sufficient to show that any finite subfamily {Bλ1, Bλ2, . . . , Bλn} (with distinct λi) are independent classes. If it is shown that {Bλ1, Aλ2, . . . , Aλn} are independent classes, the result will then follow inductively.

Let G be the class of sets E ∈ F such that P(E ∩ A2 ∩ . . . ∩ An) = P(E)P(A2) . . . P(An) for all Ai ∈ Aλi (i = 2, . . . , n). If E ∈ G, F ∈ G and E ⊃ F, Ai ∈ Aλi (i = 2, . . . , n),

P{(E – F) ∩ A2 ∩ . . . ∩ An} = P(E ∩ A2 ∩ . . . ∩ An) – P(F ∩ A2 ∩ . . . ∩ An)
= P(E)P(A2) . . . P(An) – P(F)P(A2) . . . P(An)
= P(E – F)P(A2) . . . P(An).


Thus E – F ∈ G and G is therefore closed under proper differences. Similarly it is easily checked that G is closed under countable disjoint unions so that G is a D-class. But G ⊃ Aλ1 which is closed under intersections and hence by Theorem 1.8.5 (Corollary) G contains the σ-ring generated by Aλ1. This σ-ring is the σ-field Bλ1 since Ω ∈ Aλ1 and hence G ⊃ Bλ1. Hence (using the Remark preceding Lemma 10.1.1) {Bλ1, Aλ2, . . . , Aλn} are independent classes and, as noted, this is sufficient for the result of the theorem. □

If a class A of independent events is regarded as a family of independent classes in the manner described above (i.e. each class consisting of one member of A) we may, according to the theorem, enlarge each (1-member) class {A} to the σ-field it generates, viz. {A, Ac, Ω, ∅}. Thus these classes constitute, for A ∈ A, a family of independent classes. A class of independent events may now be obtained by selecting one event from each {A, Ac, Ω, ∅}. Thus the following corollary to Theorem 10.1.2 holds.

Corollary If A is a class of independent events, and if some of the events of A are replaced by their complements, then the resulting class is again a class of independent events.

This result can, of course, be shown “by hand” from the definition. For example, if A, B are independent then it follows directly that so are A, Bc (which should be shown as an exercise).

The final result of this section is a useful extension of Theorem 10.1.2 involving the “grouping” of a family of independent classes. In this, by a partition of the set Λ we mean any class of disjoint sets {Λγ : γ ∈ Γ} with ∪_{γ∈Γ} Λγ = Λ. If {Aλ : λ ∈ Λ} are independent classes, clearly the “grouped classes” {∪_{λ∈Λγ} Aλ : γ ∈ Γ} are independent. The following result shows that the same is true for Bγ = σ(∪_{λ∈Λγ} Aλ), γ ∈ Γ, provided each Aλ is closed under finite intersections. This does not follow immediately from Theorem 10.1.2 since ∪_{λ∈Λγ} Aλ need not be closed under intersections, but the classes may be expanded to have this closure property and allow application of the theorem.

Theorem 10.1.3 Let {Aλ : λ ∈ Λ} be independent classes, each being assumed to be closed under finite intersections. Let {Λγ : γ ∈ Γ} be a partition of Λ, and Bγ = σ{∪_{λ∈Λγ} Aλ}. Then {Bγ : γ ∈ Γ} are independent classes.

Proof For each γ ∈ Γ let Gγ denote the class of all sets of the form A1 ∩ A2 ∩ . . . ∩ An, for Ai ∈ Aλi, where λ1, . . . , λn are any distinct members of Λγ (n = 1, 2, . . .). Gγ is closed under finite intersections since each Aλ is so closed. Further {Gγ : γ ∈ Γ} are independent classes (which is easily checked from the definition of the sets of Gγ). Hence, by Theorem 10.1.2, the σ-fields {σ(Gγ) : γ ∈ Γ} are independent classes. But clearly ∪_{λ∈Λγ} Aλ ⊂ Gγ so that Bγ ⊂ σ(Gγ) and hence {Bγ : γ ∈ Γ} are independent classes, as required. □

10.2 Independent random elements

We will be primarily concerned with the concept of independence in the context of random variables. However, the definition and results of this section will apply more generally to arbitrary random elements, since this extra generality can be useful.

Specifically, suppose that for each λ in an index set Λ, ξλ is a random element on a fixed probability space (Ω, F, P), with values in a measurable space (Xλ, Sλ) – which may change with λ. (If ξλ is a r.v., of course, Xλ = R*, Sλ = B*.) If the classes {σ(ξλ) : λ ∈ Λ} are independent, then {ξλ : λ ∈ Λ} is said to be a family of independent r.e.’s or the r.e.’s {ξλ : λ ∈ Λ} are independent.

Since σ(ξλ) = σ(ξλ⁻¹Sλ) = σ{ξλ⁻¹B : B ∈ Sλ} and ξλ⁻¹Sλ is closed under intersections it follows at once from Theorem 10.1.2 that the following criterion holds – facilitating the verification of independence of r.e.’s.

Theorem 10.2.1 The r.e.’s {ξλ : λ ∈ Λ} are independent iff {ξλ⁻¹Sλ : λ ∈ Λ} are independent classes, i.e. iff for each n = 1, 2, . . ., distinct λi ∈ Λ, Bi ∈ Sλi, 1 ≤ i ≤ n,

P(∩_1^n ξλi⁻¹Bi) = ∏_1^n P(ξλi⁻¹Bi).

Indeed these conclusions hold if each Sλ is replaced by Gλ where Gλ is any class of subsets of Xλ, closed under intersections and such that S(Gλ) = Sλ for each λ.

Proof The main conclusion follows as noted prior to the statement of the theorem. The final conclusion follows by exactly the same pattern (see Ex. 10.9). □

The above definition is readily extended to include independence of families of r.e.’s. Specifically, let Cλ be a family of random elements for each λ in an index set Λ. Then if the σ-fields {σ(Cλ) : λ ∈ Λ} are independent classes of events, we shall say that {Cλ : λ ∈ Λ} are independent families of random elements, or “the classes Cλ of r.e.’s are independent for λ ∈ Λ”.


Thus we have the notions of independence for random elements, and for families of r.e.’s, parallel to the corresponding notions for events and classes of events. (However, see Ex. 10.10.) Theorem 10.1.3 has the following obvious (and useful) analog for independent random elements.

Theorem 10.2.2 Let {Cλ : λ ∈ Λ} be independent families of random elements on a space (Ω, F, P), let {Λγ : γ ∈ Γ} be a partition of Λ, and write Hγ = ∪_{λ∈Λγ} Cλ. Then {Hγ : γ ∈ Γ} are independent families of random elements.

Proof From Lemma 9.3.1 (iii) we have

σ(Hγ) = σ(∪_{λ∈Λγ} σ(Cλ)).

But since {σ(Cλ) : λ ∈ Λ} are independent classes (each closed under intersections), it follows from Theorem 10.1.3 that {σ(Hγ) : γ ∈ Γ} are also independent classes. □

The following result gives a useful characterization of independence of r.e.’s in terms of product forms for the distributions of finite subfamilies. This is especially important for the case of r.v.’s considered in the next section.

Theorem 10.2.3 Let ξ1, ξ2, . . . , ξn be r.e.’s on (Ω, F, P) with values in measurable spaces (Xi, Si), 1 ≤ i ≤ n. Then ξ = (ξ1, ξ2, . . . , ξn) is a r.e. on (Ω, F, P) with values in (∏_1^n Xi, ∏_1^n Si), and ξ1, . . . , ξn are independent iff

Pξ⁻¹ = Pξ1⁻¹ × Pξ2⁻¹ × . . . × Pξn⁻¹ (= ∏_1^n Pξi⁻¹)

i.e. the distribution of ξ is the product (probability) measure having the individual distributions as components.

Thus, for a general index set Λ, r.e.’s (ξλ : λ ∈ Λ) are independent iff the distribution of ξ = (ξλ1, . . . , ξλn) factors in the above manner for each n and choice of distinct λi.

Proof That ξ = (ξ1, . . . , ξn) is a r.e. follows simply (as in Theorem 9.3.3 for the special case of random variables and vectors) and

ξ⁻¹(B1 × B2 × . . . × Bn) = ∩_1^n ξi⁻¹(Bi)

for any Bi ∈ Si, 1 ≤ i ≤ n. Thus if the ξi are independent, Pξ⁻¹(B1 × B2 × . . . × Bn) = ∏_1^n Pξi⁻¹Bi so that Pξ⁻¹ and the product measure ∏_1^n Pξi⁻¹ agree on measurable rectangles and hence on all sets of ∏_1^n Si. Conversely if Pξ⁻¹ = ∏_1^n Pξi⁻¹,

P(∩_1^n ξi⁻¹Bi) = Pξ⁻¹(B1 × B2 × . . . × Bn) = ∏_1^n Pξi⁻¹(Bi).

As noted the same relation is automatic for subclasses of (ξ1, ξ2, . . . , ξn) by writing appropriate Bi = Xi, so that independence of (ξ1, . . . , ξn) follows. □

10.3 Independent random variables

The independence properties developed in the last section, of course, apply in particular to random variables, as will be seen in the following results. For simplicity these are mainly stated for finite families since the results for infinite families involve just finite subfamilies.

Theorem 10.3.1 The following conditions are each necessary and sufficient for independence of r.v.’s ξ1, ξ2, . . . , ξn (on a probability space (Ω, F, P)).

(i) P(∩_{i=1}^n ξi⁻¹Bi) = ∏_1^n P(ξi⁻¹Bi) for every choice of extended Borel sets B1, . . . , Bn.
(ii) (i) holds for all choices of (ordinary) Borel sets B1, . . . , Bn (in place of all extended Borel sets).
(iii) The distribution Pξ⁻¹ of the random vector ξ = (ξ1, ξ2, . . . , ξn) on (Rn, Bn) (or (R*n, B*n)) is the product of the distributions Pξi⁻¹ on (R, B) (or (R*, B*)), i.e.

Pξ⁻¹ = Pξ1⁻¹ × Pξ2⁻¹ × . . . × Pξn⁻¹.

(iv) The joint d.f. F1,...,n(x1, . . . , xn) of ξ1, . . . , ξn factors as ∏_1^n Fi(xi), where Fi is the d.f. of ξi.

Proof Independence of (ξ1, ξ2, . . . , ξn) is readily seen to be equivalent to each of (i)–(iii) using Theorem 10.2.3. (iii) at once implies (iv), and that (iv) implies e.g. (iii) is readily checked. □

The next result is a useful application of Theorem 10.2.2.

Theorem 10.3.2 Let (ξ11, . . . , ξ1n1, ξ21, . . . , ξ2n2, ξ31, . . .) be independent r.v.’s on a space (Ω, F, P). Define random vectors ξ1, ξ2, . . . by ξi = (ξi1, ξi2, . . . , ξini). Then (ξ1, ξ2, . . .) are independent random vectors. Moreover if φi is a finite-valued measurable function on (R*ni, B*ni) for i = 1, 2, . . ., and ηi = φi(ξi), then (η1, η2, . . .) are independent r.v.’s.

Proof By Theorem 10.2.2, {(ξi1, ξi2, . . . , ξini) : i = 1, 2, . . .} are independent families of r.v.’s so that {σ(ξi1, . . . , ξini) : i = 1, 2, . . .} are independent classes of events. But, by Lemma 9.3.2, σ(ξi) = σ(ξi1, . . . , ξini) so that (ξ1, ξ2, . . .) are independent random vectors, as required.

Further, a typical generating set of σ(ηi) is ηi⁻¹B for B ∈ B. But ηi⁻¹B = ξi⁻¹(φi⁻¹B) ∈ σ(ξi) so that σ(ηi) ⊂ σ(ξi). Since {σ(ξi) : i = 1, 2, . . .} are independent classes, so are the classes {σ(ηi) : i = 1, 2, . . .}, i.e. (η1, η2, . . .) are independent r.v.’s, completing the proof. □

Corollary The theorem remains true if the φi are defined only on (measurable) subsets Di ⊂ R*ni such that ξi ∈ Di a.s. (so that ηi may be defined at fewer ω-points than ξi – though still a.s.). In particular the theorem holds if Di = Rni, i.e. if the φi are defined for finite values of their arguments only – the case of practical importance.

Proof Define φ*i = φi on (the measurable set) Di and zero on R*ni – Di. Then if η*i = φ*i(ξi) we have η*i = ηi a.s. Since (η*1, η*2, . . .) are independent by the theorem, so are (η1, η2, . . .) (Ex. 10.11). □

The next result concerns the existence of a sequence of independent r.v.’s with given d.f.’s.

Theorem 10.3.3 Let Fi be a d.f. for each i = 1, 2, . . . . Then there is a probability space (Ω, F, P) and a sequence (ξ1, ξ2, . . .) of independent r.v.’s such that ξi has d.f. Fi.

Proof Write μi for the Lebesgue–Stieltjes (probability) measure on (R,B)corresponding to Fi. Then by Theorem 7.10.4, there exists a probabilitymeasure P on (R∞,B∞) such that for any n, Borel sets B1, B2, . . . , Bn,

P(B1×B2 × . . . ×Bn×R ×R × . . .) =n∏1

μi(Bi).

Write (Ω, F, P) for the probability space (R^∞, B^∞, P) and define ξ1, ξ2, . . . on this space by ξi(ω) = xi when ω = (x1, x2, x3, . . .). Each ξi is clearly a r.v. and for Borel sets B1, B2, . . . , Bn

P{∩_{i=1}^{n} ξi^{-1}(Bi)} = P(B1 × B2 × . . . × Bn × R × R × . . .) = ∏_{i=1}^{n} μi(Bi).

In particular, B1 = B2 = · · · = B_{n–1} = R gives P(ξn^{-1}Bn) = μn(Bn) for each n, so that (writing i for n) P(∩_{i=1}^{n} ξi^{-1}Bi) = ∏_{i=1}^{n} P(ξi^{-1}Bi) and hence the ξi are independent. Also Pξn^{-1}(–∞, x] = μn(–∞, x] = Fn(x) so that ξn has d.f. Fn as required. □
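The product-measure construction above has a practical counterpart: in simulation, independent r.v.’s with given d.f.’s are obtained by applying inverse d.f.’s (quantile functions) to independent uniform variables. A minimal sketch, assuming exponential d.f.’s purely as an illustrative choice (not from the text):

```python
import math
import random

# Sketch: realize independent r.v.'s with prescribed d.f.'s F_i by applying
# the inverse d.f. (quantile function) to independent Uniform(0,1) variables.
rng = random.Random(0)

def exp_quantile(u, lam):
    # Inverse of the exponential d.f. F(x) = 1 - exp(-lam*x), x >= 0
    return -math.log(1.0 - u) / lam

n = 100_000
xi1 = [exp_quantile(rng.random(), 1.0) for _ in range(n)]  # d.f. of Exp(1)
xi2 = [exp_quantile(rng.random(), 2.0) for _ in range(n)]  # d.f. of Exp(2)

# Independence shows up as approximate factorization of joint probabilities:
p1 = sum(x > 1.0 for x in xi1) / n   # close to e^{-1}
p2 = sum(y > 0.5 for y in xi2) / n   # close to e^{-1}
p_joint = sum(x > 1.0 and y > 0.5 for x, y in zip(xi1, xi2)) / n
print(p_joint, p1 * p2)  # both close to e^{-2}
```

Each coordinate of the underlying uniform sequence plays the role of one coordinate of ω ∈ R^∞ in the proof.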

Note that a more general result of this kind, where the ξi need not be independent, will be indicated in Chapter 15 on stochastic process theory.

If ξ1, ξ2 are r.v.’s in L2(Ω, F, P) then ξ1ξ2 ∈ L1(Ω, F, P) (i.e. E|ξ1ξ2| < ∞). This is not the case in general if we just assume that ξ1 and ξ2 each belong to L1. However, it is an interesting and important fact that it is true for independent r.v.’s, and then E(ξ1ξ2) = Eξ1 · Eξ2. This will follow as a corollary from the following general result.

Theorem 10.3.4 Let ξ1, ξ2 be independent r.v.’s with d.f.’s F1, F2 and let h be a finite measurable function on (R^2, B^2). Then h(ξ1, ξ2) is a r.v. and

Eh(ξ1, ξ2) = ∫_Ω ∫_Ω h(ξ1(ω1), ξ2(ω2)) dP(ω1) dP(ω2)
           = ∫_R ∫_R h(x1, x2) dF1(x1) dF2(x2),

whenever h is nonnegative, or E|h(ξ1, ξ2)| < ∞.

Proof It is clear that h(ξ1, ξ2) is a r.v. Writing ξ = (ξ1, ξ2) we have

Eh(ξ1, ξ2) = ∫_Ω h(ξ(ω)) dP(ω) = ∫_{R^2} h(x1, x2) dPξ^{-1}(x1, x2)
           = ∫_{R^2} h(x1, x2) d(Pξ1^{-1} × Pξ2^{-1})

by Theorem 4.6.1 and Theorem 10.3.1 (iii). Fubini’s Theorem (the appropriate version according as h is nonnegative, or h(ξ1, ξ2) ∈ L1) now gives the repeated integral

Eh(ξ1, ξ2) = ∫_R ∫_R h(x1, x2) dPξ1^{-1}(x1) dPξ2^{-1}(x2)

which may be written either as ∫_R ∫_R h(x1, x2) dF1(x1) dF2(x2) or, by Theorem 4.6.1 applied in turn to each of ξ1, ξ2, as ∫_Ω ∫_Ω h(ξ1(ω1), ξ2(ω2)) dP(ω1) dP(ω2). Hence the result follows. □

Theorem 10.3.5 Let ξ1, . . . , ξn be independent r.v.’s with E|ξi| < ∞ for each i. Then E|ξ1ξ2 . . . ξn| < ∞ and E(ξ1ξ2 . . . ξn) = ∏_{i=1}^{n} Eξi.

Proof Since by Theorem 10.3.2, ξ1 and (ξ2ξ3 . . . ξn) are independent, the result will follow inductively from that for n = 2. The n = 2 result follows at once from Theorem 10.3.4, first with h(x1, x2) = |x1x2| to give

E|ξ1ξ2| = ∫_Ω ∫_Ω |ξ1(ω1)| |ξ2(ω2)| dP(ω1) dP(ω2) = E|ξ1| E|ξ2| < ∞,

and then with h(x1, x2) = x1x2 to give E(ξ1ξ2) = Eξ1Eξ2. □


Corollary If ξ1, . . . , ξn are independent r.v.’s with Eξi^2 < ∞ for each i, then the variance of (ξ1 + ξ2 + · · · + ξn) is given by

var(ξ1 + ξ2 + · · · + ξn) = var(ξ1) + var(ξ2) + · · · + var(ξn).

The simple proof is left as an exercise.
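A quick numerical check of the corollary may be sketched as follows (the Uniform(–1, 1) summands, each with variance 1/3, are an assumed example, not from the text):

```python
import random

# Check var(xi_1 + ... + xi_n) = sum of var(xi_i) for independent summands.
rng = random.Random(1)
n, m = 3, 200_000  # n independent summands, m sample points

# n independent samples; the shift by j changes the mean but not the variance
samples = [[rng.uniform(-1.0, 1.0) + j for _ in range(m)] for j in range(n)]

def var(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

sums = [sum(col) for col in zip(*samples)]
lhs = var(sums)
rhs = sum(var(xs) for xs in samples)
print(lhs, rhs)  # both close to 3 * (1/3) = 1 for Uniform(-1,1) summands
```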

10.4 Addition of independent random variables

We next obtain the distribution and d.f. of the sum of independent r.v.’s.

Theorem 10.4.1 Let ξ1, ξ2 be independent r.v.’s with distributions Pξ1^{-1} = π1, Pξ2^{-1} = π2. Then

(i) The distribution π of ξ1 + ξ2 is given for Borel sets B (writing B – y = {x – y : x ∈ B}) by

π(B) = ∫_{-∞}^{∞} π1(B – y) dπ2(y) = ∫_{-∞}^{∞} π2(B – y) dπ1(y) = π1 ∗ π2(B),

where π1 ∗ π2 is called the convolution of the measures π1, π2 (cf. Section 7.6).

(ii) In particular the d.f. F of ξ1 + ξ2 is given in terms of the d.f.’s F1, F2 of ξ1, ξ2 by

F(x) = ∫_{-∞}^{∞} F1(x – y) dF2(y) = ∫_{-∞}^{∞} F2(x – y) dF1(y) = F1 ∗ F2(x)

where F1 ∗ F2 is the (Stieltjes) convolution of F1 and F2.

(iii) If F1 is absolutely continuous with density f1, F is then absolutely continuous with density f(x) = ∫ f1(x – y) dF2(y).

(iv) If also F2 is absolutely continuous (with density f2) then

f(x) = ∫_{-∞}^{∞} f1(x – y) f2(y) dy = ∫_{-∞}^{∞} f2(x – y) f1(y) dy = f1 ∗ f2(x),

i.e. the convolution of f1 and f2 (cf. Section 7.6).

Proof If φ(x1, x2) = x1 + x2 (measurable) and ξ = (ξ1, ξ2), we have

π(B) = P{ξ1 + ξ2 ∈ B} = P{φξ ∈ B} = P{ξ ∈ φ^{-1}B} = Eχ_{φ^{-1}B}(ξ)
     = ∫_R ∫_R χ_{φ^{-1}B}(x1, x2) dπ1(x1) dπ2(x2)

by Theorem 10.3.4. The integrand is one if x1 + x2 ∈ B, i.e. if x1 ∈ B – x2, and zero otherwise, so that the inner integral is π1(B – x2), measurable by Fubini’s Theorem, giving the first result for π(B). The second follows similarly. Thus (i) holds.

The expressions for F(x) in (ii) follow at once by writing B = (–∞, x], where e.g. π1(B – y) = F1(x – y) etc.

If F1 is absolutely continuous with density f1 we have

F(x) = ∫_{-∞}^{∞} F1(x – y) dF2(y) = ∫_{-∞}^{∞} {∫_{-∞}^{x–y} f1(t) dt} dF2(y)
     = ∫_{-∞}^{∞} {∫_{-∞}^{x} f1(u – y) du} dF2(y)

by the transformation t = u – y for fixed y in the inner integral. Thus

F(x) = ∫_{-∞}^{x} {∫_{-∞}^{∞} f1(u – y) dF2(y)} du

by Fubini’s Theorem for nonnegative functions. That is, F(x) = ∫_{-∞}^{x} f(u) du where f(u) = ∫_{-∞}^{∞} f1(u – y) dF2(y). It is easily seen that the (nonnegative) function f is in L1(–∞, ∞) (Lebesgue measure) and thus provides a density for F. Hence (iii) follows, and (iv) is immediate from (iii). □
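Part (iv) is easy to check numerically. A sketch, assuming two Uniform(0, 1) densities as the example (their convolution is the triangular density on (0, 2)):

```python
def f1(x):
    # Uniform(0,1) density
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

f2 = f1

def convolve(f, g, x, lo=-2.0, hi=4.0, steps=6000):
    # f1*f2(x) = integral of f(x - y) g(y) dy, via a midpoint Riemann sum
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += f(x - y) * g(y)
    return total * h

# The convolution here is the triangular density: x on (0,1), 2 - x on (1,2)
c_half, c_one = convolve(f1, f2, 0.5), convolve(f1, f2, 1.0)
print(c_half, c_one)  # close to 0.5 and 1.0
```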

10.5 Borel–Cantelli Lemma and zero-one law

We recall that if An is any sequence of subsets of the space Ω, then A = lim An = ∩_{n=1}^{∞} ∪_{m=n}^{∞} Am is the set of all ω ∈ Ω which belong to An for infinitely many values of n.

If the An are measurable sets (i.e. events), so is A. In intuitive terms, A occurs if infinitely many of the An occur (simultaneously) when the underlying experiment is performed. The following result gives a simple but very useful condition under which P(A) = 0, i.e. with probability one only a finite number of the An occur.

Theorem 10.5.1 (Borel–Cantelli Lemma) Let {An} be a sequence of events of the probability space (Ω, F, P), and A = lim An. If ∑_{n=1}^{∞} P(An) < ∞, then P(A) = 0.

Proof P(A) = P(∩_{n=1}^{∞} ∪_{m=n}^{∞} Am) ≤ P(∪_{m=n}^{∞} Am) for any n = 1, 2, . . . . Hence P(A) ≤ ∑_{m=n}^{∞} P(Am) for all n, and this tends to zero as n → ∞ since ∑ P(An) converges. Thus P(A) = 0. □
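The lemma can be sketched in simulation. The choice P(An) = n^{-2} (summable), with the An taken independent purely for ease of simulation, is an assumption for illustration; the lemma itself needs no independence. The last index at which An occurs should then be finite and typically small:

```python
import random

rng = random.Random(2)
N, runs = 20_000, 20

# For each run, record the last index n <= N at which A_n occurs,
# where the A_n are simulated as independent with P(A_n) = 1/n^2.
last_occurrence = []
for _ in range(runs):
    last = 0
    for n in range(1, N + 1):
        if rng.random() < 1.0 / (n * n):
            last = n
    last_occurrence.append(last)
print(last_occurrence)  # typically small: the A_n stop occurring early
```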

The converse result is not true in general (Ex. 10.12). However, it is true if the events An form an independent sequence. Indeed, rather more is then true, as the following result shows.


Theorem 10.5.2 (Borel–Cantelli Lemma for Independent Events) Let {An} be an independent sequence of events on (Ω, F, P), and A = lim An. Then P(A) is zero or one, according as ∑_{n=1}^{∞} P(An) < ∞ or ∑_{n=1}^{∞} P(An) = ∞.

Proof Since P(A) = 0 when ∑ P(An) < ∞, it will be sufficient to show that P(A) = 1 when ∑ P(An) = ∞. Suppose, then, that ∑ P(An) = ∞. Then

P(A) = P(∩_{n=1}^{∞} ∪_{m=n}^{∞} Am) = lim_{n→∞} P(∪_{m=n}^{∞} Am) = lim_{n→∞} lim_{k→∞} P(∪_{m=n}^{k} Am).

Now

P((∪_{m=n}^{k} Am)^c) = P(∩_{m=n}^{k} Am^c) = ∏_{m=n}^{k} P(Am^c),

since the events An^c, A_{n+1}^c, . . . , Ak^c are independent by Theorem 10.1.2 (Corollary). Thus

P((∪_{m=n}^{k} Am)^c) = ∏_{m=n}^{k} (1 – P(Am)) ≤ ∏_{m=n}^{k} e^{–P(Am)}

(by using 1 – x ≤ e^{–x} for all 0 ≤ x ≤ 1). The latter term is e^{–∑_{m=n}^{k} P(Am)}, which tends to zero as k → ∞ since ∑ P(Am) = ∞. Thus lim_{k→∞} P(∪_{m=n}^{k} Am) = 1, giving P(A) = 1, as required. □
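The divergent case can be sketched the same way, assuming independent events with P(An) = 1/n as an illustrative example: since ∑ 1/n = ∞, occurrences keep appearing at arbitrarily large indices, though more and more sparsely.

```python
import random

rng = random.Random(3)
N = 50_000

# Independent events with P(A_n) = 1/n: the indices at which A_n occurs.
occurrences = [n for n in range(1, N + 1) if rng.random() < 1.0 / n]
print(len(occurrences), occurrences[-3:])
# On average about log(N) occurrences, some at large indices
```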

Note (though not shown here) that this result is in fact true if the An are only assumed to be pairwise independent. (See, for example, Theorem 4.3.2 of [Chung].)

The above theorem states in particular that a certain event A must have probability zero or one. Results of such a kind are therefore often referred to as “zero-one laws”. A particularly well known result of this type is the “Kolmogorov Zero-One Law”, which is shown next. Theorem 10.5.2 is an example of a zero-one law, together with necessary and sufficient conditions for the two alternatives.

First we require some general terminology. If Fn is a sequence of sub-σ-fields of F, then the σ-fields Gn = σ(∪_{k=n+1}^{∞} Fk) form a decreasing sequence (Gn ⊃ G_{n+1}) whose intersection ∩_{n=0}^{∞} Gn = F∞ (clearly a σ-field) is called the tail σ-field of the sequence Fn. Sets of F∞ are called tail events and F∞-measurable functions are called tail functions (or tail r.v.’s if defined and finite a.s.).

Theorem 10.5.3 (Kolmogorov Zero-One Law) Let (Ω, F, P) be a probability space. If Fn is a sequence of independent sub-σ-fields of F, then each tail event has probability zero or one, and each tail r.v. is constant a.s.


Proof Write Hn = σ(∪_{i=1}^{n} Fi) and, as above, Gn = σ(∪_{k=n+1}^{∞} Fk). Then since each Fi is closed under intersections, it follows simply from Theorem 10.1.3 that Hn and Gn are independent classes. Since Gn ⊃ F∞, it follows that Hn and F∞ are independent, from which it also follows at once that F∞ and ∪_{n=1}^{∞} Hn are independent. Now ∪_{n=1}^{∞} Hn is a field (note that Hn is nondecreasing), and hence closed under intersections, so that by Theorem 10.1.2, F∞ and σ(∪_{n=1}^{∞} Hn) are independent. But clearly σ(∪_{n=1}^{∞} Hn) ⊃ σ(∪_{n=1}^{∞} Fn) = G0 ⊃ F∞, so that {F∞, F∞} are independent. Thus if A ∈ F∞ we must have P(A) = P(A ∩ A) = (P(A))^2, so that P(A) is zero or one, as required.

Finally suppose that ξ is a tail r.v. with d.f. F. For any x, {ω : ξ(ω) ≤ x} is a tail event and hence has probability zero or one, i.e. F(x) = 0 or 1. Since F is not identically either zero or one it must have a unit jump at a finite point a (= inf{x : F(x) = 1}) so that P{ξ = a} = 1. □

Corollary 1 Let {ξn : n = 1, 2, . . .} be a sequence of independent r.v.’s and define the tail σ-field F∞ = ∩_{n=0}^{∞} σ(ξ_{n+1}, ξ_{n+2}, . . .). Then each tail event has probability zero or one, and each tail r.v. is constant a.s.

Proof Identify Fn with σ(ξn), and hence Gn = σ(∪_{k=n+1}^{∞} σ(ξk)) = σ(ξ_{n+1}, ξ_{n+2}, . . .). □

Corollary 2 If {Cn : n = 1, 2, . . .} is a sequence of independent classes of r.v.’s, the conclusion of the theorem holds, with tail σ-field F∞ = ∩_{n=0}^{∞} σ(∪_{k=n+1}^{∞} Ck).

Corollary 2, which follows by identifying Fn with σ(Cn), and hence Gn with σ(∪_{k=n+1}^{∞} σ(Ck)) = σ(∪_{k=n+1}^{∞} Ck), includes a zero-one law for an independent sequence of stochastic processes.

Exercises

10.1 Let Ω consist of the integers {1, 2, . . . , 9} with probabilities 1/9 each. Show that the events {1, 2, 3}, {1, 4, 5}, {2, 4, 6} are pairwise independent, but not independent as a class.

10.2 Construct an example of three events A, B, C which are not independent but which satisfy P(A ∩ B ∩ C) = P(A)P(B)P(C).

10.3 Let {Aλ : λ ∈ Λ} be a family of independent classes of events. Show that arbitrary events of probability zero or one may be added to any or all Aλ while still preserving independence. Show that if Bλ is formed from Aλ by including (i) all proper differences of two sets of Aλ, (ii) all countable disjoint unions of sets of Aλ, or (iii) all limits of monotone sequences of sets of Aλ, then {Bλ : λ ∈ Λ} is a family of independent classes. (Hint: Consider a finite index set Λ, Ω ∈ Aλ, and show that independence is preserved when just one Aλ is replaced by Bλ.)

10.4 If E1, E2, . . . , En are independent, show that

∑_{j=1}^{n} P(Ej) – ∑_{j<k} P(Ej)P(Ek) ≤ P(∪_{j=1}^{n} Ej) ≤ ∑_{j=1}^{n} P(Ej).

If the events E1^{(n)}, . . . , En^{(n)} change with n so that ∑_{j=1}^{n} P(Ej^{(n)}) → 0, show that P(∪_{j=1}^{n} Ej^{(n)}) ∼ ∑_{j=1}^{n} P(Ej^{(n)}) as n → ∞.

10.5 Let ξ, η be independent r.v.’s with E|ξ| < ∞. Show that, for any Borel set B,

∫_{η^{-1}B} ξ dP = Eξ · P(η ∈ B).

10.6 Let ξ, η be random variables on the probability space (Ω, F, P), let E ∈ F, and let f be a Borel measurable function on the plane. If ξ is independent of η and E (i.e. if the classes of events σ(ξ) and σ{σ(η), E} are independent) show that

∫_E ∫_Ω f(ξ(ω1), η(ω2)) dP(ω1) dP(ω2) = ∫_E f(ξ(ω), η(ω)) dP(ω)

whenever f is nonnegative or E|f(ξ, η)| < ∞. (Hint: Prove this first for an indicator function f.) If the random variable ζ defined on the probability space (Ω′, F′, P′) has the same distribution as ξ, show that

∫_E ∫_{Ω′} f(ζ(ω′), η(ω)) dP′(ω′) dP(ω) = ∫_E f(ξ(ω), η(ω)) dP(ω).

10.7 For n = 1, 2, . . . let Rn(x) be the Rademacher functions: Rn(x) = +1 or –1 according as the integer k for which (k–1)/2^n < x ≤ k/2^n (0 ≤ x ≤ 1) is odd or even. Let (Ω, F, P) be the “unit interval probability space” (consisting of the unit interval, Lebesgue measurable sets and Lebesgue measure). Prove that {Rn, n = 1, 2, . . .} are independent r.v.’s with the same d.f. Show that any two of R1, R2, R1R2 are independent, but the three together are not.
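The Rademacher functions of Ex. 10.7 are easy to compute, and their independence can be sketched numerically; the equal-weight dyadic grid below is an assumed discretization of Lebesgue measure, chosen so that no grid point falls on an interval endpoint:

```python
import math

def rademacher(n, x):
    # R_n(x) = +1 if the k with (k-1)/2^n < x <= k/2^n is odd, else -1
    return 1 if math.ceil(x * 2 ** n) % 2 == 1 else -1

# Equal-weight grid of dyadic midpoints (never on an interval endpoint):
pts = [(2 * j + 1) / 2 ** 10 for j in range(2 ** 9)]
vals = [(rademacher(1, x), rademacher(2, x)) for x in pts]
p_r1 = sum(a == 1 for a, _ in vals) / len(vals)
p_r2 = sum(b == 1 for _, b in vals) / len(vals)
p_both = sum(a == 1 and b == 1 for a, b in vals) / len(vals)
print(p_r1, p_r2, p_both)  # 0.5, 0.5, 0.25: the joint probability factorizes
```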

10.8 A r.v. η is called symmetric if η and –η have the same distribution. Let ξ be a r.v. Let ξ1 and ξ2 be two independent r.v.’s each having the same distribution as ξ and let ξ* = ξ1 – ξ2.

(a) Show that ξ* is symmetric (it is called the symmetrization of ξ) and that

μ*(B) = ∫_{-∞}^{∞} μ(x – B) dμ(x) = ∫_{-∞}^{∞} μ(x + B) dμ(x)

for all Borel sets B, where μ, μ* are the distributions of ξ, ξ* respectively, and x – B = {x – y : y ∈ B}, x + B = {x + y : y ∈ B}.

(b) Show that for all t ≥ 0 and real a,

P{|ξ*| ≥ t} ≤ 2P{|ξ – a| ≥ t/2}.

10.9 Criterion for independence of r.e.’s analogous to Theorem 10.1.2: Let ξλ be a random element on (Ω, F, P) with values in (Xλ, Sλ) say, for each λ in an index set Λ. For each λ, let Eλ be a class of subsets of Xλ which is closed under finite intersections and whose generated σ-ring S(Eλ) = Sλ, and write Gλ = ξλ^{-1}Eλ (= {ξλ^{-1}E : E ∈ Eλ}). Then {ξλ : λ ∈ Λ} is a class of independent random elements if and only if {Gλ : λ ∈ Λ} is a family of independent classes of events.

10.10 A weaker concept of independence of a family of classes of random elements would be the following. Let {Cλ : λ ∈ Λ} be a family of classes of random elements and suppose that for every choice of one member ξλ from each Cλ, {ξλ : λ ∈ Λ} is a class of independent random elements. Such a definition would be more strictly analogous to the procedure used for classes of sets. Show that it is, in fact, a weaker requirement than the definition in the text. (E.g. take two classes C1 = {ξ}, C2 = {η, ζ} where any two of ξ, η, ζ are independent but the three together are not (cf. Ex. 10.7). Show that {C1, C2} satisfies the weaker definition, but is not independent, however, in the sense of the text.)

10.11 For each λ in an index set Λ, let ξλ, ξλ* be random elements on (Ω, F, P), with values in (Xλ, Sλ) and such that ξλ = ξλ* a.s. Show that if {ξλ : λ ∈ Λ} is a class of independent random elements, then so is {ξλ* : λ ∈ Λ} (e.g. show (∩_{i=1}^{n} ξ*_{λi}^{-1}Ei) Δ (∩_{i=1}^{n} ξ_{λi}^{-1}Ei) ⊂ ∪_{i=1}^{n} {ω : ξ_{λi}(ω) ≠ ξ*_{λi}(ω)}).

10.12 A bag contains one black ball and m white balls. A ball is drawn at random. If it is black it is returned to the bag. If it is white, it and an additional white ball are returned to the bag. Let An denote the event that the black ball is not drawn in the first n trials. Discuss the (converse to the) Borel–Cantelli Lemma with reference to the events An.

10.13 Let (Ω, F, P) be the “unit interval probability space” of Ex. 10.7. Define r.v.’s ξn by

ξn(ω) = χ_{[0, 1/2 + 1/n)}(ω) + 2χ_{[1/2 + 1/n, 1]}(ω).

Find the tail σ-field of {ξn} and comment on the zero-one law.

10.14 Let ξ be a r.v. which is independent of itself. Show that ξ is a constant, with probability one.

10.15 Let {ξn}_{n=1}^{∞} be a sequence of independent random variables on the probability space (Ω, F, P). Prove that the probability of pointwise convergence of

(i) the sequence {ξn(ω)}_{n=1}^{∞}
(ii) the series ∑_{n=1}^{∞} ξn(ω)

is equal to zero or one, and that whenever (i) converges its limit is equal to a constant a.s. (Hint: Show that the set C of all points ω ∈ Ω for which the sequence {ξn(ω)}_{n=1}^{∞} converges is given by

C = ∩_{k=1}^{∞} ∪_{N=1}^{∞} ∩_{n=N}^{∞} ∩_{m=N}^{∞} {ω ∈ Ω : |ξn(ω) – ξm(ω)| ≤ 1/k}.)

10.16 Prove that a sequence of independent identically distributed random variables converges pointwise with zero probability, except when all the random variables are equal to a constant a.s. (Hint: Use the result and the hint of the previous problem.)

11

Convergence and related topics

11.1 Modes of probabilistic convergence

Various modes of convergence of measurable functions to a limit function were considered in Chapter 6, and will be restated here with the special terminology customarily used in the probabilistic context. In this section the modes of convergence all concern a sequence {ξn} of r.v.’s on the same probability space (Ω, F, P) such that the values ξn(ω) “become close” (in some “local” or “global” sense) to a “limiting r.v.” ξ(ω) as n → ∞. In the next section we shall consider the weaker form of convergence where the ξn’s can be defined on different spaces, and where one is interested in only the limiting form of the distribution of the ξn (i.e. Pξn^{-1}B for Borel sets B). This “convergence in distribution” has wide use in statistical theory and application.

The later sections of the chapter will be concerned with various important relationships between the forms of convergence, convergence of series of independent r.v.’s, and related topics. Note that in certain calculations concerning convergence (especially in Section 11.5) it will be implicitly assumed that the r.v.’s involved are defined for all ω. No comment will be made in these cases, since it is a trivial matter to obtain these results for r.v.’s ξn not defined everywhere by considering ξn* defined for all ω, and equal to ξn a.s.

In this section, then, we shall consider a sequence {ξn} of r.v.’s on the same fixed probability space (Ω, F, P). The following definitions will apply:

Almost sure convergence

Almost sure convergence of a sequence of r.v.’s ξn to a r.v. ξ (ξn → ξ a.s.) is, of course, just a.e. convergence of ξn to ξ with respect to the probability measure P. This is also termed convergence with probability 1. Similarly, to say that {ξn} is Cauchy a.s. means that it is Cauchy a.e. (P), as defined in Chapter 6.


A useful necessary and sufficient condition for a.s. convergence is provided by Theorem 6.2.4, which is restated in the present context:

Theorem 11.1.1 ξn → ξ a.s. if and only if for every ε > 0, writing En(ε) = {ω : |ξn(ω) – ξ(ω)| ≥ ε},

lim_{n→∞} P(∪_{m=n}^{∞} Em(ε)) (= P(lim_{n→∞} En(ε))) = 0.

That is, ξn → ξ a.s. if (except on a zero probability set) the events En(ε) occur only finitely often for each ε > 0, or, equivalently, the probability that |ξm – ξ| ≥ ε for some m ≥ n tends to zero as n → ∞.

The following very simple but sometimes useful sufficient condition for a.s. convergence is immediate from the above criterion.

Theorem 11.1.2 Suppose that, for each ε > 0,

∑_{n=1}^{∞} P{|ξn – ξ| ≥ ε} < ∞.

Then ξn → ξ a.s. as n → ∞.

Proof This is an immediate and obvious application of the Borel–Cantelli Lemma (Theorem 10.5.1). □

A corresponding condition for {ξn} to be a Cauchy sequence a.s. (and hence convergent a.s. to some ξ) will now be obtained.

Theorem 11.1.3 Let {εn}, n = 1, 2, . . ., be positive constants with ∑_{n=1}^{∞} εn < ∞ and suppose that

∑_{n=1}^{∞} P{|ξ_{n+1} – ξn| > εn} < ∞.

Then {ξn} is a Cauchy sequence a.s. (and hence convergent to some r.v. ξ a.s.).

Proof By the Borel–Cantelli Lemma (Theorem 10.5.1) the probability is zero that |ξ_{n+1} – ξn| > εn for infinitely many n. That is, for each ω except on a set of P-measure zero, there is a finite N = N(ω) such that |ξ_{n+1}(ω) – ξn(ω)| ≤ εn when n ≥ N(ω). Given ε > 0 we may (by increasing N if necessary) require that ∑_{n=N}^{∞} εn < ε (N now depends on ε and ω, of course). Thus if n > m ≥ N,

|ξn – ξm| ≤ ∑_{k=m}^{n–1} |ξ_{k+1} – ξk| ≤ ∑_{k=N}^{∞} |ξ_{k+1} – ξk| ≤ ∑_{k=N}^{∞} εk < ε

and hence {ξn(ω)} is a Cauchy sequence, as required. □


Convergence in probability

This is just convergence in measure, with the previous terminology. That is, ξn tends to ξ in probability (ξn →P ξ) if for each ε > 0,

P{ω : |ξn(ω) – ξ(ω)| ≥ ε} → 0 as n → ∞,

i.e. P(En(ε)) → 0 as n → ∞, with the notation of Theorem 11.1.1, or in probabilistic language P{|ξn – ξ| ≥ ε} → 0 for each ε > 0. That is, for each (large) n there is high probability that ξn will be close to ξ – but not necessarily high probability that ξm will be close to ξ simultaneously for all m ≥ n. Thus convergence in probability is a weaker requirement than almost sure convergence. This is made specific by the corollary to Theorem 6.2.2 (or implied by Theorem 11.1.1) which shows that if ξn → ξ a.s., then ξn →P ξ.

It also follows (from the corollary to Theorem 6.2.3) that if ξn converges to ξ in probability, then a subsequence ξ_{nk}, say, of ξn converges to ξ a.s. We state these two results as a theorem:

Theorem 11.1.4 (i) If ξn → ξ a.s., then ξn →P ξ.

(ii) If ξn →P ξ, then there exists a subsequence ξ_{nk} converging to ξ a.s. ({nk} is the same for all ω).
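The gap between (i) and (ii) is illustrated by the classical “sliding indicator” sequence on the unit interval probability space (an assumed standard example, not from the text): the nth r.v. is the indicator of the nth interval in the scheme [0, 1), [0, 1/2), [1/2, 1), [0, 1/3), [1/3, 2/3), . . . . P{ξn = 1} → 0, so ξn →P 0, yet every ω lies in one interval of each block, so ξn(ω) ↛ 0 for every ω.

```python
def interval(n):
    # The nth indicator's interval: block k consists of k intervals of
    # length 1/k partitioning [0, 1).
    k = 1
    while n > k:
        n -= k
        k += 1
    return (n - 1) / k, n / k

omega = 0.37  # any fixed point of [0, 1)
hits = [n for n in range(1, 2000) if interval(n)[0] <= omega < interval(n)[1]]
lengths = [interval(n)[1] - interval(n)[0] for n in (1, 10, 100, 1000)]

print(lengths)    # interval lengths (= P{xi_n = 1}) shrink towards 0
print(len(hits))  # but omega is hit once per block: infinitely often in the limit
```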

The following result will be useful for later applications.

Theorem 11.1.5 (i) ξn →P ξ if and only if each subsequence of {ξn} contains a further subsequence which converges to ξ a.s.

(ii) If ξn →P ξ, and f is a continuous function on R, then f(ξn) →P f(ξ).

(iii) (ii) holds if f is continuous except for x ∈ D where Pξ^{-1}D = 0.

Proof (i) If ξn → ξ in probability, any subsequence also converges to ξ in probability, and, by Theorem 11.1.4 (ii), contains a further subsequence converging to ξ a.s.

Conversely suppose that each subsequence of {ξn} contains a further subsequence converging a.s. to ξ. If ξn does not converge to ξ in probability, there is some ε > 0 with P{|ξn – ξ| ≥ ε} ↛ 0, and hence also some δ > 0 such that P{|ξn – ξ| ≥ ε} > δ infinitely often. That is, for some subsequence {ξ_{nk}}, P{|ξ_{nk} – ξ| ≥ ε} > δ, k = 1, 2, . . . . But this means that no subsequence of {ξ_{nk}} can converge to ξ in probability (and thus certainly not a.s.), so a contradiction results. Hence we must have ξn → ξ in probability as asserted.


(ii) Suppose ξn →P ξ and write ηn = f(ξn), η = f(ξ). Any subsequence {ξ_{nk}} of {ξn} has, by (i), a further subsequence {ξ_{mℓ}}_{ℓ=1}^{∞} converging to ξ a.s. Hence, by continuity, f(ξ_{mℓ}) → f(ξ) a.s. That is, the subsequence {η_{nk}} of {ηn} has a further subsequence converging to η a.s. and hence, again by (i), ηn → η in probability, so that (ii) holds.

For (iii) essentially the same proof applies – noting that f(ξ_{mℓ}) still converges to f(ξ) a.s. since any further points ω where convergence does not occur are contained in the zero probability set ξ^{-1}D. □

Convergence in pth order mean

Again, Lp convergence of measurable functions (p > 0) includes Lp convergence for r.v.’s ξn. Specifically, if ξn, ξ have finite pth moments (i.e. ξn, ξ ∈ Lp(Ω, F, P)) we say that ξn → ξ in pth order mean if ξn → ξ in Lp, i.e. if

E|ξn – ξ|^p = ∫ |ξn – ξ|^p dP → 0 as n → ∞.

The reader should review the properties of Lp-spaces given in Section 6.4, including the inequalities restated in probabilistic terminology in Section 9.5. Especially recall that Lp is a linear space for all p > 0 (if ξ, η ∈ Lp then aξ + bη ∈ Lp for any real a, b), and that Lp is complete. Many of the useful results apply whether 0 < p < 1 or p ≥ 1 and in particular we shall find the following lemma (which restates part of Theorem 6.4.6 (ii)) to be useful.

Theorem 11.1.6 Let {ξn} (n = 1, 2, . . .), ξ be r.v.’s in Lp for some p > 0 and ξn → ξ in Lp. Then

(i) ξn →P ξ
(ii) E|ξn|^p → E|ξ|^p.

By (i), if ξn → ξ in Lp (p > 0) then ξn →P ξ. This implies also, of course, that a subsequence ξ_{nk} → ξ a.s. (Theorem 11.1.4 (ii)). However, the sequence ξn itself does not necessarily converge a.s. Conversely, nor does a.s. convergence of ξn necessarily imply convergence in any Lp.

There is, however, a converse result when the ξn are dominated by an Lp r.v. In particular the case p = 1 may be regarded as a form of the dominated convergence theorem applicable to finite measure (e.g. probability) spaces, with a.s. convergence replaced by convergence in probability. (We shall also see a more general converse later – Theorem 11.4.2.)

Theorem 11.1.7 Let {ξn}, ξ be r.v.’s such that ξn →P ξ. Suppose η ∈ Lp for some p > 0, and |ξn| ≤ η a.s., n = 1, 2, . . . . Then ξn → ξ in Lp.

Proof Note first that clearly ξn ∈ Lp. Further, since ξn →P ξ, a subsequence ξ_{nk} → ξ a.s. so that |ξ| ≤ |η| a.s. Since η ∈ Lp it follows that ξ ∈ Lp. Now |ξn – ξ| ≤ 2η ∈ Lp and hence

E|ξn – ξ|^p = ∫_{|ξn–ξ|<ε} |ξn – ξ|^p dP + ∫_{|ξn–ξ|≥ε} |ξn – ξ|^p dP ≤ ε^p + 2^p ∫_{(|ξn–ξ|≥ε)} η^p dP.

The last term tends to zero by Theorem 4.5.3 since P{|ξn – ξ| ≥ ε} → 0, so that lim sup_{n→∞} E|ξn – ξ|^p ≤ ε^p. Since ε is arbitrary, lim_{n→∞} E|ξn – ξ|^p = 0 as required. □

11.2 Convergence in distribution

As noted in the previous section, it is of interest to consider another form of convergence – involving just the distributions of a sequence of r.v.’s, and not their values at each ω. That is, given a sequence {ξn} of r.v.’s we inquire whether the distributions P{ξn ∈ B} converge to that of a r.v. ξ, i.e. P{ξ ∈ B}, for sets B ∈ B.

In fact, it is a little too stringent to require this for all B ∈ B. For suppose that ξn has d.f. Fn(x) which is zero for x ≤ –1/n, one for x ≥ 1/n and is linear in (–1/n, 1/n). Clearly one would want to say that the limiting distribution of ξn is the probability measure π with unit mass at zero, i.e. the distribution of the r.v. ξ = 0. But, taking B to be the “singleton set” {0}, we have P{ξn = 0} = 0, which does not converge to P{ξ = 0} = 1.

It is easy to see (at least once one is told!) what should be done to give an appropriate definition. In the above example, the d.f.’s Fn(x) of ξn converge to a limiting d.f. F(x) (zero for x < 0, one for x ≥ 0) at all points x other than the discontinuity point x = 0 of F, at which Fn(0) = 1/2. Equivalently, as we shall see, Pξn^{-1}{(a, b]} → μF{(a, b]} for all a, b with μF{a} = μF{b} = 0. This is conveniently used as the basis for a definition of convergence in distribution. It will also then be true – though we shall neither need nor show this – that Pξn^{-1}(B) → μF(B) for all Borel sets B whose (topological) boundary has μF-measure zero. The definition below will be stated in what appears to be a slightly more general form, concerning a sequence {πn} of probability measures on B. The use of “π” in the present context will be helpful to distinguish probability measures on R from those on Ω.
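The example just described is easy to tabulate directly:

```python
def F_n(x, n):
    # d.f. of xi_n: 0 for x <= -1/n, 1 for x >= 1/n, linear between
    if x <= -1.0 / n:
        return 0.0
    if x >= 1.0 / n:
        return 1.0
    return (x + 1.0 / n) * n / 2.0

vals_neg = [F_n(-0.1, n) for n in (1, 10, 100)]
vals_pos = [F_n(0.1, n) for n in (1, 10, 100)]
vals_zero = [F_n(0.0, n) for n in (1, 10, 100)]
print(vals_neg)   # tends to 0 = F(-0.1)
print(vals_pos)   # tends to 1 = F(0.1)
print(vals_zero)  # stays 1/2: no convergence at the discontinuity x = 0
```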


Of course, each πn may be regarded as the distribution of some r.v. (Section 9.2). We shall speak of weak convergence of the sequence πn since it is this terminology which is used in the most abstract and general setting for the subject, described in a variety of treatises beginning with the classic volume [Billingsley].

Suppose, then, that {πn} is a sequence of probability measures on (R, B). Then we say that πn converges weakly to a probability measure π on B (πn →w π) if πn{(a, b]} → π{(a, b]} for all a, b such that π({a}) = π({b}) = 0 (i.e. each “π-continuity interval” (a, b]). It is readily seen (Ex. 11.10) that open intervals (a, b) or closed intervals [a, b] may replace the semiclosed interval (a, b] in the definition.

Correspondingly if Fn is a d.f. for n = 1, 2, . . ., and F is a d.f., we write Fn →w F if Fn(x) → F(x) for each x at which F is continuous.

It is obvious that if Fn is the d.f. corresponding to πn, and F to π (πn = μ_{Fn}, π = μF), then Fn →w F implies πn →w π. The converse is also quite easy to prove directly (Ex. 11.9) but will follow in the course of the proof of Theorem 11.2.1 below.

If {ξn} is a sequence of r.v.’s with d.f.’s {Fn}, and ξ is a r.v. with d.f. F, we say that ξn converges in distribution to ξ (ξn →d ξ) if Fn →w F (i.e. Pξn^{-1} →w Pξ^{-1}). Note that the ξn do not need to be defined on the same probability space for convergence in distribution.¹ Further, even if they are all defined on the same (Ω, F, P), the fact that ξn →d ξ does not require that the values ξn(ω) approach those of ξ(ω) in any sense as n → ∞. This is in contrast to the other forms of convergence already considered and which (as we shall see) imply convergence in distribution. For example, if {ξn} is any sequence of r.v.’s with the same d.f. F, then ξn converges in distribution to any r.v. ξ with the d.f. F. This emphasizes that convergence in distribution is concerned only with limits of probabilities P{ξn ∈ B} as n becomes large. Relationships with other forms of convergence will be addressed in the next section.

¹ Strictly we should write Pn since the ξn may be defined on different spaces (Ωn, Fn, Pn), but it is conventional to omit the n and unlikely to cause confusion.

The following result is a central criterion for weak convergence, indeed leading to its definition in more abstract settings, in which the result is sometimes termed the “Portmanteau Theorem” (e.g. [Billingsley]).

Theorem 11.2.1 Let {πn : n = 1, 2, . . .}, π be probability measures on (R, B), with corresponding d.f.’s {Fn : n = 1, 2, . . .}, F. Then the following are equivalent:


(i) Fn →w F
(i′) For each x, lim sup_n Fn(x) ≤ F(x), lim inf_n Fn(x) ≥ F(x – 0)
(ii) πn →w π
(iii) ∫_{-∞}^{∞} g dπn → ∫_{-∞}^{∞} g dπ for every real, bounded continuous function g on R.

Further, weak limits are unique (e.g. if Fn →w F and Fn →w G then F = G).

Proof The uniqueness statement is immediate since, for example, if Fn →w F and Fn →w G then F = G at all continuity points of both F and G, and hence for all points x except in a countable set. From this it is seen at once that F(x + 0) = G(x + 0) for all x, and hence F = G.

It is immediate that (i′) implies (i). On the other hand if (i) holds, for given x choose y > x such that F is continuous at y. Then lim sup Fn(x) ≤ lim Fn(y) = F(y), from which it follows that lim sup Fn(x) ≤ F(x) by letting y ↓ x. That lim inf_n Fn(x) ≥ F(x – 0) follows similarly. Hence (i) and (i′) are equivalent.

To prove the equivalence of (i), (ii), (iii), note first, as already pointed out above, that (i) clearly implies (ii).

Suppose now that (ii) holds. To show (iii), let g be a fixed, real, bounded, continuous function on R, and M = sup_{x∈R} |g(x)| (< ∞). We shall show that lim sup ∫ g dπn ≤ ∫ g dπ. Then replacing g by –g it will follow that lim inf ∫ g dπn = – lim sup ∫ –g dπn ≥ – ∫ –g dπ = ∫ g dπ, to yield the desired result lim ∫ g dπn = ∫ g dπ. It will be slightly more convenient to assume that 0 ≤ g(x) ≤ 1 for all x (which may be done by considering (g + M)/2M instead of g).

Let D be the set of atoms of π (i.e. discontinuities of F). By Lemma 9.2.2, D is at most countable and thus every interval contains points of its complement D^c. Let ε > 0. Since π(R) = 1 there are thus points a, b in D^c such that π{(a, b]} > 1 – ε/2. Hence also, since πn →w π, we must have πn{(a, b]} > 1 – ε/2 for all n ≥ some N1 = N1(ε). Thus for n ≥ N1,

∫_{-∞}^{∞} g dπn = ∫_{(a,b]} g dπn + ∫_{(a,b]^c} g dπn ≤ ∫_{(a,b]} g dπn + ε/2

since g ≤ 1 and πn{(a, b]^c} < ε/2 when n ≥ N1. Hence

lim sup_{n→∞} ∫ g dπn ≤ lim sup_{n→∞} ∫_{(a,b]} g dπn + ε/2.

Now g is uniformly continuous on the finite interval [a, b] and hence there exists δ = δ(ε) such that |g(x) – g(y)| < ε/4 if |x – y| < δ, a ≤ x, y ≤ b. Choose a partition a = x0 < x1 < . . . < xm = b of [a, b] such that xk ∉ D, and xk – x_{k–1} < δ, k = 1, . . . , m. Then if x_{k–1} < x ≤ xk we have

g(x) ≤ g(xk) + ε/4 ≤ g(x) + ε/2

and hence

∫_{(a,b]} g dπn ≤ ∑_{k=1}^{m} (g(xk) + ε/4) πn{(x_{k–1}, xk]}.

Letting n → ∞ (with the partition fixed), πn{(x_{k–1}, xk]} → π{(x_{k–1}, xk]}, giving

lim sup_{n→∞} ∫_{(a,b]} g dπn ≤ ∑_{k=1}^{m} (g(xk) + ε/4) π{(x_{k–1}, xk]}
                             ≤ ∫_{(a,b]} (g(x) + ε/2) dπ ≤ ∫_{-∞}^{∞} g dπ + ε/2.

Thus, gathering facts, we have

lim sup_{n→∞} ∫_{-∞}^{∞} g dπn ≤ ∫_{-∞}^{∞} g dπ + ε

from which the desired result follows since ε > 0 is arbitrary. Thus (ii) implies (iii).

Finally we assume that (iii) holds and show that (i′) follows, i.e. lim sup_n Fn(x) ≤ F(x), lim inf_n Fn(x) ≥ F(x – 0), for any fixed point x.

Let ε > 0 and write gε(t) for the bounded continuous function which is unity for t ≤ x, decreases linearly to zero at t = x + ε, and is zero for t > x + ε. Then

Fn(x) = ∫_{(–∞,x]} gε(t) dπn(t) ≤ ∫_{-∞}^{∞} gε dπn → ∫_{-∞}^{∞} gε dπ ≤ F(x + ε).

Hence lim sup_{n→∞} Fn(x) ≤ F(x + ε) for ε > 0, and letting ε → 0 gives lim sup_{n→∞} Fn(x) ≤ F(x).

It may be similarly shown (by writing hε(t) = 1 for t ≤ x – ε, zero for t ≥ x and linear in (x – ε, x)) that lim inf_{n→∞} Fn(x) ≥ F(x – ε) for all ε > 0 and hence lim inf Fn(x) ≥ F(x – 0) as required, so that (iii) implies (i′) and hence (i), completing the proof of the equivalence of (i)–(iii). □

Corollary 1 If πn →w π then (iii) also holds for bounded measurable functions g just assumed to be continuous a.e. (π).

Proof It may be assumed (by subtracting its lower bound) that g is nonnegative. Then a sequence {gm} of continuous functions may be found (cf. Ex. 11.11 for a sketch of their construction) such that 0 ≤ gm(x) ↑ g(x) at each continuity point x of g. Hence, for fixed m,

lim inf_{n→∞} ∫ g dπn ≥ lim inf_{n→∞} ∫ gm dπn = ∫ gm dπ

by (iii) and hence by monotone convergence, letting m → ∞,

lim inf_{n→∞} ∫ g dπn ≥ ∫ g dπ.

The same argument with –g shows that lim inf ∫ –g dπn ≥ ∫ –g dπ, so that lim sup ∫ g dπn ≤ ∫ g dπ and hence (iii) holds for this g as required. □

The above criteria may be translated as conditions for convergence in distribution of a sequence of r.v.’s, as follows.

Corollary 2 If {ξn : n = 1, 2, . . .}, ξ are r.v.’s with d.f.’s {Fn : n = 1, 2, . . .}, F, then the following are equivalent

(i) ξn d→ ξ
(ii) Fn w→ F
(iii) Pξn⁻¹ w→ Pξ⁻¹
(iv) Eg(ξn) → Eg(ξ) for every bounded continuous real function g on R.

If (iv) holds for all such g it also holds if g is just bounded and continuous a.e. (Pξ⁻¹).

Proof These are immediate by identifying Pξn⁻¹, Pξ⁻¹ with πn, π of Theorem 11.2.1, and noting that (iv) here becomes the statement of Corollary 1 of the theorem. □

The final result of this series is a very useful one which shows that an (a.e.) continuous function of a sequence converging in distribution also converges in distribution.

Theorem 11.2.2 (Continuous Mapping Theorem) Let ξn d→ ξ where ξn, ξ have distributions πn, π and let h be a measurable function on R which is continuous a.e. (π). Then h(ξn) d→ h(ξ).

Proof This follows at once from the final statement in (iv) of Corollary 2 on replacing the bounded continuous g by its composition g◦h, which is clearly bounded and continuous a.e. (π), giving

Eg(h(ξn)) = E(g◦h)(ξn) → E(g◦h)(ξ) = Eg(h(ξ)). □
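The mapping theorem lends itself to a quick numerical illustration. The following sketch is not from the text; the choice of πn (uniform on {1/n, . . . , 1}, which converges weakly to the uniform distribution π on [0, 1]), of h(x) = x² and of the bounded continuous g = arctan are ours. The theorem predicts Eg(h(ξn)) → Eg(h(ξ)).

```python
import math

# Illustration (our own example, not the book's): pi_n uniform on
# {1/n, ..., 1} converges weakly to pi, the uniform law on [0, 1].
# With h(x) = x^2 (continuous everywhere, so certainly a.e. (pi)) and
# g = arctan bounded continuous, E g(h(xi_n)) should approach E g(h(xi)).

def E_g_h_discrete(n):
    # expectation of g(h(xi_n)) under the discrete uniform law pi_n
    return sum(math.atan((k / n) ** 2) for k in range(1, n + 1)) / n

def E_g_h_limit(m=200_000):
    # midpoint Riemann-sum approximation of the limit integral over [0, 1]
    return sum(math.atan(((k + 0.5) / m) ** 2) for k in range(m)) / m

gap = abs(E_g_h_discrete(2000) - E_g_h_limit())
```

Here `gap` is of order 1/n, consistent with the expectations converging as n grows.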


Note that this result may be equivalently stated that if πn, π are probability measures on B such that πn w→ π, then πnh⁻¹ w→ πh⁻¹ if h is continuous a.e. (π). More general, useful forms of the mapping theorem are given in [Kallenberg 2, Theorem 3.2.7].

Remark The definition of weak convergence πn w→ π only involved πn(a, b] → π(a, b] for intervals (a, b] with π{a} = π{b} = 0. It may, however, then be shown that πn(B) → π(B) for any Borel set B whose boundary has π-measure zero (so-called “π-continuity sets”). It may also be shown that two useful further necessary and sufficient conditions for weak convergence may be added to those of Theorem 11.2.1, viz.

(iv) lim sup_{n→∞} πn(F) ≤ π(F) for all closed F
(v) lim inf_{n→∞} πn(G) ≥ π(G) for all open G.

These are readily proved (see e.g. the “Portmanteau Theorem” of [Billingsley]) and, of course, suggest extensions of the theory to more abstract (topological) contexts.

We next obtain a useful and well known result, “Helly’s Selection Theorem”, concerning a sequence of d.f.’s. This theorem states that if {Fn} is any sequence of d.f.’s, a subsequence {Fnk} may be selected such that Fnk(x) converges to a nondecreasing function F(x) at all continuity points of the latter. The limit F need not be a d.f., however, as is easily seen from the example where Fn(x) = 0, x < –n, Fn(x) = 1, x > n, and Fn is linear in [–n, n]. (Fn(x) → 1/2 for all x.) A condition which will be seen to be useful in ensuring that such a limit is, in fact, a d.f., is the following.
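The escaping-mass example above is easy to check numerically. The sketch below (an illustration only; the function and test points are ours) evaluates the book's Fn at a few fixed points for large n.

```python
# Numerical check of the text's example: F_n(x) = 0 for x < -n, 1 for
# x > n, linear on [-n, n].  Each F_n is a genuine d.f., yet
# F_n(x) -> 1/2 at every fixed x, and the constant 1/2 is not a d.f.:
# the mass "escapes to infinity", exactly the failure tightness rules out.

def F(n, x):
    if x < -n:
        return 0.0
    if x > n:
        return 1.0
    return (x + n) / (2 * n)   # linear interpolation on [-n, n]

limits = [F(10**6, x) for x in (-3.0, 0.0, 7.5)]
```

For n = 10⁶ the values at x = –3, 0, 7.5 are all within a few parts in a million of 1/2.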

A family H of probability measures (or corresponding d.f.’s) on B is called tight if given ε > 0 there exists A such that π{(–A, A]} > 1 – ε for all π ∈ H (or F(A) – F(–A) > 1 – ε for all d.f.’s F with μF ∈ H). Note that if πn w→ π, it may be readily shown then that the sequence {πn} is tight (Ex. 11.18).

Theorem 11.2.3 (Helly’s Selection Theorem) Let {Fn : n = 1, 2, . . .} be a sequence of d.f.’s. Then there is a subsequence {Fnk : k = 1, 2, . . .} and a nondecreasing, right-continuous function F with 0 ≤ F(x) ≤ 1 for all x ∈ R such that Fnk(x) → F(x) as k → ∞ at all x ∈ R where F is continuous.

If in addition the sequence {Fn} is tight, then F is a d.f. and Fnk w→ F.

Proof We will choose a subsequence Fnk whose values converge at all rational numbers. Let {ri} be an enumeration of the rationals. Since {Fn(r1) : n = 1, 2, . . .} is bounded, it has at least one limit point, and there is a subsequence S1 of {Fn} whose members converge at x = r1.

Similarly there is a subsequence S2 of S1 whose members converge at r2 as well as at r1. Proceeding in this way we obtain sequences S1, S2, . . . which are such that Sn is a subsequence of Sn–1 and the members of Sn converge at x = r1, r2, . . . , rn.

Let S be the (infinite) sequence consisting of the first member of S1, the second of S2, and so on (the “diagonal” sequence). Clearly the members of S ultimately belong to Sn and hence converge at r1, r2, . . . , rn, for any n, i.e. at all rk.

Write S = {Fnk} and G(r) = lim_{k→∞} Fnk(r) for each rational r. Clearly 0 ≤ G(r) ≤ 1 and G(r) ≤ G(s) if r, s are rational (r < s). Now define F by

F(x) = inf{G(r) : r rational, r > x}.

Clearly F is nondecreasing, 0 ≤ F(x) ≤ 1 for all x ∈ R and G(x) ≤ F(x) when x is rational. To see that F is right-continuous, fix x ∈ R. Then for any y ∈ R and rational r with x < y < r,

F(x + 0) ≤ F(y) ≤ G(r)

so that F(x + 0) ≤ G(r) for all rational r > x. Hence

F(x + 0) ≤ inf{G(r) : r rational, r > x} = F(x),

showing that F is right-continuous.

Now let x be a point where F is continuous. Then given ε > 0 there exist rational numbers r, s, r < x < s such that

F(x) – ε < F(r) ≤ F(x) ≤ G(s) ≤ F(s) < F(x) + ε.

Also if r′ is rational, r < r′ < x, F(r) ≤ G(r′) ≤ F(r′) ≤ F(x) so that

F(x) – ε < G(r′) ≤ F(x) ≤ G(s) < F(x) + ε

giving

F(x) – ε < lim_{k→∞} Fnk(r′) ≤ lim_{k→∞} Fnk(s) < F(x) + ε.

But Fnk(r′) ≤ Fnk(x) ≤ Fnk(s) and hence

F(x) – ε < lim inf_{k→∞} Fnk(x) ≤ lim sup_{k→∞} Fnk(x) < F(x) + ε

from which it follows by letting ε → 0 that Fnk(x) → F(x) as required.

The final task is to show that if the sequence {Fn} is tight, then F is a d.f. Fix ε > 0 and let A be such that Fn(A) – Fn(–A) > 1 – ε for all n. Let α ≤ –A, β ≥ A be continuity points of F. Then Fnk(β) – Fnk(α) > 1 – ε for all k, and hence F(β) – F(α) = lim(Fnk(β) – Fnk(α)) ≥ 1 – ε. It follows that F(∞) – F(–∞) ≥ 1 – ε for all ε and hence F(∞) – F(–∞) = 1. Thus F(∞) = 1 + F(–∞) gives F(–∞) = 0 and F(∞) = 1. Thus F is a d.f. and Fnk w→ F. □

An important notion closely related to tightness (in fact identical to tightness in this real line context) is that of relative compactness. Specifically a family H of probability measures on B is called relatively compact if every sequence {πn} of elements of H has a weakly convergent subsequence {πnk} (i.e. πnk w→ π for some probability measure π, not necessarily in H). If H is a sequence this means that every subsequence has a further subsequence which is weakly convergent.

It follows from the previous theorem that a family which is tight is also relatively compact. In fact it is easily seen that the converse is also true (in this real line framework and many other useful topological contexts). This is summarized in the following theorem.

Theorem 11.2.4 (Prohorov’s Theorem) A family H of probability measures on B is relatively compact if and only if it is tight.

Proof In view of the preceding paragraph, we need only now prove that if H is relatively compact it is also tight. If it is not tight, there is some ε > 0 such that π(–a, a] ≤ 1 – ε for some π ∈ H, whatever a is chosen. This means that for any n, there is a member πn of H with πn{(–n, n]} ≤ 1 – ε. But since H is relatively compact a subsequence πnk w→ π, a probability measure, as k → ∞.

Let a, b be any points such that π({a}) = π({b}) = 0. Then for sufficiently large k, (a, b] ⊂ (–nk, nk] and hence π{(a, b]} = lim_{k→∞} πnk{(a, b]} ≤ lim sup_k πnk{(–nk, nk]} ≤ 1 – ε. But this contradicts the fact that we may choose a, b with π({a}) = π({b}) = 0 so that π{(a, b]} > 1 – ε (since π(R) = 1). Thus H is indeed tight. □

It is well known (and easily shown) that if every convergent subsequence of a bounded sequence {an} of real numbers has the same limit a, then an → a (i.e. the whole sequence converges). The next result demonstrates an analogous property for weak convergence.

Theorem 11.2.5 Let {Fn} be a tight sequence of d.f.’s such that every weakly convergent subsequence {Fnk} has the same limiting d.f. F. Then Fn w→ F.


Proof Suppose the result is not true. Then there is a continuity point x of the d.f. F such that Fn(x) ↛ F(x). By the above result stated for real sequences, there must be a subsequence {Fnk} of {Fn} such that Fnk(x) → λ ≠ F(x). By Theorem 11.2.3, a subsequence {Fmk} of {Fnk} converges weakly, and by assumption its limit is F. Thus Fmk(x) → F(x), contradicting the convergence of Fnk(x) to λ ≠ F(x). □

Finally, as indicated earlier, the notion of weak convergence may be generalized to apply to more abstract situations. The most obvious of these replaces R by Rᵏ for which the generalization is immediate. Specifically we say that a sequence {πn} of probability measures on Bᵏ converges weakly to a probability measure π on Bᵏ (πn w→ π) if πn(I) → π(I) for every “continuity rectangle” I; i.e. any rectangle I whose boundary has π-measure zero. In R the boundary of I = (a, b] is just the two points {a, b}. In R² it is the four edges, and in Rᵏ it is the 2k bounding hyperplanes.

As in R we say that a sequence {Fn} of d.f.’s in Rᵏ converges weakly to a d.f. F, Fn w→ F, if Fn(x) → F(x) at all points x = (x1, . . . , xk) at which F is continuous. It may then be shown that Fn w→ F if and only if the corresponding probability measures converge (i.e. πn = μFn w→ π = μF). If Fn is the joint d.f. of r.v.’s (ξn(1), . . . , ξn(k)) (= ξn say) and F is the joint d.f. of (ξ(1), . . . , ξ(k)) = ξ, and Fn w→ F, we say that ξn converges to ξ in distribution (ξn d→ ξ) (i.e. Pξn⁻¹ w→ Pξ⁻¹).

More abstract (topological) spaces than Rᵏ do not necessarily have an order structure to support the notions of distribution functions and of rectangles. However, the notion of bounded continuous functions does exist so that (iii) of Theorem 11.2.1 (∫ g dπn → ∫ g dπ for every bounded continuous function g) can be used as the definition of weak convergence of probability measures πn w→ π. This is needed for consideration of convergence in distribution of a sequence of random elements (e.g. stochastic processes) to a random element ξ in topological spaces more general than R (Pξn⁻¹ w→ Pξ⁻¹) but our primary focus on random variables does not require the generalization here. We refer the interested reader to [Billingsley] for an eminently readable detailed account.

11.3 Relationships between forms of convergence

Returning now to the real line context, it is useful to note some relationships between the various forms of convergence.

Convergence a.s. and convergence in Lp both imply convergence in probability. It is also simply shown by the next result that convergence in probability implies convergence in distribution. (For another proof see Ex. 11.12.)

Theorem 11.3.1 Let {ξn} be a sequence of r.v.’s on the same probability space (Ω, F, P) and suppose that ξn P→ ξ as n → ∞. Then ξn d→ ξ as n → ∞.

Proof Let g be any bounded continuous function on R. By Theorem 11.1.5 (ii) it follows that g(ξn) P→ g(ξ). But |g(ξn)| is bounded by a constant and any constant is in L1, so that g(ξn) → g(ξ) in L1 by Theorem 11.1.7, and hence, in particular, Eg(ξn) → Eg(ξ). Hence (iv) of Corollary 2 to Theorem 11.2.1 shows that ξn d→ ξ. □

Of course, the converse to Theorem 11.3.1 is not true (even though the ξn are defined on the same space). However, if ξn converges in distribution to some constant a, it is easy to show that ξn P→ a (Ex. 11.13).

Convergence in distribution by no means implies a.s. convergence (even for r.v.’s defined on the same (Ω, F, P)). However, the following representation of Skorohod shows that a sequence {ξn} convergent in distribution may for some purposes be replaced by a sequence ξ̃n with the same individual distributions as the ξn, such that ξ̃n converges a.s. This can enable the use of the simpler theory of a.s. convergence in proving results for convergence in distribution.

Theorem 11.3.2 (Skorohod’s Representation) Let {ξn}, ξ be r.v.’s and ξn d→ ξ. Then there exist r.v.’s {ξ̃n}, ξ̃ on the “unit interval probability space” ([0, 1], B([0, 1]), m) (where m is Lebesgue measure) such that

(i) ξ̃n d= ξn for each n, ξ̃ d= ξ, and
(ii) ξ̃n → ξ̃ a.s.

Proof Let ξn, ξ have d.f.’s Fn, F, respectively and let U(u) = u for 0 ≤ u ≤ 1. Then U is a uniform r.v. on [0, 1] and (cf. Section 9.6 and Ex. 9.5) ξ̃n = Fn⁻¹(U), ξ̃ = F⁻¹(U) have d.f.’s Fn, F, i.e. ξ̃n d= ξn, ξ̃ d= ξ so that (i) holds.

Since ξn d→ ξ, Fn w→ F, and hence by Lemma 9.6.2, Fn⁻¹ → F⁻¹ at continuity points of F⁻¹. Thus

1 ≥ m{u ∈ [0, 1] : ξ̃n(u) → ξ̃(u)}
  = m{u ∈ [0, 1] : Fn⁻¹(u) → F⁻¹(u)}   (ξ̃n(u) = Fn⁻¹(U(u)) = Fn⁻¹(u))
  ≥ m{u ∈ [0, 1] : F⁻¹ is continuous at u} = 1,

since the discontinuities of F⁻¹ are countable. Hence ξ̃n(u) → ξ̃(u) for a.e. u, giving (ii). □

Note that while the r.v.’s ξn may be defined on different probability spaces, their “representatives” ξ̃n are defined on the same probability space (as they must be if a.s. convergent).
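The quantile construction in the proof can be made concrete. The sketch below is an illustration with distributions of our own choosing (uniform laws on [0, 1 + 1/n]), not the book's notation: the representative ξ̃n(u) = Fn⁻¹(u) lives on the unit interval, has exactly the d.f. Fn, and converges for every u to ξ̃(u) = u.

```python
# Sketch of Skorohod's representation (assumed example: xi_n uniform on
# [0, 1 + 1/n], so F_n^{-1}(u) = u * (1 + 1/n) on the unit-interval
# probability space).  Each representative has d.f. F_n, and the
# representatives converge pointwise, hence a.s., to the identity u.

def quantile(n, u):
    # inverse d.f. of the uniform distribution on [0, 1 + 1/n]
    return u * (1.0 + 1.0 / n)

grid = [k / 100 for k in range(101)]
sup_gap = max(abs(quantile(10**6, u) - u) for u in grid)
```

For n = 10⁶ the gap over the whole grid is at most 10⁻⁶, showing the representatives settling on the limit on one common space.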

Finally, note that weak convergence, μn w→ μ, has been defined for probability measures but the same definition applies to measures μn and μ just assumed to be finite on B, i.e. μn(R) < ∞, μ(R) < ∞. Of course, μn(R) and μ(R) need not be unity but if μn w→ μ it follows in particular that μn(R) → μ(R).

Suppose now that μn, μ are Lebesgue–Stieltjes measures, i.e. measures on B which are finite on bounded sets but possibly having infinite total measure (or equivalently are defined by finite-valued, nondecreasing but not necessarily bounded functions F). Then the previous definition of weak convergence could still be used but the important criterion (iii) of Theorem 11.2.1 does not apply sensibly since e.g. the bounded continuous function g(x) = 1 may not be integrable. This is the case for Lebesgue measure itself, of course. However, an appropriate extended notion of convergence may be given in this case.

Specifically if {μn}, μ are such measures on B (finite on bounded sets), we say that μn converges vaguely to μ (μn v→ μ) if

∫ f dμn → ∫ f dμ

for every continuous function f with compact support, i.e. such that f(x) = 0 if |x| > a for some constant a. Clearly ∫ f dμn and ∫ f dμ are defined and finite for such functions.

The notion of vague convergence applies in particular if μn and μ are finite measures and is clearly then implied by weak convergence. The following easily proved result (Ex. 11.20) summarizes the relationship between weak and vague convergence in this case when both apply.

Theorem 11.3.3 Let μn, μ be finite measures on B (i.e. μn(R) < ∞, μ(R) < ∞). Then, as n → ∞, μn w→ μ if and only if μn v→ μ and μn(R) → μ(R).

As for weak convergence, the notion of vague convergence can be extended to apply in more general topological spaces than the real line. Discussion of these forms of convergence and their relationships may be found in the volumes [Kallenberg] and [Kallenberg 2].


11.4 Uniform integrability

We turn now to the relation between Lp convergence and convergence in probability. Lp convergence implies convergence in probability (Theorem 11.1.6). We have seen that the converse is true provided each term of the sequence is dominated by a fixed Lp r.v. (Theorem 11.1.7). A weaker condition turns out to be necessary and sufficient, and since it is important for other purposes, we investigate this now.

Specifically, a family {ξλ : λ ∈ Λ} of (L1) r.v.’s is said to be uniformly integrable if

sup_{λ∈Λ} ∫_{|ξλ(ω)|>a} |ξλ(ω)| dP(ω) → 0 as a → ∞

or equivalently if sup_{λ∈Λ} ∫_{|x|>a} |x| dFλ(x) → 0 as a → ∞, where Fλ is the d.f. of ξλ. From this latter form it is evident that (like convergence in distribution (Section 11.2)) uniform integrability does not require the r.v.’s to be defined on the same probability space. Of course, we always have ∫_{|ξλ|>a} |ξλ| dP → 0 (∫_{|x|>a} |x| dFλ(x) → 0) for each λ as a → ∞ (dominated convergence). The extra requirement is that these should be uniform in λ ∈ Λ. It is clear that identically distributed (L1) r.v.’s are uniformly integrable since ∫_{|x|>a} |x| dF(x) → 0 where F is the common d.f. of the family. It is also immediate that finite families of (L1) r.v.’s are uniformly integrable, and that an arbitrary family {ξλ} defined on the same probability space and each dominated (in absolute value) by an integrable r.v. ξ, is uniformly integrable. For then |ξλ|χ_{|ξλ|≥a} ≤ |ξ|χ_{|ξ|≥a} and hence ∫_{|ξλ|≥a} |ξλ| dP ≤ ∫_{|ξ|≥a} |ξ| dP.
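A standard counterexample makes the "uniform in λ" requirement vivid. The sketch below is our own illustration (assumption: Lebesgue measure on [0, 1] as the probability space): ξn = n on [0, 1/n] and 0 elsewhere. Each E|ξn| = 1, and each tail integral tends to 0 for fixed n, yet the supremum over n does not.

```python
# Illustration (not from the text): on ([0,1], Lebesgue), let
# xi_n = n * indicator([0, 1/n]).  Then E|xi_n| = 1 for all n, but for
# any truncation level a the tail integral int_{|xi_n|>a} |xi_n| dP
# equals n * (1/n) = 1 whenever n > a.  The sup over n therefore stays
# at 1 as a -> infinity: the family is NOT uniformly integrable.

def tail_integral(n, a):
    # exact value of the tail integral for this family
    return 1.0 if n > a else 0.0

def sup_tail(a, n_max=10**4):
    return max(tail_integral(n, a) for n in range(1, n_max + 1))

sups = [sup_tail(a) for a in (10, 100, 1000)]
```

Note that boundedness of E|ξλ| alone is thus not enough for uniform integrability, which is exactly what condition (i) of the next theorem adds.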

The concept of uniform integrability is closely related to what is called “uniform absolute continuity”. If ξ ∈ L1, we know that (the measure) ∫_E |ξ| dP is absolutely continuous with respect to P. Recall (Theorem 4.5.3) that then, given ε > 0 there exists δ > 0 such that ∫_E |ξ| dP < ε if P(E) < δ.

If {ξλ : λ ∈ Λ} is a family of (L1) r.v.’s, each indefinite integral ∫_E |ξλ| dP is absolutely continuous. If for each ε, one δ may be found for all ξλ (i.e. if ∫_E |ξλ| dP < ε for all λ when P(E) < δ) then the family of indefinite integrals {∫_E |ξλ| dP : λ ∈ Λ} is called uniformly absolutely continuous.

Theorem 11.4.1 A family of L1 r.v.’s {ξλ : λ ∈ Λ} is uniformly integrable if and only if:

(i) the indefinite integrals ∫_E |ξλ| dP are uniformly absolutely continuous, and
(ii) the expectations E|ξλ| are bounded; i.e. E|ξλ| < M for some M < ∞ and all λ ∈ Λ.

Proof Suppose the family is uniformly integrable. To see that (i) holds, note that for any E ∈ F, λ ∈ Λ,

∫_E |ξλ| dP = ∫_{E∩{|ξλ|≤a}} |ξλ| dP + ∫_{E∩{|ξλ|>a}} |ξλ| dP ≤ aP(E) + ∫_{|ξλ|>a} |ξλ| dP.

Given ε > 0 we may choose a so that the last term does not exceed ε/2, for all λ ∈ Λ, by uniform integrability. For P(E) < δ = ε/2a we thus have ∫_E |ξλ| dP < ε for all λ ∈ Λ, so that (i) follows.

(ii) is even simpler. For we may choose a such that ∫_{|ξλ|>a} |ξλ| dP < 1 for all λ ∈ Λ and hence E|ξλ| ≤ 1 + ∫_{|ξλ|≤a} |ξλ| dP ≤ 1 + a which is a suitable upper bound.

Conversely, suppose that (i) and (ii) hold and write sup_{λ∈Λ} E|ξλ| = M < ∞. Then by the Markov Inequality (Theorem 9.5.3 (Corollary)), for all λ ∈ Λ and all a > 0,

P{|ξλ| > a} ≤ E|ξλ|/a ≤ M/a.

Given ε > 0, choose δ = δ(ε) so that ∫_E |ξλ| dP < ε for all λ ∈ Λ when P(E) < δ. For a > M/δ we have P{|ξλ| > a} < δ and thus ∫_{|ξλ|>a} |ξλ| dP < ε for all λ ∈ Λ. But this is just a statement of the required uniform integrability. □

The following result shows in detail how Lp convergence and convergence in probability are related, and in particular generalizes the (probabilistic form of) dominated convergence (Theorem 11.1.7), replacing domination by uniform integrability.

Theorem 11.4.2 If ξn ∈ Lp (0 < p < ∞) for all n = 1, 2, . . . , and ξn P→ ξ, then the following are equivalent

(i) {|ξn|^p : n = 1, 2, . . .} is a uniformly integrable family
(ii) ξ ∈ Lp and ξn → ξ in Lp as n → ∞
(iii) ξ ∈ Lp and E|ξn|^p → E|ξ|^p as n → ∞.

Proof We show first that (i) implies (ii).

Since ξn P→ ξ, a subsequence ξnk → ξ a.s. Hence, by Fatou’s Lemma and (ii) of the previous theorem,

E|ξ|^p ≤ lim inf_{k→∞} E|ξnk|^p ≤ sup_{n≥1} E|ξn|^p < ∞

so that ξ ∈ Lp. Further

E|ξn – ξ|^p = ∫_{|ξn–ξ|^p≤ε} |ξn – ξ|^p dP + ∫_{|ξn–ξ|^p>ε} |ξn – ξ|^p dP
            ≤ ε + 2^p ∫_{En} |ξn|^p dP + 2^p ∫_{En} |ξ|^p dP

where En = {ω : |ξn – ξ| > ε^{1/p}} (hence P(En) → 0) and use has been made of the inequality |a + b|^p ≤ 2^p(|a|^p + |b|^p) (cf. proof of Theorem 6.4.1).

Uniform integrability of |ξn|^p implies the uniform absolute continuity of ∫_E |ξn|^p dP (Theorem 11.4.1). Thus ∫_E |ξn|^p dP < ε when P(E) < δ (= δ(ε)), for all n, and hence there is some N1 (making P(En) < δ for n ≥ N1) such that ∫_{En} |ξn|^p dP < ε when n ≥ N1. Correspondingly for n ≥ some N2 we have ∫_{En} |ξ|^p dP < ε, and hence for n ≥ max(N1, N2), E|ξn – ξ|^p < ε + 2^p ε + 2^p ε, showing that ξn → ξ in Lp.

Thus (i) implies (ii). That (ii) implies (iii) follows at once from Theorem 11.1.6.

The proof will be completed by showing that (iii) implies (i). Let A be any fixed nonnegative real number such that P{|ξ| = A} = 0, and define the function h(x) = |x|^p for |x| < A, h(x) = 0 otherwise. Now since ξn → ξ in probability and h is continuous except at ±A (but P{ξ = ±A} = 0), it follows from Theorem 11.1.5 (iii) that h(ξn) → h(ξ) in probability. Since h(ξn) ≤ A^p ∈ L1 it follows from Theorem 11.1.7 that h(ξn) → h(ξ) in L1. Thus Eh(ξn) → Eh(ξ), and hence by (iii),

E|ξn|^p – Eh(ξn) → E|ξ|^p – Eh(ξ)

or

∫_{|ξn|>A} |ξn|^p dP → ∫_{|ξ|>A} |ξ|^p dP.

Now if ε > 0 we may choose A = A(ε) such that this limit is less than ε (and P{|ξ| = A} = 0), so that there exists N = N(ε) such that

∫_{|ξn|>A} |ξn|^p dP < ε

for all n ≥ N. Since as noted above the finite family {|ξn|^p : n = 1, 2, . . . , N – 1} is uniformly integrable, we have sup_{1≤n≤N–1} ∫_{|ξn|≥a} |ξn|^p dP → 0 as a → ∞, and hence there exists A′ = A′(ε) such that

max_{1≤n≤N–1} ∫_{|ξn|^p>A′} |ξn|^p dP < ε.

Now taking A′′ = A′′(ε) = max(A, A′), we have ∫_{|ξn|>A′′} |ξn|^p dP < ε for all n, and hence, finally, sup_n ∫_{|ξn|^p>a} |ξn|^p dP < ε whenever a > (A′′(ε))^p, demonstrating the desired uniform integrability. □


Note that (iii) states that ∫ g dπn → ∫ g dπ where πn, π are the distributions of ξn and ξ, and g is the function g(x) = |x|^p. This result would have followed under weak convergence of πn to π only (i.e. ξn d→ ξ) if g were bounded (by Theorem 11.2.1). It is thus the fact that |x|^p is not bounded that makes the extra conditions necessary.

Finally, also note that while we are used to sufficient (e.g. “domination type”) conditions for (ii), the fact that (i) is actually necessary for (ii) indicates the appropriateness of uniform integrability as the correct condition to consider for sufficiency when ξn P→ ξ.

11.5 Series of independent r.v.’s

It follows (Ex. 10.15) from the zero-one law of Chapter 10 that if {ξn} are independent r.v.’s then

P{ω : ∑_{n=1}^∞ ξn(ω) converges} = 0 or 1.

In this section necessary and sufficient conditions will be obtained for this probability to be unity, i.e. for ∑_1^∞ ξn to converge a.s. First, two inequalities are needed.

Theorem 11.5.1 (Kolmogorov Inequalities) Let ξ1, ξ2, . . . , ξn be independent r.v.’s with zero means and (possibly different) finite second moments Eξi² = σi². Write Sk = ∑_{j=1}^k ξj. Then, for every a > 0

(i) P{max_{1≤k≤n} |Sk| ≥ a} ≤ ∑_{i=1}^n σi²/a².
(ii) If in addition the r.v.’s ξi are bounded, |ξi| ≤ c a.s., i = 1, 2, . . . , n, then P{max_{1≤k≤n} |Sk| < a} ≤ (c + a)²/∑_{i=1}^n σi².

Proof First we prove (i), so do not assume the ξi bounded. Write

E = {ω : max_{1≤k≤n} |Sk(ω)| ≥ a}
E1 = {ω : |S1(ω)| ≥ a}
Ek = {ω : |Sk(ω)| ≥ a} ∩ ∩_{i=1}^{k–1} {ω : |Si(ω)| < a}, k > 1.

It is readily checked that χ_{Ek} and χ_{Ek}Sk are Borel functions of ξ1, . . . , ξk. By Theorems 10.3.2 (Corollary) and 10.3.5 it follows that if i > k,

E(χ_{Ek}Skξi) = E(χ_{Ek}Sk) Eξi = 0,   E(χ_{Ek}ξi²) = Eχ_{Ek} Eξi²

and for j > i > k

E(χ_{Ek}ξiξj) = Eχ_{Ek} Eξi Eξj = 0.


Hence since

Sn² = (Sk + ∑_{i=k+1}^n ξi)² = Sk² + 2Sk ∑_{i=k+1}^n ξi + ∑_{i=k+1}^n ξi² + 2 ∑_{n≥j>i>k} ξiξj

it follows that

E(χ_{Ek}Sn²) = E(χ_{Ek}Sk²) + P(Ek) ∑_{i=k+1}^n σi²,   (11.1)

so that

E(χ_{Ek}Sn²) ≥ E(χ_{Ek}Sk²) ≥ a²P(Ek)

since χ_{Ek}Sk² ≥ a²χ_{Ek} by definition of Ek. Thus since E = ∪_{k=1}^n Ek, and the sets Ek are disjoint, χ_E = ∑_{k=1}^n χ_{Ek} and

a²P(E) = a² ∑_{k=1}^n P(Ek) ≤ ∑_{k=1}^n E(χ_{Ek}Sn²) = E(Sn²χ_E) ≤ ESn² = ∑_{i=1}^n σi²

by independence of the ξi. Thus P(E) ≤ ∑_{i=1}^n σi²/a², which is the desired result, (i).

To prove (ii) assume now that |ξi| ≤ c a.s. for each i, and note that the equality (11.1) still holds, so that

E(χ_{Ek}Sn²) ≤ E(χ_{Ek}Sk²) + P(Ek) ∑_{i=1}^n σi² ≤ (a + c)²P(Ek) + P(Ek) ∑_{i=1}^n σi²

since |Sk| ≤ |Sk–1| + |ξk| ≤ a + c on Ek. Summing over k from 1 to n we have

E(χ_E Sn²) ≤ (a + c)²P(E) + P(E) ∑_{i=1}^n σi²

and thus (noting that |Sn| ≤ a on E^c)

∑_{i=1}^n σi² = ESn² = E(χ_E Sn²) + E(χ_{E^c} Sn²)
             ≤ (a + c)²P(E) + P(E) ∑_{i=1}^n σi² + a²P(E^c)
             ≤ (a + c)² + P(E) ∑_{i=1}^n σi².

Rearranging gives

P(E^c) ≤ (a + c)²/∑_{i=1}^n σi²

or

P{max_{1≤k≤n} |Sk| < a} ≤ (a + c)²/∑_{i=1}^n σi²

which is the desired result. □

Note that the inequality (i) is a generalization of the Chebychev Inequality (which it becomes when n = 1). Note also that the same inequality holds for P{max_{1≤k≤n} |Sk| ≤ a} in (ii) as for P{max_{1≤k≤n} |Sk| < a}. (For we may replace a in (ii) by a + ε and let ε ↓ 0.)
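Inequality (i) can be sanity-checked by Monte Carlo. The sketch below is our own illustration (random signs as the ξi, with n, a and the trial count chosen by us): for ξi = ±1 equiprobable, σi² = 1 and the bound reads P{max_{k≤n} |Sk| ≥ a} ≤ n/a².

```python
import random

# Monte Carlo check (illustration only) of Kolmogorov's inequality (i)
# for a simple random walk: xi_i = +/-1, so sigma_i^2 = 1 and the bound
# is P{ max_{k<=n} |S_k| >= a } <= n / a^2.

random.seed(0)

def max_partial_sum_exceeds(n, a):
    s, m = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        m = max(m, abs(s))
    return m >= a

n, a, trials = 100, 25, 2000
freq = sum(max_partial_sum_exceeds(n, a) for _ in range(trials)) / trials
bound = n / a**2   # here 0.16
```

The empirical frequency comes out far below the bound, as expected: the inequality is crude but, unlike Chebychev applied to Sn alone, it controls the whole path of partial sums.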

The next lemma will be useful in obtaining our main theorems concerning a.s. convergence of series of r.v.’s.

Lemma 11.5.2 Let {ξn} be a sequence of r.v.’s and write Sn = ∑_{i=1}^n ξi. Then ∑_1^∞ ξn converges a.s. if and only if

lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > ε} → 0 as n → ∞

for each ε > 0. (Note that the k-limit exists by monotonicity.)

Proof Since ∑_1^∞ ξn converges if and only if the sequence {Sn} is Cauchy, it is readily seen that

{ω : ∑_1^∞ ξn converges} = ∩_{m=1}^∞ ∪_{n=1}^∞ {ω : |Si – Sj| ≤ 1/m for all i, j ≥ n}
                         = ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ {ω : max_{n≤i,j≤k} |Si – Sj| ≤ 1/m}.

Now if Emnk^c denotes the set in braces, i.e. Emnk = {ω : max_{n≤i,j≤k} |Si – Sj| > 1/m}, it is clear that Emnk is nonincreasing in n (≤ k), and nondecreasing in both k (≥ n) and m so that, writing D for the set where ∑_1^∞ ξn does not converge, we have

P(D) = P{∪_{m=1}^∞ ∩_{n=1}^∞ ∪_{k=n}^∞ Emnk} = lim_{m→∞} lim_{n→∞} lim_{k→∞} P(Emnk).

Since P(Emnk) is nondecreasing in m, P(D) = 0 if and only if lim_{n→∞} lim_{k→∞} P(Emnk) = 0 for each m, which clearly holds if and only if

lim_{k→∞} P{max_{n≤i,j≤k} |Si – Sj| > ε} → 0 as n → ∞

for each ε > 0. But for fixed n, k,

P{max_{n≤i≤k} |Si – Sn| > ε} ≤ P{max_{n≤i,j≤k} |Si – Sj| > ε} ≤ P{max_{n≤i≤k} |Si – Sn| > ε/2}

(since |Si – Sj| ≤ |Si – Sn| + |Sn – Sj|), from which it is easily seen that P(D) = 0 if and only if lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > ε} → 0 as n → ∞ for each ε > 0, as required. □

The next theorem (which will follow at once from the above results), while not as general as the “Three Series Theorem” to be obtained subsequently, nevertheless gives a simple useful condition for a.s. convergence of series of independent r.v.’s when the terms have finite variances.

Theorem 11.5.3 Let {ξn} be a sequence of independent r.v.’s with zero means and finite variances Eξn² = σn². Suppose that ∑_1^∞ σn² < ∞. Then ∑_1^∞ ξn converges a.s.

Proof Writing Sn = ∑_{i=1}^n ξi, and noting that Sr – Sn is (for r > n) the sum of r – n r.v.’s ξi, we have by Theorem 11.5.1

P{max_{n≤r≤k} |Sr – Sn| > ε} ≤ ∑_{i=n+1}^k σi²/ε²

so that

lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > ε} ≤ ∑_{i=n+1}^∞ σi²/ε²

which tends to zero as n → ∞ by virtue of the convergence of ∑_1^∞ σi². Hence the result follows immediately from Lemma 11.5.2. □
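A classical instance of the theorem is the random-signs series ∑ εn/n with independent εn = ±1: the means are zero and ∑ 1/n² < ∞, so the series converges a.s. (in contrast to the harmonic series itself). The following simulation sketch (our own illustration, with the horizon and seed chosen by us) shows the partial sums settling down.

```python
import random

# Illustration (not from the text): xi_n = eps_n / n with independent
# random signs eps_n = +/-1.  E xi_n = 0 and sum of variances
# sum 1/n^2 < infinity, so Theorem 11.5.3 gives a.s. convergence.
# The late part of a partial-sum trajectory should oscillate very little.

random.seed(1)

def partial_sums(N):
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += random.choice((-1.0, 1.0)) / n
        out.append(s)
    return out

traj = partial_sums(200_000)
late_oscillation = max(traj[100_000:]) - min(traj[100_000:])
```

The oscillation of the second half of the trajectory is tiny, reflecting the Cauchy behaviour that the proof extracts from Kolmogorov's inequality; the tail variance ∑_{n>10⁵} 1/n² ≈ 10⁻⁵ sets its scale.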

The next result is the celebrated “Three Series Theorem”, which gives necessary and sufficient conditions for a.s. convergence of series of independent r.v.’s, without assuming existence of any moments of the terms.

Theorem 11.5.4 (Kolmogorov’s Three Series Theorem) Let {ξn : n = 1, 2, . . .} be independent r.v.’s and let c be a positive constant. Write En = {ω : |ξn(ω)| ≤ c} and define ξ′n(ω) as ξn(ω) or c according as ω ∈ En or ω ∈ En^c. Then a necessary and sufficient condition for the convergence (a.s.) of ∑_1^∞ ξn is the convergence of all three of the series

(a) ∑_1^∞ P(En^c)   (b) ∑_1^∞ Eξ′n   (c) ∑_1^∞ σ′n²

σ′n² being the variance of ξ′n.


Proof To see the sufficiency of the conditions note that (a) may be rewritten as ∑ P(ξn ≠ ξ′n), and convergence of this series implies (a.s.), by the Borel–Cantelli Lemma, that ξn(ω) = ξ′n(ω) when n is sufficiently large (how large, depending on ω). Hence ∑ ξn converges a.s. if and only if ∑ ξ′n does.

But by Theorem 11.5.3 applied to ξ′n – Eξ′n (using (c), E(ξ′n – Eξ′n)² = σ′n²) we have that ∑ (ξ′n – Eξ′n) converges a.s. Hence by (b) ∑ ξ′n converges a.s., and, by the discussion above, so does ∑ ξn, as required.

Conversely, suppose that ∑_1^∞ ξn converges a.s. Since this implies that ξn → 0 a.s. we must have ξn = ξ′n a.s. when n is sufficiently large, and hence ∑ P{ξn ≠ ξ′n} < ∞ by Theorem 10.5.2. That is, condition (a) holds, and further ∑ ξ′n converges a.s.

Now let ηn, ζn be r.v.’s with the same distributions as ξ′n and such that {ηn, ζn : n = 1, 2, . . .} are all independent as a family. (Such r.v.’s may be readily constructed using product spaces.) It is easily shown (cf. Ex. 11.30) that ∑ ηn and ∑ ζn both converge a.s. (since ∑ ξ′n does) and hence so does ∑ (ηn – ζn). Writing Sk = ∑_{n=1}^k (ηn – ζn) we have, in particular, that the sequence {|Sk| : k = 1, 2, . . .} is bounded for a.e. ω, i.e. P{sup_{k≥1} |Sk| < ∞} = 1, and hence lim_{a→∞} P{sup_{k≥1} |Sk| < a} = 1 so that P{sup_{k≥1} |Sk| < a} > θ for some θ > 0, a > 0. Thus, for any n, P{max_{1≤k≤n} |Sk| < a} > θ. But Theorem 11.5.1 (ii) applies to the r.v.’s ηk – ζk (with variance 2σ′k², and writing 2c for c), to give (2c + a)²/(2∑_{k=1}^n σ′k²) > P{max_{1≤k≤n} |Sk| < a} > θ for all n. That is, for all n

∑_{k=1}^n σ′k² < (2c + a)²/2θ

which shows that ∑_1^∞ σ′k² converges; i.e. (c) holds.

(b) is now easily checked, since the r.v.’s ξ′n – Eξ′n have zero means, and the sum of their variances (∑ σ′n²) is finite. Hence ∑ (ξ′n – Eξ′n) converges a.s., as does ∑ ξ′n. By choosing some fixed ω where convergence (of both) takes place, we see that ∑ Eξ′n must converge, concluding the proof of the theorem. □

Note that it follows from the theorem that if the series (a), (b), (c) converge for some c > 0, they converge for all c > 0. Note also that the proof of the theorem will apply if ξ′n(ω) is defined to be zero (rather than c) when ω ∈ En^c. This definition of ξ′n can be simpler in practice.

Convergence in probability does not usually imply convergence a.s. Our final task in this section is to show, however, that convergence of a series of independent r.v.’s in probability does imply its convergence a.s.


Theorem 11.5.5 Let {ξn} be a sequence of independent r.v.’s. Then the series ∑_1^∞ ξn converges in probability if and only if it converges a.s.

Proof Certainly convergence a.s. implies convergence in probability. By Lemma 11.5.2 (using 2ε in place of ε) the result will follow if it is shown that for each ε > 0

lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > 2ε} → 0, as n → ∞,

with Sn = ∑_{i=1}^n ξi. Instead of appealing to Kolmogorov’s Inequality (as in the previous theorem), the convergence in probability may be used to obtain this as follows.

If n < r ≤ k and |Sr – Sn| > 2ε, |Sk – Sr| ≤ ε then

|Sk – Sn| = |(Sr – Sn) – (Sr – Sk)| ≥ |Sr – Sn| – |Sr – Sk| > ε

and hence

∪_{r=n+1}^k {ω : max_{n≤j<r} |Sj – Sn| ≤ 2ε, |Sr – Sn| > 2ε, |Sk – Sr| ≤ ε} ⊂ {ω : |Sk – Sn| > ε}.

The sets of the union are disjoint. Also max_{n≤j<r} |Sj – Sn| and |Sr – Sn| depend on ξn+1, . . . , ξr, whereas Sk – Sr depends on ξr+1, . . . , ξk. Hence, using independence of the ξi,

∑_{r=n+1}^k P{max_{n≤j<r} |Sj – Sn| ≤ 2ε, |Sr – Sn| > 2ε} P{|Sk – Sr| ≤ ε} ≤ P{|Sk – Sn| > ε}.

Since ∑_1^∞ ξn converges in probability, {Sn} is a Cauchy sequence in probability, and hence, given η > 0, there is an integer N with P{|Sk – Sn| > ε} < η when k, n ≥ N. Hence also P{|Sk – Sr| ≤ ε} > 1 – η if k ≥ r ≥ N, giving

∑_{r=n+1}^k P{max_{n≤j<r} |Sj – Sn| ≤ 2ε, |Sr – Sn| > 2ε} ≤ η/(1 – η)

if k > n ≥ N. Rephrasing this, we have

P{max_{n≤r≤k} |Sr – Sn| > 2ε} ≤ η/(1 – η)

and hence lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > 2ε} ≤ η/(1 – η) for n ≥ N, giving

lim_{k→∞} P{max_{n≤r≤k} |Sr – Sn| > 2ε} → 0 as n → ∞,

concluding the proof. □


It may even be shown that if a series ∑_1^∞ ξn of independent r.v.’s converges in distribution it converges in probability and hence a.s. Since we shall use characteristic functions to prove it, the explicit statement and proof of this still stronger result is deferred to the next chapter (Theorem 12.5.2).

11.6 Laws of large numbers

The last section concerned convergence of series of independent r.v.’s ∑_1^∞ ξn. For convergence it is necessary in particular that the terms tend to zero, i.e. ξn → 0 a.s. Thus the discussion there certainly does not apply to any (nontrivial) independent sequences for which the terms have the same distributions. It is mainly to such “independent and identically distributed” (i.i.d.) random variables that the present section will apply.

Specifically we shall consider an independent sequence {ξn} with Sn = ∑_{i=1}^n ξi and obtain conditions under which the averages Sn/n converge to a constant either in probability or a.s., as n → ∞. For i.i.d. random variables with a finite mean, the constant will turn out to be μ = Eξi. Results of this type are usually called laws of large numbers, convergence in probability being called a weak law and convergence with probability one a strong law.

Two versions of the strong law will be given – one applying to independent r.v.’s with finite second moments (but not necessarily having the same distributions), and the other applying to i.i.d. r.v.’s with finite first moments. Since convergence a.s. implies convergence in probability, weak laws will follow trivially as corollaries. However, the weak law for i.i.d. r.v.’s may also be easily obtained directly by use of characteristic functions as will be seen in the next chapter.

Lemma 11.6.1 If {yn} is a sequence of real numbers such that ∑_{n=1}^∞ yn/n converges, then (1/n) ∑_{i=1}^n yi → 0 as n → ∞.

Proof Writing sn = ∑_{i=1}^n yi/i (s0 = 0), tn = ∑_1^n yi it is easily checked that tn/n = –(1/n) ∑_{i=1}^{n–1} si + sn. Since (1/n) ∑_{i=1}^n si is well known (or easily shown) to converge to the same limit as sn it follows that tn/n → 0, which is the result required. □
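As a quick numerical sanity check (not part of the text; the sequence is an arbitrary choice for illustration), take yn = (–1)^n √n. Then ∑ yn/n = ∑ (–1)^n/√n converges as an alternating series, so the lemma predicts that the averages (1/n) ∑_{i≤n} yi tend to zero:

```python
# Numerical illustration of Lemma 11.6.1 with y_n = (-1)^n * sqrt(n):
# sum y_n / n = sum (-1)^n / sqrt(n) converges (alternating series),
# so the lemma says (1/n) * sum_{i<=n} y_i -> 0 as n -> infinity.
import math

def running_average(n_terms):
    """Return (1/n) * sum_{i=1}^{n} y_i for y_i = (-1)^i * sqrt(i)."""
    total = 0.0
    for i in range(1, n_terms + 1):
        total += (-1) ** i * math.sqrt(i)
    return total / n_terms

avg_small = abs(running_average(100))
avg_large = abs(running_average(100_000))
# avg_large is much smaller than avg_small: the averages shrink toward 0.
```

Here the partial sums of yn grow like √n, so the averages decay roughly like 1/√n, consistent with the lemma.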

The first form of the strong law of large numbers requires the independent r.v.'s ξn to have finite variances but not necessarily to be identically distributed.


Theorem 11.6.2 (Strong Law, First Form) If ξn are independent r.v.'s with finite means μn and finite variances σn², satisfying ∑_{n=1}^∞ σn²/n² < ∞, then

(1/n) ∑_{i=1}^n (ξi – μi) → 0 a.s.

In particular if (1/n) ∑_{i=1}^n μi → μ (e.g. if μn → μ) then (1/n) ∑_{i=1}^n ξi → μ a.s.

Proof It is sufficient to consider the case where μn = 0 for all n since the general case follows by replacing ξi by (ξi – μi). Assume then that μn = 0 for all n and write ηn(ω) = ξn(ω)/n. Then Eηn = 0 and

∑_{n=1}^∞ var(ηn) = ∑_{n=1}^∞ σn²/n² < ∞.

Thus by Theorem 11.5.3, ∑_{n=1}^∞ ξn/n = ∑_{n=1}^∞ ηn converges a.s. and the desired conclusion follows at once from Lemma 11.6.1. □
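A small simulation (not from the text; the particular distributions are chosen only to satisfy the hypotheses) illustrates the first form of the strong law with non-identically distributed terms. Take ξn = ±n^{1/4} with probability 1/2 each, so μn = 0, σn² = √n, and ∑ σn²/n² = ∑ n^{–3/2} < ∞:

```python
# Illustration of Theorem 11.6.2: xi_n = ±n^{1/4} each with probability 1/2,
# independent but not identically distributed; mu_n = 0 and
# sum sigma_n^2 / n^2 = sum n^{-3/2} < infinity, so (1/n) sum xi_i -> 0 a.s.
import random

random.seed(12345)

n = 200_000
partial_sum = 0.0
for i in range(1, n + 1):
    xi = i ** 0.25 if random.random() < 0.5 else -(i ** 0.25)
    partial_sum += xi
average = partial_sum / n  # should be close to 0 for large n
```

Along the simulated path the average is small even though the individual terms ξn grow without bound.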

The following result also yields the most common form of the strong law, which applies to i.i.d. r.v.'s (but only assumes the existence of first moments).

Theorem 11.6.3 (Strong Law, Second Form) Let {ξn} be independent and identically distributed r.v.'s with (the same) finite mean μ. Then

(1/n) ∑_{i=1}^n ξi → μ a.s. as n → ∞.

Proof Again, if the result holds when μ = 0, replacing ξi by (ξi – μ) shows that it holds when μ ≠ 0. Hence we assume that μ = 0.

Write ηn(ω) = ξn(ω) if |ξn(ω)| ≤ n, ηn(ω) = 0 otherwise (for n = 1, 2, . . .). First it will be shown that (1/n) ∑_1^n (ξi – ηi) → 0 a.s. We have

∑_{n=1}^∞ P(ξn ≠ ηn) = ∑_{n=1}^∞ P(|ξn| > n) = ∑_{n=1}^∞ (1 – F(n))

where F is the (common) d.f. of the |ξn|. But 1 – F(n) ≤ 1 – F(x) for n – 1 < x ≤ n so that

∑_{n=1}^∞ (1 – F(n)) ≤ ∫_0^∞ (1 – F(x)) dx = E|ξ1| < ∞

by e.g. Ex. 9.16, so that ∑_n P(ξn ≠ ηn) < ∞. Hence by the Borel–Cantelli Lemma, for a.e. ω, ξn(ω) = ηn(ω) when n is sufficiently large and hence it follows at once that (1/n) ∑_1^n (ξi – ηi) → 0 a.s.


The proof will be completed by showing that (1/n) ∑_1^n ηi → 0 a.s. Note first that the variance of ηn satisfies

var(ηn) ≤ Eηn² = ∫_{|x|≤n} x² dF(x)

since the |ξi| have d.f. F. Hence

∑_{n=1}^∞ n^{–2} var(ηn) ≤ ∑_{n=1}^∞ n^{–2} ∫_{|x|≤n} x² dF(x)
= ∑_{n=1}^∞ n^{–2} ∑_{k=1}^n ∫_{k–1<|x|≤k} x² dF(x)
= ∑_{k=1}^∞ ∫_{k–1<|x|≤k} x² dF(x) ∑_{n=k}^∞ n^{–2}
≤ ∑_{k=1}^∞ (C/k) ∫_{k–1<|x|≤k} x² dF(x)

where C is a constant such that ∑_{n=k}^∞ 1/n² < C/k for all k = 1, 2, . . . . (It is easily proved that such a C exists – e.g. by dominating the sum by an integral.) Hence

∑_{n=1}^∞ n^{–2} var(ηn) ≤ ∑_{k=1}^∞ C ∫_{k–1<|x|≤k} |x| dF(x) = CE|ξ1| < ∞.

It thus follows from Theorem 11.6.2 (since the ηn are clearly independent) that n^{–1} ∑_{i=1}^n (ηi – Eηi) → 0 a.s. But Eηn = E(ξn χ_{|ξn|≤n}) = Eξn – E(ξn χ_{|ξn|>n}) = –E(ξn χ_{|ξn|>n}) since Eξn = 0. Hence |Eηn| ≤ E(|ξn| χ_{|ξn|>n}) = ∫_n^∞ x dF(x) → 0 as n → ∞ (E|ξn| < ∞). Thus n^{–1} ∑_{i=1}^n Eηi → 0 so that by the above n^{–1} ∑_{i=1}^n ηi → 0 a.s., as required to complete the proof. □
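The second form of the strong law is easy to watch numerically (a simulation sketch, not part of the text; the exponential distribution is an arbitrary choice with finite mean). Averaging i.i.d. exponential(1) variables, which have μ = 1, a single simulated path of Sn/n settles down near 1:

```python
# Illustration of Theorem 11.6.3 (strong law, i.i.d. case): along one path,
# the average of n i.i.d. exponential(1) variables approaches mu = 1.
import random

random.seed(2024)

n = 100_000
s = 0.0
for _ in range(n):
    s += random.expovariate(1.0)  # i.i.d. terms with E xi = 1
sample_mean = s / n  # should be close to 1
```

The deviation of the mean is of order 1/√n here (the variance is finite), but the theorem itself needs only E|ξ1| < ∞.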

Exercises

11.1 Let {ξn}_{n=1}^∞ be a sequence of r.v.'s with Eξn² < ∞ and let

μn = Eξn, σn² = var(ξn).

If μn → μ and ∑_1^∞ σn² < ∞, show that ξn → μ a.s.

11.2 Let {ξn}_{n=1}^∞ be a sequence of random variables on the probability space (Ω, F, P) and {cn}_{n=1}^∞ a sequence of positive numbers. Define the truncation of ξn at cn by ηn = ξn χ_{Anᶜ}, where

An = {ω ∈ Ω : |ξn(ω)| > cn}.

Prove that if ∑_{n=1}^∞ P(An) < ∞ and if ηn → ξ almost surely, then ξn → ξ almost surely.


11.3 Prove that ξn → ξ in probability if and only if

lim_{n→∞} E( |ξn – ξ| / (1 + |ξn – ξ|) ) = 0.

11.4 The result of Ex. 11.3 may be expressed in terms of a "metric" d on the "space" of r.v.'s, provided we regard two r.v.'s which are equal a.s. as being the same in the space. Define d(ξ, η) = E{ |ξ–η| / (1+|ξ–η|) } (d is well defined for any ξ, η). Then d(ξ, η) ≥ 0 with equality only if ξ = η a.s., and d(ξ, η) = d(η, ξ) for all ξ, η. Show that the "triangle inequality" holds, i.e.

d(ξ, ζ) ≤ d(ξ, η) + d(η, ζ)

for any ξ, η, ζ. (Hint: For any a, b it may be shown that |a+b|/(1+|a+b|) ≤ |a|/(1+|a|) + |b|/(1+|b|).)

Ex. 11.3 may then be restated as "ξn → ξ in probability if and only if d(ξn, ξ) → 0, i.e. ξn → ξ in this metric space".

11.5 Show that the statement "If Eξn → 0 then ξn → 0 in probability" is false, though the statement "If ξn ≥ 0 and Eξn → 0 then ξn → 0 in probability" is true.

11.6 Let {ξn} be a sequence of r.v.'s. Show that there exist constants An such that ξn/An → 0 a.s.

11.7 If ξn → ξ a.s. show that given ε > 0 there exists M such that P{sup_{n≥1} |ξn| ≤ M} > 1 – ε.

11.8 Complement the uniqueness statement in Theorem 11.2.1 by showing explicitly that if {πn : n = 1, 2, . . .}, π, π* are probability measures on (R, B) such that πn w→ π, πn w→ π*, then π = π* on B. (Consider the corresponding d.f.'s.)

11.9 Let {Fn} be a sequence of d.f.'s with corresponding probability measures {πn}. Show directly from the definitions that if πn w→ π then Fn w→ F. (Hint: Show that if a, x are continuity points of F then lim inf_{n→∞} Fn(x) ≥ F(x) – F(a), and let a → –∞.)

11.10 Show that in the definition πn(a, b] → π(a, b] for all finite a, b for weak convergence of probability measures πn w→ π, intervals (a, b] or open intervals (a, b) may be equivalently used. For example show that if πn w→ π then πn{b} → π{b} for any b such that π{b} = 0, and that this also holds under the alternative assumptions replacing semiclosed intervals by open or by closed intervals.

11.11 Prove the assertion needed in Corollary 1, Theorem 11.2.1 that if π is a probability measure on B and g is a nonnegative bounded B-measurable function which is continuous a.e. (π) then a sequence {gn} of continuous functions may be found with 0 ≤ gn(x) ↑ g(x) at each continuity point x of g. This may be shown by defining continuous functions h1, h2, . . . such that 0 ≤ hn(x) ≤ g(x) and sup_n hn(x) = g(x), and writing gn(x) = max_{1≤i≤n} hi(x). (Hint: Consider h_{m,r} defined for each integer m and rational r by h_{m,r}(x) = min(r, m inf{|x – y| : g(y) ≤ r}) (inf(∅) = +∞).)

11.12 Let {ξn}_{n=1}^∞, ξ be r.v.'s with d.f.'s {Fn}_{n=1}^∞, F respectively. Assume that ξn P→ ξ. Show that given ε > 0,

Fn(x) ≤ F(x + ε) + P{|ξn – ξ| ≥ ε}
F(x – ε) ≤ Fn(x) + P{|ξn – ξ| ≥ ε}.

Hence show that ξn d→ ξ (by this alternative method to that of Theorem 11.3.1).

11.13 Convergence in distribution does not necessarily imply convergence in probability. However, show that if ξn d→ ξ and ξ(ω) = a, constant almost surely, then ξn → ξ in probability.

11.14 Let {ξn}, ξ be r.v.'s such that ξn d→ ξ.

(i) If each ξn is discrete, can ξ be absolutely continuous?
(ii) If each ξn is absolutely continuous, can ξ be discrete?

11.15 Let {ξn}_{n=1}^∞ and ξ be random variables on (Ω, F, P) such that for each n and k = 0, 1, . . . , n,

P{ξn = k/n} = 1/(n + 1),

and ξ has the uniform distribution on [0, 1]. Prove that ξn d→ ξ.

11.16 Let {ξn}_{n=1}^∞ and ξ be random variables on (Ω, F, P) and let ξn = xn (constant) a.s. for all n = 1, 2, . . . . Prove that ξn d→ ξ if and only if the sequence of real numbers {xn}_{n=1}^∞ converges and ξ = lim_n xn a.s.

11.17 Let the random variables {ξn}_{n=1}^∞ and ξ have densities {fn}_{n=1}^∞ and f respectively with respect to Lebesgue measure m. If fn → f a.e. (m) on the real line R, show that ξn d→ ξ. (Hint: Prove that fn → f in L1(R, B, m) by looking at the positive and negative parts of f – fn.)

11.18 Let {πn}_{n=1}^∞, π be probability measures on B. Show that if πn w→ π then {πn}_{n=1}^∞ is tight.

11.19 Weak convergence of d.f.'s may also be expressed in terms of a metric. If F, G are d.f.'s, the "Lévy distance" d(F, G) is defined by d(F, G) = inf{ε > 0 : G(x – ε) – ε ≤ F(x) ≤ G(x + ε) + ε for all real x}. Show that d is a metric, and Fn w→ F if and only if d(Fn, F) → 0.

11.20 Prove Theorem 11.3.3, i.e. that for finite measures μn, μ on B, μn w→ μ if and only if μn v→ μ and μn(R) → μ(R) as n → ∞.

11.21 Suppose {ξu : u ∈ U}, {ηv : v ∈ V} are each uniformly integrable families. Show that the family {ξu + ηv : u ∈ U, v ∈ V} is uniformly integrable.

11.22 If the random variables {ξn}_{n=1}^∞ are identically distributed with finite means, show that ξn → ξ in probability if and only if ξn → ξ in L1.

11.23 If the random variables {ξn}_{n=1}^∞ are such that sup_n E(|ξn|^p) < ∞ for some p > 1, show that {ξn}_{n=1}^∞ is uniformly integrable. As a consequence, show that if the random variables {ξn}_{n=1}^∞ have uniformly bounded second moments, then ξn → ξ in probability if and only if ξn → ξ in L1.

11.24 Let {ξn} be r.v.'s with E|ξn| < ∞ for each n. Show that the family {ξn : n = 1, 2, . . .} is uniformly integrable if and only if the family {ξn : n ≥ N} is uniformly integrable for some integer N. Indeed this holds if given ε > 0 there exist N = N(ε), A = A(ε) such that ∫_{|ξn|≥a} |ξn| dP < ε for all n ≥ N(ε), a ≥ A(ε). Show that a corresponding statement holds for uniform absolute continuity of the families {∫_E |ξn| dP : n ≥ 1} and {∫_E |ξn| dP : n ≥ N}.

11.25 Let {ξn}_{n=1}^∞ be a sequence of independent random variables such that ξn = ±1 each with probability 1/2 and let {an}_{n=1}^∞ be a sequence of real numbers.

(i) Find a necessary and sufficient condition for the series ∑_{n=1}^∞ an ξn to converge a.s.
(ii) If an = 2^{–n} prove that ∑_{n=1}^∞ an ξn has the uniform distribution over [–1, 1].

11.26 Let {ξn}_{n=1}^∞ be a sequence of independent random variables such that for every n, ξn has the uniform distribution on [–n^{1/3}, n^{1/3}]. Find the probability of convergence of the series ∑_{n=1}^∞ ξn and of the sequence (1/n) ∑_{k=1}^n ξk as n → ∞.

11.27 The random series ∑_{n=1}^∞ ±1/n is formed where the signs are chosen independently and the probability of a positive sign for the nth term is pn. Express the probability of convergence of the series in terms of the sequence {pn}_{n=1}^∞.

11.28 Let {ξn}_{n=1}^∞ be a sequence of independent r.v.'s such that each ξn has the uniform distribution on [an, 2an], an > 0. Show that the series ∑_{n=1}^∞ ξn converges a.s. if and only if ∑_{n=1}^∞ an < ∞. What happens if ∑_{n=1}^∞ an = +∞?

11.29 Let {ξn}_{n=1}^∞ be a sequence of nonnegative random variables such that for each n, ξn has the density λn e^{–λn x} for x ≥ 0, where λn > 0.

(i) If ∑_{n=1}^∞ 1/λn < ∞ show that ∑_{n=1}^∞ ξn < ∞ almost surely.
(ii) If the random variables {ξn}_{n=1}^∞ are independent show that

∑_{n=1}^∞ 1/λn < ∞ if and only if ∑_{n=1}^∞ ξn < ∞ a.s.

and

∑_{n=1}^∞ 1/λn = ∞ if and only if ∑_{n=1}^∞ ξn = ∞ a.s.

11.30 Let {ξn}, {ξ*n} be two sequences of r.v.'s such that, for each n, the joint distribution of (ξ1, . . . , ξn) is the same as that of (ξ*1, . . . , ξ*n). Show that P{∑_1^∞ ξn converges} = P{∑_1^∞ ξ*n converges}. (Hint: If D, D* denote respectively the sets where ∑ ξn, ∑ ξ*n do not converge, use e.g. the expression for P(D) in the proof of Lemma 11.5.2, and the corresponding expression for P(D*) to show that P(D) = P(D*). In particular this result applies if {ξn}, {ξ*n} are each classes of independent r.v.'s and ξn has the same distribution as ξ*n for each n – this is the case used in Theorem 11.5.4.)

11.31 For any sequence of random variables {ξn}_{n=1}^∞ prove that

(i) if ξn → 0 a.s. then (1/n) ∑_{k=1}^n ξk → 0 a.s.
(ii) if ξn → 0 in Lp, p > 1, then (1/n) ∑_{k=1}^n ξk → 0 in Lp and hence also in probability.

11.32 Let {ξn}_{n=1}^∞ be a sequence of independent and identically distributed r.v.'s with

Eξn = μ ≠ 0 and Eξn² = a² < ∞.

Find the a.s. limit of the sequence

(ξ1² + · · · + ξn²)/(ξ1 + · · · + ξn).

11.33 Let {ξn}_{n=1}^∞ be a sequence of independent and identically distributed random variables and Sn = ∑_1^n ξi. If E(|ξ1|) = +∞ prove that

lim sup_{n→∞} |Sn|/n = +∞ a.s.

It then follows from the strong law of large numbers that (1/n) ∑_{k=1}^n ξk converges a.s. if and only if E(|ξ1|) < +∞. (Hint: Use Ex. 9.15 to conclude that for every a > 0 the events {ω ∈ Ω : |ξn(ω)| ≥ an} occur infinitely often with probability one.)

12

Characteristic functions and central limit theorems

12.1 Definition and simple properties

This chapter is concerned with one of the most useful tools in probability theory – the characteristic function of a r.v. (not to be confused with the characteristic function (i.e. indicator) of a set). We shall investigate properties of such functions, and some of their many implications, especially concerning independent r.v.'s and central limit theory. Chapter 8 should be reviewed for the needed properties of integrals of complex-valued functions and basic Fourier Theory.

If ξ is a r.v. on a probability space (Ω, F, P), e^{itξ(ω)} is a complex F-measurable function (Chapter 8) (and therefore will be called a complex r.v.). The integration theory of Section 8.1 applies and Eξ will be used for ∫ ξ dP as for real r.v.'s. Since |e^{itξ}| = 1 it follows that e^{itξ} ∈ L1(Ω, F, P). The function φ(t) = ∫ e^{itξ(ω)} dP(ω) (= Ee^{itξ}) of the real variable t is termed the characteristic function (c.f.) of the r.v. ξ.

By definition, if ξ has d.f. F,

φ(t) = E cos tξ + iE sin tξ
= ∫_{–∞}^∞ cos tx dF(x) + i ∫_{–∞}^∞ sin tx dF(x)
= ∫_{–∞}^∞ e^{itx} dF(x).

Thus φ(t) is simply the Fourier–Stieltjes Transform F*(t) of the d.f. F of ξ (cf. Section 8.2). If F is absolutely continuous, with density f, it is immediate that

φ(t) = ∫_{–∞}^∞ e^{itx} f(x) dx,

showing that φ is the L1 Fourier Transform f†(t) of the p.d.f. f. If F is discrete, with mass pj at xj, j = 1, 2, . . . , then

φ(t) = ∑_{j=1}^∞ pj e^{itxj}.


Some simple properties of a c.f. are summarized in the following theorem.

Theorem 12.1.1 A c.f. φ has the following properties:

(i) φ(0) = 1,
(ii) |φ(t)| ≤ 1, for all t ∈ R,
(iii) φ(–t) = φ̄(t), for all t ∈ R, where the bar denotes the complex conjugate,
(iv) φ is uniformly continuous on R (cf. Theorem 8.2.1).

Proof
(i) φ(0) = E1 = 1.
(ii) |φ(t)| = |Ee^{itξ}| ≤ E|e^{itξ}| = E1 = 1, using Theorem 8.1.1 (iii).
(iii) φ(–t) = Ee^{–itξ} = Ēe^{itξ}̄ = φ̄(t).
(iv) Let t, s ∈ R, t – s = h. Then

|φ(t) – φ(s)| = |E(e^{i(s+h)ξ} – e^{isξ})| = |Ee^{isξ}(e^{ihξ} – 1)| ≤ E|e^{ihξ} – 1|   (|e^{isξ(ω)}| = 1).

Now for all ω such that ξ(ω) is finite, lim_{h→0} |e^{ihξ(ω)} – 1| = 0 and |e^{ihξ(ω)} – 1| ≤ |e^{ihξ(ω)}| + 1 = 2 (which is P-integrable). Thus by dominated convergence, E|e^{ihξ} – 1| → 0 as h → 0. Finally this means that given ε > 0 there exists δ > 0 such that E|e^{ihξ} – 1| < ε if |h| < δ. Thus |φ(t) – φ(s)| < ε for all t, s such that |t – s| < δ, which shows uniform continuity of φ(t) on R. □

The following result is simple but stated here for completeness.

Theorem 12.1.2 If a r.v. ξ has c.f. φ(t), and if a, b are real, then the r.v. η = aξ + b has c.f. e^{ibt}φ(at). In particular the c.f. of –ξ is φ(–t) = φ̄(t).

Proof

Ee^{it(aξ+b)} = e^{itb} Ee^{itaξ} = e^{ibt}φ(at). □

In Theorem 12.1.1 it was shown that φ(0) = 1 and |φ(t)| ≤ 1 for all t if φ is a c.f. We shall see now that if |φ(t)| = 1 for any nonzero t then ξ must be a discrete r.v. of a special kind. We shall say that a r.v. ξ is of lattice type if there are real numbers a, b (b > 0) such that ξ(ω) belongs to the set {a + nb : n = 0, ±1, ±2, . . .} with probability one. The d.f. F of such a r.v. thus has jumps at some or all of these points and is constant between them. The corresponding c.f. is, writing pn = P{ξ = a + nb},

φ(t) = ∑_{–∞}^∞ pn e^{i(a+nb)t} = e^{iat} ∑_{–∞}^∞ pn e^{inbt}.

Hence |φ(t)| = |∑_{–∞}^∞ pn e^{inbt}| is periodic with period 2π/b.


Theorem 12.1.3 Let φ(t) be the c.f. of a r.v. ξ. Then one of the following three cases must hold:

(i) |φ(t)| < 1 for all t ≠ 0,
(ii) |φ(t0)| = 1 for some t0 > 0 and |φ(t)| < 1 for 0 < t < t0,
(iii) φ(t) = e^{iat} for all t, some real a (and hence |φ(t)| = 1 for all t).

In case (ii), ξ is of lattice type, belonging to the set {a + n2π/t0 : n = 0, ±1, . . .} a.s., for some real a. The absolute value of its c.f. is then periodic with period t0.

In case (iii), ξ = a a.s.

Finally if ξ has an absolutely continuous distribution, then (i) holds. This is also the case if ξ is discrete but not constant or of lattice type.

Proof Since |φ(t)| ≤ 1 it follows that either (i) holds or that |φ(t0)| = 1 for some t0 ≠ 0. Suppose the latter is the case. Then φ(t0) = e^{iat0} for some real a. Consider the r.v. η = ξ – a. The c.f. of η is ψ(t) = e^{–iat}φ(t) and ψ(t0) = 1. Hence

1 = Ee^{it0η} = ∫ cos(t0η(ω)) dP(ω)

since the imaginary part must vanish (to give the real value 1). Hence

∫ [1 – cos(t0η(ω))] dP(ω) = 0.

The integrand is nonnegative and thus must vanish a.s. by Theorem 4.4.7. Hence cos(t0η(ω)) = 1 a.s., showing that

t0η(ω) ∈ {2nπ : n = 0, ±1, . . .} a.s.

and thus

ξ(ω) ∈ {a + 2nπ/t0 : n = 0, ±1, . . .} a.s.

Hence ξ is a lattice r.v.

Now since we assume that (i) does not hold, either (ii) holds or else every neighborhood of t = 0 contains such a t0 with |φ(t0)| = 1. In this case a sequence tk → 0 may be found such that ξ(ω) ∈ {ak + n2π/tk, n = 0, ±1, . . .} a.s. (for some real ak), i.e. for each k, ξ belongs to a lattice whose points are 2π/tk apart.

At least one of the values a1 + 2nπ/t1 has positive probability, and if (ii) does not hold, there cannot be more than one. For if there were two, distance d apart, we could choose k so that 2π/tk > d, and obtain a contradiction since the values of ξ must also lie in a lattice whose points are 2π/tk apart. Thus if (ii) does not hold we have ξ = a a.s. where a is that one value of a1 + 2nπ/t1 which has nonzero probability, and thus has probability 1. Hence (iii) holds and |φ(t)| = |e^{iat}| = 1 for all t; indeed φ(t) = e^{iat}. Note that if (ii) or (iii) holds, ξ is discrete. Hence |φ(t)| < 1 for all t ≠ 0 if ξ is absolutely continuous. □

One of the most convenient properties of characteristic functions is the simple means of calculating the c.f. of a sum of independent r.v.'s, as contained in the following result.

Theorem 12.1.4 Let ξ1, ξ2, . . . , ξn be independent r.v.'s with c.f.'s φ1, φ2, . . . , φn respectively. Then the c.f. φ of η = ξ1 + ξ2 + · · · + ξn is simply the product φ(t) = φ1(t)φ2(t) · · · φn(t).

Proof This follows by the analog of Theorem 10.3.5. For the complex r.v.'s e^{itξj}, 1 ≤ j ≤ n, are obviously independent, showing that E∏_1^n e^{itξj} = ∏_1^n Ee^{itξj}. This may also be shown directly from that result by writing e^{itξj} = cos tξj + i sin tξj and using independence of (cos tξj, sin tξj) and (cos tξk, sin tξk) for j ≠ k. □
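The product rule is easy to test numerically (a sketch, not from the text; the two distributions are arbitrary choices with known c.f.'s). With ξ1 uniform on [–1, 1] (c.f. sin t / t, from example (iii) below) and ξ2 = ±1 with probability 1/2 each (c.f. cos t), the empirical c.f. of ξ1 + ξ2 should match the product of the two c.f.'s:

```python
# Monte Carlo check of Theorem 12.1.4: the empirical c.f. of xi1 + xi2
# (xi1 uniform on [-1,1], xi2 = ±1 with probability 1/2, independent)
# is compared with the product (sin t / t) * cos t of the individual c.f.'s.
import cmath
import math
import random

random.seed(7)

n = 200_000
t = 1.3
acc = 0j
for _ in range(n):
    x = random.uniform(-1.0, 1.0) + random.choice((-1.0, 1.0))
    acc += cmath.exp(1j * t * x)
empirical_cf = acc / n
exact_cf = (math.sin(t) / t) * math.cos(t)
error = abs(empirical_cf - exact_cf)  # of order 1/sqrt(n)
```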

We conclude this section with a few examples of c.f.’s.

(i) Degenerate distribution
If ξ = a (constant) a.s. then the c.f. of ξ is φ(t) = e^{ita}.

(ii) Binomial distribution

P(ξ = r) = (n choose r) p^r (1 – p)^{n–r}, r = 0, 1, . . . , n, 0 < p < 1

φ(t) = ∑_{r=0}^n (n choose r) p^r (1 – p)^{n–r} e^{itr} = ∑_{r=0}^n (n choose r) (pe^{it})^r (1 – p)^{n–r}
= (1 – p + pe^{it})^n = (q + pe^{it})^n, where q = 1 – p.

(iii) Uniform distribution on [–a, a].
ξ has p.d.f. 1/(2a), –a ≤ x ≤ a,

φ(t) = (1/2a) ∫_{–a}^a e^{itx} dx = (e^{ita} – e^{–ita})/(2ita) = sin at/(at)   (φ(0) = 1).

(iv) Normal distribution N(μ, σ²)
ξ has p.d.f. (1/(σ(2π)^{1/2})) exp{–(x–μ)²/(2σ²)}

φ(t) = (1/(σ(2π)^{1/2})) ∫_{–∞}^∞ e^{itx} exp{–(x–μ)²/(2σ²)} dx.


This is perhaps most easily evaluated, first for μ = 0, σ = 1, as a contour integral, making the substitution z = x – it to give

(2π)^{–1/2} e^{–t²/2} ∫_C e^{–z²/2} dz

where C is the line I(z) = –t (I denoting "imaginary part"). This may be evaluated along the real axis instead of C (by Cauchy's Theorem) to give e^{–t²/2}. If ξ is N(μ, σ²), η = (ξ – μ)/σ is N(0, 1) and thus has this c.f. e^{–t²/2}. By Theorem 12.1.2, ξ thus has c.f. φ(t) = e^{iμt – σ²t²/2}.
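The closed forms in these examples can be checked directly by quadrature (a sketch, not part of the text; parameter values are arbitrary). For example (iii), the defining integral (1/2a) ∫_{–a}^a e^{itx} dx should agree with sin(at)/(at):

```python
# Numerical check of example (iii): midpoint-rule approximation of
# (1/2a) * integral of e^{itx} over [-a, a] versus the closed form sin(at)/(at).
import cmath
import math

def uniform_cf_numeric(t, a, steps=20_000):
    """Approximate (1/2a) * ∫_{-a}^{a} e^{itx} dx by the midpoint rule."""
    h = 2 * a / steps
    total = 0j
    for k in range(steps):
        x = -a + (k + 0.5) * h
        total += cmath.exp(1j * t * x) * h
    return total / (2 * a)

a, t = 2.0, 1.7
numeric = uniform_cf_numeric(t, a)
exact = math.sin(a * t) / (a * t)
err = abs(numeric - exact)  # midpoint-rule error, of order (step size)^2
```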

12.2 Characteristic function and moments

The c.f. of a r.v. ξ is very useful in determining the moments of ξ (when they exist), and the d.f. or p.d.f. of ξ. It is especially convenient to use the c.f. for either of these purposes when ξ is a sum of independent r.v.'s, ∑_1^n ξi say, for then the c.f. of ξ is simply obtained as the product of those of the ξi's. Both uses of the c.f. and related matters are explored here, first considering the relation between existence of moments of ξ and of derivatives of φ.

Theorem 12.2.1 Let ξ be a r.v. with d.f. F and c.f. φ. If E|ξ|^n < ∞ for some integer n ≥ 1, then φ has a (uniformly) continuous derivative of order n given by

φ^{(n)}(t) = i^n E(ξ^n e^{itξ}) = i^n ∫_{–∞}^∞ x^n e^{itx} dF(x),

and, in particular, Eξ^n = φ^{(n)}(0)/i^n.

Proof For any t, (φ(t + h) – φ(t))/h = ∫ e^{itx}(e^{ihx} – 1)/h dF(x). Since the function (e^{ihx} – 1)/h → ix as h → 0 and |(e^{ihx} – 1)/h| = |∫_0^x e^{ihy} dy| ≤ |x|, dominated convergence shows that lim_{h→0} (φ(t + h) – φ(t))/h = ∫_{–∞}^∞ ix e^{itx} dF(x), i.e. the derivative φ′(t) exists, given by φ′(t) = ∫_{–∞}^∞ ix e^{itx} dF(x). The proof may be completed by induction using the same arguments.

Uniform continuity follows as for φ itself. □
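The moment formula Eξ^n = φ^{(n)}(0)/i^n can be exercised numerically (an illustrative sketch, not from the text) using the exact N(μ, σ²) c.f. φ(t) = e^{iμt – σ²t²/2} from Section 12.1 and finite-difference derivatives at t = 0:

```python
# Theorem 12.2.1 numerically: finite-difference derivatives of the N(mu, sigma^2)
# c.f. at t = 0 recover E xi = phi'(0)/i and E xi^2 = phi''(0)/i^2.
import cmath

mu, sigma = 0.5, 2.0

def phi(t):
    """Exact c.f. of N(mu, sigma^2)."""
    return cmath.exp(1j * mu * t - 0.5 * sigma**2 * t**2)

h = 1e-4
first = (phi(h) - phi(-h)) / (2 * h)               # ~ phi'(0) = i*mu
second = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2  # ~ phi''(0) = -(mu^2 + sigma^2)

m1 = (first / 1j).real        # ~ E xi = mu = 0.5
m2 = (second / (1j**2)).real  # ~ E xi^2 = mu^2 + sigma^2 = 4.25
```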

Corollary If for some integer n ≥ 1, E|ξ|^n < ∞ then, writing mk = Eξ^k,

φ(t) = ∑_{k=0}^n ((it)^k/k!) mk + o(t^n) = ∑_{k=0}^{n–1} ((it)^k/k!) mk + (θt^n/n!) E|ξ|^n

where θ = θt is a complex number with |θt| ≤ 1. (The "o(t^n)" term above is to be taken as t → 0, i.e. o(t^n) is a function ψ(t) such that ψ(t)/t^n → 0 as t → 0.)


Proof The first relation follows at once from the Taylor series expansion

φ(t) = ∑_{k=0}^n (t^k/k!) φ^{(k)}(0) + o(t^n).

The second follows from the alternative Taylor expansion

φ(t) = ∑_{k=0}^{n–1} (t^k/k!) φ^{(k)}(0) + (t^n/n!) φ^{(n)}(αt)   (|α| < 1),

defining θ by

θ E|ξ|^n = φ^{(n)}(αt) = i^n ∫_{–∞}^∞ x^n e^{iαtx} dF(x)

from which it follows that

|θ| E|ξ|^n ≤ ∫_{–∞}^∞ |x|^n dF(x) = E|ξ|^n.

Thus |θ| ≤ 1 if E|ξ|^n > 0, and in the degenerate case where E|ξ|^n = 0, i.e. ξ = 0 a.s., we may clearly take θ = 0. □

The converse to Theorem 12.2.1 holds for derivatives and moments of even order, as shown in the following result (see also Exs. 12.12, 12.13, 12.14).

Theorem 12.2.2 Suppose that, for some integer n ≥ 1, the c.f. φ(t) of the r.v. ξ has 2n finite derivatives at t = 0. Then E|ξ|^{2n} < ∞.

Proof Consider first the second derivative (i.e. n = 1). Since φ′′ exists at t = 0 we have

φ(t) = φ(0) + tφ′(0) + (1/2) t²φ′′(0) + o(t²)
φ(–t) = φ(0) – tφ′(0) + (1/2) t²φ′′(0) + o(t²)

and thus by addition of these two equations,

φ′′(0) = lim_{t→0} (φ(t) – 2φ(0) + φ(–t))/t²
= lim_{t→0} ∫_{–∞}^∞ ((e^{itx} – 2 + e^{–itx})/t²) dF(x)
= –2 lim_{t→0} ∫_{–∞}^∞ ((1 – cos tx)/t²) dF(x)

(F being the d.f. of ξ). But (1 – cos tx)/t² → x²/2 as t → 0 and hence by Fatou's Lemma

–φ′′(0) = 2 lim_{t→0} ∫_{–∞}^∞ ((1 – cos tx)/t²) dF(x) ≥ ∫_{–∞}^∞ x² dF(x).

Since –φ′′(0) is (real and) finite it follows that ∫ x² dF(x) < ∞, i.e. Eξ² < ∞.

The case for n > 1 may be obtained inductively from the n = 1 case as follows. Suppose the result is true for (n – 1) and that φ^{(2n)}(0) exists. Then Eξ^{2n–2} exists by the inductive hypothesis and by Theorem 12.2.1

φ^{(2n–2)}(0) = (–1)^{n–1} ∫_{–∞}^∞ x^{2n–2} dF(x).

If ∫_{–∞}^∞ x^{2n–2} dF(x) = 0, F is the d.f. of the degenerate distribution with all its mass at zero, i.e. ξ = 0 a.s., so that the desired conclusion Eξ^{2n} < ∞ follows trivially. Otherwise write

G(x) = ∫_{–∞}^x u^{2n–2} dF(u) / ∫_{–∞}^∞ u^{2n–2} dF(u).

G is clearly a d.f. and has c.f. (writing λ^{–1} = ∫_{–∞}^∞ u^{2n–2} dF(u))

ψ(t) = ∫ e^{itx} dG(x) = λ ∫_{–∞}^∞ x^{2n–2} e^{itx} dF(x) = λ(–1)^{n–1} φ^{(2n–2)}(t)

(λx^{2n–2} being the Radon–Nikodym derivative dμG/dμF). Since φ^{(2n)}(0) exists so does ψ′′(0) and by the first part of this proof (with n = 1 and ψ for φ)

–ψ′′(0) ≥ ∫_{–∞}^∞ x² dG(x) = λ ∫_{–∞}^∞ x^{2n} dF(x)

(Theorem 5.6.1). Thus ∫ x^{2n} dF(x) is finite as required. □

The corollary to Theorem 12.2.1 provides Taylor expansions of the c.f. φ(t) when n moments exist. The following is an interesting variant of such expansions when an even number of moments exists which sheds light on the nature of the remainder term. It is given here for two moments (which will be useful in the central limit theory to be considered in Section 12.6). The extension to 2n moments is evident.

Lemma 12.2.3 Let ξ be a r.v. with zero mean, finite variance σ², d.f. F, and c.f. φ. Then φ can be written as

φ(t) = 1 – (1/2) σ²t²ψ(t)

where ψ is a characteristic function. Specifically ψ corresponds to the p.d.f.

g(x) = (2/σ²) ∫_x^∞ [1 – F(u)] du, x ≥ 0
     = (2/σ²) ∫_{–∞}^x F(u) du, x < 0.

Proof Clearly g(x) ≥ 0. Further, using Fubini's Theorem

∫_0^∞ g(x) dx = (2/σ²) ∫_0^∞ dx ∫_x^∞ du ∫_{(u,∞)} dF(y)
= (2/σ²) ∫_{(0,∞)} dF(y) ∫_0^y du ∫_0^u dx
= (1/σ²) ∫_{(0,∞)} y² dF(y).

Similarly

∫_{–∞}^0 g(x) dx = (1/σ²) ∫_{(–∞,0]} y² dF(y)

and hence ∫_{–∞}^∞ g(x) dx = 1. Thus g is a p.d.f. Now by the same inversion of integration order as above,

∫_0^∞ g(x)e^{itx} dx = (2/σ²) ∫_{(0,∞)} dF(y) ∫_0^y du ∫_0^u e^{itx} dx
= (2/(itσ²)) ∫_{(0,∞)} dF(y) ∫_0^y (e^{itu} – 1) du
= (2/((it)²σ²)) ∫_{(0,∞)} (e^{ity} – 1 – ity) dF(y).

Similarly

∫_{–∞}^0 g(x)e^{itx} dx = (2/((it)²σ²)) ∫_{(–∞,0]} (e^{ity} – 1 – ity) dF(y)

and hence the c.f. corresponding to g is

ψ(t) = ∫_{–∞}^∞ e^{itx} g(x) dx = (2/(σ²t²)) (1 – φ(t))

since ∫_{–∞}^∞ y dF(y) = Eξ = 0. Thus φ(t) = 1 – (1/2)σ²t²ψ(t), as required. □

Note that the conclusion of this lemma may be written as φ(t) = 1 – (1/2)σ²t² + (1/2)t²σ²(1 – ψ(t)). The final term is o(t²) as t → 0 since ψ(t) → 1, so that the standard representation φ(t) = 1 – (1/2)σ²t² + o(t²) for a c.f. (with zero mean and finite second moments) also follows from this. However, the present result gives a more specific form for the o(t²) term since ψ is known to be a c.f.
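Lemma 12.2.3 can be checked concretely (a sketch, not from the text; the uniform distribution is an arbitrary choice with mean zero). For ξ uniform on [–1, 1] one has σ² = 1/3, φ(t) = sin t / t, and the formula for g works out to g(x) = 1.5(1 – |x|)² on [–1, 1]; its Fourier transform should equal ψ(t) = 2(1 – φ(t))/(σ²t²):

```python
# Check of Lemma 12.2.3 for xi uniform on [-1, 1]: compare the Fourier
# transform of g(x) = 1.5 * (1 - |x|)^2 (computed by quadrature) with
# psi(t) = 2 * (1 - phi(t)) / (sigma^2 t^2), where phi(t) = sin t / t.
import cmath
import math

sigma2 = 1.0 / 3.0

def g(x):
    """Density from the lemma, worked out for the uniform [-1, 1] case."""
    return 1.5 * (1 - abs(x)) ** 2 if abs(x) <= 1 else 0.0

def psi_numeric(t, steps=20_000):
    """Midpoint-rule approximation of ∫ g(x) e^{itx} dx over [-1, 1]."""
    h = 2.0 / steps
    total = 0j
    for k in range(steps):
        x = -1.0 + (k + 0.5) * h
        total += g(x) * cmath.exp(1j * t * x) * h
    return total

t = 2.1
phi_t = math.sin(t) / t
psi_formula = 2 * (1 - phi_t) / (sigma2 * t**2)
gap = abs(psi_numeric(t) - psi_formula)
```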

12.3 Inversion and uniqueness

The c.f. completely characterizes the distribution by specifying the d.f. F precisely. In fact since φ is the Fourier–Stieltjes Transform of F, this may be shown from the inversion formulae of Sections 8.3 and 8.4, which are summarized as follows.

Theorem 12.3.1 Let φ be the c.f. of a r.v. ξ with d.f. F. Then

(i) If F̄(x) = (1/2)(F(x) + F(x – 0)), for any a < b,

F̄(b) – F̄(a) = lim_{T→∞} (1/2π) ∫_{–T}^T ((e^{–ibt} – e^{–iat})/(–it)) φ(t) dt

and for any real a the jump of F at a is

F(a) – F(a – 0) = lim_{T→∞} (1/2T) ∫_{–T}^T e^{–iat} φ(t) dt.

(ii) If φ ∈ L1, then F is absolutely continuous with p.d.f.

f(x) = (1/2π) ∫_{–∞}^∞ e^{–ixt} φ(t) dt a.e.

f is continuous and thus also is the (continuous) derivative of F at each x.

(iii) If F is absolutely continuous with p.d.f. f which is of bounded variation in a neighborhood of some given point x, then

(1/2){f(x + 0) + f(x – 0)} = lim_{T→∞} (1/2π) ∫_{–T}^T e^{–ixt} φ(t) dt.

If φ ∈ L1 this may again be written as (1/2π) ∫_{–∞}^∞ e^{–ixt} φ(t) dt.

Proof (i) follows from Theorem 8.3.1.

(ii) It follows from Theorem 8.3.3 that F(x) = ∫_{–∞}^x f(u) du where f, defined as (1/2π) ∫ e^{–ixt} φ(t) dt, is real, continuous, and in L1. We need to show that f is nonnegative, whence it will follow that f is a p.d.f. for F. But if f were negative for some x it would, by continuity, be negative in a neighborhood of that x and hence F would be decreasing in that interval. Thus f(x) ≥ 0 for all x. Finally since f is continuous it follows at once that F′(x) = (d/dx) ∫_{–∞}^x f(u) du = f(x) for each x.

(iii) just restates Theorem 8.4.2 and its corollary. □
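Part (ii) is directly computable (a numerical sketch, not part of the text). The N(0,1) c.f. φ(t) = e^{–t²/2} is in L1, and truncating the inversion integral to a large finite range recovers the standard normal density:

```python
# Inversion formula of Theorem 12.3.1 (ii) for the N(0,1) c.f. e^{-t^2/2}:
# (1/2pi) * ∫ e^{-ixt} phi(t) dt reproduces the standard normal density at x.
# The integral is truncated to [-T, T]; the tail beyond T = 10 is negligible.
import cmath
import math

def density_via_inversion(x, T=10.0, steps=40_000):
    """Midpoint-rule approximation of (1/2pi) ∫_{-T}^{T} e^{-ixt} e^{-t^2/2} dt."""
    h = 2 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h
        total += cmath.exp(-1j * x * t) * math.exp(-0.5 * t * t) * h
    return total.real / (2 * math.pi)

x = 0.7
recovered = density_via_inversion(x)
exact = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
diff = abs(recovered - exact)
```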

Theorem 12.3.1 shows that there is a one-to-one correspondence between d.f.'s and their c.f.'s and this is now stated separately.

Theorem 12.3.2 (Uniqueness Theorem) The c.f. of a r.v. uniquely determines its d.f., and hence its distribution, and vice versa, i.e. two d.f.'s F1, F2 are identical if and only if their c.f.'s φ1, φ2 are identical.

Proof It is clear that F1 ≡ F2 implies φ1 ≡ φ2. For the converse assume that φ1 ≡ φ2. Then by Theorem 12.3.1 (i), F̄1(b) – F̄1(a) = F̄2(b) – F̄2(a) for all a, b and hence, letting a → –∞, F̄1(b) = F̄2(b) for all b. But, for any d.f. F, lim_{b↓x} F̄(b) = F(x + 0) = F(x) and thus, for all x,

F1(x) = lim_{b↓x} F̄1(b) = lim_{b↓x} F̄2(b) = F2(x)

as required. □

12.4 Continuity theorem for characteristic functions

In this section we shall relate weak convergence of the previous chapter to pointwise convergence of c.f.'s. It will be useful to first prove the following two results.

Lemma 12.4.1 If ξ is a r.v. with d.f. F and c.f. φ, there exists a constant C > 0 such that for all a > 0

P{|ξ| ≥ a} = ∫_{|x|≥a} dF(x) ≤ Ca ∫_0^{a^{–1}} R[1 – φ(t)] dt

(R denoting "real part"). C does not depend on ξ.

Proof

a ∫_0^{a^{–1}} R(1 – φ(t)) dt = a ∫_0^{a^{–1}} {∫_{–∞}^∞ (1 – cos tx) dF(x)} dt
= ∫_{–∞}^∞ {a ∫_0^{a^{–1}} (1 – cos tx) dt} dF(x)   (Fubini)
= ∫_{–∞}^∞ (1 – sin(a^{–1}x)/(a^{–1}x)) dF(x) ≥ ∫_{|a^{–1}x|≥1} (1 – sin(a^{–1}x)/(a^{–1}x)) dF(x)
≥ inf_{|t|≥1} (1 – (sin t)/t) ∫_{|x|≥a} dF(x)

which gives the desired result if C^{–1} = inf_{|t|≥1} (1 – (sin t)/t). (Note that C^{–1} = 1 – sin 1 so that C is approximately 6.3.) □
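The bound is easy to evaluate for a concrete distribution (a sketch, not from the text; the normal case is an arbitrary choice). For ξ ~ N(0,1), R[1 – φ(t)] = 1 – e^{–t²/2}, and the lemma asserts P{|ξ| ≥ a} ≤ Ca ∫_0^{1/a} (1 – e^{–t²/2}) dt with C = 1/(1 – sin 1):

```python
# Lemma 12.4.1 checked for xi ~ N(0,1): the tail probability P{|xi| >= a}
# (computed via erfc) should be below C * a * ∫_0^{1/a} (1 - e^{-t^2/2}) dt,
# with C = 1 / (1 - sin 1), roughly 6.3.
import math

C = 1.0 / (1.0 - math.sin(1.0))

def normal_tail(a):
    """P{|xi| >= a} for a standard normal r.v."""
    return math.erfc(a / math.sqrt(2.0))

def lemma_bound(a, steps=10_000):
    """Midpoint-rule approximation of C * a * ∫_0^{1/a} (1 - e^{-t^2/2}) dt."""
    h = (1.0 / a) / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (1.0 - math.exp(-0.5 * t * t)) * h
    return C * a * total

a = 2.0
lhs = normal_tail(a)       # about 0.0455
bound = lemma_bound(a)     # larger, as the lemma requires
```

The bound is far from sharp (it is a tightness tool, not a tail estimate), but it holds with room to spare.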

The next result uses this one to provide a convenient necessary and sufficient condition for tightness of a sequence of d.f.'s in terms of their c.f.'s.

Theorem 12.4.2 Let {Fn} be a sequence of d.f.'s with c.f.'s {φn}. Then {Fn} is tight if and only if lim sup_{n→∞} R(1 – φn(t)) → 0 as t → 0.

Proof If {Fn} is tight we may, given ε > 0, choose A so that Fn(–A) < ε/8, 1 – Fn(A) < ε/8 for all n and hence

R[1 – φn(t)] = ∫_{–∞}^∞ (1 – cos tx) dFn(x) ≤ ∫_{|x|≤A} (1 – cos tx) dFn(x) + ε/2.

Now if a > 0 and aA < π, 1 – cos tx ≤ 1 – cos aA for |x| ≤ A, |t| ≤ a and thus

R[1 – φn(t)] ≤ (1 – cos aA) + ε/2

when |t| ≤ a. Hence lim sup_{n→∞} R[1 – φn(t)] < ε for |t| ≤ a if a is chosen so that 1 – cos aA < ε/2, giving the desired conclusion.

Conversely suppose that lim sup_{n→∞} R[1 – φn(t)] → 0 as t → 0. By Lemma 12.4.1 there exists C such that for any a > 0,

∫_{|x|≥a} dFn(x) ≤ Ca ∫_0^{a^{–1}} R[1 – φn(t)] dt.

Hence by Fatou's Lemma (Theorem 4.5.4) applied to 2 – R[1 – φn(t)], or by Ex. 4.17,

lim sup_{n→∞} ∫_{|x|≥a} dFn(x) ≤ Ca ∫_0^{a^{–1}} lim sup_{n→∞} R[1 – φn(t)] dt.

But given ε > 0 the integrand on the right tends to zero by assumption and hence may be taken less than ε/C for 0 ≤ t ≤ a^{–1} if a = a(ε) is chosen to be large, and hence lim sup_{n→∞} ∫_{|x|≥a} dFn(x) < ε. Thus there exists N such that ∫_{|x|≥a} dFn(x) < ε for all n ≥ N. Since the finite family F1, F2, . . . , F_{N–1} is tight, ∫_{|x|>a′} dFn(x) < ε for some a′, n = 1, 2, . . . , N – 1 and hence ∫_{|x|>A} dFn(x) < ε for all n if A = max{a, a′}. This exhibits the required tightness of {Fn}. □

The following is the main result of this section (characterizing weak convergence in terms of c.f.'s).

Theorem 12.4.3 (Continuity Theorem for c.f.'s) Let {Fn} be a sequence of d.f.'s with c.f.'s {φn}.

(i) If F is a d.f. with c.f. φ and if Fn w→ F then φn(t) → φ(t) for all t ∈ R.
(ii) Conversely if φ is a complex function such that φn(t) → φ(t) for all t ∈ R and if φ is continuous at t = 0, then φ is the c.f. of a d.f. F and Fn w→ F.

Proof
(i) If Fn w→ F then by Theorem 11.2.1,

∫_{–∞}^∞ cos tx dFn(x) → ∫_{–∞}^∞ cos tx dF(x) and ∫ sin tx dFn(x) → ∫ sin tx dF(x)

and hence ∫_{–∞}^∞ e^{itx} dFn(x) → ∫_{–∞}^∞ e^{itx} dF(x), or φn(t) → φ(t), as required.

(ii) Since φn(t) → φ(t) for all t, we have φ(0) = lim φn(0) = 1 and

lim sup_{n→∞} R[1 – φn(t)] = 1 – R[φ(t)] → 0 as t → 0

since φ is continuous at t = 0. Thus by Theorem 12.4.2, {Fn} is tight.

If now {F_{nk}} is any weakly convergent subsequence of {Fn}, F_{nk} w→ F say where F has c.f. ψ, then, by (i), ψ(t) = lim_{k→∞} φ_{nk}(t) = φ(t). Hence F has c.f. φ. Thus every weakly convergent subsequence has the same weak limit F (determined by the c.f. φ), and the tight sequence {Fn} therefore converges weakly to F by Theorem 11.2.5, concluding the proof. □

Corollary If {ξn} is a sequence of r.v.'s with d.f.'s {Fn} and c.f.'s {φn}, and if ξ is a r.v. with d.f. F and c.f. φ, then ξn d→ ξ (Fn w→ F) if and only if φn(t) → φ(t) for all real t.

This follows at once from the theorem since φ is a c.f. and hence continuous at t = 0.

12.5 Some applications

In this section we give some applications of the continuity theorem for characteristic functions, beginning with a useful condition for a sequence of r.v.’s to converge in distribution to zero. By Theorem 12.4.3, Corollary, this is equivalent to the convergence of their c.f.’s to one on the entire real line. As shown next, it suffices for this special case that the sequence of c.f.’s converges to one in some neighborhood of zero.

Theorem 12.5.1 If {ξn} is a sequence of r.v.’s with c.f.’s {φn}, the following are equivalent:

(i) ξn → 0 in probability,
(ii) ξn d→ 0,
(iii) φn(t) → 1 for all t,
(iv) φn(t) → 1 in some neighborhood of t = 0.

Proof The equivalence of (i) and (ii) is already known from Ex. 11.13. If ξn d→ 0 then by Theorem 12.4.3, φn(t) → 1 for all t, so that (ii) implies (iii). Since (iii) implies (iv) trivially, the proof will be completed by showing that (iv) implies (ii).

Suppose then that for some a > 0, φn(t) → 1 for all t ∈ [–a, a]. Then lim sup_n R(1 – φn(t)) = 0 for |t| ≤ a and thus Theorem 12.4.2 applies trivially to show that the sequence {Fn} is tight (where Fn is the d.f. of ξn). Let {Fnk} be any weakly convergent subsequence of {Fn}, Fnk w→ F, say, where F has

266 Characteristic functions and central limit theorems

c.f. φ. Then φnk(t) → φ(t) for all t by Theorem 12.4.3 and hence φ(t) = 1 for |t| ≤ a. Thus by Theorem 12.1.3, φ(t) = e^{ibt} for all t (some b) and since φ(t) = 1 for |t| < a it follows that b = 0 and φ(t) = 1 for all t, so that F(x) is zero for x < 0 and one for x ≥ 0. This means that any weakly convergent subsequence of the tight sequence {Fn} has the weak limit F and hence by Theorem 11.2.5, Fn w→ F. This, restated, is the desired conclusion (ii), ξn d→ 0. □

Note that it is not true in general that if a sequence {φn} of c.f.’s converges to a c.f. φ in some neighborhood of t = 0 then it converges to φ for all t. It is true, however, as shown in this proof, in the special case where φ ≡ 1. (Cf. Ex. 12.26 also.)

In Theorem 11.5.5 it was shown that convergence of a series of independent r.v.’s in probability implies a.s. convergence. The following result shows that convergence in distribution is even sufficient for a.s. convergence in such a case. It also provides a single necessary and sufficient condition, expressed in terms of c.f.’s, for a.s. convergence of a series of independent r.v.’s, and should thus be compared with Kolmogorov’s Three Series Theorem 11.5.4.

Theorem 12.5.2 Let {ξn} be a sequence of independent r.v.’s with c.f.’s {φn}. Then the following are equivalent:

(i) The series ∑_1^∞ ξn converges a.s.
(ii) ∑_1^∞ ξn converges in probability.
(iii) ∑_1^∞ ξn converges in distribution.
(iv) The products ∏_{k=1}^n φk(t) converge to a nonzero limit as n → ∞, in some neighborhood of the origin.

Proof That (i) and (ii) are equivalent follows from Theorem 11.5.5. Clearly (ii) implies (iii), and (iii) implies (iv). The proof will be completed by showing that (iv) implies (ii).

If (iv) holds, ∏_{k=1}^n φk(t) → φ(t), say, where φ(t) ≠ 0 for t ∈ [–a, a], some a > 0. Let {mk}, {nk} be sequences tending to infinity as k → ∞, with nk > mk. Then

∏_{j=mk}^{nk} φj(t) = ∏_{j=1}^{nk} φj(t) / ∏_{j=1}^{mk–1} φj(t) → 1 as k → ∞ for |t| ≤ a.

By Theorem 12.5.1, ∑_{j=mk}^{nk} ξj → 0 in probability. Since {mk} and {nk} are arbitrary sequences it is clear that ∑_1^n ξj is Cauchy in probability and hence ∑_1^∞ ξj is convergent in probability, concluding the proof of the theorem. □
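As a quick numerical illustration of condition (iv) — with a distribution chosen here for illustration, not taken from the text — let ξk = ±2^{–k} with probability 1/2 each, so φk(t) = cos(t/2^k). The partial products converge to the nonzero limit sin(t)/t near the origin, so the theorem guarantees that ∑ ξk converges a.s.:

```python
import math

def partial_product(t, n):
    """Product of the first n c.f.'s phi_k(t) = cos(t / 2**k) of xi_k = +/- 2**(-k)."""
    p = 1.0
    for k in range(1, n + 1):
        p *= math.cos(t / 2 ** k)
    return p

t = 0.7
approx = partial_product(t, 60)
limit = math.sin(t) / t  # classical identity: prod_{k>=1} cos(t/2^k) = sin(t)/t
```

The limit sin(t)/t is nonzero for |t| < π, which supplies the neighborhood of the origin required by (iv).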


The weak law of large numbers is, of course, an immediate corollary of the strong law (Theorem 11.6.3). However, as noted in Section 11.6, it is useful to also obtain it directly, since the use of c.f.’s gives a very easy proof.

Theorem 12.5.3 Let {ξn} be a sequence of independent r.v.’s with the same d.f. F and finite mean μ. Then

(1/n) ∑_{i=1}^n ξi → μ in probability as n → ∞.

Proof If φ is the c.f. of each ξn, the c.f. of Sn = ∑_1^n ξi is (φ(t))^n and that of Sn/n is ψn(t) = (φ(t/n))^n. But since φ(t) = 1 + iμt + o(t) (Theorem 12.2.1, Corollary) we have, for any fixed t, φ(t/n) = 1 + iμt/n + o(1/n) as n → ∞ and thus

ψn(t) = (1 + iμt/n + o(1/n))^n.

It is well known (and if not should be made so!) that the right hand side converges to e^{iμt} as n → ∞. Since e^{iμt} is the c.f. of the constant r.v. μ it follows that Sn/n d→ μ (by Theorem 12.4.3, Corollary) and by Ex. 11.13, n^{–1}Sn → μ in probability. □
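The convergence (φ(t/n))^n → e^{iμt} used in the proof can be observed numerically. A minimal sketch, taking for illustration (not from the text) the ξn exponential with mean μ = 1, so φ(t) = 1/(1 – it):

```python
import cmath

def cf_exponential(t):
    """C.f. of the exponential distribution with mean 1: 1/(1 - it)."""
    return 1.0 / (1.0 - 1j * t)

mu, t = 1.0, 2.0
target = cmath.exp(1j * mu * t)  # c.f. of the constant r.v. mu
# c.f. of S_n / n is phi(t/n)**n; its distance to e^{i mu t} shrinks like 1/n
errors = [abs(cf_exponential(t / n) ** n - target) for n in (10, 100, 10_000)]
```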

The weak law of large numbers just proved shows that the average (1/n) ∑_1^n ξj of independent and identically distributed (i.i.d.) r.v.’s is likely to lie close to μ = Eξ1 as n becomes large. On the other hand, the simple form of the central limit theorem (CLT) to be given next shows how a limiting distribution may be obtained for (1/n) ∑_1^n ξj (suitably normalized). A more general form of the central limit theorem is given in the next section.

Theorem 12.5.4 (Central Limit Theorem – Elementary Form) Let {ξn} be a sequence of independent r.v.’s with the same distribution and with finite mean μ and variance σ². Then the sequence of normalized r.v.’s

Zn = (1/(σ√n)) ∑_{j=1}^n (ξj – μ) = (√n/σ) ((1/n) ∑_1^n ξj – μ)

converges in distribution to a standard normal r.v. Z (p.d.f. (2π)^{–1/2} e^{–x²/2}).

Proof Write Zn = n^{–1/2} ∑_1^n ηj where ηj = (ξj – μ)/σ are independent with zero means, unit variances and the same d.f. Let φ(t) denote their common c.f., which may (by Theorem 12.2.1, Corollary) be written as

φ(t) = 1 – t²/2 + o(t²).

The c.f. of Zn is, by Theorems 12.1.2, 12.1.4,

ψn(t) = [φ(t n^{–1/2})]^n

which may therefore (for fixed t, as n → ∞) be written, by the corollary to Theorem 12.2.1,

ψn(t) = [1 – t²/(2n) + o(1/n)]^n → e^{–t²/2} as n → ∞.

Since this limit is the c.f. corresponding to the standard normal distribution (Section 12.1), Zn d→ Z by Theorem 12.4.3. □
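The limit ψn(t) = [φ(t n^{–1/2})]^n → e^{–t²/2} is easy to see numerically. A sketch, choosing for illustration ηj uniform on (–√3, √3) (zero mean, unit variance), whose c.f. is sin(√3 t)/(√3 t):

```python
import math

def cf_std_uniform(t):
    """C.f. of the uniform distribution on (-sqrt(3), sqrt(3)): sin(sqrt(3) t)/(sqrt(3) t)."""
    a = math.sqrt(3.0) * t
    return 1.0 if a == 0.0 else math.sin(a) / a

t, n = 1.0, 10_000
psi_n = cf_std_uniform(t / math.sqrt(n)) ** n  # c.f. of Z_n = n^{-1/2} sum eta_j
limit = math.exp(-t * t / 2.0)                 # standard normal c.f. at t
```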

12.6 Array sums, Lindeberg–Feller Central Limit Theorem

As seen in the elementary form of the CLT (Theorem 12.5.4) the partial sums ∑_1^n ξi of i.i.d. r.v.’s with finite second moments have a normal limit when standardized by means and standard deviations, i.e.

(1/(σ√n)) (∑_1^n ξj – nμ) d→ N(0, 1).

A more general form of the result allows the ξi to have different distributions with finite second moments and gives necessary and sufficient conditions for this normal limit. This is the Lindeberg–Feller result.

It is useful to generalize further by considering a triangular array {ξni : 1 ≤ i ≤ kn, n ≥ 1}, independent in i for each n, rather than just a single sequence (but including that case – with kn = n, ξni = ξi), and to consider the limiting distribution of ∑_{i=1}^{kn} ξni. This is an extensively studied area, “Central Limit Theory”, where the types of possible limits for such sums are investigated. For the case of pure sums (ξni = ξi) the limits are the so-called “stable” r.v.’s (if ξ, η are i.i.d. with a stable distribution G, then the linear combination αξ + βη, α > 0, β > 0, has the distribution G(ax + b), some a > 0, b).

For array sums the possible limits are (under natural conditions) the more general “infinitely divisible laws”, corresponding to r.v.’s which may be split up as the sum of n i.i.d. components for any n. Here we look at just the special case of the normal limit for array sums under the so-called Lindeberg conditions, using a proof due to W.L. Smith. The following lemma will be useful in proving the main theorem. When unstated, the range of j in a sum or product is from j = 1 to kn.

Lemma 12.6.1 Let kn → ∞ and let {anj : 1 ≤ j ≤ kn, n = 1, 2, . . .} be complex numbers such that

(i) max_j |anj| → 0 and
(ii) ∑_j |anj| ≤ K for all n, some K > 0.

Then ∏_j (1 – anj) exp(∑_j anj) → 1 as n → ∞.

Proof This is perhaps most simply shown by use of the expansion

log(1 – z) = –z + ψ(z), |ψ(z)| ≤ A|z|²

for complex z, |z| < 1, valid for the “principal branch” of the logarithm. It may alternatively be shown from the version of this for real z, avoiding the multivalued logarithm but requiring more detailed calculation.

Using the above expansion we have, for sufficiently large n,

|log{∏_j (1 – anj) exp(∑_j anj)}| = |∑_j (log(1 – anj) + anj)|
≤ A ∑_j |anj|²
≤ A (max_j |anj|) ∑_j |anj|

which tends to zero by the assumptions, and hence the result ∏_j (1 – anj) exp(∑_j anj) → 1 as required. □
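A minimal numerical check of the lemma, with the concrete choice (ours, for illustration) kn = n and anj = 1/n for j = 1, . . . , n, so that max_j |anj| → 0 and ∑_j |anj| = 1 for all n:

```python
import math

def product_times_exp(n):
    """prod_j (1 - a_nj) * exp(sum_j a_nj) for a_nj = 1/n, j = 1..n."""
    return (1.0 - 1.0 / n) ** n * math.exp(1.0)

vals = [product_times_exp(n) for n in (10, 1000, 10 ** 6)]  # should approach 1
```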

Theorem 12.6.2 (Array Form of Lindeberg–Feller Central Limit Theorem) Let {ξnj, 1 ≤ j ≤ kn, n = 1, 2, . . .} be a triangular array of r.v.’s, independent in j for each n, with d.f.’s Fnj, zero means and finite variances σ²nj such that s²n = ∑_j σ²nj → 1 as n → ∞. Let ξ be a standard normal (N(0, 1)) r.v. Then ∑_j ξnj d→ ξ and max_j σ²nj → 0 if and only if the Lindeberg condition (L) holds, viz.,

∑_j ∫_{|x|>ε} x² dFnj(x) (= ∑_j E[ξ²nj χ_{(|ξnj|>ε)}]) → 0 as n → ∞, for each ε > 0. (L)

Proof Note first that (L) implies that max_j σ²nj → 0 since clearly max_j σ²nj ≤ ε² + ∑_j E{ξ²nj χ_{(|ξnj|>ε)}}. Hence max_j σ²nj → 0 may be assumed as a basic condition in the proof in both directions.

Now let φnj be the c.f. of ξnj and ψnj the corresponding c.f. determined as in Lemma 12.2.3, i.e.

φnj(t) = 1 – (1/2) σ²nj t² ψnj(t).

Then the c.f. of ζn = ∑_j ξnj is

Φn(t) = ∏_j φnj(t) = ∏_j (1 – (1/2) σ²nj t² ψnj(t)).

It is easily checked that the conditions of Lemma 12.6.1 are satisfied with anj = σ²nj t² ψnj(t)/2, so that

Φn(t) exp((t²/2) s²n Ψn(t)) → 1

where Ψn(t) = (1/s²n) ∑_j σ²nj ψnj(t). Since s²n → 1, if Ψn(t) → 1 it follows that Φn(t) → e^{–t²/2}. Conversely if Φn(t) → e^{–t²/2} then clearly exp((t²/2) s²n (Ψn(t) – 1)) → 1 (since sn → 1), so that Ψn(t) → 1. Hence Φn(t) → e^{–t²/2} if and only if Ψn(t) → 1. But Ψn(t) is a convex combination of the c.f.’s ψnj (∑ σ²nj = s²n) and hence is clearly itself a c.f. for each n (see also next section). Thus ζn = ∑_j ξnj (with c.f. Φn) converges in distribution to a standard normal r.v. if and only if Ψn(t) → 1 for each t, or equivalently if and only if the d.f. Gn corresponding to Ψn converges weakly to U(x) = 0 for x < 0 and 1 for x ≥ 0.

Now it follows from Lemma 12.2.3 that Ψn corresponds to the p.d.f. gn (d.f. Gn) where for x > 0

gn(x) = (2/s²n) ∑_j ∫_x^∞ (1 – Fnj(u)) du.

Using the same inversions of integration as in Lemma 12.2.3 (or integration by parts) it follows readily that for any ε > 0

∫_ε^∞ gn(x) dx = (1/s²n) ∑_j ∫_ε^∞ (u – ε)² dFnj(u).

This and the corresponding result for x < 0 (and noting sn → 1) show that Gn w→ U if and only if for each ε > 0

∑_j ∫_{|x|>ε} (|x| – ε)² dFnj(x) → 0 as n → ∞. (L′)

Now (L′) has the same form as (L) with integrand (|x| – ε)² instead of x² in the same range (|x| > ε). But in this range 0 < |x| – ε < |x| so that (|x| – ε)² ≤ x², and hence (L) implies (L′). Conversely if (L′) holds for each ε > 0 it holds with ε/2 instead of ε and hence (reducing the integration range)

∑_j ∫_{|x|>ε} (|x| – ε/2)² dFnj(x) → 0.

But in the range |x| > ε, 1 – ε/(2|x|) > 1/2, so that

(|x| – ε/2)² = x² (1 – ε/(2|x|))² > x²/4

so that (L) holds. Thus (L) and (L′) are equivalent, completing the proof. □
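To see the Lindeberg condition in action, consider the array (chosen here for illustration) ξnj = ξj/√n, 1 ≤ j ≤ n, with ξj i.i.d. uniform on (–√3, √3), so that σ²nj = 1/n and s²n = 1. The Lindeberg sum has a simple closed form and vanishes once ε√n exceeds the bound √3:

```python
import math

def lindeberg_sum(n, eps):
    """Sum_j E[xi_nj^2 ; |xi_nj| > eps] for xi_nj = xi_j / sqrt(n), xi_j i.i.d.
    uniform on (-sqrt(3), sqrt(3)).  By scaling this equals
    E[xi_1^2 ; |xi_1| > eps*sqrt(n)] = (b^3 - c^3)/(3b) for c = eps*sqrt(n) < b."""
    b = math.sqrt(3.0)
    c = eps * math.sqrt(n)
    return 0.0 if c >= b else (b ** 3 - c ** 3) / (3.0 * b)

vals = [lindeberg_sum(n, 0.1) for n in (1, 10, 100, 1000)]  # decreases to 0
```

At ε = 0 the sum is just the variance s²n = 1, giving a quick consistency check.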

Corollary 1 (“Standard” Form of Lindeberg–Feller Theorem) Let {ξn} be independent r.v.’s with d.f.’s {Fn}, zero means, and finite variances {σ²n} with σ²1 > 0. Write s²n = ∑_{j=1}^n σ²j. Then (1/sn) ∑_{j=1}^n ξj d→ ξ, standard normal, and max_{1≤j≤n} σ²j/s²n → 0 if and only if the Lindeberg condition

(1/s²n) ∑_{j=1}^n ∫_{|x|>εsn} x² dFj(x) → 0 as n → ∞, for each ε > 0, (L′′)

holds.

Proof This follows from the theorem by writing ξnj = ξj/sn, 1 ≤ j ≤ n, n = 1, 2, . . . . □

The theorem may also be formulated for r.v.’s with nonzero means in the obvious way:

Corollary 2 If {ξn} are independent r.v.’s with d.f.’s {Fn}, means {μn}, and finite variances {σ²n} with σ²1 > 0, s²n = ∑_{j=1}^n σ²j, and max_j σ²j/s²n → 0, then a necessary and sufficient condition for (1/sn) ∑_{j=1}^n (ξj – μj) to converge in distribution to a standard normal r.v. is the Lindeberg condition

(1/s²n) ∑_{j=1}^n ∫_{|x–μj|>εsn} (x – μj)² dFj(x) → 0 as n → ∞, for each ε > 0. (L′′′)

12.7 Recognizing a c.f. – Bochner’s Theorem

A characteristic function is the Fourier–Stieltjes Transform of a d.f. It is sometimes important to know whether a given complex-valued function is a c.f. or not (i.e. whether it can be written as such a transform), and often this will not be immediately obvious. We shall, below, give necessary and sufficient conditions in terms of “positive definite” functions (Bochner’s Theorem). This is a most useful characterization for theoretical purposes – especially concerning applications to stationary stochastic processes – but it is not so readily used in the practical situation of recognizing whether a given function is a c.f. from its functional form. A simple sufficient criterion which is occasionally very useful in recognizing special types of c.f. is given in Theorem 12.7.4.


First of all it should be noted that c.f.’s may sometimes be recognized by virtue of being certain combinations of known c.f.’s (see also [Chung]). For example, if φj(t), j = 1, . . . , n, are c.f.’s we know that ∏_1^n φj(t) is a c.f. (Theorem 12.1.4). So is any “convex combination” ∑_1^n αjφj(t) (αj ≥ 0, ∑_1^n αj = 1), which corresponds to the “mixed” d.f. ∑_1^n αjFj(x) if φj corresponds to Fj. Indeed, we may have an infinite convex combination – as should be checked. (See also Ex. 12.11.)

Of course, if φ is a c.f. so is e^{ibt}φ(at) for any real a, b (Theorem 12.1.2), and so is φ(–t). But φ(–t) = φ̄(t), the complex conjugate of φ(t), and thus |φ(t)|² = φ(t)φ(–t) is a c.f. also.

In all cases mentioned the reader should determine what r.v.’s the indicated c.f.’s correspond to, where possible. For example, if ξ, η are independent with the same d.f. F (and c.f. φ) it should be checked that the c.f. of ξ – η is |φ(t)|².

Both Bochner’s Theorem and the criterion for recognizing certain c.f.’s will be consequences of the following lemma.

Lemma 12.7.1 Let φ(t) be a continuous complex function on R with φ(0) = 1, |φ(t)| ≤ 1 for all t, and such that for all T

g(λ, T) = (1/2π) ∫_{–T}^T μ(t/T) φ(t) e^{–iλt} dt

is real and nonnegative for each real λ, where μ(t) is 1 – |t| for |t| ≤ 1 and zero for |t| > 1. Then

(i) for each fixed T, g(λ, T) is a p.d.f. with corresponding c.f. φ(t)μ(t/T);
(ii) φ(t) is a c.f.

Proof (ii) will follow at once from (i) by Theorem 12.4.3, since φ(t) = lim_{T→∞} φ(t)μ(t/T) (μ(t/T) → 1 as T → ∞) and φ is continuous at t = 0.

To prove (i) we first show that g(λ, T) is integrable, i.e. ∫_{–∞}^∞ g(λ, T) dλ < ∞, since g is assumed nonnegative. Let M > 0. Then (with ∫ denoting ∫_{–∞}^∞)

∫ g(λ, T) μ(λ/(2M)) dλ = (1/2π) ∫ μ(λ/(2M)) (∫ μ(t/T) φ(t) e^{–iλt} dt) dλ.

By the definition of μ(t), both ranges of integration are really finite and since the integrand is bounded (|φ(t)| ≤ 1) the integration order may be

changed to give

∫ g(λ, T) μ(λ/(2M)) dλ = (1/2π) ∫ μ(t/T) φ(t) (∫ μ(λ/(2M)) e^{–iλt} dλ) dt
= (1/2π) ∫ μ(t/T) φ(t) (∫_{–2M}^{2M} (1 – |λ|/(2M)) e^{–iλt} dλ) dt
= (1/π) ∫ μ(t/T) φ(t) (∫_0^{2M} (1 – λ/(2M)) cos λt dλ) dt

since cos λt is even, and sin λt is odd. Integration by parts then gives

∫ g(λ, T) μ(λ/(2M)) dλ = (M/π) ∫ μ(t/T) φ(t) (sin Mt/(Mt))² dt
≤ (M/π) ∫ (sin Mt/(Mt))² dt (|φ(t)| ≤ 1, μ(t/T) ≤ 1)
= (1/π) ∫ (sin t/t)² dt = 1,

as is well known. Now, letting M → ∞, monotone convergence (μ(λ/(2M)) ↑ 1) gives ∫ g(λ, T) dλ ≤ 1.

Thus g(λ, T) ∈ L1(–∞,∞). To see that its integral is in fact equal to one, note that as defined g(λ, T) is the Fourier Transform ∫ ((1/2π) μ(t/T) φ(t)) e^{–iλt} dt of the L1-function (1/2π) μ(t/T) φ(t) (zero for |t| > T). Since g(λ, T) is itself in L1, inversion (from Theorem 8.3.4 with obvious sign changes) gives

(1/2π) μ(t/T) φ(t) = (1/2π) ∫ e^{iλt} g(λ, T) dλ.

This holds a.e. and hence for all t, since both sides are continuous. In particular t = 0 gives

∫ g(λ, T) dλ = φ(0) = 1

so that g(λ, T) is a p.d.f. with the corresponding c.f. ∫ e^{iλt} g(λ, T) dλ = μ(t/T)φ(t), which completes the proof of (i), and thus of the lemma also. □

Corollary The function ψ(t) = 1 – |t|/T for |t| ≤ T, and zero for |t| > T, is a c.f.

Proof Take φ(t) ≡ 1 in the lemma and note (cf. proof) that

(1/2π) ∫_{–T}^T (1 – |t|/T) e^{–iλt} dt = (T/2π) (sin(Tλ/2)/(Tλ/2))² ≥ 0. □
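The density paired with this triangular c.f. is the Fejér-type kernel g(λ, T) = (T/2π)(sin(Tλ/2)/(Tλ/2))² from the proof. A small numerical sanity check (step size and cutoff below are ad hoc choices) that g is nonnegative and integrates to one:

```python
import math

def g(lam, T):
    """Density with c.f. 1 - |t|/T on [-T, T]: (T/(2 pi)) * (sin(T lam/2)/(T lam/2))^2."""
    x = T * lam / 2.0
    s = 1.0 if x == 0.0 else math.sin(x) / x
    return T / (2.0 * math.pi) * s * s

T, h, L = 2.0, 0.01, 400.0
# midpoint-rule Riemann sum over [-L, L]; the tails beyond L are O(1/L)
total = sum(g(-L + (k + 0.5) * h, T) for k in range(int(2 * L / h))) * h
```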

We shall now obtain Bochner’s Theorem as a consequence of this lemma. For this it will first be necessary to define and state some simple properties of positive definite functions.

A complex function f(t) (t ∈ R) will be called positive definite (or nonnegative definite) if for any integer n = 1, 2, 3, . . . , and real t1, . . . , tn and complex z1, . . . , zn we have

∑_{j,k=1}^n f(tj – tk) zj z̄k ≥ 0 (12.1)

(“≥ 0” is here used as a shorthand for the statement “is real and ≥ 0”). Notice that by a well known result on positive definite quadratic forms, (12.1) implies that the determinant of the matrix {f(tj – tk)}_{j,k=1}^n is nonnegative. The needed simple properties of a positive definite function are given in the following theorem.

Theorem 12.7.2 If f(t) is a positive definite function, then

(i) f(0) ≥ 0,
(ii) f(–t) = f̄(t) (the complex conjugate) for all t,
(iii) |f(t)| ≤ f(0) for all t,
(iv) |f(t + h) – f(t)|² ≤ 4f(0)|f(0) – f(h)| for all t, h,
(v) f(t) is continuous for all t (indeed uniformly continuous) if it is continuous at t = 0.

Proof (i) That f(0) is real and nonnegative follows by taking n = 1, t1 = 0, z1 = 1 in (12.1).

(ii) If n = 2, t1 = 0, t2 = t, z1 = z2 = 1 we obtain 2f(0) + f(t) + f(–t) ≥ 0 from (12.1), and hence f(t) + f(–t) is real (= α, say). If n = 2, t1 = 0, t2 = t, z1 = 1, z2 = i we see that i f(t) – i f(–t) is real and hence f(t) – f(–t) is purely imaginary (= iβ, say). Thus f(t) = (1/2)(α + iβ) and f(–t) = (1/2)(α – iβ), giving f(–t) = f̄(t).

(iii) If t1 – t2 = t, nonnegativity of the determinant of the matrix {f(tj – tk)}_{j,k=1,2} gives f²(0) ≥ f(t)f(–t) = |f(t)|², so that |f(t)| ≤ f(0).

(iv) If n = 3, t1 = 0, t2 = t, t3 = t + h, then

det{f(tj – tk)}_{j,k=1}^3 = det [ f(0) f(–t) f(–t – h) ; f(t) f(0) f(–h) ; f(t + h) f(h) f(0) ] ≥ 0

gives

f³(0) – f(0)|f(t)|² – f(0)|f(t + h)|² – f(0)|f(h)|² + 2R[f(t) f(h) f̄(t + h)] ≥ 0

and thus, with obvious use of (iii),

f(0)|f(t + h) – f(t)|² = f(0)|f(t + h)|² + f(0)|f(t)|² – 2f(0) R[f(t) f̄(t + h)]
≤ f³(0) – f(0)|f(h)|² + 2R[f(t) f̄(t + h){f(h) – f(0)}]
≤ 2f²(0){f(0) – |f(h)|} + 2f²(0)|f(0) – f(h)|
≤ 4f²(0)|f(0) – f(h)|

from which the desired inequality follows (even if f(0) = 0, by (iii)).

(v) is clear from (iv). □
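Definition (12.1) is easy to probe numerically. A sketch evaluating the quadratic form for the standard normal c.f. φ(t) = e^{–t²/2} (a known c.f., hence positive definite by Bochner’s Theorem below) at randomly chosen points:

```python
import cmath
import random

def quadratic_form(f, ts, zs):
    """sum_{j,k} f(t_j - t_k) z_j conj(z_k), as in (12.1)."""
    return sum(f(tj - tk) * zj * zk.conjugate()
               for tj, zj in zip(ts, zs)
               for tk, zk in zip(ts, zs))

phi = lambda t: cmath.exp(-t * t / 2.0)  # standard normal c.f.
random.seed(0)
ts = [random.uniform(-3.0, 3.0) for _ in range(6)]
zs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
q = quadratic_form(phi, ts, zs)  # should be (numerically) real and nonnegative
```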

Theorem 12.7.3 (Bochner’s Theorem) A complex function φ(t) (t ∈ R) is a c.f. if and only if it is continuous, positive definite, and φ(0) = 1. By Theorem 12.7.2 (v), continuity for all t may be replaced by continuity at t = 0.

Proof If φ is a c.f., it is continuous and φ(0) = 1. If t1, . . . , tn are real and z1, . . . , zn complex then (writing φ(t) = ∫ e^{itx} dF(x))

∑_{j,k=1}^n φ(tj – tk) zj z̄k = ∫ (∑_{j,k=1}^n e^{i(tj–tk)x} zj z̄k) dF(x) = ∫ |∑_{j=1}^n zj e^{itjx}|² dF(x) ≥ 0

and hence φ is positive definite.

Conversely suppose that φ is continuous and positive definite with φ(0) = 1. As in Lemma 12.7.1, define g(λ, T) = (1/2π) ∫_{–T}^T (1 – |t|/T) φ(t) e^{–iλt} dt. It is easy to see that g may be written as

g(λ, T) = (1/(2πT)) ∫_0^T ∫_0^T φ(t – u) e^{–iλ(t–u)} dt du

(by splitting the square of integration into two parts above and below the diagonal t = u and putting t – u = s). But this latter integral involves a continuous integrand and may be evaluated as the limit of Riemann sums of the form (using the same dissection {tj} on each axis)

(1/(2πT)) ∑_{j,k=1}^n φ(tj – tk) zj z̄k

with zj = e^{–iλtj}(tj – tj–1). Since φ is positive definite such sums are nonnegative and hence so is g(λ, T).

Since |φ(t)| ≤ φ(0) by Theorem 12.7.2 (iii) and φ(0) = 1, the conditions of Lemma 12.7.1 are satisfied and φ is thus a c.f. □


We turn now to the “practical criterion” referred to above. As will be seen, this criterion provides sufficient conditions for a function to be a c.f. and, while these are useful, they are far indeed from being necessary. Basically the result gives conditions under which a real function φ(t) which is convex on (0,∞) will be a c.f.

Theorem 12.7.4 Let φ(t) be a real, nonnegative, even, continuous function on R such that φ(t) is nonincreasing and convex on t ≥ 0, and such that φ(0) = 1. Then φ is a c.f.

Proof Consider first a convex polygon φ(t) with vertices at 0 < a1 < a2 < . . . < an (and constant for t > an). It is easy to see that φ(t) may be written as

φ(t) = ∑_{k=1}^n λk μ(t/ak) + λ_{n+1}

where μ(t) = 1 – |t| for |t| ≤ 1 and μ(t) = 0 otherwise. (This expression is clearly linear between ak and ak+1, and at aj takes the value φ(aj) = ∑_{k=j+1}^n λk μ(aj/ak) + λ_{n+1}, so that λ_{n+1}, λn, . . . , λ1 may be successively calculated from φ(an), φ(an–1), . . . , φ(a1), φ(0) = 1.)

The polygon edge between aj and aj+1 has the form ∑_{k=j+1}^n λk μ(t/ak) + λ_{n+1} and hence (if continued back) intercepts t = 0 at height ∑_{k=j+1}^{n+1} λk. By convexity these intercepts decrease as j increases, and hence λj = ∑_{k=j}^{n+1} λk – ∑_{k=j+1}^{n+1} λk > 0. Since φ(0) = 1 we also have ∑_{j=1}^{n+1} λj = 1.

Now μ(t/ak) is a c.f. (Lemma 12.7.1, Corollary) for each k, and so also is the constant function 1. φ(t) is thus seen to be a convex combination of c.f.’s and is thus itself a c.f.

If now φ(t) is a function satisfying the conditions of the theorem, it may clearly be expressed as a limit of such convex polygons (e.g. inscribed with vertices at r/2^n, r = 0, 1, . . . , 2^n n). Hence by Theorem 12.4.3, φ is a c.f. □

Applications of this theorem are given in the exercises.
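For instance, φ(t) = e^{–|t|} satisfies all the hypotheses of Theorem 12.7.4 and is in fact the c.f. of the standard Cauchy distribution (Ex. 12.1(d) with λ = 1). A numerical inversion sketch confirming this (the discretization parameters below are ad hoc choices):

```python
import math

def inverted_density(x, h=0.001, L=40.0):
    """Midpoint-rule approximation of (1/2 pi) * integral exp(-|t|) cos(x t) dt."""
    total = sum(math.exp(-abs(t)) * math.cos(x * t)
                for t in (-L + (k + 0.5) * h for k in range(int(2 * L / h))))
    return total * h / (2.0 * math.pi)

x = 1.5
cauchy = 1.0 / (math.pi * (1.0 + x * x))  # standard Cauchy p.d.f. at x
approx = inverted_density(x)
```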

12.8 Joint characteristic functions

It is also useful to consider the joint c.f. of m r.v.’s ξ1, . . . , ξm, defined for real t1, . . . , tm by

φ(t1, . . . , tm) = E e^{i(t1ξ1+···+tmξm)}.

We shall not investigate such functions in any great detail here, but will indicate a few of their more important properties. First it is easily shown that if F is the joint d.f. of ξ1, . . . , ξm, then

φ(t1, . . . , tm) = ∫_{R^m} e^{i(t1x1+···+tmxm)} dF(x1, . . . , xm)

(where “dF”, of course, means dμF = dP(ξ1, . . . , ξm)^{–1} in the notation of Section 9.3). Further, the simplest properties of c.f.’s of a single r.v. clearly generalize easily. For example, it is easily seen that φ(0, . . . , 0) = 1, |φ(t1, . . . , tm)| ≤ 1, and so on. The following obvious but useful property should also be pointed out: the joint c.f. of ξ1, . . . , ξm is uniquely determined by the c.f.’s of all linear combinations a1ξ1 + · · · + amξm, a1, . . . , am ∈ R. Indeed if φ_{a1,...,am}(t) denotes the c.f. of a1ξ1 + · · · + amξm, i.e. E exp{it(a1ξ1 + · · · + amξm)}, it is clear that φ(t1, . . . , tm) = φ_{t1,...,tm}(1).


Generalizations of the inversion, uniqueness and continuity theorems are, of course, of interest. First a useful form of the inversion theorem may be stated as follows (cf. Theorem 12.3.1).

Theorem 12.8.1 Let F and φ be the joint d.f. and c.f. of the r.v.’s ξ1, . . . , ξm. Then if I = (a, b], a = (a1, . . . , am), b = (b1, . . . , bm) (ai ≤ bi, 1 ≤ i ≤ m) is any continuity rectangle (Section 10.2) for F,

μF(I) = lim_{T→∞} (1/(2π)^m) ∫_{–T}^T · · · ∫_{–T}^T ∏_{j=1}^m ((e^{–ibjtj} – e^{–iajtj})/(–itj)) φ(t1, . . . , tm) dt1 . . . dtm

where μF(I) is defined as in Lemma 7.8.2.

This result is obtained in a similar manner to Theorem 12.3.1 (from the m-dimensional form of Theorem 8.3.1), and we do not give a detailed proof.

To obtain the uniqueness theorem, an m-dimensional form is needed of the fact that a d.f. F has at most countably many discontinuities (Lemma 9.2.2) (or equivalently that the corresponding measure μF has at most countably many points of positive mass, i.e. x such that μF({x}) > 0). Consider the case m = 2, and for a given s let Ls denote the line x = s, –∞ < y < ∞. If μ is a probability measure on the Borel sets of R² then by the same argument as for m = 1, there are at most countably many values of s for which μ(Ls) > 0. Similarly there are at most countably many values of t such that μ(Lt) > 0, if Lt denotes the line y = t, –∞ < x < ∞. It thus follows that given any values s0, t0, there are values s, t arbitrarily close to s0, t0 respectively, such that μ(Ls) = μ(Lt) = 0. (Such Ls, Lt will be called lines of zero μ-mass.) Precisely the same considerations hold in R^m for m > 2, with (m – 1)-dimensional hyperplanes of the form {(x1, . . . , xm) : xi = constant} taking the place of lines. With these observations we now obtain the uniqueness theorem for m-dimensional c.f.’s.

Theorem 12.8.2 The joint c.f. of m r.v.’s uniquely determines their joint d.f., and hence their distribution, and conversely; i.e. two d.f.’s F1, F2 in R^m are identical if and only if their c.f.’s φ1, φ2 are identical.

Proof It is clear that F1 ≡ F2 implies φ1 ≡ φ2. For the converse assume φ1 ≡ φ2 and consider the case m = 2. (The case m > 2 follows with the obvious changes.) With the above notation let (a, b) be a point in R² such that La, Lb have zero μF1- and μF2-mass. Choose ak, bk, both tending to –∞ as k → ∞, and such that Lak, Lbk have zero μF1- and μF2-mass (which is possible since only countably many lines have positive (μF1 + μF2)-mass). Then writing Ik = (ak, a] × (bk, b],

F1(a, b) = lim_{k→∞} [F1(a, b) – F1(ak, b) – F1(a, bk) + F1(ak, bk)]
= lim_{k→∞} μF1(Ik)
= lim_{k→∞} μF2(Ik)

by Theorem 12.8.1, since Ik is a continuity rectangle for both μF1 and μF2, and F1, F2 have the same c.f. But by the same argument (with F2 for F1), lim_{k→∞} μF2(Ik) = F2(a, b). Hence F1(a, b) = F2(a, b) for any (a, b) such that La and Lb have zero μF1- and μF2-mass.

Finally for any a, b, we may choose ck ↓ a, dk ↓ b such that Lck and Ldk have zero μF1- and μF2-mass, and hence F1(ck, dk) = F2(ck, dk) by the above. By right-continuity of F1 and F2 in each argument, F1(a, b) = F2(a, b), as required. □

The following characterization of independence of m r.v.’s ξ1, . . . , ξm may now be obtained as an application. (Compare this theorem with Theorem 12.1.4.)

Theorem 12.8.3 The r.v.’s ξ1, . . . , ξm are independent if and only if their joint c.f. φ(t1, . . . , tm) = ∏_{i=1}^m φi(ti), where φi is the c.f. of ξi.

Proof If the r.v.’s are independent,

φ(t1, . . . , tm) = E e^{i(t1ξ1+···+tmξm)} = ∏_{j=1}^m φj(tj)

by (the complex r.v. form of) Theorem 10.3.5. Conversely if ξ1, . . . , ξm have joint d.f. F and individual d.f.’s Fj, and φ(t1, . . . , tm) = ∏_{j=1}^m φj(tj) for all t1, . . . , tm, then F(x1, . . . , xm) and F1(x1) . . . Fm(xm) are both d.f.’s on R^m with the same c.f. (clearly ∫ e^{i(t1x1+···+tmxm)} d[F1(x1) . . . Fm(xm)] = ∏_{j=1}^m φj(tj)). Hence by the uniqueness theorem, F(x1, . . . , xm) = F1(x1) . . . Fm(xm), so that the r.v.’s are independent by Theorem 10.3.1. □
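A quick Monte Carlo illustration of the factorization, with two independent standard normal samples (a setup chosen here for illustration): the empirical joint c.f. should be close to the product of the marginal c.f.’s.

```python
import cmath
import random

def empirical_joint_cf(xs, ys, t1, t2):
    """Monte Carlo estimate of E exp(i (t1*xi + t2*eta))."""
    return sum(cmath.exp(1j * (t1 * x + t2 * y)) for x, y in zip(xs, ys)) / len(xs)

random.seed(1)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
t1, t2 = 0.8, -0.5
joint = empirical_joint_cf(xs, ys, t1, t2)
product = cmath.exp(-t1 * t1 / 2.0) * cmath.exp(-t2 * t2 / 2.0)  # phi1(t1)*phi2(t2)
```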

Finally, weak convergence of d.f.’s in R^m (Section 11.2) may be considered by means of their c.f.’s, giving rise to the following general version of the continuity theorem (Theorem 12.4.3).

Theorem 12.8.4 Let {Fn(x1, . . . , xm)} be a sequence of m-dimensional d.f.’s with c.f.’s {φn(t1, . . . , tm)}.

(i) If F(x1, . . . , xm) is a d.f. with c.f. φ(t1, . . . , tm) and if Fn w→ F, then φn(t1, . . . , tm) → φ(t1, . . . , tm) as n → ∞, for all t1, . . . , tm ∈ R.
(ii) If φ(t1, . . . , tm) is a complex function which is continuous at (0, . . . , 0) and if φn(t1, . . . , tm) → φ(t1, . . . , tm) as n → ∞, for all t1, . . . , tm ∈ R, then φ is the c.f. of an (m-dimensional) d.f. F and Fn w→ F.

As a corollary to this result we may obtain an elegant simple device due to H. Cramér and H. Wold, which enables convergence in distribution of random vectors to be reduced to convergence of ordinary r.v.’s.

Theorem 12.8.5 (Cramér–Wold Device) Let ξ = (ξ1, . . . , ξm), ξn = (ξn1, . . . , ξnm), n = 1, 2, . . . , be random vectors. Then

ξn d→ ξ as n → ∞

if and only if

a1ξn1 + · · · + amξnm d→ a1ξ1 + · · · + amξm as n → ∞

for all a1, . . . , am ∈ R.

Proof By the continuity theorems 12.4.3 and 12.8.4, ξn d→ ξ is equivalent to

E e^{i(t1ξn1+···+tmξnm)} → E e^{i(t1ξ1+···+tmξm)}

for all t1, . . . , tm ∈ R, and a1ξn1 + · · · + amξnm d→ a1ξ1 + · · · + amξm is equivalent to

E e^{it(a1ξn1+···+amξnm)} → E e^{it(a1ξ1+···+amξm)}

for all t ∈ R. It is then clear that the former implies the latter (by taking tj = taj) and conversely (by taking t = 1). □

This result shows that to prove convergence in distribution of a sequence of random vectors it is sufficient to consider convergence of arbitrary (but fixed) finite linear combinations of the components. This is especially useful for jointly normal r.v.’s since then each linear combination is also normal.
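For jointly normal vectors the identity behind the device, φ(ta1, . . . , tam) = φ_{a1,...,am}(t), can be verified in closed form. A sketch for a mean-zero bivariate normal, with a covariance matrix chosen here for illustration:

```python
import math

def joint_normal_cf(t1, t2, s11, s12, s22):
    """Joint c.f. of a mean-zero bivariate normal with covariance [[s11, s12], [s12, s22]]."""
    q = s11 * t1 * t1 + 2.0 * s12 * t1 * t2 + s22 * t2 * t2
    return math.exp(-q / 2.0)

def lin_comb_cf(t, a1, a2, s11, s12, s22):
    """C.f. of a1*xi1 + a2*xi2: a one-dimensional normal with the induced variance."""
    var = a1 * a1 * s11 + 2.0 * a1 * a2 * s12 + a2 * a2 * s22
    return math.exp(-var * t * t / 2.0)

a1, a2, t = 1.3, -0.4, 0.9
lhs = lin_comb_cf(t, a1, a2, 2.0, 0.5, 1.0)
rhs = joint_normal_cf(t * a1, t * a2, 2.0, 0.5, 1.0)  # same value algebraically
```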

Exercises

12.1 Find the c.f.’s for the following r.v.’s:

(a) Geometric: P{ξ = n} = pq^{n–1}, n = 1, 2, 3, . . . (0 < p < 1, q = 1 – p)
(b) Poisson: P{ξ = n} = e^{–λ}λ^n/n!, n = 0, 1, 2, . . . (λ > 0)
(c) Exponential: p.d.f. λe^{–λx}, x ≥ 0 (λ > 0)
(d) Cauchy: p.d.f. λ/(π(λ² + x²)), –∞ < x < ∞ (λ > 0).

12.2 Let ξ, η be independent r.v.’s each being uniformly distributed on (–1, 1). Evaluate the distribution of ξ + η and hence its c.f. Check this with the square of (the absolute value of) the c.f. of ξ.

12.3 Let ξ be a standard normal r.v. Find the p.d.f. and c.f. of ξ².

12.4 If ξ1, . . . , ξn are independent standard normal r.v.’s, find the c.f. of ∑_1^n ξ²i. Check that this corresponds to the p.d.f. 2^{–n/2} Γ(n/2)^{–1} x^{(n/2)–1} e^{–x/2} (x > 0) (χ² with n degrees of freedom).

12.5 Find two r.v.’s ξ, η which are not independent but have the same p.d.f. f, and are such that the p.d.f. of ξ + η is the convolution f ∗ f. (Hint: Try ξ = η with an appropriate d.f.)

12.6 According to Section 7.6, if f, g are in L1(–∞,∞) then the convolution h = f ∗ g ∈ L1 and has L1 Fourier Transform ĥ = f̂ ĝ. In the case where f and g are nonnegative (e.g. p.d.f.’s) give an alternative proof of this result based on Theorem 10.4.1 and Section 12.1. Give a corresponding result for Fourier–Stieltjes Transforms of the Stieltjes Convolution (F1 ∗ F2)(x) = ∫ F1(x – y) dF2(y) of two d.f.’s F1, F2.

12.7 If ξ is a r.v. with c.f. φ show that

E|ξ| = (1/π) ∫_{–∞}^∞ R[1 – φ(t)]/t² dt.

(Hint: ∫_{–∞}^∞ (sin t/t)² dt = π.)

12.8 Let φ be the c.f. of a r.v. ξ. Suppose that

lim_{t↓0} (1 – φ(t))/t² = σ²/2 < ∞.

Show that Eξ = 0 and Eξ² = σ². In particular if φ(t) = 1 + o(t²) show that ξ = 0 a.s. (Hints: R[1 – φ(t)]/t² = ∫ [(1 – cos tx)/t²] dF(x) → σ²/2. Apply Fatou’s Lemma to show ∫ x² dF(x) < ∞. Then use the corollary of Theorem 12.2.1.)

12.9 A r.v. ξ is called symmetric if ξ and –ξ have the same d.f. Show that ξ is symmetric if and only if its c.f. φ is real-valued.

12.10 Show that the real part of a c.f. is a c.f. but that the same is never true of the imaginary part.

12.11 Let ξ1 and ξ2 be independent r.v.’s with d.f.’s F1 and F2 and c.f.’s φ1 and φ2.

(i) Show that the c.f. φ of ξ1ξ2 is given by

φ(t) = ∫_{–∞}^∞ φ1(ty) dF2(y) = ∫_{–∞}^∞ φ2(tx) dF1(x) for all t ∈ R.

(ii) If F2(0–) = F2(0), show that the r.v. ξ1/ξ2 is well defined and its c.f. φ is given by

φ(t) = ∫_{–∞}^∞ φ1(t/y) dF2(y) for all t ∈ R.

As a consequence of (i) and (ii), if φ is a c.f. and G a d.f., then ∫_{–∞}^∞ φ(ty) dG(y) is a c.f. and so is ∫_{–∞}^∞ φ(t/y) dG(y) if G(0–) = G(0).

12.12 If f(t) is a function defined on the real line write Δh f(t) = f(t + h) – f(t), for real h, and say that f has a generalized second derivative at t when the following limit exists and is finite

lim_{h,h′→0} Δh′ Δh f(t)/(h′h)

for all sequences h → 0 and h′ → 0. Show that if f has two derivatives at t then it has a generalized second derivative at t, and that the converse is not true. If φ(t) is a characteristic function show that the following are equivalent:

(i) φ has a generalized second derivative at t = 0,
(ii) φ has two finite derivatives at t = 0,
(iii) φ has two derivatives at every real t,
(iv) ∫_{–∞}^∞ x² dF(x) < ∞, where F is the d.f. of φ.

12.13 If f(t) is a function defined on the real line its first symmetric difference may be defined by

Δ¹_s f(t) = f(t + s) – f(t – s)

for real s, and its higher order symmetric differences by

Δ^{n+1}_s f(t) = Δ¹_s Δ^n_s f(t)

for n = 1, 2, . . . . If the limit

lim_{s→0} Δ^n_s f(t)/(2s)^n

exists and is finite, we say that f has an nth symmetric derivative at t. Now let φ be the c.f. of a r.v. ξ, and n a positive integer. Show that if

lim inf_{s→0} |Δ^{2n}_s φ(0)/(2s)^{2n}| < ∞

then Eξ^{2n} < ∞. (Hint: Show that

Δ^n_s f(t) = Σ_{k=0}^{n} (–1)^k (n choose k) f[t + (n – 2k)s]

and

Δ^{2n}_s φ(t) = ∫_{–∞}^{∞} e^{itx} (2i sin sx)^{2n} dF(x).)

Show also that the following are equivalent:

(i) φ has a (2n)th symmetric derivative at t = 0,
(ii) φ has 2n finite derivatives at t = 0,
(iii) φ has 2n finite derivatives at every real t,
(iv) Eξ^{2n} < ∞.

12.14 Let ξ be a r.v. with c.f. φ and denote by ρn the nth symmetric difference of φ at 0:

ρn(t) = Δ^n_t φ(0)

(see Ex. 12.13). If 0 < p < 2n, show that E|ξ|^p < ∞ if and only if

∫_0^ε |ρ_{2n}(t)|/t^{1+p} dt < ∞

for some ε > 0, in which case

E|ξ|^p = {2^{2n} ∫_0^∞ (sin x)^{2n}/x^{1+p} dx}^{–1} ∫_0^∞ |ρ_{2n}(t)|/t^{1+p} dt.

(Hint: Show that

∫_0^ε |ρ_{2n}(t)|/t^{1+p} dt = 2^{2n} ∫_{–∞}^{∞} |x|^p {∫_0^{ε|x|} (sin u)^{2n}/u^{1+p} du} dF(x).)

12.15 Let φ be the c.f. corresponding to the d.f. F. Note that by Theorem 12.3.1 the jump (if any) of F at x may be written as

F(x) – F(x – 0) = lim_{T→∞} (1/2T) ∫_{–T}^{T} e^{–ixt} φ(t) dt.

If φ(t0) = 1 for some t0 ≠ 0 show that the mass of F is concentrated on the points {2nπ/t0 : n = 0, ±1, . . .} and the μF-measure of the point 2nπ/t0 is (1/t0) ∫_0^{t0} φ(t) e^{–2πnit/t0} dt. (Compare Theorem 12.1.3.)

12.16 Show that |cos t| is not a c.f. (e.g. use the result of Ex. 12.15 with n = 4). Hence the absolute value of a c.f. is not necessarily a c.f.

12.17 If φ is the c.f. corresponding to the d.f. F (and measure μF) prove that

Σ_{x∈R} [μF({x})]² = lim_{T→∞} (1/2T) ∫_{–T}^{T} |φ(t)|² dt.

(Hint: Mimic the proof of the last part of Theorem 8.3.1 or (more simply) apply the second inversion formula of Theorem 12.3.1 (i) for a = 0 and ξ = ξ1 – ξ2 where ξ1, ξ2 are i.i.d. with c.f. φ.) What is the implication of this if φ ∈ L2(–∞,∞)?

12.18 If φ is the c.f. corresponding to the d.f. F and φ ∈ L2(–∞,∞), show that F is absolutely continuous with density a multiple of the Fourier Transform of φ. (Hint: Use Parseval's Theorem.) This is an L2 analog of Theorem 12.3.1 (ii).

12.19 Show that the conclusion of the continuity theorem for characteristic functions is not necessarily true if φ is not continuous at t = 0, by considering a sequence of random variables {ξn} such that for each n, ξn has the uniform distribution on [–n, n].


12.20 If φ(t) is a characteristic function, then so is e^{λ[φ(t)–1]} for each λ > 0. (Hint: Use e^{λ(φ–1)} = lim_n (1 + λ(φ – 1)/n)^n.)

12.21 If the random variable ξn has a binomial distribution with parameters (n, pn), n = 1, 2, . . . , and npn → λ > 0 as n → ∞, prove that ξn converges in distribution to a random variable which has the Poisson distribution with parameter λ. Show also that if instead pn → 0 and npn → ∞, then ξn (suitably standardized) has a limiting normal distribution.
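The binomial-to-Poisson limit of Ex. 12.21 is easy to probe numerically. The sketch below is not part of the text (the function names are ours); it compares the Binomial(n, λ/n) and Poisson(λ) mass functions in total variation distance, which shrinks as n grows:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    # Binomial(n, p) mass at k (math.comb returns 0 for k > n)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # Poisson(lam) mass at k
    return exp(-lam) * lam**k / factorial(k)

def tv_distance(n, lam, kmax=60):
    # total variation distance between Binomial(n, lam/n) and Poisson(lam),
    # truncated at kmax (tail mass beyond is negligible for moderate lam)
    p = lam / n
    return 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                     for k in range(kmax + 1))

for n in (10, 100, 1000):
    print(n, tv_distance(n, 3.0))   # distance decreases with n
```

The decay is on the order of 1/n for fixed λ, consistent with the convergence in distribution asserted in the exercise.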

12.22 If the r.v.'s ξ and {ξn} are such that for every n, ξn is normal with mean 0 and variance σn², show that the following are equivalent:

(i) ξn → ξ in probability,
(ii) ξn → ξ in L2,

and that in each case ξ is normal with zero mean.

12.23 Let {ξn} be a sequence of random variables such that for each n, ξn has a Poisson distribution with parameter λn. If ξn → ξ in distribution (after any normalization needed) show that ξ has either a Poisson or a normal distribution.

12.24 Show that sin(t/2)/(t/2) is the c.f. of the uniform distribution on (–1/2, 1/2) and prove by using c.f.'s that for all real t,

lim_{n→∞} (sin(n^{–1/2}t)/(n^{–1/2}t))^n = e^{–t²/6}.
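The limit in Ex. 12.24 can also be checked numerically. A minimal sketch (our own code, not the text's) evaluates the nth power at the scaled argument and compares it with e^{–t²/6}:

```python
from math import sin, exp

def phi_n(t, n):
    # the quantity (sin(n**-0.5 * t) / (n**-0.5 * t))**n from Ex. 12.24
    s = t / n**0.5
    return (sin(s) / s)**n

t = 2.0
print(phi_n(t, 10), phi_n(t, 100), phi_n(t, 10000), exp(-t**2 / 6))
```

The convergence reflects sin x/x = 1 – x²/6 + O(x⁴), so the nth power behaves like (1 – t²/(6n))^n → e^{–t²/6}.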

12.25 Let {ξn} be independent random variables with finite means μn and variances σn², and let sn² = Σ_{k=1}^{n} σk². Prove that the Lindeberg condition is satisfied, and thus the Lindeberg Central Limit Theorem (Corollary 2 of Theorem 12.6.2) is applicable, if the random variables {ξn}:

(i) are uniformly bounded, i.e. for some 0 < M < ∞, |ξn| ≤ M a.s. for all n, and sn² → ∞; or
(ii) are identically distributed; or
(iii) satisfy Liapounov's condition

(1/sn^{2+δ}) Σ_{k=1}^{n} E(|ξk – μk|^{2+δ}) → 0 for some δ > 0.

12.26 If two c.f.'s φ1, φ2 are equal on a neighborhood of zero then whatever derivatives of φ1 exist at zero must be equal to those of φ2 there. Hence existing moments corresponding to each distribution must be the same. Show, however, that it is not necessarily true that φ1 = φ2 everywhere, and hence not necessarily true that the d.f.'s are the same. Note that if φ2 ≡ 1 and φ1 = φ2 in a neighborhood of zero it is true that φ1 = φ2 everywhere.

13

Conditioning

13.1 Motivation

In this chapter (Ω, F, P) will, as usual, denote a fixed probability space. If A and B are two events and P(B) > 0, the conditional probability P(A|B) of A given B is defined to be

P(A|B) = P(A ∩ B)/P(B)

and has a good interpretation; given that event B occurs, the probability of event A is proportional to the probability of the part of A which lies in B. It also has an appealing frequency interpretation – as the proportion of those repetitions of the experiment in which B occurs, for which A also occurs.

It is also important to be able to define P(A|B) in many cases for which P(B) = 0, for example if B is the event η = y where η is a continuous r.v. and y is a fixed value. There are various ways of making an appropriate definition depending on the purpose at hand. Here we are interested in integration over y to provide formulae such as

P(A) = ∫ P(A|η = y) f(y) dy     (13.1)

if η has a density f, which will be a particular case of the general definitions to be given. Other situations require different conditioning definitions – e.g. especially if particular fixed values of y are involved without integration in a condition η = y. A particular such case occurs if η(t) is the value of say temperature at time t and one is interested in defining P(A|η(t) = 0). The definition used for (13.1) will not have the empirical interpretation as the proportion of those time instants t where η(t) = 0 for which A occurs. In such cases so-called “Palm distributions” can be appropriate.

Here, however, we consider the definitions of conditional probability and expectation for obtaining the probability P(A) by conditioning on values of a r.v. η and integrating over those values as in (13.1). This will be achieved in a much more general setting via the Radon–Nikodym Theorem, (13.1) being a quite special case.

To motivate the approach it is illuminating to proceed from the special case where η is a r.v. which can take one of n possible values y1, y2, . . . , yn with P(η = yj) = pj > 0, 1 ≤ j ≤ n, Σ_{j=1}^{n} pj = 1. Then for all A ∈ F, P(A|η = yj) = P(A ∩ η^{–1}{yj})/Pη^{–1}{yj} so that

P(A) = Σ_j P(A ∩ η^{–1}(yj)) = Σ_j P(A|η = yj) pj = ∫_{–∞}^{∞} P(A|η = y) dPη^{–1}(y)

where P(A|η = y) is P(A|η = yj) at yj and (say) zero otherwise.

More generally it is easily shown that for all A ∈ F and B ∈ B

P(A ∩ η^{–1}B) = ∫_B P(A|η = y) dPη^{–1}(y).     (13.2)

This relation holds in the above case where Pη^{–1} is confined to the points y1, y2, . . . , yn so that the condition “η = y” has positive probability for each such value. However, in other cases where Pη^{–1} need not have atoms, the relation may (as will be seen) be used to provide a definition of P{A|η = y}. First, however, note that in the case considered (13.2) may be written with g(y) = P(A|η = y) as

P(A ∩ η^{–1}B) = ∫_B g(y) dPη^{–1}(y) = ∫_{η^{–1}B} g(η(ω)) dP(ω).

Since σ(η) = σ{η^{–1}(B) : B ∈ B} it follows that for E ∈ σ(η)

P(A ∩ E) = ∫_E g(η(ω)) dP(ω).

The function g(η(ω)) depends on the set A ∈ F and writing it explicitly as P(A|η)(ω) we have

P(A ∩ E) = ∫_E P(A|η)(ω) dP(ω)     (13.3)

for each A ∈ F, E ∈ σ(η). Since g is trivially Borel measurable, P(A|η) as defined on Ω is a σ(η)-measurable function for each fixed A ∈ F and is referred to as the “conditional probability of A given η”. This is related to but distinguished from the function P(A|η = y) in (13.2), naturally referred to as the “conditional probability of A given η = y”.
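On a finite sample space the identity P(A) = Σ_j P(A|η = yj) pj above can be verified directly. A small self-contained sketch (our own example: two fair dice, η the sum, exact rational arithmetic):

```python
from fractions import Fraction as Fr

# sample space: ordered pairs from two fair dice, each point has probability 1/36
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fr(1, 36) for w in omega}

def eta(w):                            # condition on the sum eta
    return w[0] + w[1]

A = {w for w in omega if w[0] == 6}    # event A: first die shows 6

def prob(E):
    return sum(P[w] for w in E)

total = Fr(0)
for y in sorted({eta(w) for w in omega}):
    By = {w for w in omega if eta(w) == y}          # the event {eta = y}
    total += (prob(A & By) / prob(By)) * prob(By)   # P(A|eta=y) * P(eta=y)

print(total, prob(A))   # both equal 1/6
```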

The version P(A|η)(ω) leads to a yet more general abstraction. The function P(A|η)(ω) was defined in such a way that it is σ(η)-measurable and satisfies (13.3) for each E ∈ σ(η). These requirements involve η only through its generated σ-field σ(η) (⊂ F) and it is therefore natural to write alternatively

P(A|η)(ω) = P(A|σ(η))(ω)

for a σ(η)-measurable function of ω satisfying (13.3) for E ∈ σ(η). This immediately suggests a generalization to consider arbitrary σ-fields G ⊂ F and to define the conditional probability P(A|G)(ω) of A ∈ F with respect to the σ-field G ⊂ F as a G-measurable function such that P(A ∩ E) = ∫_E P(A|G)(ω) dP(ω) for each A ∈ F, E ∈ G.

Existence of such a function follows simply from the Radon–Nikodym Theorem. However, this will be done within the context of conditional expectations E(ξ|G) of a r.v. ξ (with E|ξ| < ∞) with P(A|G) = E(χA|G) appearing as a special case. The conditioning P(A|η = y) “given the value of a r.v. η” considered above will be discussed subsequently.

13.2 Conditional expectation given a σ-field

Let ξ be a r.v. with E|ξ| < ∞ and G a sub-σ-field of F. The conditional expectation of ξ given G will be defined in a way which extends the definition of conditional probability suggested in the previous section.

Consider the set function ν defined for all E ∈ G by

ν(E) = ∫_E ξ dP.

Then ν is a finite signed measure on G and ν ≪ PG where PG denotes the restriction of P from F to G. Thus by the Radon–Nikodym Theorem (Theorem 5.5.3) there is a finite-valued G-measurable and PG-integrable function f on Ω, uniquely determined a.s. (PG), such that for all E ∈ G,

ν(E) = ∫_E f dPG = ∫_E f dP

(for the second equality see Ex. 4.10). We write f = E(ξ|G) and call it the conditional expectation of ξ given the σ-field G. Thus the conditional expectation E(ξ|G) of ξ given G is a G-measurable and P-integrable r.v. which is determined uniquely a.s. by the equality

∫_E ξ dP = ∫_E E(ξ|G) dP for all E ∈ G.

It is readily seen that this definition extends that suggested in Section 13.1 when G = σ(η) for a r.v. η taking a finite number of values (Ex. 13.1). The equality may also be rephrased in “E-form” as E(χE ξ) = E(χE E(ξ|G)) for all E ∈ G.

If η is a r.v. the conditional expectation E(ξ|η) of ξ given η is defined by taking G = σ(η), i.e. E(ξ|η) = E(ξ|σ(η)), so that E(ξ|η) is a σ(η)-measurable function f satisfying ∫_E ξ dP = ∫_E f dP for each E ∈ σ(η). It is enough that this equality holds for all E of the form η^{–1}(B) for B ∈ B since the class of such sets is either σ(η) if η is defined for all ω or otherwise generates σ(η).


For a family {ηλ : λ ∈ Λ} of r.v.'s the conditional expectation E(ξ|ηλ : λ ∈ Λ) of ξ given {ηλ : λ ∈ Λ} is defined by

E(ξ|ηλ : λ ∈ Λ) = E(ξ|σ(ηλ : λ ∈ Λ))

where σ(ηλ : λ ∈ Λ) is the sub-σ-field of F generated by the union of the σ-fields {σ(ηλ) : λ ∈ Λ} (cf. Section 9.3).

The simplest properties of conditional expectations are stated in the following result.

Theorem 13.2.1 Let ξ and η be r.v.'s with finite expectations and a, b real numbers.

(i) E{E(ξ|G)} = Eξ.
(ii) E(aξ + bη|G) = aE(ξ|G) + bE(η|G) a.s.
(iii) If ξ = η a.s. then E(ξ|G) = E(η|G) a.s.
(iv) If ξ ≥ 0 a.s., then E(ξ|G) ≥ 0 a.s. Hence if ξ ≤ η a.s., then E(ξ|G) ≤ E(η|G) a.s.
(v) If ξ is G-measurable then E(ξ|G) = ξ a.s.

Proof (i) Since Ω ∈ G we have

Eξ = ∫_Ω ξ dP = ∫_Ω E(ξ|G) dP = E{E(ξ|G)}.

(ii) For every E ∈ G we have

∫_E (aξ + bη) dP = a ∫_E ξ dP + b ∫_E η dP = a ∫_E E(ξ|G) dP + b ∫_E E(η|G) dP = ∫_E {aE(ξ|G) + bE(η|G)} dP

and since the r.v. within brackets is G-measurable the result follows from the definition.

(iii) This is obvious from the definition of conditional expectation.

(iv) If ξ ≥ 0 a.s., ν (as defined at the start of this section, ν(E) = ∫_E ξ dP) is a measure (rather than a signed measure) and from the Radon–Nikodym Theorem we have E(ξ|G) ≥ 0 a.s. The second part follows from the first part and (ii) since by (ii) E(η|G) – E(ξ|G) = E((η – ξ)|G) ≥ 0 a.s.

(v) This also follows at once from the definition of conditional expectation. □

A variety of general results concerning conditional expectations will now be obtained – some involving conditional versions of standard theorems. The first is an important result on successive conditioning.


Theorem 13.2.2 If ξ is a r.v. with E|ξ| < ∞ and G1, G2 two σ-fields with G2 ⊂ G1 ⊂ F then

E{E(ξ|G1)|G2} = E(ξ|G2) = E{E(ξ|G2)|G1} a.s.

Proof Repeated use of the definition shows that for all E ∈ G2 ⊂ G1,

∫_E E{E(ξ|G1)|G2} dP = ∫_E E(ξ|G1) dP = ∫_E ξ dP

which implies that E{E(ξ|G1)|G2} = E(ξ|G2) a.s. The right hand equality follows from Theorem 13.2.1 (v). □

The fundamental convergence theorems for integrals and expectations (monotone and dominated convergence, Fatou's Lemma) have conditional versions. We prove the monotone convergence result – the other two then follow from it in the same way as for the corresponding “unconditional” theorems.

Theorem 13.2.3 (Conditional Monotone Convergence Theorem) Let {ξn} be an increasing sequence of nonnegative r.v.'s with lim ξn = ξ a.s., where Eξ < ∞. Then

E(ξ|G) = lim_{n→∞} E(ξn|G) a.s.

Proof By Theorem 13.2.1 (iv) the sequence {E(ξn|G)} is increasing and nonnegative a.s. The limit lim_{n→∞} E(ξn|G) is then G-measurable and two applications of (ordinary) monotone convergence give, for any E ∈ G,

∫_E lim_{n→∞} E(ξn|G) dP = lim_{n→∞} ∫_E E(ξn|G) dP = lim_{n→∞} ∫_E ξn dP = ∫_E ξ dP

showing that lim_{n→∞} E(ξn|G) satisfies the conditions required to be a version of E(ξ|G), and hence the desired result follows. □

Theorem 13.2.4 (Conditional Fatou Lemma) Let {ξn} be a sequence of nonnegative r.v.'s with Eξn < ∞ and E{lim inf_{n→∞} ξn} < ∞. Then

E(lim inf ξn|G) ≤ lim inf_{n→∞} E(ξn|G) a.s.

This and the next result will not be proved here since – as already noted – they follow from Theorem 13.2.3 in the same way as the ordinary versions of Fatou's Lemma and dominated convergence follow from monotone convergence.


Theorem 13.2.5 (Conditional Dominated Convergence Theorem) Let {ξn} be a sequence of r.v.'s with ξn → ξ a.s. and |ξn| ≤ η a.s. for all n where E|η| < ∞. Then

E(ξ|G) = lim_{n→∞} E(ξn|G) a.s.

The following result is frequently useful.

Theorem 13.2.6 Let ξ, η be r.v.'s with E|η| < ∞, E|ξη| < ∞ and such that η is G-measurable (ξ being F-measurable, of course). Then

E(ξη|G) = ηE(ξ|G) a.s.

Proof If η = χG for some G ∈ G then ηE(ξ|G) is G-measurable and for any E ∈ G,

∫_E ηE(ξ|G) dP = ∫_{E∩G} E(ξ|G) dP = ∫_{E∩G} ξ dP = ∫_E ξη dP

and hence E(ξη|G) = ηE(ξ|G) a.s. It follows from Theorem 13.2.1 (ii) that the result is true for simple G-measurable r.v.'s η.

Now if η is an arbitrary G-measurable r.v. (with η ∈ L1, ξη ∈ L1), let {ηn} be a sequence of simple G-measurable r.v.'s such that for all ω ∈ Ω, lim_n ηn(ω) = η(ω) and |ηn(ω)| ≤ |η(ω)| for all n (Theorem 3.5.2, Corollary). It then follows from the conditional dominated convergence theorem (|ξηn| ≤ |ξη| ∈ L1) that

E(ξη|G) = lim_{n→∞} E(ηnξ|G) = lim_{n→∞} ηn E(ξ|G) = ηE(ξ|G) a.s. □

The next result shows that in the presence of independence conditional expectation is the same as expectation.

Theorem 13.2.7 If ξ is a r.v. with E|ξ| < ∞ and σ(ξ) and G are independent then

E(ξ|G) = Eξ a.s.

In particular, if ξ and η are independent r.v.'s and E|ξ| < ∞, then E(ξ|η) = Eξ a.s.

Proof For any E ∈ G the r.v.'s ξ and χE are independent and thus

∫_E ξ dP = E(ξχE) = Eξ · EχE = ∫_E Eξ dP.

Since the constant Eξ is G-measurable, it follows that E(ξ|G) = E(ξ) a.s. □

The conditional expectation E(ξ|η) of ξ given a r.v. η is σ(η)-measurable and hence, as shown in the next result, it is a Borel measurable function of η.

Theorem 13.2.8 If ξ and η are r.v.'s with E|ξ| < ∞ then there is a Borel measurable function h on R such that

E(ξ|η) = h(η) a.s.

Proof This follows immediately from Theorem 3.5.3 since E(ξ|η) is σ(η)-measurable, i.e. E(ξ|η)(ω) = h(η(ω)) for some (Borel) measurable h. □

Finally in this list we note the occasionally useful property that conditional expectations satisfy Jensen's Inequality just as expectations do.

Theorem 13.2.9 If g is a convex function on R and ξ and g(ξ) have finite expectations then

g(E{ξ|G}) ≤ E{g(ξ)|G} a.s.

Proof As stated in the proof of Theorem 9.5.4, g(x) ≥ g(y) + (x – y)h(y) for all x and y and some h(y) which is easily seen to be bounded on closed and bounded intervals. Thus whenever yn → x, g(yn) + (x – yn)h(yn) → g(x). Hence for every real x,

g(x) = sup_{r rational} {g(r) + (x – r)h(r)}.

Putting x = ξ and y = r in the inequality gives

g(ξ) ≥ g(r) + (ξ – r)h(r) a.s.

and by taking conditional expectations and using (ii) and (iv) of Theorem 13.2.1

E{g(ξ)|G} ≥ g(r) + (E(ξ|G) – r)h(r) a.s.

Since the last inequality holds for all rational r, by taking the supremum of the right hand side and combining a countable set of events of zero probability we find

E{g(ξ)|G} ≥ sup_{r rational} {g(r) + (E(ξ|G) – r)h(r)} = g(E{ξ|G}) a.s. □

A different proof is suggested in Ex. 13.7.
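For a partition-generated G the conditional Jensen inequality of Theorem 13.2.9 reduces to ordinary Jensen's inequality on each block, which can be checked directly. A small sketch (our own example, with the convex g(x) = x²):

```python
from fractions import Fraction as Fr

omega = list(range(6))
P = {w: Fr(1, 6) for w in omega}
xi = {w: w - 2 for w in omega}       # values -2, -1, 0, 1, 2, 3

blocks = [{0, 1, 2}, {3, 4, 5}]      # partition generating G

def cond_exp(f):
    # E(f|G): average f over each partition block
    g = {}
    for B in blocks:
        avg = sum(f[w] * P[w] for w in B) / sum(P[w] for w in B)
        for w in B:
            g[w] = avg
    return g

def g(x):                            # a convex function
    return x * x

cx = cond_exp(xi)
lhs = {w: g(cx[w]) for w in omega}               # g(E(xi|G))
rhs = cond_exp({w: g(xi[w]) for w in omega})     # E(g(xi)|G)
print(all(lhs[w] <= rhs[w] for w in omega))      # Jensen holds on every block
```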

13.3 Conditional probability given a σ-field

If A is an event in F and G is a sub-σ-field of F the conditional probability P(A|G) of A given G is defined by

P(A|G) = E(χA |G).


Then for E ∈ G, P(A ∩ E) = ∫_E χA dP = ∫_E E(χA|G) dP = ∫_E P(A|G) dP, so that P(A|G) is a G-measurable (and P-integrable) r.v. which is determined uniquely a.s. by the equality

P(A ∩ E) = ∫_E P(A|G) dP for all E ∈ G

(i.e. P(A ∩ E) = E{χE P(A|G)}). In particular (by putting E = Ω)

P(A) = ∫_Ω P(A|G) dP (i.e. EP(A|G) = P(A))

for all A ∈ F. If η is a r.v. then the conditional probability P(A|η) of A ∈ F given η is defined as P(A|η) = P(A|σ(η)) = E(χA|η). The particular consequence EP(A|η) = P(A) is, of course, natural.

The properties of conditional probability follow immediately from those of conditional expectation. Some of these properties are collected in the following theorems for ease of reference.

Theorem 13.3.1 (i) If A ∈ G then

P(A|G)(ω) = χA(ω) = { 1 for ω ∈ A, 0 for ω ∉ A } a.s.

(ii) If the event A is independent of the class G of events then

P(A|G)(ω) = P(A) a.s.

Theorem 13.3.2 (i) If A ∈ F then 0 ≤ P(A|G) ≤ 1 a.s.

(ii) P(Ω|G) = 1 a.s., P(∅|G) = 0 a.s.

(iii) If {An} is a disjoint sequence of events in F and A = ∪_{n=1}^{∞} An then

P(A|G) = Σ_{n=1}^{∞} P(An|G) a.s.

(iv) If A, B ∈ F and A ⊂ B then

P(A|G) ≤ P(B|G) a.s. and P(B – A|G) = P(B|G) – P(A|G) a.s.

(v) If {An} is a monotone (increasing or decreasing) sequence of events in F and A is its limit, then

P(A|G) = lim_{n→∞} P(An|G) a.s.


Proof These conclusions follow readily from the properties established for conditional expectations. For example, to show (iii) note that χA = Σ_{n=1}^{∞} χAn and conditional monotone convergence (Theorem 13.2.3) gives E(χA|G) = Σ E(χAn|G) a.s., which simply restates (iii). □

The above properties look like those of a probability measure, with the exception that they hold a.s., and it is natural to ask whether for fixed ω ∈ Ω, P(A|G)(ω) as a function of A ∈ F is a probability measure. Unfortunately the answer is in general negative and this is due to the fact that the exceptional G-measurable set of zero probability that appears in each property of Theorem 13.3.2 depends on the events for which the property is expressed. In particular property (i) stated in detail would read:

(i) For every A ∈ F there is NA ∈ G depending on A such that P(NA) = 0 and for all ω ∉ NA

0 ≤ P(A|G)(ω) ≤ 1.

It is then clear that the statement

0 ≤ P(A|G) ≤ 1 for all A ∈ F a.s.

is not necessarily true in general, since to obtain this we would need to combine the zero probability sets NA to get a single zero probability set N. This can be done (as in the example of Section 13.1) if there are only countably many sets A ∈ F, but not necessarily otherwise. In fact, in general, there may not even exist an event E ∈ G with P(E) > 0 such that

0 ≤ P(A|G)(ω) ≤ 1 for all A ∈ F and all ω ∈ E.

Thus in general there is no event E ∈ G with P(E) > 0 such that for every fixed ω ∈ E, P(A|G)(ω) is a probability measure on F.

In the next section we consider the case where the conditional probability does have a version which is a probability measure for all ω (a “regular conditional probability”) and show that then conditional expectations can be expressed as integrals with respect to this version.

13.4 Regular conditioning

As seen in the previous section, conditional probabilities are not in general probability measures for fixed ω. If a conditional probability has a version which is a probability measure for all ω, then this version is called a regular conditional probability. Specifically let G be a sub-σ-field of F. A function P(A,ω) defined for each A ∈ F and ω ∈ Ω, with values in [0, 1], is called a regular conditional probability on F given G if

(i) for each fixed A ∈ F, P(A,ω) is a G-measurable function of ω, and for each fixed ω ∈ Ω, P(A,ω) is a probability measure on F, and

(ii) for each fixed A ∈ F, P(A,ω) = P(A|G)(ω) a.s.

Regular conditional probabilities do not always exist without any further assumptions on Ω, F and G. As we have seen, a simple case when they exist is when G is the σ-field generated by a discrete r.v. However, if a regular conditional probability does exist we can express conditional expectations as integrals with respect to it, just as ordinary expectations are expressed as integrals with respect to the probability measure. The notation ∫_Ω ξ(ω′) P(dω′, ω) will be convenient to indicate integration of ξ with respect to the measure P(·, ω).

Theorem 13.4.1 If ξ is a r.v. with E|ξ| < ∞, and P(A,ω) is a regular conditional probability on F given G, then

E(ξ|G)(ω) = ∫_Ω ξ(ω′) P(dω′, ω) a.s.

Proof If ξ = χA for some A ∈ F, then ∫_Ω ξ(ω′) P(dω′, ω) = P(A,ω) which is G-measurable and equal a.s. to

P(A|G)(ω) = E(χA|G)(ω) = E(ξ|G)(ω).

Thus ∫_Ω ξ(ω′) P(dω′, ω) is G-measurable and equal a.s. to E(ξ|G)(ω) when ξ is a set indicator. It follows by Theorem 13.2.1 (ii) that the same is true for a simple r.v. ξ and, by using the ordinary and the conditional monotone convergence theorems, it is also true for any r.v. ξ ≥ 0 with Eξ < ∞. Using again Theorem 13.2.1 (ii), this is also true for any r.v. ξ with E|ξ| < ∞. □
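When G is generated by a finite partition, P(A, ω) = P(A ∩ Bω)/P(Bω), with Bω the block containing ω, is a regular conditional probability, and the integral formula of Theorem 13.4.1 can be verified pointwise. A sketch with our own small example (names and partition are ours):

```python
from fractions import Fraction as Fr

omega = list(range(6))
P = {w: Fr(1, 6) for w in omega}
xi = {w: w * w for w in omega}
blocks = [{0, 1, 2}, {3, 4, 5}]      # partition generating G

def block_of(w):
    return next(B for B in blocks if w in B)

def reg_cond_prob(A, w):
    # P(A, w) = P(A ∩ B_w)/P(B_w): a probability measure in A for fixed w
    B = block_of(w)
    return sum(P[v] for v in A & B) / sum(P[v] for v in B)

def cond_exp_via_integral(w):
    # E(xi|G)(w) as the integral of xi against the measure P(., w)
    return sum(xi[v] * reg_cond_prob({v}, w) for v in omega)

for w in omega:
    B = block_of(w)
    block_avg = sum(xi[v] * P[v] for v in B) / sum(P[v] for v in B)
    assert cond_exp_via_integral(w) == block_avg   # Theorem 13.4.1, exactly
print("ok")
```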

If one is only interested in expressing a conditional expectation E{g(ξ)|G} for a particular ξ and Borel measurable g as an integral with respect to a conditional probability (as in the previous theorem), then attention may be restricted to conditional probabilities P(A|G) of events A in σ(ξ), since F may be replaced by σ(ξ) in defining integrals of ξ over Ω (Ex. 4.10). We will call this restriction the conditional probability of ξ given G and it will be seen in Theorem 13.4.5 that a regular version exists under a simple condition on ξ. To be precise let ξ be a r.v. and G a sub-σ-field of F. A function Pξ|G(A,ω) defined for each A ∈ σ(ξ) and ω ∈ Ω, with values in [0, 1], is called a regular conditional probability of ξ given G if

(i) for each fixed A ∈ σ(ξ), Pξ|G(A,ω) is a G-measurable function of ω, and for each fixed ω ∈ Ω, Pξ|G(A,ω) is a probability measure on σ(ξ), and

(ii) for each fixed A ∈ σ(ξ), Pξ|G(A,ω) = P(A|G)(ω) a.s.

Theorem 13.4.5 will show that under a very mild condition on ξ (that the range of ξ is a Borel set) Pξ|G of ξ given G exists for all G. Also as already noted if G = σ(η) and η is a discrete r.v. then a regular conditional probability Pξ|G exists. Two further cases where Pξ|G exists trivially (in view of Theorem 13.3.1) are the following: (i) if σ(ξ) and G are independent then

Pξ|G(A,ω) = P(A) for all A ∈ σ(ξ) and ω ∈ Ω

and (ii) if ξ is G-measurable then

Pξ|G(A,ω) = χA(ω) for all A ∈ σ(ξ) and ω ∈ Ω.

As will now be shown, when a regular conditional probability of ξ given G exists, the conditional expectation of every σ(ξ)-measurable r.v. with finite expectation can be expressed as an integral with respect to the regular conditional probability.

Theorem 13.4.2 If ξ is a r.v., g a Borel measurable function on R such that E|g(ξ)| < ∞, and Pξ|G is a regular conditional probability of ξ given G, then

E{g(ξ)|G}(ω) = ∫_Ω g(ξ(ω′)) Pξ|G(dω′, ω) a.s.

Proof The proof extends that of Theorem 13.4.1, with σ(ξ) replacing F. If A ∈ σ(ξ) the r.v. η = χA satisfies E(η|G)(ω) = ∫ η(ω′) Pξ|G(dω′, ω) a.s. This remains true if χA is replaced by a nonnegative simple σ(ξ)-measurable r.v. η and hence, by the standard extension (cf. Theorem 13.4.1), for any σ(ξ)-measurable η with E|η| < ∞. But g(ξ) is such a r.v. and hence the result follows. □

The distribution of a r.v. ξ (Chapter 9) is the probability measure Pξ^{–1} induced from P on the Borel sets of the real line by ξ, and expectations of functions of ξ are expressible as integrals with respect to this distribution. Similarly, conditional distributions on the Borel sets of the real line may be induced from regular conditional probabilities and used to obtain conditional expectations. Indeed if the regular conditional probability Pξ|G(A,ω) of ξ given G exists then a (regular) conditional distribution Qξ|G(B,ω) of ξ given G may be defined for any Borel set B on the real line (i.e. B ∈ B) and ω ∈ Ω by

Qξ|G(B,ω) = Pξ|G(ξ^{–1}B, ω) for all B ∈ B, ω ∈ Ω.

Clearly Qξ|G has properties similar to Pξ|G and the only problem is that this “definition” of Qξ|G requires the existence of Pξ|G (which is not always guaranteed). However, this problem is easily eliminated by defining Qξ|G in terms of properties it inherits from Pξ|G but without reference to the latter. More specifically let ξ be a r.v. and G a sub-σ-field of F. A function Qξ|G(B,ω) defined for each B ∈ B and ω ∈ Ω, with values in [0, 1], is called a regular conditional distribution of ξ given G if

(i) for each fixed B ∈ B, Qξ|G(B,ω) is a G-measurable function of ω, and for each fixed ω ∈ Ω, Qξ|G(B,ω) is a probability measure on the Borel sets B, and

(ii) for each fixed B ∈ B, Qξ|G(B,ω) = P(ξ^{–1}B|G)(ω) a.s.

It is clear that if a regular conditional probability Pξ|G of ξ given G exists then Qξ|G as defined above from it is a regular conditional distribution of ξ given G.

We shall see that, in contrast to regular conditional probability, a regular conditional distribution of ξ given G always exists (Theorem 13.4.3) and that the conditional expectation of every σ(ξ)-measurable r.v. with finite expectation may be expressed as an integral over R with respect to the regular conditional distribution (Theorem 13.4.4).

As for the regular conditional probability of ξ given G the following intuitively appealing results hold:

(i) if σ(ξ) and G are independent, then

Qξ|G(B,ω) = Pξ^{–1}(B) for all B ∈ B and ω ∈ Ω,

i.e. for each fixed ω ∈ Ω the conditional distribution of ξ given G is just the distribution of ξ.

(ii) If ξ is G-measurable, then

Qξ|G(B,ω) = χ_{ξ^{–1}B}(ω) = χB(ξ(ω)) for all B ∈ B and ω ∈ Ω,

i.e. for each fixed ω ∈ Ω the conditional distribution of ξ given G is a probability measure concentrated at the point ξ(ω).

Theorem 13.4.3 If ξ is a r.v. and G a sub-σ-field of F, then there exists a regular conditional distribution of ξ given G.

Proof Write Ax = ξ^{–1}(–∞, x] for any real x. By Theorem 13.3.2 it is clear that for any fixed x, y with x ≥ y, P(Ax|G)(ω) ≥ P(Ay|G)(ω) a.s.; for any fixed x, P(A_{x+1/n}|G)(ω) → P(Ax|G)(ω) a.s. as n → ∞; and for any fixed sequence {xn} with xn → ∞ (–∞), P(A_{xn}|G)(ω) → 1 (0) a.s. By combining a countable number of zero measure sets in G we obtain a G-measurable set N with P(N) = 0 such that for each ω ∉ N

(a) P(Ax|G)(ω) is a nondecreasing function of rational x,
(b) lim_{n→∞} P(A_{x+1/n}|G)(ω) = P(Ax|G)(ω) for all rational x,
(c) lim_{x→∞} P(Ax|G)(ω) = 1 and lim_{x→–∞} P(Ax|G)(ω) = 0 for rational x → ±∞.

Define functions F(x,ω) as follows:

for ω ∉ N: F(x,ω) = P(Ax|G)(ω) if x is rational, and F(x,ω) = lim{F(r,ω) : r rational, r ↓ x} if x is irrational;
for ω ∈ N: F(x,ω) = 0 or 1 according as x < 0 or x ≥ 0.

Then it is easily checked that F(x,ω) is a distribution function for each fixed ω ∈ Ω and hence defines a probability measure Q(B,ω) on the class B of Borel sets, satisfying Q((–∞, x],ω) = F(x,ω) for each real x.

It will follow that Q(B,ω) is the desired regular conditional distribution of ξ given G if we show that for each B ∈ B,

(i) Q(B,ω) is a G-measurable function of ω,
(ii) Q(B,ω) = P(ξ^{–1}B|G)(ω) a.s.

Let D be the class of all Borel sets B for which (i) and (ii) hold. If x is rational and B = (–∞, x], then Q(B,ω) = F(x,ω) which is equal to the G-measurable function P(Ax|G)(ω) if ω ∉ N and a constant (0 or 1) if ω ∈ N. Further N ∈ G and P(N) = 0. Since Ax = ξ^{–1}B, (i) and (ii) both follow when B = (–∞, x], for rational x. Thus (–∞, x] ∈ D when x is rational.

It is easily checked that D is a D-class. If Bi are disjoint sets of D, with B = ∪_1^∞ Bi, we have Q(B,ω) = Σ_1^∞ Q(Bi,ω) which is G-measurable since each term is, so that (i) holds. Also, Σ_1^∞ Q(Bi,ω) = Σ_1^∞ P(ξ^{–1}Bi|G)(ω) = P(∪_1^∞ ξ^{–1}Bi|G)(ω) a.s. by Theorem 13.3.2, and this is P(ξ^{–1}B|G), so that D is closed under countable disjoint unions. Similarly it is closed under proper differences.

Thus D is a D-class containing the class of all sets of the form (–∞, x] for rational x. But this latter class is closed under intersections, and its generated σ-ring is B (cf. Ex. 1.21). Hence D ⊃ B, as desired. □

The following result shows in particular that the conditional expectation of a function g of a r.v. ξ may be obtained by integrating g with respect to a regular conditional distribution of ξ (cf. Theorem 13.4.2).

Theorem 13.4.4 Let ξ be a r.v. and Qξ|G a regular conditional distribution of ξ given G. Let η be a G-measurable r.v. and g a Borel measurable function on the plane such that E|g(ξ, η)| < ∞. Then

E{g(ξ, η)|G}(ω) = ∫_{–∞}^{∞} g(x, η(ω)) Qξ|G(dx, ω) a.s.

In particular, if E is a Borel measurable set of the plane and Ey its y-section {x ∈ R : (x, y) ∈ E}, then

P{(ξ, η) ∈ E|G}(ω) = Qξ|G(E_{η(ω)}, ω) a.s.

Proof We will first show that for every E ∈ B², Qξ|G(E_{η(ω)}, ω) is G-measurable and P{(ξ, η) ∈ E|G}(ω) = Qξ|G(E_{η(ω)}, ω) a.s. Let E = A × B where A, B ∈ B. Then Qξ|G(E_{η(ω)}, ω) = Qξ|G(A,ω) or Qξ|G(∅,ω) according as η(ω) ∈ B or B^c, so that clearly Qξ|G(E_{η(ω)}, ω) is G-measurable. Further, since Qξ|G(A,ω) = P(ξ^{–1}A|G) a.s. and P(ξ^{–1}∅|G) = 0 a.s., it follows that

Qξ|G(E_{η(ω)}, ω) = χ_{η^{–1}B}(ω) P{ξ^{–1}A|G}(ω) a.s.
= χ_{η^{–1}B}(ω) E{χ_{ξ^{–1}A}|G}(ω) a.s.
= E{χ_{ξ^{–1}A} χ_{η^{–1}B}|G}(ω) a.s.
= P{(ξ, η) ∈ E|G}(ω) a.s.

(since χ_{η^{–1}B} is σ(η)-measurable). Hence Qξ|G(E_{η(ω)}, ω) is (a version of) P{(ξ, η) ∈ E|G} when E = A × B, A, B ∈ B.

Now denote by D the class of subsets E of R² such that Qξ|G(E_{η(ω)}, ω) is G-measurable and P{(ξ, η) ∈ E|G}(ω) = Qξ|G(E_{η(ω)}, ω) a.s. (the exceptional set depending in general on each set E). Then by writing P{(ξ, η) ∈ E|G} = E{χ_{(ξ,η)∈E}|G} and using the properties of conditional expectation and the regular conditional distribution it is seen immediately that D is a D-class (i.e. closed under countable disjoint unions and proper differences). Since D contains the Borel measurable rectangles of R², it will contain the σ-field they generate, the Borel sets B² of R². Hence the second equality of the theorem is proved.

The first equality is then obtained by the usual extension. If g = χE, the indicator of a set E ∈ B², then by the above the equality holds. Hence it also holds for a B²-measurable simple function g. By using the ordinary and the conditional monotone convergence theorems (and Theorem 3.5.2) we see that it is true for all nonnegative B²-measurable functions g and hence also for all g as in the theorem. □

Since a regular conditional distribution Qξ|G of ξ given G always exists, one may attempt to obtain a regular conditional probability Pξ|G of ξ given G by

Pξ|G(A,ω) = Qξ|G(B,ω) when A ∈ σ(ξ), B ∈ B, A = ξ^{–1}B

(as was pointed out earlier in this section, if Pξ|G exists this relationship defines a regular conditional distribution Qξ|G). However, given A ∈ σ(ξ) there may be several Borel sets B such that A = ξ^{–1}B for which the values Qξ|G(B,ω) are not all equal (for fixed ω) and then Pξ|G is not defined in the above way. Under a rather mild condition on ξ it is shown in the following theorem that this difficulty is eliminated and a regular conditional probability can then be defined from a regular conditional distribution.

13.4 Regular conditioning 299

Theorem 13.4.5 Let ξ be a r.v. (for convenience defined for all ω) and G a sub-σ-field of F. If the range E = {ξ(ω) : ω ∈ Ω} of ξ is a Borel set then there exists a regular conditional probability of ξ given G.

Proof Let Qξ|G be a regular conditional distribution of ξ given G, which always exists by Theorem 13.4.3. Then since E ∈ B and ξ⁻¹(E) = Ω,

Qξ|G(E,ω) = P(ξ⁻¹(E)|G)(ω) = P(Ω|G)(ω) = 1 a.s.

and thus there is a set N ∈ G, with P(N) = 0, such that for all ω ∉ N, Qξ|G(E,ω) = 1.

Now fix A ∈ σ(ξ) with A = ξ⁻¹(B1) = ξ⁻¹(B2) where B1, B2 ∈ B. Then B1 – B2 and B2 – B1 are Borel subsets of Eᶜ and thus for all ω ∉ N (since Qξ|G is a measure for every ω)

Qξ|G(B1 – B2,ω) = 0 = Qξ|G(B2 – B1,ω)

so that

Qξ|G(B1,ω) = Qξ|G(B1 ∩ B2,ω) = Qξ|G(B2,ω).

Hence the following definition is unambiguous:

Pξ|G(A,ω) = Qξ|G(B,ω) for ω ∉ N,
Pξ|G(A,ω) = p(A) for ω ∈ N,        for all A ∈ σ(ξ),

where B ∈ B is such that A = ξ⁻¹(B) and p is an arbitrary but fixed probability measure on σ(ξ). Since Qξ|G is a regular conditional distribution of ξ given G and since P(N) = 0, it is clear that Pξ|G is a regular conditional probability of ξ given G. □

Finally, if η is a r.v. then the following notions

regular conditional probability on F given η
regular conditional probability of ξ given η
regular conditional distribution of ξ given η

are defined (as usual) as the corresponding quantities introduced in this section with G = σ(η), the notation used here for the last two being Pξ|η and Qξ|η. A regular conditional distribution Qξ|η of ξ given η always exists (Theorem 13.4.3) and the conditional expectation given η of every σ(ξ, η)-measurable r.v. with finite expectation is expressed as an integral with respect to Qξ|η, as follows from Theorem 13.4.4. Thus, if g is a Borel measurable function on the plane such that E|g(ξ, η)| < ∞, then

E{g(ξ, η)|η}(ω) = ∫_{–∞}^{∞} g(x, η(ω)) Qξ|η(dx,ω) a.s.


In particular, if E is a Borel measurable set of the plane and Ey its y-section {x ∈ R : (x, y) ∈ E}, then

P{(ξ, η) ∈ E|η}(ω) = Qξ|η(Eη(ω),ω) a.s.

13.5 Conditioning on the value of a r.v.

As promised in Section 13.1 we will now define conditional expectation (and hence then also conditional probability) given the event that a r.v. η takes the value y, which may have probability zero even for all y. The conditional expectation given η = y will be defined first, giving the conditional probability as a particular case. Specifically, if ξ, η are r.v.’s with E|ξ| < ∞, it is known by Theorem 13.2.8 that the conditional expectation of ξ given η is a Borel measurable function of η, i.e. E(ξ|η)(ω) = h(η(ω)) for some Borel function h. The conditional expectation of ξ given the value y of η may then be simply defined by

E(ξ|η = y) = h(y)

that is, E(ξ|η = y) may be regarded as a version of the conditional expectation induced on R by the transformation η(ω) (and thus Borel, rather than σ(η)-measurable).

If B ∈ B it follows at once that

∫_B E(ξ|η = y) dPη⁻¹(y) = ∫_B h(y) dPη⁻¹(y) = ∫_{η⁻¹B} h(η(ω)) dP(ω)
= ∫_{η⁻¹B} E(ξ|η)(ω) dP(ω) = ∫_{η⁻¹B} ξ dP.

Since in particular ∫_B h(y) dPη⁻¹(y) = ∫_{η⁻¹B} ξ dP, any two choices of h(y) have the same integral ∫_B h dPη⁻¹ for every B and hence must be equal a.s. (Pη⁻¹), so that E(ξ|η = y) is uniquely defined (a.s.).

This is, of course, totally analogous to the defining property for E(ξ|η) and may be similarly used as an independent definition of E(ξ|η = y), as indicated in the following result.

Theorem 13.5.1 For a r.v. ξ with E|ξ| < ∞ and a r.v. η, the conditional expectation of ξ given η = y may be equivalently defined (uniquely a.s. (Pη⁻¹)) as a B-measurable function E{ξ|η = y} satisfying

∫_{η⁻¹B} ξ dP = ∫_B E(ξ|η = y) dPη⁻¹(y)

for each B ∈ B. In particular it follows by taking B = R that Eξ = ∫ E(ξ|η = y) dPη⁻¹(y) = ∫ E(ξ|η = y) dFη(y) where Fη is the d.f. of η.


Proof That E(ξ|η = y) exists satisfying the defining equation and is a.s. unique follows as above, or may be shown directly from use of the Radon–Nikodym Theorem similarly to the definition of E(ξ|G) in Section 13.2. □

The conditional probability P(A|η = y) of A ∈ F given η = y is now defined as

P(A|η = y) = E(χA|η = y) a.s. (Pη⁻¹).

Thus P(A|η = y) is a Borel measurable (and Pη⁻¹-integrable) function on R which is determined uniquely a.s. (Pη⁻¹) by the equality

P(A ∩ η⁻¹B) = ∫_B P(A|η = y) dPη⁻¹(y) for all B ∈ B.

In particular, for B = R,

P(A) = ∫_{–∞}^{∞} P(A|η = y) dPη⁻¹(y).
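When the conditioning r.v. η is discrete, the measure Pη⁻¹ is purely atomic and the formula P(A) = ∫ P(A|η = y) dPη⁻¹(y) reduces to the elementary law of total probability, a weighted sum over the atoms. A minimal numerical sketch (the die and the conditional probabilities are illustrative choices, not from the text):

```python
# Law of total probability: P(A) = sum over y of P(A | eta = y) * P(eta = y)
# when eta is discrete (the integral with respect to P eta^{-1} is a sum over atoms).

# Illustrative example: eta is the score of a fair die, and given eta = y
# the event A occurs with probability y / 10.
p_eta = {y: 1 / 6 for y in range(1, 7)}        # the distribution P eta^{-1}
p_A_given = {y: y / 10 for y in range(1, 7)}   # P(A | eta = y)

p_A = sum(p_A_given[y] * p_eta[y] for y in p_eta)
print(round(p_A, 2))  # (0.1 + 0.2 + ... + 0.6) / 6 = 0.35
```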

Since P(A|η = y) = f(y) where P(A|η)(ω) = f(η(ω)), the properties of P(A|η = y) are easily deduced from those of P(A|η). In particular all properties of Theorem 13.3.2 are valid, with “given G” replaced by “given η = y” and “a.s.” replaced by “a.s. (Pη⁻¹)”.

In a similar way the following notions can be defined for r.v.’s ξ, η:

regular conditional probability of F given η = y
regular conditional probability of ξ given η = y
regular conditional distribution of ξ given η = y

with properties similar to the properties of the corresponding notions “given η” or “given G” as developed in Section 13.4. These definitions and properties will not all be listed here, in order to avoid overburdening the text, but as an example consider the third notion (which always exists), defined as follows. A function Q̃ξ|η(B, y) defined on B × R to [0, 1] is called a regular conditional distribution of ξ given η = y if

(i) for each fixed B ∈ B, Q̃ξ|η(B, y) is a Borel measurable function of y, and for each fixed y ∈ R, Q̃ξ|η(B, y) is a probability measure on the Borel sets B, and

(ii) for each fixed B ∈ B, Q̃ξ|η(B, y) = P(ξ⁻¹B|η = y) a.s. (Pη⁻¹).

As for a regular conditional distribution of ξ given η there are the following extreme cases:

(i) if ξ and η are independent then Q̃ξ|η(B, y) = Pξ⁻¹(B) for all B ∈ B and y ∈ R, i.e. for every fixed y ∈ R, the conditional distribution of ξ given η = y is equal to the distribution of ξ; and


(ii) if ξ is σ(η)-measurable then Q̃ξ|η(B, y) = χB(f(y)) for all B ∈ B and y ∈ R, where f is defined by ξ = f(η), i.e. for each fixed y ∈ R, the conditional distribution of ξ given η = y is a probability measure concentrated at the point f(y).

The main properties of a regular conditional distribution of ξ given η = y are collected in the following result.

Theorem 13.5.2 Let ξ and η be r.v.’s. Then

(i) There exists a regular conditional distribution of ξ given η = y.

(ii) If Qξ|η and Q̃ξ|η are regular conditional distributions of ξ given η and given η = y respectively, then

Qξ|η(B,ω) = Q̃ξ|η(B, η(ω)) for all B ∈ B and ω ∉ N

where N ∈ σ(η) and P(N) = 0.

(iii) If g is a Borel measurable function on the plane such that E|g(ξ, η)| < ∞, then

E{g(ξ, η)|η = y} = ∫_{–∞}^{∞} g(x, y) Q̃ξ|η(dx, y) a.s. (Pη⁻¹).

In particular, if E is a Borel measurable set of the plane and Ey its y-section {x ∈ R : (x, y) ∈ E}, then

P{(ξ, η) ∈ E|η = y} = Q̃ξ|η(Ey, y) a.s. (Pη⁻¹).

Proof The construction of a regular conditional distribution of ξ given η = y follows that of Theorem 13.4.3 in detail, with the obvious adjustments: “given G” is replaced by “given η = y”, the exceptional G-measurable sets with P-measure zero become Borel sets with Pη⁻¹-measure zero, and instead of defining F(x,ω) from R × Ω to [0, 1], it is defined from R × R to [0, 1]. All the needed properties for conditional probabilities given η = y are valid since, as already noted, Theorem 13.3.2 holds with “G” replaced by “η = y”.

Now let Qξ|η and Q̃ξ|η be regular conditional distributions of ξ given η and given η = y respectively. Then for each fixed B ∈ B, Qξ|η(B,ω) = P(ξ⁻¹B|η)(ω) a.s., Q̃ξ|η(B, y) = P(ξ⁻¹B|η = y) a.s. (Pη⁻¹) and it follows from the conditional probability version of Theorem 13.5.1 that

Qξ|η(B,ω) = Q̃ξ|η(B, η(ω)) a.s.

From now on we write Q and Q̃ for Qξ|η and Q̃ξ|η. Let {Bn} be a sequence of Borel sets which generates the σ-field of Borel sets B (cf. Ex. 1.21).


Then by combining a countable number of σ(η)-measurable sets of zero probability we obtain a set N ∈ σ(η) with P(N) = 0 such that

Q(Bn,ω) = Q̃(Bn, η(ω)) for all n and all ω ∉ N.

Denote by C the class of all subsets B of the real line such that Q(B,ω) = Q̃(B, η(ω)) for all ω ∉ N. Since for each ω ∈ Ω, Q(B,ω) and Q̃(B, η(ω)) are probability measures on B, it follows simply that C is a σ-field and since it contains {Bn} it will contain its generated σ-field B. Thus Q(B,ω) = Q̃(B, η(ω)) for all B ∈ B and ω ∉ N, i.e. (ii) holds.

(iii) follows immediately from Theorem 13.4.4 (see also the last paragraph of Section 13.4), the relationship between Qξ|η and Q̃ξ|η, and Theorem 13.5.1 in the following form:

If E{g(ξ, η)|η}(ω) = f(η(ω)) a.s. then E{g(ξ, η)|η = y} = f(y) a.s. (Pη⁻¹). □

13.6 Regular conditional densities

For two r.v.’s ξ and η we have (in Sections 13.4 and 13.5) defined the regular conditional distribution Qξ|η(B,ω) of ξ given η and the regular conditional distribution Q̃ξ|η(B, y) of ξ given η = y, and have shown that both always exist. For each fixed ω and y, Qξ|η(·,ω) and Q̃ξ|η(·, y) are probability measures on the Borel sets B, and if they are absolutely continuous with respect to Lebesgue measure it is natural to call their Radon–Nikodym derivatives conditional densities of ξ given η, and given η = y respectively. As is clear from the previous sections, regular versions of conditional densities will be of primary interest. To be precise, a function fξ|η(x,ω) defined on R × Ω to [0,∞] is called a regular conditional density of ξ given η if it is B × σ(η)-measurable, for every fixed ω, fξ|η(x,ω) is a probability density function in x, and for all B ∈ B and ω ∈ Ω,

Qξ|η(B,ω) = ∫_B fξ|η(x,ω) dx.

Similarly, a function f̃ξ|η(x, y) defined on R² to [0,∞] is called a regular conditional density of ξ given η = y if it is B × B-measurable, for every fixed y, f̃ξ|η(x, y) is a probability density function in x, and for all B ∈ B and y ∈ R,

Q̃ξ|η(B, y) = ∫_B f̃ξ|η(x, y) dx.

It is easy to see that fξ|η exists if and only if f̃ξ|η exists and that in this case they are related by

fξ|η(x,ω) = f̃ξ|η(x, η(ω)) a.e.


(with respect to the product of Lebesgue measure and P) (cf. Theorem 13.5.2). It is also clear (in view of Theorems 13.4.2 and 13.5.2) that conditional expectations can be expressed in terms of regular conditional densities, whenever the latter exist; for instance if g is a Borel measurable function on the plane such that E|g(ξ, η)| < ∞ then we have the following:

E{g(ξ, η)|η = y} = ∫_{–∞}^{∞} g(x, y) f̃ξ|η(x, y) dx a.s. (Pη⁻¹)

E{g(ξ, η)|η}(ω) = ∫_{–∞}^{∞} g(x, η(ω)) fξ|η(x,ω) dx a.s.

The following result shows that a regular conditional density exists if the r.v.’s ξ and η have a joint probability density function. If f(x, y) is a joint p.d.f. of ξ and η (assumed defined and nonnegative everywhere) then the functions fξ(x) and fη(y) defined for all x and y by

fξ(x) = ∫_{–∞}^{∞} f(x, y) dy,    fη(y) = ∫_{–∞}^{∞} f(x, y) dx

are p.d.f.’s of ξ, η respectively (Section 9.3).

Theorem 13.6.1 Let ξ and η be r.v.’s with joint p.d.f. f(x, y) and fη(y) defined as above. Then the function f̃(x, y) defined by

f̃(x, y) = f(x, y)/fη(y) if fη(y) > 0,
f̃(x, y) = h(x) if fη(y) = 0,

where h(x) is an arbitrary but fixed p.d.f., is a regular conditional density of ξ given η = y. Hence a regular conditional density of ξ given η is given by fξ|η(x,ω) = f̃(x, η(ω)).

Proof Since f is B × B-measurable, it follows by Fubini’s Theorem that fη is B-measurable and hence f̃ is B × B-measurable.

From the definition of f̃ it is clear that it is nonnegative and that for every fixed y, ∫_{–∞}^{∞} f̃(x, y) dx = 1. Hence for fixed y, f̃(x, y) is a p.d.f. in x.

Now define Q̃(B, y) for all B ∈ B and y ∈ R by

Q̃(B, y) = ∫_B f̃(x, y) dx.

It follows from the properties of f̃ just established that for each fixed B ∈ B, Q̃(B, y) is a Borel measurable function of y, and for each fixed y ∈ R, Q̃(B, y) is a probability measure on the Borel sets. In order to conclude that Q̃ = Q̃ξ|η it suffices then to show that, for each fixed B ∈ B,


Q̃(B, y) = P(ξ⁻¹B|η = y) a.s. (Pη⁻¹). Now for every fixed B ∈ B and every E ∈ B we have

∫_E Q̃(B, y) dPη⁻¹(y) = ∫_{E∩{fη(y)>0}} ∫_B f̃(x, y) dx dPη⁻¹(y)
= ∫_{E∩{fη(y)>0}} ∫_B f̃(x, y) fη(y) dx dy
= ∫_{E∩{fη(y)>0}} ∫_B f(x, y) dx dy
= P{ξ⁻¹B ∩ η⁻¹(E ∩ {fη(y) > 0})}
= P{ξ⁻¹B ∩ η⁻¹E}

since Pη⁻¹{fη(y) = 0} = 0. It follows that Q̃(B, y) = P(ξ⁻¹B|η = y) a.s. (Pη⁻¹) and thus f̃(x, y) is a regular conditional density of ξ given η = y. □
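Theorem 13.6.1 is easy to check numerically for a concrete joint density. The sketch below uses the illustrative choice f(x, y) = x + y on the unit square (not an example from the text): on a grid it verifies that x ↦ f(x, y)/fη(y) integrates to 1 for a fixed y, and uses it to compute E(ξ|η = y).

```python
# Numerical check of Theorem 13.6.1 for the illustrative joint p.d.f.
# f(x, y) = x + y on [0,1] x [0,1]: the conditional density of xi given
# eta = y is f(x, y) / f_eta(y), a p.d.f. in x for each fixed y.

def f(x, y):                             # joint p.d.f. on the unit square
    return x + y

n = 100_000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]  # midpoint rule on [0, 1]

y = 0.3
f_eta = sum(f(x, y) for x in xs) * dx    # marginal f_eta(y) = 1/2 + y
cond = [f(x, y) / f_eta for x in xs]     # conditional density x -> f(x|y)

total = sum(cond) * dx                            # should be 1
cond_mean = sum(x * c for x, c in zip(xs, cond)) * dx  # E(xi | eta = y)
print(round(total, 6), round(cond_mean, 6))  # ~1.0 and (1/3 + 0.15)/0.8 ~ 0.604167
```

Here the exact values are fη(0.3) = 0.8 and E(ξ|η = 0.3) = (1/3 + 0.15)/0.8, so the grid answers can be compared directly.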

13.7 Summary

This is a summary of the main concepts defined in this chapter and their mutual relationships.

I. 1. E(ξ|G): conditional expectation of ξ given G
   2. P(A|G): conditional probability of A ∈ F given G

Relationship: P(A|G) = E(χA|G).

II. 1. Pξ|G(A,ω): regular conditional probability of ξ given G (A ∈ σ(ξ)) (exists if ξ(Ω) ∈ B)
    2. Qξ|G(B,ω): regular conditional distribution of ξ given G (B ∈ B) (always exists)

Relationship, when they both exist: for a.e. ω ∈ Ω,

Qξ|G(B,ω) = Pξ|G(ξ⁻¹B,ω) for all B ∈ B.

If G = σ(η) all concepts in I and II retain their name with “given η” replacing “given G”.

III. 1. E(ξ|η = y): conditional expectation of ξ given η = y.
     2. P(A|η = y): conditional probability of A ∈ F given η = y.

Relationship to I:

E(ξ|η = y) = f(y) a.e. (Pη⁻¹) if and only if E(ξ|η) = f(η) a.s.
P(A|η = y) = f(y) a.e. (Pη⁻¹) if and only if P(A|η) = f(η) a.s.

     3. Q̃ξ|η(B, y): regular conditional distribution of ξ given η = y (B ∈ B) (always exists)


Relationship to II:

Qξ|η(B,ω) = Q̃ξ|η(B, η(ω)) for all B ∈ B, ω ∉ N ∈ σ(η) with P(N) = 0.

Exercises

13.1 Let ξ be a r.v. with E|ξ| < ∞ and G a purely atomic sub-σ-field of F, i.e. G is generated by the disjoint events {E0, E1, E2, . . .} with P(E0) = 0, P(En) > 0 for n = 1, 2, . . . and Ω = ∪_{n≥0} En. Using the definition of E(ξ|G) given in Section 13.2 show that

E(ξ|G) = Σ_{n≥1} χEn (1/P(En)) ∫_{En} ξ dP a.s.

(Hint: Show first that every set E in G is the union of a subsequence of {En, n ≥ 0}.)

13.2 If the r.v.’s ξ and η are such that E|ξ| < ∞ and η is bounded then show that

E[E(ξ|G)η] = E[ξE(η|G)] = E[E(ξ|G)E(η|G)].

13.3 Let ξ, η, ζ be r.v.’s with E|ξ| < ∞ and η independent of the pair ξ, ζ. Show that

E(ξ|η, ζ) = E(ξ|ζ) a.s.

Show also that if ξ is a Borel measurable function of η and ζ (ξ = f(η, ζ)) then it is a Borel measurable function of ζ only (ξ = g(ζ)).

13.4 State and prove the conditional form of the Hölder and Minkowski Inequalities.

13.5 If ξ ∈ Lp(Ω,F , P), p ≥ 1, show that E(ξ|G) ∈ Lp(Ω,F , P) and

||E(ξ|G)||_p = E^{1/p}[|E(ξ|G)|^p] ≤ E^{1/p}(|ξ|^p) = ||ξ||_p.

(Hint: Use the Conditional Jensen’s Inequality (Theorem 13.2.9).)

13.6 Two r.v.’s ξ and η in L2(Ω,F, P) are called orthogonal if E(ξη) = 0. Let ξ ∈ L2(Ω,F, P); then E(ξ|G) ∈ L2(Ω,F, P) by Ex. 13.5. Show that E(ξ|G) is the unique r.v. η ∈ L2(Ω,G, P_G) which minimizes E(ξ – η)² and that the minimum value is

E(ξ²) – E{E²(ξ|G)}.

E(ξ|G) is called the (in general, nonlinear) mean square estimate of ξ based on G. (Hint: Show that ξ – E(ξ|G) is orthogonal to all r.v.’s in L2(Ω,G, P_G), so that E(ξ|G) is the projection of ξ onto L2(Ω,G, P_G), and that for every η ∈ L2(Ω,G, P_G), E(ξ – η)² = E{ξ – E(ξ|G)}² + E{η – E(ξ|G)}².)

In particular, if η is a r.v., then E(ξ|η) is the unique r.v. ζ ∈ L2(Ω,σ(η), P_{σ(η)}) which minimizes E(ξ – ζ)², or equivalently h(η) = E(ξ|η) is the unique function g ∈ L2(R,B, Pη⁻¹) which minimizes E[ξ – g(η)]². E(ξ|η) is called the (in general, nonlinear) mean square estimate or least square regression of ξ based on η. It follows from Ex. 13.12 that if ξ and η have a joint normal distribution then E(ξ|η) = a + bη a.s. and thus the least squares regression of ξ based on η is linear.
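The least-squares property in Ex. 13.6 is easy to see by simulation. The sketch below uses the illustrative model ξ = η² + noise with standard normal η (an assumption chosen for convenience, not from the text), so that E(ξ|η) = η² exactly while cov(ξ, η) = Eη³ = 0 and the best predictor that is linear in η is the constant Eξ = 1:

```python
# Monte Carlo sketch of the least-squares property of Ex. 13.6: among all
# predictors g(eta), the conditional expectation E(xi | eta) minimizes
# E(xi - g(eta))^2.  Illustrative model: xi = eta^2 + noise, eta ~ N(0, 1),
# so E(xi | eta) = eta^2 and the best linear predictor is the constant 1.
import random

random.seed(0)
n = 200_000
eta = [random.gauss(0, 1) for _ in range(n)]
xi = [e * e + random.gauss(0, 1) for e in eta]

def mse(g):
    """Empirical mean square error E(xi - g(eta))^2."""
    return sum((x - g(e)) ** 2 for x, e in zip(xi, eta)) / n

best = mse(lambda e: e * e)      # the conditional expectation E(xi | eta)
linear = mse(lambda e: 1.0)      # best linear predictor (constant, since cov = 0)
print(best < linear)  # True: the nonlinear regression strictly beats the linear one
```

Here `best` is close to the noise variance 1 while `linear` is close to Var(η²) + 1 = 3, illustrating the decomposition E(ξ – g(η))² = E{ξ – E(ξ|η)}² + E{g(η) – E(ξ|η)}² from the hint.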

13.7 Prove the conditional form of Jensen’s Inequality (Theorem 13.2.9) by using regular conditional distributions and the ordinary form of Jensen’s Inequality (Theorem 9.5.4).

13.8 Let ξ and η be independent r.v.’s. Show that for every Borel set B,

P(ξ + η ∈ B|η)(ω) = Pξ⁻¹{B – η(ω)} a.s.

where B – y = {x : x + y ∈ B}. What is then P(ξ + η ∈ B|η = y) equal to? Show also that

Qξ+η|η(B,ω) = Pξ⁻¹{B – η(ω)}

is a regular conditional distribution of ξ + η given η.

13.9 Let G be a sub-σ-field of F. We say that a family of classes of events {Aλ, λ ∈ Λ} is conditionally independent given G if

P(∩_{k=1}^n Aλk |G) = Π_{k=1}^n P(Aλk |G) a.s.

for any n, any λ1, . . . , λn ∈ Λ and any Aλk ∈ Aλk, k = 1, . . . , n. Generalize the Kolmogorov Zero-One Law to conditional independence: if {ξn}_{n=1}^∞ is a sequence of conditionally independent r.v.’s given G and A is a tail event, show that

P(A|G) = 0 or 1 a.s.,

and if ξ is a tail r.v., show that ξ = η a.s. for some G-measurable r.v. η.

13.10 Let ξ and η be r.v.’s with E|ξ| < ∞. If y ∈ R is such that P(η = y) > 0 then show that E(ξ|η = y) as defined in Section 13.5 is given by

E(ξ|η = y) = (1/P(η = y)) ∫_{η=y} ξ dP.

(Hint: Let D be the at most countable set of points y ∈ R such that P(η = y) > 0. Define f : R → R by f(y) = (1/P(η = y)) ∫_{η=y} ξ dP if y ∈ D and f(y) = E(ξ|η = y) if y ∉ D, and show that for all Borel sets B, ∫_B f dPη⁻¹ = ∫_{η⁻¹B} ξ dP.)

13.11 Let ξ be a r.v. and η a discrete r.v. with values y1, y2, . . . . Find expressions for the regular conditional probability of ξ given η and for the regular conditional distribution of ξ given η and given η = y. Simplify further these expressions when ξ is discrete with values x1, x2, . . . .


13.12 Let the r.v.’s ξ1 and ξ2 have a joint normal distribution with E(ξi) = μi, var(ξi) = σi² > 0, i = 1, 2, and E{(ξ1 – μ1)(ξ2 – μ2)} = ρσ1σ2, |ρ| < 1, i.e. ξ1 and ξ2 have the joint p.d.f.

(1/(2πσ1σ2√(1 – ρ²))) exp{ –[1/(2(1 – ρ²))] [(x1 – μ1)²/σ1² – 2ρ(x1 – μ1)(x2 – μ2)/(σ1σ2) + (x2 – μ2)²/σ2²] }.

Find the regular conditional density of ξ1 given ξ2 = x2 and show that

E(ξ1|ξ2) = μ1 + ρ(σ1/σ2)(ξ2 – μ2) a.s.

(What happens when |ρ| = 1?)

13.13 Let the r.v.’s ξ and η be such that ξ has a uniform distribution on [0, 1] and the (regular) conditional distribution of η given ξ = x, x ∈ [0, 1], is uniform on [–x, x]. Find the regular conditional densities of ξ given η = y and of η given ξ = x, and the conditional expectations E(ξ + η|ξ) and E(ξ + η|η).
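The conditional mean formula of Ex. 13.12 can be illustrated by simulation: sample the jointly normal pair and average ξ1 over a thin slice of values of ξ2 near a point y. The parameter values and the slice width below are arbitrary illustrative choices:

```python
# Simulation sketch of Ex. 13.12: for jointly normal (xi1, xi2),
# E(xi1 | xi2 = y) = mu1 + rho * (sigma1 / sigma2) * (y - mu2).
import random

random.seed(1)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
n = 400_000

pairs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x2 = mu2 + s2 * z1                                   # xi2
    x1 = mu1 + s1 * (rho * z1 + (1 - rho**2) ** 0.5 * z2)  # xi1 correlated with xi2
    pairs.append((x1, x2))

# Empirical conditional mean of xi1 over a thin slice where xi2 is near y:
y = -1.8
slice_x1 = [x1 for x1, x2 in pairs if abs(x2 - y) < 0.02]
emp = sum(slice_x1) / len(slice_x1)
theory = mu1 + rho * (s1 / s2) * (y - mu2)   # = 1.48 for these parameters
print(round(emp, 2), round(theory, 2))
```

The slice average approximates the conditional expectation up to Monte Carlo noise and a small bias from the nonzero slice width.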

14

Martingales

14.1 Definition and basic properties

In this chapter we consider the notion of a martingale sequence, which has many of the useful properties of a sequence of partial sums of independent r.v.’s (with zero means) and which forms the basis of a significant segment of basic probability theory.

As usual, (Ω,F, P) will denote a fixed probability space. Let {ξn} be a sequence of r.v.’s and {Fn} a sequence of sub-σ-fields of F. Where nothing else is specified in writing sequences such as {ξn}, {Fn} etc. it will be assumed that the range of n is the set of positive integers {1, 2, . . .}. We say that {ξn,Fn} is a martingale (respectively, a submartingale, a supermartingale) if for every n,

(i) Fn ⊂ Fn+1

(ii) ξn is Fn-measurable and integrable

(iii) E(ξn+1|Fn) = ξn (resp. ≥ ξn, ≤ ξn) a.s.

This definition trivially contains the notion of {ξn,Fn, 1 ≤ n ≤ N} being a martingale (respectively, a submartingale, a supermartingale); just take ξn = ξN and Fn = FN for all n > N. Clearly {ξn,Fn} is a submartingale if and only if {–ξn,Fn} is a supermartingale. Thus the properties of supermartingales can be obtained from those of submartingales and in the sequel only martingales and submartingales will typically be considered.

Example 1 Let {ξn} be a sequence of independent r.v.’s in L1 with zero means and let

Sn = ξ1 + · · · + ξn, Fn = σ(ξ1, . . . , ξn), n = 1, 2, . . . .


Then {Sn,Fn} is a martingale since for every n, Sn is clearly Fn-measurable and integrable, and

E(Sn+1|Fn) = E(ξn+1 + Sn|Fn) = E(ξn+1|Fn) + E(Sn|Fn) = Eξn+1 + Sn = Sn a.s.

since Sn is Fn-measurable, σ(ξn+1) and Fn are independent, and Eξn+1 = 0.
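A quick simulation sketch of Example 1: the martingale property implies in particular that the next increment averages to zero even after conditioning on the past, e.g. on the sign of the current partial sum. The fair ±1 step distribution below is an illustrative choice:

```python
# Simulation sketch of Example 1: S_n = xi_1 + ... + xi_n with i.i.d. mean-zero
# steps.  E(S_{n+1} | F_n) = S_n means the (n+1)-th increment has conditional
# mean 0; here we check this after conditioning on the sign of S_n.
import random

random.seed(2)
n_paths, n = 100_000, 10
inc_given_pos, inc_given_neg = [], []

for _ in range(n_paths):
    s = 0.0
    for _ in range(n):
        s += random.choice((-1.0, 1.0))   # fair +/-1 steps, mean 0
    nxt = random.choice((-1.0, 1.0))      # the (n+1)-th increment
    (inc_given_pos if s > 0 else inc_given_neg).append(nxt)

mean_pos = sum(inc_given_pos) / len(inc_given_pos)
mean_neg = sum(inc_given_neg) / len(inc_given_neg)
print(round(mean_pos, 2), round(mean_neg, 2))  # both close to 0
```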

Example 2 Let {ξn} be a sequence of independent r.v.’s in L1 with finite, nonzero means Eξn = μn, and let

ηn = Π_{k=1}^n (ξk/μk),  Fn = σ(ξ1, . . . , ξn),  n = 1, 2, . . . .

Then {ηn,Fn} is a martingale since for every n, ηn is clearly Fn-measurable and integrable, and

E(ηn+1|Fn) = E{(ξn+1/μn+1) ηn|Fn} = ηn E(ξn+1/μn+1|Fn) = ηn E(ξn+1/μn+1) = ηn a.s.

since ηn is Fn-measurable, and σ(ξn+1) and Fn are independent.

Example 3 Let ξ be an integrable r.v. and {Fn} an increasing sequence of sub-σ-fields of F (i.e. Fn ⊂ Fn+1, n = 1, 2, . . .). Let

ξn = E(ξ|Fn) for n = 1, 2, . . . .

Then {ξn,Fn} is a martingale since for each n, ξn is Fn-measurable and integrable, and

E(ξn+1|Fn) = E{E(ξ|Fn+1)|Fn} = E(ξ|Fn) = ξn a.s.

by Theorem 13.2.2 since Fn ⊂ Fn+1. It will be shown in Section 14.3 that a martingale {ξn,Fn} is of this type, i.e. ξn = E(ξ|Fn) for some ξ ∈ L1, if and only if the sequence {ξn} is uniformly integrable.

The following results contain the simplest properties of martingales.

Theorem 14.1.1 (i) If {ξn,Fn} and {ηn,Fn} are two martingales (resp. submartingales, supermartingales) then for any real numbers a and b (resp. nonnegative numbers a and b), {aξn + bηn,Fn} is a martingale (resp. submartingale, supermartingale).


(ii) If {ξn,Fn} is a martingale (resp. submartingale, supermartingale) then the sequence {Eξn} is constant (resp. nondecreasing, nonincreasing).

(iii) Let {ξn,Fn} be a submartingale (resp. supermartingale). Then {ξn,Fn} is a martingale if and only if the sequence {Eξn} is constant.

Proof (i) is obvious from the linearity of conditional expectation (Theorem 13.2.1 (ii)).

(ii) If {ξn,Fn} is a martingale we have for every n = 1, 2, . . . , E(ξn+1|Fn) = ξn a.s. and thus

Eξn+1 = E{E(ξn+1|Fn)} = Eξn.

Similarly for a sub- and supermartingale.

(iii) The “only if” part follows from (ii). For the “if” part assume that {ξn,Fn} is a submartingale and that {Eξn} is constant. Then for all n,

E{E(ξn+1|Fn) – ξn} = Eξn+1 – Eξn = 0

and since E(ξn+1|Fn) – ξn ≥ 0 a.s. (from the definition of a submartingale) and E(ξn+1|Fn) – ξn ∈ L1, it follows (Theorem 4.4.7) that

E(ξn+1|Fn) – ξn = 0 a.s.

Hence {ξn,Fn} is a martingale. □

The next theorem shows that any martingale is also a martingale relative to σ(ξ1, . . . , ξn), and extends property (iii) of the martingale (submartingale, supermartingale) definitions.

Theorem 14.1.2 If {ξn,Fn} is a martingale, then so is {ξn,σ(ξ1, . . . , ξn)} and for all n, k = 1, 2, . . .

E(ξn+k|Fn) = ξn a.s.

with corresponding statements for sub- and supermartingales.

Proof If {ξn,Fn} is a martingale, since for every n, ξn is Fn-measurable and F1 ⊂ F2 ⊂ . . . ⊂ Fn, we have

σ(ξ1, . . . , ξn) ⊂ Fn.

It follows from Theorem 13.2.2 and Theorem 13.2.1 (v) that

E(ξn+1|σ(ξ1, . . . , ξn)) = E{E(ξn+1|Fn)|σ(ξ1, . . . , ξn)} = E{ξn|σ(ξ1, . . . , ξn)} = ξn a.s.

so that {ξn,σ(ξ1, . . . , ξn)} is indeed a martingale.


The equality E(ξn+k|Fn) = ξn a.s. holds for k = 1 and all n by the definition of a martingale. If it holds for some k and all n, then it also holds for k + 1 and all n since

E(ξn+k+1|Fn) = E{E(ξn+k+1|Fn+k)|Fn} = E{ξn+k|Fn} = ξn a.s.

by Theorem 13.2.2 (Fn ⊂ Fn+k), the definition of a martingale, and the inductive hypothesis. The result thus follows for all n and k.

The corresponding statements for submartingales and supermartingales follow with the obvious changes. □

In the sequel the statement that “{ξn} is a martingale or sub-, supermartingale” without reference to σ-fields {Fn} will mean that Fn is to be understood to be σ(ξ1, . . . , ξn).

The following result shows that appropriate convex functions of martingales (submartingales) are submartingales.

Theorem 14.1.3 Let {ξn,Fn} be a martingale (resp. a submartingale) and g a convex (resp. a convex nondecreasing) function on the real line. If g(ξn) is integrable for all n, then {g(ξn),Fn} is a submartingale.

Proof Since g is Borel measurable, g(ξn) is Fn-measurable for all n. Also, since g is convex and ξn, g(ξn) are integrable, Theorem 13.2.9 gives

g(E{ξn+1|Fn}) ≤ E{g(ξn+1)|Fn} a.s.

for all n. If {ξn,Fn} is a martingale then E(ξn+1|Fn) = ξn a.s. and thus

g(ξn) ≤ E{g(ξn+1)|Fn} a.s.

which shows that {g(ξn),Fn} is a submartingale. If {ξn,Fn} is a submartingale then E(ξn+1|Fn) ≥ ξn a.s. and if g is nondecreasing we have

g(ξn) ≤ g(E{ξn+1|Fn}) ≤ E{g(ξn+1)|Fn} a.s.

which again shows that {g(ξn),Fn} is a submartingale. □

The following properties follow immediately from this theorem.

Corollary (i) If {ξn,Fn} is a submartingale, so is {ξn⁺,Fn} (where ξ⁺ = ξ for ξ ≥ 0 and ξ⁺ = 0 for ξ < 0).

(ii) If {ξn,Fn} is a martingale then {|ξn|,Fn} is a submartingale, and so is {|ξn|^p,Fn}, 1 < p < ∞, provided ξn ∈ Lp for all n.


A connection between martingales and submartingales is given in the following.

Theorem 14.1.4 (Doob’s Decomposition) Every submartingale {ξn,Fn} can be uniquely decomposed as

ξn = ηn + ζn for all n, a.s.

where {ηn,Fn} is a martingale and the sequence of r.v.’s {ζn} is such that

ζ1 = 0 a.s.,
ζn ≤ ζn+1 for all n a.s.,
ζn+1 is Fn-measurable for all n.

{ζn} is called the predictable increasing sequence¹ associated with the submartingale {ξn}.

Proof Define

η1 = ξ1, ζ1 = 0

and for n ≥ 2

ηn = ξ1 + Σ_{k=2}^n {ξk – E(ξk|Fk–1)},  ζn = Σ_{k=2}^n {E(ξk|Fk–1) – ξk–1}

or equivalently

ηn = ηn–1 + ξn – E(ξn|Fn–1),  ζn = ζn–1 + E(ξn|Fn–1) – ξn–1.

Then η1 + ζ1 = ξ1 and for all n ≥ 2

ηn + ζn = ξ1 + Σ_{k=2}^n ξk – Σ_{k=2}^n ξk–1 = ξn a.s.

Now {ηn,Fn} is a martingale, since for all n, ηn is clearly Fn-measurable and integrable and

E(ηn+1|Fn) = E{ηn + ξn+1 – E(ξn+1|Fn)|Fn} = ηn + E(ξn+1|Fn) – E(ξn+1|Fn) = ηn a.s.

Also, ζ1 = 0 by definition, and for all n, ζn+1 is clearly Fn-measurable and integrable, and the submartingale property E(ξn+1|Fn) ≥ ξn a.s. implies that

ζn+1 = ζn + E(ξn+1|Fn) – ξn ≥ ζn a.s.

Thus {ζn} has the stated properties.

¹ This terminology is most evident when e.g. Fn = σ(ξ1, . . . , ξn), so that ζn+1 ∈ Fn implies that ζn+1 may be written as a function of (ξ1, . . . , ξn) and so is “predictable” from these values.


The uniqueness of the decomposition is shown as follows. Let ξn = η′n + ζ′n be another decomposition with {η′n} and {ζ′n} having the same properties as {ηn} and {ζn}. Then for all n,

ηn – η′n = ζ′n – ζn = θn,

say. Since {ηn,Fn} and {η′n,Fn} are martingales, so is {θn,Fn} so that

E(θn+1|Fn) = θn for all n a.s.

Also, since ζn+1 and ζ′n+1 are Fn-measurable, so is θn+1 and thus

E(θn+1|Fn) = θn+1 for all n a.s.

It follows that θ1 = · · · = θn = θn+1 = · · · a.s. and since θ1 = 0 a.s. we have θn = 0 for all n a.s. and thus

η′n = ηn and ζ′n = ζn for all n a.s. □
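The decomposition can be computed explicitly in simple cases. For the fair ±1 random walk Sn of Example 1, ξn = Sn² is a submartingale with E(ξk+1|Fk) = Sk² + 1, so the construction in the proof gives ζn = n – 1 and ηn = Sn² – (n – 1). A sketch (the walk itself is an illustrative choice):

```python
# Doob decomposition sketch for the submartingale xi_n = S_n^2, where S_n is
# a fair +/-1 random walk.  Here E(xi_{k+1} | F_k) = S_k^2 + 1, so the
# predictable part is zeta_n = n - 1 and eta_n = S_n^2 - (n - 1).
import random

random.seed(3)
N = 8
S = [0.0]
for _ in range(N):
    S.append(S[-1] + random.choice((-1.0, 1.0)))
xi = [s * s for s in S[1:]]                    # xi_1, ..., xi_N

eta, zeta = [xi[0]], [0.0]                     # eta_1 = xi_1, zeta_1 = 0
for k in range(1, N):
    cond = xi[k - 1] + 1                       # E(xi_{k+1} | F_k) = S_k^2 + 1
    eta.append(eta[-1] + xi[k] - cond)         # martingale part
    zeta.append(zeta[-1] + cond - xi[k - 1])   # predictable increasing part

assert all(abs(e + z - x) < 1e-12 for e, z, x in zip(eta, zeta, xi))
print(zeta)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]: zeta_n = n - 1
```

Whatever the realized path, the predictable part here is deterministic (each increment of ζ equals 1), matching the closed form ζn = n – 1.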

14.2 Inequalities

There are a number of basic and useful inequalities for probabilities, moments and “crossings” of submartingales, and the simpler of these are given in this section. The first provides a martingale form of Kolmogorov’s Inequality (Theorem 11.5.1).

Theorem 14.2.1 If {(ξn,Fn) : 1 ≤ n ≤ N} is a submartingale, then for all real a

a P{max_{1≤n≤N} ξn ≥ a} ≤ ∫_{max_{1≤n≤N} ξn ≥ a} ξN dP ≤ E|ξN|.

Proof Define (as in the proof of Theorem 11.5.1)

E = {ω : max_{1≤n≤N} ξn(ω) ≥ a}
E1 = {ω : ξ1(ω) ≥ a}
En = {ω : ξn(ω) ≥ a} ∩ ∩_{k=1}^{n–1} {ω : ξk(ω) < a}, n = 2, . . . , N.

Then En ∈ Fn for all n = 1, . . . , N, {En} are disjoint and E = ∪_{n=1}^N En. Thus

∫_E ξN dP = Σ_{n=1}^N ∫_{En} ξN dP.

Now for each n = 1, . . . , N,

∫_{En} ξN dP = ∫_{En} E(ξN|Fn) dP ≥ ∫_{En} ξn dP ≥ a P(En)


since En ∈ Fn, E(ξN|Fn) ≥ ξn by Theorem 14.1.2, and ξn ≥ a on En. It follows that

∫_E ξN dP ≥ a Σ_{n=1}^N P(En) = a P(E).

This proves the left half of the inequality of the theorem and the right half is obvious. □

That Theorem 14.2.1 contains Kolmogorov’s Inequality (Theorem 11.5.1) follows from Example 1 and the following corollary.

Corollary Let {(ξn,Fn) : 1 ≤ n ≤ N} be a martingale and a > 0. Then

(i) P{max_{1≤n≤N} |ξn| ≥ a} ≤ (1/a) ∫_{max_{1≤n≤N} |ξn| ≥ a} |ξN| dP ≤ E|ξN|/a.

(ii) If also EξN² < ∞, then

P{max_{1≤n≤N} |ξn| ≥ a} ≤ EξN²/a².

Proof Since {(ξn,Fn) : 1 ≤ n ≤ N} is a martingale, {(|ξn|,Fn) : 1 ≤ n ≤ N} is a submartingale ((ii) of Theorem 14.1.3, Corollary) and (i) follows from the theorem.

For (ii) we will show that EξN² < ∞ implies Eξn² < ∞ for all n = 1, . . . , N. Then by part (ii) of the corollary to Theorem 14.1.3, {(ξn²,Fn) : 1 ≤ n ≤ N} is a submartingale and (ii) follows from the theorem.

To show that if {(ξn,Fn) : 1 ≤ n ≤ N} is a martingale and EξN² < ∞, then Eξn² < ∞ for all n = 1, . . . , N, we define gk on the real line for each k = 1, 2, . . . , by

gk(x) = x² for |x| ≤ k,
gk(x) = 2k(|x| – k/2) for |x| > k.

Then each gk is convex and gk(x) ↑ x² for all real x. For each fixed k = 1, 2, . . . , since for all n = 1, . . . , N,

E|gk(ξn)| = ∫_{|ξn|≤k} ξn² dP + ∫_{|ξn|>k} 2k(|ξn| – k/2) dP ≤ k² + 2kE|ξn| < ∞,

it follows from Theorem 14.1.3 that {(gk(ξn),Fn) : 1 ≤ n ≤ N} is a submartingale and thus, by Theorem 14.1.1 (ii),

0 ≤ E{gk(ξ1)} ≤ . . . ≤ E{gk(ξN)} < ∞.


Since gk(x) ↑ x² for each x as k → ∞, the monotone convergence theorem implies that for each n = 1, . . . , N, E{gk(ξn)} ↑ Eξn². Hence we have

0 ≤ Eξ1² ≤ . . . ≤ EξN²

and the result follows since EξN² < ∞. □
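A Monte Carlo sketch of part (ii) of the Corollary for the fair ±1 random walk martingale of Example 1, for which EξN² = N (the parameter values are illustrative):

```python
# Monte Carlo check of the martingale maximal inequality (Corollary (ii)):
# P{ max_{n <= N} |xi_n| >= a } <= E(xi_N^2) / a^2, for the fair +/-1
# random walk S_n (so E(S_N^2) = N).
import random

random.seed(4)
N, a, n_paths = 25, 8.0, 50_000
hits, sq = 0, 0.0

for _ in range(n_paths):
    s, m = 0.0, 0.0
    for _ in range(N):
        s += random.choice((-1.0, 1.0))
        m = max(m, abs(s))
    hits += m >= a                 # path where the running max reached a
    sq += s * s                    # accumulates S_N^2

p_max = hits / n_paths             # estimates P{max |S_n| >= a}
bound = (sq / n_paths) / a**2      # estimates E(S_N^2)/a^2 = 25/64
print(p_max <= bound)
```

With these values the right side is about 0.39 while the left side is well below it, so the inequality holds with room to spare; the bound is not tight in general.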

As a consequence of Theorem 14.2.1, the following inequality holds for nonnegative submartingales.

Theorem 14.2.2 If {(ξn,Fn) : 1 ≤ n ≤ N} is a submartingale such that ξn ≥ 0 a.s., n = 1, . . . , N, then for all p > 1,

E(max_{1≤n≤N} ξn^p) ≤ (p/(p – 1))^p E(ξN^p).

Proof Define ζ = max_{1≤n≤N} ξn and η = ξN. Then ζ, η ≥ 0 a.s. and it follows from Theorem 14.2.1 that for all x > 0,

G(x) = P{ζ > x} ≤ (1/x) ∫_{ζ≥x} η dP.

Now by applying the monotone convergence theorem and Fubini’s Theorem (i.e. integration by parts) we obtain

E(ζ^p) = ∫_0^∞ x^p d{1 – G(x)} = ∫_0^∞ x^p d{–G(x)}
= lim_{A↑∞} ∫_0^A x^p d{–G(x)}
= lim_{A↑∞} {p ∫_0^A x^{p–1} G(x) dx – A^p G(A)}
≤ lim_{A↑∞} p ∫_0^A x^{p–1} G(x) dx = p ∫_0^∞ x^{p–1} G(x) dx
≤ p ∫_0^∞ x^{p–1} (1/x) (∫_{ζ≥x} η dP) dx

by the inequality for G shown above. Change of integration order thus gives

E(ζ^p) ≤ p ∫_Ω η(ω) (∫_0^{ζ(ω)} x^{p–2} dx) dP(ω)
= (p/(p – 1)) ∫_Ω η(ω) ζ^{p–1}(ω) dP(ω) = (p/(p – 1)) E(ηζ^{p–1})
≤ (p/(p – 1)) E^{1/p}(η^p) E^{(p–1)/p}(ζ^p)

by Hölder’s Inequality. It follows that E^{1/p}(ζ^p) ≤ (p/(p – 1)) E^{1/p}(η^p), which implies the result. □

The following corollary follows immediately from the theorem and (ii) of Theorem 14.1.3, Corollary.


Corollary If {(ξn,Fn) : 1 ≤ n ≤ N} is a martingale and p > 1, then

E(max_{1≤n≤N} |ξn|^p) ≤ (p/(p – 1))^p E|ξN|^p.

The final result of this section is an inequality for the number of “upcrossings” of a submartingale, which will be pivotal in the next section in deriving the submartingale convergence theorem. This requires the following definitions and notation. Let {x1, . . . , xN} be a finite sequence of real numbers and let a < b be real numbers. Let τ1 be the first integer in {1, . . . , N} such that xτ1 ≤ a, τ2 the first integer in {1, . . . , N} larger than τ1 such that xτ2 ≥ b, τ3 the first integer in {1, . . . , N} larger than τ2 such that xτ3 ≤ a, τ4 the first integer in {1, . . . , N} larger than τ3 such that xτ4 ≥ b, and so on, and define τi = N + 1 if the condition cannot be satisfied. In other words,

τ1 = min{j : 1 ≤ j ≤ N, xj ≤ a},
τ2 = min{j : τ1 < j ≤ N, xj ≥ b},
τ2k+1 = min{j : τ2k < j ≤ N, xj ≤ a}, 3 ≤ 2k + 1 ≤ N,
τ2k+2 = min{j : τ2k+1 < j ≤ N, xj ≥ b}, 4 ≤ 2k + 2 ≤ N,

and τi = N + 1 if the corresponding set is empty. Let M be the number of τi that do not exceed N. Then the number of upcrossings U[a,b] of the interval [a, b] by the sequence {x1, . . . , xN} is defined by

U[a,b] = [M/2] = M/2 if M is even, (M – 1)/2 if M is odd,

and is the number of times the sequence (completely) crosses from ≤ a to ≥ b.
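The definition of U[a,b] by the stopping indices τi amounts to a simple alternating scan, which can be sketched as:

```python
# Upcrossing count U[a,b] as defined above: scan the sequence, alternately
# waiting for a value <= a (an odd tau) and then a value >= b (an even tau);
# each completed pair of indices is one upcrossing.
def upcrossings(xs, a, b):
    count = 0
    below = False            # True once x_j <= a has occurred (odd tau found)
    for x in xs:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            count += 1       # completed a crossing from <= a to >= b
            below = False
    return count

print(upcrossings([3, 0, 1, 4, -1, 2, 5, 0], a=0, b=3))  # 2
```

For the sequence shown, the τ’s are at the values 0, 4, –1, 5 (and a final 0 that starts an uncompleted crossing), so M = 5 and U[0,3] = [5/2] = 2, matching the printed count.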

Theorem 14.2.3 Let {(ξn,Fn) : 1 ≤ n ≤ N} be a submartingale, a < b real numbers, and let U[a,b](ω) be the number of upcrossings of the interval [a, b] by the sequence {ξ1(ω), . . . , ξN(ω)}. Then

EU[a,b] ≤ [E(ξN – a)⁺ – E(ξ1 – a)⁺]/(b – a) ≤ [EξN⁺ + a⁻]/(b – a).

Proof It should be checked that U[a,b](ω) is a r.v. This may be done by first showing that {τn(ω) : 1 ≤ n ≤ N} are r.v.’s and then using the definition of U[a,b] in terms of the τn’s.

Next assume first that a = 0 and ξn ≥ 0 for all n = 1, . . . , N. Define {ηn(ω) : 1 ≤ n ≤ N} by

ηn(ω) = 1 if τ2k–1(ω) ≤ n < τ2k(ω) for some k = 1, . . . , [N/2], and ηn(ω) = 0 otherwise.


We now show that each ηn is an Fn-measurable r.v. Since by definition {η1 = 1} = {ξ1 = 0}, η1 is an F1-measurable r.v. If ηn is Fn-measurable, 1 ≤ n < N, then it is clear from the definition of the ηn's that

{ηn+1 = 1} = {ηn = 1, 0 ≤ ξn+1 < b} ∪ {ηn = 0, ξn+1 = 0}

and thus ηn+1 is Fn+1-measurable. It follows by finite induction that each ηn is Fn-measurable. Define

ζ = ξ1 + ∑_{n=1}^{N–1} ηn(ξn+1 – ξn).

If M(ω) is the number of τn(ω)'s that do not exceed N, so that U[0,b](ω) = [M(ω)/2], then if M is even

ζ = ξ1 + ∑_{k=1}^{U[0,b]} (ξτ2k – ξτ2k–1)

and if M is odd

ζ = ξ1 + ∑_{k=1}^{U[0,b]} (ξτ2k – ξτ2k–1) + (ξN – ξτM).

Since ξτ2k – ξτ2k–1 ≥ b and ξN – ξτM = ξN – 0 ≥ 0, we have in either case, i.e. for all ω ∈ Ω,

ζ ≥ ξ1 + bU[0,b]

and thus

EU[0,b] ≤ (Eζ – Eξ1)/b.

Also

Eζ = Eξ1 + ∑_{n=1}^{N–1} E{ηn(ξn+1 – ξn)}.

Since ηn is Fn-measurable, 0 ≤ ηn ≤ 1, and E(ξn+1 – ξn|Fn) ≥ 0 by the submartingale property, we have for n = 1, . . . , N – 1,

E{ηn(ξn+1 – ξn)} = E(E{ηn(ξn+1 – ξn)|Fn}) = E(ηnE{ξn+1 – ξn|Fn}) ≤ E(E{ξn+1 – ξn|Fn}) = E(ξn+1 – ξn).

It follows that

Eζ ≤ Eξ1 + ∑_{n=1}^{N–1} E(ξn+1 – ξn) = EξN

and hence

EU[0,b] ≤ (EξN – Eξ1)/b.

For the general case note that the number of upcrossings of [a, b] by {ξn}_{n=1}^{N} is equal to the number of upcrossings of [0, b – a] by {ξn – a}_{n=1}^{N}, and this is also equal to the number of upcrossings of [0, b – a] by {(ξn – a)+ : 1 ≤ n ≤ N}. Since {(ξn,Fn) : 1 ≤ n ≤ N} is a submartingale, so is {(ξn – a,Fn) : 1 ≤ n ≤ N} and also {((ξn – a)+,Fn) : 1 ≤ n ≤ N} by (i) of the Corollary to Theorem 14.1.3. It follows from the particular case just considered that

EU[a,b] ≤ (E(ξN – a)+ – E(ξ1 – a)+)/(b – a) ≤ E(ξN – a)+/(b – a) ≤ (EξN+ + a–)/(b – a)

since (ξN – a)+ ≤ ξN+ + a–. �

14.3 Convergence

In this section it is shown that under mild conditions submartingales and martingales (and also supermartingales) converge almost surely. The convergence theorems which follow are very useful in probability and statistics. We start with a sufficient condition for a.s. convergence of a submartingale.

Theorem 14.3.1 Let {ξn,Fn} be a submartingale. If

lim_{n→∞} Eξn+ < ∞

then there is an integrable r.v. ξ∞ such that ξn → ξ∞ a.s.

Proof For every pair of real numbers a < b, let U^{(n)}_{[a,b]}(ω) be the number of upcrossings of [a, b] by {ξi(ω) : 1 ≤ i ≤ n}. Then {U^{(n)}_{[a,b]}(ω)} is a nondecreasing sequence of random variables and thus has a limit

U[a,b](ω) = lim_{n→∞} U^{(n)}_{[a,b]}(ω) a.s.

By monotone convergence and Theorem 14.2.3, we have

EU[a,b] = lim_{n→∞} EU^{(n)}_{[a,b]} ≤ lim_{n→∞} (Eξn+ + a–)/(b – a) < ∞,


so that U[a,b] < ∞ a.s. It follows that if

E[a,b] = {ω ∈ Ω : lim inf_n ξn(ω) < a < b < lim sup_n ξn(ω)}

then

P(E[a,b]) = 0 for all a < b.

Thus if

E = ∪_{a,b rational} E[a,b] = {ω ∈ Ω : lim inf_n ξn(ω) < lim sup_n ξn(ω)}

then P(E) = 0. It follows that lim inf_n ξn(ω) = lim sup_n ξn(ω) a.s. and thus the limit lim_{n→∞} ξn exists a.s. Denote this limit by ξ∞. Then, by Fatou's Lemma,

E|ξ∞| ≤ lim inf_n E|ξn|

and since (by Theorem 14.1.1 (ii)) Eξn ≥ Eξ1,

E|ξn| = E(2ξn+ – ξn) ≤ 2Eξn+ – Eξ1,

we obtain

E|ξ∞| ≤ lim inf_n {2Eξn+ – Eξ1} = 2 lim_n Eξn+ – Eξ1 < ∞.

Thus ξ∞ is integrable. �
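A standard concrete example (ours, not the text's): ξn = Y1 · · · Yn with the Yk i.i.d. taking the values 1/2 and 3/2 with probability 1/2 each is a nonnegative martingale (EYk = 1), so Eξn+ = 1 for all n and the theorem applies. Since E log Yk = (1/2) log(3/4) < 0, the a.s. limit is ξ∞ = 0; in particular the convergence cannot hold in L1, since Eξn = 1 for all n. A simulation in log scale:

```python
import math
import random

def log_product_martingale(n, rng):
    """log of one path of xi_n = Y_1 * ... * Y_n, with the Y_k i.i.d.
    uniform on {0.5, 1.5} (so E Y_k = 1 and xi_n is a martingale);
    working in log scale avoids floating-point underflow."""
    return sum(math.log(rng.choice((0.5, 1.5))) for _ in range(n))

rng = random.Random(0)
# E log Y_k < 0, so xi_n -> 0 a.s.: every simulated path ends up tiny
final_logs = [log_product_martingale(2000, rng) for _ in range(50)]
```

The terminal values cluster around exp(2000 · E log Yk), i.e. astronomically close to 0, even though every ξn has mean 1.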

The next theorem gives conditions under which the a.s. converging submartingale of Theorem 14.3.1 converges also in L1. Throughout the following, given a sequence of σ-fields {Fn}, we denote by F∞ the σ-field generated by ∪_{n=1}^{∞}Fn. Also, by including (ξ∞,F∞) in the sequence, we call {(ξn,Fn) : n = 1, 2, . . . ,∞} a martingale (respectively submartingale, supermartingale) if for all m, n in {1, 2, . . . ,∞} with m < n,

(i) Fm ⊂ Fn
(ii) ξn is Fn-measurable and integrable
(iii) E(ξn|Fm) = ξm a.s. (resp. ≥ ξm, ≤ ξm).

We have the following result.

Theorem 14.3.2 If {ξn,Fn} is a submartingale, the following are equivalent:

(i) the sequence {ξn} is uniformly integrable
(ii) the sequence {ξn} converges in L1
(iii) the sequence {ξn} converges a.s. to an integrable r.v. ξ∞ such that {(ξn,Fn) : n = 1, 2, . . . ,∞} is a submartingale and lim_n Eξn = Eξ∞.

Proof (i) ⇒ (ii): Since {ξn} is uniformly integrable, Theorem 11.4.1 implies sup_n E|ξn| < ∞ and thus, by Theorem 14.3.1, there is an integrable r.v. ξ∞ such that ξn → ξ∞ a.s. Since a.s. convergence implies convergence in probability, it follows from Theorem 11.4.2 that ξn → ξ∞ in L1.

(ii) ⇒ (iii): If ξn → ξ∞ in L1 we have by Theorem 11.4.2, E|ξn| → E|ξ∞| < ∞ and thus sup_n E|ξn| < ∞. It then follows from Theorem 14.3.1 that ξn → ξ∞ a.s.

In order to show that {(ξn,Fn) : n = 1, 2, . . . ,∞} is a submartingale it suffices to show that for all n = 1, 2, . . .

E(ξ∞|Fn) ≥ ξn a.s.

For every fixed n and E ∈ Fn, using the definition of conditional expectation and the convergence ξm → ξ∞ in L1 (which implies Eξm → Eξ∞),

∫_E E(ξ∞|Fn) dP = ∫_E ξ∞ dP = lim_{m→∞} ∫_E ξm dP = lim_{m→∞} ∫_E E(ξm|Fn) dP ≥ ∫_E ξn dP

since E(ξm|Fn) ≥ ξn a.s. for m > n. Thus E(ξ∞|Fn) ≥ ξn a.s. (see Ex. 4.14) and as already noted above lim_n Eξn = Eξ∞.

(iii) ⇒ (i): Since {(ξn,Fn) : n = 1, 2, . . . ,∞} is a submartingale, so is {(ξn+,Fn) : n = 1, 2, . . . ,∞}. Thus using the submartingale property repeatedly we have

∫_{ξn+>a} ξn+ dP ≤ ∫_{ξn+>a} E(ξ∞+|Fn) dP = ∫_{ξn+>a} ξ∞+ dP

and

P{ξn+ > a} ≤ (1/a)Eξn+ ≤ (1/a)E{E(ξ∞+|Fn)} = (1/a)Eξ∞+ → 0 as a → ∞

which clearly imply that {ξn+} is uniformly integrable.

Since ξn+ → ξ∞+ a.s. and thus also in probability, and since the sequence is uniformly integrable, it follows by Theorem 11.4.2 that ξn+ → ξ∞+ in L1, and hence that Eξn+ → Eξ∞+. Since by assumption Eξn → Eξ∞, it also follows that Eξn– → Eξ∞–. Since clearly ξn– → ξ∞– a.s. and hence in probability, Theorem 11.4.2 implies that {ξn–} is uniformly integrable.

Since ξn = ξn+ – ξn–, the uniform integrability of {ξn : n = 1, 2, . . .} follows (see Ex. 11.21). �


For martingales the following more detailed and useful result holds.

Theorem 14.3.3 If {ξn,Fn} is a martingale, the following are equivalent:

(i) the sequence {ξn} is uniformly integrable
(ii) the sequence {ξn} converges in L1
(iii) the sequence {ξn} converges a.s. to an integrable r.v. ξ∞ such that {(ξn,Fn) : n = 1, 2, . . . ,∞} is a martingale
(iv) there is an integrable r.v. η such that ξn = E(η|Fn) a.s. for all n = 1, 2, . . . .

Proof That (i) implies (ii) and (ii) implies (iii) follows from Theorem 14.3.2. That (iii) implies (i) is shown as in Theorem 14.3.2 by considering |ξn| instead of ξn+, and taking η = ξ∞ shows trivially that (iii) implies (iv).

(iv) ⇒ (i): Put ξ∞ = η. Then E(ξ∞|Fn) = E(η|Fn) = ξn and clearly {(ξn,Fn) : n = 1, 2, . . . ,∞} is a martingale, and thus {(|ξn|,Fn) : n = 1, 2, . . . ,∞} is a submartingale. We thus have

∫_{|ξn|>a} |ξn| dP ≤ ∫_{|ξn|>a} E(|ξ∞||Fn) dP = ∫_{|ξn|>a} |ξ∞| dP

and

P{|ξn| > a} ≤ (1/a)E|ξn| ≤ (1/a)E|ξ∞| → 0 as a → ∞,

which clearly imply that {ξn} is uniformly integrable. �

As a simple consequence of the previous theorem we have the following very useful result.

Theorem 14.3.4 Let ξ be an integrable r.v., {Fn} a sequence of sub-σ-fields of F such that Fn ⊂ Fn+1 for all n, and F∞ the σ-field generated by ∪_{n=1}^{∞}Fn. Then

lim_{n→∞} E(ξ|Fn) = E(ξ|F∞) a.s. and in L1.

Proof Let ξn = E(ξ|Fn), n = 1, 2, . . . . Then {ξn,Fn} is a martingale (by Example 3 in Section 14.1) which satisfies (iv) of Theorem 14.3.3. It follows by (ii) and (iii) of that theorem that there is an integrable r.v. ξ∞ such that

ξn → ξ∞ a.s. and in L1.

It suffices now to show that E(ξ|Fn) → E(ξ|F∞) a.s. Since by (iii) of Theorem 14.3.3, {(ξn,Fn) : n = 1, 2, . . . ,∞} is a martingale, we have that for all E ∈ Fn,

∫_E ξ∞ dP = ∫_E E(ξ∞|Fn) dP = ∫_E ξn dP = ∫_E E(ξ|Fn) dP = ∫_E ξ dP.

Hence ∫_E ξ∞ dP = ∫_E ξ dP for all sets E in Fn and thus in ∪_{n=1}^{∞}Fn. It is clear that the class of sets for which it holds is a D-class, and since it contains ∪_{n=1}^{∞}Fn (which is closed under intersections) it contains also F∞. Hence

∫_E ξ∞ dP = ∫_E ξ dP for all E ∈ F∞

and since ξ∞ = lim_n ξn is F∞-measurable, it follows that ξ∞ = E(ξ|F∞) a.s. �

A result similar to Theorem 14.3.4 is also true for decreasing (rather than increasing) sequences of σ-fields and follows easily if we introduce the concept of reverse submartingale and martingale as follows. Let {ξn} be a sequence of r.v.'s and {Fn} a sequence of sub-σ-fields of F. We say that {ξn,Fn} is a reverse martingale (respectively, submartingale, supermartingale) if for every n,

(i) Fn ⊃ Fn+1
(ii) ξn is Fn-measurable and integrable
(iii) E(ξn|Fn+1) = ξn+1 (resp. ≥ ξn+1, ≤ ξn+1) a.s.

The following convergence result corresponds to Theorem 14.3.1.

Theorem 14.3.5 Let {ξn,Fn} be a reverse submartingale. Then there is a r.v. ξ∞ such that ξn → ξ∞ a.s., and if

lim_{n→∞} Eξn > –∞

then ξ∞ is integrable.

Proof The proof is similar to that of Theorem 14.3.1. For each fixed n, define

ηk = ξn–k+1, Gk = Fn–k+1, k = 1, 2, . . . , n,

i.e. {η1,G1; η2,G2; . . . ; ηn,Gn} = {ξn,Fn; ξn–1,Fn–1; . . . ; ξ1,F1}. Then {(ηk,Gk) : 1 ≤ k ≤ n} is a submartingale since, by the reverse submartingale property,

E(ηk+1|Gk) = E(ξn–k|Fn–k+1) ≥ ξn–k+1 = ηk a.s.

If U^{(n)}_{[a,b]}(ω) denotes the number of upcrossings of the interval [a, b] by the sequence {ξn(ω), ξn–1(ω), . . . , ξ1(ω)}, then U^{(n)}_{[a,b]}(ω) is equal to the number of upcrossings of the interval [a, b] by the submartingale {η1(ω), . . . , ηn(ω)} and by Theorem 14.2.3 we have

EU^{(n)}_{[a,b]} ≤ (Eηn+ + a–)/(b – a) = (Eξ1+ + a–)/(b – a).

As in the proof of Theorem 14.3.1 it follows that the sequence {ξn} converges a.s., i.e. ξn → ξ∞ a.s. Again as in the proof of Theorem 14.3.1 we have by Fatou's Lemma,

E|ξ∞| ≤ lim inf_n E|ξn| and E|ξn| = 2Eξn+ – Eξn.

But now

Eξn+ = Eη1+ ≤ Eηn+ = Eξ1+

since {(ηk+,Gk) : 1 ≤ k ≤ n} is a submartingale. Also {Eξn} is clearly a nonincreasing sequence. Since lim_n Eξn > –∞ it follows that

E|ξ∞| ≤ 2Eξ1+ – lim_{n→∞} Eξn < ∞

and thus ξ∞ is integrable. �

Corollary If {ξn,Fn} is a reverse martingale, then there is an integrable r.v. ξ∞ such that ξn → ξ∞ a.s.

Proof If {ξn,Fn} is a reverse martingale, clearly the sequence {Eξn} is constant and thus lim_n Eξn = Eξ1 > –∞. The result then follows from the theorem. �

We now prove the result of Theorem 14.3.4 for decreasing sequences of σ-fields.

Theorem 14.3.6 Let ξ be an integrable r.v., {Fn} a sequence of sub-σ-fields of F such that Fn ⊃ Fn+1 for all n, and F∞ = ∩_{n=1}^{∞}Fn. Then

lim_{n→∞} E(ξ|Fn) = E(ξ|F∞) a.s. and in L1.

Proof Let ξn = E(ξ|Fn). Then {ξn,Fn} is a reverse martingale since Fn ⊃ Fn+1, ξn is Fn-measurable and integrable, and by Theorem 13.2.2,

E(ξn|Fn+1) = E{E(ξ|Fn)|Fn+1} = E(ξ|Fn+1) = ξn+1 a.s.

It follows from the corollary of Theorem 14.3.5 that ξn → ξ∞ a.s. for some integrable r.v. ξ∞.

We first show that ξn → ξ∞ in L1 as well. This follows from Theorem 11.4.2 since the sequence {ξn}_{n=1}^{∞} is uniformly integrable, as is seen from

∫_{|ξn|>a} |ξn| dP ≤ ∫_{|ξn|>a} E(|ξ||Fn) dP = ∫_{|ξn|>a} |ξ| dP

and

P{|ξn| > a} ≤ (1/a)E|ξn| ≤ (1/a)E|ξ| → 0 as a → ∞

since |ξn| = |E(ξ|Fn)| ≤ E(|ξ||Fn) a.s. and thus E|ξn| ≤ E|ξ|.

We now show that ξ∞ = E(ξ|F∞) a.s. For every E ∈ F∞ we have E ∈ Fn for all n, and since ξn = E(ξ|Fn) and ξn → ξ∞ in L1,

∫_E ξ dP = ∫_E ξn dP → ∫_E ξ∞ dP as n → ∞.

Hence ∫_E ξ dP = ∫_E ξ∞ dP for all E ∈ F∞. Also the relations ξ∞ = lim_n ξn a.s. and Fn ⊃ Fn+1 imply that ξ∞ is Fn-measurable for all n and thus F∞-measurable. It follows that ξ∞ = E(ξ|F∞) a.s. �

14.4 Centered sequences

In this section the results of Section 14.3 will be used to study the convergence of series and the law of large numbers for "centered" sequences of r.v.'s, a concept which generalizes that of a sequence of independent and zero mean r.v.'s. We will also give martingale proofs for some of the previous convergence results for sequences of independent r.v.'s.

A sequence of r.v.'s {ξn} is called centered if for every n = 1, 2, . . . , ξn is integrable and

E(ξn|Fn–1) = 0 a.s.

where Fn = σ(ξ1, . . . , ξn) and F0 = {∅,Ω}. For n = 1 this condition is just Eξ1 = 0 while for n > 1 it implies the weaker condition Eξn = 0. Fn will be assumed to be σ(ξ1, . . . , ξn) throughout this section unless otherwise stated. The basic properties of centered sequences are collected in the following theorem. Property (i) shows that results obtained for centered sequences are directly applicable to arbitrary sequences of integrable r.v.'s appropriately modified, i.e. centered.

Theorem 14.4.1 (i) If {ξn} is a sequence of integrable r.v.'s then the sequence {ξn – E(ξn|Fn–1)} is centered.

(ii) The sequence of partial sums of a centered sequence is a zero mean martingale, and conversely, every zero mean martingale is the sequence of partial sums of a centered sequence.

(iii) A sequence of independent r.v.'s {ξn} is centered if and only if for each n, ξn ∈ L1 and Eξn = 0.

(iv) If the sequence of r.v.'s {ξn} is centered and ξn ∈ L2 for all n, then the r.v.'s of the sequence are orthogonal: Eξnξm = 0 for all n ≠ m.

Proof (i) is obvious. For (ii) let {ξn} be centered and let Sn = ξ1 + · · · + ξn = Sn–1 + ξn for n = 1, 2, . . . , where S0 = 0. Then each Sn is integrable and Fn-measurable and

E(Sn|Fn–1) = E(Sn–1|Fn–1) + E(ξn|Fn–1) = Sn–1 a.s.

Note that Fn = σ(ξ1, . . . , ξn) = σ(S1, . . . , Sn). It follows that {Sn} is a martingale with zero mean since ES1 = Eξ1 = 0. Conversely, if {Sn} is a zero mean martingale, let ξn = Sn – Sn–1 for n = 1, 2, . . . , where S0 = 0. Then each ξn is Fn-measurable and

E(ξn|Fn–1) = E(Sn|Fn–1) – Sn–1 = 0 a.s.

Hence {ξn} is centered and clearly ξ1 + · · · + ξn = Sn – S0 = Sn.

(iii) follows immediately from the fact that for independent integrable r.v.'s {ξn} and all n = 1, 2, . . . we have from Theorem 10.3.2 that the σ-fields Fn–1 and σ(ξn) are independent and thus by Theorem 13.2.7,

E(ξn|Fn–1) = Eξn a.s.

(iv) Let {ξn} be centered, ξn ∈ L2 for all n, and m < n. Then since ξm is Fm ⊂ Fn–1-measurable and E(ξn|Fn–1) = 0 a.s. we have

E(ξnξm) = E{E(ξnξm|Fn–1)} = E{ξmE(ξn|Fn–1)} = E{0} = 0. �

We now prove for centered sequences of r.v.'s some of the convergence results shown in Sections 11.5 and 11.6 for sequences of independent r.v.'s. In view of Theorem 14.4.1 (iii), the following result on the convergence of series of centered r.v.'s generalizes the corresponding result for series of independent r.v.'s (Theorem 11.5.3).

Theorem 14.4.2 If {ξn} is a centered sequence of r.v.'s and if ∑_{n=1}^{∞} Eξn² < ∞, then the series ∑_{n=1}^{∞} ξn converges a.s. and in L2.

Proof Let Sn = ∑_{k=1}^{n} ξk. Then Sn ∈ L2 since by assumption Eξn² < ∞ for all n. It follows from Theorem 14.4.1 (iv) that for all m < n,

E(Sn – Sm)² = E(∑_{k=m+1}^{n} ξk)² = ∑_{k=m+1}^{n} Eξk² → 0 as m, n → ∞

since ∑_{k=1}^{∞} Eξk² < ∞. Hence {Sn}_{n=1}^{∞} is a Cauchy sequence in L2 and by Theorem 6.4.7 (i) there is a r.v. S ∈ L2 such that Sn → S in L2. Thus the series converges in L2. Now Theorem 9.5.2 shows that convergence in L2 implies convergence in L1 and thus Sn → S in L1. Since by Theorem 14.4.1 (ii), {Sn}_{n=1}^{∞} is a martingale, condition (ii) of Theorem 14.3.3 is satisfied and thus (by (iii) of that theorem) Sn → S a.s. and the series converges also a.s. �
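A classical example covered by this theorem (our illustration, not the text's): ξn = εn/n with εn independent random signs. Here E(ξn|Fn–1) = 0 and ∑ Eξn² = ∑ 1/n² < ∞, so the "random harmonic series" ∑ εn/n converges a.s. A quick simulation shows the partial sums settling down:

```python
import random

def partial_sums_random_harmonic(n, rng):
    """Partial sums S_m of sum eps_k / k for m = 1..n, where the
    eps_k are independent +-1 signs (a centered sequence with
    sum of E xi_k^2 = sum 1/k^2 < infinity)."""
    s, out = 0.0, []
    for k in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / k
        out.append(s)
    return out

rng = random.Random(42)
S = partial_sums_random_harmonic(100000, rng)
# tail fluctuation is small: Var(S_100000 - S_50000) = sum 1/k^2 ~ 1e-5
```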

Note that the result of this theorem follows also directly from Ex. 14.8. We now prove a strong law of large numbers for centered sequences which generalizes the corresponding result for sequences of independent r.v.'s (Theorem 11.6.2).

Theorem 14.4.3 If {ξn} is a centered sequence of r.v.'s and if

∑_{n=1}^{∞} Eξn²/n² < ∞

then

(1/n) ∑_{k=1}^{n} ξk → 0 a.s.

Proof This follows from Theorem 14.4.2 and Lemma 11.6.1 in the same way as Theorem 11.6.2 follows from Theorem 11.5.3 and Lemma 11.6.1. �
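To see what the theorem buys beyond independence, here is a sketch of ours (not from the text) using martingale differences that are centered but dependent: ξn = εn cn, with εn an independent random sign and cn chosen from the past, say cn = 1 if Sn–1 ≥ 0 and cn = 2 otherwise. Then E(ξn|Fn–1) = 0 and Eξn² ≤ 4, so ∑ Eξn²/n² < ∞ and (1/n)∑ξk → 0 a.s.:

```python
import random

def centered_average(n, rng):
    """(1/n) * sum of xi_k for xi_k = eps_k * c_k, where eps_k = +-1
    is an independent sign and c_k = 1 or 2 depending on the sign of
    the running sum: the sequence is centered (E(xi_k | F_{k-1}) = 0)
    but the xi_k are not independent."""
    s = 0.0
    for _ in range(n):
        c = 1.0 if s >= 0 else 2.0
        s += rng.choice((-1.0, 1.0)) * c
    return s / n

rng = random.Random(7)
avg = centered_average(50000, rng)   # close to 0
```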

The special convergence results for sequences of independent r.v.'s, i.e. Theorems 11.5.4, 11.6.3 and 12.5.2, can also be obtained as applications of the martingale convergence theorems. As an illustration we include here martingale proofs of the strong law of large numbers (second form, Theorem 11.6.3) and of Theorem 12.5.2.

Theorem 14.4.4 (Strong Law, Second Form) Let {ξn} be independent and identically distributed r.v.'s with (the same) finite mean μ. Then

(1/n) ∑_{k=1}^{n} ξk → μ a.s. and in L1.

Proof Let Sn = ξ1 + · · · + ξn. We first show that for each 1 ≤ k ≤ n,

E(ξk|Sn) = (1/n)Sn a.s.


Every set E ∈ σ(Sn) is of the form E = Sn^{–1}(B), B ∈ B, and thus

∫_E ξk dP = E(ξkχ{Sn∈B}) = ∫_{–∞}^{∞} · · · ∫_{–∞}^{∞} xk χB(x1 + · · · + xn) dF(x1) . . . dF(xn)

where F is the common d.f. of the ξn's. It follows from Fubini's Theorem that the last expression does not depend on k and thus

∫_E ξk dP = (1/n) ∑_{i=1}^{n} ∫_E ξi dP = (1/n) ∫_E Sn dP

which implies E(ξk|Sn) = (1/n)Sn a.s.

Now let Fn = σ(Sn, Sn+1, . . .) (hence Fn ⊃ Fn+1) and let F∞ = ∩_{n=1}^{∞}Fn. Since Sn+1 – Sn = ξn+1 it is clear that Fn = σ(Sn, ξn+1, ξn+2, . . .). Also since the classes of events σ(ξ1, Sn) and σ(ξn+1, ξn+2, . . .) are independent, an obvious generalization of Ex. 13.3 gives

E(ξ1|Sn) = E(ξ1|Fn) a.s.

Thus

(1/n)Sn = E(ξ1|Fn) a.s.

and Theorem 14.3.6 implies that

(1/n)Sn → E(ξ1|F∞) a.s. and in L1.

Now lim_n (1/n)Sn = lim_n (1/n)(Sn – Sk) implies that lim_n (1/n)Sn is a tail r.v. of the independent sequence {ξn} and by Kolmogorov's Zero-One Law (Theorem 10.5.3) it is constant a.s. Hence E(ξ1|F∞) is constant a.s. and thus E(ξ1|F∞) = Eξ1 = μ a.s. It follows that (1/n)Sn → μ a.s. and in L1. �

The following result gives a martingale proof of Theorem 12.5.2.

Theorem 14.4.5 Let {ξn} be a sequence of independent random variables with characteristic functions {φn}. Then the following are equivalent:

(i) the series ∑_{n=1}^{∞} ξn converges a.s.
(ii) the series ∑_{n=1}^{∞} ξn converges in distribution
(iii) the products ∏_{k=1}^{n} φk(t) converge to a nonzero limit in some neighborhood of the origin.

Proof Clearly, it suffices to show that (iii) implies (i), i.e. assume that

lim_{n→∞} ∏_{k=1}^{n} φk(t) = φ(t) ≠ 0 for each t ∈ [–a, a], for some a > 0.

Let Sn = ∑_{k=1}^{n} ξk and Fn = σ(ξ1, . . . , ξn) = σ(S1, . . . , Sn). For each fixed t ∈ [–a, a] the sequence {e^{itSn}/∏_{k=1}^{n} φk(t)} is integrable (dP), indeed uniformly bounded, and it follows from Example 2 of Section 14.1 that {e^{itSn}/∏_{k=1}^{n} φk(t), Fn} is a martingale, in the sense that its real and imaginary parts are martingales. Since for each t the sequence is uniformly bounded, Theorem 14.3.1 applied to the real and imaginary parts shows that the sequence e^{itSn}/∏_{k=1}^{n} φk(t) converges a.s. as n → ∞. Since the denominator converges to a nonzero limit, it follows that e^{itSn} converges a.s. as n → ∞, for each t ∈ [–a, a]. Some analysis using this fact will lead to the conclusion that Sn converges a.s.

We have that for every t ∈ [–a, a] there is a set Ωt ∈ F with P(Ωt) = 0

such that for every ω ∉ Ωt, e^{itSn(ω)} converges. Now consider e^{itSn(ω)} as a function of the two variables (t,ω), i.e. in the product space ([–a, a] × Ω, B[–a,a] × F, m × P), where B[–a,a] is the σ-field of Borel subsets of [–a, a] and m denotes Lebesgue measure. Then clearly e^{itSn(ω)} is product measurable and hence

D = {(t,ω) ∈ [–a, a] × Ω : e^{itSn(ω)} does not converge} ∈ B[–a,a] × F.

Note that the t-section of D is

Dt = {ω ∈ Ω : (t,ω) ∈ D} = {ω ∈ Ω : e^{itSn(ω)} does not converge} = Ωt.

It follows from Fubini's Theorem that

(m × P)(D) = ∫_{–a}^{a} P(Dt) dt = ∫_{–a}^{a} 0 dt = 0

and hence

0 = (m × P)(D) = ∫_Ω m(Dω) dP(ω).

Hence m(Dω) = 0 a.s., i.e. there is Ω0 ∈ F with P(Ω0) = 0 such that m(Dω) = 0 for all ω ∉ Ω0. But

Dω = {t ∈ [–a, a] : (t,ω) ∈ D} = {t ∈ [–a, a] : e^{itSn(ω)} does not converge}.

Hence for every ω ∉ Ω0 there is Dω ∈ B[–a,a] with m(Dω) = 0 such that e^{itSn(ω)} converges for all t ∈ [–a, a] – Dω. The proof will be completed by showing that for all ω ∉ Ω0, Sn(ω) converges to a finite limit, and since P(Ω0) = 0, this means that Sn converges a.s.

Fix ω ∉ Ω0. To show the convergence of Sn(ω), we argue first that the sequence {Sn(ω)} is bounded. Indeed, suppose by contradiction that it is not; by passing to a subsequence if necessary we may assume |Sn(ω)| → ∞. Denote the limit of e^{itSn(ω)} by g(t), defined a.e. (m) on [–a, a]. Dominated convergence yields that

(e^{iuSn(ω)} – 1)/(iSn(ω)) = ∫_0^u e^{itSn(ω)} dt → ∫_0^u g(t) dt

for any u ∈ [–a, a]. But since |Sn(ω)| → ∞, it follows that ∫_0^u g(t) dt = 0 for any u ∈ [–a, a], and hence g(t) = 0 a.e. (m) on [–a, a]. This is a contradiction since |g(t)| = 1 = lim_n |e^{itSn(ω)}| a.e. (m) on [–a, a]. If {Sn(ω)} is bounded and there are two convergent subsequences Snk(ω) → s1 and Smk(ω) → s2, then e^{its1} = e^{its2} a.e. (m) on [–a, a]. Since e^{its} is continuous in t, it follows that e^{its1} = e^{its2} for all t ∈ [–a, a]. Differentiating the two sides of the last equality and setting t = 0 yields s1 = s2, and hence Sn(ω) converges. �

14.5 Further applications

In this section we give some further applications of the martingale convergence results of Section 14.3. The first application is related to the Lebesgue decomposition of one measure with respect to another, and thus also to the Radon–Nikodym Theorem; it helps to identify Radon–Nikodym derivatives and is also of interest in probability and especially in statistics.

Theorem 14.5.1 Let (Ω,F,P) be a probability space and {Fn} a sequence of sub-σ-fields of F such that Fn ⊂ Fn+1 for all n with σ(∪_{n=1}^{∞}Fn) = F. Let Q be a finite measure on (Ω,F) and consider its Lebesgue–Radon–Nikodym decomposition with respect to P:

Q(E) = ∫_E ξ dP + Q(E ∩ N) for all E ∈ F

where 0 ≤ ξ ∈ L1(Ω,F,P), N ∈ F and P(N) = 0. Denote by Pn, Qn the restrictions of P, Q to Fn. If Qn ≪ Pn for all n = 1, 2, . . . , then

(i) {dQn/dPn, Fn} is a martingale on (Ω,F,P) and dQn/dPn → ξ a.s. (P).

(ii) Q ≪ P if and only if {dQn/dPn} is uniformly integrable on (Ω,F,P), in which case dQn/dPn → dQ/dP a.s. (P) and in L1(Ω,F,P).


Proof (i) Let ξn = dQn/dPn. Since Q and thus Qn are finite, it follows that ξn ∈ L1(Ω,F,P), i.e. ξn is Fn-measurable and P-integrable. For every E ∈ Fn we have

∫_E ξn+1 dP = ∫_E ξn+1 dPn+1 = Qn+1(E) = Qn(E) = ∫_E ξn dPn = ∫_E ξn dP.

Hence E(ξn+1|Fn) = ξn a.s. for all n and thus {ξn,Fn}_{n=1}^{∞} is a martingale on (Ω,F,P).

We also have ξn ≥ 0 a.s. and

Eξn = ∫_Ω ξn dP = Qn(Ω) = Q(Ω) < ∞.

It follows from Theorem 14.3.1 that there is an integrable random variable ξ∞ such that

ξn → ξ∞ a.s. (P).

Since ξn ≥ 0 a.s. we have ξ∞ ≥ 0 a.s. We now show that ξ∞ = ξ a.s. Since ξn → ξ∞ a.s., Fatou's Lemma gives

∫_E ξ∞ dP ≤ lim inf_n ∫_E ξn dP for all E ∈ F.

Hence for all E ∈ Fn,

∫_E ξ∞ dP ≤ lim inf_n Qn(E) = Q(E)

and thus ∫_E ξ∞ dP ≤ Q(E) for all E ∈ ∪_{n=1}^{∞}Fn. We conclude that the same is true for all E ∈ F, either from the uniqueness of the extension of the finite measure μ(E) = Q(E) – ∫_E ξ∞ dP (Theorem 2.5.3) or from the monotone class theorem (Ex. 1.16). Since P(N) = 0 it follows that for every E ∈ F,

∫_E ξ∞ dP = ∫_{E∩N^c} ξ∞ dP ≤ Q(E ∩ N^c) = ∫_{E∩N^c} ξ dP = ∫_E ξ dP

and thus ξ∞ ≤ ξ a.s.

For the reverse inequality we have ∫_E ξ dP ≤ Q(E) for all E ∈ F, and hence for all E ∈ Fn,

∫_E E(ξ|Fn) dP = ∫_E ξ dP ≤ Q(E) = Qn(E) = ∫_E ξn dP.

Since both E(ξ|Fn) and ξn are Fn-measurable, it follows as in the previous paragraph that

E(ξ|Fn) ≤ ξn a.s.

Since this is true for all n, and since ξn → ξ∞ a.s. and, by Theorem 14.3.4, E(ξ|Fn) → E(ξ|F) = ξ a.s., it follows that ξ ≤ ξ∞ a.s. Thus ξ∞ = ξ a.s., i.e. (i) holds.

(ii) First assume that Q ≪ P. Then Q(N) = 0 and ξ = dQ/dP. Hence by (i), ξn → ξ a.s. Also for all E ∈ Fn we have

∫_E ξ dP = Q(E) = Qn(E) = ∫_E ξn dPn = ∫_E ξn dP

and thus ξn = E(ξ|Fn). Hence condition (iv) of Theorem 14.3.3 is satisfied and from (i) and (ii) of the same theorem we have that {ξn}_{n=1}^{∞} is uniformly integrable on (Ω,F,P), and ξn → ξ in L1(Ω,F,P).

Conversely, assume that the sequence {ξn}_{n=1}^{∞} is uniformly integrable on (Ω,F,P). Then by Theorem 14.3.3, since {ξn,Fn}_{n=1}^{∞} is a martingale on (Ω,F,P), there is a r.v. ξ ∈ L1(Ω,F,P) such that ξn = E(ξ|Fn) a.s. for all n. It follows from Theorem 14.3.4 that

ξn = E(ξ|Fn) → E(ξ|F) = ξ a.s. and in L1(Ω,F,P).

It now suffices to show that Q ≪ P and ξ = dQ/dP a.s. Indeed for all E ∈ Fn we have

Q(E) = Qn(E) = ∫_E ξn dP = ∫_E E(ξ|Fn) dP = ∫_E ξ dP.

Hence Q(E) = ∫_E ξ dP for all E ∈ ∪_{n=1}^{∞}Fn and since the class of sets for which it is true is clearly a σ-field, it follows that it is true for all E ∈ F. Thus Q ≪ P and ξ = dQ/dP a.s. �

Application of the theorem to the positive and negative parts in the Jordan decomposition of a finite signed measure gives the following result.

Corollary 1 The theorem remains true if Q is a finite signed measure.

We now show how Theorem 14.5.1 can be used in finding expressions for Radon–Nikodym derivatives.

Corollary 2 Let (Ω,F,P) be a probability space and Q a finite signed measure on F such that Q ≪ P. For every n let {E^{(n)}_k : k ≥ 1} be a measurable partition of Ω (i.e. Ω = ∪_{k=1}^{∞} E^{(n)}_k where the E^{(n)}_k are disjoint sets in F) and let Fn be the σ-field it generates. Assume that the partitions become finer as n increases (i.e. each E^{(n)}_i is the union of sets from {E^{(n+1)}_k}) so that Fn ⊂ Fn+1. If the partitions are such that F = σ(∪_{n=1}^{∞}Fn), then

(dQ/dP)(ω) = lim_{n→∞} Q(E^{(n)}_{kn(ω)})/P(E^{(n)}_{kn(ω)}) a.s. and in L1(Ω,F,P)

where for every ω and n, kn(ω) is the unique k such that ω ∈ E^{(n)}_k.

Proof This is obvious from the simple observation that

(dQn/dPn)(ω) = ∑_{k=1}^{∞} (Q(E^{(n)}_k)/P(E^{(n)}_k)) χ_{E^{(n)}_k}(ω) a.s.

where Q(E^{(n)}_k)/P(E^{(n)}_k) is taken to be zero whenever P(E^{(n)}_k) = 0. �
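To make Corollary 2 concrete (our own example, not the text's): on Ω = [0, 1) let P be Lebesgue measure, let Q be the measure with density 2ω, and take the dyadic partitions E^{(n)}_k = [k/2^n, (k + 1)/2^n). Then the ratio Q(E)/P(E) over the dyadic interval containing ω converges to dQ/dP(ω) = 2ω:

```python
def rn_ratio(omega, n):
    """Q(E)/P(E) for the dyadic interval E = [k/2**n, (k+1)/2**n)
    containing omega, with P = Lebesgue measure on [0, 1) and
    dQ/dP(w) = 2w, so that Q([l, r)) = r**2 - l**2 and
    P([l, r)) = r - l; the ratio simplifies to l + r."""
    k = int(omega * 2**n)
    l, r = k / 2**n, (k + 1) / 2**n
    return (r**2 - l**2) / (r - l)     # = l + r -> 2*omega

approx = [rn_ratio(0.7, n) for n in range(1, 25)]
# approx approaches dQ/dP(0.7) = 1.4
```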

Since conditional expectations and conditional probabilities as defined in Chapter 13 are Radon–Nikodym derivatives of finite signed measures with respect to probability measures, Corollary 2 can be used to express them as limits, and the resulting expressions are also intuitively appealing. Such a result will be stated for a conditional probability given the value of a r.v.

Corollary 3 Let η be a r.v. on the probability space (Ω,F,P) and A ∈ F. For each n, let {I^{(n)}_k : –∞ < k < ∞} be a partition of the real line into intervals. Assume that the partitions become finer as n increases and that

δ(n) = sup_k m(I^{(n)}_k) → 0 as n → ∞

(m = Lebesgue measure). Then

P(A|η = y) = lim_{n→∞} P(A ∩ η^{–1}I^{(n)}_{kn(y)})/P(η^{–1}I^{(n)}_{kn(y)}) a.s. (Pη^{–1}) and in L1(R,B,Pη^{–1})

where for each y and n, kn(y) is the unique k such that y ∈ I^{(n)}_k.

Proof By Section 13.5, P(A|η = y) is the Radon–Nikodym derivative of the finite measure ν, defined for each B ∈ B by ν(B) = P(A ∩ η^{–1}B), with respect to Pη^{–1}. The result follows from Corollary 2 and the simple observation that if Bn = σ({I^{(n)}_k}_{k=–∞}^{∞}) then Bn ⊂ Bn+1 and σ(∪_{n=1}^{∞}Bn) = B. �

The second application concerns "likelihood ratios" and is related to the principle of maximum likelihood.

Theorem 14.5.2 Let {ξn} be a sequence of r.v.'s on the probability space (Ω,F,P), and Fn = σ(ξ1, . . . , ξn). Let Q be another probability measure on (Ω,F). Assume that for every n, (ξ1, . . . , ξn) has p.d.f. pn under the probability P and qn under the probability Q, and define

ηn(ω) = qn(ξ1(ω), . . . , ξn(ω))/pn(ξ1(ω), . . . , ξn(ω)) if the denominator is nonzero, and ηn(ω) = 0 otherwise.

Then {ηn,Fn}_{n=1}^{∞} is a supermartingale on (Ω,F,P) and there is a P-integrable r.v. η∞ such that

ηn → η∞ a.s.

and

0 ≤ Eη∞ ≤ Eηn+1 ≤ Eηn ≤ 1 for all n.

Proof Since pn and qn are Borel measurable functions, ηn is Fn-measurable. Also ηn ≥ 0. If An = {(x1, . . . , xn) ∈ Rn : pn(x1, . . . , xn) > 0} then P(ξ1, . . . , ξn)^{–1}(A^c_n) = 0 and thus P(ξ1, . . . , ξn, ξn+1)^{–1}(A^c_n × R) = 0. Further

Eηn = ∫_Ω ηn dP = ∫_{Rn} (qn/pn) χ_{An} dP(ξ1, . . . , ξn)^{–1} = ∫_{Rn} (qn/pn) χ_{An} pn dx1 . . . dxn = ∫_{Rn} qn χ_{An} dx1 . . . dxn ≤ ∫_{Rn} qn dx1 . . . dxn = 1

and thus 0 ≤ Eηn ≤ 1.

Also, for every E ∈ Fn there is a B ∈ Bn such that E = (ξ1, . . . , ξn)^{–1}(B)

and

∫_E ηn+1 dP = ∫_Ω ηn+1 χE dP = ∫_{R^{n+1}} (qn+1/pn+1) χB χ_{An+1} dP(ξ1, . . . , ξn+1)^{–1} = ∫_{An+1} (qn+1/pn+1) χB dP(ξ1, . . . , ξn+1)^{–1} = ∫_{An+1 – A^c_n×R} (qn+1/pn+1) χB dP(ξ1, . . . , ξn+1)^{–1}

since P(ξ1, . . . , ξn+1)^{–1}(A^c_n × R) = 0. Hence, since An+1 – A^c_n × R ⊂ An × R,

∫_E ηn+1 dP = ∫_{An+1 – A^c_n×R} qn+1 χB dx1 . . . dxn dxn+1 ≤ ∫_{An×R} qn+1 χB dx1 . . . dxn dxn+1 = ∫_{An} (∫_R qn+1(x1, . . . , xn, xn+1) dxn+1) χB dx1 . . . dxn = ∫_{An} qn χB dx1 . . . dxn = ∫_{An} (qn/pn) χB dP(ξ1, . . . , ξn)^{–1} = ∫_Ω ηn χE dP = ∫_E ηn dP.

It follows that E(ηn+1|Fn) ≤ ηn a.s. for all n, and thus {ηn,Fn}_{n=1}^{∞} is a supermartingale on (Ω,F,P). Hence {–ηn,Fn}_{n=1}^{∞} is a negative submartingale which, by the submartingale convergence Theorem 14.3.1, converges a.s. to a P-integrable r.v. –η∞. Then by Theorem 14.1.1 (ii) and the first result of this proof we have 0 ≤ Eηn+1 ≤ Eηn ≤ 1 for all n. Finally by Fatou's Lemma Eη∞ ≤ Eηn, and this completes the proof. �

If for each n the distribution of (ξ1, . . . , ξn) under Q is absolutely continuous with respect to its distribution under P, then the following stronger result holds.

Corollary 1 Under the assumptions of Theorem 14.5.2, if for all n, Q(ξ1, . . . , ξn)^{–1} ≪ P(ξ1, . . . , ξn)^{–1} (which is the case if qn = 0 whenever pn = 0) and F = σ(∪_{n=1}^{∞}Fn), then {ηn,Fn} is a martingale. Furthermore Q ≪ P if and only if {ηn} is uniformly integrable, in which case

ηn → dQ/dP a.s. and in L1(Ω,F,P), as n → ∞.

Proof For each n let Qn, Pn be the restrictions of Q, P to Fn. For every E ∈ Fn we have E = (ξ1, . . . , ξn)^{–1}(B), B ∈ Bn, and since by absolute continuity P(ξ1, . . . , ξn)^{–1}(A^c_n) = 0 implies Q(ξ1, . . . , ξn)^{–1}(A^c_n) = 0, we have

Qn(E) = Q(ξ1, . . . , ξn)^{–1}(B) = Q(ξ1, . . . , ξn)^{–1}(B ∩ An) = ∫_{B∩An} qn dx1 . . . dxn = ∫_{B∩An} (qn/pn) dP(ξ1, . . . , ξn)^{–1} = ∫_B (qn/pn) dP(ξ1, . . . , ξn)^{–1} = ∫_E ηn dPn.

Hence dQn/dPn = ηn and the result follows from Theorem 14.5.1. �

When the r.v.'s {ξn} are i.i.d. under both P and Q, the following result provides a test for the distribution of a r.v. using independent observations.

Corollary 2 Assume that the conditions of Theorem 14.5.2 are satisfied and that under each probability measure P, Q the r.v.'s {ξn} are independent and identically distributed with (common) p.d.f. p, q. Then ηn → 0 a.s. and P ⊥ Q, provided the distributions determined by p and q are distinct.

Proof In this case we have

ηn = ∏_{k=1}^{n} q(ξk)/p(ξk) a.s. (P)

and thus by Theorem 14.5.2,

η∞ = ∏_{k=1}^{∞} q(ξk)/p(ξk) a.s. (P).

Now let {ξ′n} be an i.i.d. sequence of r.v.'s, independent also of the sequence {ξn}, with the same distribution as the sequence {ξn} (such r.v.'s can always be constructed using product spaces). Let also

η′∞ = ∏_{k=1}^{∞} q(ξ′k)/p(ξ′k) a.s. (P).

Then η∞ and η∞η′∞ are clearly identically distributed, and η∞, η′∞ are independent and identically distributed, so that

P{η∞ = 0} = P{η∞η′∞ = 0} = 1 – P{η∞η′∞ > 0} = 1 – P{η∞ > 0}P{η′∞ > 0} = 1 – [1 – P{η∞ = 0}]².

It follows that P{η∞ = 0} = 0 or 1.

Assume now that P{η∞ = 0} = 0, so that η∞ > 0 a.s. (P). Then the r.v.'s log(η∞η′∞) = log η∞ + log η′∞ and log η∞ are identically distributed, and log η∞, log η′∞ are independent and identically distributed; thus if φ(t) is the c.f. of log η∞ we have φ²(t) = φ(t) for all t ∈ R. Since φ(0) = 1 and φ is continuous, it follows that φ(t) = 1 for all t ∈ R and thus η∞ = 1 a.s. (P). It follows that ∏_{k=1}^{∞} q(ξk)/p(ξk) = 1 a.s. (P), and (applying the same argument to the shifted sequence {ξ2, ξ3, . . .}, whose corresponding product is also 1 a.s.) that η1 = q(ξ1)/p(ξ1) = 1 a.s. Then for each B ∈ B we have, using the notation and facts from the proof of Corollary 1,

Qξ1^{–1}(B) = Q1ξ1^{–1}(B) = ∫_{ξ1^{–1}(B)} η1 dP1 = P1ξ1^{–1}(B) = Pξ1^{–1}(B)

which contradicts the assumption that the distributions of ξ1 under P and Q are distinct. (In fact one can similarly show that Q(ξ1, . . . , ξn)^{–1}(B) = P(ξ1, . . . , ξn)^{–1}(B) for all B ∈ Bn and all n, which implies that P = Q.)

Hence, under the assumptions of the theorem, P{η∞ = 0} = 1 and the proof may be completed by showing that P ⊥ Q. By reversing the role of the probability measures P and Q we have that

∏_{k=1}^{n} p(ξk)/q(ξk) → 0 a.s. (Q).

Let EQ be the set of ω ∈ Ω such that ∏_{k=1}^{n} p(ξk(ω))/q(ξk(ω)) → 0 and EP the set of ω ∈ Ω such that ∏_{k=1}^{n} q(ξk(ω))/p(ξk(ω)) → 0. Then P(EP) = 1 = Q(EQ) and clearly EP ∩ EQ = ∅ since ∏_{k=1}^{n} q(ξk)/p(ξk) · ∏_{k=1}^{n} p(ξk)/q(ξk) = 1 for all n. It follows that P ⊥ Q. �
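A quick numerical sanity check of Corollary 2 (a sketch of ours; the densities are our choice): with p the N(0,1) density and q the N(1,1) density, log(q(x)/p(x)) = x – 1/2, which has mean –1/2 under P, so the likelihood ratio ηn → 0 a.s. (P) at an exponential rate:

```python
import random

def log_likelihood_ratio(n, rng):
    """log eta_n = sum of log(q(xi_k)/p(xi_k)) for xi_k ~ N(0,1)
    under P, with p the N(0,1) and q the N(1,1) density; the
    log-ratio simplifies to x - 1/2, with mean -1/2 under P."""
    return sum(rng.gauss(0.0, 1.0) - 0.5 for _ in range(n))

rng = random.Random(1)
log_eta = log_likelihood_ratio(500, rng)
# log eta_500 is near -250, i.e. eta_500 is effectively 0
```

Reversing the roles of P and Q in the simulation would show the reverse product tending to 0 under Q, in line with P ⊥ Q.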

Exercises

14.1 Let {ξn,Fn} be a submartingale. Let the sequence of r.v.'s {εn} be such that for all n, εn is Fn-measurable and takes only the values 0 and 1. Define the sequence of r.v.'s {ηn} by

η1 = ξ1,
ηn+1 = ηn + εn(ξn+1 – ξn), n ≥ 1.

Show that {ηn,Fn} is also a submartingale and Eηn ≤ Eξn for all n. If {ξn,Fn} is a martingale show that {ηn,Fn} is also a martingale and Eηn = Eξn for all n. (Do you see any gambling interpretation of this?)

14.2 Prove that every uniformly integrable submartingale {ξn,Fn} can be uniquely decomposed as

ξn = ηn + ζn for all n a.s.

where {ηn,Fn} is a uniformly integrable martingale and {ζn,Fn} is a negative (ζn ≤ 0 for all n a.s.) submartingale such that lim_n ζn = 0 a.s. This is called the Riesz decomposition of a submartingale.

14.3 Let {Fn} be a sequence of sub-σ-fields of F such that Fn ⊂ Fn+1 for all n and F∞ = σ(∪_{n=1}^∞ Fn). Show that if E ∈ F∞ then

lim_{n→∞} P(E|Fn) = χE a.s.

14.4 (Polya's urn scheme) Suppose an urn contains b blue and r red balls. At each drawing a ball is drawn at random, its color is noted and the drawn ball together with a > 0 balls of the same color are added to the urn. Let bn be the number of blue balls and rn the number of red balls after the nth drawing and let ξn = bn/(bn + rn) be the proportion of blue balls. Show that {ξn} is a martingale and that ξn converges a.s. and in L1.
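A simulation sketch of the urn (the parameter values are illustrative choices, not from the text): since {ξn} is a martingale, Eξn equals the initial proportion b/(b + r) for every n, even though the individual trajectories converge to random limits.

```python
import random

random.seed(1)

def polya_fraction(b, r, a, n_draws):
    # urn with b blue and r red balls; each draw adds a balls of the drawn color
    for _ in range(n_draws):
        if random.random() < b / (b + r):
            b += a
        else:
            r += a
    return b / (b + r)

# the martingale property forces E xi_n = b/(b + r) at every n
N = 20000
samples = [polya_fraction(2, 3, 1, 100) for _ in range(N)]
mean_frac = sum(samples) / N   # should stay near 2/5
```

The individual samples spread out over (0, 1) (the a.s. limit is in fact Beta-distributed), but their average stays pinned at the starting proportion.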

14.5 The inequalities proved in Theorems 14.2.1 and 14.2.2 for finite submartingales depend only on the fact that the submartingales considered have a "last element". Specifically show that if {ξn, Fn : n = 1, 2, . . . , ∞} is a submartingale then for all real a,

aP{sup_{1≤n≤∞} ξn ≥ a} ≤ ∫_{{sup_{1≤n≤∞} ξn ≥ a}} ξ∞ dP ≤ E|ξ∞|,

and if also ξn ≥ 0 a.s. for all n = 1, 2, . . . , ∞, then for all 1 < p < ∞,

E(sup_{1≤n≤∞} ξn^p) ≤ (p/(p – 1))^p Eξ∞^p.


14.6 The following is an example of a martingale converging a.s. but not in L1. Let Ω be the set of all positive integers, F the σ-field of all subsets of Ω, and P defined by

P({n}) = 1/n – 1/(n + 1) for all n = 1, 2, . . . .

Let [n, ∞) denote the set of all integers ≥ n and define

Fn = σ({1}, {2}, . . . , {n}, [n + 1, ∞)), ξn = (n + 1)χ_{[n+1,∞)}

for n = 1, 2, . . . . Show that {ξn, Fn}_{n=1}^∞ is a martingale with Eξn = 1. Show also that ξn converges a.s. (and find its limit) but not in L1.
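The claim Eξn = 1 can be checked directly: P([n + 1, ∞)) telescopes to 1/(n + 1), so Eξn = (n + 1) · 1/(n + 1) = 1 for every n, while ξn → 0 a.s. A numerical sketch (the truncation level of the infinite sum is an arbitrary choice):

```python
def p_singleton(n):
    # P({n}) = 1/n - 1/(n + 1)
    return 1 / n - 1 / (n + 1)

def p_tail(n, cutoff=10**6):
    # P([n, infinity)); the sum telescopes to 1/n, up to truncation error 1/cutoff
    return sum(p_singleton(k) for k in range(n, cutoff))

# E xi_n = (n + 1) * P([n + 1, infinity)) = 1 for every n
expectations = [(n + 1) * p_tail(n + 1) for n in (1, 2, 5, 10)]
```

Each expectation is 1 up to the truncation error, illustrating the failure of L1 convergence: Eξn = 1 for all n although the a.s. limit is 0.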

14.7 If {ξn, Fn : n = 1, 2, . . . , ∞} is a nonnegative submartingale, show that {ξn, n = 1, 2, . . .} is uniformly integrable (cf. Theorem 14.3.2).

14.8 Let {ξn, Fn}_{n=1}^∞ be a martingale or a nonnegative submartingale. If

lim_{n→∞} E(|ξn|^p) < ∞

for some 1 < p < ∞, show that ξn converges a.s. and in Lp. (Hint: Use Theorems 14.3.1 and 14.2.2.)

14.9 Let (Ω, F, P) be a probability space and {Fn}_{n=1}^∞ a sequence of sub-σ-fields of F such that Fn ⊂ Fn+1 and F = σ(∪_{n=1}^∞ Fn). Let Q be a finite measure on (Ω, F). Denote by Pn, Qn the restrictions of P, Q to Fn and the corresponding Lebesgue–Radon–Nikodym decompositions by

Qn(E) = ∫_E ξn dPn + Qn(E ∩ Nn), E ∈ Fn,
Q(E) = ∫_E ξ dP + Q(E ∩ N), E ∈ F,

where 0 ≤ ξn ∈ L1(Ω, Fn, Pn), 0 ≤ ξ ∈ L1(Ω, F, P), Nn ∈ Fn, N ∈ F and Pn(Nn) = 0, P(N) = 0. Show that {ξn, Fn}_{n=1}^∞ is a supermartingale and that ξn → ξ a.s. (P). (Hint: Imitate the proof of Theorem 14.5.1.)

14.10 Let f be a Lebesgue integrable function defined on [0, 1]. For each n, let 0 = a_0^{(n)} < a_1^{(n)} < . . . < a_n^{(n)} = 1 be a partition of [0, 1] with δ(n) = sup_{0≤k≤n–1}(a_{k+1}^{(n)} – a_k^{(n)}) → 0, and assume that the partitions become finer as n increases. For each n, define fn on [0, 1] by

fn(x) = (1/(a_{k+1}^{(n)} – a_k^{(n)})) ∫_{a_k^{(n)}}^{a_{k+1}^{(n)}} f(y) dy for a_k^{(n)} < x ≤ a_{k+1}^{(n)},

and by continuity at x = 0. Then show that

lim_{n→∞} fn(x) = f(x) a.e. (m) and in L1 (m = Lebesgue measure).
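For dyadic partitions the averages fn are exactly the conditional expectations E(f | Fn) with respect to the σ-fields generated by the dyadic intervals, which is the martingale underlying this exercise. A numerical sketch with the illustrative choice f(x) = x² (the interval averages are computed in closed form, so no numerical integration is needed):

```python
def f(x):
    return x * x

def f_n(x, n):
    # average of f over the dyadic interval of length 2^-n containing x;
    # the integral of y^2 over [a, b] is (b^3 - a^3)/3
    k = min(int(x * 2**n), 2**n - 1)
    a, b = k / 2**n, (k + 1) / 2**n
    return (b**3 - a**3) / (3 * (b - a))

x0 = 0.3
errors = [abs(f_n(x0, n) - f(x0)) for n in (2, 5, 10, 15)]
# the dyadic averages converge to f(x0) as the partition refines
```

The errors shrink roughly in proportion to the interval width 2^{-n}, consistent with the a.e. convergence asserted in the exercise.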

14.11 Let (Ω, F) be a measurable space and assume that F is purely atomic, i.e. F is generated by the disjoint sets {En}_{n=1}^∞ with Ω = ∪_{n=1}^∞ En. Let (T, T) be another measurable space, {Pt, t ∈ T} a family of probability measures on (Ω, F) and {Qt, t ∈ T} a family of signed measures on (Ω, F). Assume that for each t ∈ T, Qt ≪ Pt and that for each E ∈ F, Pt(E) and Qt(E) are measurable functions on (T, T). Show that there is a T × F-measurable function ξ(t, ω) such that for each fixed t ∈ T,

ξ(t, ω) = (dQt/dPt)(ω) a.s. (Pt).

(Hint: Apply Theorem 14.5.1 with Fn = σ(E1, . . . , En).)

15

Basic structure of stochastic processes

Our aim in this final chapter is to indicate how basic distributional theory for stochastic processes, alias random functions, may be developed from the considerations of Chapters 7 and 9. This is primarily for reference and for readers with a potential interest in the topic. The theory will first be illustrated by a discussion of the definition of the Wiener process, and conditions for sample function continuity. This will be complemented, and the chapter completed, with a sketch of the construction and basic properties of point processes and random measures in a purely measure-theoretic framework, consistent with the nontopological flavor of the entire volume.

15.1 Random functions and stochastic processes

In this section we introduce some basic distributional theory for stochastic processes and random functions, using the product space measures of Chapter 7 and the random element concepts of Chapter 9.

By a stochastic process one traditionally means a family of real random variables {ξt : t ∈ T} (ξt = ξt(ω)) on a probability space (Ω, F, P), T being a set indexing the ξt. If T = {1, 2, 3, . . .} or {. . . , –2, –1, 0, 1, 2, . . .} the family {ξn : n = 1, 2, . . .} or {ξn : n = . . . , –2, –1, 0, 1, 2, . . .} is referred to as a stochastic sequence or discrete parameter stochastic process, whereas {ξt : t ∈ T} is termed a continuous parameter stochastic process if T is an interval (finite or infinite).

We assume throughout this chapter that each r.v. ξt(ω) is defined (and finite) for all ω (not just a.e.). Then for a fixed ω the values ξt(ω) define a function ξ ((ξω)(t) = ξt(ω), t ∈ T) in R^T, and the F|B-measurability of each ξt(ω) implies F|B^T-measurability of ξ, as will be shown in Lemma 15.1.1. The mapping ξ is thus a random element (r.e.) of (R^T, B^T) and is termed a random function (r.f.). As will be seen in Lemma 15.1.1 the converse also holds – if ξ is a measurable mapping from (Ω, F, P) to (R^T, B^T) then the ω-functions ξt(ω) = (ξω)(t) are F|B-measurable for each t, i.e. ξt are



r.v.'s. Thus the notions of a stochastic process (family of r.v.'s) and a r.f. are entirely equivalent. For a fixed ω, the function (ξω)(t), t ∈ T, is termed a sample function (or sample path or realization) of the process.

Lemma 15.1.1 For each t ∈ T, let ξt = ξt(ω) be a real function of ω ∈ Ω and let ξ be the mapping from Ω to R^T defined as ξω = {ξt(ω) : t ∈ T}. Then ξt is F|B-measurable for each t ∈ T iff ξ is F|B^T-measurable (see Section 7.9 for the definition of B^T).

Proof For u = (t1, . . . , tk) the projection πu = π_{t1,...,tk} from R^T to R^k is clearly B^T|B^k-measurable since if B ∈ B^k, πu^{-1}B is a cylinder and hence is in B^T. Hence if ξ is F|B^T-measurable, ξt = πtξ is F|B-measurable for each t.

Conversely if each ξt is F|B-measurable, (ξt1, . . . , ξtk) is clearly F|B^k-measurable, i.e. πuξ is F|B^k-measurable for u = (t1, . . . , tk). Hence if B ∈ B^k, ξ^{-1}πu^{-1}B = (πuξ)^{-1}B ∈ F, i.e. ξ^{-1}E ∈ F for each cylinder E. Since these cylinders generate B^T, it follows that ξ is F|B^T-measurable as required. □

Probabilistic properties of individual ξt or finite groups (ξt1, . . . , ξtk) are, of course, defined by the respective marginal or joint distributions

Pξt^{-1}(B) = P{ω : ξt(ω) ∈ B}, B ∈ B,
P(ξt1, . . . , ξtk)^{-1}(B) = P{ω : (ξt1(ω), . . . , ξtk(ω)) ∈ B}, B ∈ B^k.

These are respectively read as P{ξt ∈ B}, P{(ξt1, . . . , ξtk) ∈ B} and are as noted Lebesgue–Stieltjes measures on B and B^k corresponding to the distribution functions

Ft(x) = P{ξt ≤ x}, F_{t1,...,tk}(x1, . . . , xk) = P{ξti ≤ xi, 1 ≤ i ≤ k}.

These joint distributions of ξt1, . . . , ξtk for ti ∈ T, 1 ≤ i ≤ k, k = 1, 2, . . . , are termed the finite-dimensional distributions (fidi's) of the process {ξt : t ∈ T}.

The fidi's determine many useful probabilistic properties of the process but are restricted to probabilities of sets of values taken by finite groups of ξt's. On the other hand, one may be interested in the probability that the entire sample function ξt, t ∈ T, lies in a given set of functions, i.e.

P{ξ ∈ E} = P{ω : ξω ∈ E} = Pξ^{-1}(E)

which is defined for E ∈ B^T. Further assumptions may be needed for sets E of interest but not in B^T, e.g. to determine that the sample functions are continuous a.s. (see Sections 15.3, 15.4).


This probability measure Pξ^{-1} on B^T is called the distribution of (the r.f.) ξ and it encompasses the fidi's. Specifically, the fidi's are special cases of values of Pξ^{-1}; for example, if B ∈ B^k,

P{(ξt1, . . . , ξtk) ∈ B} = P{π_{t1,...,tk}ξ ∈ B} = Pξ^{-1}(π_{t1,...,tk}^{-1}B),

i.e. the probability that the sample function ξω lies in the cylinder π_{t1,...,tk}^{-1}B of B^T. That is, the fidi's have the form Pξ^{-1}π_{t1,...,tk}^{-1} for each k, t1, . . . , tk ∈ T.

On the other hand, note also that the fidi's determine the distribution of a stochastic process; that is, if two stochastic processes have the same fidi's, then they have the same distribution. This follows from Theorem 2.2.7 and the fact that B^T is generated by the cylinders π_{t1,...,tk}^{-1}(B).

The fidi's of a stochastic process are thus related to the distribution Pξ^{-1} of ξ on B^T exactly as the measures νu are related to μ in Section 7.10. In particular the fidi's are consistent as there defined, i.e. if u = (t1, . . . , tk), v = (s1, . . . , sl) ⊂ u, ξu = (ξt1, . . . , ξtk), ξv = (ξs1, . . . , ξsl), then Pξu^{-1}π_{uv}^{-1} = Pξv^{-1}, i.e. P(π_{uv}ξu)^{-1} = Pξv^{-1}. This may be made more transparent by noting its equivalence to consistency of the d.f.'s in the sense that for each n = 1, 2, . . . and any choice of t1, . . . , tn and x1, . . . , xn:

(i) F_{t1,...,tn}(x1, . . . , xn) is unaltered by the same permutation of both t1, . . . , tn and x1, . . . , xn,

(ii) F_{t1,...,tn–1}(x1, . . . , xn–1) = F_{t1,...,tn–1,tn}(x1, . . . , xn–1, ∞) = lim_{xn→∞} F_{t1,...,tn–1,tn}(x1, . . . , xn–1, xn).

The requirement (i) can of course be achieved (on the real line) by defining F_{t1,...,tn} for t1 < · · · < tn and rearranging other time sets to natural order, and hence is not an issue when T is a subset of R.

Kolmogorov's Theorem (Theorem 7.10.3) may then be put in the following form.

Theorem 15.1.2 Let {νu} be as in Theorem 7.10.3, a family of probability measures defined on (R^u, B^u) for finite subsets u of an index set T. If the family {νu} is consistent in the sense that νu π_{u,v}^{-1} = νv for each u, v with v ⊂ u, then there is a stochastic process {ξt : t ∈ T} (unique in distribution) having {νu} as its fidi's. That is, P{(ξt1, . . . , ξtk) ∈ B} = νu(B) for each choice of k, u = (t1, . . . , tk), B ∈ B^k.

Proof Let P denote the unique probability measure on (R^T, B^T) in Theorem 7.10.3, satisfying Pπu^{-1} = νu for each finite set u ⊂ T. Define the probability space (Ω, F, P) as (R^T, B^T, P). The projection r.v.'s ξt(ω) = πtω = ω(t) for ω ∈ R^T give the desired stochastic process {ξt : t ∈ T} with the given fidi's νu. □


Corollary 1 below restates the theorem in terms of distribution functions. Corollary 2 considers the special case of an independent family.

Corollary 1 Let {F_{t1,...,tk} : t1, . . . , tk ∈ T, k = 1, 2, . . .} be a family of k-dimensional d.f.'s, assumed consistent in the sense described prior to the statement of the theorem. Then there is a stochastic process {ξt : t ∈ T} having these d.f.'s defining its fidi's, i.e.

P{ξti ≤ xi, 1 ≤ i ≤ k} = F_{t1,...,tk}(x1, . . . , xk)

for each choice of k, t1, . . . , tk.

Proof This follows since the d.f.'s F_{t1,...,tk} clearly determine consistent probability distributions νu for each u = (t1, . . . , tk). □

Corollary 2 If Fi are d.f.'s for i = 1, 2, . . . , there exists a sequence of independent r.v.'s ξ1, ξ2, . . . such that ξi has d.f. Fi for each i.

Proof This follows from Corollary 1 by noting consistency of the d.f.'s

F_{t1,...,tk}(x1, . . . , xk) = ∏_{i=1}^k F_{ti}(xi). □

15.2 Construction of the Wiener process in R^{[0,1]}

The Wiener process Wt on [0, 1] (a.k.a. Brownian motion) provides an illuminating and straightforward example of the use of Kolmogorov's Theorem to construct a stochastic process.

Wt is to be defined by the requirement that all its fidi's be normal with zero means and cov(Ws, Wt) = min(s, t). Thus the fidi for (Wt1, Wt2, . . . , Wtk), 0 ≤ t1 < t2 < · · · < tk ≤ 1, is to be normal, with zero means and covariance matrix (see Section 9.4)

Λ_{t1,...,tk} =
⎡ t1  t1  t1  · · ·  t1 ⎤
⎢ t1  t2  t2  · · ·  t2 ⎥
⎢ t1  t2  t3  · · ·  t3 ⎥
⎢  ⋮    ⋮    ⋮    ⋱    ⋮  ⎥
⎣ t1  t2  t3  · · ·  tk ⎦.

This matrix is readily seen to be nonnegative definite (e.g. its determinant is t1(t2 – t1)(t3 – t2) · · · (tk – tk–1), as may be simply shown by subtracting the (i – 1)th row from the ith for i = k, k – 1, . . . , 2). Thus Λ_{t1,...,tk} is a covariance matrix of a k-dimensional normal distribution, and the elimination of one or more points tj gives a matrix of the same form in the remaining


tj's, showing the consistency required for Kolmogorov's Theorem (or Theorem 15.1.2). Hence, by that theorem, there is a process {Wt : t ∈ [0, 1]} with the desired fidi's.
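The determinant formula can be checked numerically. A sketch (the time points are an arbitrary choice; the determinant is computed by plain Laplace expansion so the example has no dependencies):

```python
def minor(m, i, j):
    # matrix m with row i and column j deleted
    return [row[:j] + row[j+1:] for k, row in enumerate(m) if k != i]

def det(m):
    # Laplace expansion along the first row (fine for small matrices)
    if len(m) == 1:
        return m[0][0]
    return sum((-1)**j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

t = [0.2, 0.5, 0.7, 1.0]
cov = [[min(s, u) for u in t] for s in t]   # cov(W_s, W_t) = min(s, t)

d = det(cov)
# should equal t1 (t2 - t1)(t3 - t2)(t4 - t3)
expected = t[0] * (t[1] - t[0]) * (t[2] - t[1]) * (t[3] - t[2])
```

Since each factor is positive when the ti are strictly increasing, the determinant (and likewise that of every principal submatrix of the same form) is positive, giving nonnegative definiteness.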

15.3 Processes on special subspaces of R^T

A stochastic process ξ constructed via Kolmogorov's Theorem is a random element of (R^T, B^T). Hence one may determine the probability P{ξ ∈ E} that the sample function ξt, t ∈ T, lies in the set E of functions, for any E ∈ B^T. However, one is sometimes interested in sets E which are not in B^T (as, for example, when T = [0, 1], E = C[0, 1], the set of continuous functions on [0, 1]).

A small but useful extension to the framework occurs when ξ ∈ A a.s. where A ⊂ R^T but A may or may not be in B^T. Note that the statement ξ ∈ A a.s. means that A^c ⊂ A0 for some A0 ∈ B^T, Pξ^{-1}(A0) = 0. The extension may be simply achieved by assuming that the space (Ω, F, P) is complete (or if not, by completing it to be so in the standard manner – see Section 2.6). Then with A, A0 as above, ξ^{-1}A^c ∈ F since P is complete on F. Hence also ξ^{-1}A ∈ F, Pξ^{-1}(A^c) = 0 and ξ^{-1}(A ∩ E) = ξ^{-1}A ∩ ξ^{-1}E ∈ F for all E ∈ B^T.

Hence if ξt, t ∈ T, is redefined as a fixed function in A at points ω ∈ Ω for which {ξt(ω) : t ∈ T} ∉ A (or if the space Ω is reduced to eliminate such points), then A includes all the values of (ξt(ω) : t ∈ T) and may be regarded as a space with a σ-field A = A ∩ B^T. ξ is then a random element in (A, A) with distributions satisfying Pξ^{-1}(F) = Pξ^{-1}(E) for F = E ∩ A, E ∈ B^T.

An interesting and useful special case occurs when T is an interval and A is the set of real, continuous functions on T. For example, take T to be the unit interval [0, 1] (with standard notation A = C[0, 1], the space of continuous functions on [0, 1]). If a stochastic process {ξt : t ∈ [0, 1]} has a.s. continuous sample functions (i.e. ξt(ω) is continuous on 0 ≤ t ≤ 1 a.s.), then the r.f. ξ may be regarded as a random element of (C, C) where C = C[0, 1] (⊂ R^{[0,1]}) and C = C ∩ B^{[0,1]}. This is a natural and simple viewpoint.

It is, of course, possible to regard C as a space of continuous functions, without reference to R^T, and to view it as a metric space, with metric defined by the norm ||x|| = sup{|x(t)| : 0 ≤ t ≤ 1}. The class of Borel sets of such a topological space is then defined to be the σ-field generated by the open sets. This may be shown to be also generated by the (finite-dimensional) cylinder sets of C, i.e. sets of the form π_{t1,...,tk}^{-1}B where B ∈ B^k


and π_{t1,...,tk} is the usual projection mapping but restricted to C rather than R^T. It may thus be seen that the Borel sets form precisely the same σ-field C ∩ B^T in C as defined and used above. This connection provides a vehicle for the consideration of properties which involve topology more intimately – such as the development of weak convergence theory in C.

15.4 Conditions for continuity of sample functions

In view of the above discussion it is of interest to give conditions on a process which will guarantee a.s. continuity of sample functions. The theorem to be shown, generalizing original results of Kolmogorov (see [Loeve] and [Cramer & Leadbetter]), gives sufficient conditions for a process ξt on [0, 1] to have an equivalent version ηt (i.e. ξt = ηt a.s. for each t) with a.s. continuous sample functions.

Theorem 15.4.1 Let ξt be a process on [0, 1] such that for all t, t + h ∈ [0, 1]

P{|ξt+h – ξt| ≥ g(h)} ≤ q(h)

where g, q are nonnegative functions of h > 0, nonincreasing as h ↓ 0, and such that ∑ g(2^{–n}) < ∞, ∑ 2^n q(2^{–n}) < ∞. Then there exists a process ηt on [0, 1] with a.s. continuous sample functions and such that ξt = ηt a.s. for each t. In particular, of course, η has the same fidi's as ξ.

Proof Approximate ξt by piecewise linear processes ξt^n with the values ξt at t = t_{n,r} = r/2^n, r = 0, 1, . . . , 2^n, and linear between such points. Then clearly for t_{n,r} ≤ t ≤ t_{n,r+1},

|ξt^{n+1} – ξt^n| ≤ |ξ_{t_{n+1,2r+1}} – (1/2)(ξ_{t_{n+1,2r}} + ξ_{t_{n+1,2r+2}})| ≤ (1/2)A + (1/2)B

where

A = |ξ_{t_{n+1,2r+1}} – ξ_{t_{n+1,2r}}|, B = |ξ_{t_{n+1,2r+1}} – ξ_{t_{n+1,2r+2}}|

and hence

P{max_{t_{n,r}≤t≤t_{n,r+1}} |ξt^{n+1} – ξt^n| ≥ g(2^{–n–1})} ≤ P{A ≥ g(2^{–n–1})} + P{B ≥ g(2^{–n–1})} ≤ 2q(2^{–n–1}),

so that

P{max_{0≤t≤1} |ξt^{n+1} – ξt^n| ≥ g(2^{–n–1})} ≤ 2^{n+1}q(2^{–n–1}).

Since ∑ 2^n q(2^{–n}) < ∞ it follows by the Borel–Cantelli Lemma (Theorem 10.5.1) that a.s. max_{0≤t≤1} |ξt^{n+1} – ξt^n| < g(2^{–n–1}) for n ≥ n0 = n0(ω).


Since ∑ g(2^{–n}) < ∞ it follows that {ξt^n} is uniformly Cauchy a.s. and thus uniformly convergent a.s. to a continuous ηt as n → ∞. Also ηt = ξt a.s. for t = t_{n,r} since ξt^{n+p} = ξt, p = 0, 1, . . . .

If t is not equal to any t_{n,r}, then t = lim t_{n,r_n} with 0 < t – t_{n,r_n} < 2^{–n} and

P{|ξ_{t_{n,r_n}} – ξt| ≥ g(t – t_{n,r_n})} ≤ q(t – t_{n,r_n}) ≤ q(2^{–n}),

so that P{|ξ_{t_{n,r_n}} – ξt| ≥ g(2^{–n})} ≤ q(2^{–n}) and the Borel–Cantelli Lemma gives ξ_{t_{n,r_n}} → ξt a.s.

Since η_{t_{n,r_n}} → ηt a.s. and ξ_{t_{n,r_n}} = η_{t_{n,r_n}} a.s., it follows that ξt = ηt a.s. for each t as required. □

15.5 The Wiener process on C and Wiener measure

The preceding theorem readily applies to the Wiener process, yielding the following result.

Theorem 15.5.1 The Wiener process {Wt : t ∈ [0, 1]} may be taken to have a.s. continuous sample functions.

Proof This follows from the above result. For Wt+h – Wt is normal, with zero mean and variance |h|. Take 0 < a < 1/2. Then

P{|Wt+h – Wt| ≥ |h|^a} = 2{1 – Φ(|h|^{a–1/2})} ≤ 2|h|^{1/2–a}φ(|h|^{a–1/2})

(where Φ, φ are the standard normal d.f. and p.d.f. respectively) since 1 – Φ(x) ≤ φ(x)/x for x > 0. If g(h) = |h|^a, q(h) = 2|h|^{1/2–a}φ(|h|^{a–1/2}) then

∑ g(2^{–n}) = ∑ 2^{–na} < ∞, ∑ 2^n q(2^{–n}) = 2 ∑ 2^{n(1+2a)/2}φ(2^{n(1–2a)/2}) < ∞

(the last convergence being easily checked). Hence a.s. continuity of (an equivalent version of) Wt follows from Theorem 15.4.1. □
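The "easily checked" convergence at the end of the proof can also be verified numerically. A sketch with the illustrative choice a = 1/4 (any 0 < a < 1/2 would do):

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

a = 0.25
# g(2^-n) = 2^{-na};  2^n q(2^-n) = 2 * 2^{n(1+2a)/2} * phi(2^{n(1-2a)/2})
sum_g = sum(2**(-n * a) for n in range(1, 60))
sum_q = sum(2 * 2**(n * (1 + 2 * a) / 2) * phi(2**(n * (1 - 2 * a) / 2))
            for n in range(1, 60))
# the geometric decay of g and the superexponential decay of the normal
# density make both partial sums stabilize after a handful of terms
```

The first sum is simply a geometric series; in the second, the Gaussian factor overwhelms the exponentially growing prefactor, so the terms vanish rapidly.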

As seen in Section 15.3, a process with a.s. continuous sample functions may be naturally viewed as a random element of (C, C) where C = C[0, 1] and C = C ∩ B^{[0,1]}. By Theorem 15.5.1, the Wiener process Wt may be so regarded. The steps in the construction were (a) to use Kolmogorov's Theorem to define a process, say Wt^0, in (R^T, B^T) having the prescribed (normal) fidi's, (b) to replace Wt^0 by an equivalent version Wt with a.s. continuous sample functions, i.e. Wt = Wt^0 a.s. for each t (hence with the same fidi's), and (c) to consider W = {Wt : t ∈ [0, 1]} as a random element of (C, C) by restricting to C = C[0, 1] (and taking C = C ∩ B^{[0,1]}, equivalently the Borel σ-field of the topological space C as noted in Section 15.3).


As a result of this construction a probability measure PW^{-1} (the distribution of W) is obtained on the measurable space (C, C). This probability measure is termed Wiener measure and is customarily also denoted by W. This measure has, of course, multivariate normal form for the fidi probabilities induced on the sets B^u, u = (t1, . . . , tk), for each k. Of course, the space (C, C, W) can be taken to be the (Ω, F, P) on which the Wiener process is defined as the identity mapping Wω = ω.

Finally, it may be noted that an alternative approach to Wiener measure and the Wiener process is to define the latter as a distributional limit of simple processes of random walk type (cf. [Billingsley]). This is less direct and does require considerable weak convergence machinery, but has the advantage of simultaneously producing the "invariance principle" (functional central limit theorem) of Donsker, which has significant use e.g. in applications to areas such as sequential analysis.

15.6 Point processes and random measures

In the preceding sections we have indicated some basic structural theory for stochastic processes with continuous sample functions and given useful sufficient conditions for continuity. This included the construction and continuity of the celebrated Wiener process – a key component, along with its various extensions, in stochastic modeling in diverse fields.

At the other end of the spectrum are processes whose sample functions are patently discontinuous, which may be used to model random sequences of points (i.e. point processes) and their extensions to more general random measures. A special position among these is held by the Poisson process, which is arguably equally as prominent as the Wiener process for its extensions and applications.

There are a number of ways of providing a framework for point processes on the (e.g. positive) real line, perhaps the most obvious being the description as a family {τn : n = 1, 2, . . .} of r.v.'s 0 ≤ τ1 ≤ τ2 ≤ · · · (defined on (Ω, F, P)), representing the positions of points. To avoid accumulation points it is assumed that τn → ∞ a.s. In particular the assumption that τ1, τ2 – τ1, τ3 – τ2, . . . are independent and identically distributed with d.f. F(·) leads to a renewal process, and the particular case F(x) = 1 – e^{–λx}, x > 0, gives a Poisson process with intensity λ. Fine detailed accounts of these and related processes abound, of which, for example, [Feller] may be regarded as a seminal work. Our purpose here is just to indicate how a general abstract framework may arise naturally by adding randomness to the measure-theoretic structure considered throughout this volume in line


with the random element approach to real-valued processes of the preceding sections.

An alternative viewpoint to that above of regarding a point process as the sequence {τn : 0 < τ1 < τ2 < · · · } of its point occurrence times is to consider the family of (extended) r.v.'s ξ(B), taking values 0, 1, 2, . . . , +∞, consisting of the numbers of τi in (Borel) sets B. The assumption τn → ∞ means that ξ(B) < ∞ for bounded Borel sets B. Since ξ(B) is clearly countably additive, it may be regarded as a (random) counting measure on the Borel sets of [0, ∞). The two alternative viewpoints are connected e.g. by the relation {ξ(0, x] ≥ n} = {τn ≤ x}. A simple Poisson process with intensity λ may then be regarded as a random counting measure ξ(B) as above with P{ξ(B) = r} = e^{–λm(B)}(λm(B))^r/r! (m = Lebesgue measure as always) for each Borel B ⊂ [0, ∞) and such that ξ(B1), ξ(B2) are independent for disjoint such B1, B2.

It is natural to extend this latter view of a point process (a) to include ξ(B) which are not necessarily integer-valued (i.e. to define random measures (r.m.'s) which are not necessarily point processes) and (b) to consider such concepts on a space more general than the real line, such as R^k or a space S with a topological structure. A detailed, encyclopedic account of r.m.'s may be found in [Kallenberg] for certain metric ("Polish") spaces. The topological assumptions involved are most useful for consideration of more intricate properties (such as weak convergence) of point processes and r.m.'s. However, for the basic r.m. framework they are primarily used to define a purely measure-theoretic structure involving classes of sets (semirings, rings, σ-fields) considered without topology in this volume. Hence our preferred approach in this brief introduction is to define a "clean" purely measure-theoretic framework in the spirit of this volume, leaving topological considerations for possible later study and as a setting for development of more complex properties of interest.

Our interest in the possible use of a measure-theoretic framework arose from hearing a splendid lecture series on random measures in the early 1970's by Olav Kallenberg – leading to his subsequent classic book [Kallenberg]. Similar developments were also of interest to others at that time and since – including papers by D.G. Kendall, B.D. Ripley, J. Mecke and a subsequent book on Poisson processes by J.F.C. Kingman.

15.7 A purely measure-theoretic framework for r.m.'s

Let S be an abstract space on which a r.m. is to be defined and S a σ-field of subsets of S, i.e. (S, S) is a measurable space (Chapter 3). Our basic


structural assumption about S is that there is a countable semiring P in S whose members cover S (i.e. if P = {E1, E2, . . .}, ∪_1^∞ Ei = S) and such that P generates S (i.e. S(P) = S). Note that since S = ∪_1^∞ Ei ∈ S(P) = S, P also generates S as a σ-field (σ(P) = S(P) = S). We shall refer to a system (S, S, P) satisfying these assumptions as a basic structure for defining a random measure or point process.

Two rings connected with such a basic structure are of interest:

(i) R(P), the ring generated by P, i.e. the class of all finite (disjoint) unions of sets of P.

(ii) S0 = S0(P), the class of all sets E ∈ S such that E ⊂ ∪_1^n Ei for some n and sets E1, E2, . . . , En in P.

S0 is clearly a ring and P ⊂ R(P) ⊂ S0 ⊂ S. The ring S0 will be referred to as the class of bounded measurable sets, since they play this role in the real line, where P = {(a, b] : a, b rational, –∞ < a < b < ∞}. This is incidentally also the case in popular topological frameworks, e.g. where S is a second countable locally compact Hausdorff space, S is the class of Borel sets (generated by the open sets) and P is the ring generated by a countable base of bounded sets.

In these examples, the ring S0 is precisely the class of all bounded measurable sets. As noted, S0 will be referred to as the "class of bounded measurable sets" even in the general context.

Let (S, S, P) be a basic structure, and (Ω, F, P) a probability space. Let ξ = {ξω(B) : ω ∈ Ω, B ∈ S} be such that

(i) for each fixed ω ∈ Ω, ξω(B) is a measure on S;
(ii) for each fixed B ∈ P, ξω(B) is a r.v. on (Ω, F, P).

Then ξ is called a random measure (r.m.) on S (defined with respect to (Ω, F, P)). Further, if the r.m. ξ is such that ξω(B) is integer-valued a.s. for each B ∈ P we call ξ a point process.

If ξ is a r.m., since ξω(B) is finite a.s. for each B ∈ P and P is countable, the null sets may be combined to give a single null set Λ ∈ F, P(Λ) = 0, such that ξω(B) is finite for all B ∈ P, ω ∈ Ω – Λ. Indeed ξω(B) < ∞ for all B ∈ S0 when ω ∈ Ω – Λ, since such B can be covered by finitely many sets of P. If desired, Ω may be reduced to Ω – Λ, thus assuming that ξω(B) is finite for all ω, B ∈ S0.

If ξ is a r.m., ξω(B) is an extended r.v. for each B ∈ S, and a r.v. for B ∈ S0. For if S = ∪_1^∞ Bi where Bi are disjoint sets of P, B = ∪_1^∞ (B ∩ Bi) so that ξω(B) = ∑_1^∞ ξω(B ∩ Bi), which is the measurable sum of (nonnegative) measurable terms.


If ξ is a r.m., its expectation or intensity measure λ = Eξ is defined by λ(B) = Eξ(B) for B ∈ S. Countable additivity is immediate (e.g. from Theorem 4.5.2 (Corollary)). Note that λ is not necessarily finite, even on P.

Point processes and r.m.'s have numerous properties which we do not consider in detail here. Some of these provide means of defining new r.m.'s from one or more given r.m.'s. An example is the following direct definition of a r.m. as an integral of an existing r.m., proved by D-class methods:

Theorem 15.7.1 If ξ is a r.m. and f is a nonnegative S-measurable function then ξf = ∫_S f(s) dξω(s) is F-measurable. Furthermore, if f is bounded on each set of P, νf(B) = ∫_B f(s) dξω(s), B ∈ S, is a r.m.

It follows from the first part of this result that e^{–ξf} = e^{–∫f dξ} is a nonnegative bounded r.v. for each nonnegative S-measurable function f and hence has a finite mean. Lξ(f) = Ee^{–ξf} is termed the Laplace Transform (L.T.) of the r.m. ξ, and is a useful tool for many calculations. In particular for B ∈ S, Lξ(tχB) = Ee^{–tξ(B)} is the L.T. of the nonnegative r.v. ξ(B), a useful alternative to the c.f. for nonnegative r.v.'s.

15.8 Example: The sample point process

Let τ be a r.e. in our basic space (S, S), and consider δs(B) = χB(s), which may be viewed as unit mass at s, even if the singleton set {s} is not S-measurable. Then it is readily checked that the composition δ_{τω}(B) defines a point process ξ^{(1)} with unit mass at the single point τω. If the r.e. τ has distribution ν = Pτ^{-1} (Section 9.3), ξ^{(1)} has intensity Eξ^{(1)}(B) = EχB(τω) = Eχ_{τ^{-1}B}(ω) = Pτ^{-1}(B) = ν(B). Further straightforward calculations show that ξ^{(1)} has L.T.

L_{ξ^{(1)}}(f) = Ee^{–f(τω)} = ∫ e^{–f(s)} dPτ^{-1}(s) = ν(e^{–f}).

Suppose now that τ1, τ2, . . . , τn are independent r.e.'s of S with common distribution Pτj^{-1} = ν. Then f(τ1), f(τ2), . . . , f(τn) are i.i.d. (extended) r.v.'s for any nonnegative measurable f, and in particular χB(τ1), χB(τ2), . . . , χB(τn) are i.i.d. with P{χB(τ1) = 1} = ν(B) = 1 – P{χB(τ1) = 0}. Hence if ξ^{(n)} is the point process ∑_1^n δ_{τj} and B ∈ S,

ξ^{(n)}(B) = ∑_1^n δ_{τj}(B) = ∑_1^n χB(τj),

so that ξ^{(n)}(B) is binomial with parameters (n, ν(B)). ξ^{(n)} is thus a point process consisting of n events at points {τ1, τ2, . . . , τn}, its intensity being


Eξ^{(n)} = nν, and its L.T. is readily calculated to be

L_{ξ^{(n)}}(f) = Ee^{–∑_1^n δ_{τj}(f)} = Ee^{–∑_1^n f(τj)} = (Ee^{–f(τ1)})^n = (ν(e^{–f}))^n.

ξ^{(n)} is referred to as the sample point process consisting of n independent points τ1, τ2, . . . , τn.
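A simulation sketch of the binomial claim, with the illustrative choices S = [0, 1], ν = Lebesgue measure and B = [0, 0.3]:

```python
import random

random.seed(2)

def xi_n(n, b=0.3):
    # xi^(n)(B) = number of the n i.i.d. uniform points tau_j falling in B = [0, b]
    return sum(1 for _ in range(n) if random.random() <= b)

n, p, N = 10, 0.3, 50000
counts = [xi_n(n) for _ in range(N)]
mean_count = sum(counts) / N          # intensity: E xi^(n)(B) = n nu(B) = 3
var_count = sum((c - mean_count)**2 for c in counts) / N
# binomial(n, p): mean n p = 3 and variance n p (1 - p) = 2.1
```

Both empirical moments match those of the binomial (n, ν(B)) distribution, as the indicator-sum representation of ξ^{(n)}(B) predicts.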

15.9 Random element representation of a r.m.

As seen in Section 15.1, a real-valued stochastic process (family of r.v.'s) {ξt : t ∈ T} may be equivalently viewed as a random function, i.e. a r.e. of R^T. Similarly one may regard a r.m. {ξ(B) : B ∈ S} as a mapping ξ from Ω into the space M of all measures μ on S which are finite on P, i.e. ξω is the element of M defined by (ξω)(B) = ξω(B), B ∈ S. A natural σ-field for the space M is that generated by the functions φB(μ) = μ(B), B ∈ S, i.e. the smallest σ-field M making each φB M|B-measurable (M = σ{φB^{-1}E : B ∈ S, E ∈ B}; cf. Lemma 9.3.1).

It may then be readily checked (cf. Section 9.3) that a r.m. ξ is a measurable mapping from (Ω, F, P) to (M, M), i.e. a random element of (M, M).

As defined in Section 9.3 for r.e.'s, the distribution of the r.m. ξ is the probability measure Pξ^{-1} on M. It is then true that any probability measure π on M may be taken to be the distribution of a r.m., namely the identity r.m. ξ(μ) = μ on the probability space (M, M, π).

15.10 Mixtures of random measures

As noted, r.m.'s may be obtained by specifying their distributions as any probability measures on (M, M). Suppose now that (Θ, T, Q) is a probability space, and for each θ ∈ Θ, ξ^{(θ)} is a r.m. in (S, S) with distribution πθ, πθ(A) = P{ξ^{(θ)} ∈ A} for each A ∈ M. (Note that the ξ^{(θ)}'s can be defined on different probability spaces.)

If for each A ∈ M, πθ(A) is a T-measurable function of θ, it follows from Theorem 7.2.1 that

π(A) = ∫_Θ πθ(A) dQ(θ)

is a probability measure on M, and thus may be taken to be the distribution of a r.m. ξ, which may be called the mixed r.m. formed by mixing ξ^{(θ)} with respect to Q. Of course, it is the distribution of ξ rather than ξ itself which is uniquely specified.


The following intuitively obvious results are readily shown:

(i) If ξ is the mixture of ξ^{(θ)} (Pξ^{-1}(A) = ∫ P{ξ^{(θ)} ∈ A} dQ(θ)) and B ∈ S, the distribution of the (extended) r.v. ξ(B) is (for Borel sets E)

P{ξ(B) ∈ E} = P{φBξ ∈ E} = Pξ^{-1}(φB^{-1}E) = ∫ P{ξ^{(θ)} ∈ φB^{-1}E} dQ = ∫ P{ξ^{(θ)}(B) ∈ E} dQ(θ).

(ii) The intensity Eξ satisfies (for B ∈ S)

Eξ(B) = ∫ Eξ^{(θ)}(B) dQ(θ).

(iii) The Laplace Transform Lξ(f) is, for nonnegative measurable f,

Lξ(f) = ∫ L_{ξ^{(θ)}}(f) dQ(θ).

Example Mixing the sample point process. Write ξ^{(0)} = 0 and for n ≥ 1, ξ^{(n)} = ∑_1^n δ_{τj} as in Section 15.8, where τ1, . . . , τn are i.i.d. random elements of (S, S) with (common) distribution Pτj^{-1} = ν, say.

Let Θ = {0, 1, 2, 3, . . . }, T = all subsets of Θ, and Q the probability measure with mass qn at n = 0, 1, . . . (qn ≥ 0, ∑_0^∞ qn = 1). Then the mixture ξ has distribution

Pξ^{-1}(A) = ∫ Pθ(A) dQ(θ) = ∑_{n=0}^∞ qn Pn(A)

where Pn(A) = P{ξ(n) ∈ A}. For each B ∈ S the distribution of ξ(B) is givenby the probabilities

P{ξ(B) = r} =∞∑

n=r

qnP{ξ(n)(B) = r} =∞∑

n=r

qn

(nr

)ν(B)r(1 – ν(B))n–r

and

Eξ(B) =∞∑

n=0

qnnν(B) = qν(B)

where q is the mean of the distribution {qn}. That is Eξ = qν.The Laplace Transform of ξ is

Lξ(f ) =∫

Lξ(θ) (f ) dQ(θ) =∞∑

n=0

qnLξ(n) (f ) =∞∑

n=0

qn(ν(e–f ))n = G(ν(e–f ))

where G denotes the probability generating function (p.g.f.) of the distri-bution {qn}.
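For a concrete choice of {qn} the displayed formulas can be evaluated directly. The sketch below is a numerical check only: it takes geometric weights qn = (1 − s)sⁿ (an illustrative choice, with s and p = ν(B) assumed values), truncates the infinite sums, and confirms that the probabilities P{ξ(B) = r} sum to 1 and have mean q̄ν(B).

```python
from math import comb

s, p = 0.6, 0.3        # assumed geometric parameter and nu(B) = p
N = 400                # truncation point for the infinite sums

q = [(1 - s) * s**n for n in range(N)]         # q_n, n = 0, 1, ...
qbar = sum(n * qn for n, qn in enumerate(q))   # mean of {q_n}; here s/(1-s) = 1.5

def prob_xi_B(r):
    """P{xi(B) = r} = sum_{n >= r} q_n C(n, r) p^r (1-p)^(n-r), truncated at N."""
    return sum(q[n] * comb(n, r) * p**r * (1 - p)**(n - r) for n in range(r, N))

probs = [prob_xi_B(r) for r in range(N)]
total = sum(probs)                           # should be (essentially) 1
mean = sum(r * pr for r, pr in enumerate(probs))   # should equal qbar * p

print(round(total, 6), round(mean, 6), round(qbar * p, 6))
```

Exchanging the order of summation (as Fubini's Theorem licenses for the series) shows mean = Σ_n qn · np = q̄p exactly, which is what the check verifies up to floating-point error.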


15.11 The general Poisson process

We now outline how the general Poisson process may be obtained on our basic space (S, S) from the mixed sample point process considered in the last section.

First define a “finite Poisson process” as simply a mixed sample point process with qn = e^{−a} a^n/n! for a > 0, n = 0, 1, 2, . . . , i.e. Poisson probabilities. For B ∈ S,

P{ξ(B) = r} = Σ_{n=r}^∞ (e^{−a} a^n/n!) (n choose r) ν(B)^r (1 − ν(B))^{n−r}

which reduces simply to e^{−aν(B)} (aν(B))^r/r!, r = 0, 1, 2, . . . , i.e. a Poisson distribution for any B ∈ S, with mean aν(B). In particular if B = S, ξ(S) has a Poisson distribution with mean a. This, of course, implies ξ(S) < ∞ a.s., so that the total number of Poisson points in the whole space is finite. This limits the process (ordinarily one thinks of a Poisson process – e.g. on the line – as satisfying P{ξ(S) = ∞} = 1), which is the reason for referring to this as a “finite Poisson process”. This process has intensity measure aν = λ, say, and Laplace Transform G(ν(e^{−f})) where G(s) = e^{−a(1−s)}, i.e.

Lξ(f) = e^{−a(1−ν(e^{−f}))} = e^{−aν(1−e^{−f})} = e^{−λ(1−e^{−f})}   (ν(1) = 1).

Any finite (nonzero) measure λ on S may be taken as the intensity measure of a finite Poisson process (by taking a = λ(S) and ν = λ/λ(S)).
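The reduction of the mixed binomial sum to a Poisson distribution can be verified numerically. The sketch below, with illustrative values of a and p = ν(B), truncates Σ_{n≥r} e^{−a}(a^n/n!)(n choose r) p^r (1 − p)^{n−r} and compares it term by term with e^{−ap}(ap)^r/r!.

```python
from math import comb, exp, factorial

a, p = 4.0, 0.35       # illustrative: total mean a, nu(B) = p
N = 150                # truncation point for the sum over n

def mixed_prob(r):
    """sum_{n >= r} e^{-a} a^n/n! * C(n, r) p^r (1-p)^(n-r), truncated at N."""
    total, pois_n = 0.0, exp(-a)          # pois_n = e^{-a} a^n / n!, starting at n = 0
    for n in range(N):
        if n >= r:
            total += pois_n * comb(n, r) * p**r * (1 - p)**(n - r)
        pois_n *= a / (n + 1)             # recursion avoids large factorials
    return total

def poisson_prob(r, mean):
    """Direct Poisson probability e^{-mean} mean^r / r!."""
    return exp(-mean) * mean**r / factorial(r)

# The mixture should be Poisson with mean a * nu(B) = 1.4
max_err = max(abs(mixed_prob(r) - poisson_prob(r, a * p)) for r in range(30))
print(max_err)
```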

The general Poisson process (for which ξ(S) can be infinite-valued) can be obtained by summing a sequence of independent finite Poisson processes, as we now indicate, following the construction of a sequence of independent r.v.'s as in Corollary 2 of Theorem 15.1.2. Let λ ∈ M (i.e. a measure on S which is finite on P). From the basic assumptions it is readily checked that S may be written as ∪_{i=1}^∞ Si, where the Si are disjoint sets of P, and we write λi(B) = λ(B ∩ Si), B ∈ S. The λi, i = 1, 2, . . . , are finite measures on S and may thus be taken as the intensities of independent finite Poisson processes ξi, whose distributions on (M, M) are Pi, say. (Pi assigns measure 1 to the set {μ ∈ M : μ(S − Si) = 0}.)

Define now ξ = Σ_{j=1}^∞ ξj. Since, for B ∈ P,

E{Σ_{j=1}^∞ ξj(B)} = Σ_{j=1}^∞ λj(B) = Σ_{j=1}^∞ λ(B ∩ Sj) = λ(B) < ∞   (λ ∈ M),

we see that Σ_{j=1}^∞ ξj(B) converges a.s. on P and hence ξ is a point process. By the above, Eξ(B) = λ(B) so that ξ has intensity measure λ. ξ is the promised Poisson process in S with intensity measure λ ∈ M.

Some straightforward calculation using independence and dominated convergence shows that its L.T. is

Lξ(f) = lim_{n→∞} Π_{j=1}^n Lξj(f) = e^{−Σ_{j=1}^∞ λj(1−e^{−f})} = e^{−λ(1−e^{−f})},

i.e. the same form as in the finite case. In summary, then, the following result holds.

Theorem 15.11.1 Let (S, S, P) be a basic structure, and let λ be a measure on S which is finite on (the semiring) P. Then there exists a Poisson process ξ on S with intensity Eξ = λ, thus having the L.T.

Lξ(f) = e^{−λ(1−e^{−f})}.

By writing f = Σ_{i=1}^n ti χBi and using the result for L.T.'s corresponding to Theorem 12.8.3 for c.f.'s (with an analogous proof using the uniqueness theorem for L.T.'s, see e.g. [Feller]), it is seen simply that ξ(Bi), i = 1, 2, . . . , n, are independent Poisson r.v.'s with means λ(Bi) when the Bi are disjoint sets of S.
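This independence is easy to observe empirically. The sketch below is a simulation check, not the text's proof: it realizes a finite Poisson process on S = [0, 1) with uniform ν (illustrative parameters throughout) and confirms that the counts in two disjoint sets have the predicted Poisson means λ(B1) = aν(B1), λ(B2) = aν(B2) and (near-)zero covariance.

```python
import random
from math import exp

random.seed(1)

def poisson(mean):
    """Sample a Poisson variate by inversion of the c.d.f. (adequate for small means)."""
    u, r, p, c = random.random(), 0, exp(-mean), exp(-mean)
    while u > c:
        r += 1
        p *= mean / r
        c += p
    return r

a = 3.0                              # total mass: xi(S) ~ Poisson(a)
B1, B2 = (0.0, 0.4), (0.4, 1.0)      # disjoint subsets of S = [0, 1); nu = uniform

n_sim = 100_000
c1 = c2 = c12 = 0.0
for _ in range(n_sim):
    n = poisson(a)                                   # xi(S)
    pts = [random.random() for _ in range(n)]        # n i.i.d. points ~ nu
    x1 = sum(B1[0] <= x < B1[1] for x in pts)        # xi(B1)
    x2 = sum(B2[0] <= x < B2[1] for x in pts)        # xi(B2)
    c1 += x1; c2 += x2; c12 += x1 * x2

m1, m2, m12 = c1 / n_sim, c2 / n_sim, c12 / n_sim
cov = m12 - m1 * m2     # should be near 0: xi(B1) and xi(B2) are independent
print(round(m1, 2), round(m2, 2), round(cov, 3))
```

Note that x1 and x2 are built from the *same* random points, so their independence is a genuine property of the Poisson construction rather than of the simulation design.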

15.12 Special cases and extensions

As defined, the general Poisson process ξ has intensity Eξ = λ where λ is a measure on S which is finite on P. The simple familiar stationary Poisson process on the real line is a very special case in which (S, S) is (R, B), P can be taken to be the semiclosed intervals {(a, b] : a, b rational, −∞ < a < b < ∞}, and λ is a multiple of Lebesgue measure, λ(B) = λm(B) for a finite positive constant λ, termed the intensity of the simple Poisson process. Nonstationary Poisson processes on the line are simply obtained by taking an intensity measure λ ≪ m, having a time-varying intensity function λ(t), λ(B) = ∫_B λ(t) dt. These Poisson processes have no fixed atoms (points s at which P{ξ{s} > 0} > 0) and no “multiple atoms” (random points s with ξ{s} > 1). On the other hand, fixed atoms or multiple atoms are possible if a chosen intensity measure has atoms.
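A standard way to realize such a nonstationary Poisson process on an interval is thinning: generate a stationary Poisson stream at a rate λ_max dominating λ(t), then retain each candidate point t independently with probability λ(t)/λ_max. The sketch below uses a purely illustrative intensity function and checks that the mean number of retained points matches ∫ λ(t) dt.

```python
import math
import random

random.seed(2)

LAM_MAX = 2.0

def lam(t):
    """Illustrative time-varying intensity, bounded above by LAM_MAX."""
    return 1.0 + math.sin(t)

def inhomogeneous_poisson(T):
    """Thinning: dominate lam(t) by LAM_MAX; keep candidate t w.p. lam(t)/LAM_MAX."""
    points, t = [], 0.0
    while True:
        t += random.expovariate(LAM_MAX)    # next candidate arrival at rate LAM_MAX
        if t > T:
            return points
        if random.random() < lam(t) / LAM_MAX:
            points.append(t)

T = 2 * math.pi
n_sim = 20_000
mean_count = sum(len(inhomogeneous_poisson(T)) for _ in range(n_sim)) / n_sim

# Expected count: integral of lam(t) over [0, 2*pi] = 2*pi ~ 6.28
print(round(mean_count, 1))
```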

Poisson processes' distributions may be “mixed” to form “mixed Poisson processes” or “compound Poisson processes”, and intensity measures may themselves be taken to be stochastic to yield “doubly stochastic Poisson processes” (“Cox processes” as they are generally known). These latter are particularly useful for modeling applications involving stochastic occurrence rates.

The very simple definition of a basic structure in Section 15.7 suffices admirably for the definition of Poisson processes. However, its extensions such as those above, and other random measures, typically require at least a little more structure. One such assumption is that of separation of two points of S by sets of P – a simple further requirement closely akin to the definition of Hausdorff spaces. Such an assumption typically suffices for the definition and basic framework of many point processes. However, more intricate properties, such as a full theory of weak convergence of r.m.'s, are usually achieved by the introduction of more topological assumptions about the space S.

References

Billingsley, P., Convergence of Probability Measures, 2nd edn, Wiley–Interscience, 1999.

Chung, K.L., A Course in Probability Theory, 3rd edn, Academic Press, 2001.

Cramér, H., Leadbetter, M.R., Stationary and Related Stochastic Processes, Wiley, 1967. Reprinted by Dover Publications Inc., 2004.

Feller, W., An Introduction to Probability Theory and Its Applications, vol. 1, John Wiley & Sons, 1950.

Halmos, P.R., Measure Theory, Springer-Verlag, 1974.

Kallenberg, O., Random Measures, 4th edn, Academic Press, 1986.

Kallenberg, O., Foundations of Modern Probability, 2nd edn, Springer-Verlag, 2002.

Loève, M., Probability Theory I, II, 4th edn, Graduate Texts in Mathematics, vols. 45–46, Springer-Verlag, 1977.

Resnick, S.I., Extreme Values, Regular Variation, and Point Processes, 2nd edn, Springer-Verlag, 2008.


Index

Lp-space, 127
  complex, 180
λ-system, 19
μ*-measurable, 29
σ-algebra
  see σ-field, 13
σ-field, 13
  generated by a class of sets, 14
  generated by a random variable, 195
  generated by a transformation, 47
σ-finite, 22, 86
σ-ring, 13
  generated by a class of sets, 14
D-class, 15
absolute continuity, 94, 105, 110, 193, 199
almost everywhere, 57
almost surely (a.s.), 190
atoms, 192
Banach space, 127
binomial distribution, 193, 257
Bochner's Theorem, 275
Borel measurable function, 59, 190
Borel sets, 16
  extended, 45
  n-dimensional, 158
  two-dimensional, 153
Borel–Cantelli Lemma, 217
bounded variation, 110, 180
Brownian motion
  see Wiener process, 343
Cauchy sequence, 118
  almost surely (a.s.), 224
  almost uniformly, 119
  in measure, 121
  in metric space, 125
  uniformly, 118
centered sequences, 325
central limit theorem
  array form of Lindeberg–Feller, 269
  elementary form, 267
  standard form of Lindeberg–Feller, 271
change of variables in integration, 106
characteristic function (c.f.) of a random variable, 254
  inversion and uniqueness, 261
  inversion theorem, 278
  joint, 277
  recognizing, 271
  uniqueness, 262, 278
Chebychev Inequality, 202, 243
classes of sets, 1, 2
completion, 34, 41, 81
conditional distribution, 295
conditional expectation, 287, 288, 300, 305
conditional probability, 285, 291, 301, 305
conditionally independent, 307
consistency of a family of measures (distributions), 167, 342
continuity theorem for characteristic functions, 264, 279
continuous from above (below), 25
continuous mapping theorem, 231
convergence
  almost everywhere (a.e.), 58
  almost sure (a.s.), with probability one, 223
  almost uniform (a.u.), 119
  in distribution, 227, 228
  in measure, 120
  in probability, 225
  in pth order mean (Lp-spaces), 226
  modes, summary, 134
  of integrals, 73
  pointwise, 118
  uniformly, 118
  uniformly a.e., 118
  vague, 204, 237
  weak, 228
convex, 202
convolution, 153, 216
correlation, 200
counting measure, 41, 81
covariance, 200
Cox processes, 354
Cramér–Wold device, 280
cylinder set, 164
De Morgan laws, 6
degenerate distribution, 257
density function, 105
discrete measures, 104, 105
distribution
  marginal, 198, 341
  of a random element, 197
  of a random measure, 351
  of a random variable, 190
distribution function (d.f.), 191
  absolutely continuous, 193, 199
  discrete, 193
  joint, 197
dominated convergence, 76, 92, 179
  conditional, 290
Doob's decomposition, 313
Egoroff's Theorem, 120
equivalent
  signed measures, 95
  stochastic processes, 345
essentially unique, 96
event, 189
expectation, 199
extension of measures, 27, 31
Fatou's Lemma, 76
  conditional, 289
field (algebra), 9
finite-dimensional distributions (fidi's), 341
Fourier Transform, 181, 254
  Dirichlet Limit, 186
  inverse, 185
  inversion, 182
  "local" inversion, 186
  local inversion theorem, 187
Fourier–Stieltjes Transform, 180, 254
  inversion, 182
Fubini's Theorem, 150, 158
functional central limit theorem (invariance principle), 347
gamma distribution, 194
generalized second derivative, 282
Hahn decomposition, 88
  minimal property, 90
Hausdorff space, 349, 355
Heine–Borel Theorem, 37
Helly's Selection Theorem, 232
Hölder's Inequality, 128, 179, 201
increasing sequence of functions, 55
independent events and their classes, 208
independent random elements and their families, 211
independent random variables, 213
  addition, 216
  existence, 214
indicator (characteristic) functions, 7
integrability, 68
integrable function, 67–69
integral, 68
  defined, 69
  indefinite, 66
  of complex functions, 177
  of nonnegative measurable functions, 63
  of nonnegative simple functions, 62
  with respect to signed measures, 92
integration by parts, 154
inverse functions, 203
inverse image, 46
Jensen's Inequality, 202
  conditional, 291, 306
Jordan decomposition, 89, 152
Kolmogorov Inequalities, 241, 314
Kolmogorov Zero-One Law, 218
Kolmogorov's Extension Theorem, 167, 169, 342
Kolmogorov's Three Series Theorem, 244
Laplace Transform (L.T.), 350
laws of large numbers, 247, 248, 327
Lebesgue decomposition, 96, 98, 106, 194
Lebesgue integrals, 78
Lebesgue measurable function, 59
Lebesgue measurable sets, 38
  n-dimensional, 158
  two-dimensional, 153
Lebesgue measure, 37
  n-dimensional, 158
  two-dimensional, 153
Lebesgue–Stieltjes integrals, 78, 111
Lebesgue–Stieltjes measures, 39, 78, 111, 158, 162
Lévy distance, 251
Liapounov's condition, 284
likelihood ratios, 333
Lindeberg condition, 269
linear mapping (transformation), 17, 38
linear space, 126
Markov Inequality, 202
martingale (submartingale, supermartingale), 309, 320
  convergence, 319
  predictable increasing sequence, 313
  reverse, 323
  upcrossings, 317
mean square estimate, 306
measurability criterion, 48
measurable functions, 47
  combining, 50
  complex-valued, 178
  extended, 45
measurable space, 44
measurable transformation, 47
measure space, 44
measures, 22
  complete, 34
  complex, 87
  from outer measures, 29
  induced by transformations, 58
  mixtures of, 143
  on R^T, 167
  regularity, 162
metric space, 124
  complete, 126
  separable, 126
Minkowski's Inequality, 129, 180, 201
  reverse, 130
moments, 199
  absolute, 199, 200
  central, 200
  inequalities, 200
monotone class theorem, 14, 19
monotone convergence theorem, 74
  conditional, 289
nonnegative definite, 274
norm, 126
normal distribution, 194, 257
  multivariate, 200
normed linear space, 126
outer measure, 29
Palm distributions, 285
point process, 349
Poisson distribution, 193
Poisson process, 353
  compound, 354
  doubly stochastic, 354
  stationary and nonstationary, 354
Pólya's urn scheme, 337
Portmanteau Theorem, 228
positive definite, 274
probability density function (p.d.f.), 193
  joint, 198
probability measure (probability), 44, 189
  frequency interpretation, 190
  inequalities, 200
probability space, 189
probability transforms, 204
product measurable space, 155
product measure, 149, 156
product spaces, 141
  σ-field, 142, 165
  σ-ring, 141, 142
  diagonal, 171
  finite-dimensional, 155
  space (R^T, B^T), 163
Prohorov's Theorem, 234
projection map, 164, 165
Rademacher functions, 220
Radon–Nikodym derivative, 102
  chain rule, 103
Radon–Nikodym Theorem, 96, 100, 179
random element (r.e.), 195
random experiment, 189
random function (r.f.), 340
random measure (r.m.), 350
  basic structure, 349
  intensity measure, 350
  mixed, 351
  random element representation, 351
random variables (r.v.'s), 190
  absolutely continuous, 193
  discrete, 193
  extended, 190
  identically distributed, 192
  symmetric, 281
random vector, 195, 196
real line applications, 78, 104, 153
rectangle, 141
regular conditional density, 303
regular conditional distribution, 296, 299, 301, 302, 305
regular conditional probability, 293, 299, 301, 305
relatively compact, 234
repeated (iterated) integral, 148, 157
Riemann integrals, 79, 80, 84
Riemann–Lebesgue Lemma, 182
rings, 8, 11
sample functions (paths), 341
  continuity, 345
sample point process, 351
  mixing, 352
Schwarz Inequality, 129
section of a set, 142
semiring, 10
set functions, 21
  additive, 22
  countable subadditivity, 29
  extensions and restrictions, 22
  finitely additive (countably additive), 22
  monotone, 23
  subtractive, 23
set mapping, 46
sets, 1
  complement of a set, 4
  convergent, 7
  difference, 4
  disjoint, 4
  empty, 3
  equalities, 5
  intersection, 3
  limits, 6
  lower limit, 6
  monotone increasing (decreasing), 7
  proper difference, 4
  symmetric difference, 4
  union (sum), 3
  upper limit, 6
signed measure, 86, 152
  null, negative, positive, 87
  total variation, 112
simple functions, 54
singularity, 94, 105, 194
Skorohod's Representation, 236
stochastic process, 195, 340
  continuous parameter, 340
  on special subspaces of R^T, 344
  realization, 341
  stochastic sequence or discrete parameter, 340
tail σ-field, event and random variables, 218
three series theorem, 244
tight family, 232
transformation, 45
transformation theorem, 77, 93, 179
triangular array, 268
Tychonoff's Theorem, 168
uniform absolute continuity, 238
uniform distribution, 257
uniform integrability, 238
variance, 200
Wiener measure, 347
Wiener process, 343, 346