operator theoretic aspects of ergodic theoryhaase/files/efhn-july2012.pdf · ergodic theory has its...

T. Eisner, B. Farkas, M. Haase, R. Nagel

Operator Theoretic Aspects ofErgodic Theory

July 21, 2012

Springer

Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1 What is Ergodic Theory? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Topological Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1 Some Basic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2 Basic Constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.3 Topological Transitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4 Transitivity of Subshifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Minimality and Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.1 Minimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2 Topological Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.3 Recurrence in Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4 The C∗-algebra C(K) and the Koopman Operator . . . . . . . . . . . . . . . . . . 374.1 Continuous Functions on Compact Spaces . . . . . . . . . . . . . . . . . . . . . . 384.2 The Space C(K) as a Commutative C∗-Algebra . . . . . . . . . . . . . . . . . 434.3 The Koopman Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4 The Theorem of Gelfand–Naimark . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49Supplement: Proof of the Gelfand–Naimark Theorem . . . . . . . . . . . . . . . . . 50Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Measure-Preserving Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615.2 Measures on Compact Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665.3 Haar Measure and Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

v

vi Contents

6 Recurrence and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756.1 The Measure Algebra and Invertible MDS . . . . . . . . . . . . . . . . . . . . . . 756.2 Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776.3 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82Supplement: Induced Transformation and Kakutani–Rokhlin Lemma . . . . 85Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7 The Banach Lattice Lp and the Koopman Operator . . . . . . . . . . . . . . . . 917.1 Banach Lattices and Positive Operators . . . . . . . . . . . . . . . . . . . . . . . . 927.2 The Space Lp(X) as a Banach Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . 977.3 The Koopman Operator and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . 99Supplement: Interplay between Lattice and Algebra Structure . . . . . . . . . . 102Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

8 The Mean Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078.1 Von Neumann’s Mean Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . 1088.2 The Fixed Space and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1118.3 Perron’s Theorem and the Ergodicity of Markov Shifts . . . . . . . . . . . 1148.4 Mean Ergodic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116Supplement: Operations with Mean Ergodic Operators . . . . . . . . . . . . . . . . 120Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

9 Mixing Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1259.1 Strong Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269.2 Weakly Mixing Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319.3 Weak Mixing of All Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

10 Mean Ergodic Operators on C(K) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14710.1 Invariant Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14710.2 Uniquely and Strictly Ergodic Systems . . . . . . . . . . . . . . . . . . . . . . . . 15010.3 Mean Ergodicity of Group Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . 15210.4 Furstenberg’s Theorem on Group Extensions . . . . . . . . . . . . . . . . . . . 15410.5 Application: Equidistribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

11 The Pointwise Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16311.1 Pointwise Ergodic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16411.2 Banach’s Principle and Maximal Inequalities . . . . . . . . . . . . . . . . . . . 16611.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Contents vii

12 Isomorphisms and Topological Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 17512.1 Point Isomorphisms and Factor Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 17512.2 Algebra Isomorphisms of Measure Preserving Systems . . . . . . . . . . . 17912.3 Topological Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18312.4 The Stone Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18612.5 Mean Ergodic Subalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

13 Markov Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19313.1 Examples and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19413.2 Embeddings, Factor Maps and Isomorphisms . . . . . . . . . . . . . . . . . . . 19613.3 Markov Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19913.4 Factors and their Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203Supplement: Operator Theory on Hilbert Space . . . . . . . . . . . . . . . . . . . . . . 206Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

14 Compact Semigroups and Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21114.1 Compact Semigroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21114.2 Compact Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21714.3 The Character Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21914.4 The Pontryagin Duality Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22314.5 Application: Ergodic Rotations and Kronecker’s Theorem . . . . . . . . 225Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

15 Topological Dynamics Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22915.1 The Cech–Stone Compactification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22915.2 The Space of Ultrafilters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22915.3 Multiple Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22915.4 Applications to Ramsey Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

16 The Jacobs–de Leeuw–Glicksberg Decomposition . . . . . . . . . . . . . . . . . . 23116.1 Weakly Compact Operator Semigroups . . . . . . . . . . . . . . . . . . . . . . . . 23216.2 The Jacobs–de Leeuw–Glicksberg Decomposition . . . . . . . . . . . . . . . 23716.3 The Almost Weakly Stable Part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24016.4 Compact Group Actions on Banach Spaces . . . . . . . . . . . . . . . . . . . . . 24416.5 The Reversible Part Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250Supplement: Unitary Representations of Compact Groups . . . . . . . . . . . . . 252Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

17 Dynamical Systems with Discrete Spectrum . . . . . . . . . . . . . . . . . . . . . . . 26117.1 Operators with Discrete Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26117.2 Monothetic Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26317.3 Dynamical Systems with Discrete Spectrum . . . . . . . . . . . . . . . . . . . . 26517.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

viii Contents

18 A Glimpse at Arithmetic Progressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27518.1 The Furstenberg Correspondence Principle . . . . . . . . . . . . . . . . . . . . . 27618.2 A Multiple Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27918.3 The Furstenberg–Sarkozy Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 283Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

19 Joinings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28719.1 Dynamical Systems and their Joinings . . . . . . . . . . . . . . . . . . . . . . . . . 28719.2 Inductive Limits and the Minimal Invertible Extension . . . . . . . . . . . 28719.3 Relatively Independent Joinings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287Supplement: Tensor Products and Projective Limits . . . . . . . . . . . . . . . . . . . 287Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

20 The Host–Kra–Tao Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28920.1 Idempotent Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28920.2 The Existence of Sated Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28920.3 The Furstenberg Self-Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28920.4 Proof of the Host–Kra–Tao Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 289Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

21 More Ergodic Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29121.1 Polynomial Ergodic Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29121.2 Weighted Ergodic Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29121.3 Wiener–Wintner Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29121.4 Even More Ergodic Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

A Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

B Measure and Integration Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

C Functional Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

D The Riesz Representation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

E Theorems of Eberlein, Grothendieck, and Ellis . . . . . . . . . . . . . . . . . . . . 345

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

Symbol Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379

Preface

Ergodic Theory has its roots in Maxwell’s and Boltzmann’s kinetic theory of gasesand was born as a mathematical theory around 1930 by the groundbreaking worksof von Neumann and Birkhoff. In the 1970s, Furstenberg showed how to translatequestions of combinatorial number theory into ergodic theory. This inspired a newline of research, which ultimately led to stunning recent results of Host and Kra,Green and Tao and many others.

In its 80 years of existence, ergodic theory has developed into a highly sophis-ticated field that builts heavily on methods and results from many other mathemat-ical disciplines, e.g., probability theory, combinatorics, group theory, topology andset theory, even mathematical logic. Right from the beginning, also operator the-ory (which for the sake of simplicity we use synonymously to “functional analysis”here) played a central role. To wit, the main protagonist of the seminal papers of vonNeumann [von Neumann (1932b)] and Birkhoff [Birkhoff (1931)] is the so-calledKoopman operator

T : L2(X)→ L2(X) (T f )(x) := f (ϕ(x)), (0.1)

where X = (X ,Σ ,µ) is the underlying probability space, and ϕ : X → X is themeasure-preserving dynamics. (See Chapter 1 for more details.) By passing fromthe state space dynamics ϕ to the (linear) operator T , one linearises the situationand can then employ all the methods and tools from operator theory, like, e.g., spec-tral theory, duality theory, harmonic analysis.

However, this is not a one-way street. Results and problems from ergodic theory,once formulated in operator theoretic terms, tend to emancipate from their parentalhome and to lead their own life in functional analysis, with sometimes stunningresults of wide applicability (like the mean ergodic theorem, see Chapter 8).

We, as functional analysts, are fascinated by this interplay, and the present bookis the result of this fascination. With it we offer the reader a systematic introduc-tion into the operator theoretic aspects of ergodic theory, and show some of theirsurprising applications.

ix

x Contents

As our focus is on operator theoretic aspects, we have sometimes taken theliberty to develop the functional analysis beyond its immediate connection withergodic theory, and hence to leave out topics in ergodic theory where functionalanalysis has (yet?) not so much to say. (An example of such a missing topic isentropy.) These omissions, however, should be outweighed by some inclusions,such as a detailed account of the Jacobs–deLeeuw–Glicksberg splitting theoryand a proof of the Host–Kra–Tao theorem on multiple ergodic averages. (See be-low for a more detailed description of the contents.) Readers who want to seeother parts of and trends in ergodic theory we refer to one of the excellent re-cent monographs such as [Glasner (2003)], [Kalikow and McCutcheon (2010)], and[Einsiedler and Ward (2011)].

This book can serve quite different purposes. First, we do not expect any prior en-counter with ergodic theory, so it can be used as a basis for an introductory courseinto the subject. We certainly require familiarity with standard functional analysis,but whenever there were doubts about what may be considered “standard”, we in-cluded full proofs.1 Second, it can serve someone coming from functional analysisto make the first steps into ergodic theory; and third, it can help someone from er-godic theory to learn about the many operator theoretic aspects of his own discipline.Finally—one great hope of ours—it may serve as a foundation of future research,leading towards new and yet unknown connections between ergodic and operatortheory.

A Short Synopsis

The following is a brief overview about the contents of the book, highlighting someof its distinguishing features. As said above, no prior knowledge of ergodic theoryis required. Instead, we expect the reader to have some background in topology,measure theory and functional analysis, most of which is collected in AppendicesA, B, and C.In Chapter 1 entitled “What is Ergodic Theory?” we give a brief and intuitive in-troduction into the subject. The mathematical theory then starts in Chapters 2 and3 with the case of topological dynamical systems. We present the basic definitions(transitivity, minimality and recurrence) and cover the standard examples, construc-tions and theorems, like for instance the Birkhoff recurrence theorem.Operator theory appears first in Chapter 4 when we introduce as in (0.1) the Koop-man operator T on the Banach space C(K) induced by a topological dynamicalsystem (K;ϕ). In order to be able to characterise properties of the (nonlinear) dy-namical system through properties of this linear operator T , a careful analysis ofthe space C(K) and the operators thereon is needed. Therefore we cover the ba-

1 This concerns the Gelfand–Naimark theorem, the Stone–Weierstrass theorem, Pontryagin du-ality and the Peter-Weyl theorem (main text) and the Riesz–Representation theorem, Eberlein’s,Grothendieck’s and Krein-Smulian’s theorems on weak compactness, Ellis’ theorem and the exis-tence of the Haar measure (appendices).

Contents xi

sic results about such spaces (Urysohn’s lemma, Tietze’s and the Stone–Weierstrasstheorem). Then we emphasise its structure as a Banach algebra and give a proof ofthe classical Gelfand–Naimark theorem, which allows to represent each commuta-tive C∗-algebra as a space C(K). The Koopman operators are then identified as themorphisms of such algebras.In Chapter 5 we introduce measure-preserving dynamical systems and cover stan-dard examples and constructions. In particular, we discuss the relation of measureson a compact space K with functionals on the Banach space C(K). (However, theproof of the Riesz representation theorem is deferred to Appendix D.) The clas-sical topics of recurrence and ergodicity as the most basic properties of measure-preserving systems are discussed in Chapter 6.Subsequently, in Chapter 7 we turn to the corresponding operator theory. As inthe topological case, a measure-preserving map ϕ on the probability space X =(X ,Σ ,µ) induces a Koopman operator T on each space Lp(X) as in (0.1). Whilein the topological situation we look at the space C(K) as a Banach algebra andat the Koopman operator as an algebra homomorphism, in the measure-theoreticcontext the corresponding spaces are Banach lattices and the Koopman operatorsare lattice homomorphisms. Consequently, we include a short introduction into ab-stract Banach lattices and their morphisms. Finally, we characterise the ergodicityof a measure-preserving dynamical system by the fixed space or irreducibility of theKoopman operator.After these preparations, we discuss the most central operator theoretic results inergodic theory, von Neumann’s mean ergodic theorem (Chapter 8) and Birkhoff’spointwise ergodic theorem (Chapter 11). The former is placed in the more generalcontext of mean ergodic operators and in Chapter 10 we discuss this concept forKoopman operators of topological dynamical systems. Here the classical results ofKrylov–Bogoliubov about the existence of invariant measures is proved, the con-cepts of unique and strict ergodicity are introduced and exemplified with Fursten-berg’s theorem on group extensions.In between the discussion of the ergodic theorems, in Chapter 9, we introduce theconcepts of strongly and weakly mixing systems. The topic has again a strong oper-ator theoretic flavour, as the different types of mixing are characterised by differentasymptotic behaviour of the powers T n as n→ ∞ of the Koopman operator. In asense, the results of this chapter are incomplete since they are developed withoutusing the fact that orbits of the Koopman operator are relatively weakly compact.The complete truth about weakly mixing systems is revealed only in Chapter 16,when this compactness is studied more thoroughly (see below).Next, in Chapter 12 we consider different concepts of an “isomorphism”—point iso-morphism, measure algebra isomorphism and Markov isomorphism—of measure-preserving systems. In our approach, the latter is the most important, while underly-ing state space maps are rather insignificant. This includes in particular the dynamicsϕ itself; and indeed, by virtue of the Gelfand–Naimark theorem it is shown that eachmeasure-preserving system has many topological models. One canonical model, theStone representation is discussed in detail.

xii Contents

In Chapter 13 we introduce the class of Markov operators, which plays a centralrole in later chapters. We discuss different types of Markov operators (embeddings,factor maps, Markov projections) and introduce the related concept of a factor of ameasure-preserving system.Compact semigroups and compact groups (Chapter 14) occur, e.g., when consider-ing the semigroup T n : n∈N0 and its closures in appropriate operator topologies.Since the structure of these (semi)groups is essential for an understanding of dynam-ical systems, we present the essentials of Pontrjagin’s duality theory for compactAbelian groups. This chapter is accompanied by the results in Appendix E, wherethe existence of the Haar measure and Ellis’ theorem for compact semitopologicalgroups is proved in its full generality.In Chapter 15 we approach the Cech–Stone compactification of a (discrete) semi-group via the Gelfand–Naimark theorem and return to topological dynamics byshowing some less classical results, like the one of Birkhoff about multiple recur-rence. Here we encounter the first applications of dynamical systems to combina-torics and prove the theorems of van der Waerden, Hindman and Hales–Jewett.In Chapter 16 we develop a powerful tool in the study of the asymptotic be-haviour of semigroup actions on Banach spaces, the Jacobs–deLeeuw–Glicksberg(JdLG-)decomposition. Applied to the semigroup generated by a Markov operator Tit yields an orthogonal splitting of the corresponding L2-space into the so-called “re-versible” and the “almost weakly stable” part. The former is the range of a Markovprojection and hence a factor, and the operator generates a compact group on it. Thelatter is characterised by a convergence to 0 (in some sense) of the powers of T .

Applied to the Koopman operator of a measure-preserving system, the reversiblepart is the so-called Kronecker factor. It turns out that this factor is trivial if and onlyif the system is weakly mixing. On the other hand, this factor is the whole systemif and only if the Koopman operator has discrete spectrum in which case the systemis (Markov) isomorphic to a rotation on a compact monothetic group (Halmos–vonNeumann theorem, Chapter 17).In Chapter 18 we employ the JdLG-decomposition to establish the existence ofarithmetic progressions of length 3 in certain subsets of N, thus proving the firstnontrivial case of Szemeredi’s theorem on arithmetic progressions.In Chapter 19 we leave the classical setup of ergodic theory towards generalmeasure-preserving (countable discrete) semigroup actions on L1-spaces. Moreover,we introduce the concept of joining of such systems, discuss the connection withMarkov operators, present the standard types (graph joining, relatively independentjoining), and prove the relative independence theorem.

These notions and results are then applied in Chapter 20 where we present a proof(due to Austin and de la Rue) of the Host–Kra–Tao theorem on multiple ergodicaverages.In the final Chapter 21 more ergodic theorems lead the reader to a less classical area,and actually to a field of active research.

Contents xiii

History

The text of this book has a long and complicated history. In the early 1960s, HelmutH. Schaefer founded his Tubingen school systematically investigating Banach lat-tices and positive operators [Schaefer (1974)]. Growing out of that school, RainerNagel in the early 1970’s developed in a series of lectures an abstract “ergodic the-ory on Banach lattices” in order to unify topological dynamics with the ergodictheory of measure-preserving systems. This approach was pursued subseqently to-gether with several doctoral students among whom Gunther Palm who succeeded in[Palm (1976)] to unify the topological and measure-theoretic entropy theories. Thisand other results, e.g., on discrete spectrum [Nagel and Wolff (1972)], mean ergodicsemigroups [Nagel (1973)] and dilations of positive operators [Kern et al. (1977)]led eventually to a manuscript by Roland Derndinger, Rainer Nagel and GuntherPalm entitled “Ergodic Theory in the Perspective of Functional Analysis”, ready forpublication around 1980.

However, the “Zeitgeist” seemed to be against this approach. Ergodic theoristsat the time were fascinated by other topics, like the isomorphism problem and theimpact the concept of entropy had made on it. For this reason, the publication ofthe manuscript was delayed for several years. In 1987, when the manuscript wasfinally accepted by Springer’s Lecture Notes in Mathematics, time had passed overit, and none of the authors was willing or able to do the final editing. The bookproject was buried and the manuscript remained in an unpublished preprint form[Derndinger et al. (1987)].

Then, some twenty years later and inspired by the survey articles by Bryna Kra[Kra (2006), Kra (2007)] and Terence Tao [Tao (2007)] on the Green–Tao theorem,Rainer Nagel took up the topic again. He quickly motivated two of his formerdoctoral students (T.E., B.F.) and a former master student (M.H.) for the project.However, it was clear that the old manuscript could only serve as an importantsource, but a totally new text would have had to be written. During the academicyear 2008/09 the authors organised an international internet seminar under the title“Ergodic Theory – An Operator Theoretic Approach” and wrote the correspondinglecture notes. Over the last 3 years, these notes were expanded considerably, rear-ranged and rewritten several times. Until, finally, they became the book that we nowproudly present to the mathematical public.

Acknowledgements

Many people have contributed, directly or indirectly, to the completion of the presentwork. During the Internetseminar 2008/09 we profitted enourmously from the par-ticipants, not only by their mathematical comments but also by their enthusiasticencouragement. This support continued during the subsequent years in which a partof the material was taken as a basis for lecture courses or seminars at a master’slevel. We thank all who were involved in this process, particularly Omar Aboura,Abramo Bertucco, Joanna Kulaga, Heinrich Kuttler, Daniel Maier, Carlo Monta-

xiv Contents

nari, Felix Pogorzelski, Manfred Sauter, Pavel Zorin-Kranich. We hope that theywill enjoy this final version.We thank the Mathematisches Forschungsinstitut Oberwolfach for hosting us in a“research in quadruples”-week in April 2012.B.F. acknowledges the support of the Janos Bolyai Research Fellowship of the Hun-garian Academy of Sciences and of the Hungarian Research Fund (OTKA-100461).Special thanks go to our colleagues Roland Derndinger, Mariusz Lemanczyk, JurgenVoigt and Hendrik Vogt.

Amsterdam, Budapest, Delft and Tubingen, July 2012Tanja EisnerBalint FarkasMarkus HaaseRainer Nagel

Chapter 1What is Ergodic Theory?

Ergodic Theory is not one of the classical mathematical disciplines and its name, incontrast to, e.g., number theory, does not explain its subject. However, its origin canbe described quite precisely.

It was around 1880 when L. Boltzmann, J. C. Maxwell and others tried to explainthermodynamical phenomena by mechanical models and their underlying mathe-matical principles. In this context, L. Boltzmann [Boltzmann (1885)] coined theword Ergode (as a special kind of Monode):1

“Monoden, welche nur durch die Gleichung der lebendigen Kraft beschrankt sind, will ichals Ergoden bezeichnen.”2

A few years later (in 1911) P. and T. Ehrenfest [Ehrenfest and Ehrenfest (1912)]wrote

“... haben Boltzmann und Maxwell eine Klasse von mechanischen Systemen durch die fol-gende Forderung definiert:Die einzelne ungestorte Bewegung des Systems fuhrt bei unbegrenzter Fortsetzungschließlich durch jeden Phasenpunkt hindurch, der mit der mitgegebenen Totalenergievertraglich ist. – Ein mechanisches System, das diese Forderung erfullt, nennt Boltzmannein ergodisches System.” 3

The assumption that certain systems are “ergodic” is then called “Ergodic Hypothe-sis”. Leaving the original problem behind, “Ergodic Theory” set out on its fascinat-ing journey into mathematics and arrived at quite unexpected destinations.

Before we, too, undertake this journey, let us explain the original problem withoutgoing too deep into the underlying physics. We start with an (ideal) gas contained in

1 For the still controversial discussion “On the origin of the notion ’Ergodic Theory”’ seethe excellent article by M. Mathieu [Mathieu (1988)] but also the remarks by G. Gallavotti in[Gallavotti (1975)].2 “Monodes which are restricted only by the equations of the living power, I shall call Ergodes.”3 “...Boltzmann and Maxwell have defined a class of mechanical systems by the following claim:By unlimited continuation the single undisturbed motion passes through every state which is com-patible with the given total energy. Boltzmann calls a mechanical system which fulfils this claiman ergodic system.”

1

2 1 What is Ergodic Theory?

a box and represented by d (frictionless) moving particles. Each particle is describedby six coordinates (three for position, three for velocity), so the situation of thegas (better: the state of the system) is given by a point x ∈ R6d . Clearly, not allpoints in R6d can be attained by our gas in the box, so we restrict our considerationsto the set X of all possible states and call this set the state space of our system.We now observe that our system changes while time is running, i.e., the particlesare moving (in the box) and therefore a given state (= point in X) also “moves”(in X). This motion (in the box, therefore in X) is governed by Newton’s laws ofmechanics and then by Hamilton’s differential equations. The solutions to theseequations determine a map

ϕ : X → X

in the following way: If our system, at time t = 0, is in the state x0 ∈ X , then at timet = 1 it will be in a new state x1, and we define ϕ by ϕ(x0) := x1. As a consequence,at time t = 2 the state x0 becomes

x2 := ϕ(x1) = ϕ2(x0)

andxn := ϕ

n(x0)

at time t = n ∈N. The so-obtained set ϕn(x0) | n ∈N0 of states is called the orbitof x0. In this way, the physical motion of the system of particles becomes a “motion”of the “points” in the state space. The motion of all states within one time unit isgiven by the map ϕ . For these objects we introduce the following terminology.

Definition 1.1. A pair (X ,ϕ) consisting of a state space X and a map ϕ : X → X iscalled a dynamical system. 4

The mathematical goal now is not so much to determine ϕ but rather to find inter-esting properties of it. Motivated by the underlying physical situation, the emphasisis on “long term” properties of ϕ , i.e., properties of ϕn as n gets large.

First objection. In the physical situation it is not possible to determine exactly thegiven initial state x0 ∈ X of the system or any of its later states ϕn(x0).

Fig. 1.1 Try to determine the exact state of the system for only d = 1000 gas particles.

4 This is a mathematical model for the philosophical principle of determinism and we refer to thearticle by G. Nickel in [Engel and Nagel (2000)] for more on this interplay between mathematicsand philosophy.

1 What is Ergodic Theory? 3

To overcome this objection we introduce “observables”, i.e., functions f : X→Rassigning to each state x ∈ X the value f (x) of a measurement, for instance of thetemperature. The motion in time (“evolution”) of the states described by the mapϕ : X → X is then reflected by a map Tϕ of the observables defined as

f 7→ Tϕ f := f ϕ.

This change of perspective is not only physically justified, it also has an enormousmathematical advantage:

The set of all observables f : X → R has a vector space structure andthe map Tϕ = ( f 7→ f ϕ) becomes a linear operator on this vector space.

So instead of looking at the orbits ϕn(x0) : n ∈ N0 of the state map ϕ we studythe orbit (Tϕ)

n f : n ∈ N0 of an observable f under the linear operator Tϕ . Thisallows one to use operator theoretic tools and is the leitmotiv of this book.

Returning to our description of the motion of the particles in a box and keepingin mind realistic experiments, we should make another objection.

Second objection. The motion of our system happens so quickly that we will nei-ther be able to determine the states

x0,ϕ(x0),ϕ2(x0), . . .

nor their measurements

f (x0),Tϕ f (x0),T 2ϕ f (x0), . . .

at time t = 0,1,2, . . ..

Reacting on this objection we slightly change our perspective and instead of theabove measurements we look at the averages over time

1N

N−1

∑n=0

T nϕ f (x0) =

1N

N−1

∑n=0

f (ϕn(x0))

limN→∞

1N

N−1

∑n=0

T nϕ f (x0),and their limit

called the time mean (of the observable f in the state x0). This appears to be agood idea, but it is still based on the knowledge of the states ϕn(x0) and their mea-surements f (ϕn(x0)), so the first objection remains valid. At this point, Boltzmannasked for more.

Third objection. The time mean limN1N ∑

N−1n=0 f (ϕn(x0)) should not depend on the

initial state x0 ∈ X .

Boltzmann even suggested what the time mean should be. Indeed, for his systemthere exists a canonical probability measure µ on the state space X for which heclaimed the validity of the so-called

4 1 What is Ergodic Theory?

Ergodic Hypothesis. For each initial state x0 ∈X and each (reasonable) observablef : X → R, it is true that “time mean equals space mean”, i.e.,

limN→∞

1N

N−1

∑n=0

f (ϕn(x0)) =∫

Xf dµ.

Up to now our arguments are quite vague and based on some more or less realisticphysical intuition (that one may or may not have). However, we arrived at the pointwhere mathematicians can take over and build a mathematical theory. To do so wepropose the following steps.

1) Make reasonable assumptions on the basic objects such as the state space X ,the dynamics given by the map ϕ : X → X and the observables f : Ω → R.

2) Prove theorems that answer the following questions.

(i) Under which assumptions and in which sense does the limit appearingin the time mean exist?

(ii) What are the best assumptions to guarantee that the ergodic hypothesisholds.

Historically, these goals could be achieved only after the necessary tools had beencreated. Fortunately, between 1900 and 1930 new mathematical theories emergedsuch as topology, measure theory and functional analysis. Using the tools from thesenew theories, J. von Neumann and G. D. Birkhoff proved in 1931 the so-calledmean ergodic theorem [von Neumann (1932b)] and the individual ergodic theorem[Birkhoff (1931)] and thereby established ergodic theory as a new and independentmathematical discipline, see [Bergelson (2004)].

Now, a valuable mathematical theory should offer more than precise definitions(see Step 1) and nice theorems (see Step 2). You also expect that applications canbe made to the original problem from physics. But something very fascinating hap-pened, something that E. Wigner called “the unreasonable effectiveness of math-ematics” [Wigner (1960)]. While Wigner noted this “effectiveness” with regard tothe natural sciences, we shall see that ergodic theory is effective in completely dif-ferent fields of mathematics such as, e.g., number theory. This phenomenon wasunconsciously anticipated by E. Borel’s theorem on normal numbers [Borel (1909)]or by the equidistribution theorem of H. Weyl [Weyl (1916)]. Both results are nowconsidered to be part of ergodic theory.

But not only classical results can now be proved using ergodic theory. Confirm-ing a conjecture of 1936 by P. Erdos and P. Turan, B. Green and T. Tao createda mathematical sensation by proving the following theorem using, among others,techniques and results from ergodic theory.

Theorem 1.2 (Green–Tao, 2004). The primes P contain arbitrarily long arith-metic progressions, i.e., for every k ∈ N there exist a ∈ P and n ∈ N such that

a,a+n,a+2n, . . . ,a+(k−1)n ∈ P.

1 What is Ergodic Theory? 5

The Green–Tao theorem had a precursor first proved by Szemeredi in a combinato-rial way but eventually given a purely ergodic theoretic proof by H. Furstenberg.

Theorem 1.3 (Szemeredi, 1975). If a set A⊆ N has upper density

d(A) := limn→∞

#(A∩1, . . . ,n)n

> 0,

then it contains arbitrarily long arithmetic progressions.

A complete proof of this theorem remains beyond the reach of this book. However,we shall provide some of the necessary tools.

As a warm-up before the real work, we give you a simple exercise.

Problem. Consider the unit circle T= z ∈ C | |z|= 1 and take a point 1 6= a ∈ T.What can we say about the behaviour of the barycenters bn of the polygons formedby the points 1,a,a2, . . . ,an−1 as n tends to infinity?

a

b2

a2

b3a3 b4

. . .

an−1

bn1

Fig. 1.2 Polygon

While this problem can be solved using only very elementary mathematics, we willshow later that its solution is just a (trivial) special case of von Neumann’s ErgodicTheorem. We hope you are looking forward to seeing this theorem and more.

Chapter 2Topological Dynamical Systems

In Chapter 1 we introduced a dynamical system as consisting of a set X and a self-map ϕ : X → X . However, in concrete situations one usually has some additionalstructure on the set X , e.g., a topology and/or a measure, and then the map ϕ iscontinuous and/or measurable. We shall study measure theoretic dynamical systemsin later chapters and start here with an introduction to topological dynamics.

As a matter of fact, this requires some familiarity with elementary (point-set) topology as discussed for instance in [Kuratowski (1966)], [Kelley (1975)] or[Willard (2004)]. For the convenience of the reader some basic definitions and re-sults are collected in Appendix A. The most important notion is that of a compactspace, but the reader unfamiliar with general topology may think of “compact metricspaces” whenever “compact topological spaces” appear.

Definition 2.1. A topological dynamical system (TDS)is a pair (K;ϕ), where Kis a nonempty compact space and ϕ : K → K is continuous. A TDS (K;ϕ) is sur-jective if ϕ is surjective, and the TDS is called invertible if ϕ is invertible, i.e., ahomeomorphism.

An invertible TDS (K;ϕ) defines two “one-sided” TDSs, namely the forward sys-tem (K;ϕ) and the backward system (K;ϕ−1).

Many of the following notions, like the forward orbit of a point x, do make sensein the more general setting of a continuous self-map of a topological space. How-ever, we restrict ourselves to compact spaces and reserve the term TDS for thisspecial situation.

2.1 Some Basic Examples

First we list some basic examples of dynamical systems.

Example 2.2 (Finite State Space). Take d ∈ N and consider the finite set K :=1, . . . ,d with the discrete topology. Then K is compact and every map ϕ : K→ K

7

8 2 Topological Dynamical Systems

is continuous. The TDS (K;ϕ) is invertible if and only if ϕ is a permutation of theelements of K.

Example 2.3 (Finite-Dimensional Contractions). Let ‖·‖ be a norm on Rd andlet T : Rd → Rd be linear and contractive with respect to the chosen norm, i.e.,‖T x‖ ≤ ‖x‖, x ∈ Rd . Then the unit ball K := x ∈ Rd : ‖x‖ ≤ 1 is compact, andϕ := T |K is a continuous self-map of K.As a concrete example we choose the maximum-norm ‖x‖

∞= max|x1| , . . . , |xd |

on Rd and the linear operator T given by a row-substochastic matrix (ti j)i, j, i.e.,

ti j ≥ 0 and ∑dk=1 tik ≤ 1 (1≤ i, j ≤ d).

Then|[T x]i|=

∣∣∣∑dj=1 ti jx j

∣∣∣≤ ∑dj=1 ti j

∣∣x j∣∣≤ ‖x‖

∞ ∑dj=1 ti j ≤ ‖x‖∞

for every j = 1, . . . ,d. Hence T is contractive.

Example 2.4 (Shift). Take k ∈ N and consider the set

K := W +k := 0,1, . . . ,k−1N0

of infinite sequences within the set 0, . . . ,k−1. In this context, the set 0, . . . ,k−1 is often called an alphabet, its elements being the letters. So the elements of Kare the infinite words composed of these letters. Endowed with the discrete metricthe alphabet is a compact metric space. Hence, by Tychonoff’s theorem K is com-pact when endowed with the product topology (see Section A.5 and Exercise 1). OnK we consider the (left) shift τ defined by

τ : K→ K, (xn)n∈N0 7→ (xn+1)n∈N0 .

Then (K;τ) is a TDS, called the one-sided shift. If we consider two-sided sequencesinstead, that is, Wk := 0,1, . . . ,k− 1Z with τ defined analogously, we obtain aninvertible TDS (K;τ), called the two-sided shift.

Example 2.5 (The Cantor System). The Cantor set is

C =

x ∈ [0,1] : x = ∑∞

j=1a j3 j , a j ∈ 0,2

, (2.1)

cf. Appendix A.8. As a closed subset of the unit interval, the Cantor set C is compact.Consider on C the mapping

ϕ(x) =

3x 0≤ x≤ 1

3 ,

3x−2 23 ≤ x≤ 1.

The continuity of ϕ is clear, and a close inspection using (2.1) reveals that ϕ mapsC to itself. Hence (C;ϕ) is a TDS, called the Cantor system.

2.1 Some Basic Examples 9

Example 2.6 (Translation mod 1). Consider the interval K := [0,1) and define

d(x,y) := min|x− y| ,1−|x− y| (x,y ∈ [0,1)).

By Exercise 2, d is a metric on K, continuous with respect to the standard one, andturning K into a compact metric space. For a real number x ∈ R we write

x (mod1) := x−bxc,

where bxc := maxn∈Z : n≤ x is the greatest integer less than or equal to x. Now,given α ∈ [0,1) we define the translation by α mod 1 as

ϕ(x) := x+α (mod1) = x+α−bx+αc (x ∈ [0,1)).

By Exercise 2, ϕ is continuous with respect to the metric d, hence it gives rise to aTDS on [0,1). We shall abbreviate this TDS by ([0,1);α).

Example 2.7 (Rotation on the Torus). Let K = T := z ∈ C : |z|= 1 be the unitcircle, also called the (one-dimensional) torus. Take a ∈ T and define ϕ : T→ Tby

ϕ(z) := a · z for all z ∈ T.

Since ϕ is obviously continuous, it gives rise to an invertible TDS defined on Tcalled the rotation by a. We shall denote this TDS by (T;a).

Example 2.8 (Rotation on Compact Groups). The previous example is an in-stance of the following general set-up. A topological group is a group (G, ·) en-dowed with a topology such that the maps

(g,h) 7→ g ·h G×G→ G

g 7→ g−1 G→ Gand

are continuous. A topological group is a compact group if G is compact.Given a compact group G, the left rotation by a ∈ G is the mapping

ϕa : G→ G, ϕa(g) := a ·g.

Then (G;ϕa) is an invertible TDS, which we henceforth shall denote by (G;a).Similarly, the right rotation by a ∈ G is

ρa : G→ G, ρa(g) := g ·a

and (G;ρa) is an invertible TDS.If the group is Abelian, then left and right rotations are identical and one often

speaks of translation instead of rotation. The torus T is our most important exampleof a compact Abelian group.

Example 2.9 (Group Rotation on R/Z). Consider the additive group R with thestandard topology and its closed subgroup Z. We consider the factor group R/Z


together with the canonical homomorphism

q(x) := x+Z, R→ R/Z.

If we endow R/Z with the quotient topology with respect to q, R/Z becomes atopological group. It is even compact, because R/Z= q([0,1]), cf. Proposition A.4.For α ∈ R we define ϕ(x+Z) = α + x+Z. Then ϕ is continuous, and we obtain aTDS, which we denote by (R/Z;α).

The next example is not a group rotation, but quite similar to the previous example.

Example 2.10 (The Heisenberg Group). The non-Abelian group G of upper tri-angular real matrices with ones on the diagonal, i.e.,

G =

1 x z0 1 y0 0 1

: x,y,z ∈ R

,

is called the Heisenberg group. We endow it with the usual topology of R3. Itselements with integer entries form a subgroup

H =

1 a c0 1 b0 0 1

: a,b,c ∈ Z

.

Then G is a topological group, not compact, and H is closed and discrete in G.However, note that H is not a normal subgroup. The set G/H of left cosets be-comes a Hausdorff topological space if endowed with the quotient topology underthe canonical map

q(g) := gH, G→ G/H.

The set

K =

1 x z0 1 y0 0 1

: x,y,z ∈ [0,1]

is compact, and one has G = KH (Exercise 3). Hence G/H = q(K) is compact. For

a =

1 α γ

0 1 β

0 0 1

∈ G,

the left rotation by a on G/H is defined by gH 7→ agH. In this way we obtain aninvertible TDS, denoted by (G/H;a). (See also Example 2.16.)

Example 2.11 (Dyadic Adding Machine). For n ∈ N consider the cyclic groupZ2n := Z/2nZ, endowed with the discrete topology. The factor maps

2.2 Basic Constructions 11

πi j : Z2 j → Z2i , πi j(x+2 jZ) := x+2iZ (i≤ j)

are trivially continuous, and satisfy the relations

πii = id and πi j π jk = πik (i≤ j ≤ k).

The topological product space G := ∏n∈NZ2n is a compact Abelian group by Ty-chonoff’s Theorem A.5. Since each πi j is a group homomorphism, the set

A2 =

x = (xn)n∈N ∈∏n∈NZ2n : xi = πi j(x j) for all i≤ j

is a closed subgroup of G, called the group of dyadic integers. Being closed in thecompact group G, A2 is compact. (This construction is an instance of what is calledan inverse or projective limit.) Finally, consider the element

1 := (1+2Z,1+4Z,1+8Z, . . .) ∈ A2.

The translation TDS (A2,1) is called the dyadic adding machine.

2.2 Basic Constructions

In the present section we shall make precise what it means for two TDSs to be“essentially equal”. Then we shall present some techniques for constructing “new”TDSs from “olds” ones, and review our list of examples from the previous sectionin the light of these constructions.

1. Homomorphisms

A homomorphism between two TDSs (K1;ϕ1), (K2;ϕ2) is a continuous map Ψ :K1→ K2 such that Ψ ϕ1 = ϕ2 Ψ , i.e., the diagram

K1 K1

K2 K2

-ϕ1

?Ψ

?Ψ

-ϕ2

is commutative. We indicate this by writing simply

Ψ : (K1;ϕ1)→ (K2;ϕ2).

A homomorphism Ψ : (K1;ϕ1)→ (K2;ϕ2) is a factor map if it is surjective. Then,(K2;ϕ2) is called a factor of (K1;ϕ1) and (K1;ϕ1) is called an extension of (K2;ϕ2).


If Ψ is bijective, then it is called an isomorphism (or conjugation). Two TDSs(K1;ϕ1), (K2;ϕ2) are called isomorphic (or conjugate) if there is an isomorphismbetween them. An isomorphism Ψ : (K;ϕ)→ (K;ϕ) is called an automorphism,and the set of automorphisms of a TDS (K;ϕ) is denoted by Aut (K;ϕ). This isclearly a group with respect to composition.

Example 2.12 (Right Rotations as Automorphism). Consider a left group rota-tion TDS (G;a). Then any right rotation ρh, h ∈ G, is an automorphism of (G;a):

ρh(a ·g) = (ag)h = a(gh) = a · (ρh(g)) (g ∈ G).

This yields an injective homomorphism ρ : G→ Aut (G;a) of groups.Moreover, the inversion map Ψ(g) := g−1 is an isomorphism of the TDSs (G;a)

and (G;ρa−1) since

ρa−1(Ψ(g)) = g−1a−1 = (ag)−1 =Ψ(a ·g) (g ∈ G).

Example 2.13 (Rotation and Translation). For α ∈ [0,1) let a := e2πiα . Then thethree TDSs

1) ([0,1);α) from Example 2.6,

2) (T;a) from Example 2.7,

3) and (R/Z;α) from Example 2.9

are all isomorphic.

Proof. The (well defined!) map Φ : R/Z→ T, x+Z 7→ e2πix is a group iso-morphism. It is continuous, since Φ q is, and q : R→ R/Z is a quotient map (Ap-pendix A.4). Since ϕa Φ = Φ ϕα , Φ is an isomorphism of the TDSs 2) and 3).The isomorphy of 1) and 2) is treated in Exercise 4.

Example 2.14 (Shift ∼= Cantor System). The two TDSs

1) (C;ϕ) from Example 2.5 (Cantor system)

2) (W +2 ;τ) from Example 2.4 (shift on two letters)

are isomorphic (Exercise 5).

Remark 2.15 (Factors). Let (K;ϕ) be a TDS. An equivalence relation ∼ on K iscalled a ϕ-congruence if ∼ is compatible with the action of ϕ , i.e.,

x∼ y ⇒ ϕ(x)∼ ϕ(y).

Let L := K/∼ be the space of equivalence classes with respect to ∼. Then L is acompact space by endowing it with the quotient topology with respect to the canon-ical surjection q : K→ K/∼, q(x) = [x]∼, the equivalence class of x ∈ K. Moreover,a dynamics is induced on L via

ψ([x]∼) := [ϕ(x)]∼ (x ∈ K),


well-defined because ϕ is a congruence. Hence we obtain a new TDS (L;ψ) andq : (K;ϕ)→ (L;ψ) is a factor map.

Conversely, suppose that π : (K;ϕ)→ (L;ψ) is a factor map between two TDSs.Then ∼, defined by x∼ y if and only if π(x) = π(y), is a ϕ-congruence. The map

Φ : (K/∼;ϕ)→ (L;ψ), Φ([x]∼) := π(x)

is an isomorphism of TDSs.

Example 2.16 (Homogeneous Systems). Consider a group rotation (G;a), and letH ≤ G be a closed subgroup of G. The equivalence relation

x∼ y ⇔ y−1x ∈ H

is a congruence, since (ay)−1(ax) = x−1a−1ax = y−1x. The set of correspondingequivalence classes is

G/H := gH : g ∈ G

and the induced dynamics on it is given by gH 7→ agH. The so defined TDS isdenoted by (G/H;a) and is called a homogeneous system. Example 2.10 above isan instance of a homogeneous system.

Example 2.17 (Group Factors). Let (K;ϕ) be a TDS, and let H ⊆Aut (K;ϕ) be asubgroup of Aut (K;ϕ). Consider the equivalence relation

x∼H y ⇔ ∃h ∈ H : h(x) = y

on K. The set of equivalence classes is denoted by K/H. Since all h ∈ H act asautomorphism of (K;ϕ), this is a ϕ-congruence, hence constitutes a factor

q : (K;ϕ)→ (K/G;ϕ)

with ϕ([x]H) := [ϕ(x)]H by abuse of language. A homogeneous system is a specialcase of a group factor (Exercise 6).

2. Products, Skew-Products and Group Extensions

Let (K1;ϕ1) and (K2;ϕ2) be two TDSs. The product TDS (K;ϕ) is defined by

K := K1×K2

ϕ := ϕ1×ϕ2 : K→ K, (ϕ1×ϕ2)(x,y) := (ϕ1(x),ϕ2(x)).

It is invertible if and only if both (K1;ϕ1) and (K2;ϕ2) are invertible. Iterating thisconstruction we obtain finite products of TDSs as in the following example.

Example 2.18 (d-Torus). The d-torus is the d-fold direct product G := T×·· ·×T=Td of the one-dimensional torus. It is a compact topological group. The rotation


by a = (a1,a2, . . . ,ad) ∈ G is the d-fold product of the one-dimensional rotations(T;ai), i = 1, . . . ,d.

The construction of products is not restricted to a finite number of factors. If(Kι ,ϕι) is a TDS for each ι from some index set I, then by Tychonoff’s TheoremA.5

K := ∏ι∈I

Kι , ϕ(x) = (ϕι(xι))ι∈I

is a TDS, called the product of the collection ((Kι ,ϕι))ι∈I . The canonical projec-tions

πι : (K,ϕ)→ (Kι ,ϕι), π(x) := xι (ι ∈ I)

are factor maps.

A product of two TDSs is a special case of the following construction. Let (K;ϕ)be a TDS, let L be a compact space, and let

Φ : K×L→ L

be continuous. The mapping

ψ : K×L→ K×L, ψ(x,y) := (ϕ(x),Φ(x,y))

is continuous, hence we obtain a TDS (K×L;ψ), called the skew product of (K;ϕ)with L along Φ .

Remarks 2.19. Let (K×L;ψ) be a skew product along Φ as above.

1) The projection

π : (K×L;ψ)→ (K;ϕ) π(x,y) = x

is a factor map.

2) If Φ(x,y) is independent of x∈K, the skew product is just an ordinary productas defined above.

3) We abbreviate Φx := Φ(x, ·). If ϕ and each mapping Φx, x ∈ K, is invertible,then ψ is invertible with

ψ−1(x,y) = (ϕ−1(x),Φ−1

ϕ−1(x)(y)).

4) If we iterate ψ on (x,y) we obtain

ψn(x,y) = (ϕn(x),Φϕn−1(x) . . .Φϕ(x)Φx(y)),

i.e., ψn(x,y) = (ϕn(x),Φ [n]x (y)) with Φ

[n]x := Φϕn−1(x) . . . Φϕ(x) Φx. The

mappingΦ : N0×K→ LL, Φ(n,x) := Φ

[n]x


is called the cocycle of the skew product. It satisfies the relations

Φ[0]x = id, Φ

[m+n]x = Φ

[m]ϕn(x) Φ

[n]x (n,m ∈ N0,x ∈ K).

A group extension is a particular instance of a skew product, where the secondfactor is a compact group G, and Φ : K×G→ G is given by left rotations. Thatmeans that we have (by abuse of language)

Φ(x,g) = Φ(x) ·g (x ∈ K,g ∈ G)

where Φ : K→ G is continuous. By Remark 2.19.2 a group extension is invertibleif its first factor (K;ϕ) is invertible.

Example 2.20 (Skew Rotation). Let Φ := I : T→ T be the identity map. Then forthe corresponding group extension of a rotation (T;a) along Φ we have

ψ(x,y) = (ax,xy) (x,y ∈ T).

This TDS is called the skew rotation by a ∈ T.

Remark 2.21. Let H = K×G be a group extension of (K;ϕ) along Φ : K→G. Forh ∈ G we take ρh : H → H, ρh(x,g) := (x,gh) to be the right multiplicationwith h in the second coordinate. Then ρh ∈ Aut (H;ψ); indeed,

ρh(ψ(x,g)) = ρh(ϕ(x),Φ(x)g) = (ϕ(x),Φ(x)gh) = ψ(x,gh) = ψ(ρh(x,g))

for all (x,g) ∈ H. Hence ρ : G→ Aut (K×G;ψ) is a group homomorphism.

3. Subsystems and Unions

Let (K;ϕ) be a TDS. A subset A ⊆ K is called invariant if ϕ(A) ⊆ A, stable ifϕ(A) = A, and bi-invariant if A = ϕ−1(A). If we want to stress the dependence ofthese notions on ϕ , we say invariant under ϕ or ϕ-invariant.

Lemma 2.22. Let (K;ϕ) be a TDS and A⊆ K. Then the following assertions hold.

a) A bi-invariant or stable ⇒ A invariant.

b) A stable and ϕ injective ⇒ A bi-invariant.

c) A bi-invariant and ϕ surjective ⇒ A stable.

Proof. This is Exercise 7.

Every nonempty invariant closed subset A ⊆ K gives rise to a subsystem byrestricting ϕ to A. For simplicity we write ϕ again in place of ϕ|A, and so the sub-system is denoted by (A;ϕ). Note that even if the original system is invertible, thesubsystem need not be invertible any more unless A is bi-invariant (=stable in thiscase). We record the following for later use.


Lemma 2.23. Suppose that (K;ϕ) is a TDS and /0 6= A⊆ K is closed and invariant.Then there is a nonempty, closed set B⊆ A such that ϕ(B) = B.

Proof. Since A⊆ K is invariant,

A⊇ ϕ(A)⊇ ϕ2(A)⊇ ·· · ⊇ ϕ

n(A) holds for all n ∈ N.

All these sets are compact and nonempty, since A is closed and ϕ is continuous. Thusthe set B :=

⋂n∈N ϕn(A) is again nonempty and compact, and satisfies ϕ(B) ⊆ B.

In order to prove ϕ(B) = B, let x ∈ B be arbitrary. Then for each n ∈ N we havex∈ϕn(A), i.e., ϕ−1x∩ϕn−1(A) 6= /0. Hence these sets form a decreasing sequenceof compact nonempty sets, and therefore their intersection ϕ−1x∩B is not emptyeither. It follows that x ∈ ϕ(B) as desired.

From this lemma the following results is immediate.

Corollary 2.24 (Maximal Surjective Subsystem). For a TDS (K;ϕ) let

Ks :=⋂

n≥0ϕ

n(K).

Then (Ks;ϕ) is the unique maximal surjective subsystem of (K;ϕ).

Let (K1;ϕ1), (K2;ϕ2) be two TDSs. Then there is a unique topology on K :=(K1×1)∪ (K2×2) such that the canonical embeddings

Jn : Kn→ K, Jn(x) := (x,n) (n = 1,2)

become homeomorphisms onto open subsets of K. (This topology is the inductivetopology defined by the mappings J1,J2, cf. Appendix A.4.) It is common to identifythe original sets K1,K2 with their images within K. Doing this, we can write K =K1∪K2 and define the map

ϕ(x) :=

ϕ1(x) if x ∈ K1,

ϕ2(x) if x ∈ K2.

Then (K;ϕ) is a TDS, called the (disjoint) union of the original TDSs, and itcontains the original systems as subsystems. It is invertible if and only if (K1;ϕ1)and (K2;ϕ2) are both invertible. In contrast to products, disjoint unions can only beformed out of finite collections of given TDSs.

Example 2.25 (Subshifts). As in Example 2.4, consider the shift (W +k ;τ) on the

alphabet 0,1, . . . ,k− 1. We determine its subsystems, called subshifts. For thispurpose we need some further notions. An n-block of an infinite word x ∈W +

k is afinite sequence y ∈ 0,1, . . . ,k−1n which occurs in x at some position. Take nowan arbitrary set

B⊆⋃

n∈N0

0,1, . . . ,k−1n,

2.3 Topological Transitivity 17

and consider it as the family of excluded blocks. From this we define

W Bk :=

x ∈W +

k : no block of x is contained in B.

It is easy to see that W Bk is a closed τ-invariant subset, hence, if nonempty, it gives

rise to a subsystem (W Bk ;τ). We claim that each subsystem of (W +

k ;τ) arises in thisway.

To prove this claim, let (F ;τ) be a subsystem and consider the set B of finite se-quences that are not present in any of the words in F , i.e.

B :=

y : y is a finite sequence and not contained in any x ∈ F.

We have F ⊆W Bk by definition, and we claim that actually F = W B

K . To this end letx 6∈ F . Then there is an open cylinder U ⊆W +

k with x∈U and U ∩F = /0. By takinga possibly smaller cylinder we may suppose that U has the form

U =

z ∈W +k : z0 = x0, z1 = x1, . . . ,zn = xn

for some n∈N0. Since F∩U = /0 and F is shift invariant, the block y := (x0, . . . ,xn)does not occur in any of the words in F . Hence y ∈ B, so x /∈W B

k .

Particularly important are those subshifts (F ;τ) that arise from a finite set B asF = W B

k . These are called subshifts of finite type. The subshift is called of ordern if there is an excluded block-system B containing only sequences not longer thann, i.e., B ⊆

⋃i≤n0,1, . . . ,k− 1i. In this case, by extending shorter blocks in all

possible ways, one may suppose that all blocks in B have length exactly n.

Of course, all these notions make sense and all these results remain valid for theinvertible case, i.e., for subshifts of (Wk;τ).

2.3 Topological Transitivity

Investigating a TDS means to ask questions like: How does a particular state of thesystem evolve in time? How does ϕ mix the points of K as it is applied over andover again? Will two points that are close to each other initially, stay close even aftera long time? Will a point return to its original position, at least very near to it? Willa certain point x never leave a certain region or will it come arbitrarily close to anyother given point of K?

In order to study such questions we define the forward orbit of x ∈ K as

orb+(x) := ϕn(x) : n ∈ N0,

and, if the TDS is invertible, the (total) orbit of x ∈ K as

orb(x) := ϕn(x) : n ∈ Z.


And we shall write

orb+(x) := ϕn(x) : n ∈ N0 and orb(x) := ϕn(x) : n ∈ Z

for the closure of the forward and the total orbit, respectively.

Definition 2.26. Let (K;ϕ) be a TDS. A point x ∈ K is called forward transitiveif its forward orbit orb+(x) is dense in K. If there is at least one forward transitivepoint, then the TDS (K;ϕ) is called (topologically) forward transitive.

Analogously, a point x ∈ K in an invertible TDS (K;ϕ) is called transitive if itstotal orbit orb(x) is dense, and the invertible TDS (K;ϕ) is called (topologically)transitive if there exists at least one transitive point.

Example 2.27. Let K := Z∪±∞ be the two-point compactification of Z, anddefine

ϕ(n) :=

n+1 if n ∈ Z,n if n =±∞.

Then (K;ϕ) is an invertible TDS, each point n ∈ Z is transitive but no point of K isforward nor backward transitive.

Remarks 2.28. 1) We sometimes say just “transitive” in place of “topologicallytransitive”. The reason is that algebraic transitivity—the fact that a point even-tually reaches exactly every other point—is rare in topological dynamics anddoes not play any role in this theory.

2) The distinction between forward transitivity and (two-sided) transitivity isonly meaningful in invertible systems. If a system under consideration is notinvertible we may therefore drop the word “forward” without causing confu-sion.

A point x ∈ K is forward transitive if we can reach, at least approximately, anyother point in K after some time. The following result tells us that this is a mixing-property: Two arbitrary open regions are mixed with each other under the action ofϕ after finitely many steps.

Proposition 2.29. Let (K;ϕ) be a TDS and consider the following assertions.

(i) (K;ϕ) is forward transitive, i.e., there is a point x ∈ K with orb+(x) = K.

(ii) For all open sets /0 6=U,V in K there is n ∈ N with ϕn(U)∩V 6= /0.

(iii) For all open sets /0 6=U,V in K there is n ∈ N with ϕ−n(U)∩V 6= /0.

Then (ii) and (iii) are equivalent, (ii) implies (i) if K is metrisable, and (i) implies(ii) if K has no isolated points.

Proof. The proof of the equivalence of (ii) and (iii) is left to the reader.

(i)⇒ (ii): Suppose that x ∈ K has dense forward orbit and that U,V are nonemptyopen subsets of K. Then certainly ϕk(x)∈U for some k ∈N0. Consider the open set

2.3 Topological Transitivity 19

W :=V \x,ϕ(x), . . . ,ϕk(x). If K has no isolated points, W cannot be empty, andhence ϕm(x) ∈W for some m > k. Now, ϕm(x) = ϕm−k(ϕk(x)) ∈ ϕm−k(U)∩V .

(iii)⇒ (i): If K is metrisable, there is a countable base Un : n∈N for the topologyon K (see Section A.7). For each n ∈ N consider the open set

Gn :=⋃

k∈N0

ϕ−k(Un).

By assumption b), Gn intersects nontrivially every nonempty open set, and hence isdense in K. By Baire’s Category Theorem A.10 the set

⋂n∈N Gn is nonempty (it is

even dense). Any point in this intersection has dense forward orbit.

Remarks 2.30. 1) In general, (i) does not imply (ii) even if K is metrisable. Take,e.g., K := N∪∞, and ϕ : K → K, ϕ(n) = n+ 1, ϕ(∞) = ϕ(∞). The point1 ∈ K has dense forward orbit, but for U = 2, V = 1 condition b) fails tohold.

2) The proof of Proposition 2.29 yields even more: if K is metrisable withoutisolated points, then the set of points with dense forward orbit is either emptyor a dense Gδ , see Appendix A.9.

There is an analogous statement for transitivity of invertible systems. It is provedalmost verbatim as Proposition 2.29.

Proposition 2.31. Let (K;ϕ) be an invertible TDS, with K metrisable. Then the fol-lowing properties are equivalent.

(i) (K;ϕ) is topologically transitive, i.e., there is a point x ∈ K with dense orbit.

(ii) For all /0 6=U,V open sets in K there is n ∈ Z with ϕn(U)∩V 6= /0.

Let us turn to our central examples.

Theorem 2.32 (Rotation Systems I). Let (G;a) be a left rotation TDS. Then thefollowing statements are equivalent.

(i) (G;a) is topologically forward transitive.

(ii) Every point of G has dense forward orbit.

(iii) (G;a) is topologically transitive.

(iv) Every point of G has dense total orbit.

Proof. Since every right rotation ρh : g 7→ gh is an automorphism of (G;a), we have

ρh(orb+(g)) = orb+(gh) and ρh(orb(g)) = orb(gh)

for every g,h ∈ G. Taking closures we obtain the equivalences (i)⇔ (ii) and (iii)⇔(iv). Now clearly (ii) implies (iv). In order to prove the converse, fix g ∈ G andconsider the nonempty invariant closed set


A := orb+(g) = ang : n≥ 0.

By Lemma 2.23 there is a nonempty closed set B⊆ A such that aB = B. Since B isnonempty, fix h ∈ B. Then orb(h) ⊆ B and hence orb(h) ⊆ B. By (iv), this impliesthat G⊆ B⊆ A, and thus A = G.

A corollary of this result is that a topologically transitive group rotation (G;a) doesnot admit any nontrivial subsystem. We exploit this fact in the following examples.

Example 2.33 (Kronecker’s Theorem, Simple Case). The rotation (T;a) is topo-logically transitive if and only if a ∈ T is not a root of unity.

Proof. If an0 = 1 for some n0 ∈ N, then z ∈ T : zn0 = 1 is closed and ϕ-invariant,so by Theorem 2.32, (T;a) is not transitive.

For the converse, suppose that a is not a root of unity and take ε > 0. By com-pactness, the sequence (an)n≥0 has a convergent subsequence, and so there existl < k ∈N such that

∣∣1−ak−l∣∣= ∣∣al−ak

∣∣< ε . By hypothesis one has b := ak−l 6= 1,and since |1−b|< ε every z ∈ T lies in ε-distance to some positive power of b. Butbn : n≥ 0 ⊆ orb+(1), and the proof is complete.

If a = e2πix then a is a root of unity if and only if x is a rational number. In thiscase we call the associated rotation system (T,a) a rational rotation, otherwise anirrational rotation. Hence the rotation system (T,a) is topologically transitive ifand only if a is irrational.

Example 2.34. The product of two topologically transitive systems need not betopologically transitive. Consider a1 = ei, a2 = e2i, and the product system(T2;(a1,a2)). Then M = (x,y) ∈ T2 : x2 = y is a nontrivial, closed invariant set.

Fig. 2.1 The first 100 iterates of a point under the rotation in Example 2.34 and its orbit closure,and the same for the rotation ratios 2 : 5 and 8 : 5.

Because of these two examples, it is interesting to characterise transitivity of therotations on the d-torus for d > 1.

Theorem 2.35 (Kronecker). Let a = (a1, . . . ,ad) ∈ Td . Then the rotation TDS(Td ;a) is topologically transitive if and only if a1,a2, . . . ,ad are linearly indepen-dent in the Z-module T. This means that if ak1

1 ak22 · · ·a

kdd = 1 for k1,k2, . . . ,kd ∈ Z,

then k1 = k2 = · · ·= kd = 0.

We do not give the proof here, but shall return to it later in Chapter 14. Instead, weconclude this chapter by characterising topological transitivity of certain subshifts.

2.4 Transitivity of Subshifts 21

2.4 Transitivity of Subshifts

To a subshift (F ;τ) of (W +K ;τ) of order 2 (Example 2.25) we associate its transition

matrix A = (ai j)k−1i, j=0 by

ai j =

1 if (i, j) occurs in a word in F ,0 otherwise.

The excluded blocks (i, j) of length 2 are exactly those with ai j = 0. Hence thepositive matrix A describes the subshift completely.

In subshifts of order 2 we can “glue” words together as we shall see in the proofof the next lemma.

Lemma 2.36. For n ∈ N one has [An]i j > 0 if and only if there is x ∈ F with x0 = iand xn = j.

Proof. We argue by induction on n ∈N, the case n = 1 being clear by the definitionof the transition matrix A. Suppose that this claim is proved for n≥ 1. The inequality[An+1]i j > 0 holds if and only if there is m such that

[An]im > 0 and Am j > 0.

By the induction hypothesis this means that there are x,y ∈ F with x0 = i, xn = mand y0 = m, y1 = j.

So if by assumption [An+1]i j > 0, then we can define z ∈W +k by

zs :=

xs if s < n,m if s = n,ys−n if s > n.

Then actually z ∈ F , because z does not contain any excluded block of length 2 byconstruction. Hence one direction of the claim is proved.

For the converse direction assume that there is x ∈ F and n ∈ N with x0 = i andxn+1 = j. Set y = τn(x), then y0 = xn =: m and y1 = j, as desired.

A k× k-matrix A with positive entries which has the property that for all i, j ∈0, . . . ,k−1 there is some n∈N with [An]i j > 0 is called irreducible. This propertyof transition matrices characterises forward transitivity.

Proposition 2.37. Let (F ;τ) be a subshift of order 2 and suppose that every letteroccurs in some word in F. Consider the next assertions.

(i) The transition matrix A of (F ;τ) is irreducible.

(ii) (F ;τ) is forward transitive.


Then (i) implies (ii), and if F does not have isolated points or if the shift is two-sided,then (ii) implies (i).

Proof. (i)⇒ (ii): Since F is metrisable, we can apply Proposition 2.31. It suffices toconsider open sets U and V intersecting F that are of the form

U =

x : x0 = u0, . . . ,xn = un

and V =

x : x0 = v0, . . . ,xm = vm

for some n,m ∈ N0, u0, . . . ,un,v0, . . . ,vm ∈ 0, . . . ,k− 1. Then we have to showF ∩U ∩ τ j(V ) 6= /0 for some j ∈ N. There is x ∈ U ∩ F and y ∈ V ∩ F , and byassumption there are N ∈ N, z ∈ F with z0 = un and zN = v0. Now define w by

ws :=

xs if s < n,un if s = n,zs−n if n < s < n+N,

v0 if s = n+N,

ys−n−N if s > n+N.

By construction w does not contain any excluded word of length 2, so w ∈ F , and ofcourse w ∈U , τn+N(w) ∈V .

(ii)⇒ (i): Suppose F has no isolated points. If A is not irreducible, then there arei, j ∈ 0, . . . ,k− 1 such that [An]i j = 0 for all n ∈ N. By Lemma 2.36 this meansthat there is no word in F of the form (i, · · · , j, · · ·). Consider the open sets

U =

x : x0 = i

and V =

x : x0 = j

both of which intersect F since i and j occur in some word in F and F is shiftinvariant. However, for no N ∈N can τN(V ) intersect U ∩F , so by Proposition 2.31the subshift cannot be topologically transitive.

Suppose now that (F ;τ) is a two-sided shift, y ∈ F has dense forward orbit andx ∈ F is isolated. Then τn(y) = x for some n ≥ 0. Since τ is a homeomorphism, yis isolated, too, and hence by compactness must have finite orbit. Consequently, Fis finite and τ is a cyclic permutation of F . It is then easy to see that A is a cyclicpermutation matrix, whence irreducible.

For noninvertible shifts the statements (i) and (ii) are in general not equivalent.

Example 2.38. Consider the subshift W B3 of W +

3 defined by the excluded blocksB = (0,1),(0,2),(1,0),(1,1),(2,1),(2,2), i.e.,

F =(0,0,0, . . . ,0, . . .), (1,2,0, . . . ,0, . . .), (2,0,0, . . . ,0, . . .)

.

The (F ;τ) has the transition matrix

A =

1 0 00 0 11 0 0

.

Exercises 23

Clearly, (F ;τ) is transitive, but—as a moment’s thought shows—A is not irre-ducible.

Even for two-sided subshifts, transitivity does not imply that its transition matrixis irreducible.

Example 2.39. Consider the subshift F := W B2 of W2 given by the excluded blocks

B = (1,0), i.e., the elements of F are of the form

(. . . ,0, . . . ,0, . . .), (. . . ,1, . . . ,1, . . .), (. . . ,0, . . . ,0,1, . . . ,1, . . .).

The transition matrix

A =

(1 10 1

)is clearly not irreducible, whereas (F ;τ) is transitive (of course, not forward transi-tive!).

For more on subshifts of finite type we refer to Sec. 17 of [Denker et al. (1976)].

Exercises

1. Fix k ∈ N and consider W +k = 0,1, . . . ,k−1N0 with its natural compact (prod-

uct) topology as in Example 2.4. For words x = (x j) j≥0 and y = (y j) j≥0 in W +k

defineν(x,y) := min j ≥ 0 : x j 6= y j ∈ N0∪∞.

Show that d(x,y) := e−ν(x,y) is a metric that induces the product topology on W +k .

Show further that no metric on W +k inducing its topology can turn the shift τ into

an isometry. How about the two-sided shift?

2. Prove that the function d(x,y) := min(|x− y| ,1− |x− y|) is a metric on [0,1)which turns it into a compact space. For α ∈ [0,1) show that the addition withα (mod1) is continuous with respect to the metric d. See also Example 2.6.

3. Let G be the Heisenberg group and H,K as in Example 2.10. Prove that G = KH.

4. Let α ∈ [0,1) and a := e2πiα ∈ T. Show that the map

Ψ : ([0,1),α)→ (T;a), Ψ(x) := e2πix

is an isomorphism of TDSs, cf. Example 2.13.

5. Let (C;ϕ) be the Cantor system (Example 2.5). Show that the map

Ψ : (W +2 ;τ)→ (C;ϕ), Ψ(x) :=

∞

∑j=1

2x j−1

3 j

is an isomorphism of TDSs, cf. Example 2.14.


6. Let (G;a) be a group rotation and let H ≤ G be a closed subgroup of G. Showthat the homogeneous system (G/H;a) is a group factor of (G;a), cf. Example 2.17.

7. Prove Lemma 2.22.

8 (Dyadic Adding Machine). Let K := 0,1N0 with the product topology, anddefine Ψ : K→ A2 by

Ψ((xn)n≥0) := (x0 +Z2, x0 +2x1 +Z4, x0 +2x1 +4x2 +Z8, . . . . . .).

Show that Ψ is a homeomorphism. Now let ϕ be the unique dynamics on K thatturns Ψ : (K;ϕ) → (A2;1) into an isomorphism. Describe the action of ϕ on a0,1-sequence x ∈ K.

9. Let (K;ϕ) be a TDS.

a) Show that if A⊆ K is invariant/stable, then A is invariant/stable, too.

b) Show that the intersection of arbitrary many (bi-)invariant subsets of K isagain (bi-)invariant.

c) Let Ψ : (K;ϕ)→ (L;ψ) be a homomorphism of TDSs. Show that if A ⊆ Kis invariant/stable, then so is Ψ(A)⊆ L.

10. Let the doubling map be defined as ϕ(x) = 2x (mod1) for x ∈ [0,1). Showthat ϕ is continuous with respect to the metric introduced in Example 2.6. Showthat ([0,1);ϕ) is transitive.

11. Consider the tent map ϕ(x) := 1− |2x− 1|, x ∈ [0,1]. Prove that for everynonempty closed sub-interval I of [0,1], there is n ∈ N with ϕn(I) = [0,1]. Showthat ([0,1];ϕ) is transitive.

12. Consider the function ψ : [0,1]→ [0,1], ψ(x) := 4x(1−x). Show that ([0,1];ψ)is isomorphic to the tent map system from Exercise 11. (Hint: Use Ψ(x) :=sin(πx/2)2 as an isomorphism.)

13. Let k ∈ N and consider the full shift (W +k ;τ) over the alphabet 0, . . . ,k− 1.

Let x,y ∈W +k . Show that y ∈ orb+(x) if and only if for every n ≥ 0 there is k ∈ N

such thaty0 = xk, y1 = xk+1, . . . , yn = xk+n.

Show that there is x ∈W +k with dense forward orbit.

14. Prove that for a Hausdorff topological group G and a closed subgroup H the setof left cosets G/H becomes a Hausdorff topological space under the quotient mapq : G 7→G/H, q(g) = gH. If for some compact set K ⊆G the equality KH =G holds,then G/H is compact. If moreover H is a closed, normal subgroup, then G/H is acompact group.(This is an abstract version of Examples 2.9 and 2.10.)

Exercises 25

15. a) Describe all subshifts of W +2 of order 2 and determine their transition

matrices. Which of them are forward transitive?

b) Show that Proposition 2.37 does not remain true for two-sided subshifts ifwe replace forward transitivity by transitivity.

Chapter 3Minimality and Recurrence

In this chapter, we study the existence of nontrivial subsystems in TDSs andthe intrinsically connected phenomenon of regularly recurrent points. It wasG.D. Birkhoff who discovered this and wrote in [Birkhoff (1912)]:

THEOREME III. — La condition necessaire et suffisante pour qu’un mouvement posi-tivement et negativement stable M soit un mouvement recurrent est que pour tout nombrepositif ε , si petit qu’il soit, il existe un intervalle de temps T , assez grand pour que l’arc dela courbe representative correspondant a tout intervalle egal a T ait des points distants demoins de ε de n’importe quel point de la courbe tout entiere.

and then

THEOREME IV. — L’ensemble des mouvements limites omega M′, de tout mouvementpositivement stable M, contient au moins un mouvement recurrent.

Roughly speaking, these quotations mean the following: If a TDS does not contain anontrivial invariant set, then each of its points returns arbitrarily near to itself underthe dynamical action infinitely often.

Our aim is to prove these results, today known as Birkhoff’s recurrence theorem,and to study the connected notions. To start with, let us recall from Chapter 2 thata subset A ⊆ K of a TDS (K;ϕ) is invariant if ϕ(A) ⊆ A. If A is invariant andnonempty, then it gives rise to the subsystem (A;ϕ|A).

By Exercise 9, the closure of an invariant set is invariant and the intersection ofarbitrarily many invariant sets is again invariant. It is also easy to see that for everypoint x ∈ K the forward orbit orb+(x) and its closure are both invariant. So there aremany possibilities to construct subsystems: Simply pick a point x ∈ K and considerF := orb+(x). Then (F ;ϕ|F) is a subsystem, which coincides with (K;ϕ) wheneverx ∈K is a transitive point. Can we always find nontransitive points, hence nontrivialsubsystems?

27

28 3 Minimality and Recurrence

3.1 Minimality

Systems without any subsystems play a special role for recurrence, so let us givethem a special name.

Definition 3.1. A TDS (K;ϕ) is called minimal if there are no nontrivial closedϕ-invariant sets in K. This means that whenever A⊆K is closed and ϕ(A)⊆ A, thenA = /0 or A = K.

Remarks 3.2. 1) Given any TDS, its subsystems (= nonempty, closed, ϕ-invariant sets) are ordered by set inclusion. Clearly, a subsystem is minimalin this order if and only if it is a minimal TDS. Therefore we call such a sub-system a minimal subsystem.

2) By Lemma 2.23, if (K;ϕ) is minimal, then ϕ(K) = K.

3) Even if a TDS is invertible, its subsystems need not be. However, by 2) everyminimal subsystem of an invertible TDS is invertible. More precisely, it fol-lows from Lemma 2.23 that an invertible TDS is minimal if and only if it hasno nontrivial closed bi-invariant subsets.

4) If (K1;ϕ1), (K2;ϕ2) are minimal subsystems of a TDS (K;ϕ), then either K1∩K2 = /0 or K1 = K2.

5) Minimality is an isomorphism invariant, i.e., if two TDSs are isomorphic andone of them is minimal, then so is the other.

6) More generally, if π : (K;ϕ)→ (L;ψ) is a factor map, and if (K;ϕ) is minimal,then so is (L;ψ) (Exercise 1). In particular, if a product TDS is minimal so iseach of its components. The converse is not true, see Example 2.34.

Minimality can be characterised in different ways.

Proposition 3.3. For a TDS (K;ϕ) the following assertions are equivalent.

(i) (K;ϕ) is minimal.

(ii) orb+(x) is dense in K for each x ∈ K.

(iii) K =⋃

n∈N0ϕ−n(U) for every open set /0 6=U ⊆ K.

In particular, every minimal system is topologically forward transitive.


Proposition 3.3 allows to extend the list of equivalences for rotation systems givenin Theorem 2.32.

Theorem 3.4 (Rotation Systems II). For a rotation system (G;a) the followingassertions are equivalent to any of the properties (i)–(iv) of Theorem 2.32.

(v) The TDS (G;a) is minimal.

(vi) an : n≥ 0 is dense in G.

3.1 Minimality 29

(vii) an : n ∈ Z is dense in G.

Whenever a compact space K has at least two points, the TDS (K; id) is notminimal. Also, the full shift (W +

k ;τ) on words from a k-letter alphabet for k ≥ 2is nnot minimal. Indeed, every constant sequence is shift invariant and constitutes aone-point minimal subsystem.

It is actually a general fact that one always finds a subsystem that is minimal.

Theorem 3.5. Every TDS (K;ϕ) has at least one minimal subsystem.

Proof. Let M be the family of all nonempty closed ϕ-invariant subsets of K. Then,of course, K ∈M , so M is nonempty. Further, M is ordered by set inclusion. Givena chain C ⊆M the set C :=

⋂A∈C A is not empty (by compactness) and hence a

lower bound for the chain C . Zorn’s lemma yields a minimal element F in M . ByRemark 3.2.1 above, (F ;ϕ|F) is a minimal TDS.

Besides for compact groups there is another important class of TDSs for whichminimality and transitivity coincide. A TDS (K;ϕ) is called isometric if there is ametric d inducing the topology of K such that ϕ is an isometry with respect to d.

Proposition 3.6. An isometric TDS (K;ϕ) is minimal if and only if it is topologicallytransitive.

Proof. Suppose that x0 ∈ K has dense forward orbit and pick y ∈ K. By Proposi-tion 3.3 it suffices to prove that x0 ∈ orb+(y). Let ε > 0 be arbitrary. Then there ism ∈ N0 such that d(ϕm(x0),y) ≤ ε . The sequence (ϕmn(x0))n∈N has a convergentsubsequence (ϕmnk(x0))k∈N. Using that ϕ is an isometry we obtain

d(x0,ϕm(nk+1−nk)(x0)) = d(ϕmnk(x0),ϕ

mnk+1(x0))→ 0 as k→ ∞.

This yields that there is n≥ m with d(x0,ϕn(x0))< ε , and therefore

d(ϕ(n−m)(y),x0)≤ d(ϕn−m(y),ϕn(x0))+d(ϕn(x0),x0)

= d(y,ϕm(x0))+d(ϕn(x0),x0)< 2ε.

For isometric systems we have the following decomposition into minimal subsys-tems.

Corollary 3.7 (“Structure Theorem” for Isometric Systems). An isometric sys-tem is a possibly infinite disjoint union of minimal subsystems.

Proof. By Remark 3.2.4, different minimal subsystems must be disjoint. Hence thestatement is equivalent to saying that every point in K is contained in a minimalsystem. But for any x ∈ K the system (orb+(x);ϕ) is topologically transitive, henceminimal by Proposition 3.6.


Example 3.8. Consider the rotation by a ∈ T of the closed unit disc

D := z ∈ C : |z| ≤ 1, ϕ(z) := a · z.

The system (D;ϕ) is isometric but not minimal. Now suppose that a ∈ T is not aroot of unity. Then D is the disjoint union D=

⋃r∈[0,1] rT and each rT is a minimal

subsystem. The case that e is a root of unity is left as Exercise 3.

The shift system (W +k ,τ) is an example with points that are not contained in a

minimal subsystem, see Example 3.10 below.

3.2 Topological Recurrence

For a rotation on the circle T we have observed in Example 2.33 two mutuallyexclusive phenomena:

1) For a rational rotation every orbit is periodic.

2) For an irrational rotation every orbit is dense.

In neither case, however, it matters at which point x ∈ T we start: the iterates returnto a neighbourhood of x again and again, in case of a rational rotation even exactlyto x itself. Given a TDS (K;ϕ) we can classify the points according to this behaviourof their orbits.

Definition 3.9. Let (K;ϕ) be a TDS. A point x ∈ K is called

a) recurrent if for every open neighbourhood U of x, there is m ∈ N such thatϕm(x) ∈U .

b) almost periodic if for every open neighbourhood U of x, the set of returntimes

m ∈ N : ϕm(x) ∈U

has bounded gaps. A subset M ⊆ N has bounded gaps (= syndetic, or rela-tively dense) if there is N ∈ N such that M∩ [n,n+N] 6= /0 for every n ∈ N.

c) periodic if there is m ∈ N such that ϕm(x) = x.

All these properties are invariant under automorphisms. Clearly, a periodic pointis almost periodic, and an almost periodic point is recurrent. None of the converseimplications is true in general (see Example 3.10 below). Further, a recurrent point iseven infinitely recurrent, i.e. it returns infinitely often to all of its neighbourhoods(see Exercise 4).

For all these properties of a point x ∈ K only the subsystem orb+(x) is relevant,whence we may suppose right away that the system is topologically transitive and xis a transitive point. In this context it is very helpful to use the notation

orb>0(x) :=

ϕn(x) : n ∈ N

= orb+(ϕ(x)).

3.2 Topological Recurrence 31

Then x is periodic if and only if x ∈ orb>0(x), and x is recurrent if and only ifx ∈ orb>0(x).

For an irrational rotation on the torus all points are recurrent but not periodic.Here are some more examples.

Example 3.10. Consider the full shift (W +k ;τ) in a k-alphabet.

a) A point x ∈W +k is recurrent if every finite block of x occurs infinitely often

in x (Exercise 5).

b) The point x is almost periodic if it is recurrent and the gaps between twooccurrences of a given block y of x is bounded (Exercise 5).

c) By a) and b) a recurrent, but not almost periodic point is easy to find:Consider—for the sake of simplicity—k = 2, and enumerate the wordsformed from the alphabet 0,1 according to a lexicographical ordering.Now write these words into one infinite word x ∈W +

2 :

0 |1 |00 |01 |10 |11 |000 |001 |010 |011 |100 |101 |110 |111 |0000| · · · .

All blocks of x occur as a sub-block of later finite blocks, hence they arerepeated infinitely often, and hence x is recurrent. On the other hand, sincethere are arbitrary long sub-words consisting of the letter 0 only, we see thatthe block 1 does not appear with bounded gaps.

d) A concrete example for a nonperiodic but almost periodic point is given inExercise 12.

According to Example 2.33, all points in the rotation system (T;a) are recurrent.This happens also in minimal systems: if K is minimal and x ∈ K, then orb>0(x) isa nontrivial, closed invariant set, hence it coincides with K and thus contains x. Onecan push this a little further.

Theorem 3.11. Let (K;ϕ) be a TDS and x ∈ K. Then the following assertions areequivalent.

(i) x is almost periodic.

(ii) (orb+(x);ϕ) is minimal.

(iii) x is contained in a minimal subsystem of (K;ϕ).

Proof. The equivalence of (ii) and (iii) is simple. Suppose that (iii) holds. Thenwe may suppose without loss of generality that (K;ϕ) is minimal. Let U ⊆ K bea nonempty open neighbourhood of x. By Proposition 3.3,

⋃n∈N0

ϕ−n(U) = K. Bycompactness there is m ∈ N such that K =

⋃mj=0 ϕ− j(U). For each n ∈ N we have

ϕn(x) ∈ ϕ− j(U), i.e, ϕn+ j(x) ∈U for some 0 ≤ j ≤ m. So the set of return timesk ∈ N0 : ϕk(x) ∈U has gaps of length at most m.

Conversely, suppose that (i) holds and let F := orb+(x). Take y ∈ F and a closedneighbourhood U of x, cf. Lemma A.3 in Appendix A. Since x is almost periodic,


the set

n ∈ N : ϕn(x) ∈U

has gaps with maximal length m ∈ N, say. This meansthat

orb+(x)⊆m+1⋃i=1

ϕ−i(U)

and hence y ∈ orb+(x) ⊆⋃m+1

i=1 ϕ−i(U). This implies ϕ i(y) ∈U for some 1 ≤ i ≤m+1, and therefore x∈ orb+(y). Hence (ii) follows by virtue of Proposition 3.3.

Let us draw some immediate conclusions from this characterisations.

Proposition 3.12. a) If a TDS contains a forward transitive and almost periodicpoint, then it is minimal.

b) In a minimal TDS every point is almost periodic.

c) In an isometric TDS every point is almost periodic.

d) In a group rotation system (G;a) every point is almost periodic.

e) In a homogeneous system (G/H;a) every point is almost periodic.

Proof. a) and b) are immediate from Theorem 3.11, and c) follows from Theorem3.11 together with Corollary 3.7.For the proof of d), note that since right rotations are automorphisms of (G;a) and 1can be moved via a right rotation to any other point of G, it suffices to prove that 1 isalmost periodic. Define H := clan : n ∈ Z. This is a compact subgroup containinga, and hence the rotation system (H;a) is a subsystem of (G;a). It is minimal byTheorem 3.4. Hence, by Theorem 3.11, 1 ∈ H is almost periodic.Assertion e) follows from d) and Exercise 6.

Example 3.13. Let α ∈ [0,1) and I ⊆ [0,1) be an interval containing α in its inte-rior. Then the set

n ∈ N : nα−bnαc ∈ I

has bounded gaps.

Proof. Consider ([0,1);α), the translation mod 1 by α (Example 2.6). It is isomor-phic to the group rotation (T; e2πiα), so the claim follows from Proposition 3.12.

Finally, combining Theorem 3.11 with Theorem 3.5 on the existence of minimalsubsystems yields the following famous theorem.

Theorem 3.14 (Birkhoff). Each TDS contains at least one almost periodic, hencerecurrent point.

3.3 Recurrence in Extensions

Given a compact group G, a TDS (K;ϕ) and a continuous function Φ : K→ G weconsider as in Section 2.2 on page 15 the group extension (H;ψ) with

3.3 Recurrence in Extensions 33

ψ(x,g) := (ϕ(x),Φ(x)g) for (x,g) ∈ H := K×G.

Proposition 3.15. Let (K;ϕ) be a TDS, G a compact group and (H;ψ) the groupextension along some Φ : K→ G. If x0 ∈ K is a recurrent point, then (x0,g) ∈ H isrecurrent in H for all g ∈ G.

Proof. It suffices to prove the assertions for g = 1 ∈ G. Indeed, for every g ∈ G themap ρg : H→H, ρg(x,h) = (x,hg), is an automorphism of (H;ψ), and hence mapsrecurrent points to recurrent points. For every n ∈ N we have

ψn(x0,1) =

(ϕ

n(x0),Φ(ϕn−1(x0)) · · ·Φ(x0)).

Since the projection π : H → K is continuous and H is compact, we haveπ(orb>0(x0,1)) = orb>0(x0). The point x0 is recurrent by assumption, so x0 ∈orb>0(x0). Therefore, for some h ∈ G we have (x0,h) ∈ orb>0(x0,1). Multiplyingfrom the right by h in the second coordinate we obtain by continuity that

(x0,h2) ∈ orb>0(x0,h)⊆ orb>0(x0,1).

Inductively this leads to (x0,hn) ∈ orb>0(x0,1) for all n ∈ N. By Proposition 3.12we know that 1 is recurrent in (G;h), so if V is a neighbourhood of 1, then hn ∈ Vfor some n ∈ N. This means that (x0,hn) ∈U ×V for any neighbourhood U of x0.Thus

(x0,1) ∈(x0,h),(x0,h2), . . .

⊆ orb>0(x0,1).

This means that (x0,1) is a recurrent point in (H;ψ).

An analogous result is true for almost periodic points.

Proposition 3.16. Let (K;ϕ) be a TDS, G a compact group and (H;ψ) the groupextension along Φ : K→ G. If x0 ∈ K is an almost periodic point, then (x0,g) ∈ His almost periodic in H for all g ∈ G.

Proof. As before, it suffices to prove that (x0,h) is almost-periodic for one h ∈ G.The set orb+(x0) is minimal by Theorem 3.11, so by passing to a subsystem we canassume that (K;ϕ) is minimal. Now let (H ′;ψ) be a minimal subsystem in (H;ψ)(Theorem 3.5). The projection π : H→K is a homomorphism from (H;ψ) to (K;ϕ),so the image π(H ′) is a ϕ-invariant subset in K, and therefore must be equal to K.Let x0 be an almost periodic point in (K;ϕ), and h ∈G such that (x0,h) ∈H ′. Then,by Theorem 3.11 (x0,h) is almost periodic.

We apply the resulst from above to the problem of Diophantine approximation.It is elementary that any real number can be approximated by rational numbers,but how well and, at the same time, how “simple” (keeping the denominator small)this approximation can be, is a hard question. The most basic answer is Dirichlet’stheorem, see Exercise 11. Using the recurrence result from above, we can give amore sophisticated answer.


Corollary 3.17. Let α ∈ R and ε > 0 be given. Then there exists n ∈ N, k ∈ Z suchthat ∣∣n2

α− k∣∣≤ ε.

Proof. Consider the TDS ([0,1);α) from Example 2.6, and recall that, endowedwith the appropriate metric and with addition modulo 1, it is a compact group home-omorphic to T. We consider a group extension similar to Example 2.20. Let

Φ : [0,1)→ [0,1), Φ(x) = 2x+α (mod1)

and consider the group extension of [0,1) along Φ . That means

H := [0,1)× [0,1), ψ(x,y) = (x+α,2x+α + y) (mod1) .

Then by Proposition 3.15 (0,0) is a recurrent point in (H;ψ). By induction weobtain

ψ(0,0) = (α,α), ψ2(0,0) = ψ(α,α) = (2α,4α), . . . ,ψn(0,0) = (nα,n2

α).

The recurrence of (0,0) implies that for any ε > 0 we have d(0,n2α−bn2αc)< ε

for some n ∈ N, hence the assertion follows.

By the same technique, i.e., using the recurrence of points in appropriate groupextensions, we obtain the following result.

Proposition 3.18. Let p∈R[x] be a polynomial of degree k ∈N with p(0) = 0. Thenfor every ε > 0 there is n ∈ N and m ∈ N0 with∣∣p(n)−m

∣∣< ε.

Proof. Start from a polynomial p(x) of degree deg p = k and define

pk(x) := p(x), pk−i(x) := pk−i+1(x+1)− pk−i+1(x) (i = 1, . . . ,k).

Then each polynomial pi has degree i. So p0 is a constant α . Consider the TDS([0,1);α) (Example 2.6) and the following tower of group extensions. Set H1 :=[0,1), ψ1(x) := x+α (mod1) , and for 2≤ i≤ k define

Hi := Hi−1× [0,1), Φi : Hi−1→ [0,1), Φi((x1,x2, . . . ,xi−1)) := xi−1.

Hence for the group extension (Hi;ψi) of (Hi−1;ψi−1) along Φi we have

ψi(x1,x2, . . . ,xi) = (x1 +α,x1 + x2,x2 + x3, . . . ,xi + xi−1).

We can apply Proposition 3.15, and obtain by starting from a recurrent point x1in (H1;ψ1) that every point (x1,x2, . . . ,xi) in (Hi;ψi), 1 ≤ i ≤ k, is recurrent. Inparticular, the point p1(0) just as any other point in [0,1) is recurrent in the TDS([0,1);α), hence so is the point (p1(0), p2(0), . . . , pk(0)) ∈ Hk. By the definition ofψk we have

3.3 Recurrence in Extensions 35

ψk((p1(0), p2(0), . . . , pk(0))

)=(

p1(0)+α, p2(0)+p1(0), . . . , pk(0)+pk−1(0))

=(

p1(0)+p0(0), p2(0)+p1(0), . . . , pk(0)+pk−1(0))

=(

p1(1), p2(1), . . . , pk(1)),

and, analogously, ψnk

((p1(0), p2(0), . . . , pk(0))

)=(

p1(n), p2(n), . . . , pk(n)).

By looking at the last component we see that for some n ∈N the point pk(n) = p(n)comes arbitrarily close to pk(0) = p(0) = 0. The assertion is proved.

We refer to [Furstenberg (1981)] for more results on Diophantine approximationsthat can be obtained by means of simple arguments using topological recurrence.After having developed more sophisticated tools, we shall see more applications ofdynamical systems to number theory in Chapters 10, 11 and 18.

Exercises

1. Let π : (K;ϕ)→ (L;ψ) be a homomorphism of TDSs. Prove that if B ⊆ L is anonempty, closed ψ-invariant subset of L, then π−1(B) is a nonempty, closed ϕ-invariant subset of K. Conclude that if (K;ϕ) is minimal, then so is (L;ψ).

2. Prove Proposition 3.3.

3. Consider as in Example 3.8 the system (D,ϕ), where ϕ(z) = az is rotation bya ∈ T. Describe the minimal subsystems in the case that a is a root of unity.

4. Let x0 be a recurrent point of (K;ϕ). Show that x0 is infinitely recurrent, i.e. forevery neighbourhood U of x0 one has

x0 ∈⋂

n∈N

⋃m≥n

ϕ−n(U).

5. Prove the statements a) and b) from Example 3.10.

6. Let π : (K;ϕ)→ (L;ψ) be a homomorphism of TDSs. Show that if x ∈ K is peri-odic/recurrent/almost periodic, then π(x) ∈ L is periodic/recurrent/almost periodicas well.

7. Consider the tent map ϕ(x) := 1−|2x−1|, x ∈ [0,1]. By Exercise 9 the system([0,1];ϕ) is topologically transitive. Is this TDS also minimal?

8. Show that the dyadic adding machine (A2;1) is minimal, see Example 2.11 andExercise 8.

9. Let (K;ϕ) be a TDS and let x0 ∈ K be a recurrent point. Show that x0 is alsorecurrent in the TDS (K;ϕm) for each m ∈ N. (Hint: Use a group extension by thecyclic group Zm.)


10. Consider the sequence 2, 4, 8, 16, 32, 64, 128, . . . , 2n, . . . . It seems that 7 doesnot appear as leading digit in this sequence. Show however that actually 7, just asany other digit, occurs infinitely often. As a test try out 246.

11 (Dirichlet’s Theorem). Let α ∈ R be irrational and n ∈ N be fixed. Show thatthere are pn ∈ Z, qn ∈ N such that 1≤ qn ≤ n and∣∣∣α− pn

qn

∣∣∣< 1nqn

.

(Hint: Divide the interval [0,1) into n subintervals of length 1/n, and use the pi-geonhole principle.) Show that qn→+∞ for n→ ∞.

12 (Morse Sequence). Consider W +2 = 0,1N and define the following recursion

on finite words over the alphabet 0,1. We start with f1 = 01. Then we replaceevery occurrence of letter 0 by the block 0110 and every occurrence of the letter 1by 1001. We repeat this procedure at each step. For instance:

f1 = 0 |1 f2 = 0110 |1001 f3 = 0110 |1001 |1001 |0110 |1001 |0110 |0110 |1001.

We can now consider the finite words fi as elements of W +2 by setting the “missing

coordinates” as 0. Show that the sequence ( fn) converges to some f ∈ W +2 . Show

that this f is almost periodic but not periodic in the shift TDS (W +2 ;τ).

Chapter 4The C∗-algebra C(K) and the KoopmanOperator

In the previous two chapters we introduced the concept of a topological dynamicalsystem and certain basic notions such as minimality, recurrence and transitivity.However, a deeper study requires a change of perspective: Instead of the state spacetransformation ϕ : K→ K we now consider its Koopman operator T = Tϕ definedby

Tϕ f := f ϕ

for scalar-valued functions f on K, and we shall look at the functional analytic andoperator theoretic aspects of TDSs. On the space of all such functions there is a richstructure: One can add, multiply, take absolute values or the complex conjugate. Allthese operations are defined pointwise, that is, we have

( f +g)(x) := f (x)+g(x), (λ f )(x) := λ f (x),

( f g)(x) := f (x)g(x), f (x) := f (x), | f |(x) := | f (x)|

for f ,g : K→C, λ ∈C, x∈K. Clearly, the operator T = Tϕ commutes with all theseoperations, meaning that

T ( f +g) = T f +T g, T (λ f ) = λ (T f ),

T ( f g) = (T f )(T g), T f = T f , |T f |= T | f |

for all f ,g : K→ C, λ ∈ C. In particular, the operator T is linear, and, denoting by1 the function taking everywhere the value 1, satisfies T 1 = 1.

Now, since ϕ is continuous, the operator Tϕ leaves invariant the space C(K) ofall complex-valued continuous functions on K; i.e., f ϕ is continuous whenever fis. Our major goal in this chapter is to show that the Koopman operator Tϕ on C(K)actually contains all information about ϕ , that is, ϕ can be recovered from Tϕ . Inorder to achieve this, we need a more detailed look on the Banach space C(K).

37

38 4 The C∗-algebra C(K) and the Koopman Operator

4.1 Continuous Functions on Compact Spaces

In this section we let K 6= /0 be a nonempty compact topological space and considerthe vector space C(K) of all continuous C-valued functions on K. Since a continuousimage of a compact space is compact, each f ∈ C(K) is bounded, hence

‖ f‖∞

:= sup| f (x)| : x ∈ K

is finite. The map ‖·‖

∞is a norm on C(K) turning it into a complex Banach space

(see also Appendix A.6). In our study of C(K) we shall rely on three classical the-orems: Urysohn’s Lemma, Tietze’s Extension Theorem and the Stone–WeierstraßTheorem.

Urysohn’s Lemma

Given a function f : K→ R and a number r ∈ R, we introduce the notation

[ f > r ] := x ∈ K : f (x)> r

and, analogously, [ f < r ], [ f ≥ r ], [ f ≤ r ] and [ f = r ]. The first lemma tells that ina compact space (by definition Hausdorff) two disjoint closed sets can be separatedby disjoint open sets. This property of a topological space is called normality.

Lemma 4.1. Let A,B be disjoint closed subsets of a compact space K. Then thereare disjoint open sets U,V ⊆ K with A⊆U and B⊆V ; or, equivalently, A⊆U andU ∩B = /0.

Proof. Let x ∈ A be fixed. For every y ∈ B there are disjoint open neighbourhoodsU(x,y) of x and V (x,y) of y. Finitely many of the V (x,y) cover B by compactness,i.e., B⊆V (x,y1)∪·· ·∪V (x,yk) =: V (x). Set U(x) :=U(x,y1)∩·· ·∩U(x,yk), whichis a open neighbourhood of x, disjoint from V (x). Again by compactness finitelymany U(x) cover A, i.e., A⊆U(x1)∪·· ·∪U(xn) =: U . Set V :=V (x1)∩·· ·∩V (xn).Then U and V are disjoint open sets with A⊆U and B⊆V .

Lemma 4.2 (Urysohn). Let A,B be disjoint closed subsets of a compact space K.Then there exists a continuous function f : K → [0,1] such that A ⊆ [ f = 0 ] andB⊆ [ f = 1 ].

Proof. We first construct for each dyadic rational r = n/2m, 0≤ n≤ 2m, an open setU(r) with

A⊆U(r)⊆U(r)⊆U(s)⊆U(s)⊆ Bc for r < s dyadic rationals in (0,1).

We start with U(0) := /0 and U(1) := K. By Lemma 4.1 there is some U with A⊆Uand U ∩B = /0. Set U(1/2) =U . Suppose that for some m≥ 1, all the sets U(n/2m)with n = 0, . . . ,2m are already defined. Let k ∈ [1,2m+1]∩N be odd. Lemma 4.1

4.1 Continuous Functions on Compact Spaces 39

yields open sets U(k/2m+1), when this lemma is applied to each of the pairs ofclosed sets

A and U(1/2m)c, for k = 1U((k−1)/2m+1) and U((k+1)/2m+1)c for k = 3, . . . ,2m+1−3,

U(1−1/2m)c and B for k = 2m+1−1.

In this way the open sets U(k/2m+1) are defined for all k = 0, . . . ,2m+1. Recursivelyone obtains U(r) for all dyadic rationals r ∈ [0,1].We now can define the desired function by

f (x) := inf

r : r ∈ [0,1] dyadic rational, x ∈U(r).

If x ∈ A, then x ∈U(r) for all dyadic rationals, so f (x) = 0. In turn, if x ∈ B, thenx 6∈U(r) for any r ∈ (0,1), so f (x) = 1. It only remains to prove that f is continuous.

Let r ∈ [0,1] be given and let x ∈ [ f < r ]. Then there is a dyadic rational r′ < rwith x ∈U(r′). Since for all y ∈U(r′) one has f (y) ≤ r′ < r, we see that [ f < r ]is open. On the other hand let x ∈ [ f > r ]. Then there is a dyadic rational r′′ > rwith f (x)> r′′, so x 6∈U(r′′), but then x 6∈U(r′) for every r′ < r′′. If r < r′ < r′′ andy∈U(r′)

c, then f (y)≥ r′ > r. So U(r′)

cis an open neighbourhood of x belonging to

[ f > r ], hence this latter set is open. Altogether we obtain that f is continuous.

Tietze’s Theorem

Our second classical result is essentially an application of Urysohn’s Lemma to thelevel sets of a continuous function defined on a closed subset of K.

Theorem 4.3 (Tietze). Let K be a compact space, let A ⊆ K be closed and letf ∈ C(A). Then there is h ∈ C(K) such that h|A = f .

Proof. Since the real and imaginary parts of a continuous function are continuous,it suffices to prove the assertion for real-valued functions. Let f ∈ C(A;R).

Since A is compact we may suppose that f (A) ⊆ [0,1] and 0,1 ∈ f (A). ByUrysohn’s Lemma 4.2 we obtain a continuous function g1 : K → [0,1/3] with[

f ≤ 13

]⊆ [g1 = 0 ] and

[f ≥ 2

3

]⊆[

g1 =13

]. For f1 := f − g1|A we obtain f1 ∈

C(A;R) and 0≤ f1 ≤ 23 .

Suppose we have defined the continuous functions

fn : A→ [0,( 23 )

n] and gn : K→ [0, 13 (

23 )

n]

for some n ∈ N. Then again by Urysohn’s Lemma we obtain a continuous function

gn+1 : K→ [0, 13 (

23 )

n][fn ≤ 1

3 (23 )

n ]⊆ [gn+1 = 0 ] and[

fn ≥ 23 (

23 )

n ]⊆[gn+1 =23 (

23 )

n ] .with


Set fn+1 = fn−gn+1|A. Since ‖gn‖∞ ≤ ( 23 )

n, the function

g :=∞

∑n=1

gn

is continuous, i.e., g ∈ C(K;R). For x ∈ A we have

f (x)− (g1(x)+ · · ·+gn(x)) = f1(x)− (g2(x)+ · · ·+gn(x))

= f2(x)− (g3(x)+ · · ·+gn(x)) = · · ·= fn(x).

Since fn(x)→ 0 as n→ ∞, we obtain f (x) = g(x).

The Stone–Weierstraß Theorem

Our third classical theorem about continuous functions is the Stone–Weierstrasstheorem. A linear subspace A ⊆ C(K) is called a subalgebra if f ,g ∈ A impliesf g ∈ A. It is called conjugation invariant if f ∈ A implies f ∈ A. We say that Aseparates the points of K if for any pair x,y ∈ K of distinct points of K there isf ∈ A such that f (x) 6= g(x). Urysohn’s Lemma tells that C(K) separates the pointsof K, since singleton sets x are closed. We can now formulate another centralresult.

Theorem 4.4 (Stone–Weierstraß). Let A be a complex conjugation invariant sub-algebra of C(K) containing the constant functions and separating the points of K.Then A is dense in C(K).

For the proof we note that A also satisfies the hypotheses of the theorem, hencewe may suppose without of loss of generality that A is closed. The next result is thekey to the proof.

Proposition 4.5. Let A be a closed conjugation invariant subalgebra of C(K) con-taining the constant function 1. Then any positive function f has a unique realsquare root g ∈ A, i.e. f = g2 = g ·g.

Proof. We prove that the square root of f , defined pointwise, belongs to A. Bynormalising first we can assume ‖ f‖∞ ≤ 1. Recall that the binomial series

(1+ x)1/2 =∞

∑n=0

(1/2n

)xn

is absolutely convergent for −1 ≤ x ≤ 1. Since 0 ≤ f (x) ≤ 1 for all x ∈ K implies‖ f − 1‖∞ ≤ 1, we can plug f − 1 into this series to obtain g. Since A is closed, wehave g ∈ A.

Proof of Theorem 4.4. As noted above, we may suppose that A is closed. Noticefirst that it suffices to prove that ReA is dense in C(K;R), because, by conjugation

4.1 Continuous Functions on Compact Spaces 41

invariance, for each f ∈ A also Re f and Im f belong to A. So one can approximatereal and imaginary parts of functions h ∈ C(K) by elements of ReA separately.

Since A is a conjugation invariant algebra, for every f ∈ A also | f |2 = f f ∈ A,thus Proposition 4.5 implies that | f | ∈ A. From this we obtain for every f ,g ∈ ReAthat

max( f ,g) =12(| f −g|+( f −g)) ∈ A

min( f ,g) =−12(| f −g|− ( f −g)) ∈ A.and

These facts will be crucial in the following.Let a,b ∈ R and x,y ∈ K, x 6= y be given. We construct a function fx,y ∈ A with

fx,y(x) = a and fx,y(y) = b.

Take f ∈ A a function which separates the points x and y. By addition of certainmultiples of 1 to f we may suppose f (x)< 0 < f (y). By defining

fx,y := amin( f ,0)

f (x)+b

max( f ,0)f (y)

,

we obtain a function fx,y ∈ A with the desired properties.Let x ∈ K, ε > 0 and let h ∈ C(K). We now construct a function fx ∈ A such that

fx(x) = h(x) and fx(y)> h(y)− ε

for all y ∈ K. For every y ∈ K, y 6= x, take a function fx,y ∈ A with

fx,y(x) = h(x) and fx,y(y) = h(y),

and set fx,x = h(x)1. For the open sets

Ux,y :=[ fx,y > h− ε1 ]

we have y ∈ Ux,y, whence they cover the whole of K. By compactness we findy1, . . . ,yn ∈ K such that

K =Ux,y1 ∪·· ·∪Ux,yn .

The function fx := max( fy1 , . . . , fyn) ∈ A has the desired property. Indeed, for y ∈ Kthere is some k ∈ 1, . . . ,n with y ∈Ux,yk , so

fx(y)≥ fx,yk(y)> h(y)− ε.

Now let h ∈ C(K;R) and ε > 0 be arbitrary. For every x ∈ K consider the func-tion fx from the above. The open sets Ux := [ fx < h+ ε1 ], x ∈ K, cover K. Soagain by compactness we have a finite subcover K ⊆ Ux1 ∪ ·· · ∪Uxm . We setf := min( fx1 , . . . , fxm) ∈ A and obtain


h(y)− ε < f (y)≤ fxk(y)< h(y)+ ε,

whenever y ∈Uxk . Whence ‖h− f‖∞ ≤ ε , whence A is dense in C(K;R).

Metrisability and Separability

The Stone–Weierstrass theorem can be employed to characterise the metrisabilityof the compact space K in terms of a topological property of C(K). Recall that atopological space Ω is called separable if there exists a countable and dense subsetA of Ω . A Banach space E is separable if and only if there is a countable set D⊆ Esuch that lin(D) = E.

Lemma 4.6. Each compact metric space is separable.

Proof. Fix m ∈ N. The balls B(x,1/m), x ∈ K, cover K and so there is a finite setFm ⊆ K such that

K ⊆⋃

x∈Fm

B(x,1/m).

Then the set F :=⋃

m∈N Fm is countable and dense in K.

Theorem 4.7. Let K be a compact topological space. Then K is metrisable if andonly if C(K) is separable.

Proof. Suppose that C(K) is separable, and let ( fn)n∈N be a sequence in C(K) suchthat fn : n ∈ N is dense in C(K). Define

Φ : K→Ω := ∏n∈N

C, Φ(x) := ( fn(x))n∈N,

where Ω carries the usual product topology. Then Φ is continuous and injective, byUrysohn’s lemma and the density assumption. The topology on Ω is metrisable (seeAppendix A.5), hence Φ is a homeomorphism from K onto Φ(K). Consequently, Kis metrisable.

For the converse suppose that d : K×K→R+ is a metric that induces the topol-ogy of K. By Lemma 4.6 there is a countable set A ⊆ K with A = K. Consider thecountable(!) set

D := f ∈ C(K) : f is a finite product of functions d(·,y), y ∈ A∪1.

Then lin(D) is a conjugation invariant subalgebra of C(K) containing the constantsand separating the points of K. By the Stone–Weierstrass theorem, lin(D) = C(K)and hence C(K) is separable.

4.2 The Space C(K) as a Commutative C∗-Algebra 43

4.2 The Space C(K) as a Commutative C∗-Algebra

In this section we show how the compact space K can be recovered if only the spaceC(K) is known (see Theorem 4.11 below).

The main idea is readily formulated. Let K be any compact topological space. Tox ∈ K we associate the functional

δx : C(K)→ C, 〈 f ,δx〉 := f (x) ( f ∈ C(K))

called the Dirac or evaluation functional at x∈K. Then δx ∈C(K)′ with ‖δx‖= 1.By Urysohn’s lemma, C(K) separates the points of K, which means that the map

δ : K→ C(K)′, x 7→ δx

is injective. Let us endow C(K)′ with the weak∗-topology (see Appendix C.5). Thenδ is continuous, and since the weak∗-topology is Hausdorff, δ is a homeomorphismonto its image (Proposition A.4). Consequently,

K ∼= δx : x ∈ K ⊆ C(K)′. (4.1)

In order to recover K from C(K) we need to distinguish the Dirac functionals withinC(K)′. For this we view C(K) not merely as a Banach space, but rather as a Banachalgebra.

Recall, e.g. from Appendix C.2, the notion of a commutative Banach algebra.In particular, note that for us a Banach algebra is always unital, i.e., contains a unitelement. Clearly, if K is a compact topological space, then we have

‖ f g‖∞≤ ‖ f‖

∞‖g‖

∞( f ,g ∈ C(K)),

and hence C(K) is a Banach algebra with respect to pointwise multiplication, unitelement 1 and the sup-norm. The Banach algebra C(K) is degenerate if and only ifK = /0. Since C(K) is also invariant under complex conjugation and∥∥ f f

∥∥∞= ‖ f‖2

∞( f ∈ C(K))

holds, we conclude that C(K) is even a commutative C∗-algebra.

The central notion in the study of C(K) as a Banach algebra is that of an (algebra)ideal, see again Appendix C.2. For a closed subset F ⊆ K we define

IF :=

f ∈ C(K) : f ≡ 0 on F.

Evidently, IF is closed; and it is proper if and only if F 6= /0. The next theorem tellsthat each closed ideal of C(K) is of this type.

Theorem 4.8. Let I ⊆ C(K) be a closed algebra ideal. Then there is a closed subsetF ⊆ K such that I = IF .


Proof. Define

F :=

x ∈ K : f (x) = 0 ∀ f ∈ I=⋂

f∈I[ f = 0 ] .

Evidently, F is closed and I ⊆ IF . Fix f ∈ IF , let ε > 0 and define Fε :=[ | f | ≥ ε ].Since f is continuous and vanishes on F , Fε is a closed subset of K \F . Hence foreach point x ∈ Fε one can find a function fx ∈ I such that fx(x) 6= 0. By multiplyingwith fx and a positive constant we may assume that fx ≥ 0 and fx(x) > 1. Thecollection of open sets ([ fx > 1 ])x∈Fε

covers Fε . Since Fε is compact, this cover hasa finite subcover. So there are 0≤ f1, . . . , fk ∈ I such that

Fε ⊆ [ f1 > 1 ]∪·· ·∪[ fk > 1 ] .

Let g := f1 + · · ·+ fk ∈ I. Then 0≤ g and [ | f | ≥ ε ]⊆ [g≥ 1 ]. Define

gn :=n f

1+ngg ∈ I.

|gn− f |= | f |1+ng

≤max

ε,‖ f‖

∞

1+n

.Then

Indeed, one has g≥ 1 on Fε and | f |< ε on K \Fε .

Hence for n large enough we have ‖gn− f‖∞≤ ε . Since ε was arbitrary and I is

closed, we obtain f ∈ I as desired.

Recall from Appendix C.2 that an ideal I in a Banach algebra A is called a max-imal if I 6= A, and for every ideal J satisfying I ⊆ J ⊆ A either J = I or J = A. Themaximal ideals in C(K) are easy to spot.

Lemma 4.9. An ideal I of C(K) is maximal if and only if I = Ix for some x ∈ K.

Proof. It is straightforward to see that each Ix, x ∈ K, is a maximal ideal. Supposeconverse that I is a maximal ideal. By Theorem 4.8 it suffices to show that I isclosed. Since I is again an ideal and I is maximal, I = I or I = C(K). In the lattercase we can find f ∈ I such that ‖1− f‖

∞< 1. Then f = 1− (1− f ) has no zeroes

and hence 1/ f ∈ C(K). But then 1 = (1/ f ) f ∈ I, whence I = A contradicting theassumption that I is maximal.

To proceed we observe that the maximal ideal Ix can also be written as

Ix = f ∈ C(K) : f (x) = 0= kerδx,

where δx is the Dirac functional at x as above. Clearly,

δx : C(K)→ C

is a nonzero multiplicative linear functional satisfying δx(1) = 1, i.e., an algebrahomomorphism from C(K) into C. (See again Appendix C.2 for the terminology.)As before, also the converse is true.

4.3 The Koopman Operator 45

Lemma 4.10. A nonzero linear functional ψ : C(K)→ C is multiplicative if andonly if ψ = δx for some x ∈ K.

Proof. Let γ : C(K)→ C be nonzero and multiplicative. Then there is f ∈ C(K)such that γ( f ) = 1. Hence

1 = γ( f ) = γ(1 f ) = γ(1)γ( f ) = γ(1),

which means that γ is an algebra homomorphism. Hence I := ker(γ) is a properideal, and maximal since it has codimension one. By Lemma 4.9 there is x ∈ K suchthat ker(γ) = Ix = ker(δx). Hence f − γ( f )1 ∈ kerδx and therefore

0 = ( f − γ( f )1)(x) = f (x)− γ( f )

for every f ∈ C(K).

Lemma 4.10 yields a purely algebraic distinction of the Dirac functionals amongall linear functionals on C(K). Let us summarise our results in the following theo-rem.

Theorem 4.11. Let K be a compact space, and let

Γ (C(K)) := γ ∈ C(K)′ : γ algebra homomorphism.

Then the mapδ : K→ Γ (C(K)) x 7→ δx

is a homeomorphism, where Γ (C(K)) is endowed with the weak∗-topology as asubset of C(K)′.

4.3 The Koopman Operator

We now return to our original setting of a TDS (K;ϕ) with its Koopman operatorT = Tϕ defined at the beginning of this chapter. Actually, we shall first be a littlemore general, i.e., we shall consider possibly different compact spaces K,L anda mapping ϕ : L→ K. Again we can define the Koopman operator Tϕ mappingfunctions on K to functions on L. The following lemma says that ϕ is continuous ifand only if Tϕ(C(K))⊆ C(L).

Lemma 4.12. Let K,L be compact spaces, and let ϕ : L→ K be a mapping. Then ϕ

is continuous if and only if f ϕ is continuous for all f ∈ C(K).

Proof. Clearly, if ϕ is continuous, then also f ϕ is continuous for every f ∈ C(K).Conversely, if this condition holds, then ϕ−1[ | f |> 0 ] = [ | f ϕ|> 0 ] is open in Lfor every f ∈ C(K). To conclude that ϕ is continuous, it suffices to show that thesesets form a base of the topology of K (see Appendix A.2). Note first that these sets


are open. On the other hand, if U ⊆K is open and x∈U is a point, then by Urysohn’slemma one can find a function f ∈ C(K) such that 0 ≤ f ≤ 1, f (x) = 1 and f ≡ 0on K \U . But then x ∈ [ | f |> 0 ]⊆U .

If ϕ : L→ K is continuous, the operator T = Tϕ is an algebra homomorphismfrom C(K) to C(L). The next result states that actually every algebra homomorphismis such a Koopman operator. For the case of a TDS, i.e., for K = L, the result showsthat the state space mapping ϕ is uniquely determined by its Koopman operator Tϕ .

Theorem 4.13. Let K,L be (nonempty) compact spaces and let T : C(K)→C(L) belinear. Then the following assertions are equivalent.

(i) T is an algebra homomorphism.

(ii) There is a continuous mapping ϕ : L→ K such that T = Tϕ .

In this case, ϕ in (ii) is uniquely determined and the operator has norm ‖T‖= 1.

Proof. Urysohn’s lemma yields that ϕ as in (ii) is uniquely determined, and it isclear from (ii) that ‖T‖= 1. For the proof of the implication (i)⇒ (ii) take f ∈C(K)and y ∈ L. Then

T ′δy := δy T : C(K)→ C, f 7→ (T f )(y)

is an algebra homomorphism. By Theorem 4.11 there is a unique x =: ϕ(y) suchthat T ′δy = δϕ(y). This means that (T f )(y) = f (ϕ(y)) for all y ∈ L, i.e., T f = f ϕ

for all f ∈ C(K). By Lemma 4.12, ϕ is continuous, whence (ii) is established.

Theorem 4.13 with K = L shows that no information is lost when looking at Tϕ

in place of ϕ itself. On the other hand, one has all the tools from linear analysis—inparticular spectral theory—to study the linear operator Tϕ . Our aim is to show howproperties of the TDS (K;ϕ) are reflected by properties of the operator Tϕ . Here isa first result in this direction.

Lemma 4.14. Let K,L be compact spaces, and let ϕ : L→ K be continuous, withKoopman operator T = Tϕ : C(K)→ C(L). Then the following assertion holds.

a) ϕ is surjective if and only if T is injective. In this case, T is isometric.

b) ϕ is injective if and only if T is surjective.


An important consequence is the following.

Corollary 4.15. The compact spaces K,L are homeomorphic if and only if the al-gebras C(K) and C(L) are isomorphic.

We now return to TDSs and their invariant sets.

Lemma 4.16. Let (K;ϕ) be a TDS with Koopman operator T = Tϕ and let A ⊆ Kbe a closed subset. Then A is ϕ-invariant if and only if the ideal IA is T -invariant.

4.3 The Koopman Operator 47

Proof. Suppose that A is ϕ-invariant and f ∈ IA. If x ∈ A, then ϕ(x) ∈ A and hence(T f )(x) = f (ϕ(x)) = 0 since f vanishes on A. Thus T f vanishes on A, hence T f ∈IA. Conversely, suppose that IA is T -invariant. If x /∈ A, then by Urysohn’s lemmathere is f ∈ IA such that f (x) 6= 0. By hypothesis, T f = f ϕ vanishes on A, hencex /∈ ϕ(A). This implies that ϕ(A)⊆ A.

Using Theorem 4.11, we obtain the following characterisation of minimality.

Corollary 4.17. Let (K;ϕ) be a TDS and T = Tϕ its Koopman operator on C(K).Then the TDS (K;ϕ) is minimal if and only if no nontrivial closed algebra ideal ofC(K) is invariant under T .

An important object in the study of Tϕ is its fixed space

fix(Tϕ) :=

f ∈ C(K) : Tϕ f = f.

It is the eigenspace corresponding to the eigenvalue 1, hence a spectral notion. SinceTϕ 1 = 1, the fixed space fix(Tϕ) is at least one-dimensional (unless K = /0 and theTDS is degenerate). If f ∈ fix(Tϕ), then f is constant on each forward orbit orb+(x),and, since f is continuous, even on the closure. This observation leads to a spectralproperty of topological transitivity.

Lemma 4.18. Let (K;ϕ) be a TDS with Koopman operator T = Tϕ on C(K). If(K;ϕ) is topologically transitive, then fix(T ) is one-dimensional.

Proof. As already remarked, if x ∈ K and f ∈ fix(T ), then f (ϕn(x)) = (T n f )(x) =f (x) is independent of n ≥ 0, and hence f is constant on orb+(x). Consequently, ifthere is a point with dense forward orbit, then each f ∈ fix(T ) must be constant.

Exercise 3 shows that the converse statement fails.

An eigenvalue λ of T = Tϕ of modulus 1 is called a peripheral eigenvalue. Theset of peripheral eigenvalues

σp(T )∩T :=

λ ∈ T : ∃0 6= f ∈ C(K) with T f = λ f

is called the peripheral point spectrum of T , cf. Appendix C.9. We shall see laterthat this set is important for the asymptotic behaviour of the iterates of T . In the caseof a topologically transitive system, the peripheral point spectrum of the Koopmanoperator has a particularly nice property.

Theorem 4.19. Let (K;ϕ) be a TDS with Koopman operator T = Tϕ . If fix(T ) isone-dimensional, then the peripheral point spectrum of T is a subgroup of T, eachperipheral eigenvalue is simple, and each corresponding eigenvector is unimodular(up to a multiplicative constant).

Proof. Let λ ∈T be an eigenvalue of T , and let f ∈C(K) be a corresponding eigen-vector with ‖ f‖

∞= 1. Then


| f |= 1 | f |= |λ f |= |T f |= T | f | .

Since fix(T ) is one-dimensional, it follows that | f | = 1, i.e., f is unimodular. Nowtake λ ,µ to be peripheral eigenvalues of T with normalised eigenvectors f ,g. Thenf ,g are both unimodular and for h := f g we have

T h = T ( f g) = T f T g = λ f µg = λ µh.

Hence h is an unimodular eigenvector for λ µ . This shows that the peripheral pointspectrum of T is a subgroup of T. If we take λ = µ in the above argument, wesee that h is a unimodular fixed vector of T , hence h = 1. This shows that everyeigenspace corresponding to a peripheral eigenvalue is one-dimensional.

Example 4.20 (Minimal Rotations). For a rotation system (G;a) the associatedKoopman operator is denoted by

La : C(G)→ C(G), (La f )(x) := f (ax) ( f ∈ C(G),x ∈ G)

and called the left rotation (operator). Recall from Theorem 3.4 that (G;a) isminimal if and only if an : n ∈ N0 is dense in G. In this case, G is Abelian,fix(La) = C1, and by Theorem 4.19 the peripheral point spectrum σp(La)∩T is asubgroup of T.

In order to determine this subgroup, suppose that λ ∈ T and χ ∈ C(G) such that|χ|= 1 and Laχ = λ χ , i.e., χ(ax) = λ χ(x) for all x ∈G. Without loss of generalitywe may suppose that f (1) = 1. It follows that

χ(anam) = χ(an+m) = λn+m = λ

nλ

m = χ(an)χ(am)

for all n,m ∈ N0. By continuity of χ and since the powers of a are dense in G, thisimplies

χ : G→ T continuous, χ(xy) = χ(x)χ(y) (x,y ∈ G)

i.e., χ is a continuous homomorphism into T, a so-called character of G. Con-versely, each such character χ of G is an unimodular eigenvalue of La, with eigen-value χ(a). It follows that

σp(La)∩T=

χ(a) : χ : G→ T is a character

is the group of peripheral eigenvalues of La. See Section 14.3 for more about char-acters.

4.4 The Theorem of Gelfand–Naimark 49

4.4 The Theorem of Gelfand–Naimark

We now come to one of the great theorems of functional analysis. While we haveseen above that C(K) is a commutative C∗-algebra, the following theorem tells thatthe converse also holds.

Theorem 4.21 (Gelfand–Naimark). Let A be a commutative C∗-algebra. Thenthere is a compact space K and an isometric ∗-isomorphism Φ : A→ C(K). Thespace K is unique up to homeomorphism.

The Gelfand-Naimark theorem is central to our operator theoretic approach to er-godic theory, see Chapter 12 and the Halmos–von Neumann Theorem 17.13.

Let say some words about the proof of Theorem 4.21. By Corollary 4.15 thespace K is unique up to homeomorphism. Moreover, Theorem 4.11 leads us theway to find K in the first place. Namely, it must be given as the set of all scalaralgebra homomorphisms

Γ(A) :=

ψ : A→ C : ψ is an algebra homomorphism.

The set Γ(A), endowed with the restriction of the weak∗ topology σ(A′,A), is calledthe Gelfand space of A. Each element of A can be viewed in a natural way as acontinuous function on Γ(A), and the main problem is to show that in this mannerone obtains an isomorphism of C∗-algebras C(Γ(A)) ∼= A. We postpone detailedproof to the Supplement of this chapter. At this point, let us rather familiarise withthe result by looking at some examples.

Examples 4.22. 1) For a compact space K consider the algebra A = C(K). ThenΓ(A) = K as we proved in Section 4.2.

2) Consider the C∗-algebra C0(R) of continuous functions f : R→ C vanishingat infinity. Define A := C0(R)⊕〈1〉. Then A is a C∗-algebra and Γ(A) is home-omorphic to the one-point compactification of R, i.e., to the unit circle T. Ofcourse, one can replace R by any locally compact Hausdorff space and therebyarrive at the one-point compactification of such spaces.

3) Let X = (X ,Σ ,µ) be a finite measure space. For the C∗-algebra L∞(X), theGelfand space is the Stone representation space of the measure algebra Σ(X).See Chapter 5 and Chapter 12, in particular Section 12.4.

4) The Gelfand space of the C∗-algebra `∞(N) is the Cech–Stone compactifi-cation of N and is denoted by βN. This space will be studied in some moredetail in Chapter 15.

5) Consider the Banach space E := `∞(Z) and the left shift T : E → E thereondefined by T (xn) = (xn+1). We call a sequence x = (xn) ∈ `∞ almost periodicif the set

T kx : k ∈ Z⊆ E


is relatively compact in E. The set ap(Z) of almost periodic sequences is aclosed subspace of `∞(Z) and actually a C∗-subalgebra. The Gelfand repre-sentation space of A = ap(Z) is bZ the so-called Bohr compactification of Z,see Exercise 13 and Chapter 14. Similar construction can be carried out for alllocally compact topological groups.

Supplement: Proof of the Gelfand–Naimark Theorem

We need some preparation for the proof. But as has been already said, the candidatefor the sought after compact space is the set of scalar algebra homomorphisms.

Definition 4.23. Let A be a commutative complex Banach algebra. The set

Γ(A) :=

ψ : A→ C : ψ is an algebra homomorphism

is called the Gelfand (representation) space of A.

The first proposition shows that the elements of the Gelfand space are continuousfunctionals.

Proposition 4.24. Let A be a commutative complex Banach algebra. If ψ ∈ Γ(A),then ψ is continuous with ‖ψ‖ ≤ 1.

Proof. Suppose by contradiction that there is a∈ A with ‖a‖< 1 and ψ(a) = 1. Theseries

b :=∞

∑n=1

an

is absolutely convergent in A with ab+ a = b. Since ψ is a homomorphism, weobtain

ψ(b) = ψ(ab)+ψ(a) = ψ(a)ψ(b)+ψ(a) = ψ(b)+1,

a contradiction. This means that |ψ(a)| ≤ 1 holds for all a ∈ B(0,1).

As in the case of A = C(K), we now consider the dual Banach space A′ of A en-dowed with its weak∗-topology (see Appendix C.5). Then Γ(A) is a weakly∗ closedsubset of A′ since

Γ(A) =

ψ ∈ A′ : ψ(e) = 1∩⋂

x,y∈A

ψ ∈ A′ : ψ(xy) = ψ(x)ψ(y)

.

As a consequence of Proposition 4.24 and the Banach–Alaoglu Theorem C.4 the setΓ(A) is compact. Notice, however, that at this point we do not yet know whetherΓ(A) is nonempty. Later we shall show that Γ(A) has sufficiently many elements,but accept this fact for the moment. For x ∈ A consider the function

x : Γ(A)→ C, x(ψ) := ψ(x) (ψ ∈ Γ(A)).


By definition of the topology of Γ(A), the function x is continuous. Hence the map

Φ := ΦA : A→ C(Γ(A)), x 7→ x

is well-defined and clearly an algebra homomorphism. It is called the Gelfand rep-resentation or Gelfand map.

To conclude the proof of the Gelfand–Naimark theorem it remains to show that

1) Φ commutes with the involutions, i.e., a ∗-homomorphism, see Appendix C.2;

2) Φ is isometric;

3) Φ is surjective.

The proof of 1) is simple as soon as one knows that an algebra homomorphismψ : A→C of a C∗-algebra is automatically a ∗-homomorphism. 2) entails that Γ(A)is nonempty if A is nondegenerate; the analysis of the proof of Theorem 4.11 showsthat in order to prove 2) and 3) we have to study maximal ideals in A and invertibilityof elements of the form λe−a ∈ A.

To prove the remaining parts of the theorem needs an excursion into Gelfandtheory. Extensive treatment of this topic can be found in [Rudin (1987), Chapter18], [Rudin (1991), Part III] and in most books on functional analysis.

The Spectrum in Unital Algebras

Let A be a complex algebra with unit e and let a ∈ A. The spectrum of a is the set

Sp(a) :=

λ ∈ C : λe−a is not invertible.

The numberr(a) := inf

n∈N‖an‖1/n

is called the spectral radius of a. Here are some first properties.

Lemma 4.25. a) For every a ∈ A we have r(a)≤ ‖a‖.b) If A 6= 0 then r(e) = 1.

Proof. a) This follows from the submultiplicativity of the norm. To see b) noticethat ‖e‖ = ‖ee‖ ≤ ‖e‖2. Since by assumption ‖e‖ 6= 0, we obtain ‖e‖ ≥ 1. Thisimplies ‖e‖= ‖en‖ ≥ 1, and therefore r(e) = 1.

Just as in the case of bounded linear operators one can replace “inf” by “lim” inthe definition of the spectral radius, i.e.

r(a) = limn→∞‖an‖1/n,

see Exercise 6. The spectral radius is connected to the spectrum via the next result.


Lemma 4.26. Let A be a Banach algebra with unit element e. If a ∈ A is such thatr(a)< 1, then e−a is invertible and its inverse is given by the Neumann1 series

(e−a)−1 = ∑∞

n=0 an.

Moreover, for λ ∈ C with |λ |> r(a) one has (λe−a)−1 = λ−1

∑∞

n=0λ−nan.

We leave the proof of this lemma as Exercise 5.

As for bounded linear operators on Banach spaces the spectrum has the followingproperties.

Proposition 4.27. Let A be a complex Banach algebra and let a ∈ A. Then the fol-lowing are true.

a) The spectrum Sp(a) is a compact subset of C and is contained in the closedball B(0,r(a)).

b) The mappingC\Sp(a)→ A, λ 7→ (λe−a)−1

is holomorphic and vanishes at infinity.

c) If A 6= 0, then there is λ ∈ Sp(a) with |λ |= r(a).

Proof. a) If λ ∈ C is such that |λ | > r(a), then Lemma 4.26 implies that λe−a isinvertible, hence Sp(a)⊆ B(0,r(a)). We now show that the complement of Sp(a) isopen, hence compactness of Sp(a) follows. Indeed, let λ ∈ C \Sp(a) be fixed andlet µ ∈C be such that |λ−µ|< ‖(λe−a)−1‖−1. Then by Lemma 4.26 we concludethat (µe−a)−1 exists and is given by the absolutely convergent series

(µe−a)−1 = (λe−a)−1((λ −µ)(λe−a)−1− e)−1 (4.2)

= (λe−a)−1∞

∑n=0

(λ −µ)n(λe−a)−n,

i.e., µ ∈ C\Sp(a).b) By (4.2) the function µ 7→ (µe− a)−1 is given by a power series in a smallneighbourhood of each λ ∈ C\Sp(a), so it is holomorphic. From Lemma 4.26 weobtain

‖(λe−a)−1‖ ≤ |λ−1|∞

∑n=0‖λ−nan‖ ≤ |λ |−1

∞

∑n=0|λ |−n ‖a‖n =

1|λ |−‖a‖

for |λ |> ‖a‖. This shows the second part of b).c) Suppose that Sp(a)⊆ B(0,r0) for some r0 > 0. This means that

C\B(0,r0) 3 λ 7→ (λe−a)−1

1 Named after Carl Neumann (1832 – 1925).


is holomorphic and hence given by the series

(λe−a)−1 = λ−1

∞

∑n=0

λ−nan.

which is uniformly convergent for |λ | ≥ r′ with r′ > r0. This implies that ‖an‖ ≤Mr′n holds for all n ∈ N0 and some some M ≥ 0. From this we conclude r(a)≤ r′,hence r(a)≤ r0, and Sp(a) cannot be contained in any ball smaller than B(0,r(a)).It remains to prove that Sp(a) is nonempty provided dimA ≥ 1. If Sp(a) is emptythen, by part b), the mapping

C→ A, λ 7→ (λe−a)−1

is a bounded holomorphic function. For every ϕ ∈ A′ we can define the holomorphicfunction

fϕ : C 7→ C, λ 7→ ϕ((λe−a)−1),

which is again bounded and holomorphic. By Liouville’s Theorem from complexanalysis each fϕ is constant, but then by the Hahn–Banach Theorem C.3 also thefunction λ 7→ (λe− a)−1 ∈ A is constant. By part b), this constant must be zero,which is impossible if A 6= 0.

Theorem 4.28 (Gelfand–Mazur). Let A 6= 0 be a complex Banach algebra suchthat every nonzero element in A is invertible. Then A is isomorphic to C.

Proof. Let a ∈ A. Then by Proposition 4.27 there is λa ∈ Sp(a) with |λa|= r(a). Byassumption, λae− a = 0, so a = λae, hence A is one-dimensional. This proves theassertion.

Maximal Ideals

If I is a closed ideal in the Banach algebra A, then the quotient vector space A/Ibecomes a Banach algebra if endowed with the norm

‖a+ I‖= inf‖a+ x‖ : x ∈ I

.

For the details we refer to Exercise 7.

Proposition 4.29. Let A be a commutative complex Banach algebra. If ψ ∈ Γ(A),then kerψ is a maximal ideal. Conversely, if I ⊆ A is a maximal ideal, then I isclosed and there is a unique ψ ∈ Γ(A) such that I = kerψ .

Proof. Clearly, kerψ is a closed ideal. Since kerψ is of codimension one, it mustbe maximal.For the second assertion let I be a maximal ideal. Consider its closure I, still an ideal.Since B(e,1) consists of invertible elements by Lemma 4.25, we have B(e,1)∩ I =


/0, hence B(e,1)∩ I = /0. In particular, I is a proper ideal, hence is equal to I bymaximality. Therefore, each maximal ideal is closed.

Now consider the quotient algebra A/I. Since I is maximal, A/I does not containany proper ideals other than 0. If a is a noninvertible element, then aA is an ideal,so actually it must be the zero-ideal. This yields a = 0. Hence all nonzero elementsin A/I are invertible. By the Gelfand–Mazur Theorem 4.28 we obtain that A/I isisomorphic to C under some Ψ . The required homomorphism ψ : A→ C is givenby ψ =Ψ q, where q : A→ A/I is the quotient map. Uniqueness of ψ can be provedas follows. For every a ∈ A we have ψ(a)e− a ∈ kerψ . So if kerψ = kerψ ′, thenkerψ ′(ψ(a)e−a) = 0, hence ψ(a) = ψ ′(a).

As an important consequence of this proposition we obtain that the Gelfand spaceΓ(A) is not empty if the algebra A is not degenerate. Indeed, by an application ofZorn’s lemma, every proper ideal I of A is contained in a maximal one, so in a unitalBanach algebra there are maximal ideals.

On our way to the proof of the Gelfand–Naimark theorem we need to connectthe spectrum and the Gelfand space. This is content of the next theorem.

Theorem 4.30. Let A be a commutative unital Banach algebra and let a ∈ A. Then

Sp(a) =

ψ(a) : ψ ∈ Γ(A)= a(Γ(A)).

Proof. Since ψ(e) = 1 for ψ ∈ Γ(A), one has ψ(ψ(a)e− a) = 0, so ψ(a)e− acannot be invertible, i.e., ψ(a) ∈ Sp(a).

On the other hand if λ ∈ Sp(a), then the principal ideal (λe−a)A is not the wholealgebra A. So by a standard application of Zorn’s lemma we obtain a maximal idealI containing it. By Proposition 4.27 there is ψ ∈ Γ(A) with kerψ = I. We concludeψ(λe−a) = 0, i.e., ψ(a) = λ .

An immediate consequence is the spectral radius formula.

Corollary 4.31. For a commutative unital Banach algebra A and an element a ∈ Awe have

r(a) = sup|ψ(a)| : ψ ∈ Γ(A)

= ‖a‖∞ = ‖ΦA(a)‖∞.

The Proof of the Gelfand–Naimark Theorem

To complete the proof of Gelfand–Naimark theorem the C∗-algebra structure has toenter the picture.

Lemma 4.32. Let A be a commutative C∗-algebra. For a ∈ A with a = a∗ the fol-lowing assertions are true.

a) r(a) = ‖a‖.b) Sp(a)⊆ R; equivalently, ψ(a) ∈ R for every ψ ∈ Γ(A).


Proof. a) We have aa∗ = a2, so ‖a2‖= ‖a‖2 holds. By induction one can prove that‖a2n‖= ‖a‖2n

is true for all n ∈ N0. From this we obtain

r(a) = limn→∞‖a2n‖

12n = ‖a‖.

b) The equivalence of the two formulations is clear by Theorem 4.30. For a fixedψ ∈ Γ(A) set α = Reψ(a) and β = Imψ(a). Since ψ(e) = 1, we obtain

ψ(a+ ite) = α + i(β + t)

for all t ∈ R. The inequality

‖(a+ ite)(a+ ite)∗‖= ‖a2 + t2e‖ ≤ ‖a‖2 + t2

is easy to see. On the other hand, by using that ‖ψ‖ ≤ 1 and the C∗-property of thenorm we obtain

α2 +(β + t)2 = |ψ(a+ ite)|2 ≤ ‖a+ ite‖2 = ‖(a+ ite)(a+ ite)∗‖.

These two inequalities imply that

α2 +β

2 +2β t + t2 ≤ ‖a‖2 + t2 for all t ∈ R.

This yields β = 0.

Lemma 4.33. Let A be a commutative C∗-algebra and let ψ ∈ Γ(A). Then

ψ(a∗) = ψ(a) for all a ∈ A,

i.e., ψ is a ∗-homomorphism.

Proof. For a ∈ A fixed define

x :=a+a∗

2and y :=

b−b∗

2i.

Then we have x = x∗ and y = y∗ and a = x+ iy. By Lemma 4.32.b) ψ(x),ψ(y) ∈R,so we conclude

ψ(a∗) = ψ(x∗)− iψ(y∗) = ψ(x)+ iψ(y) = ψ(x+ iy) = ψ(a).

Proof the Gelfand–Naimark Theorem 4.21. Let a ∈ A and let ψ ∈ Γ(A). Then byLemma 4.32.b) we obtain

Φ(a∗)(ψ) = a∗(ψ) = ψ(a∗) = ψ(a) = a(ψ) = Φ(a)(ψ),

hence Φ is a ∗-homomorphism, and therefore a homomorphism between the C∗-algebras A and C(Γ(A)).


Next, we prove that the Gelfand map Φ is an isometry, i.e., that ‖a‖ = ‖a‖∞. Fora ∈ A we have (aa∗)∗ = aa∗, so by Lemma 4.32.a) ‖aa∗‖ = r(aa∗). From this andfrom Corollary 4.31 we conclude

‖a‖2 = ‖aa∗‖= r(aa∗) = ‖Φ(aa∗)‖∞ = ‖Φ(a)Φ(a)‖∞ = ‖Φ(a)‖2∞.

Finally, we prove that Φ : A→ C(Γ(A)) is surjective. Since Φ is a homomorphism,Φ(A) is ∗-subalgebra, i.e., a conjugation invariant subalgebra of C(Γ(A)). It triviallyseparates the points of Γ(A). By the Stone–Weierstraß Theorem 4.4 Φ(A) is dense,but since Φ is an isometry its image is closed. Hence Φ is surjective.

Exercises


2. Show that C(K) is separable if and only if K is metrisable.

3. Let K := [0,1] and ϕ(x) := x2, x ∈ K. Show that the system (K;ϕ) is not topolog-ically transitive, while the fixed space of the Koopman operator is one-dimensional.Determine the peripheral point spectrum of Tϕ .

4. Let (K;ϕ) be a TDS with Koopman operator T = Tϕ . For m∈N let Pm := x∈K :ϕm(x) = x. Let λ ∈ T be a peripheral eigenvalue of T with eigenfunction f . Showthat either λ m = 1 or f vanishes on Pm. Determine the peripheral point spectrum ofthe Koopman operator for the one-sided shift (W +

k ;τ), k ≥ 1.


6 (Fekete’s Lemma). Let (xn) ∈ R be a nonnegative sequence with the property

xn+mn+m ≤ xn

nxmm for all n,m ∈ N.

Prove that (xn)n is convergent and that

infn∈N

xn = limn→∞

xn.

Apply this to prove the equality

r(a) = limn→∞‖an‖1/n

for a ∈ A, A a Banach algebra.

7. Let A be a commutative Banach algebra and let I be a closed ideal in A. Prove that

‖a+ I‖= inf‖a+ x‖ : x ∈ I

Exercises 57

defines a norm on the quotient vector space A/I, and that A/I becomes a Banachalgebra with the multiplication

(a+ I)(b+ I) := ab+ I.

8. Let A be a Banach algebra and a,b ∈ A such that ab = ba. Prove that r(ab) ≤r(a)r(b).

9. Let A be a complex Banach algebra. Prove that the Gelfand map is an isometry ifand only if one has ‖a‖2 = ‖a2‖ for all a ∈ A.

10 (Disc Algebra). We let D := z ∈ C : |z|< 1 be the open unit disc. Show that

D :=

f ∈ C(D) : f∣∣D holomorphic

is a commutative complex Banach algebra if endowed with pointwise operationsand the sup-norm. Define

f ∗(z) := f (z).

Show that f 7→ f ∗ is an involution on D and ‖ f‖∞ = ‖ f ∗‖∞. Prove that the Gelfandmap ΦD is an algebra homomorphism but not a ∗-homomorphism.

11. Let A be a C∗-algebra. Prove that r(a) = ‖a‖ for each a ∈ A with aa∗ = a∗a.

12. Let A be a complex algebra without unit element. Show that the vector spaceA+ := A⊕C becomes a unital algebra with respect to the multiplication

(a,x) · (b,y) := (ab+ xb+ ya,xy) (a,b ∈ A,x,y ∈ C)

and with unit element e = (0,1). Suppose that A is a Banach space, and that ‖ab‖ ≤‖a‖‖b‖ for all a,b ∈ A. Show that A+ becomes Banach algebra with respect to thenorm ‖(a,x)‖ := ‖a‖+ |x|.

13. Show that ap(Z), the set of almost periodic sequences is a C∗-subalgebra of`∞(Z).

14. Show that c, the space of convergent scalar sequences, endowed with the obviousoperations and the sup-norm is a C∗-algebra. Describe its Gelfand space.

Chapter 5Measure-Preserving Systems

In the previous chapters we looked at TDSs (K;ϕ), but now turn to dynamical sys-tems that preserve some probability measure on the state space. We shall first moti-vate this change of perspective.

As explained in Chapter 1, “ergodic theory” as a mathematical discipline has itsroots in the development of statistical mechanics, in particular in the attempts ofBoltzmann, Maxwell and others to derive the second law of thermodynamics frommechanical principles. Central to this theory is the concept of (thermodynamical)equilibrium. In topological dynamics, an equilibrium state is a state of rest of thedynamical system itself. However, this is different in the case of a thermodynamicalequilibrium, which is an emergent (=macro) phenomenon, while on the microscopiclevel the molecules show plenty of activity. What is at rest here is rather of a statis-tical nature.

To clarify this, consider the example from Chapter 1, an ideal gas in a box. Whatcould “equilibrium” mean there? Now, since the internal (micro-)states of this sys-tem are so manifold (their set is denoted by X) and the time scale of their changesis so much smaller than the time scale of our observations, a measurement on thegas has the character of a random experiment: the outcome appears to be random,although the underlying dynamics is deterministic. To wait a moment with the nextexperiment is like shuffling a deck of cards again before drawing the next card; andthe “equilibrium” hypothesis—still intuitive—just means that the experiment can berepeated at any time under the same “statistical conditions”, i.e., the distribution ofan observable f : X → R (modelling our experiment and now viewed as a randomvariable) does not change with time.

If ϕ : X → X describes the change of the system in one unit of time and if µ

denotes the (assumed) probability measure on X , then time-invariance of the distri-bution of f simply means

µ[ f > α] = µ[ f ϕ > α] for all α ∈ R.

Having this for a sufficiently rich class of observables f is equivalent to

59

60 5 Measure-Preserving Systems

µ(ϕ−1A) = µ(A)

for all A ⊆ X in the underlying σ -algebra. Thus we see how the “equilibrium hy-pothesis” translates into the existence of a probability measure µ which is invariantunder the dynamics ϕ .

We now leave thermodynamics and intuitive reasoning and return to mathemat-ics. The reader is assumed to have a background in abstract measure and integrationtheory, but some definitions and results are collected in Appendix B.

A measurable space is a pair (X ,Σ), where X is a set and Σ is a σ -algebra ofsubsets of X . Given measurable spaces (X ,Σ) and (Y,Σ ′), a mapping ϕ : X → Y iscalled measurable if

[ϕ ∈ A ] := ϕ−1(A) =

x ∈ X : ϕ(x) ∈ A

∈ Σ (A ∈ Σ

′).

Given a measure µ on Σ , its push-forward measure (or image measure) ϕ∗µ isdefined by

(ϕ∗µ)(A) := µ [ϕ ∈ A ] (A ∈ Σ′).

It is convenient to abbreviate a measure space (X ,Σ ,µ) simply with the single let-ter X, i.e., X = (X ,Σ ,µ). However, if different measure spaces X,Y,Z, . . . are in-volved we shall occasionally use ΣX,ΣY,ΣZ . . . for the corresponding σ -algebras,and µX,µY,µZ . . . for the corresponding measures. For integration with respect tothe measure of X we introduce the notation∫

Xf ·g = 〈 f ,g〉X :=

∫X

f ·g dµ

whenever f ·g ∈ L1(X). When no confusion can arise we shall drop the subscript X,and write, e.g., ∫

f g = 〈 f ,g〉= 〈 f g,1〉= 〈g, f 〉

whenever f ·g ∈ L1(X).

Definition 5.1. Let X and Y be measure spaces. A measurable mapping ϕ : X → Yis called measure-preserving if ϕ∗µX = µY, i.e.,

µ [ϕ ∈ A ] = µY(A) (A ∈ ΣY).

In the case that X = Y and ϕ is measure-preserving, µ is called an invariant mea-sure for ϕ (or ϕ-invariant, invariant under ϕ).

Standard arguments from measure theory show that for ϕ∗µX = µY it is sufficient tohave µX[ϕ ∈ A ] = µY(A) for all A belonging to a generator E of the σ -algebra ΣY,see Appendix B.1. Furthermore, it is also equivalent to∫

X( f ϕ) =

∫Y

f

5.1 Examples 61

for all f ∈M+(Y) (the positive measurable functions on Y ).

We can now introduce our main object of interest.

Definition 5.2. A pair (X;ϕ) is called a measure-preserving dynamical system(MDS) if X = (X ,Σ ,µ) is a probability space, ϕ : X → X is measurable and µ isϕ-invariant.

We reserve the notion of MDS for probability spaces. However, some results remaintrue for general or at least for σ -finite measure spaces.

5.1 Examples

1. The Baker’s Transformation

On X = [0,1]× [0,1] we define the map

ϕ : X → X , ϕ(x,y) :=

(2x,y/2), 0≤ x < 1/2;(2x−1,(y+1)/2), 1/2≤ x≤ 1.

Lebesgue measure is invariant under this transformation since∫[0,1]2

f (ϕ(x,y))d(x,y)

=∫ 1

0

∫ 1/2

0f (2x,y/2)dxdy+

∫ 1

0

∫ 1

1/2f (2x−1,(y+1)/2)dxdy

=∫ 1/2

0

∫ 1

0f (x,y)dxdy+

∫ 1

1/2

∫ 1

0f (x,y)dxdy

=∫ 1

0

∫ 1

0f (x,y)dxdy =

∫[0,1]2

f (x,y)d(x,y)

for every positive measurable function on [0,1]2. The mapping ϕ is called thebaker’s transformation, a name explained by Figure 5.1.

−→ −→

Fig. 5.1 The baker’s transformation deforms a square like a baker does with puff pastry whenkneading.


2. The Doubling Map

Let X = [0,1] and consider the doubling map ϕ : X → X defined as

ϕ(x) := 2x (mod1) =

2x, 0≤ x < 1/2,2x−1, 1/2≤ x≤ 1

(cf. also Exercise 2.10). The Lebesgue measure is invariant under ϕ since

∫ 1

0f (ϕ(x))dx =

∫ 1/2

0f (2x)dx+

∫ 1

1/2f (2x−1)dx

=12

∫ 1

0f (x)dx+

12

∫ 1

0f (x)dx =

∫ 1

0f (x)dx

for every positive measurable function f on [0,1].

ϕ−1(A) ϕ−1(A)

A

Fig. 5.2 A set A and its inverse image under the doubling map.

3. The Tent Map

Let X = [0,1] and consider the tent map ϕ : X → X given by

ϕ(x) =

2x, 0≤ x < 1/2;2−2x, 1/2≤ x≤ 1

(cf. Exercise 2.11). It is Exercise 1 to show that ϕ preserves the Lebesgue measure.

5.1 Examples 63

ϕ−1(A) ϕ−1(A)

A

Fig. 5.3 A set A and its inverse image under the tent map.

4. The Gauss Map

Consider X = [0,1) and define the Gauss map ϕ : X → X by

ϕ(x) =1x−⌊

1x

⌋(0 < x < 1), ϕ(0) := 0.

It is easy to see that

ϕ(x) =1x−n if x ∈

(1

n+1,

1n

], n ∈ N,

and

ϕ−1y=

1

y+n: n ∈ N

for every y ∈ [0,1). By Exercise 2, the measure

µ :=dx

x+1

on [0,1) is ϕ-invariant. The Gauss map links ergodic theory with number theory viacontinued fractions, see [Silva (2008), p. 154] and [Baker (1984), pp. 44–46].

5. Bernoulli Shifts

Fix k ∈ N, consider the finite space L := 0, . . . ,k−1 and form the product space

X := ∏n∈N0

L =

0, . . . ,k−1N0 = W +

k


(cf. Example 2.4) with the product σ -algebra Σ :=⊗

n≥0P(L), see Appendix B.6.On (X ,Σ) we consider the left shift τ . To see that τ is a measurable mapping, notethat the cylinder sets, i.e., sets of the form

A := A0×A1×·· ·×An−1×L×L× . . . (5.1)

with n ∈ N, A0, . . . ,An−1 ⊆ L, generate Σ . Then

[τ ∈ A] = L×A0×A1×·· ·×An−1×L×L× . . .

is again contained in Σ .There are many shift invariant probability measures on W +

k , and we just con-struct one. (For another one see the next section.) Fix a probability vector p =(p0, . . . , pk−1)

t and consider the associated measure ν = ∑k−1j=0 p jδ j on L. Then

let µ :=⊗

n≥0 ν be the infinite product measure on Σ , defined on cylinder sets A asin (5.1) by

µ(A) = ν(A0)ν(A1) · · ·ν(An−1).

(By Theorem B.9 there is a unique measure µ on Σ satisfying this requirement.)The product measure µ is shift invariant, because for cylinder sets A as in (5.1) wehave

µ[τ ∈ A] = ν(L)ν(A0) . . .ν(An−1) = ν(A0) . . .ν(An−1) = µ(A),

since ν(L) = 1. This MDS (X ,Σ ,µ;τ) is called the Bernoulli shift B(p0, . . . , pk−1).

The Bernoulli shift is a special case of a more general construction. Namely, fix aprobability space Y and consider the infinite product X := ∏n≥0 Y with the productmeasure µX :=

⊗n≥0 µY on the product σ -algebra ΣX :=

⊗n≥0 ΣY. Then the shift

τ is measurable and µX is τ-invariant. Obviously, all this can also be done withtwo-sided shifts.

6. Markov Shifts

We consider a slight generalisation of Example 5. Let L = 0, . . . ,k− 1, X = LN0

and Σ as in Example 5. Let S := (si j) be a row stochastic k× k-matrix, i.e.,

si j ≥ 0 (0≤ i, j ≤ k−1), ∑k−1j=0 si j = 1 (0≤ i≤ k−1).

For every probability vector p := (p0, . . . , pk−1)t we construct the Markov measure

µ on W +k requiring that on the special cylinder sets

A := j0× j1· · ·× jn×∏m>n

L (n≥ 1, j0, . . . , jn ∈ L)

one hasµ(A) := p j0s j0 j1 . . .s jn−1 jn . (5.2)

5.1 Examples 65

It is standard measure theory based on Lemma B.5 to show that there exists a uniquemeasure µ on (X ,Σ) satisfying this requirement. Now with A as above one has

µ[τ ∈ A] = µ

(L× j0× j1· · ·× jn×∏

m>nL

)=

k−1

∑j=0

p js j j0 . . .s jn−1 jn .

Hence the measure µ is invariant under the left shift if and only if

p j0s j0 j1 . . .s jn−1 jn =k−1

∑j=0

p js j j0 . . .s jn−1 jn

for all choices of parameters n ≥ 0,0 ≤ j1, . . . jn < k. This is true if and only if itholds for n = 0 (sum over the other indices!), i.e.,

pl = ∑k−1j=0 p js jl .

This means that ptS = pt , i.e., pt is a left fixed vector of S. Such a left fixed probabil-ity vector indeed always exists, by Perron’s theorem. (See Exercise 9 and Theorem8.12 for proofs of Perron’s theorem.)

If S is a row-stochastic k× k-matrix, p is a left fixed probability vector for Sand µ =: µ(S, p) is the unique probability measure on W +

k with (5.2), then theMDS (W +

k ,Σ ,µ(P, p);τ) is called the Markov shift associated with (S, p), and S iscalled its transition matrix. If S is the matrix all of whose rows are equal to pt , thenµ is just the product measure and the Markov system is the same as the Bernoullisystem B(p0, . . . , pk−1).

As in the case of Bernoulli shifts, Markov shifts can be generalised to arbitraryprobability spaces. One needs the notion of a probability kernel and the Ionescu–Tulcea theorem (Theorem B.10).

7. Products and Skew-Products

Let (X;ϕ) be an MDS and let Y be a probability space. Furthermore, let

Φ : X×Y → Y

be a measurable map such that Φ(x, ·) preserves µY for all x ∈ X . Define

ψ : X×Y → X×Y, ψ(x,y) := (ϕ(x),Φ(x,y)).

Then the product measure µX⊗ µY is ψ-invariant. Hence (X ×Y,ΣX⊗ΣX,µX⊗µX;ψ) is an MDS, called the skew product of (X;ϕ) along Φ .

In the special case that Φ(x, ·) = ϕ ′ does not depend on x ∈ X , the skew productis called the product of the MDSs (X;ϕ) and (Y;ϕ ′).


5.2 Measures on Compact Spaces

This section connects our discussion about topological dynamics with invariantmeasures. Since this is not a course on measure theory we do not provide all de-tails here. Interested readers may consult [Bogachev (2007)], [Bauer (1981)] or anyother standard textbook on measure theory.

Let K be a compact space. There are two natural σ -algebras on K: the Borel al-gebra Bo(K), generated by the family of open set, and the Baire algebra Ba(K), thesmallest σ -algebra that makes all continuous functions measurable. Equivalently,

Ba(K) = σ[ f > 0 ] : 0≤ f ∈ C(K)

.

Clearly Ba(K)⊆Bo(K), and by the proof of Lemma 4.12 the open Baire sets form abase for the topology of K. Exercise 8 shows that Ba(K) is generated by the compactGδ -subsets of K. If K is metrisable, then Ba(K) and Bo(K) coincide, but in generalthis is false, see Exercise 10 and [Bogachev (2007), II, 7.1.3].

A (complex) measure µ on Ba(K) (resp. Bo(K)) is called a Baire measure(resp. Borel measure).

Lemma 5.3. Every finite positive Baire measure is regular in the sense that forevery B ∈ Ba(K) one has

µ(B) = sup

µ(A) : A ∈ Ba(K), A compact, A⊆ B

= inf

µ(O) : O ∈ Ba(K), O open, B⊆ O.

Proof. This is a standard argument involving Dynkin systems (Theorem B.1); for adifferent proof see [Bogachev (2007), II, 7.1.8].

Combining the regularity with Urysohn’s lemma one can show that for a finitepositive Baire measure µ , C(K) is dense in Lp(K,Ba(K),µ), 1 ≤ p < ∞ (see alsoLemma D.3).

A finite positive Borel measure is called regular if

µ(B) = sup

µ(A) : A compact, A⊆ B

= inf

µ(O) : O open, B⊆ O. (5.3)

To clarify the connection between Baire and regular Borel measures we state thefollowing result.

Proposition 5.4. If µ,ν are regular finite positive Borel measures on K, then toeach Borel set A ∈ Bo(K) there is a Baire set B ∈ Ba(K) such that µ(A4B) = 0 =ν(A4B).

Proof. Let A ⊆ K be a Borel set. By regularity, there are open sets On,O′n, closedsets Ln,L′n such that Ln,L′n⊆A⊆On,O′n and µ(On\Ln),ν(O′n\L′n)→ 0. By passingto On ∩O′n and Ln ∩L′n we may suppose that On = O′n and Ln = L′n. Moreover, by

5.2 Measures on Compact Spaces 67

passing to finite unions of the Ln and finite intersections of the On we may supposethat

Ln ⊆ Ln+1 ⊆ A⊆ On+1 ⊆ On

for all n∈N. By Urysohn’s lemma we can find continuous functions fn ∈C(K) with1Ln ≤ fn ≤ 1On . Now define

B :=⋂

n∈N[ fn > 0 ] ,

which is clearly a Baire set. By construction, Ln ⊆ B ⊆ On. This yields A4B ⊆On \Ln for all n ∈ N, whence µ(A4B) = 0 = ν(A4B).

Proposition 5.4 shows that a regular Borel measure is completely determined byits restriction to the Baire algebra. Moreover, as soon one disregards null sets, thetwo concepts become exchangeable, see also Remark 5.8 below.

Let us denote the linear space of finite complex Baire measures on K by M(K).It is a Banach space with respect to the total variation norm ‖µ‖M := |µ|(K) (seeAppendix B.9). Every µ ∈M(K) defines a linear functional on C(K) via

〈 f ,µ〉 :=∫

Kf dµ ( f ∈ C(K)).

Since |〈 f ,µ〉| ≤∫

K | f | d |µ| ≤ ‖ f‖∞‖µ‖M, we have 〈 · ,µ〉 ∈ C(K)′, the Banach

space dual of C(K). The following lemma shows in particular that the associationµ 7→ 〈·,µ〉 is injective.

Lemma 5.5. If µ,ν ∈M(K) with∫

K f dµ =∫

K gdν for all f ∈ C(K), then µ = ν .

Proof. By passing to µ−ν we may suppose that ν = 0. By Exercise 8.a) and stan-dard measure theory it suffices to prove that µ(A) = 0 for each compact Gδ -subset Aof K. Given such a set, one can find open subsets On of K such that A=

⋂n∈N On. By

Urysohn’s lemma there are continuous functions fn ∈C(K) such that 1A ≤ fn ≤ 1On .Then fn→ 1A pointwise and | fn| ≤ 1. It follows that µ(A) = limn→∞ 〈 fn,µ〉= 0, bythe dominated convergence theorem.

Note that, using the same approximation technique as in the proof, one can showthat for a Baire measure µ ∈M(K)

µ ≥ 0 ⇔ 〈 f ,µ〉 ≥ 0 for all 0≤ f ∈ C(K).

Lemma 5.5 has an important consequence.

Proposition 5.6. Let K,L be compact spaces, let ϕ : K → L be Baire measurable,and let µ,ν be finite positive Baire measures on K,L, respectively. If∫

K( f ϕ)dµ =

∫K

f dν for all f ∈ C(L),

then ν = ϕ∗µ .


By Proposition 5.6, if ϕ : K→ K is Baire measurable, then a probability measure µ

on K is ϕ-invariant if and only if 〈 f ϕ,µ〉= 〈 f ,µ〉 for all f ∈ C(K).

We have seen that every Baire measure on K gives rise to a bounded linear func-tional on C(K) by integration. The following important theorem states the converse.

Theorem 5.7 (Riesz’ Representation Theorem). Let K be a compact space. Thenthe mapping

M(K)→ C(K)′, µ 7→ 〈 · ,µ〉

is an isometric isomorphism.

For the convenience of the reader, we have included a proof in Appendix D; seealso [Rudin (1987), 2.14] or [Lang (1993), IX.2]. Justified by the Riesz theorem,we shall identify M(K) = C(K)′ and often write µ in place of 〈 · ,µ〉.

Remark 5.8. The common form of Riesz’ theorem, for instance in [Rudin (1987)],involves regular Borel measures instead of Baire measures. This implies in particu-lar that each finite positive Baire measure has a unique extension to a regular Borelmeasure. It is sometimes convenient to use this extension, and we shall do it with-out further reference. However, it is advantageous to work with Baire measures ingeneral since regularity is automatic and the Baire algebra behaves well when oneforms infinite products (see Exercise 8). From the functional analytic point of viewthere is anyway no difference between a Baire measure and its regular Borel ex-tension, since the associated Lp-spaces coincide by Proposition 5.4. For a positiveBaire (regular Borel) measure µ on K we therefore abbreviate

Lp(K,µ) := Lp(K,Ba(K),µ) (1≤ p≤ ∞)

and note that in place of Ba(K) we may write Bo(K) in this definition. Moreover, asalready noted, C(K) is dense in Lp(K,µ) for 1≤ p < ∞.

For each 0≤ µ ∈M(K) it is easily seen that

Iµ :=

f ∈ C(K) : ‖ f‖L1(µ) =∫

K| f | dµ = 0

is a closed algebra ideal of C(K). It is the kernel of the canonical mapping

C(K)→ L∞(K,µ)

that sends f ∈ C(K) to its equivalence class modulo µ-almost everywhere equality.By Theorem 4.8 there is a closed set M ⊆ K such that I = IM . The set M is calledthe (topological) support of µ and is denoted by M =: supp(µ). Here is a measuretheoretic description. (See also Exercise 12.)

Proposition 5.9. Let 0≤ µ ∈M(K). Then

supp(µ) =

x ∈ K : µ(U)> 0 for each open neighbourhood U of x. (5.4)

5.2 Measures on Compact Spaces 69

Proof. Let M := supp(µ) and let L denote the right-hand side of (5.4). Let x ∈ Mand U be an open neighbourhood of x. By Urysohn’s lemma there is f ∈ C(K) suchthat x ∈ [ f 6= 0 ] ⊆U . Since x ∈M, f /∈ IM and hence

∫K | f | dµ 6= 0. It follows that

µ(U)> 0 and, consequently, that x ∈ L.

Conversely, suppose that x /∈M. Then by Urysohn’s lemma there is f ∈ C(K) suchthat x ∈ [ f 6= 0 ] and f = 0 on M. Hence f ∈ IM and therefore

∫K | f | dµ = 0. It

follows that µ [ f 6= 0 ] = 0, and hence x /∈ L.

A positive measure µ ∈M(K) has full support or is strictly positive if supp(µ) =K. Equivalently, the canonical map C(K)→ L∞(K,µ) is isometric (Exercise 11).

For a TDS (K;ϕ), each ϕ-invariant probability Baire measure µ ∈M(K) givesrise to an MDS (K,Ba(K),µ;ϕ), which for convenience we abbreviate by

(K,µ;ϕ).

The following classical theorem says that such measures can always be found.

Theorem 5.10 (Krylov–Bogoljubov). Let (K;ϕ) be a TDS. Then there is at leastone ϕ-invariant Baire probability measure on K.

Proof. We postpone the proof of this theorem to Chapter 10, see Theorem 10.2.Note however that under the identification M(K) = C(K)′ from above, the ϕ-invariance of µ just means that T ′ϕ(µ) = µ , i.e., µ is a fixed point of the adjointof the Koopman operator Tϕ on C(K). Hence, fixed point theorems or similar tech-niques can be applied.

The following example shows that there may be many different invariant Baireprobability measures, and hence many different MDSs (K,µ;ϕ) associated with aTDS (K;ϕ).

Example 5.11. Consider the rotation TDS (T;a) for some a ∈ T. Obviously, thenormalised arc-length measure is invariant. If a is an nth root of unity, then theconvex combination of point measures

µ :=1n

n

∑j=1

δa j

is another invariant probability measure, see also Exercise 5.

Example 5.12. Fix k ∈ N, consider the shift (W +k ;τ). Then each of the Markov

measures in Example 5.1.6 is an invariant measure for τ on Ba(W +k ) (by Exercise 8

the product σ -algebra is Σ = Ba(W +k )).

TDSs with unique invariant probability measures will be studied in Chapter 10.2.


5.3 Haar Measure and Rotations

Let G be a compact topological group. A nontrivial finite Baire measure on G thatis invariant under all left rotations g 7→ a · g, a ∈ G, is called a Haar measure.It can be shown that such a Haar measure is essentially unique, i.e., for any twoHaar measures µ,ν on G there is a positive constant α such that µ = αν . In caseof compact groups one sometimes reserves the name Haar measure for the uniqueinvariant probability measure on G. It is a fundamental fact that on each compactgroup there is such a Haar measure, usually denoted by m.

The Haar measure is automatically right-invariant, i.e., invariant under rightrotations, and inversion invariant, i.e., invariant under the inversion mapping g 7→g−1 of the group. Moreover, m has full support supp(m) = G, i.e., if /0 6=U ⊆ G isopen, then µ(U)> 0. We accept these facts for the moment without proof and returnto them in Chapter 14, see Theorem 14.13.

In many concrete cases the Haar measure can be described explicitly.

Example 5.13. 1) The Haar measure dz on T is given by∫T

f (z)dz :=∫ 1

0f (e2πit)dt

for measurable f : T→ [0,∞].

2) The Haar measure on a finite discrete group is the counting measure. Inparticular, the Haar probability measure on the cyclic group Zn = Z/nZ =0,1, . . . ,n−1 is given by

∫Zn

f (g)dg =1n

n−1

∑j=0

f ( j)

for every f : Zn→ C.

3) For the Haar measure on the dyadic integers A2, see Exercise 6.

If G,H are compact groups with Haar measures µ,ν , respectively, then the prod-uct measure µ⊗ν is a Haar measure on the product group G×H. The same holdsfor infinite products: Let (Gn)n∈N be a family of compact groups, and let µn denotethe unique Haar probability measure on Gn, n∈N. Then the product measure

⊗n µn

is the unique Haar probability measure on the product group ∏n Gn.

With the Haar measure at hand, we can prolong our list of examples from theprevious section.

Example 5.14 (Rotation Systems). If G is a compact group then the Haar measurem is (by definition) invariant under the left rotation with a ∈ G. Hence the rota-tion system (G,m;a) is an MDS with respect to the Haar measure. And, since theHaar measure is invariant also under right rotations, each right rotation system(G,m;ρa) is an MDS with respect to the Haar measure.

Exercises 71

Example 5.15 (Homogeneous Systems). Let G be a compact group with Haarmeasure m, and let H ≤ G be a closed subgroup of G. Then the factor mapq : G→ G/H maps m to a measure µ := q∗m on H that is invariant under all leftrotations on G/H. In particular, for a ∈ G the homogeneous system (G/H,µ;a) isan MDS with respect to this measure.

Exercises

1. Show that the Lebesgue measure is invariant under the tent map (Example 5.1.3).Consider the continuous mapping ϕ : [0,1]→ [0,1], ϕ(x) := 4x(1− x). Show thatthe (finite!) measure µ := dx/2

√x(1− x) on [0,1] is invariant under ϕ .

2. Consider the Gauss transformation (Example 5.1.4).

ϕ(x) =1x−⌊

1x

⌋(0 < x < 1), ϕ(0) := 0,

on [0,1) and show that the measure µ := dx/(x+1) is invariant for ϕ .

3 (Boole Transformation). Show that the Lebesgue measure on R is invariant un-der the map ϕ :R→R, ϕ(x) = x−1/x. This map is called the Boole transformation.Define the modified Boole transformation ψ : R→ R by

ψ(x) :=12

ϕ(x) =12

(x− 1

x

)(x 6= 0).

Show that the (finite!) measure µ := dx/(1+ x2) is invariant under ψ .

4. Consider the locally compact Abelian group G := (0,∞) with respect to multi-plication. Show that µ := dx/x is invariant with respect to all mappings x 7→ a · x,a > 0. (Hence µ is “the” Haar measure on G.) Find an isomorphism Φ : R→G thatis also a homeomorphism (on R we consider the additive structure).

5. Determine all invariant measures for (T;a) where a ∈ T is a root of unity.

6. Consider the compact Abelian group A2 of dyadic integers (Example 2.11) to-gether with the map

Ψ : 0,1N0 → A2

described in Exercise 2.8. Furthermore, let µ be the Bernoulli (1/2,1/2)-measure onK := 0,1N0 . Show that the push-forward measure m :=Ψ∗µ is the Haar probabil-ity measure on A2. (Hint: For n ∈ N consider the projection

πn : A2→ Z2n

onto the nth coordinate. This is a homomorphism, and let Hn := ker(πn) be its kernel.Show that m(a+Hn) = 1/2n for any a ∈ A2.)


7. Let G be a compact Abelian group, and fix k ∈ N. Consider the map ϕk : G→ G,ϕk(g) = gk, g ∈ G. This is a continuous group homomorphism since G is Abelian.Show that if ϕk is surjective, then the Haar measure is invariant under ϕk. (Hint: Usethe uniqueness of the Haar measure.)

8. Let K be a compact space with its Baire algebra Ba(K).

a) Show that Ba(K) is generated by the compact Gδ -sets.

b) Show that Ba(K) = Bo(K) if K is metrisable.

c) Show that if A⊆O with A⊆K closed and O⊆K open, there exists A⊆ A′ ⊆O′ ⊆ O with A′,O′ ∈ Ba(K), A′ closed and O′ open.

d) Let (Kn)n∈N be a sequence of nonempty compact spaces and let K := ∏n Knbe their product space. Show that Ba(K) =

⊗n Ba(Kn).

9 (Perron’s Theorem). Let A be a column-stochastic d×d-matrix, i.e., all ai j ≥ 0and ∑

di=1 ai j = 1 for all j = 1, . . . ,d. In this exercise we show: There is a nonzero

vector p≥ 0 such that Ap = p.

a) Let 1 = (1,1, · · · ,1)t and show that 1tA = 1. Conclude that there is a vector0 6= v ∈ Rd such that Av = v.

b) Show that ‖Ax‖1 ≤ ‖x‖1, where ‖x‖1 := ∑dj=1

∣∣x j∣∣ for x ∈ Rd .

c) Show that for λ > 1 the matrix λ I−A is invertible and

(λ −A)−1x =∞

∑n=0

Anxλ n+1 (x ∈ Rd).

d) Let y := |v| = (|v1| , . . . , |vd |), where v is the vector from a). Employing c),show that

‖v‖1λ −1

≤∥∥(λ −A)−1y

∥∥1 (λ > 1).

e) Fix a sequence λn 1 and consider

yn :=∥∥(λn−A)−1y

∥∥−11 (λn−A)−1y.

By compactness one may suppose that yn→ p as n→ ∞. Show that p ≥ 0,‖p‖1 = 1 and Ap = p.

10. Let X be an uncountable set, take p /∈ X and let K := X ∪p be the one-pointcompactification of X . To be precise, a set O⊆ K is open if O⊆ X , or if p ∈ O andX \O is finite.

a) Show that this indeed defines a compact topology on K.

b) Show that if f ∈ C(K), then f (x) = f (p) for all but countably many x ∈ X .

c) Prove that the singleton p is a Borel, but not a Baire set in K.

Exercises 73

11. Let K be a compact space and 0≤ µ ∈M(K). Show that the following assertionsare equivalent.

(i) supp(µ) = K.

(ii)∫

K | f | dµ = 0 implies that f = 0 for every f ∈ C(K).

(iii) ‖ f‖L∞(K,µ) = ‖ f‖∞

for every f ∈ C(K).

12. Let K be a compact space, and let µ ∈ M(K) be a regular Borel probabilitymeasure on K. Show that

supp(µ) =⋂A : A⊆ K closed, µ(A) = 1

and that µ(supp(µ)) = 1. So supp(µ) is the smallest closed subset of K with fullmeasure.

Chapter 6Recurrence and Ergodicity

In the previous chapter we have introduced the notion of a measure-preserving sys-tems (MDS) and seen many examples of such systems, so we now turn to theirsystematic study. In particular we shall define invertibility of an MDS, prove theclassical recurrence theorem of Poincare and introduce the central notion of an er-godic system.

6.1 The Measure Algebra and Invertible MDS

In a measure space (X ,Σ ,µ) we consider null-sets to be negligible. That meansthat we identify sets A,B ∈ Σ that are (µ-)essentially equal, i.e., which satisfyµ(A4B) = 0 (here A4B = (A \B)∪ (B \A) is the symmetric difference of A andB). To be more precise, we define the relation ∼ on Σ by

A∼ B Def.⇔ µ(A4B) = 0 ⇔ 1A = 1B µ-almost everywhere.

Then ∼ is an equivalence relation on Σ and the corresponding set Σ(X) := Σ/∼of equivalence classes is called the corresponding measure algebra. For a set A ∈Σ let us temporarily write [A ] for its equivalence class. It is an easy but tediousexercise to show that the set theoretic relations and (countable) operations on Σ

induce corresponding operations on the measure algebra via

[A ]⊆ [B ]Def.⇐⇒ µX(B\A) = 0, [A ]∩[B ] :=[A∩B ] , [A ]∪[B ] :=[A∪B ]

and so on. Moreover, the measure µ induces a (σ -additive) map

Σ(X)→ [0,∞], [A ] 7→ µ(A).

As for equivalence classes of functions, we normally do not distinguish notationallybetween a set A in Σ and its equivalence class [A ] in Σ(X).

75

76 6 Recurrence and Ergodicity

If the measure is finite, the measure algebra can be turned into a complete metricspace with metric given by

dX(A,B) := µ(A4B) = ‖1A−1B‖1 (A,B ∈ Σ(X)).

The set theoretic operations as well as the measure µ itself are continuous withrespect to this metric. (See also Exercise 1 and Appendix B). Frequently, the σ -algebra Σ is generated by an algebra E, i.e., Σ = σ(E) (cf. Appendix B.1). Thisproperty has an important topological implication.

Lemma 6.1 (Approximation). Let (X ,Σ ,µ) be a finite measure space and let E⊆Σ be an algebra of subsets such that σ(E) = Σ . Then E is dense in the measurealgebra, i.e., for every A ∈ Σ and ε > 0 there is E ∈ E such that dX(A,E)< ε .

Proof. This is just Lemma B.18 from Appendix B. Its proof is standard measuretheory using Dynkin systems.

Suppose that X and Y are measure spaces and ϕ : X → Y is a measurable map-ping, and consider the push-forward measure ϕ∗µX on Y . If ϕ preserves null-sets,i.e., if

µY(A) = 0 ⇒ µX(ϕ−1A) = (ϕ∗µX)(A) = 0 for all A ∈ ΣY,

the mapping ϕ induces a mapping ϕ∗ between the measure algebras defined by

ϕ∗ : Σ(Y)→ Σ(X), [A ] 7→

[ϕ−1A

].

Note that ϕ∗ commutes with all the set theoretic operations, i.e.,

ϕ∗(A∩B) = ϕ

∗A∩ϕ∗B, ϕ

∗⋃nAn =

⋃nϕ∗An . . . .

All this is in particular applicable when ϕ is measure-preserving. In this case andwhen both µX and µY are probability measures, the induced mapping ϕ∗ : Σ(Y)→Σ(X) is an isometry, since

dX(ϕ∗A,ϕ∗B) = µX([ϕ ∈ A]4[ϕ ∈ B]) = µX[ϕ ∈ (A4B)] = µY(A4B) = dY(A,B)

for A,B ∈ ΣY.

We are now able to define invertibility of an MDS.

Definition 6.2. An MDS (X;ϕ) is called invertible if the induced map ϕ∗ on Σ(X)is bijective.

We now relate this notion of invertibility to a possibly more intuitive one, basedon the following definition.

Definition 6.3. Let X and Y be measure spaces, and let ϕ : X → Y be measure-preserving. A measurable map ψ : Y → X is called an essential inverse to ϕ if

6.2 Recurrence 77

ϕ ψ = id µY-a.e. and ψ ϕ = id µX-a.e.

The mapping ϕ is called essentially invertible if it has an essential inverse.

Exercise 2 helps with computations with essential inverses. In particular, it fol-lows that if X = Y and ψ is an essential inverse of ϕ , then ψn is an essential inverseto ϕn for every n ∈ N. See also Exercise 3 for an equivalent characterisation ofessential invertibility.

Lemma 6.4. An essential inverse is unique up to equality almost everywhere. If ψ :Y → X is an essential inverse to a measure-preserving map ϕ : X → Y , then ψ isalso measure-preserving. Moreover, the induced map ϕ∗ : Σ(Y)→ Σ(X) is bijectivewith inverse (ϕ∗)−1 = ψ∗.

Proof. Let ψ1,ψ2 be essential inverses of ϕ . Observe that

ϕ−1[ψ1 6= ψ2 ]⊆ [ψ1 ϕ 6= id ]∪[ψ2 ϕ 6= id ] .

By assumption and since ϕ is measure-preserving, it follows that [Ψ1 6=Ψ2 ] is anull-set. Now suppose that ψ is an essential inverse of ϕ . Then, since µY = ϕ∗µ ,

µY(ψ−1(A)) = µ(ϕ−1

ψ−1(A)) = µ

((ψ ϕ)−1A

)= µ(A)

since (ψ ϕ)−1A = A modulo a µ-null set. The last assertion follows analogously.

A direct consequence of Lemma 6.4 is the following.

Corollary 6.5. An MDS (X;ϕ) is invertible if ϕ is essentially invertible.

Remark 6.6. The converse of Corollary 6.5 does not hold in general: Take X =0,1, Σ = /0,X and ϕ(x) = 1 for x = 0,1. However, if the probability space isa so-called standard Borel space, then invertibility of the MDS (i.e., invertibility ofϕ∗) is equivalent to the essential invertibility of ϕ , see Chapter 12.

Examples 6.7. The baker’s transformation and every group rotation is invertible(use Corollary 6.5). The tent map and the doubling map systems are not invert-ible (Exercise 4). A two-sided Bernoulli system is invertible but a one-sided is not(Exercise 5).

6.2 Recurrence

Recall that a point x in a TDS is recurrent if it returns eventually to each of itsneighbourhoods. In the measure theoretic setting pointwise notions are meaningless,due to the presence of null-sets. So, given an MDS (X;ϕ) in place of points of X wehave to use sets of positive measure, i.e., “points” of the measure algebra Σ(X). Weadopt that view with the following definition.


Definition 6.8. Let (X;ϕ) be an MDS.

a) A set A ∈ ΣX is called recurrent if almost every point of A returns to A aftersome time, or equivalently,

A⊆⋃n≥1

ϕ∗nA (6.1)

in the measure algebra Σ(X).

b) A set A ∈ ΣX is infinitely recurrent if almost every point of A returns to Ainfinitely often, or equivalently,

A⊆⋂n≥1

⋃k≥n

ϕ∗kA (6.2)

in the measure algebra Σ(X).

Here is an important characterisation.

Lemma 6.9. Let (X;ϕ) be an MDS. Then the following statements are equivalent.

(i) Every A ∈ ΣX is recurrent.

(ii) Every A ∈ ΣX is infinitely recurrent.

(iii) For every /0 6= A ∈ Σ(X) there exists n ∈ N such that A∩ϕ∗nA 6= /0.

Proof. The implication (ii)⇒ (i) is clear. For the converse take A∈ ΣX and apply ϕ∗

to (6.1) to obtain ϕ∗A⊆⋃

n≥2 ϕ∗nA. Putting this back into (6.1) we obtain

A⊆⋃n≥1

ϕ∗nA = ϕ

∗A∪⋃n≥2

ϕ∗nA⊆

⋃n≥2

ϕ∗nA.

Continuing in this manner, we see that⋃

n≥k ϕ∗nA is independent of k ≥ 1 and thisleads to (ii).(i)⇒ (iii): To obtain a contradiction suppose that A∩ϕ∗nA = /0 for all n ≥ 1. Thenintersecting with A in (6.1) yields

A = A∩⋃n≥1

ϕ∗nA =

⋃n≥1

A∩ϕ∗nA = /0.

(iii)⇒ (i): Let A ∈ ΣX and consider the set B :=(⋂

n≥1 ϕ∗nAc)∩A. Then for every

n ∈ N we obtainB∩ϕ

∗nB⊆ ϕ∗nAc∩ϕ

∗nA = /0

in the measure algebra. Now (iii) implies that B = /0, i.e., (6.1).

As a consequence of this lemma we obtain the famous recurrence theorem ofH. Poincare.

Theorem 6.10 (Poincare). Every MDS (X;ϕ) is (infinitely) recurrent, i.e., everyset A ∈ ΣX is infinitely recurrent.

6.2 Recurrence 79

Proof. Let A ∈ Σ(X) be such that A∩ϕ∗nA = /0 for all n ≥ 1. Thus for n > m ≥ 0we have

ϕ∗mA∩ϕ

∗nA = ϕ∗m(A∩ϕ

∗(n−m)A) = /0.

This means that the sets (ϕ∗nA)n∈N0 are (essentially) disjoint. (A set A with thisproperty is called wandering.) On the other hand, all sets ϕ∗nA have the same mea-sure, and since µX is finite, this measure must be zero. Hence A = /0, and the MDS(X;ϕ) is infinitely recurrent by the implication (iii)⇒ (i) of Lemma 6.9.

Remark. Poincare’s theorem is false for measure-preserving transformations oninfinite measure spaces. Just consider X = R, the shift ϕ(x) = x+ 1 (x ∈ R) andA = [0,1].

Poincare’s recurrence theorem has caused some irritation among scholars sinceits consequences may seem to be counterintuitive. To make this understandable,consider once again our example from Chapter 1, the ideal gas in a container. Sup-pose that we start observing the system after having collected all gas molecules inthe left half of the box (e.g., by introducing a wall first, pumping all the gas to theleft and then removing the wall), we expect that the gas diffuses in the whole boxand eventually is distributed uniformly within it. It seems implausible to expect thatafter some time the gas molecules again return by their own motion entirely to theleft half of the box. However, since the initial state (all molecules in the left half)has (quite small but nevertheless) positive probability, the Poincare theorem statesthat this will happen, see Figure 6.1.

−→ ?−→

Fig. 6.1 What happens after removing the wall?

Let us consider a more mathematical example which goes back to the Ehren-fests [Ehrenfest and Ehrenfest (1912)], quoted from [Petersen (1989), p.35]. Sup-pose that we have n balls, numbered from 1 to n, distributed somehow over twourns. In each step, we determine randomly a number k between 1 and n, take theball with number k out of the urn where it is at that moment, and put it into the otherurn. Initially we start with all n balls contained in box 1.

The mathematical model of this experiment is that of a Markov shift over L :=0, . . . ,n. The “state sequence” ( j0, j1, j2 . . .) records the number jm ∈ L of ballsin urn 1 after the mth step. To determine the transition matrix P note that if there arei≥ 1 balls in urn 1 then the probability to decrease its number to i−1 is i/n; and ifi < n then the probability to increase the number from i to i+1 is 1− i/n. Hence wehave P = (pi j)0≤i, j≤n with


pi j =

0 if |i− j| 6= 1in if j = i−1n−i

n if j = i+1(0≤ i, j ≤ n).

The a priori probability to have j balls in urn 1 is certainly

p j := 2−n(

nj

),

and it is easy to show that p = (p0, . . . , pn) is indeed a fixed probability row vector,i.e., that pP = p (Exercise 6). (Since P is irreducible, Perron’s theorem implies thatp is actually the unique fixed probability vector.)

Now, starting with all balls contained in urn 1 is exactly the event A :=( jm)m≥0 ∈ LN0 : j0 = n. Clearly µ(A) = 1/2n > 0, whence Poincare’s theoremtells us that in almost all sequences x ∈ A the number n occurs infinitely often.

If the number of balls n is large, this result may look again counterintuitive.However, we shall show that we will have to wait a very long time until the systemcomes back to A for the first time. To make this precise, we return to the generalsetting.

Let (X;ϕ) be an MDS, fix A ∈ ΣX and define

B0 :=⋂n≥1

ϕ∗nAc, B1 := ϕ

∗A, Bn := ϕ∗nA∩

⋂n−1

k=1ϕ∗kAc (n≥ 2).

The sequence (Bn)n∈N is the “disjointification” of (ϕ∗nA)n∈N, and (Bn)n∈N0 is adisjoint decomposition of X : the points from B0 never reach A and the points from Bnreach A for the first time after n steps. Recurrence of A just means that A⊆

⋃n≥1 Bn,

i.e., B0 ⊆ Ac (almost everywhere). If we let

An := Bn∩A (n ∈ N), (6.3)

then (An)n≥1 is an essentially disjoint decomposition of A. We note the followingtechnical lemma for later use.

Lemma 6.11. Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), and let A,B ∈ Σ . Then

µ(B) =n

∑k=1

µ

(A∩

k−1⋂j=1

ϕ∗ jAc∩ϕ

∗kB

)+ µ

(n−1⋂k=0

ϕ∗kAc∩ϕ

∗nB

)(6.4)

for n≥ 1.

Proof. Using the ϕ-invariance of µ we write

µ(B) = µ(ϕ∗B) = µ(A∩ϕ∗B)+µ(Ac∩ϕ

∗B)

and this is (6.4) when n = 1 and with ∩0j=1ϕ∗ jAc =: X . Doing the same again with

the second summand yields

6.2 Recurrence 81

µ(B) = µ(ϕ∗B) = µ(A∩ϕ∗B)+µ(Ac∩ϕ

∗B)

= µ(A∩ϕ∗B)+µ(ϕ∗Ac∩ϕ

∗2B)

= µ(A∩ϕ∗B)+µ(A∩ϕ

∗Ac∩ϕ∗2B)+µ(Ac∩ϕ

∗Ac∩ϕ∗2B)

and this is (6.4) for n = 2. An induction argument concludes the proof.

Suppose that µ(A)> 0. On the set A we consider the induced σ -algebra

ΣA :=

B⊆ A : B ∈ Σ⊆ P(A)

and thereon the induced probability measure µA defined by

µA : ΣA→ [0,1], µA(B) =µ(B)µ(A)

(B ∈ ΣA).

Then µA(B) is the conditional probability for B given A. Define the function

nA : A→ N, nA(x) = n if x ∈ An (n≥ 1).

The function nA is called the time of first return to A. Clearly, nA is ΣA-measurable.The following theorem describes its expected value (with respect to µA).

Theorem 6.12. Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), and let A ∈ Σ with µ(A)> 0.Then ∫

AnAdµA =

µ(⋃

n≥0 ϕ∗nA)

µ(A). (6.5)

Proof. We specialise B = X in Lemma 6.11. Note that

A∩k−1⋂j=1

ϕ∗ jAc =

∞⋃j=k

A j (k ≥ 1)

since A is recurrent, by Poincare’s theorem. Hence by (6.4) in Lemma 6.11 we obtain

µ(X) =n

∑k=1

∞

∑j=k

µ(A j)+µ

(n−1⋂k=0

ϕ∗kAc

).

Letting n→ ∞ yields

1 =∞

∑k=1

∞

∑j=k

µ(A j)+µ

(∞⋂

n=0

ϕ∗nAc

),

and interchanging the order of summation we arrive at

µ

(∞⋃

n=0

ϕ∗nA

)= µ(X)−µ

(∞⋂

n=0

ϕ∗nAc

)=

∞

∑j=1

jµ(A j) = µ(A)∫

AnA dµA.


Dividing by µ(A) proves the claim.

Let us return to our ball experiment. If we start with 100 balls, all contained inurn 1, the probability of A is µ(A) = 2−100. If we knew that

∞⋃n=0

τ∗nA = X , (6.6)

then we could conclude from (6.5) in Theorem 6.12 that the expected waiting timefor returning to A is 1/µ(A) = 2100 steps. Clearly, our lifetimes and that of theuniverse would not suffice for the waiting, even if we do one step every millisecond.

However, this reasoning builds on (6.6), a property not yet established. This willbe done in the next section.

6.3 Ergodicity

Ergodicity is the analogue of minimality in measurable dynamics. Let (X;ϕ) be anMDS and let A∈ ΣX. As in the topological case, we call A invariant if A⊆ [ϕ ∈ A ].Since A and [ϕ ∈ A ] have the same measure, and µ is finite, A ∼ [ϕ ∈ A ], i.e., A =ϕ∗A in the measure algebra. The same reasoning applies if [ϕ ∈ A ]⊆ A, and hencewe have established the following lemma.

Lemma 6.13. Let (X;ϕ) be an MDS and A ∈ ΣX. Then the following assertions areequivalent.

(i) A is invariant, i.e., A⊆ ϕ∗A.

(ii) A is strictly invariant, i.e., A = ϕ∗A.

(iii) ϕ∗A⊆ A.

(iv) Ac is (strictly) invariant.

The following notion is analogous to minimality in the topological case.

Definition 6.14. An MDS (X;ϕ) is called ergodic if every invariant set is essen-tially equal either to /0 or to X .

Since invariant sets are the fixed points of ϕ∗ in the measure algebra, a system isergodic if and only if ϕ∗ on Σ(X) has only the trivial fixed points /0 and X .

In contrast to minimality in the topological case (Theorem 3.5), an MDS neednot have ergodic subsystems: just consider X = [0,1] with the Lebesgue measureand ϕ = idX , the identity. Another difference to the topological situation is thatthe presence of a nontrivial invariant set A does not only lead to a restriction ofthe system (to A), but to a decomposition X = A∪Ac. So ergodicity could also betermed indecomposability.

The following result characterises ergodicity.

6.3 Ergodicity 83

Lemma 6.15. Let (X;ϕ) be an MDS. Then the following statements are equivalent.

(i) The MDS (X;ϕ) is ergodic.

(ii) For every /0 6= A ∈ Σ(X) one has⋂n≥0

⋃k≥n

ϕ∗kA = X .

(iii) For every /0 6= A ∈ Σ(X) one has⋃n≥0

ϕ∗nA = X .

(iv) For every /0 6= A,B ∈ Σ(X) there is n≥ 1 such that

ϕ∗nA∩B 6= /0.

Proof. (i)⇒ (ii): For a set A ∈ ΣX and n≥ 0 the set

A(n) :=⋃k≥n

ϕ∗kA

satisfies ϕ∗A(n) ⊆ A(n) and hence is an invariant set. If A 6= /0 in the measure al-gebra then A(n) 6= /0, too, and by ergodicity, A(n) = X for every n ≥ 0. Taking theintersection yields (ii).(ii)⇒ (iii): This follows from

X =⋂n≥0

⋃k≥n

ϕ∗kA⊆

⋃k≥0

ϕ∗kA⊆ X .

(iii)⇒ (iv): By hypothesis, A(0) = X , and hence A(1) = ϕ∗A(0) = ϕ∗X = X , too. Nowsuppose that B∩ϕ∗nA = /0 for all n≥ 1. Then

/0 =⋃n≥1

B∩ϕ∗nA = B∩

⋃n≥1

ϕ∗nA = B∩A(1) = B∩X = B.

(iv)⇒ (i): Let A ∈ ΣX be an invariant set. Then ϕ∗nA = A for every n ∈N and hencefor B := Ac we have

B∩ϕ∗nA = B∩A = Ac∩A = /0

for every n≥ 1. By hypothesis (iv), this implies that A = /0 or Ac = B = /0 in Σ(X).

The implication (i)⇒ (ii) of Lemma 6.15 says that in an ergodic system each set ofpositive measure is visited infinitely often by almost every point. Note also that (iv)is an analogue of condition b) in Proposition 2.29 for MDSs.

Let us now look at some examples. The implication (iv)⇒ (i) of Lemma 6.15combined with the following result shows that a Bernoulli shift is ergodic.


Proposition 6.16. Let (W +k ,Σ ,µ;τ) = B(p0, . . . , pk−1) be a Bernoulli shift. Then

limn→∞

µ(τ∗nA∩B) = µ(A)µ(B)

for all A,B ∈ Σ .

Proof. We use the notation of Example 5.1.5. Let E denote the algebra of cylindersets on W +

k = LN0 . If B ∈ E then B = B0×∏k≥n0L for some n0 ∈ N and B0 ⊆ Ln0 .

Then τ∗nA = Ln×A and

µ(τ∗nA∩B) = µ(B0×A) = µ(B)µ(A)

if n ≥ n0, since µ =⊗

n∈N0ν is a product measure. If B ∈ Σ is general, by the

Approximation Lemma 6.1 one can find a sequence (Bm)m ⊆ E of cylinder setssuch that dX(Bm,B)→ 0 as m→ ∞. Since

|µ(τ∗nA∩B)−µ(τ∗nA∩Bm)| ≤ dX(B,Bm)

(Exercise 1), one has µ(τ∗nA∩Bm)→ µ(τ∗nA∩B) as m→ ∞ uniformly in n ∈ N.Hence one can interchange limits to obtain

limn

µ(τ∗nA∩B) = limm

limn

µ(τ∗nA∩Bm) = limm

µ(A)µ(Bm) = µ(A)µ(B)

as claimed.

Proposition 6.16 actually says that the Bernoulli shift is strongly mixing, a propertythat will be studied in Chapter 9. Here are other examples of ergodic systems.

Examples 6.17. a) A Markov shift with an irreducible transition matrix is er-godic. This is Theorem 8.14 below.

b) Every minimal rotation on a compact group is ergodic. For the special caseof the torus, this is Proposition 7.15, the general case is treated in in Theorem10.13.

c) The Gauss map (see page 63) is ergodic. A proof can be found in[Einsiedler and Ward (2011), Sect. 3.2].

The next result tells that in an ergodic system the average time of first return to a(nontrivial) set is inverse proportional to its measure.

Corollary 6.18 (Kac). Let (X;ϕ) be an ergodic MDS, X = (X ,Σ ,µ), and A ∈ Σ

with µ(A)> 0. Then for the expected return time to A one has∫A

nA dµA =1

µ(A).

Proof. Since the system is ergodic, the implication (i)⇒ (iii) of Lemma 6.15 showsthat X =

⋃n≥0 ϕ∗nA. Hence the claim follows from Theorem 6.12.

6.3 Ergodicity 85

Let us return to our ball experiment. The transition matrix P described on page 80is indeed irreducible, whence by Example 6.17.a) the corresponding Markov shift isergodic. Therefore we can apply Kac’s theorem concluding that indeed the expectedreturn time to our initial state A is 2100 for our choice n = 100.

Supplement: Induced Transformation and Kakutani–RokhlinLemma

We conclude this chapter with two results of some importance in ergodic theory.However, we shall not immediately use them and one may skip this section on afirst reading.

The Induced Transformation

Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), and let A ∈ Σ with µ(A) > 0. Define theinduced transformation ϕA by

ϕA : A→ A, ϕA(x) = ϕnA(x)(x) (x ∈ A).

That means that ϕA ≡ ϕn on An, n ∈ N.

Theorem 6.19. Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), and let A ∈ Σ be a set ofpositive measure. Then the induced transformation ϕA is measurable with respect toΣA and preserves the induced measure µA.

Proof. Take B ∈ Σ ,B⊆ A. Then

[ϕA ∈ B ] =⋃n≥1

An∩[ϕn ∈ B ] ,

which shows that ϕA is indeed ΣA-measurable. To see that ϕA preserves µ , we useLemma 6.11. Note that since B⊆ A,

A∩k−1⋂j=1

ϕ∗ jAc∩ϕ

∗kB = Ak ∩ϕ∗kB (k ≥ 1)

andn⋂

k=0

ϕ∗kAc∩ϕ

∗nB⊆ Ac∩Bn.

Since µ is finite, ∑n µ(Bn)< ∞ and hence µ(Ac∩Bn)→ 0 as n→∞. Letting n→∞

in (6.4) we obtain


µ(B) =∞

∑k=1

µ(Ak ∩ϕ∗kB) =

∞

∑k=1

µ(Ak ∩[ϕA ∈ B ]) = µ [ϕA ∈ B ] ,

and that was to be proved.

By Exercise 8, if the original MDS is ergodic, then (A,ΣA,µA;ϕA) is ergodic, too.

The Kakutani–Rokhlin Lemma

The following theorem is due to Kakutani [Kakutani (1943)] and Rokhlin[Rohlin (1948)]. Our presentation is based on [Rosenthal (1988)].

Theorem 6.20 (Kakutani–Rokhlin). Let (X;ϕ) be an ergodic MDS, let A∈ Σ withµ(A)> 0 and n ∈ N. Then there is a set B ∈ Σ such that

B,ϕ∗B, . . . ,ϕ∗(n−1)B are pairw. disjoint and µ

(n−1⋃j=0

ϕ∗ jB)≥ 1−nµ(A).

Before we give the proof, let us think about the meaning of Theorem 6.20. ARokhlin tower of height n ∈ N is a finite sequence of pairwise disjoint sets of theform

B, ϕ∗B, ϕ

∗2B, . . . , ϕ∗(n−1)B,

and B is called the ceiling of the tower. Since

ϕ∗kB∩ϕ

∗ jB = ϕj∗(ϕ∗(k− j)B∩B) (1≤ j < k ≤ n−1),

in order to have a Rokhlin tower it suffices to show that ϕ∗ jB∩ B = /0 for j =1, . . . ,n−1.

Since the sets in a Rokhlin tower all have the same measure and µ is a probabilitymeasure, one has the upper bound

µ(B)≤ 1n

for the ceiling B of a Rokhlin tower of height n. On the other hand, the secondcondition in the Kakutani–Rokhlin result is equivalent to

µ(B)≥ 1n−µ(A).

So if µ(A) is small, then µ(B) is close to its maximal value 1/n.

Corollary 6.21. Let (X;ϕ) be an ergodic MDS such that there are sets with arbi-trarily small positive measure. Then for each ε > 0 and n ∈ N there is a Rokhlintower B, ϕ∗B, ϕ∗2B, . . . , ϕ∗(n−1)B such that

6.3 Ergodicity 87

µ

(n−1⋃j=0

ϕ∗ jB)> 1− ε.

Let us turn to the proof of Theorem 6.20. Define

A0 := A, Ak+1 := ϕ∗Ak ∩Ac (k ≥ 0)

Then An = ϕ∗nA∩⋂n−1

j=0 ϕ∗ jAc is the set of points in Ac that hit A for the first timein the nth step. Then (A j) j≥0 is a pairwise disjoint sequence of sets. Moreover, wenote that

⋃n≥1 An = Ac. This follows from

A0∪⋃n≥1

An =⋃j≥0

ϕ∗ jA = X

in Σ(X) by ergodicity. Now, define the set

B j :=⋃k≥1

Ank+ j ( j = 0, . . . ,n−1).

Then the B j are pairwise disjoint, and B1∪·· ·∪Bn−1 = Ac. We claim that

ϕ∗ jB0 = B j ∪C j for some C j ⊆ A∪ϕ

∗A∪·· ·∪ϕ∗( j−1)A (6.7)

for j = 0, . . . ,n−1. This is an easy induction argument, the induction step being

ϕ∗( j+1)B0 = ϕ

∗(B j ∪C j) =⋃k≥1

ϕ∗Ank+ j ∪ϕ

∗C j

=⋃k≥1

Ank+ j+1∪⋃k≥1

(ϕ∗Ank+ j ∩A)∪ϕ∗C j = B j+1∪C j+1

with

C j+1 = ϕ∗C j ∪

⋃k≥1

(ϕ∗Ank+ j ∩A)⊆ ϕ∗( j−1⋃

k=0

ϕ∗kA)∪A =

j⋃k=0

ϕ∗kA.

From (6.7) it follows immediately that

B0∩ϕ∗ jB0 = /0 ( j = 1, . . . ,n−1)

and hence B0 is the ceiling of a Rokhlin tower of height n. Finally, (6.7) implies that

n−1⋃j=0

ϕ∗ jB0 ⊇

n−1⋃j=0

B j =⋃k≥n

Ak,

whence

88 6 Recurrence and Ergodicity(n−1⋃j=0

ϕ∗ jB0

)c

⊆n−1⋃k=0

Ak ⊆n−1⋃k=0

ϕ∗kA.

Hence we obtain

µ

(n−1⋃j=0

ϕ∗ jB0

)≥ 1−µ

(n−1⋃k=0

ϕ∗kA)≥ 1−nµ(A).

Exercises

1. Let (X ,Σ ,µ) be a finite measure space. Show that

dX(A,B) := µ(A4B) = ‖1A−1B‖L1 (A,B ∈ Σ(X))

defines a complete metric on the measure algebra Σ(X). Show further that

a) dX(A∩B,C∩D)≤ dX(A,C)+dX(B,D);

b) dX(Ac,Bc) = dX(A,B);

c) dX(A\B,C \D)≤ dX(A,C)+dX(B,D);

d) dX(A∪B,C∪D)≤ dX(A,C)+dX(B,D);

e) |µ(A)−µ(B)| ≤ dX(A,B).

In particular, the mappings

(A,B) 7→ A∩B, A\B, A∪B and A 7→ Ac, µ(A)

are continuous with respect to dX. Show also that

(An)n ⊆ Σ(X), An A ⇒ dX(An,A) 0.

2. Let X0,X1,X2 be probability measure spaces. Let

f0,g0 : X0→ X1 and f1,g1 : X1→ X2

be measurable such that f0 = g0 and f1 = g1 almost everywhere and f0 preservesnull sets. Prove the following assertions:

a) g0 preserves null sets, too;

b) f1 f0 = g1 g0 almost everywhere;

c) f ∗0 = g∗0 as mappings Σ(X1)→ Σ(X0).

3. Let X and Y be probability spaces, and let ϕ : X → Y be measure-preserving.Show that the following assertions are equivalent.

(i) ϕ is essentially invertible;

6.3 Ergodicity 89

(ii) There are sets A ∈ ΣX, A′ ∈ ΣY such that µX(A) = 1 = µY(A′), and ϕ mapsA bijectively onto A′ with measurable inverse.

4. Show that the tent map and the doubling map are not invertible.

5. Show that if k ≥ 2, the one-sided Bernoulli shift B(p0, . . . , pk−1) on W +k is not

invertible.

6. Show that pP = p, where P, p are the transition matrix and the probability vectordefined above in connection with the ball experiment (see page 80).

7 (Recurrence in Random Literature). The book “Also sprach Zarathustra” byF. Nietzsche consists of roughly 680 000 characters, including blanks. Suppose thatSnoopy—known as an immortal philosopher—is typing randomly on his typewriterhaving 90 symbols. Show that he will almost surely type Nietzsche’s book infinitelyoften (thereby proving correct one of Nietzsche’s most mysterious theories).

Fig. 6.2 Philosopher at work.

Show also that if Snoopy had been typing since eternity, he almost surely alreadywould have typed the book infinitely often.

8. Let (X;ϕ) be an ergodic MDS, X = (X ,Σ ,µ), and let A ∈ Σ with µ(A) > 0.Show that the induced transformation ϕA is ergodic, too. (Hint: Let the sets (An)n≥1be defined as in (6.3) and let B⊆A with ϕ∗AB=B. Show by induction over n∈N thatϕ∗nB∩A⊆ B; e.g., use the identity ϕ∗nB∩A j = ϕ∗ j(ϕ∗n− jB∩A)∩A j for j < n.)

Chapter 7The Banach Lattice Lp and the KoopmanOperator

Let X and Y be two measure spaces and suppose that we are given a measurablemap ϕ : Y→ X . The Koopman operator T = Tϕ , defined by

Tϕ f = f ϕ

then maps a measurable function f on X to a measurable functions Tϕ f on Y. If, inaddition, ϕ respects null sets, i.e., we have

µX(A) = 0 ⇒ µY[ϕ ∈ A ] = 0 (A ∈ ΣX),

then

f = g µX-almost everywhere ⇒ Tϕ f = Tϕ g µY-almost everywhere

for any pair f ,g of scalar valued functions. Hence Tϕ acts actually on equivalenceclasses of measurable functions (modulo equality µX-almost everywhere) via

Tϕ [ f ] :=[

Tϕ f]=[ f ϕ ] .

It is an easy but tedious exercise to show that all the common operations for func-tions induce corresponding operations on equivalence classes, so we usually do notdistinguish between functions and their equivalence classes. Moreover, the Koop-man operator Tϕ commutes with all these operations:

Tϕ( f +g) = Tϕ f +Tϕ g, Tϕ(λ f ) = λTϕ f ,∣∣Tϕ f

∣∣= Tϕ | f | ....

Note thatTϕ 1A = 1A ϕ = 1[ϕ∈A ] = 1ϕ∗A (A ∈ ΣX).

So Tϕ acts on equivalence classes of characteristic functions as ϕ∗ acts on the mea-sure algebra Σ(X) (cf. Chapter 6.1).

Suppose now in addition that ϕ is measure-preserving, i.e., ϕ∗µY = µX. Then forany measurable function f on X and 1≤ p < ∞ we obtain (by Appendix B.5)

91

92 7 The Banach Lattice Lp and the Koopman Operator∥∥Tϕ f∥∥p

p =∫

Y| f ϕ|p dµY =

∫X| f |p d(ϕ∗µY) =

∫X| f |p dµX = ‖ f‖p

p .

This shows thatTϕ : Lp(X)→ Lp(Y) (1≤ p < ∞)

is a linear isometry. The same is true for p = ∞ (Exercise 1). Moreover, by a similarcomputation ∫

YTϕ f =

∫X

f ( f ∈ L1(X)). (7.1)

In Chapter 4 we associated with a TDS (K;ϕ) the commutative C∗-algebra C(K)and the Koopman operator T = Tϕ acting on it. In the case of an MDS (X;ϕ) itis natural to investigate the associated Koopman operator Tϕ on each of the spacesLp(X), 1≤ p≤ ∞.

For the case p = ∞ we note that L∞(X) is, like C(K) in Chapter 4, a commutativeC∗-algebra, and Koopman operators are algebra homomorphisms. (This fact willplay an important role in Chapter 12.) For p 6=∞, however, the space Lp is in generalnot closed under multiplication of functions, so we have to look for a new structuralelement preserved by the operator Tϕ ; this will be the lattice structure.

Since we do not expect order theoretic notions to be common knowledge, weshall introduce the main abstract concepts in the next section and then proceed withsome more specific facts for Lp-spaces. (The reader familiar with Banach latticesmay skip these parts and proceed directly to Section 7.3.) Our intention here, how-ever, is mostly terminological. This means that Banach lattice theory provides aconvenient framework to address certain features common to Lp- as well as C(K)-spaces, but no deep results from the theory are actually needed. The unexperiencedreader may without harm stick to these examples whenever we use abstract termi-nology. For a more detailed account we refer to the monograph [Schaefer (1974)].

7.1 Banach Lattices and Positive Operators

A lattice is a partially ordered set (V,≤) such that

x ∨ y := supx,y and x ∧ y := infx,y

exist for all x,y ∈ V . (Here supA denotes the least upper bound or supremum ofthe set A ⊆ V , if it exists. Likewise, infA is the greatest lower bound, or infimumof A.) A lattice is called complete if every nonempty subset has a supremum and aninfimum.

A subset D ⊆ V of a lattice (V,≤) is called ∨ -stable (∧ -stable) if a,b ∈ Dimplies a ∨ b∈D (a ∧ b∈D). A subset that is ∨ -stable and ∧ -stable is a sublattice.If V is a lattice and A⊆V is a nonempty set, then A has a supremum if and only if

the ∨ -stable set

7.1 Banach Lattices and Positive Operators 93a1 ∨ a2 . . . ∨ an : n ∈ N, a1, . . . ,an ∈ A

has a supremum, and in this case these suprema coincide. If (V,≤) and (W,≤) arelattices, then a map Θ : V →W such that

Θ(x ∨ y) = (Θx) ∨ (Θy) and Θ(x ∧ y) = (Θx) ∧ (Θy)

is called a homomorphism of lattices. If Θ is bijective, then also Θ−1 is a homo-morphism, and Θ is called a lattice isomorphism.

Examples 7.1. 1) The prototype of a complete lattice is the extended real lineR := [−∞,∞] with the usual order. It is lattice isomorphic with the closed unitinterval [0,1] via the isomorphism x 7→ 1

2 +1π

arctan(x).

2) For a measure space X = (X ,Σ ,µ) we denote by

L0 := L0(X;R)

the set of all equivalence classes of measurable functions f : X→ [−∞,∞],and define a partial order by

[ f ]≤ [g ] Def.⇔ f ≤ g µX-almost everywhere.

The ordered set L0 is a lattice, since for each two elements f ,g ∈ L0 the supre-mum and infimum is given by

( f ∨ g)(x) = max

f (x),g(x), ( f ∧ g)(x) = min

f (x),g(x)

(x ∈X).

(To be very precise, this means that the supremum f ∨ g is represented bythe pointwise supremum of representatives of the equivalence classes f andg.) Analogously we see that in L0 the supremum (infimum) of every sequenceexists and is represented by the pointwise supremum of representatives. How-ever, it is even true that the lattice L0(X;R) is complete (Exercise 10).

3) The subsets L0(X;R) and Lp(X;R) for 1≤ p≤ ∞ are sublattices of L0(X;R).4) If X is a finite measure space, then the measure algebra Σ(X) is a lattice (with

respect to the obvious order) and

Σ(X)→ L∞(X), [A] 7→ [1A] (A ∈ ΣX)

is an injective lattice homomorphism. By Exercise 13, the lattice Σ(X) is com-plete.The name “measure algebra” derives from the fact that Σ(X) is not just a(complete) lattice, but a Boolean algebra, cf. Section 12.2 below.

The lattice L0(X;R) does not show any reasonable algebraic structure since thepresence of the values ±∞ prevents a global definition of a sum. This problem van-ishes when one restricts to the sublattice Lp(X;R) for p = 0 or 1≤ p≤ ∞, which is

94 7 The Banach Lattice Lp and the Koopman Operator

also a real vector space. Moreover, the lattice and vector space structures are con-nected by the rules

f ≤ g ⇒ f +h≤ g+h, c · f ≤ c ·g, −g≤− f (7.2)

for all f ,g,h∈ Lp(X;R) and c≥ 0. A lattice V which is also a real vector space suchthat (7.2) holds is called a (real) vector lattice. In any vector lattice one can define

| f | := f ∨ (− f ), f+ := f ∨ 0, f− := (− f ) ∨ 0,

which—in our space Lp(X;R)—all coincide with the respective pointwise opera-tions. In any real vector lattice X the following formulae hold:∣∣| f |∣∣= | f |, (7.3a)| f +g| ≤ | f |+ |g|, |c f |= |c| · | f |, (7.3b)∣∣| f |− |g|∣∣≤ | f −g|, (7.3c)

f = f+− f−, | f |= f++ f−, (7.3d)

f ∨ g = 12 ( f +g+ | f −g|), f ∧ g = 1

2 ( f +g−| f −g|), (7.3e)( f ∨ g)+h = ( f +h) ∨ (g+h), ( f ∧ g)+h = ( f +h) ∧ (g+h), (7.3f)| f ∨ g− f1 ∨ g1| ≤ | f − f1|+ |g−g1|, (7.3e)| f ∧ g− f1 ∧ g1| ≤ | f − f1|+ |g−g1|, (7.3g)

for all f , f1,g,g1,h ∈ V and c ∈ R. (Note that these are easy to establish for thespecial case V = R and hence carry over immediately to the spaces Lp(X;R).)

In a real vector lattice V an element f ∈V is called positive if f ≥ 0. The positivecone is the set V+ := f ∈ V : f ≥ 0 of positive elements. If f ∈ V then f+ andf− are positive, and hence (7.3d) yields that V =V+−V+.

So far, we dealt with real vector spaces. However, for the purpose of spectraltheory (which plays an important role in the study of Koopman operators) it is es-sential to work with the complex Banach spaces Lp(X) = Lp(X;C). Any functionf ∈ Lp(X;C) can be uniquely written as

f = Re f + i Im f

with real-valued functions Re f , Im f ∈ Lp(X;R). Hence Lp(X) decomposes as

Lp(X) = Lp(X;R)⊕ iLp(X;R).

Furthermore, the absolute value mapping | · | has an extension to Lp(X) that satisfies

| f |= sup

Re(c f ) : c ∈ C, |c|= 1,

cf. Exercise 2. Finally, the norm on Lp(X) satisfies

7.1 Banach Lattices and Positive Operators 95

| f | ≤ |g| ⇒ ‖ f‖p ≤ ‖g‖p .

This gives a key for the general definition of a complex Banach lattice[Schaefer (1974), page 133].

Definition 7.2. A complex Banach lattice is a complex Banach space E such thatthere is a real-linear subspace ER together with an ordering ≤ of it, and a mapping|·| : E→ E, called absolute value or modulus, such that the following holds.

1) E = ER⊕ iER as real vector spaces.

The projection onto the first component is denoted by Re : E → ER and called thereal part; and Im f := −Re(i f ) is called the imaginary part; hence f = Re f +i Im f is the canonical decomposition of f .

2) (ER,≤) is a real vector lattice.

3) | f |= supReeπit f : t ∈Q= supt∈Q(cosπt)Re f − (sinπt) Im f .

4) | f | ≤ |g| ⇒ ‖ f‖ ≤ ‖g‖.

The set E+ = f ∈ ER : f ≥ 0 is called the positive cone of E. If we write f ≤ gwith f ,g∈ E, we tacitly suppose that f ,g∈ ER. For f ∈ E we call f := Re f − i Im fits conjugate.

Examples 7.3. 1) If X is a measure space, then Lp(X) is a complex Banach latticefor each 1≤ p≤ ∞.

2) If K is a compact space, then C(K) is a complex Banach lattice (Exercise 3).

The following lemma lists some properties of Banach lattices, immediate for ourstandard examples. The proof for general Banach lattices is left as Exercise 4.

Lemma 7.4. Let E be a complex Banach lattice. Then the following assertions holdfor all f ,g ∈ E and α ∈ C.

a) f +g = f +g and α f = α f .

b)∣∣ f ∣∣= | f | ≥ |Re f | , |Im f | ≥ 0.

c) f ∈ ER ⇒ | f |= f ∨ (− f ).

d) f ≥ 0 ⇔ | f |= f .

e) | f +g| ≤ | f |+ |g| and || f |− |g|| ≤ | f −g| and |α f |= |α| | f |.f) ‖| f |‖= ‖ f‖ and ‖| f |− |g|‖ ≤ ‖ f −g‖.

Moreover, (7.3d)–(7.3g) hold for f , f1,g,g1,h ∈ ER.

Part c) shows that the modulus mapping defined on E is compatible with the mod-ulus on ER coming from the lattice structure there. Part f) implies that the modulusmapping f 7→ | f | is continuous on E. Hence by (7.3e) the mappings

ER×ER→ ER, ( f ,g) 7→ f ∨ g, f ∧ g


are continuous. Furthermore, from b) it follows that ‖Re f‖ ,‖Im f‖ ≤ ‖ f‖, whencethe projection onto ER is bounded, and ER is closed. Finally, by d) also the positivecone E+ is closed.

Positive Operators

Let E,F be Banach lattices. A (linear) operator S : E→ F is called positive if

f ∈ E, f ≥ 0 ⇒ S f ≥ 0.

We write S≥ 0 to indicate that S is positive. The following lemma collects the basicproperties of positive operators.

Lemma 7.5. Let E,F be Banach lattices and let S : E → F be a positive operator.Then the following assertions hold.

a) f ≤ g ⇒ S f ≤ Sg for all f ,g ∈ ER;

b) f ∈ ER ⇒ S f ∈ FR;

c) S(Re f ) = ReS f and S(Im f ) = ImS f for all f ∈ E;

d) S f = S f for all f ∈ E;

e) |S f | ≤ S | f | for all f ∈ E;

f) S is bounded.

Proof. a) follows from linearity of S. b) holds since S is linear, S(E+) ⊆ F+ andER = E+−E+. c) and d) are immediate consequences of b).e) For every t ∈Q we have Re(eπit) f ≤ | f | and applying S yields

Re(eπit)S f ≤ S | f | .

Taking the supremum with respect to t ∈Q we obtain |S f | ≤ S | f | as claimed.f) Suppose that S is not bounded. Then there is a sequence ( fn)n such that fn → 0in norm but ‖S fn‖ → ∞. Since |S fn| ≤ S | fn| one has also ‖S | fn|‖ → ∞. Hence wemay suppose that fn ≥ 0 for all n ∈ N. By passing to a subsequence we may alsosuppose that ∑n ‖ fn‖ < ∞. But since E is a Banach space, f := ∑n fn converges.This yields 0≤ fn ≤ f and hence 0≤ S fn ≤ S f , implying that ‖S fn‖ ≤ ‖S f‖. Thisis a contradiction.

A linear operator S : E→ F between two complex Banach lattices E,F is calleda lattice homomorphism if |S f |= S | f | for all f ∈ E. Every lattice homomorphismis positive and bounded and satisfies

S( f ∨ g) = S f ∨ Sg and S( f ∧ g) = S f ∧ Sg

for f ,g ∈ ER. We note that the Koopman operator Tϕ associated with a TDS (K;ϕ)or an MDS (X;ϕ) is a lattice homomorphism.

7.2 The Space Lp(X) as a Banach Lattice 97

7.2 The Space Lp(X) as a Banach Lattice

Having introduced the abstract concept of a Banach lattice, we turn to some specificproperties of the concrete Banach lattices Lp(X) for 1 ≤ p < ∞. To begin with, wenote that these spaces have order continuous norm, which means that for eachdecreasing sequence ( fn)n∈N ⊆ Lp

+(X), fn ≥ fn+1, one has

infn

fn = 0 ⇒ ‖ fn‖p→ 0.

This is a direct consequence of the Monotone Convergence Theorem (see AppendixB.5) by considering the sequence ( f p

1 − f pn )n∈N. Actually, the Monotone Conver-

gence Theorem accounts also for the following statement.

Theorem 7.6. Let X be a measure space and let 1 ≤ p < ∞. Let F ⊆ Lp+(X) be a

∨ -stable set such that

s := sup‖ f‖p : f ∈F

< ∞.

Then f := supF exists in the Banach lattice Lp(X;R) and there exists an increas-ing sequence ( fn)n ⊆ F such that supn fn = f and ‖ fn− f‖p → 0. In particular,supF ∈F if F is closed.

Proof. Take a sequence ( fn)n∈N ⊆F with ‖ fn‖p→ s. By passing to the sequence( f1 ∨ f2 ∨ . . . ∨ fn)n∈N we may suppose that ( fn)n is increasing. Define f := limn fnpointwise almost everywhere. By the Monotone Convergence Theorem we obtain‖ f‖p = s hence f ∈ Lp. Since infn( f − fn) = 0, by order continuity of the norm itfollows that ‖ f − fn‖p→ 0. If h is any upper bound for F , then fn ≤ h for each n,and then f ≤ h. Hence it remains to show that f is an upper bound of F . Take anarbitrary g∈F . Then fn ∨ g f ∨ g. Since fn ∨ g∈F , ‖ fn ∨ g‖p≤ s for each n∈N. The Monotone Convergence Theorem implies that ‖ f ∨ g‖p = limn ‖ fn ∨ g‖p ≤s. But f ∨ g≥ f and hence

‖( f ∨ g)p− f p‖1 = ‖ f ∨ g‖pp−‖ f‖p

p ≤ sp− sp = 0.

This shows that f = f ∨ g≥ g, as desired.

Remark 7.7. In the previous proof we have used that | f | ≤ |g|, | f | 6= |g| implies‖ f‖1 < ‖g‖1, i.e., that L1 has strictly monotone norm. Each Lp-space (1≤ p < ∞)has strictly monotone norm, cf. Exercise 6.

A Banach lattice E is called order complete if any nonempty subset which hasan upper bound has even a least upper bound. Equivalently (Exercise 5), for allf ,g ∈ ER the order interval

[ f ,g] := h ∈ ER : f ≤ h≤ g

is a complete lattice (as defined in Section 7.1.)


Corollary 7.8. Let X be a measure space and let 1≤ p<∞. Then the Banach latticeLp(X;R) is order complete.

Proof. Let F ′ ⊆ Lp(X;R) and suppose that there exists F ∈ Lp(X;R) with f ≤ Ffor all f ∈F ′. We may suppose without loss of generality that F ′ is ∨ -stable. Pickg ∈F ′ and consider the set g ∨F ′ := g ∨ f : f ∈F ′, which is again ∨ -stableand has the same set of upper bounds as F ′. Now F := (g ∨F ′)− g is ∨ -stableby (7.3f), consists of positive elements and is dominated by F−g≥ 0. In particularit satisfies the conditions of Theorem 7.6. Hence it has a supremum h. It is thenobvious that h+g is the supremum of F ′. By passing to −F ′ one can prove that anonempty family in Lp that is bounded from below has an infimum in Lp.

Recall that algebra ideals play an important role in the study of C(K). In theBanach lattice setting there is an analogous notion.

Definition 7.9. Let E be a Banach lattice. A linear subspace I⊆E is called a (vector)lattice ideal if

f ,g ∈ E, | f | ≤ |g| , g ∈ I ⇒ f ∈ I.

If I is a lattice ideal, then f ∈ I if and only if | f | ∈ I if and only if Re f , Im f ∈ I.It follows from (7.3e) that the real part IR := I ∩ER of a lattice ideal I is a vectorsublattice of ER.

Immediate examples of closed lattice ideals in Lp(X) are obtained from measur-able sets A ∈ ΣX by a construction similar to the TDS case (cf. page 43):

IA :=

f : | f | ∧ 1A = 0=

f : | f | ∧ 1≤ 1Ac=

f : A⊆ [ f = 0 ].

Then IA is indeed a closed lattice ideal, and for A = /0 and A = X we recover Lp(X)and 0, the two trivial lattice ideals. The following characterisation tells that actu-ally all closed lattice ideals in Lp(X) arise by this construction.

Theorem 7.10. Let X be a finite measure space and 1 ≤ p < ∞. Then each closedlattice ideal I ⊆ Lp(X) has the form IA for some A ∈ ΣX.

Proof. Let I ⊆ Lp(X) be a closed lattice ideal. The set

J :=

f ∈ I : 0≤ f ≤ 1

is nonempty, closed, ∨ -stable and has upper bound 1 ∈ Lp(X) (since µX is finite).Therefore, by Corollary 7.8 it has even a least upper bound g ∈ J. It follows that0≤ g≤ 1 and thus h := g ∧ (1−g)≥ 0. Since 0≤ h≤ g and g∈ I, the ideal propertyyields h ∈ I. Since I is a subspace, g+ h ∈ I. But h ≤ 1− g, so g+ h ≤ 1, and thisyields h+g ∈ J. Thus h+g≤ g, i.e., h≤ 0. All in all we obtain g ∧ (1−g) = h = 0,hence g must be a characteristic function 1Ac for some A ∈ Σ .

We claim that I = IA. To prove the inclusion ⊆ take f ∈ I. Then | f | ∧ 1 ∈ J andtherefore | f | ∧ 1≤ g = 1Ac . This means that f ∈ IA. To prove the converse inclusiontake f ∈ IA. It suffices to show that | f | ∈ I, hence we may suppose that f ≥ 0.

7.3 The Koopman Operator and Ergodicity 99

Then fn := f ∧ (n1) = n(n−1 f ∧ 1)≤ n1Ac = ng. Since g ∈ J ⊆ I, we have ng ∈ I,whence fn ∈ I. Now ( fn)n is increasing and converges pointwise, hence in norm toits supremum f . Since I is closed, f ∈ I and this concludes the proof.

Remarks 7.11. a) In most of the results of this section we required p < ∞, forgood reasons. The space L∞(X) is a Banach lattice, but its norm is not ordercontinuous (Exercise 6). If the measure is finite, L∞(X) is still order complete(Exercise 10), but this is not true for general measure spaces. Moreover, ifL∞ is not finite dimensional, then there are always closed lattice ideals not ofthe form IA.

b) For a finite measure space X we have

L∞(X)⊆ Lp(X)⊆ L1(X) (1≤ p≤ ∞).

Particularly important in this scale will be the Hilbert space L2(X).

c) Let K be a compact space. Then the closed lattice ideals in the Banach latticeC(K) coincide with the closed algebra ideals, i.e., with the sets

IA = f ∈ C(K) : f ≡ 0 on A (A⊆ K, closed),

see Exercise 7.

7.3 The Koopman Operator and Ergodicity

We now study MDSs (X;ϕ) and the Koopman operator T = Tϕ on Lp(X). We knowthat, for every 1≤ p≤ ∞,

1) T is an isometry on Lp(X);

2) T is a Banach lattice homomorphism on Lp(X) (see page 93);

3) T ( f g) = T f ·T g for all f ∈ Lp(X), g ∈ L∞(X);

4) T is a C∗-algebra homomorphism on L∞(X).

We shall see in this section that ergodicity of (X;ϕ) can be characterised by a latticetheoretic property of the associated Koopman operator, namely irreducibility.

Definition 7.12. A positive operator T ∈L (E) on a Banach lattice E is called ir-reducible if the only T -invariant closed lattice ideals of E are the trivial ones, i.e.,

I ⊆ E closed lattice ideal, T (I)⊆ I ⇒ I = 0 or I = E.

If T is not irreducible, it is called reducible.

Let us discuss this notion in the finite-dimensional setting.

Example 7.13. Consider the space L1(0, . . . ,n−1) = Rn and a positive operatorT on it, identified with its n× n-matrix. Then the irreducibility of T according to


Definition 7.12 coincides with that notion introduced in Section 2.4 on page 21.Namely, if T is reducible, then there exists a nontrivial T -invariant ideal IA in Rn,for some A ( 0,1, . . . ,n− 1. After a permutation of the points we may supposethat A = k, . . . ,n− 1 for 0 ≤ k < n, and this means that the representing matrix(with respect to the canonical basis) has the form:

k? ? ||| ?

? ? ||| ?

0 ||| ?

k

Let us return to the situation of an MDS (X;ϕ). We consider the associated Koop-man operator T = Tϕ on the space L1(X), but note that T leaves each space Lp in-variant. The following result shows that the ergodicity of an MDS is characterisedby the irreducibility of the Koopman operator T and the one-dimensionality of itsfixed space

fix(T ) := f ∈ L1(X) : T f = f= ker(I−T ).

(Note that always 1∈ fix(T ) whence dimfix(T )≥ 1.) This is reminiscent (cf. Corol-lary 4.17) but also contrary to the TDS case, where minimality could not be charac-terised by the one-dimensionality of the fixed space of the Koopman operator.

Proposition 7.14. Let (X;ϕ) be an MDS with Koopman operator T := Tϕ on L1(X).Then for 1≤ p < ∞ the following statements are equivalent.

(i) (X;ϕ) is ergodic.

(ii) dimfix(T ) = 1, i.e., 1 is a simple eigenvalue of T .

(iii) dim(fix(T )∩Lp) = 1.

(iv) dim(fix(T )∩L∞) = 1.

(v) T as an operator on Lp(X) is irreducible.

Proof. Notice that by Lemma 6.13 A is ϕ-invariant if and only if ϕ∗[A] = [A] if andonly if 1A ∈ fix(T ). In particular, this establishes the implication (iv)⇒ (i).(i)⇒ (ii): For any f ∈ fix(T ) and any c ∈R the set [ f > c ] is ϕ-invariant, and hencetrivial. Let

c0 := sup

c ∈ R : µ [ f > c ] = 1.

Then for c < c0 we have µ [ f ≤ c ] = 0, and therefore µ [ f < c0 ] = 0. For c > c0we have µ [ f > c ] 6= 1, hence µ [ f > c ] = 0, and therefore µ [ f > c0 ] = 0, too. Thisimplies f = c0 almost everywhere.(ii)⇒ (iii)⇒ (iv) is trivial.(i)⇔ (v): By Theorem 7.10 it suffices to show that A ∈ Σ is essentially ϕ-invariantif and only if IA∩Lp is invariant under T . In order to prove this, note that for f ∈ L1


and A ∈ Σ

T (| f | ∧ 1A) = T | f | ∧ T 1A = |T f | ∧ T 1A.

So if A is essentially ϕ-invariant, we have T 1A = 1A, and f ∈ IA implies that T f ∈ IAas well. For the converse suppose that T (IA)⊆ IA. Then, since 1Ac ∈ IA, we must have1ϕ∗Ac ∈ IA, which amounts to ϕ∗Ac∩A = /0 in Σ(X). Consequently A⊆ ϕ∗A, whichmeans that A is ϕ-invariant.

We apply Proposition 7.14 to the rotations on the torus (see Chapter 5.3).

Proposition 7.15. For a ∈ T the rotation MDS (T,m;a) is ergodic if and only if a isnot a root of unity.

Proof. Let T := La be the Koopman operator on L2(T) (cf. Example 4.20) and sup-pose that f ∈ fix(T ). The functions χn : x 7→ xn, n∈Z, form a complete orthonormalsystem in L2(T). Note that T χn = anχn, i.e., χn is an eigenvector of T with corre-sponding eigenvalue an. With appropriate Fourier-coefficients bn we have

∑n∈Z

bnχn = f = T f = ∑n∈Z

bnT χn = ∑n∈Z

anbnχn.

By the uniqueness of the Fourier-coefficients, bn(an− 1) = 0 for all n ∈ Z. Thisimplies that either bn = 0 for all n ∈ Z \ 0 (whence f is constant), or there isn ∈ Z\0 with an = 1 (whence a is a root of unity).

Remark 7.16. If (T;a) is a rational rotation , i.e., a is a root of unity, then a non-constant fixed point of La is easy to find: Suppose that an = 1 for some n ∈ N, anddivide T into n arcs: I j := z∈T : argz∈ [2π( j−1),2π j/n), j = 1, . . . ,n. Take anyintegrable function on I1 and “copy it over” to the other segments. The so arisingfunction is a fixed point of La.

Proposition 7.15 together with Example 2.33 says that a rotation on the torus isergodic if and only if it is minimal. Exercise 8 generalises this to rotations on Tn.Actually, the result is true for any rotation on a compact group as we shall prove inChapter 14.

Similar to the case of minimal TDSs, the peripheral point spectrum of the Koop-man operator has a nice structure.

Proposition 7.17. Let (X;ϕ) be an ergodic MDS, and consider the Koopman oper-ator T := Tϕ on Lp(X) (1≤ p≤ ∞). Then each eigenvalue of T is unimodular andsimple, and each corresponding eigenfunction is unimodular (up to a multiplica-tive constant). Moreover the (peripheral) point spectrum σp(T ) = σp(T )∩T is asubgroup of T.

The proof is similar to the proof of Theorem 4.19, see Exercise 9. For an ergodicgroup rotation (G,m;a), the analogue to the TDS-case (Example 4.20) holds and istreated in Theorem 14.30.


Supplement: Interplay between Lattice and Algebra Structure

The space C(K) for a compact space K and L∞(X) for some probability spaceX = (X ,Σ ,µ) are simultaneously commutative C∗-algebras and complex Banachlattices. In this section we shall show that both structures, the ∗-algebraic and theBanach lattice structure, essentially determine each other, in the sense that eitherone can be reconstructed from the other, and that mappings that preserve one ofthem also preserve the other.

For the following we suppose that E = C(K) or E = L∞(X). To begin with, notethat

| f |= [1− (1−| f |2)]1/2 =∞

∑k=0

(1/2k

)(−1)k(1− f f )k

which is valid for f ∈ E with ‖ f‖ ≤ 1; for general f we use a rescaling. This showsthat the modulus mapping can be recovered from the C∗-algebraic structure (algebra+ conjugation + unit element + norm approximation).Conversely, the multiplication can be recovered from the lattice structure in the fol-lowing way. Since multiplication is bilinear and conjugation is present in the latticestructure, it suffices to know the products f · g where f ,g are real valued. By thepolarisation identity

2 · f ·g = ( f +g)2− f 2−g2, (7.4)

the product of the real elements is determined by taking squares; and since for a realelement h2 = |h|2, it suffices to cover squares of positive elements.

Now, let 0 < p < ∞. Writing the (continuous and convex/concave) mapping x 7→xp on R+ as the supremum/infimum over tangent lines, we obtain for x≥ 0

xp =

supt>0 pt p−1x− (p−1)t p, if 1≤ p < ∞,

inft>0 pt p−1x− (p−1)t p, if 0 < p≤ 1.(7.5)

In the case p≥ 1 define

hn(x) :=n∨

k=1

pt p−1k x− (p−1)t p

k (x≥ 0, n ∈ N),

where (tk)k∈N is any enumeration of Q+. Then, by (7.5) and continuity, hn(x)xp pointwise, and hence uniformly on bounded subsets of R+, by Dini’s theorem(Exercise 14). So if 0≤ f ∈ E then hn f → f p in norm, and hn f is contained inany vector sublattice of E that contains f and 1. In particular, for p = 2, it followsthat the multiplication can be recovered from the Banach lattice structure and theconstant function 1. More precisely, we have the following result.

Theorem 7.18. Let E = C(K) or L∞(X), and let A ⊆ E be closed, conjugation in-variant with 1 ∈ A. Then the following assertions are equivalent.

(i) A is a subalgebra of E.


(ii) A is a vector sublattice of E.

(iii) If f ∈ A then | f |p ∈ A for all 0 < p < ∞.

Now suppose that A satisfies (i)–(iii), and let Φ : A→ L∞(Y) be a conjugation-preserving linear operator such that Φ1 = 1. Then the following statements areequivalent for 1≤ p < ∞.

(iv) Φ is a homomorphism of C∗-algebras.

(v) Φ is a lattice homomorphism.

(vi) Φ | f |p = |Φ f |p for every f ∈ A.

Proof. The proof of the first part is left as Exercise 15. In the second part, supposethat (iv) holds. Then it follows from

[Φ | f |]2 = Φ | f |2 = Φ( f f ) = (Φ f )(Φ f ) = |Φ f |2

that Φ is a lattice homomorphism, i.e., (v). Now suppose that (v) is true. Then Φ ispositive and hence bounded. Let f ≥ 0 and 1 < p < ∞. We have f p = limn hn f ,where hn is as above. Then

Φ f p = limn

Φ(hn f ) = limn

hn (Φ f ) = (Φ f )p

because Φ is a lattice homomorphism. Hence (vi) is established.Finally, suppose that (vi) holds. Then Φ is positive; indeed, if 0 ≤ f ∈ A, then by(iii), g := f 1/p ∈ A and thus Φ f = Φgp = |Φg|p ≥ 0. Now let f ∈ A and supposethat p > 1. Then by the Holder inequality (Theorem 7.19) below,

Φ | f | ≤ (Φ | f |p)1/p(Φ1q)1/q = (Φ | f |p)1/p = |Φ f | ≤Φ | f |

by (vi) and the positivity of Φ . Hence (ii) follows, and this implies (vi) for p = 2 asalready shown. The polarisation identity (7.4) yields Φ( f g) = Φ( f )Φ(g) for realelements f ,g ∈ A. Since Φ preserves conjugation, Φ is multiplicative on the wholeof A, and (i) follows.

The following fact was used in the proof, but is also of independent interest.

Theorem 7.19 (Holder’s Inequality for Positive Operators). Let E = C(K) orL∞(X), and let A⊆ E be a C∗-subalgebra of E. Furthermore, let Y be any measurespace, and let T : A→ L0(Y) be a positive linear operator. Then

|T ( f ·g)| ≤ (T | f |p)1/p · (T |g|q)1/q (7.6)

whenever f ,g ∈ A with f ,g≥ 0.

Proof. We start with the representation

x1/p = inft>0

(1/p)t−1/qx+(1/q)t1/p


which follows from (7.5) with p replaced by 1/p. For fixed y > 0, multiply thisidentity with y1/q and arrive at

x1/py1/q = inft>0

(1/p)(y/t)1/qx+(1/q)t1/py1/q = infs>0

(1/p)s−1/qx+(1/q)s1/py

by a change of parameter s = ty. Replace now x by xp and y by yq to obtain

xy = infs>0

(1/p)s−1/qxp +(1/q)s1/pyq. (7.7)

Note that this holds true even for y= 0, and by a continuity argument we may restrictthe range of the parameter s to positive rational numbers. Then, for a fixed 0< s∈Q(7.7) yields

| f ·g| ≤ (1/p)s−1/q | f |p +(1/q)s1/p |g|q .

Applying T we obtain

|T ( f ·g)| ≤ T | f ·g| ≤ (1/p)s−1/qT | f |p +(1/q)s1/pT |g|q

and, taking the infimum over 0 < s ∈Q, by (7.7) again we arrive at (7.6).

Remark 7.20. a) Theorem 7.18 remains true if one replaces L∞(Y) by a spaceC(K′). (Fix an arbitrary x ∈ K′ and apply Theorem 7.18 with (Y,Σ ′,ν) =(K,Bo(K),δx).)

b) Theorem 7.18 can be generalised by means of approximation arguments, seeExercise 16.

Exercises

1. Show that if ϕ : Y→X is measurable with ϕ∗µY = µX, then the associated Koop-man operator Tϕ : L∞(X)→ L∞(Y) is an isometry.

2. Let X be a measure space and let 1≤ p≤ ∞. Show that

| f |= sup

Re(c f ) : c ∈ T

with the supremum being taken in the order of Lp(X;R).

3. Prove that C(K), K compact, is a complex Banach lattice.

4. Prove Lemma 7.4.

5. Show that a Banach lattice E is order complete if and only if every order interval[ f ,g], f ,g ∈ ER, is a complete lattice.

6. Show that in general a space L∞(X ,Σ ,µ) does neither have strictly monotone nororder continuous norm.

Exercises 105

7. Prove that for a space C(K) the closed lattice ideals coincide with the closedalgebra ideals. (Hint: Adapt the proof of Theorem 4.8.)

8. Show that the rotation on the torus Tn is ergodic if and only if it is minimal. (Hint:Copy the proof of Proposition 7.15 using n-dimensional Fourier coefficients.)


10. Let X be a finite measure space. Show that the lattice

V := L0(X; [0,1]) = f ∈ L1(X) : 0≤ f ≤ 1

is complete. (Hint: Use Corollary 7.8.) Show that the lattices

L0(X;R) and L0(X; [0,∞])

are lattice isomorphic to V and conclude that they are complete. Finally, prove thatL∞(X;R) is an order complete Banach lattice, but its norm is not order continuousin general.

11. Let X = (X ,Σ ,µ) be a probability space, and let and F ⊆ L1(X)+ be ∧ -stablesuch that infF = 0. Show (e.g., by using Theprem 7.6) that inf f∈F ‖ f‖1 = 0.

12. Let X be a finite measure space and 0 ≤ f ∈ L0(X;R). Show that supn∈N(n f ∧1) = 1[ f>0 ]. Show that f = 1A for some A ∈ Σ if and only if c f ∧ 1 = f for everyc > 1.

13. Let X = (X ,Σ ,µ) be a finite measure space. Show that 1A : A ∈ Σ is a com-plete sublattice of L1(X). Conclude that the measure algebra Σ(X) is a completelattice.

14 (Dini). Let K be a compact space and suppose that ( fn)n is a sequence in C(K)with 0 ≤ fn ≤ fn+1 for all n ∈ N. Suppose further that there is f ∈ C(K) such thatfn→ f pointwise. Then the convergence is uniform.

15. Prove the first part of Theorem 7.18.

16 (Holder’s Inequality for Positive Operators). Let X and Y be measure spaces,T : L1(X)→ L0(Y) a positive linear operator. Show that

|T ( f ·g)| ≤ (T | f |p)1/p · (T |g|q)1/q

whenever 1 < p < ∞, 1/p+1/q = 1 and f ∈ Lp(X), g ∈ Lq(X).

Chapter 8The Mean Ergodic Theorem

As was said at the beginning of Chapter 5, the reason for introducing measure-preserving dynamical systems is the intuition of a statistical equilibrium emergentfrom (very rapid) deterministic interactions of a multitude of particles. A measure-ment of the system can be considered as a random experiment, where repeated mea-surements appear to be independent since the time scale of measurements is farlarger than the time scale of the internal dynamics.

In such a perspective, one expects that the (arithmetic) averages over the out-comes of these measurements (“time averages”) should converge—in some senseor another—to a sort of “expected value”. In mathematical terms, given an MDS(X;ϕ) and an “observable” f : X → R, the time averages take the form

An f (x) :=1n

(f (x)+ f (ϕ(x))+ · · ·+ f (ϕn−1(x)

)(8.1)

if x ∈ X is the initial state of the system; and the expected value is the “space mean”∫X

f

of f . In his original approach, Boltzmann assumed the so-called “Ergoden-hypothese”, which allowed him to prove the convergence An f (x) →

∫X f

(“time mean equals space mean”, cf. Chapter 1). However, the Ehrenfests in[Ehrenfest and Ehrenfest (1912)] doubted that this “Ergodenhypothese” is ever sat-isfied, a conjecture that was confirmed independently by Rosenthal and Plancherelonly a few years later (see [Brush (1971)]). After some 20 more years, John vonNeumann and George D. Birkhoff made a major step forward by separating thequestion of convergence of the averages An f from the question whether the limit isthe space mean of f or not. Their results—the “Mean Ergodic Theorem” and the“Individual Ergodic Theorem”—roughly state that under reasonable conditions onf the time averages always converge in some sense, while the limit is the expected“space mean” if and only if the system is ergodic (in our terminology). These theo-rems gave birth to Ergodic Theory as a mathematical discipline.

107

108 8 The Mean Ergodic Theorem

The present and the following two chapters are devoted to these fundamentalresults, starting with von Neumann’s theorem. The linear operators induced by dy-namical systems and studied in the previous chapters will be the main protagonistsnow.

8.1 Von Neumann’s Mean Ergodic Theorem

Let (X;ϕ) be an MDS and let T = Tϕ be the Koopman operator. Note that the timemean of a function f under the first n ∈ N iterates of T (8.1) can be written as

An f =1n

(f + f ϕ + · · ·+ f ϕ

n−1)= 1n ∑

n−1j=0 T j f .

Von Neumann’s theorem deals with these averages An f for f from the Hilbert spaceL2(X).

Theorem 8.1 (Von Neumann). Let (X;ϕ) be an MDS and consider the Koopmanoperator T := Tϕ . For each f ∈ L2(X) the limit

limn→∞

An f = limn→∞

1n

n−1

∑j=0

T j f

exists in the L2-sense and is a fixed point of T .

We shall not give von Neumann’s original proof, but take a route that is moresuitable for generalisations. For a linear operator T on a vector space E we let

An[T ] :=1n ∑

n−1j=0 T j (n ∈ N)

be the Cesaro averages (or: Cesaro means) of the first n iterates of T . If T is under-stood, we omit explicit reference and simply write An in place of An[T ]. Further wedenote by

fix(T ) := f ∈ E : T f = f= ker(I−T )

the fixed space of T .

Lemma 8.2. Let E be a Banach space and let T : E→ E be a bounded linear oper-ator on E. Then, with An := An[T ], the following assertions hold.

a) If f ∈ fix(T ), then An f = f for all n ∈ N, and hence An f → f .

b) One has

AnT = T An =n+1

nAn+1−

1n

I (n ∈ N). (8.2)

If An f → g, then T g = g and AnT f → g.

8.1 Von Neumann’s Mean Ergodic Theorem 109

c) One has

(I−T )An = An(I−T ) =1n(I−T n) (n ∈ N). (8.3)

If (1/n)T n f → 0 for all f ∈ E, then An f → 0 for all f ∈ ran(I−T ).

d) One has

I−An = (I−T )1n

n−1

∑k=0

k−1

∑j=0

T j (n ∈ N). (8.4)

If An f → g, then f −g ∈ ran(I−T ).

Proof. a) is trivial, and the formulae (8.2)–(8.4) are established by simple algebra.The remaining statements then follow from these formulae.

From a) and b) of Lemma 8.2 we can conclude the following.

Lemma 8.3. Let T be a bounded linear operator on a Banach space E. Then

F :=

f ∈ E : PT f := limn→∞

An f exists

is a T -invariant subspace of E containing fix(T ). Moreover PT : F→ F is a projec-tion onto fix(T ) satisfying T PT = PT T = PT on F.

Proof. It is clear that F is a subspace of E and PT : F → E is linear. By Lemma8.2.b, F is T -invariant, ran(PT ) ⊆ fix(T ) and PT T f = T PT f = PT f for all f ∈ F .Finally it follows from Lemma 8.2.a that fix(T )⊆ F , and PT f = f if f ∈ fix(T ). Inparticular P2

T = PT , i.e., PT is a projection.

Definition 8.4. Let T be a bounded linear operator on a Banach space E. Then theoperator PT , defined by

PT f := limn→∞

1n

n−1

∑j=0

T j f (8.5)

on the space F of all f ∈ E where this limit exists, is called the mean ergodicprojection associated with T . The operator T is called mean ergodic if F = E, i.e.,if the limit in (8.5) exists for every f ∈ E.

Using this terminology we can rephrase von Neumann’s result: The Koopmanoperator associated with an MDS (X;ϕ) is mean ergodic when considered as anoperator on E = L2(X). Our proof of this statement consists of two more steps.

Theorem 8.5. Let T ∈L (E), E a Banach space. Suppose that supn∈N0‖An‖ < ∞

and that (1/n)T n f → 0 for all f ∈ E. Then the subspace

F := f : limn→∞

An f exists

is closed, T -invariant, and decomposes into a direct sum of closed subspaces

F = fix(T )⊕ ran(I−T ).


The operator T∣∣F ∈L (F) is mean ergodic. Furthermore, the operator

PT : F → fix(T ) PT f := limn→∞

An f

is a bounded projection with kernel ker(PT ) = ran(I−T ) and PT T = PT = T PT .

Proof. By Lemma 8.3, all that remains to show is that F is closed, PT is bounded,and ker(PT ) = ran(I−T ). The closedness of F and the boundedness of PT are solelydue to the uniform boundedness of the operator sequence (An)n∈N and the strongconvergence lemma (Exercise 1).

To prove the remaining statement, note that ran(I− T ) ⊆ ker(PT ) by Lemma8.2.c; hence ran(I−T )⊆ ker(PT ) since PT is bounded. Since PT is a projection, onehas ker(PT ) = ran(I−PT ), and by Lemma 8.2.d we finally obtain

ker(PT ) = ran(I−PT )⊆ ran(I−T )⊆ ker(PT ).

Mean ergodic operators will be studied in greater detail in Section 8.4 below. Forthe moment we note the following important result, which entails von Neumann’stheorem as a corollary.

Theorem 8.6 (Mean Ergodic Theorem on Hilbert Spaces). Let H be a Hilbertspace and let T ∈L (H) be a contraction, i.e., such that ‖T‖ ≤ 1. Then

PT f := limn→∞

1n

n−1

∑j=0

T j f exists for every f ∈ H.

Moreover, H = fix(T )⊕ ran(I−T ) is an orthogonal decomposition, and the meanergodic projection PT is the orthogonal projection onto fix(T ).

Proof. If T is a contraction, then the powers T n and hence the Cesaro averagesAn[T ] are contractions, too, and (1/n)T n → 0. Therefore, Theorem 8.5 can be ap-plied and so the subspace F is closed and PT : F → F is a projection onto fix(T )with kernel ran(I−T ).

Take now f ∈H with f ⊥ ran(I−T ). Then ( f | f −T f )= 0 and hence ( f |T f )=( f | f ) = ‖ f‖2. This implies that

‖T f − f‖2 = ‖T f‖2−2Re( f |T f )+‖ f‖2 = ‖T f‖2−‖ f‖2 ≤ 0

since T is a contraction. Consequently, f = T f , i.e., f ∈ fix(T ). Hence we haveproved that

ran(I−T )⊥ ⊆ fix(T ).

However, from ran(I−T )∩fix(T )= 0we obtain ran(I−T )⊥= fix(T ) as claimed.(Alternatively, note that PT must be a contraction and use Exercise 2.)

Let us note the following interesting consequence.

Corollary 8.7. Let T be a contraction on a Hilbert space H. Then fix(T ) = fix(T ∗)and PT = PT ∗ .

8.2 The Fixed Space and Ergodicity 111

Proof. Since An[T ]→ PT strongly we conclude that

An[T ∗] = An[T ]∗→ P∗T = PT weakly.

But T ∗ is a contraction as well, whence An[T ∗]→ PT ∗ strongly. It follows that PT =PT ∗ and hence fix(T ) = fix(T ∗).

8.2 The Fixed Space and Ergodicity

Let (X;ϕ) be an MDS. In von Neumann’s theorem the Koopman operator was con-sidered as an operator on L2(X). However, it is natural to view it also as an operatoron L1 and in fact on any Lp-space with 1≤ p≤ ∞.

Theorem 8.8. Let (X;ϕ) be an MDS, and consider the Koopman operator T = Tϕ

on the space L1 = L1(X). Then

PT f := limn→∞

1n

n−1

∑j=0

T j f exists in Lp for every f ∈ Lp, 1≤ p < ∞.

Moreover, for 1≤ p≤ ∞, the following assertions hold.

a) The space fix(T )∩Lp is a Banach sublattice of Lp containing 1.

b) PT : Lp→ fixT ∩Lp is a positive contractive projection satisfying∫X

PT f =∫

Xf ( f ∈ Lp) (8.6)

and PT ( f ·PT g) = (PT f ) · (PT g) ( f ∈ Lp,g ∈ Lq, 1p +

1q = 1). (8.7)

Proof. Let f ∈ L∞. Then, by von Neumann’s theorem, the limit PT f = limn→∞ An fexists in L2, thus a fortiori in L1. Moreover, since |T n f | ≤ ‖ f‖

∞1 for each n ∈ N0,

we have |An f | ≤ ‖ f‖∞

1 and finally |PT f | ≤ ‖ f‖∞

1. Then Holder’s inequality yields

‖An f −PT f‖pp ≤ (2‖ f‖

∞)p−1 ‖An f −PT f‖1→ 0

for any 1≤ p < ∞. Hence we have proved that for each such p the space

Fp :=

f ∈ Lp : ‖·‖p− limn→∞

An f exists

contains the dense subspace L∞; but it is closed in Lp by Theorem 8.5, and so it mustbe all of Lp.a) Take f ∈ fix(T ). Then T | f |= |T f |= | f | and T f = T f = f . Hence fix(T )∩Lp isa vector sublattice of Lp. Since it is obviously closed in Lp, it is a Banach sublatticeof Lp.


b) Equation (8.6) is immediate from the definition of PT and equation (7.1). Forthe proof of the remaining part we may suppose by symmetry and density that g =PT g ∈ L∞. Then T g = g and hence

An( f g) =1n

n−1

∑j=0

T j( f g) =1n

n−1

∑j=0

(T j f )(T jg) =1n

n−1

∑j=0

(T j f )g

= (An f ) ·g→ (PT f ) ·g as n→ ∞.

Remark 8.9. We give a probabilistic view on the mean ergodic projection PT . De-fine

Σϕ := fix(ϕ∗) :=

A ∈ ΣX : ϕ∗A = A

=

A ∈ ΣX : 1A ∈ fix(T ).

This is obviously a sub-σ -algebra of ΣX, called the ϕ-invariant σ -algebra. Weclaim that

fix(T ) = L1(X ,Σϕ ,µX).

Indeed, since step functions are dense in L1, every f ∈ L1(X ,Σϕ ,µX) is contained infix(T ). On the other hand, if f ∈ fix(T ), then for every Borel set B⊆ C one has

ϕ∗[ f ∈ B ] = [ϕ ∈ [ f ∈ B ] ] = [ f ϕ ∈ B ] =

[Tϕ f ∈ B

]=[ f ∈ B ] ,

which shows that [ f ∈ B ] ∈ Σϕ . Hence f is Σϕ -measurable. Now, take A ∈ Σϕ andf := 1A in (8.7) and use (8.6) to obtain∫

X1A PT f =

∫X

PT (1A · f ) =∫

X1A · f .

This shows that PT f = E( f |Σϕ) is the conditional expectation of f with respectto the ϕ-invariant σ -algebra Σϕ .

We can now extend the characterisation of ergodicity by means of the Koopmanoperator obtained in Proposition 7.14.

Theorem 8.10. Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), with associated Koopmanoperator T = Tϕ on L1(X), and let 1≤ p < ∞. The following statements are equiv-alent.

(i) (X;ϕ) is ergodic.

(ii) dimfix(T ) = 1, i.e., 1 is a simple eigenvalue of T .

(iii) dim(fix(T )∩L∞) = 1.

(iv) T as an operator on Lp(X) is irreducible.

(v) PT f = limn→∞

1n

n−1

∑j=0

T j f =∫

Xf ·1 for every f ∈ L1(X).

8.2 The Fixed Space and Ergodicity 113

(vi) limn→∞

1n

n−1

∑j=0

∫X( f ϕ

j) ·g =

(∫X

f) (∫

Xg)

for all f ∈ Lp, g ∈ Lq where 1p +

1q = 1.

(vii) limn→∞

1n

n−1

∑j=0

µ(A∩ϕ∗ j(B)) = µ(A)µ(B) for all A,B ∈ Σ .

(viii) limn→∞

1n

n−1

∑j=0

µ(A∩ϕ∗ j(A)) = µ(A)2 for all A ∈ Σ .

Proof. The equivalence of (i)–(iv) has been proved in Proposition 7.14.(ii)⇒ (v): Let f ∈ L1. Then PT f ∈ fix(T ), so PT f = c · 1 by (ii). Integrating yieldsc =

∫X f .

(v)⇒ (vi) follows by multiplying with g and integrating; (vi)⇒ (vii) follows by spe-cialising f = 1A and g = 1B, and (vii)⇒ (viii) is proved by specialising A = B.(viii)⇒ (i): Let A be ϕ-invariant, i.e., ϕ∗(A) = A in the measure algebra. Then

1n

n−1

∑j=0

µ(A∩ϕ∗ j(A)) =

1n

n−1

∑j=0

µ(A) = µ(A),

and hence µ(A)2 = µ(A) by (viii). Therefore µ(A) ∈ 0,1, and (i) is proved.

Remarks 8.11. a) Note that the convergence in (v) is even in the Lp-normwhenever f ∈ Lp, with 1≤ p < ∞. This follows from Theorem 8.8.

b) Assertion (v) is, in probabilistic language, a kind of “weak law of large num-bers”. Namely , fix f ∈ Lp(X) and let X j := f ϕ j, j ∈ N0. Then the X j areidentically distributed random variables on the probability space (X) withcommon expectation E(X j) =

∫X f =: c. Property (iii) just says that

X0 + · · ·+Xn−1

n→ c as n→ ∞

in L1-norm. Note that this is slightly stronger than the “classical” weak lawof large numbers that asserts convergence in measure (probability) only (see[Billingsley (1979)]). We shall come back to this in Section 11.3.

c) Suppose that µ(A)> 0. Then dividing by µ(A) in (vii) yields

1n

n−1

∑j=0

µA[

ϕj ∈ B

]→ µ(B) (n→ ∞)

for every B ∈ Σ , where µA is the conditional probability given A (cf. Sec-tion 6.2). This shows that in an ergodic system the original measure µ iscompletely determined by µA.

d) By a standard density argument one can replace “for all f ∈ L1(X)” in asser-tion (v) by “for all f in a dense subset of L1(X)”. The same holds for f and


g in assertion (vi). Similarly, it suffices to take in (vii) sets from a ∩-stablegenerator of Σ (Exercise 3).

8.3 Perron’s Theorem and the Ergodicity of Markov Shifts

Let L = 0, . . . ,k−1 and let S = (si j)0≤i, j<k be a row-stochastic matrix, i.e., S≥ 0and S1 = 1, where 1 = (1,1, . . . ,1)t . It was remarked in Example 2.3 that S is acontraction on the finite-dimensional Banach space E := (Cd ,‖·‖

∞). In particular, it

satisfies the hypotheses of Theorem 8.5, whence the limit

Qx := limn→∞

1n

n−1

∑j=0

S jx

exists for every y∈ F := fix(S)⊕ ran(I−S). (Note that in finite dimensions all linearsubspaces are closed.) The dimension formula from elementary linear algebra showsthat F =E, i.e., S is mean ergodic, with Q being the mean ergodic projection. ClearlyQ ≥ 0 and Q1 = 1 as well, so the rows of Q are probability vectors. Moreover,since QS = Q, each row of Q is a left fixed vector of S. Hence we have proved thefollowing version of Perron’s theorem, already claimed in Section 5.1 on page 64.

Theorem 8.12 (Perron). Let S be a row-stochastic k× k-matrix. Then there is atleast one probability (column) vector p such that ptS = pt .

Recall (from Section 2.4, cf. Section 7.3) that a row-stochastic matrix S is ir-reducible if for every pair of indices (i, j) ∈ L× L there is an r ∈ N0 such thatsr

i j := [Sr]i j > 0. Furthermore, let us call a matrix A = (ai j)i, j strictly positive ifai j > 0 for all indices i, j. Then we have the following characterisation.

Lemma 8.13. For a row-stochastic k× k-matrix the following assertions are equiv-alent.

(i) S is irreducible;

(ii) There is m ∈ N such that (I+S)m is strictly positive;

(iii) Q = limn→∞1n ∑

n−1j=0 S j is strictly positive.

If (i)–(iii) hold, then fix(S) =C ·1, there is a unique probability vector p with ptS =pt , and p is strictly positive.

Proof. (i)⇒ (ii): Simply expand (I+S)m = ∑mj=0(m

j

)S j. If S is irreducible, then for

high enough m ∈ N, the resulting matrix must have each entry strictly positive.(ii)⇒ (iii): Since QS = S, it follows that Q(I+S)m = 2mQ. But if (I+S)m is strictlypositive, so is Q(I+S)m, since Q has no zero row.(iii)⇒ (i): If Q is strictly positive, then for large n the Cesaro mean An[S] must bestrictly positive, and hence S is irreducible.

8.3 Perron’s Theorem and the Ergodicity of Markov Shifts 115

Suppose that (i)–(iii) hold. We claim that Q has rank 1. If y is a column of Q, thenQy= y, i.e., Q is a projection onto its column-space. Let ε :=min j y j and x := y−ε1.Then x≥ 0 and Qx = x, but since Q is strictly positive and x is positive, either x = 0or x is also strictly positive. The latter is impossible by construction of x, and hencex = 0. This means that all entries of y are equal. Since y was an arbitrary column ofQ, the claim is proved.

It follows that each column of Q is a strictly positive multiple of 1, and so all therows of Q are identical. That means, there is a strictly positive p such that Q = 1 · pt .We have seen above that ptS = pt . And if q is any probability vector with qtS = qt ,then qt = qtQ = qt1pt = pt .

We remark that in general dimfix(S) = 1 does not imply irreducibility of S, seeExercise 4.

By the lemma, an irreducible row-stochastic matrix S has a unique fixed prob-ability vector p, being strictly positive. We are now in the position to prove thefollowing characterisation.

Theorem 8.14. Let S be a row-stochastic k× k-matrix with fixed probability vectorp. Then p is strictly positive and the Markov shift (W +

k ,Σ ,µ(S, p);τ) is ergodic ifand only if S is irreducible.

Proof. Let i0, . . . , il ∈ L and j0, . . . , jr ∈ L. Then for n≥ 1 we have

µ(i0×· · ·×il×Ln−1× j0× ·· ·× jr×∏L

)= pi0si0i1 . . .sil−1il sn

il j0 s j0 j1 . . .s jr−1 jr

as a short calculation using (5.2) and involving the relevant indices reveals. If weconsider the cylinder sets

E := i0× ·· ·×il×∏L, F := j0× ·· ·× jr×∏L,

then if n > l

[τn ∈ F ]∩E = i0× ·· ·×il×Ln−l−1× j0× ·· ·× jr×∏L,

and hence

µ([τn ∈ F ]∩E) = pi0si0i1 . . .sil−1il sn−lil j0 s j0 j1 . . .s jr−1 jr .

By taking Cesaro averages of this expression and by splitting off the first summandswe see that

limN→∞

1N

N−1

∑n=0

µ([τn ∈ F ]∩E) = pi0si0i1 . . .sil−1il qil j0 s j0 j1 . . .s jr−1 jr .

If S is irreducible, we know from Lemma 8.13 that qil j0 = p j0 , and hence


limN→∞

1N

N−1

∑n=0

µ([τn ∈ F ]∩E) = µ(E)µ(F).

Since cylinder sets form a dense subalgebra of Σ , by Remark 8.11.d we concludethat (vii) of Theorem 8.10 holds, i.e., the MDS is ergodic.

Conversely, suppose that p is strictly positive and the MDS is ergodic and fixi, j ∈ L. Specialising E = i×∏L and F = j×∏L above, by Theorem 8.10 weobtain

pi p j = µ(E)µ(F) = limN→∞

1N

N−1

∑n=0

µ([τn ∈ F ]∩E) = piqi j.

Since pi > 0 by assumption, qi j = p j > 0. Hence Q is strictly positive, and thisimplies that S is irreducible by Lemma 8.13.

8.4 Mean Ergodic Operators

As a matter of fact, the concept of mean ergodicity has applications far beyondKoopman operators coming from MDSs. In Section 8.1 we introduced that conceptfor bounded linear operators on general Banach spaces, and a short inspection showsthat one can study mean ergodicity under more general hypotheses.

Remark 8.15. The assertions of Lemma 8.2 and Lemma 8.3 remain valid if T ismerely a continuous linear operator on a Hausdorff topological vector space E.Some of these statements, e.g. the fundamental identity An−T An = (1/n)(I−T n)even hold if E is merely a convex subset of a Hausdorff topological vector space andT : E → E is affine (meaning T f = L f + g, with g ∈ E and L : E → E linear andcontinuous).

Accordingly, and in coherence with Definition 8.4, a continuous linear operatorT : E→ E on a Hausdorff topological vector space E is called mean ergodic if

PT f := limn→∞

1n

n−1

∑j=0

T j f

exists for every f ∈ E. Up to now we have seen three instances of mean ergodicoperators:

1) Contractions on Hilbert spaces (Theorem 8.6);

2) Koopman operators on Lp, 1≤ p < ∞, associated with MDSs (Theorem 8.8);

3) Row-stochastic matrices on Cd (Section 8.3).

And we shall see more in this and later chapters:

3) Power-bounded operators on reflexive spaces (Theorem 8.22);

4) Dunford–Schwartz operators on finite measure spaces (Theorem 8.24);

8.4 Mean Ergodic Operators 117

5) The Koopman operator on C(T) of a rotation on T (Proposition 10.10 and onR[0,1] of a mod 1 translation on [0,1) (Proposition 10.19);

6) The Koopman operator on C(G) of a rotation on a compact group G (Corollary10.12).

In this section we shall give a quite surprising characterisation for mean ergodicityof bounded linear operators on general Banach spaces.

We begin by showing that the hypotheses of Theorem 8.5 are natural. Recall thenotation

An = An[T ] =1n

n−1

∑j=0

T j (n ∈ N)

for the Cesaro-averages. A bounded operator T on a Banach space E is calledCesaro-bounded if supn∈N ‖An‖< ∞.

Lemma 8.16. If E is a Banach space and T ∈ L (E) is mean ergodic, then T isCesaro-bounded and T n f/n→ 0 for every f ∈ E.

Proof. As limn→∞ An f exists for every f ∈ E and E is a Banach space, it fol-lows from the uniform boundedness principle (Theorem C.1) that supn∈N ‖An‖ <∞. From Lemma 8.2.b we have T PT = PT , i.e., (I− T )An f → 0. The identity(I−T )An = (1/n) id−(1/n)T n implies that T n f/n→ 0 for every f ∈ E.

The next lemma exhibits in a very general fashion the connection between fixedpoints of an operator T and certain cluster points of the sequence (An f )n∈N. Weshall need it also in our proof of the Markov–Kakutani fixed point Theorem 10.1below.

Lemma 8.17. Let C be a convex subset of a Hausdorff topological vector space,T : C→ C be a continuous affine mapping and g ∈ C. Then T g = g if and only ifthere is f ∈C and a subsequence (n j) j∈N such that

(1) T n j f/n j→ 0 and (2) g ∈⋂

k∈NAn j f : j ≥ k.

Proof. If T g = g, then (1) and (2) hold for f = g and every subsequence (n j) j∈N.Conversely, suppose that (1) and (2) hold for some f ∈E and a subsequence (n j) j∈N.Then

g−T g = (I−T )g ∈ (I−T )An j f : j ≥ k= ( f −T n j f )/n j : j ≥ k

for every k ∈ N, cf. Remark 8.15. By (1), ( f −T n j f )/n j → 0 as j→ ∞, and sinceC is a Hausdorff space, 0 is the only cluster point of the convergent sequence (( f −T n j f )/n j) j∈N. This implies that g−T g = 0, i.e., T g = g.

The next auxiliary result is essentially a consequence of Lemma 8.17 andMazur’s Theorem C.7.


Proposition 8.18. Let T be a Cesaro-bounded operator on some Banach space Esuch that T nh/n→ 0 for each h ∈ E. Then for f ,g ∈ E the following statements areequivalent.

(i) An f → g in the norm of E as n→ ∞.

(ii) An f → g weakly as n→ ∞.

(iii) g is a weak cluster point of a subsequence of (An f )n.

(iv) g ∈ fix(T )∩ convT n : n≥ 0.(v) g ∈ fix(T ) and f −g ∈ ran(I−T ).

Proof. The implication (v)⇒ (i) follows from Theorem 8.5 and the implications(i)⇒ (ii)⇒ (iii) are trivial. If (iii) holds, then g ∈ fix(T ) by Lemma 8.17. Moreover,

g ∈ clσT n f : n≥ 0 ⊆ clσ convT n f : n≥ 0= convT n f : n≥ 0

by Mazur’s theorem. Finally, suppose that (iv) holds and ∑n tnT n f is any convexcombination of the vectors T n f . Then

f −∑n

tnT n f = ∑n

tn( f −T n f ) = (I−T )∑n

ntnAn f ∈ ran(I−T )

by Lemma 8.2.c). It follows that f −g ∈ ran(I−T ), as was to be proved.

In order to formulate our main result, we use the following terminology.

Definition 8.19. Let E be a Banach space, and let F ⊆ E and G ⊆ E ′ be linearsubspaces. Then G separates the points of F if for any 0 6= f ∈F there is g∈G suchthat 〈 f ,g〉 6= 0. The property that F separates the points of G is defined analogously.

Note that by the Hahn–Banach theorem C.3, E ′ separates the points of E. Our mainresult is now a characterisation of mean ergodic operators on Banach spaces.

Theorem 8.20 (Mean Ergodic Operators). Let T be a Cesaro-bounded operatoron some Banach space E such that T n f/n→ 0 weakly for each f ∈ E. Further, letD⊆ E be a dense subset of E. Then the following assertions are equivalent.

(i) T is mean ergodic.

(ii) T is weakly mean ergodic, i.e., weak-limn→∞ An f exists for each f ∈ E.

(iii) For each f ∈D the sequence (An f )n∈N has a subsequence with a weak clus-ter point.

(iv) convT j f : j ∈ N0∩fix(T ) 6= /0 for each f ∈ D.

(v) fix(T ) separates the points of fix(T ′).

(vi) E = fix(T )⊕ ran(I−T ).

Proof. The equivalence of (i)–(iv) and (vi) is immediate from Proposition 8.18 andTheorem 8.5. Suppose that (vi) holds and 0 6= f ′ ∈ fix(T ′). Then there is f ∈ E such


that 〈 f , f ′〉 6= 0. By (vi) we can write f = g+h with h∈ ran(I−T ). Since T ′ f ′ = f ′,f ′ vanishes on h by Exercise 5, and hence we must have 〈g, f ′〉 6= 0.

Conversely, suppose that (v) holds and consider the space F of vectors where theCesaro averages converge. By Theorem 8.5 F = fix(T )⊕ ran(I−T ) is closed in E.We employ the Hahn–Banach theorem C.3 to show that F is dense in E. Supposethat f ′ ∈ E ′ vanishes on F . Then f ′ vanishes in particular on ran(I− T ) and thisjust means that f ′ ∈ fix(T ′) (see Exercise 5). Moreover, f ′ vanishes also on fix(T ),which by (v) separates fix(T ′). This forces f ′ = 0.

Remarks 8.21. 1) Condition (v) in Theorem 8.20 can also be expressed asfix(T ′)∩fix(T )⊥ = 0, cf. Exercise 5.

2) Under the assumptions of Theorem 8.20, fix(T ′) always separates the pointsof fix(T ), see Exercise 6.

3) If T ∈L (E) is mean ergodic, its adjoint T ′ need not be mean ergodic withrespect to the norm topology, see Exercise 7. However, it is mean ergodicwith respect to the weak∗ topology, i.e., the Cesaro means An[T ′] convergepointwise in the weak∗-topology to PT ′ = P′T . The proof of this is Exercise 8.

In the following we discuss two classes of examples.

Power-bounded Operators on Reflexive Spaces

An operator T ∈L (E) is called power-bounded if supn∈N ‖T n‖< ∞. Contractionsare obviously power-bounded. Moreover, a power-bounded operator T is Cesaro-bounded and satisfies T n/n→ 0. Hence Theorem 8.20 applies in particular to power-bounded operators. We arrive at a famous result due Yosida [Yosida (1938)], Kaku-tani [Kakutani (1938)] and Lorch [Lorch (1939)].

Theorem 8.22. Every power-bounded linear operator on a reflexive Banach spaceis mean ergodic.

Proof. Let f ∈ E. Then, by power-boundedness of T the set An f : n∈N is norm-bounded. Since E is reflexive, it is even relatively weakly compact, hence the se-quence (An f )n has a weak cluster point.

Dunford–Schwartz Operators

Let X and Y be measure spaces. An bounded operator

T : L1(X)→ L1(Y)

is called a Dunford–Schwartz operator or a complete contraction, if

‖T f‖1 ≤ ‖ f‖1 and ‖T f‖∞≤ ‖ f‖

∞( f ∈ L1∩L∞).


By denseness, a Dunford–Schwartz operator is an L1-contraction. The followingresult states that T “interpolates” to a contraction on each Lp-space.

Theorem 8.23. Let T : L1(X)→ L1(Y) be a Dunford–Schwartz operator. Then

‖T f‖p ≤ ‖ f‖p for all f ∈ Lp∩L1, 1≤ p≤ ∞.

Proof. The claim is a direct consequence of the Riesz–Thorin interpolation the-orem [Folland (1999), Thm. 6.27]. If T is positive, there is a more elementaryproof, which we give for convenience. For p = 1,∞ there is nothing to show, solet 1 < p < ∞. Let f ∈ Lp∩L1 such that A :=[ f 6= 0 ] has finite measure. Then, byTheorem 7.19 and Exercise 7.16 and since T 1A ≤ 1,

|T f |p = |T ( f 1A)|p ≤ (T | f |p) · (T 1A)p/q ≤ (T | f |p) ·1p/q = T | f |p .

Integrating yields ‖T f‖pp ≤ ‖T | f |

p‖1 ≤ ‖ f‖pp. Finally, a standard density argument

completes the proof.

Using interpolation we can prove the following result about mean ergodicity.

Theorem 8.24. Any Dunford–Schwartz operator T on L1(X) over a finite measurespace X is mean ergodic.

Proof. By Theorem 8.23, T restricts to a contraction on L2, which is a Hilbert space.By Theorem 8.6, T is mean ergodic on L2, which means that for f ∈ L2 the limitlimn→∞ An[T ] f exists in ‖·‖2. As the measure is finite, this limit exists in ‖·‖1. SinceL2 is dense in L1, the claim follows.

Employing Theorem 8.20 we can give a second proof.

Alternative proof of Theorem 8.24: Let B := f ∈ L∞ : ‖ f‖∞≤ 1. View L∞ as the

dual of L1 and equip it with the σ(L∞,L1)-topology. By the Banach–Alaoglu theo-rem, B is weak∗-compact. Since the embedding (L∞,σ(L∞,L1)) ⊆ (L1,σ(L1,L∞))is (obviously) continuous, B is weakly compact in L1. In addition, B is invari-ant under the Cesaro averages An of T , and hence the sequence (An f )n∈N has aweak cluster point for each f ∈ B. Thus (iii) from Theorem 8.20 is satisfied withD := L∞ =

⋃c>0 cB.

Example 8.25. The assumption of a finite measure space is crucial in Theorem 8.24.Consider E = L1(R) where R is endowed with the Lebesgue measure, and T ∈L (E), the shift by 1, i.e., T f (x) = f (x+1). Then dimfixT = 0 and dimfixT ′ = ∞,so T is not mean ergodic, see also Exercise 7.

Supplement: Operations with Mean Ergodic Operators

Let us illustrate how the various conditions in Theorem 8.20 can be used to checkmean ergodicity, and thus allow us to carry out certain constructions for mean er-godic operators.


Powers of Mean Ergodic Operators

Theorem 8.26. Let E be a Banach space and let S ∈ L (E) be a power-boundedmean ergodic operator. Let T be a kth-root of S, i.e., T k = S for some k ∈N. Then Tis also mean ergodic.

Proof. Denote by PS the mean ergodic projection of S. Define P :=( 1

k ∑k−1j=0 T j

)PS

and observe that P f ∈ convT j f : j ∈N0 for all f ∈ E. Since T commutes with S,it commutes also with PS. We now obtain

PT = T P =(1

k

k−1

∑j=0

T j+1)

PS = P,

since T kPS = SPS = PS, and P f ∈ convT j f : j ∈ N0∩ fix(T ) by the above. SoTheorem 8.20 (iv) implies that T is mean ergodic. It follows also that P is the cor-responding mean ergodic projection.

On the other hand, it is possible that no power of a mean ergodic operator is meanergodic.

Example 8.27. Take E = c, the space of convergent sequences, and the multiplica-tion operator

M : c→ c, (xn)n 7→ (anxn)n

for some sequence (an)∞n=1 with 1 6= an → 1. Now fix(M) = 0, whereas fix(M′)

contains f ′ defined by 〈 f ′,(xn)∞n=1〉 := limn→∞ xn. By Theorem 8.20 (v) we conclude

that M is not mean ergodic.Consider now a kth root of unity 1 6= b ∈ T and define Tk := bM. Then it is easy

to see that fix(T ′k ) = 0, and hence, again by Theorem 8.20 (v), Tk is mean ergodic.It follows from Theorem 8.26 that T k

k is not mean ergodic. Now one can employa direct sum construction to obtain a Banach space E and a mean ergodic operatorT ∈L (E) with no power T k, k ≥ 2, being mean ergodic (Exercise 11).

Convex Combinations of Mean Ergodic Operators

Other examples of “new” mean ergodic operators can be obtained by convex combi-nations of mean ergodic operators. Our first lemma, a nice application of the Kreın–Milman Theorem, is due to Kakutani.

Lemma 8.28. Let E be a Banach space. Then the identity operator I is an extremepoint of the closed unit ball in L (E).

Proof. Take T ∈L (E) such that ‖I+T‖≤ 1 and ‖I−T‖≤ 1. Then the same is truefor the adjoints: ‖I′+T ′‖ ≤ 1 and ‖I′−T ′‖ ≤ 1. For f ′ ∈ E ′ define f ′1 := (I′+T ′) f ′,f ′2 := (I′−T ′) f ′ and conclude that f ′ = 1

2 ( f ′1 + f ′2) and ‖ f ′1‖,‖ f ′2‖ ≤ ‖ f ′‖. Supposenow that f ′ is an extreme point of the unit ball in E ′. Then we obtain f ′= f ′1 = f ′2 and


hence T ′ f ′= 0. The Banach–Alaoglu Theorem C.4 and the Kreın–Milman TheoremC.13 imply that T ′ = 0, and hence T = 0.

Now suppose that I = 12 (R+ S) for contractions R,S ∈L (E), and define T :=

I−R. This implies I−T = R and I+T = 2I−R = S, so ‖I−T‖ ≤ 1, ‖I+T‖ ≤ 1.By the above considerations it follows that T = 0, i.e., I = R = S.

Lemma 8.29. Let R,S be two power-bounded, commuting operators on a Banachspace E. Consider

T := tR+(1− t)S

for some 0 < t < 1. Then for the fixed spaces fix(T ), fix(R) and fix(S) we have

fix(T ) = fix(R)∩fix(S).

Proof. Only the inclusion fix(T )⊆ fix(R)∩fix(S) is not obvious. Endow E with anequivalent norm ‖ f‖1 := sup‖RnSm f‖ : n,m ∈N0, f ∈ E, and observe that R andS now become contractive. From the definition of T we obtain

Ifix(T ) = T |fix(T ) = tR|fix(T )+(1− t)S|fix(T )

and R|fix(T ),S|fix(T ) ∈L (fix(T )), since R and S commute. Lemma 8.28 implies thatR|fix(T ) = S|fix(T ) = Ifix(T ), i.e., fix(T )⊆ fix(R)∩fix(S).

Now we can prove the main result of this section.

Theorem 8.30. Let E be a Banach space and R,S two power-bounded, mean er-godic, commuting operators on E. Then for each 0≤ t ≤ 1 the convex combinationT := tR+(1− t)S is again mean ergodic.

Proof. Let 0< t < 1. By Lemma 8.29 we have fix(T )= fix(R)∩fix(S) and fix(T ′)=fix(R′)∩ fix(S′). By Theorem 8.20 (v) it suffices to show that fix(R)∩ fix(S) sep-arates fix(R′)∩ fix(S′). To this end pick 0 6= f ′ ∈ fix(R′)∩ fix(S′). Then there isf ∈ fix(R) with 〈 f , f ′〉 6= 0. Since Sfix(R) ⊆ fix(R), we have PS f ∈ fix(R)∩fix(S)where PS denotes the mean ergodic projection corresponding to S. Consequently, wehave (see also Remark 8.21.3))⟨

PS f , f ′⟩=⟨

f ,P′S f ′⟩=⟨

f ,PS′ f′⟩= ⟨ f , f ′

⟩6= 0.

The following are immediate consequences of the above.

Corollary 8.31. Let T be a convex combination of two power-bounded, mean er-godic, commuting operators R and S ∈ L (E). Denote by PR, resp. PS the meanergodic projections. Then the projection PT corresponding to T is obtained as

PT = PRPS = PSPR = limn→∞

(RnSn).

Corollary 8.32. Let R jmj=1 be a family of power-bounded, mean ergodic, commut-

ing operators. Then every convex combination T := ∑mj=1 t jR j is mean ergodic.

Exercises 123

Exercises

1 (Strong Convergence Lemma). Let E,F be Banach spaces and let (Tn)n∈N ⊆L (E;F). Suppose that there is M ≥ 0 such that ‖Tn‖ ≤M for all n ∈ N. Show that

G := f ∈ E : limn→∞

Tn f exists

is a closed subspace of E, and

T : G→ F, T f := limn→∞

Tn f

is a bounded linear operator with ‖T‖ ≤ liminfn→∞ ‖Tn‖.

2. Let H be a Hilbert space and let P ∈ L (H) with P2 = P. Show that P is anorthogonal projection if and only if P is self-adjoint if and only if ‖P‖ ≤ 1. (Seealso the supplement to Chapter 13.)

3. Let (X;ϕ) be an MDS, X = (X ,Σ ,µ), with E a ∩-stable generator of Σ . Supposethat

limn→∞

1n

n

∑j=0

µ(ϕ∗nA∩A) = µ(A)2

for all A ∈ E. Show that (X;ϕ) is ergodic. (Hint: Use Lemma B.16.)

4. Show that the matrix

S =

(1 01 0

)is not irreducible, but fix(S) is one-dimensional. Show that there is a unique proba-bility vector p such that ptA = pt . Then show that the Markov shift associated withS is ergodic.

5. Let E be a Banach space. For F ⊆ E and G⊆ E ′ one defines

F⊥ :=

f ′ ∈ E ′ :⟨

f , f ′⟩= 0 for all f ∈ F

and G> := G⊥∩E.

For T ∈L (E) show that

fix(T ′) = ran(I−T )⊥ and fix(T ) = ran(I−T ′)>.

6. Let T be a Cesaro-bounded linear operator on a Banach space E, and suppose thatT n f/n→ 0 as n→∞ for every f ∈ E. Show that fix(T ′) always separates the pointsof fix(T ).

(Hint: Take 0 6= f ∈ fix(T ) and consider the set K := f ′ ∈ E ′ : ‖ f ′‖ ≤1, 〈 f , f ′〉 = ‖ f‖. Then K is nonempty by the Hahn–Banach theorem, σ(E ′,E)-closed and norm-bounded. Then use Lemma 8.17 to show that fix(T ′)∩K 6= /0).

7 (Shift). For a scalar sequence x = (xn)n∈N0 the left shift L and the right shift Rare defined by


L(x0,x1, . . .) := (x1,x2, . . .) and R(x0,x1, . . .) := (0,x0,x1,x2, . . .).

Clearly, a fixed vector of L must be constant, while 0 is the only fixed vector of R.Prove the following assertions:

a) The left and the right shift are mean ergodic on each E = `p, 1 < p < ∞.

b) The left shift on E = `1 is mean ergodic, while the right shift is not.

c) The left and right shift are mean ergodic on E = c0, the space of null se-quences.

d) The left shift is while the right shift is not mean ergodic on E = c, the spaceof convergent sequences.

e) Neither the left nor the right shift is mean ergodic on E = `∞.

8. Let T be a mean ergodic operator on a Banach space E with associated meanergodic projection P : E → fix(T ). Show that A′n → P′ in the weak∗-topology andthat P′ is a projection with ran(P′) = fix(T ′).

9. Let X be a finite measure space and let T be a positive operator on L1(X). Weidentify L∞(X) with L1(X)′, the dual space of L1(X). Show that T is a Dunford–Schwartz operator if and only if T ′1≤ 1 and T 1≤ 1.

10. Show that the following operators are not mean ergodic:

a) E = C[0,1] and (T f )(x) = f (x2), x ∈ [0,1]. (Hint: Determine fix(T ) andfix(T ′), cf. Chapter 3.)

b) E = C[0,1] and (T f )(x) = x f (x), x ∈ [0,1]. (Hint: Look at An1, n ∈ N.)

11. Work out the details of Example 8.27.

Chapter 9Mixing Dynamical Systems

In the present chapter we consider two mathematical formalisations of the intu-itive concept that iterating the dynamics ϕ provides a “just mixing” of the space(X ,Σ ,µ). Whereas the main theme of the previous chapter, the mean ergodicity ofthe associated Koopman operator, involved just the norm topology, now the weaktopology of the associated Lp-spaces becomes important.

The dual space E ′ of a Banach space E and the associated weak topology on aBanach space E have already been touched upon briefly in Section 8.4. For generalbackground about these notions see Appendix C.4 and C.6. In particular, keep inmind that we write 〈x,x′〉 for the action of x′ ∈ E ′ on x ∈ E.

In the case that E = Lp(X) for some probability space X = (X ,Σ ,µ) and p ∈[1,∞), a standard result from functional analysis states that the dual space E ′ can beidentified with Lq(X) via the canonical duality

〈 f ,g〉 :=∫

Xf g =

∫X

f gdµ ( f ∈ Lp(X),g ∈ Lq(X)),

see, e.g. [Rudin (1987), Theorem 6.16]. Here q ∈ (1,∞] denotes the conjugate ex-ponent to p, i.e., it satisfies 1/p+1/q = 1. (We use this as a standing terminology.Also, if the measure space is understood, we often write Lp,Lq in order to increasereadability.)

In the symbolism of the Lp-Lq-duality, the integral of f ∈ L1(X) reads∫X

f =∫

Xf dµ = 〈 f ,1〉= 〈1, f 〉 .

In particular, µ(A) = 〈1A,1〉= 〈1,1A〉 for A ∈ Σ . The connection with the standardinner product on the Hilbert space L2(X) is

( f |g) =∫

Xf g = 〈 f ,g〉 ( f ,g ∈ L2).

After these prelimiaries we can now go medias in res.

125

126 9 Mixing Dynamical Systems

9.1 Strong Mixing

Following Halmos [Halmos (1956), p. 36] let us consider an MDS (X;ϕ), X =(X ,Σ ,µ), that models the action of a particular way of stirring the contents of aglass (of total volume µ(X) = 1) filled with two different liquids, say wine andwater. If A⊆ X is the region originally occupied by the wine and B is any other partof the glass, then, after n repetitions of the stirring operation, the relative amount ofwine in B is

an(B) := µ(ϕ∗nB∩A)/µ(B).

For a thorough “mixing” of the two liquids, one would require that eventually thewine is “equally distributed” within the glass. More precisely, for large n ∈ N therelative amount an(B) of wine in B should be close to the relative amount µ(A) ofwine in the whole glass. These considerations lead to the following mathematicalconcept.

Definition 9.1. An MDS (X;ϕ), X = (X ,Σ ,µ), is called strongly mixing (or justmixing) if for every A,B ∈ Σ one has

µ(ϕ∗nA∩B)→ µ(A)µ(B) as n→ ∞.

Example 9.2 (Bernoulli Shifts). It was shown in Proposition 6.16 that Bernoullishifts are strongly mixing.

Each mixing system is ergodic. Indeed, by Theorem 8.10 the MDS (X;ϕ) isergodic if and only if

1n

n−1

∑j=0

µ(ϕ∗ jB∩A)→ µ(A)µ(B) for all A,B ∈ Σ .

Since ordinary convergence of a sequence (an)n implies the convergence of the ar-itmetic averages (a1 + · · ·+an)/n to the same limit, ergodicity is a consequence ofmixing.

However, mere ergodicity of the system is not strong enough to guarantee mix-ing. To stay in the picture from above, ergodicity means that we have an(B)→ µ(A)only on the average, but it may well happen that again and again an(B) is quite faraway from µ(A). For example, if B = Ac, then we may find arbitrarily large n withan(B) close to 0, which means that for these n the largest part of the wine is situatedagain in A, the region where it was located in the beginning.

Example 9.3 (Nonmixing Markov shifts). Consider the Markov shift associatedwith the transition matrix

P =

(0 11 0

)on two points. The Markov measure µ on W +

2 puts equal weight of 1/2 on the twopoints

9.1 Strong Mixing 127

x1 := (0,1,0,1, . . .) and x2 := (1,0,1,0, . . .)

and the system oscillates between those two under the shift τ , so this system isergodic (cf. also Section 8.3). On the other hand, if A = x1, then µ(ϕ∗nA∩A)is either 0 or 1/2, depending whether n is odd or even. Hence this Markov shift isnot strongly mixing. We shall characterise mixing Markov shifts in Proposition 9.8below.

Let us denote by 1⊗1 the projection

1⊗1 : L1(X)→ L1(X), f 7→ 〈 f ,1〉 ·1 =(∫

Xf)·1

onto the one-dimensional space of constant functions. Then, by Theorem 8.10, er-godicity of an MDS (X;ϕ) is characterised by PT = 1⊗ 1, where PT is the meanergodic projection associated with the Koopman operator T = Tϕ on L1(X). In otherwords

PT = limn→∞

An[T ] f = 1⊗1

in the strong (equivalently, in the weak) operator topology. In an analogous fashionone can characterise strongly mixing MDSs.

Theorem 9.4. For an MDS (X;ϕ), X = (X ,Σ ,µ), its associated Koopman operatorT = Tϕ and p ∈ [1,∞) the following assertions are equivalent.

(i) (X;ϕ) is strongly mixing.

(ii) µ(ϕ∗nA∩A)→ µ(A)2 for every A ∈ Σ .

(iii) T n→ 1⊗1 as n→ ∞ in the weak operator topology on Lp(X), i.e.,∫X(T n f )g = 〈T n f ,g〉 → 〈 f ,1〉 · 〈1,g〉=

(∫X

f)·(∫

Xg)

for all f ∈ Lp(X), g ∈ Lq(X).

Proof. The implications (i)⇒ (ii) and (iii)⇒ (i) are trivial, so suppose that (ii) holds.Fix A ∈ Σ and k, l ∈ N0, and let f := T k1A, g := T l1A. Then for n≥ l− k we have

〈T n f ,g〉=⟨

T n+k1A,T l1A

⟩=⟨

T n+k−l1A,1A

⟩= µ(ϕ∗(n+k−l)A∩A).

By (ii) it follows that

limn→∞〈T n f ,g〉= µ(A)2 =

⟨T k1A,1

⟩⟨1,T l1A

⟩= 〈 f ,1〉 · 〈1,g〉

Let EA = lin1,T k1A : k ≥ 0 in L2(X). Then by the power-boundedness of T anapproximation argument yields that the convergence

〈T n f ,g〉 → 〈 f ,1〉 · 〈1,g〉 (9.1)


holds for all f ,g ∈ EA. But if g ∈ E⊥A then, since EA is T -invariant, 〈T n f ,g〉 = 0 =〈1,g〉 for f ∈ EA. Therefore (9.1) holds true for all f ∈ EA and g ∈ L2(X). If wetake f = 1A and g = 1B we arrive at (i). Moreover, another approximation argumentyields (9.1) for arbitrary f ∈ Lp(X) and g ∈ Lq(X), i.e., (iii).

Remark 9.5. In the situation of Theorem 9.4 let D⊆ Lp and F ⊆ Lq such that lin(D)is norm-dense in Lp and lin(F) is norm-dense in L1. Then by standard approximationarguments we can add to Theorem 9.4 the additional equivalence

(iii’) limn→∞

∫X(T n f ) ·g =

(∫X

f)·(∫

Xg)

for every f ∈ D, g ∈ F.

Based on (iii’), one can add another equivalence:

(i’) limn→∞

µ(ϕ∗nA∩B) = µ(A)µ(B) for all A ∈D, B ∈ F‘,

where D,F are given ∩-stable generators of Σ (use Lemma B.16).

Let T be a power-bounded operator on a Banach space E. A vector f ∈ E iscalled weakly stable with respect to T if T n f → 0 weakly. It is easy to see that theset of weakly stable vectors

Ews(T ) :=

f ∈ E : T n f → 0 weakly

is a closed T -invariant subspace of E, and that fix(T )∩Ews(T ) = 0.

Theorem 9.6. For an MDS (X;ϕ) with Koopman operator T = Tϕ on E = Lp(X),1≤ p < ∞, the following assertions are equivalent.

(i) (X;ϕ) is strongly mixing.

(iv) For all f ∈ E, if∫

X f = 0 then T n f → 0 weakly.

(v) fix(T ) = lin1 and T n+1−T n→ 0 in the weak operator topology.

(vi) E = lin1⊕Ews(T ).

Proof. (i)⇔ (vi) and (i)⇒ (v) is clear from (iii) of Theorem 9.4.(v)⇒ (vi): By mean ergodicity of T , we have E = fix(T )⊕ ran(I− T ). By (v),fix(T ) = lin1 and ran(I−T )⊆ Ews(T ). Since Ews(T ) is closed, (iv) follows.(vi)⇒ (i): Write f ∈ E as f = c1+g with g ∈ Ews(T ). Then T n f = c1+T ng→ c1weakly. It follows that 〈 f ,1〉 = 〈T n f ,1〉 → c, and this is (i), by (iii) of Theorem9.4.

Given an MDS (X;ϕ) and a fixed natural number k ∈ N, then ϕk is measure-preserving, too. We thus obtain a new MDS (X;ϕk), called the kth iterate of theoriginal system. The next result shows that the class of (strongly) mixing systems isstable under forming iterates. In fact, we can prolong our list characterisations oncemore.

Corollary 9.7. For an MDS (X;ϕ) and k ∈ N the following assertions are equiva-lent.

9.1 Strong Mixing 129

(i) The MDS (X;ϕ) is strongly mixing.

(vii) Its kth iterate (X;ϕk) is strongly mixing.

Proof. By (iv) of Theorem 9.6 it suffices to show that Ews(T ) = Ews(T k). The inclu-sion ⊆ is trivial. For the inclusion ⊇ take f ∈ Ews(T k). Then for each 0≤ j < k thesequence (T nk+ j f )n∈N tends to 0 weakly, and hence T n f → 0 weakly as well.

Let us turn to the characterisation of mixing Markov shifts.

Mixing of Markov Shifts

Let (W +k ,Σ ,µ(S, p);τ) be the Markov shift on L := 0, . . . ,k−1 associated with a

row-stochastic matrix S = (si j)0≤i, j<k and a fixed probability vector p = (p j)0≤ j<k.Since ergodicity is necessary for strong mixing, we suppose in addition that S isirreducible (cf. Theorem 8.14). As in Section 8.3 we introduce

Q := limn→∞

1n

n−1

∑j=0

S j.

Then Q is strictly positive and each row of Q is equal to p.

Proposition 9.8. In the situation described above, the following assertions areequivalent.

(i) The Markov shift (W +k ,Σ ,µ(S, p);τ) is strongly mixing.

(ii) Sn→ Q as n→ ∞.

(iii) There is n≥ 0 such that Sn is strictly positive.

(iv) σ(S)∩T= 1.

Proof. Since Q is strictly positive, (ii) implies (iii). That (iv) implies (ii) can be seenby using the ergodic decomposition Ck = fix(S)⊕ ran(I− S) and noting that S hasspectral radius strictly less than 1 when restricted to the second direct summand. Forthe implication (iii)⇒ (iv) we may suppose without loss of generality that S itselfis strictly positive. Indeed, suppose we know that σ(A)∩T= 1 for every strictlypositive row-stochastic matrix A. Now, if we take A = Sn, we obtain

σ(S)n∩T= σ(Sn)∩T= 1

for all large n ∈ N. (We used here that the product of a strictly positive matrix witha positive one is again strictly positive.) This implies that eventually λ n = λ n+1 = 1for every peripheral eigenvalue λ of S, hence (iv) holds.

So suppose that A is a strictly positive row-stochastic matrix. Then there is ε ∈ (0,1)and a row-stochastic matrix T such that A = (1− ε)T + εE, where E = (1/k)1 ·1>is the matrix having each entry equal to 1/k. Since row-stochastic matrices act onrow vectors as contractions for the 1-norm, we have

130 9 Mixing Dynamical Systems∥∥xtA− ytA∥∥

1 = (1− ε)∥∥(x− y)tT

∥∥1 ≤ (1− ε)‖x− y‖1

for all probability vectors x,y ∈ Ck. Banach’s fixed-point theorem yields thatlimn→∞ xtAn must exist for every probability vector x ∈ Ck, hence for every x ∈ Ck.Therefore 1 is the only peripheral eigenvalue of A. This was the missing piece ofpuzzle in the above.Suppose now that (ii) holds and let

E = i0× ·· ·×il×∏L, F = j0× ·· ·× jr×∏L

for certain i0, . . . , il , j0, . . . , jr ∈ L := 0, . . . ,k−1. As in Section 8.3,

µ([τn ∈ F ]∩E) = pi0si0i1 . . .sil−1il [Sn−l ]il j0s j0 j1 . . .s jr−1 jr

for n > l. By (ii) this converges to

pi0si0i1 . . .sil−1il p j0s j0 j1 . . .s jr−1 jr = µ(F)µ(E).

Since sets E,F as above generate Σ , (i) follows. Conversely, if (i) holds, then bytaking r = l = 0 in the above we obtain

pi0 [Sn]i0 j0 = µ([τn ∈ F ]∩E)→ µ(E)µ(F) = pi0 p j0 .

Since p is strictly positive, (ii) follows.

If the matrix S satisfies the equivalent statements from above, it is called prim-itive. Another term often used is aperiodic; this is due to the fact that a row-stochastic matrix is primitive if and only if the greatest common divisor of thelengths of cycles in the associated transition graph over L is equal to 1 (see[Billingsley (1979), p.106] for more information).

The Blum–Hanson Theorem

Let us return to the more theoretical aspects. The following striking result wasproved in [Blum and Hanson (1960)]. We state it without proof.

Theorem 9.9 (Blum–Hanson, 1960). Let (X;ϕ) be an MDS with Koopman oper-ator T = Tϕ , and let 1 ≤ p < ∞. Then (X;ϕ) is strongly mixing if and only if forevery subsequence (nk)k∈N of N and every f ∈ Lp(X) the strong convergence

limn→∞

1n

n−1

∑j=0

T n j f =(∫

Xf)·1

holds.

Remarks 9.10. 1) The equivalence stated in this theorem is rather elementary ifstrong convergence is replaced by weak convergence. Namely, it follows from

9.2 Weakly Mixing Systems 131

the fact that a sequence (an)n in C is convergent if and only if each of itssubsequences is Cesaro convergent (see Exercise 4).

2) As a matter of fact, Theorem 9.9 is purely of operator theoretic content. In[Jones and Kuftinec (1971)] and [Akcoglu and Sucheston (1972)] the follow-ing was proved.

Theorem. Let T be a contraction on a Hilbert space H and denote by P the corre-sponding mean ergodic projection. Then the following assertions are equivalent.

(i) T n→ P in the weak operator topology.

(ii)1n∑

∞

j=1T n j → P in the strong operator topology for every subsequence (n j) j

of N.

4) Muller and Tomilov have shown that the statement may fail for power-bounded operators on Hilbert spaces [Muller and Tomilov (2007)].

Strong mixing is generally considered to be a “difficult” property. For instance,Katok and Hasselblatt write:

“... It [strong mixing] is, however, one of those notions, that is easy and natural to define butvery difficult to study...”

[Katok and Hasselblatt (1995), p.748]. One reason may be that the characterisationsof mixing from above require knowledge about all the powers T n of T , and henceare usually not easy to verify. Another reason is that up to now no purely spectralcharacterisation of strong mixing is known. This is in contrast with ergodicity, whichis characterised by the purely spectral condition that 1 is a simple eigenvalue of T .

In the rest of the present chapter, we shall deal with a related mixing concept,which proves to be much more well-behaved, so-called weak mixing.

9.2 Weakly Mixing Systems

Let us begin with some terminological remarks. Any strictly increasing sequence( jn)n of natural numbers defines an infinite subset J = jn : n∈N of N. Conversely,any infinite subset J ⊆ N can be written in this way, where the sequence ( jn)n isuniquely determined and called the enumeration of A. As a consequence we oftendo not distinguish between the set J and its enumeration sequence ( jn)n and simplycall it a subsequence of N. If J⊆N is a subsequence and (xn)n is a sequence in sometopological space E and x ∈ E, then we say that xn→ x along J if limn→∞ x jn = x,where ( jn) j is the enumeration of J. We write

limn∈J,n→∞

xn = x or xn→ x as n ∈ J, n→ ∞ or limn∈J

xn = x

to denote that xn→ x along J.


For any subset J ⊆ N0 we define its (asymptotic) density by

d(J) := limn→∞

card(J∩ [1,n])n

provided this limit exists. The density yields a measure of “largeness” of a set of nat-ural numbers. (See Exercises 1 and 2 for some properties.) Subsequences of density1 are particularly important. If (xn)n is sequence in a topological space and x ∈ E,we say that xn→ x in density, written

D-limn

xn = x,

if there is a subsequence J ⊆ N with density d(J) = 1 such that limn∈J xn = x.

With this terminology at hand we can introduce the weaker type of mixing thatwas announced above.

Definition 9.11. An MDS (X;ϕ) is called weakly mixing if

D-limn→∞

µ(ϕ∗nA∩B) = µ(A)µ(B) for every A,B ∈ Σ . (9.2)

Obviously, a strongly mixing system is weakly mixing, and every weakly mixingsystem is ergodic (specialise A = B in (9.2) with A being ϕ-invariant). It turns outthat no nontrivial group rotation system is weakly mixing, so ergodic group rotationsprovide examples of ergodic systems that are not weakly mixing, see Example 9.19and the remarks thereafter.

In contrast to this, it is much harder to construct weakly mixing systems thatare not strongly mixing. This had been an open problem for some while, until fi-nally Chacon in [Chacon (1967)] succeeded by using his so-called stacking method(see also [Chacon (1969)] and Petersen [Petersen (1989), Section 4.5]). Curiouslyenough, the mere existence of such systems had been established much earlier, basedon Baire category arguments. Namely, Halmos [Halmos (1944)] showed that the setof weakly mixing transformations is residual (i.e., its complement is of first cat-egory) in the complete metric space G of all measure preserving transformationswith respect to the metric coming from the strong operator topology. And four yearslater, Rohlin [Rohlin (1948)] proved that the set of all strongly mixing transforma-tions is of first category in G. Thus, in this sense, the generic MDS is weakly but notstrongly mixing, cf. [Eisner (2010)].

A large part of the theory of weakly mixing systems can be reduced to pureoperator theory.

Theorem 9.12. Let T be a power-bounded operator T on a Banach space E, letx ∈ E, and let M ⊆ E ′ such that lin(M) is norm-dense in E ′. Then the followingassertions are equivalent:

(i) D-limn→∞

⟨T nx,x′

⟩= 0 for all x′ ∈M.

(ii) D-limn→∞

⟨T nx,x′

⟩= 0 for all x′ ∈ E ′.


(iii) limn→∞

1n∑

n−1j=0

∣∣⟨T jx,x′⟩∣∣= 0 for all x′ ∈M.

(iv) limn→∞

1n∑

n−1j=0

∣∣⟨T jx,x′⟩∣∣= 0 for all x′ ∈ E ′.

The proof of Theorem 9.12 rests on the following general fact, first observed in[Koopman and von Neumann (1932)].

Lemma 9.13 (Koopman–von Neumann). For a bounded sequence (xn)n in someBanach space E and x ∈ E the following conditions are equivalent.

(i) limn→∞

1n ∑

nj=1

∥∥x j− x∥∥= 0.

(ii) D-limn→∞

xn = x.

Proof. We may suppose that x = 0 and, by passing to the norm of the elements, thatE = R and 0≤ xn ∈ R for all n ∈ N.

(i)⇒ (ii): Let Lk := n ∈ N : xn ≥ 1/k. Then L1 ⊆ L2 ⊆ . . . , and d(Lk) = 0 since

card(Lk ∩ [1,n])n

≤∑

nj=1kx j

n= k · 1

n

n

∑j=1

x j→ 0 as n→ ∞.

Therefore, we can choose integers 1≤ n1 < n2 < .. . such that

card(Lk ∩ [1,n])n

<1k

(n≥ nk).

Now we defineL :=

⋃k≥1

Lk ∩ [nk,∞)

and claim that d(L)= 0. Indeed, let m∈N such that nk ≤m< nk+1. Then L∩ [1,m]⊆Lk ∩ [1,m] (because the Lk increase with k) and so

card(L∩ [1,m])

m≤ card(Lk ∩ [1,m])

m≤ 1

k.

Hence d(L) = 0 as claimed. For J := N\L we have d(J) = 1 and limn∈J xn = 0.

(ii)⇒ (i): Let ε > 0 and c := supxn : n ∈ N. Choose nε ∈ N such that n > nε andn ∈ J imply that xn < ε . If n > nε , we conclude that

1n

n

∑j=1

x j ≤cnε

n+

1n

n

∑j=nε+1

x j ≤cnε

n+ ε

n−nε

n+

1n ∑

j∈(nε ,n]\Jx j

≤ cnε

n+ ε + c

card([1,n]\ J)n

.

Since d(N\ J) = 0, this implies


limsupn→∞

1n

n

∑j=1

x j ≤ ε.

Since ε was arbitrary, the proof is complete.

Proof of Theorem 9.12. The equivalences (i)⇔ (iii) and (ii)⇔ (iv) follow from theKoopman–von Neumann lemma. It is clear that (iv) implies (iii) and the converseholds by approximation. (Note that the sequence (T nx)n is bounded.)

Proposition 9.14 (Jones–Lin). One can add the assertion

(v) supx′∈E ′,‖x′‖≤1

1n∑

n−1j=0

∣∣⟨T jx,x′⟩∣∣ −→

n→∞0.

to the equivalences in Theorem 9.12. That is, the convergence in (iv) there is evenuniform in x′ from the dual unit ball.

Proof. [Jones and Lin (1976)] First, we suppose that T is a contraction. The dualunit ball B′ := x′ ∈ E ′ : ‖x′‖ ≤ 1 is then compact with respect to the weak∗-topology (Banach–Alaoglu theorem). Since T is a contraction, T ′ leaves B′ invari-ant, and thus gives rise to a TDS (B′;T ′). We consider its Koopman operator S onC(B′), i.e.,

(S f )(x′) := f (T ′x′) (x′ ∈ B′).

Consider the function f ∈ C(B′) defined by f (x′) := |〈x,x′〉|. Hypothesis (iv) ofTheorem 9.12 simply says that An[S] f → 0 pointwise. As a consequence of theRiesz representation theorem (see Theorem 5.7) and the dominated convergencetheorem, An[S] f → 0 weakly, hence in norm, by Proposition 8.18.

In the case that T is just power-bounded we pass to the equivalent norm ‖|y|‖ :=supn≥0 ‖T ny‖ for y ∈ E. With respect to this new norm T is a contraction, and theinduced norm on the dual space is equivalent to the original dual norm, see Exercise3. Consequently, the contraction case can be applied to conclude the proof.

Let E be a Banach space and T ∈L (E). A vector x ∈ E is called almost weaklystable (with respect to T ) if it satisfies the equivalent conditions (ii) and (iv) ofTheorem 9.12 and (v) of Proposition 9.14. The set of almost weakly stable vectorsis denoted by

Es = Es(T ) :=

x ∈ E : x is almost weakly stable w.r.t. T

(9.3)

and called the almost stable subspace.

Corollary 9.15. Let T be a power-bounded operator on the Banach space E. Thenthe set Es(T ) of weakly almost stable vectors is a closed T -invariant subspace of Econtained in ran(I−T ). Moreover, Es(T ) = Es(T k) holds for all k ∈ N.

Proof. It is trivial from (ii) in Theorem 9.12 that Es is T -invariant. That it is a closedsubspace follows from characterisation (iv) (see Exercise 6). If x ∈ Es then by (iv)


An[T ]x→ 0 weakly, hence x ∈ ran(I−T ) by Proposition 8.18. Finally, suppose thatx ∈ Es(T ) and let k ∈ N. Then for n ∈ N and x′ ∈ E ′ we have

1n

n−1

∑j=0

∣∣∣⟨T k jx,x′⟩∣∣∣≤ k · 1

kn

nk−1

∑j=0

∣∣⟨T jx,x′⟩∣∣→ 0 as n→ ∞,

so x ∈ Es(T k). The converse inclusion Es(T k)⊆ Es(T ) is left as Exercise 7.

After these general operator theoretic considerations, we return to dynamical sys-tems. Recall our standing terminology that for p ∈ [1,∞] the letter q denotes theexponent conjugate to p.

Theorem 9.16. Let (X;ϕ) by an MDS with associated Koopman operator T = Tϕ

on E := Lp(X), p ∈ [1,∞). Then the following assertions are equivalent.

(i) The MDS (X;ϕ) is weakly mixing.

(ii) For every f ∈ Lp(X), g ∈ Lq(X) there is a subsequence J ⊆ N of densityd(J) = 1 with

limn∈J

∫X(T n f ) ·g =

(∫X

f)·(∫

Xg).

(iii)1n

n−1

∑k=0

∣∣∣⟨T k f ,g⟩−〈 f ,1〉 · 〈1,g〉

∣∣∣ −→n→∞

0 for all f ∈ Lp, g ∈ Lq.

(iv) For each f ∈ E with∫

X f = 0 one has f ∈ Es(T ).

(v) fix(T ) = lin1 and ran(I−T )⊆ Es(T ).

(vi) E = lin1⊕Es(T ).

Proof. Note that (ii) simply says that f −〈 f ,1〉1 ∈ Es(T ) for all f ∈ E. Hence theequivalence (ii)⇔ (iii) follows from Theorem 9.12, and the equivalences (ii)⇔ (iv)and (ii)⇔ (vi) are straightforward.

(v)⇔ (vi): By Corollary 9.15 we have Es ⊆ ran(I−T ). So the asserted equivalencefollows since T is mean ergodic.

(ii)⇒ (i): Simply specialise f = 1A,g = 1B in (ii) for A,B ∈ ΣX.

(i)⇒ (ii): Fix A ∈ ΣX and let f := 1A. Then, by definition of weak mixing,

D-limn→∞

〈T n( f −〈 f ,1〉1),1B〉= D-limn→∞

〈T n f ,1B〉−µ(A)µ(B) = 0

for all B ∈ ΣX. Hence Theorem 9.12 with M = 1B : B ∈ ΣX yields that f −〈 f ,1〉1 ∈ Es(T ). Since Es(T ) is a closed linear subspace, by approximation it fol-lows that f −〈 f ,1〉1 ∈ Es for all f ∈ E, i.e., (ii).

Remark 9.17. In the situation of Theorem 9.16 let D ⊆ Lp and M ⊆ Lq such thatlin(D) is norm-dense in Lp and lin(M) is norm-dense in L1. Then we can add toTheorem 9.16 the following additional equivalences.


(ii’) For every f ∈ D,g ∈ M there is a subsequence J ⊆ N of density d(J) = 1with

limn∈J

∫X(T n f ) ·g =

(∫X

f)·(∫

Xg).

(iii’)1n

n−1

∑k=0

∣∣∣⟨T k f ,g⟩−〈 f ,1〉 · 〈1,g〉

∣∣∣ −→n→∞

0 for each f ∈ D, g ∈M.

(vi’) f −〈 f ,1〉1 ∈ Es(T ) for each f ∈ D.

Moreover, based on (iii’) and Lemma B.16 one can add another equivalence:

(i’) D-limn→∞ µ(ϕ∗nA∩B) = µ(A)µ(B) for all A ∈D , B ∈ F,

where D,F are given ∩-stable generators of ΣX. The proofs for these claims are leftas Exercise 8.

From these results we obtain as in Corollary 9.7 a characterisation of weaklymixing systems by their iterates.

Corollary 9.18. For an MDS (X;ϕ) and k ∈N the following assertions are equiva-lent:


(vii) Its kth iterate (X;ϕk) is weakly mixing.

Proof. By (iv) from 9.16 it suffices to have Es(T ) = Es(T k), which has already beenestablished in Corollary 9.15.

An MDS (X;ϕ) is called strongly ergodic if all its kth iterates are ergodic. It iseasy to construct ergodic systems which are not strongly ergodic (Exercise 10). ByCorollary 9.18 and Theorem 9.16 every weakly mixing MDS is strongly ergodic.On the other hand, any ergodic (=irrational) rotation (T;a) is strongly ergodic, butnot weakly mixing, as the following example shows.

Example 9.19 (Nonmixing Group Rotations). Consider any rotation (T;a) withKoopman operator T and f (z) := z. Then 〈 f ,1〉 = 0. But

⟨T n f , f

⟩= an does not

converge to 0 = 〈 f ,1〉 along any subsequence J ⊆N, so (T;a) is not weakly mixing.

We shall see in Chapter 16 that no nontrivial ergodic group rotation is a weaklymixing system. Moreover, we shall give a new description of Es(T ) as a summandof the so-called Jacobs–de Leeuw–Glicksberg decomposition

E = Er(T )⊕Es(T )

of a reflexive Banach space E with respect to a Markov operator T ∈L (E). As aconsequence, we shall obtain further characterisations on weakly mixing systems,see Theorem 16.20. In particular, we shall prove in Corollary 16.35 that an ergodicMDS (X;ϕ) is weakly mixing if and only if 1 is the only eigenvalue of Tϕ of mod-ulus 1. Hence weakly mixing systems can be characterised like ergodic systems bya purely spectral condition for the Koopman operator.


Product Systems

Weak mixing can also be equivalently described by means of product systems. Re-call from Section 5.1 that given two MDSs (X;ϕ) and (Y;ψ) their product is

(X⊗Y;ϕ×ψ), where X⊗Y := (X×Y,ΣX⊗ΣY,µX⊗µY).

and (ϕ×ψ)(x,y) := (ϕ(x),ψ(y)). It is sometimes convenient to write ϕ×ψ to referto that system.

For functions f on X and g ∈ Y the function f ⊗g on the product is defined by

( f ⊗g)(x,y) := f (x)g(y) (x ∈ X , y ∈ Y )

see Appendix B.6. Standard product measure theory yields that for 1 ≤ p < ∞ thelinear span of

f ⊗g : f ∈ Lp(X), g ∈ Lp(Y)

is dense in Lp(X⊗Y), see Corollary B.19. Since we have the product measure onthe product space, the canonical Lp-Lq-duality satisfies

〈 f ⊗g,u⊗ v〉= 〈 f ,u〉 · 〈g,v〉

for Lp-functions f ,g and Lq-functions u,v (where q is, of course, the conjugate ex-ponent to p).

Let T and S denote the Koopman operators associated with ϕ and ψ , respectively.The Koopman operator associated with ϕ×ψ is denoted by T ⊗S, since it satisfies

(T ⊗S)( f ⊗g) = T f ⊗Sg ( f ∈ Lp(X), g ∈ Lp(Y)).

(By density, T ⊗ S is uniquely determined by these identities.) After these prelim-inary remarks we can state and prove the announced additional characterisation ofweak mixing.

Theorem 9.20. Let (X;ϕ) and (Y;ψ) be MDSs. Then the following assertions areequivalent.

(i) Both (X;ϕ) and (Y;ψ) are weakly mixing.

(ii) The product system (X⊗Y;ϕ×ψ) is weakly mixing.

Proof. Suppose that (i) holds and let f ,u ∈ L2(X) and g,v ∈ L2(Y). By hypothesisand Theorem 9.16 one can find subsequences J,J′ ⊆ N of density 1 such that

limn∈J〈T n f ,u〉= 〈 f ,1〉〈1,u〉 and lim

n∈J′〈Sng,v〉= 〈g,1〉〈1,v〉

Since d(J∩ J′) = 1 (Exercise 1) we may suppose that J = J′. Hence,


limn∈J〈(T ⊗S)n( f ⊗g),u⊗ v〉= lim

n∈J〈T n f ⊗Sng,u⊗ v〉

= limn∈J〈T n f ,u〉〈Sng,v〉= 〈 f ,1〉〈1,u〉〈g,1〉〈1,v〉= 〈 f ⊗g,1〉〈1,u⊗ v〉 .

By the equivalence (ii’) in Remark 9.17 with D=E = h⊗k : h∈L2(X),k∈L2(Y)we conclude that the product system is weakly mixing.Conversely, suppose that (ii) holds. Then for f ,u ∈ L2(X) we have

〈T n f ,u〉= 〈T n f ,u〉 · 〈1,1〉= 〈T n f ⊗Sn1,u⊗1〉= 〈(T ⊗S)n( f ⊗1),u⊗1〉 .

The latter converges to 〈 f ⊗1,1〉 · 〈1,u⊗1〉= 〈 f ,1〉〈1,u〉 in density, whence (X;ϕ)is weakly mixing. By symmetry, we obtain (i).

Theorem 9.20 allows us to enlarge the list of equivalences of Theorem 9.16.

Corollary 9.21. For an MDS (X;ϕ) the following assertions are equivalent.

(i) (X;ϕ) is weakly mixing.

(viii) (X⊗ Y;ϕ × ψ) is weakly mixing for some/each weakly mixing system(Y;ψ).

(ix) (X⊗X;ϕ×ϕ) is weakly mixing.

We remark that the product of two ergodic systems need not be ergodic, seeExercise 9.

9.3 Weak Mixing of All Orders

For our mixing concepts hitherto we considered pairs of sets A,B ∈ Σ or pairs offunctions f ,g. In this section we introduce a new notion, which intuitively describesthe mixing of an arbitrary finite number of sets.

Definition 9.22. An MDS (X;ϕ), X = (X ,Σ ,µ), is called weakly mixing of orderk ∈ N if

limN→∞

1N

N−1

∑n=0

∣∣µ(A0∩ϕ∗nA1∩ . . .∩ϕ

∗(k−1)nAk−1)−µ(A0)µ(A1) · · ·µ(Ak−1)∣∣= 0

holds for every A0, . . . ,Ak−1 ∈ Σ . An MDS is weakly mixing of all orders if it hasthe latter property for all k ∈ N.

Notice first that weak mixing of order 2 is just the same as weak mixing. We beginwith some elementary observations.

Proposition 9.23. a) If an MDS (X;ϕ) is weakly mixing of order m ∈N, then itis weakly mixing of order k for all k ≤ m.

9.3 Weak Mixing of All Orders 139

b) An MDS (X;ϕ), X = (X ,Σ ,µ), is weakly mixing of order k ∈ N if and onlyif

D-limn→∞

µ(A0∩ϕ∗nA1∩ . . .∩ϕ

∗(k−1)nAk−1) = µ(A0)µ(A1) · · ·µ(Ak−1)

for all A0,A1, . . . ,Ak−1 ∈ Σ .

Proof. a) To check weak mixing of order k specialise Ak = · · ·= Am−1 = X . For b)Use the Koopman–von Neumann Lemma 9.13.

As seen above, a system that is weakly mixing of all orders is weakly mixing.Our aim in this section is to show that, surprisingly, the converse is also true: Aweakly mixing system is even weakly mixing of every order k ∈ N (Theorem 9.29below).

We begin with an auxiliary lemma that plays a central role in this context.

Lemma 9.24 (Van der Corput). Let H be a Hilbert space and (un)n∈N0 ⊆ H with‖un‖ ≤ 1. For j ∈ N0 define

γ j := limsupN→∞

∣∣∣ 1N

N−1

∑n=0

(un∣∣un+ j

)∣∣∣.Then the inequality

limsupN→∞

∥∥∥ 1N

N−1

∑n=0

un

∥∥∥2≤ limsup

J→∞

1J

J−1

∑j=0

γ j

holds. In particular, limN→∞1N ∑

N−1j=0 γ j = 0 implies limN→∞

1N ∑

N−1n=0 un = 0.

Proof. We shall use the notation o(1) to denote terms converging to 0 as N → ∞.First observe that for a fixed J ∈ N we have

1N

N−1

∑n=0

un =1N

N−1

∑n=0

un+ j +o(1)

for every 0≤ j ≤ J−1 and hence

1N

N−1

∑n=0

un =1N

N−1

∑n=0

1J

J−1

∑j=0

un+ j +o(1).

By the Cauchy–Schwarz inequality this implies

∥∥∥ 1N

N−1

∑n=0

un

∥∥∥≤ 1N

N−1

∑n=0

∥∥∥1J

J−1

∑j=0

un+ j

∥∥∥+o(1)≤(

1N

N−1

∑n=0

∥∥∥1J

J−1

∑j=0

un+ j

∥∥∥2)1/2

+o(1)

≤(

1N

N−1

∑n=0

1J2

J−1

∑j1, j2=0

(un+ j1

∣∣un+ j2

))1/2

+ o(1).


Letting N→ ∞ in the above and using the definition of γ j, we thus obtain

limsupN→∞

∥∥∥ 1N

N−1

∑n=0

un

∥∥∥2≤ 1

J2

J−1

∑j1, j2=0

γ| j1− j2|. (9.4)

It thus remains to show that

limsupJ→∞

1J2

J−1

∑j1, j2=0

γ| j1− j2| ≤ limsupJ→∞

1J

J−1

∑j=0

γ j =: d. (9.5)

To this aim, observe first that

J−1

∑j1, j2=0

γ| j1− j2| = Jγ0 +2J−1

∑j=1

(J− j)γ j ≤ 2J−1

∑j=0

(J− j)γ j = 2J−1

∑k=1

k(1

k

k

∑j=0

γ j

). (9.6)

Take ε > 0 and k0 such that 1k ∑

kj=0 γ j < d + ε for every k ≥ k0. By

2J−1

∑k=k0

k(1

k

k

∑j=0

γ j

)< 2(d + ε)

J−1

∑k=1

k = J(J−1)(d + ε),

the estimate (9.6) implies that

limsupJ→∞

1J2

J−1

∑j1, j2=0

γ| j1− j2| ≤ d + ε,

which implies (9.5).

For more on van der Corput’s lemma see, e.g., [Tao (2008b)]. Here we shall employit to derive the following “multiple mean ergodic theorem”.

Theorem 9.25. Let (X;ϕ) be a weakly mixing MDS and let T := Tϕ be the Koopmanoperator on E := L2(X ,Σ ,µ). Then

limN→∞

1N

N−1

∑n=0

(T n f1)(T 2n f2) · · ·(T (k−1)n fk−1) =(∫

Xf1 · · ·

∫X

fk−1

)·1 (9.7)

in L2(X) for every k ≥ 2 and every f1, . . . , fk−1 ∈ L∞(X).

Note that since we take the functions f1, . . . , fk from L∞(X), the above productsremain in L2.

Proof. We prove the theorem by induction on k≥ 2. For k = 2 the assertion reducesto the mean ergodic theorem for ergodic systems, see Theorem 8.10. So suppose thatthe assertion holds for some k ≥ 2 and take f1, . . . , fk ∈ L∞. Since the MDS definedby ϕ is weakly mixing we can decompose fk = 〈 fk,1〉1+( f −〈 fk,1〉1), where thelatter part is contained in Es. Hence without loss of generality we my consider the

9.3 Weak Mixing of All Orders 141

cases fk ∈ Es and fk ∈ C1 separately. Since in the latter case the assertions reducesto the induction hypothesis, we are left with the case that fk ∈ Es. Then 〈 fk,1〉 = 0(since Es ⊆ ran(I−Tk), the kernel of the mean ergodic projection), so we have toshow that

limN→∞

1N

N−1

∑n=0

(T n f1)(T 2n f2) · · ·(T kn fk) = 0.

To achieve this we employ van der Corput’s Lemma 9.24. Define un := T n1 f1 ·

T n2 f2 · · ·T n

k fk. Then the invariance of µ and the multiplicativity T imply(un∣∣un+ j

)=∫

X(T n f1 ·T 2n f2 · · ·T kn fk) · (T n+ j f1 ·T 2n+2 j f2 · · ·T kn+k j fk)

=∫

X( f1 ·T n f2 · · ·T (k−1)n fk) · (T j f1 ·T n+2 j f2 · · ·T (k−1)n+k j fk)

=∫

X( f1T j f1) ·T n( f2T 2 j f2) · · ·T (k−1)n( fkT k j fk).

Thus, by the induction hypothesis, the Cesaro means of(

un∣∣un+ j

)converge to∫

Xf1T j f1 ·

∫X

f2T 2 j f2 · · ·∫

XfkT k j fk = ∏

km=1

(fm∣∣T jm fm

).

So we obtain

γ j := limsupN→∞

∣∣∣ 1N ∑

N−1n=0

(un∣∣un+ j

)∣∣∣= ∏km=1

∣∣( fm∣∣T jm fm

)∣∣ .Now, each T jm is a contraction and fk ∈ Es(T ) = Es(T k) (Corollary 9.15), hence itfollows that

limsupn→∞

1n

n−1

∑j=0

γ j ≤ ‖ f1‖2 . . .‖ fk−1‖2 limsupn→∞

1n

n−1

∑j=0

∣∣∣( fk∣∣T k j fk

)∣∣∣= 0,

where we used Theorem 9.16 (iii) for the last equality. An application of van derCorput’s lemma concludes the proof.

Remark 9.26. Suitably adapted, the proof works for T,T 2n, . . . ,T (k−1)n replaced byT m1n,T m2n, . . . ,T mk−1n for given 1≤m1 ≤ ·· · ≤mk−1. (The proof is left as Exercise12.)

By specialising the f j to characteristic functions in the Theorem 9.25 we obtainthe following corollary.

Corollary 9.27. Let (X;ϕ), X = (X ,Σ ,µ), be a weakly mixing MDS. Then

limN→∞

1N

N−1

∑n=0

µ(A0∩ϕ

∗nA1∩ . . .∩ϕ∗(k−1)nAk−1

)= µ(A0)µ(A1) · · ·µ(Ak−1)

for every k ∈ N and every A0, . . . ,Ak−1 ∈ Σ .


For the final step we need a simple lemma.

Lemma 9.28. Let H be a Hilbert space, ( fn)n ⊆ H and f ∈ H. Then the followingassertions are equivalent.

(i) fn→ f in density.

(ii)1n∑

nj=1

∥∥ f j− f∥∥→ 0.

(iii)1n∑

nj=1

∥∥ f j− f∥∥2→ 0.

(iv)1n∑

nj=1 f j→ f weakly and

1n∑

nj=1

∥∥ f j∥∥2→‖ f‖2.

Proof. The Koopman–von Neumann Lemma accounts for the equivalence of (i)–(iii). The equivalence of (iii) and (iv) follows from the identity∥∥ f j− f

∥∥2=∥∥ f j∥∥2−2Re

(f j∣∣ f)+‖ f‖2 .

We can now prove the following result explaining the title of this section.

Theorem 9.29. Every weakly mixing MDS (X;ϕ), X = (X ,Σ ,µ), is weakly mixingof all orders, i.e.,

limN→∞

1N

N−1

∑n=0

∣∣∣µ(A0∩ϕ∗nA1∩ . . .∩ϕ

∗(k−1)nAk−1)−µ(A0)µ(A1) · · ·µ(Ak−1)

∣∣∣= 0

for every k ∈ N and every A0, . . . ,Ak−1 ∈ Σ .

Proof. Since (X⊗X;ϕ×ϕ is again weakly mixing by Corollary 9.21, we can applyCorollary 9.18 to ϕ×ϕ and the sets A0×A0, . . . ,Ak−1×Ak−1 to obtain

limN→∞

1N

N−1

∑n=0

µ(A0∩ . . .∩T (k−1)nAk−1

)2

= limN→∞

1N

N−1

∑n=0

(µ×µ)((A0×A0)∩ . . .∩ (T (k−1)nAk−1×T (k−1)nAk−1)

)= µ(A0)

2 . . .µ(Ak−1)2.

This combined with Corollary 9.27 implies, by Lemma 9.28, that

limN→∞

1N

N−1

∑n=0

∣∣∣µ(A0∩ . . .∩T (k−1)nAk−1)−µ(A0) . . .µ(Ak−1)

∣∣∣= 0,

and the theorem is proved.

Remark 9.30. It is a long-standing open problem whether the analogue of Theorem9.29 for strongly mixing systems is true. That is, it is not known even for k = 3whether for a general strongly mixing system (X;ϕ) one has

Exercises 143

limn→∞

µ(A0∩ϕ

∗nA1∩ . . .∩ϕ∗(k−1)nAk−1

)= µ(A0)µ(A1) · · ·µ(Ak−1)

for every A0, . . . ,Ak−1 ∈ Σ .

Concluding remarks

Let us summarise the important relations between these many notions in the follow-ing diagram:

strongly mixing weakly mixing strongly ergodic ergodic

weak mixingof all orders

ϕ×ϕ weakly mixing ϕ×ϕ ergodic

Exercises

1 (Upper and Lower Density). For a subset J ⊆ N0 the upper and lower densityare defined as

d(J) := limsupn→∞

card(J∩ [1,n])n

and d(J) := liminfn→∞

card(J∩ [1,n])n

.

Show that d(J) and d(J) do not change if we change J by finitely many elements.Then show that

d(J) = 1−d(N\ J) for every J ⊆ N

andd(J∪K)≤ d(J)+d(K) for every J,K ⊆ N.

Conclude that if d(J) = d(K) = 1 then d(J∩K) = 1.

2. Prove the following assertions.

a) For α,β ∈ [0,1], α ≤ β there is a set A⊆ N with d(A) = α and d(A) = β .

b) Suppose that A ⊆ N is relatively dense, i.e., has bounded gaps (Definition3.9). Then d(A)> 0. The converse implication, however, is not true.

c) Let N = A1 ∪ A2 ∪ ·· · ∪ Ak. Then there is 1 ≤ j ≤ k with d(A j) > 0. Theassertion is not true if d is replaced by d.

3. Let T be a power-bounded operator on a Banach space E and define

‖|y|‖ := supn≥0 ‖T ny‖ for y ∈ E.


Show that ‖|·|‖ is a norm on E, equivalent to the original norm. Show that T is acontraction with respect to this norm. Then show that the corresponding dual norm

‖|x′|‖= sup∣∣⟨x,x′⟩∣∣ : ‖|x|‖ ≤ 1

is equivalent to the original dual norm.

4. Prove that a sequence (an)n in C converges if and only if each of its subsequencesis Cesaro convergent.

5. Let (an)n,(bn)n∈N ∈ `∞, and suppose that an→ a along a subsequence J of N withdensity 1, and 1

n ∑nj=1 b j→ b. Show that then

limn→∞

1n

n

∑j=1

a jb j = ab.

Note that the Cesaro limit is not multiplicative in general! (Hint: Use the Koopman–von Neumann lemma and a 3ε-argument.)

6. Let T be a power-bounded operator on a Banach space E. Prove that Es(T ) is aclosed subspace of E, cf. Corollary 9.15. (Hint: Use the description (iv) of Theorem9.12.)

7. Let T be a power-bounded operator on a Banach space E and let k ∈ N. Provethe inclusion Es(T k)⊆ Es(T ), cf. Corollary 9.15. (Hint: Use the description (iv) ofTheorem 9.12 and employ an argument as in the proof of Corollary 9.7.)

8. Prove the characterisations of weakly mixing systems stated in Remark 9.17.

9. Let (T,a) be an ergodic rotation. Show that the product system (T×T;(a,a)) isnot ergodic. (Hint: Consider the function f (z,w) := zw on T×T.)

10. Give an example of an ergodic MDS (X;ϕ) such that the MDS (X ,Σ ,µ;ϕk) isnot ergodic for some k ≥ 2.

11. Give an interpretation of weak mixing (Definition 9.11) and mixing of all orders(Theorem 9.29) for the wine/water-example from the beginning of this chapter.

12. Prove Remark 9.26.

13. Prove that for an MDS (X;ϕ) the following assertions are equivalent.

a) (X;ϕ) is strongly mixing.

b) (X⊗X;ϕ×ϕ) is strongly mixing.

c) (X⊗Y;ϕ×ψ) is strongly mixing for any strongly mixing MDS (Y,Σ ′,ν ;ψ).

14. Let (X;ϕ) be an MDS, with F a ∩-stable generator of Σ . Suppose that

limn→∞

1n

n

∑j=0

µ(ϕ∗nA∩A) = µ(A)2

Exercises 145

for all A ∈ F. Show that (X;ϕ) is ergodic. (Hint: Imitate the proof of (ii)⇒ (iii) inTheorem 9.4 and use Lemma B.16.)

Chapter 10Mean Ergodic Operators on C(K)

In this chapter we consider a TDS (K;ϕ) and ask under which conditions its Koop-man operator T = Tϕ is mean ergodic on C(K). In contrast to the Lp-case, it is muchmore difficult to establish weak compactness of subsets of C(K). Consequently,property (iii) of Theorem 8.20 plays a minor role here. Using instead condition (v)

“fix(T ) separates fix(T ′)”

of that theorem, we have already seen simple examples of TDSs where the Koopmanoperator is not mean ergodic (see Exercise 8.10.a)). Obviously, mean ergodicity ofT is connected with fix(T ′) being “not too large” (as compared to fix(T )). Now, ifwe identify C(K)′ = M(K) by virtue of the Riesz Representation Theorem 5.7, wesee that a complex Baire measure µ ∈M(K) is a fixed point of T ′ if and only if it isϕ-invariant, i.e.,

Mϕ(K) :=

µ ∈M(K) : µ is ϕ-invariant= fix(T ′).

Mean ergodicity of T therefore becomes more likely if there are only few ϕ-invariant measures. We first discuss the existence of invariant measures as an-nounced in Section 5.2.

10.1 Invariant Measures

Given a TDS (K;ϕ) with Koopman operator T on C(K) we want to find a fixedpoint of T ′ which is a probability measure. For this purpose let us introduce

M1(K) :=

µ ∈M(K) : µ ≥ 0, 〈µ,1〉= 1, and

M1ϕ(K) := M1(K)∩Mϕ(K) = M1(K)∩fix(T ′)

the set of all, and of all ϕ-invariant probability measures, respectively. Recall that

147

148 10 Mean Ergodic Operators on C(K)

µ ≥ 0 ⇔ 〈 f ,µ〉 ≥ 0 for all 0≤ C(K).

It follows that both M1(K) and M1ϕ(K) are convex and weakly∗ closed. Since

M1(K)⊆

µ ∈M(K) : ‖µ‖ ≤ 1

and the latter set is weakly∗ compact by the Banach–Alaoglu theorem, we concludethat M1(K) and M1

ϕ(K) are convex and compact (for the weak∗ topology). More-over, M1(K) is not empty, since it contains all point measures. To see that M1

ϕ(K) isnot empty either, we employ one of the classical fixed point theorems.

Theorem 10.1 (Markov–Kakutani). Let C be a nonempty compact, convex sub-set of a Hausdorff topological vector space, and let Γ be a collection of pairwisecommuting affine and continuous mappings T : C→C. Then these mappings havea common fixed point in C.

Proof. First we suppose that Γ = T is a singleton. Let f ∈ C. Then, by com-pactness of C, the sequence An[T ] f has a cluster point g ∈C. Since C is compact,(1/n)( f −T n f )→ 0. Hence by Lemma 8.17 we conclude that T g = g.For the general case it suffices, by compactness, that the sets fix(T ), T ∈Γ , have thefinite intersection property. We prove that by induction. Suppose for T1, . . . ,Tn ∈ Γ

we already know that C1 := fix(T1)∩·· ·∩fix(Tn−1) is nonempty. Now C1 is compactand convex, and any Tn+1 ∈Γ leaves C1 invariant (because Tn+1 commutes with eachTj, 1≤ j ≤ n). Hence by the case already treated, we obtain fix(Tn+1)∩C1 6= /0.

With this result at hand we can prove the existence of invariant measures for anyTDS.

Theorem 10.2 (Krylov–Bogoljubov). For every TDS (K;ϕ) there exists at leastone ϕ-invariant probability measure on K. More precisely, for every 0 6= f ∈ fix(Tϕ)in C(K) there exists µ ∈M1

ϕ(K) such that 〈 f ,µ〉 6= 0.

Proof. We apply the Markov–Kakutani theorem to Γ = T ′ϕ and to the weakly∗

compact, convex set

C := M1(K)∩

µ ∈M(K) : 〈 f ,µ〉= ‖ f‖∞

.

By compactness there is x0 ∈ K with δx0 ∈C, so C is nonempty. Since f ∈ fix(Tϕ),T ′ϕ leaves C invariant, by Theorem 10.1 the assertion is proved.

Remarks 10.3. 1) Let T = Tϕ . As has been mentioned, µ ∈M(K) is ϕ-invariantif and only if T ′µ = µ , i.e., µ ∈ fix(T ′). Hence Theorem 10.2 implies in par-ticular that

fix(T ′) separates fix(T ).

Remarkably enough, this holds for any power-bounded operator on a generalBanach space (see Exercise 8.6).

10.1 Invariant Measures 149

2) Theorem 10.2 has a generalisation to so-called Markov operators, i.e., oper-ators T on C(K) such that T ≥ 0 and T 1 = 1 (see Exercise 2).

3) The proof of the Krylov–Bogoljubov theorem relies on the case of theMarkov–Kakutani theorem when Γ = T ′ϕ is a singleton. Here the use ofLemma 8.17 makes it possible to consider any subsequence (Ank [T

′ϕ ]), a clus-

ter point of which will be a fixed point of T ′ϕ . This “trick” can be used to pro-duce invariant measures with particular properties, as will be done in Chapter18.

The following result shows that the extreme points of the convex compact setM1

ϕ(K) (see Appendix C.7) are of special interest.

Proposition 10.4. Let (K;ϕ) be a TDS. Then µ ∈ M1ϕ(K) is an extreme point of

M1ϕ(K) if and only if the MDS (K,µ;ϕ) is ergodic.

A ϕ-invariant probability measure µ ∈M1ϕ(K) is called ergodic if the correspond-

ing MDS (K,µ;ϕ) is ergodic. By the lemma, µ is ergodic if and only if it is anextreme point of M1

ϕ(K). By the Kreın–Milman theorem (Theorem C.13) such ex-treme points do always exist. Furthermore, we have

M1ϕ(K) = conv

µ ∈M1

ϕ(K) : µ is ergodic, (10.1)

where the closure is to be taken in the weak∗ topology. Employing the so-called“Choquet theory” one can go even further and represent an arbitrary ϕ-invariantmeasure as a “barycenter” of ergodic measures, see [Phelps (1966)].

Proof. Assume that ϕ is not ergodic. Then there exists a set A with 0 < µ(A) < 1such that ϕ∗A = A. The probability measure µA, defined by

µA(B) := µ(A∩B)/µ(A) (B ∈ Ba(K)),

is clearly ϕ-invariant, i.e., µA ∈M1ϕ(K). Analogously, µAc ∈M1

ϕ(K). But

µ = µ(A)µA +(1−µ(A))µAc

is a representation of µ as a nontrivial convex combination, and hence µ is not anextreme point of M1

ϕ(K).

Conversely, suppose that µ is ergodic and that µ = (µ1 + µ2)/2 for some µ1,µ2 ∈M1

ϕ(K). This implies in particular that µ1 ≤ 2µ , hence

|〈 f ,µ1〉| ≤ 2〈| f | ,µ〉= 2‖ f‖L1(µ) ( f ∈ C(K)).

Since C(K) is dense in L1(K,µ), the measure µ1 belongs to L1(K,µ)′. Let us con-sider the Koopman operator Tϕ on L1(K,µ). Since µ is ergodic, fix(Tϕ) is one-dimensional. By Example 8.24 the operator Tϕ is mean ergodic, and by Theorem8.20(v) fix(Tϕ) separates fix(T ′ϕ). Hence the latter is one-dimensional too, and since


µ,µ1 ∈ fix(T ′ϕ) it follows that µ1 = µ . Consequently, µ is an extreme point ofM1

ϕ(K).

By this lemma, if there are two different ϕ-invariant probability measures, thenthere is also a nonergodic one. Conversely, all ϕ-invariant measures are ergodic ifand only if there is exactly one ϕ-invariant probability measure. This leads us to thenext section.

10.2 Uniquely and Strictly Ergodic Systems

Topological dynamical systems that have a unique invariant probability measureare called uniquely ergodic. To analyse this notion further, we need the followinguseful information.

Proposition 10.5. Let (K;ϕ) be a TDS. Then Mϕ(K) is a lattice, i.e., if ν ∈Mϕ(K),then also |ν | ∈Mϕ(L). Consequently, M1

ϕ(K) linearly spans Mϕ(K).

Proof. Let T be the associated Koopman operator on C(K) and take ν ∈ fix(T ′) =Mϕ(K). By Exercise 8 we obtain |ν |= |T ′ν | ≤ T ′|ν | and

〈1, |ν |〉 ≤ 〈1,T ′|ν |〉= 〈1, |ν |〉.

This implies that 〈1,T ′|ν | − |ν |〉 = 0, hence |ν | ∈ fix(T ′). To prove the secondstatement, note that ν ∈ fix(T ′) if and only if Reν , Imν ∈ fix(T ′). But if ν is areal (=signed) measure in fix(T ′), then ν = ν+− ν− with ν+ = 1/2(|ν |+ ν) andν− = 1/2(|ν |−ν) both in Mϕ(K)+ (see Chapter 7.2).

As a consequence we can characterise unique ergodicity.

Theorem 10.6. Let (K;ϕ) be a TDS with Koopman operator T on C(K). The fol-lowing assertions are equivalent.

(i) (K;ϕ) is uniquely ergodic, i.e., M1ϕ(K) is a singleton.

(ii) dimMϕ(K) = 1.

(iii) Every invariant probability measure is ergodic.

(iv) T is mean ergodic and dimfix(T ) = 1.

(v) For each f ∈ C(K) there is c( f ) ∈ C such that

(An[T ] f )(x)→ c( f ) as n→ ∞ for all x ∈ K.

Proof. The equivalence (i)⇔ (ii) is clear by Proposition 10.5. The equivalence (i)⇔(iii) follows from (10.1).If (ii) holds then fix(T ) is one-dimensional, since by Theorem 10.2 fix(T ′)=Mϕ(K)separates the points of fix(T ). But then also fix(T ) separates the points of fix(T ′),

10.2 Uniquely and Strictly Ergodic Systems 151

and hence by Theorem 8.20, T is mean ergodic. If (iv) holds then P f := limn→∞ An fexists uniformly, and is a constant function P f = c( f )1, hence (v) holds.Finally, suppose that (v) holds. If ν ∈ M1

ϕ(K) is arbitrary and f ∈ C(K) then bydominated convergence,

〈 f ,ν〉= 〈An f ,ν〉 → 〈c( f )1,ν〉= c( f ).

Hence M1ϕ(K) consists of at most one point, and hence (i) holds.

Let K be compact and let µ ∈M1(K) be any probability measure on K. Recallfrom Section 5.2 that the support supp(µ) of µ is the unique closed set M ⊆K suchthat for f ∈ C(K)

f ∈ IM :=

g ∈ C(K) : g≡ 0 on M⇔

∫K| f | dµ = 0. (10.2)

Equivalently, we have

supp(µ) =

x ∈ K : µ(U)> 0 for each open set x ∈U ⊆ K

=⋂

A⊆ K : A is closed and µ(A) = 1

(see Lemma 5.9 and Exercise 5.12). The following observation is quite important.

Lemma 10.7. Let (K;ϕ) be a TDS and µ ∈M1(K) a ϕ-invariant measure. Then thesupport supp(µ) of µ is ϕ-stable, i.e., it satisfies ϕ(supp(µ)) = supp(µ).

Proof. Let M := supp(µ). Since T is a lattice homomorphism,∫K|T f | dµ =

∫K

T | f | dµ =∫

K| f | dµ,

whence by (10.2) f ∈ IM if and only if T f ∈ IM . By Lemma 4.14 we conclude thatϕ(M) = M. The remaining statement follows readily.

The measure µ ∈M1(K) has full support or is strictly positive if supp(µ) = K.With the help of Lemma 10.7 we obtain a new characterisation of minimal systems.

Proposition 10.8. For a TDS (K;ϕ) the following assertions hold.

a) Every minimal set M ⊆ K is the support of an invariant probability measureon K.

b) (K;ϕ) is minimal if and only if every invariant probability measure on K isstrictly positive.

Proof. By Lemma 10.7 every invariant probability measure for a minimal TDS hasfull support. Then a) follows by the Krylov–Bogoljubov theorem applied to thesubsystem (M;ϕ), and b) follows from a).


A TDS (K;ϕ) is called strictly ergodic if it is uniquely ergodic and the uniqueinvariant probability measure µ has full support. We obtain the following character-isation.

Corollary 10.9. Let (K;ϕ) be a TDS with Koopman operator T on C(K). The fol-lowing assertions are equivalent.

(i) The TDS (K;ϕ) is strictly ergodic.

(ii) The TDS (K;ϕ) is minimal and T is mean ergodic.

Proof. (i)⇒ (ii): Suppose that (K;ϕ) is strictly ergodic, and let µ be the uniqueinvariant probability measure on K. By Theorem 10.6, T is mean ergodic. Now let/0 6= M ⊆ K be ϕ-invariant. Then by Theorem 10.2 there is ν ∈M1

ϕ(M), which thencan be canonically extended to all K such that supp(ν) ⊆M, see also Exercise 11.But then µ = ν and hence M = K.(ii)⇒ (i): If (K;ϕ) is minimal, it is forward transitive, so fix(T ) =C ·1. Mean ergod-icity of T implies that fix(T ) separates fix(T ′) = Mϕ(K), which is possible only ifdimMϕ(K) = 1, too. Hence, (K;ϕ) is uniquely ergodic, with unique invariant prob-ability measure µ , say. By minimality, µ has full support (Proposition 10.8).

Minimality is not characterised by unique ergodicity. An easy example of auniquely ergodic system that is not minimal (not even transitive) is given in Ex-ercise 4. Minimal systems that are not uniquely ergodic are much harder to obtain,see [Furstenberg (1961)] and [Raimi (1964)].

10.3 Mean Ergodicity of Group Rotations

We turn our attention to various ergodic theoretic properties of rotations on a com-pact group G. By Theorem 8.8 the associated Koopman operator is mean ergodic onLp(G) for p ∈ [1,∞). We investigate now its behaviour on C(G), starting with thespecial case G = T.

Proposition 10.10. The Koopman operator La associated with a rotation system(T;a) is mean ergodic on C(T).

Proof. We write T = La for the Koopman operator and An = An[T ] its Cesaro av-erages. The linear hull of the functions χn : z 7→ zn, n ∈ Z, is a dense subalgebra ofC(T), by the Stone–Weierstrass Theorem 4.4. Since T is power-bounded and there-fore Cesaro-bounded, it suffices to show that Anχm converges for every m∈Z. Notethat T χm = amχm for m ∈N, hence if am = 1, then χm ∈ fix(T ) and there is nothingto show. So suppose that am 6= 1. Then

Anχm =1n

n−1

∑j=0

T jχm =

(1n

n−1

∑j=0

am j

)χm =

1n

1−amn

1−am χm→ 0

as n→ ∞. (Compare this with the final problem in Chapter 1.)

10.3 Mean Ergodicity of Group Rotations 153

To establish the analogous result for general group rotations, we need an auxiliaryresult.

Proposition 10.11. Let G be a compact group and let f ∈ C(G). Then the mapping

G→ C(G), a 7→ La f

is continuous.

Proof. For g, h∈C(G) define (g⊗h)(x,y) := g(x)h(y) and consider the linear space

Y := lin

g⊗h : g, h ∈ C(G)⊆ C(G×G).

This is a conjugation invariant subalgebra of C(G×G), trivially separating thepoints of G×G. By the Stone–Weierstraß Theorem 4.4, Y is dense in C(G×G).Define F(x,y) := (Lx f )(y) = f (xy), which is, by the continuity of the multiplica-tion on G, a continuous function, i.e., F ∈ C(G×G). By what is said above, we cantake a sequence (Fn)n∈N ⊆ Y converging to F . Let a ∈ G and ε > 0 be fixed. Forevery x ∈ G we have

‖La f −Lx f‖∞ = ‖F(a, ·)−F(x, ·)‖∞

≤ ‖F(a, ·)−Fn(a, ·)‖∞ +‖Fn(a, ·)−Fn(x, ·)‖∞ +‖Fn(x, ·)−F(x, ·)‖∞

≤ 2ε +‖Fn(a, ·)−Fn(x, ·)‖∞,

if we choose n ∈ N sufficiently large. Since the function Fn is a linear combinationof functions of the form g⊗h and since n ∈ N is fixed, from

‖g⊗h(a, ·)−g⊗h(x, ·)‖∞ = |g(a)−g(x)| · ‖h‖∞

it follows that

‖La f −Lx f‖∞ ≤ 2ε +‖Fn(a, ·)−Fn(x, ·)‖∞ < 3ε

if x ∈ G lies in a suitably small neighbourhood of a.

(See also Corollary 14.12 for a proof of Proposition 10.11 that avoids the use of theStone–Weierstrass theorem.) As a consequence we obtain the following generalisa-tion of Proposition 10.10.

Corollary 10.12. The Koopman operator associated with a compact group rotationsystem (G;a) is mean ergodic on C(G).

Proof. Let f ∈ C(G). Then by Proposition 10.11 the set Lg f : g ∈ G ⊆ C(G) iscompact. A fortiori, the set Ln

a f : n ∈ N0 is relatively compact. Consequently, itsclosed convex hull is compact, whence Theorem 8.20 (iii) concludes the proof.

Combining the results of this chapter so far, we arrive at the following importantcharacterisation. (We include also the equivalences known from previous chapters.)


Theorem 10.13 (Minimal Group Rotations). Let G be a compact group withHaar probability measure m and consider a group rotation (G;a) with associatedKoopman operator La on C(G). Then the following assertions are equivalent.

(i) (G;a) is minimal.

(ii) (G;a) is (forward) transitive.

(iii) an : n≥ 0 is dense in G.

(iv) an : n ∈ Z is dense in G.

(v) dimfix(La) = 1.

(vi) (G;a) is uniquely ergodic.

(vii) The Haar measure m is the only invariant probability measure for (G;a).

(viii) (G;a) is strictly ergodic.

(ix) (G,m;a) is ergodic.

Proof. The equivalence of (i)–(iv) has been shown in Theorems 2.32 and 3.4. Theimplication (ii)⇒ (v) follows from Lemma 4.18. Combining dimfix(La) = 1 withthe mean ergodicity of La (Corollary 10.12) we obtain the implication (v)⇒ (vi) byTheorem 10.6. (vi)⇒ (vii) is clear since the Haar measure is obviously invariant,and (vii)⇒ (viii) follows since the Haar measure has full support (Theorem 14.13.)By Corollary 10.9, (viii) implies that (G;a) is minimal, hence (i).

By Theorem 10.6, (vii) implies (ix). Conversely, suppose that (G,m;a) is ergodic.Then fix(La) =C1, where La is considered as an operator on L1(G,m). Since m hasfull support, the canonical map C(G) → L1(G;m) is injective, whence (v) follows.

10.4 Furstenberg’s Theorem on Group Extensions

Let G be a compact group with Haar measure m, and let (K;ϕ) be a TDS. Further-more, let Φ : K→ G be continuous and consider the group extension along Φ withdynamics

ψ : K×G→ K×G, ψ(x,y) := (ϕ(x),Φ(x)y),

see Section 2.2.2. Let π : K×G→ K, π(x,g) := x be the corresponding factor map.Of course, for any ϕ-invariant probability measure µ on K the measure µ⊗m is aninvariant measure for the TDS (K×G;ψ). Even if (K;ϕ) is uniquely ergodic, theproduct need not be so. As an example consider the product of two commensurableirrational rotations, which is not minimal (see Example 2.34), so by Theorem 10.13it is not uniquely ergodic.

Therefore, in this section we look for conditions guaranteeing that the group exten-sion is uniquely ergodic. We need the following preparations. The group G acts onG by multiplication from the right by a ∈ G, with associated Koopman operator

10.4 Furstenberg’s Theorem on Group Extensions 155

Rag(y) = g(ya) (a,y ∈ G,g ∈ C(G)). (10.3)

Also the right rotation ρa by an element a ∈ G in the second coordinate is an au-tomorphism of the group extension (K×G;ψ), see Section 2.2.2. The associatedKoopman operator is I⊗Ra, which acts on f ⊗g as f ⊗Rag, and commutes with theKoopman operator Tψ , i.e., Tψ(I⊗Ra) = (I⊗Ra)Tψ .

Theorem 10.14. Let ν be a ψ-invariant probability measure on K×G, and let µ :=π∗ν . If µ⊗m is ergodic, then ν = µ⊗m.

Proof. The product measure µ⊗m is not only ψ-invariant, but also invariant underρa for all a ∈ G. Fix f ∈ C(K) and g ∈ C(G). Since the MDS (K×G,µ⊗m;ψ) isergodic, as a consequence of von Neumann’s Theorem 8.1, we find a subsequence(nk) such that for Ank := Ank [Tψ ] we have

Ank( f ⊗g)→ 〈 f ,µ〉〈g,m〉 ·1 µ⊗m-almost everywhere. (10.4)

For a ∈ G we define

E(a) :=(x,y) ∈ K×G : Ank( f ⊗Rag)(x,y)→ 〈 f ,µ〉〈g,m〉

.

Note that ρa(x,y) ∈ E(1) if and only if (x,y) ∈ E(a) by (10.3). Hence, by (10.4) wehave (µ⊗m)(E(a)) = 1 for all a ∈ G.

Since the mappingG×C(G)→ C(G), (a,g) 7→ Rag

is continuous (by Proposition 10.11), the orbit

RGg :=

Rag : a ∈ G

is a compact subset of C(G). In particular, it is separable, and we find a countablesubset Cg ⊆ G such that Rag : a ∈ Cg is dense in RG. For such a countable setdefine

E :=⋂

a∈CgE(a),

which satisfies (µ⊗m)(E) = 1. We claim that actually E =⋂

a∈GE(a).

Proof of Claim. The inclusion ⊇ is clear. For the converse inclusion fix a ∈ G andtake (am)m ⊆Cg such that ‖Ramg−Rag‖

∞→ 0 as m→ ∞. Then Ank( f ⊗Ramg)→

Ank( f ⊗Rag) uniformly in k ∈N as m→∞. Hence, if (x,y)∈ E(am) for each m∈N,then by a standard argument, (x,y) ∈ E(a), too.

Now, for x ∈ K and y,a ∈ G we have the following chain of equivalences:

(x,ya) ∈ E ⇔ ∀b ∈ G : (x,ya) ∈ E(b) ⇔ ∀b ∈ G : (x,yab) ∈ E(1)⇔ ∀b ∈ G : (x,y) ∈ E(ab) ⇔ (x,y) ∈ E.

Hence, E = E1×G, where E1 ⊆K is the section E1 := x∈K : (x,1)∈ E. Since Eis a Baire set in K×G, and Ba(K×G) =Ba(K)⊗Ba(G) (Exercise 5.8), by standard


measure theory E1 is measurable, whence µ(E1) = 1. By definition of µ , this meansthat ν(E) = 1, i.e.,

Ank( f ⊗g)→ 〈 f ,µ〉〈g,m〉 ν-almost everywhere.

From the dominated convergence we obtain

〈 f ⊗g,ν〉=⟨Ank( f ⊗g),ν

⟩→ 〈 f ,µ〉〈g,m〉 .

As we saw in the proof of Proposition 10.11 the linear hull of functions f⊗g is densein C(K×G), so by the Riesz Representation Theorem 5.7 the proof is complete.

A direct consequence is the following result of Furstenberg.

Corollary 10.15 (Furstenberg). If (K;ϕ) is uniquely ergodic with invariant prob-ability measure µ , and if µ⊗m is ergodic, then (K×G;ψ) is uniquely ergodic.

The usual proof of Furstenberg’s result relies on the pointwise ergodic theorem(which is the subject of the next chapter) and on generic points (see Exercise 11.5).We now turn to an important example of the situation from above.

Unique Ergodicity of Skew Shifts

Let α ∈ [0,1) be an irrational number. By Theorem 10.13, the rotation ϕα : x 7→ x+α (mod1) defines a uniquely ergodic TDS. We consider here its group extensionalong the function Φ : [0,1)→ [0,1), Φ(x) = x, i.e.,

K := [0,1]2 (mod1) and ψα(x,y) = (ϕα(x),Φ(x)+ y) = (x+α,x+ y).

Then (K;ψα) is a TDS, see Chapter 2, Example 2.20.

Our aim is to show that this TDS, called the skew rotation, is strictly ergodic.The two-dimensional Lebesgue measure λ2 on K is ψ-invariant, so it remains toprove that this measure is actually the only ψ-invariant probability measure on K.The next is an auxiliary result, needed for the application of Corollary 10.15.

Proposition 10.16. Let α ∈ [0,1) be an irrational number. Then the skew rotation(K,λ2;ψα) is ergodic.

Proof. To prove the assertion we use Proposition 7.14 and show that the fixed spaceof the Koopman operator T of ψα on L2(K,λ2) consists of the constant functionsonly. For f ,g ∈ L2[0,1] the function f ⊗g : (x,y) 7→ f (x)g(y) belongs to L2(K,λ2).Now every F ∈ L2(K,λ2) can be written uniquely as an L2-sum

F = ∑n∈Z

fn⊗ en,

where ( fn)n∈Z ⊆ L2[0,1] and (en)n∈Z is the orthonormal basis

10.5 Application: Equidistribution 157

en(y) := e2πiny (y ∈ [0,1], n ∈ Z)

of L2[0,1]. Applying T to F yields

T F = ∑n∈Z

(enLα fn)⊗ en,

hence the condition F ∈ fix(T ) is equivalent to

Lα fn = e−n fn for every n ∈ Z.

For n ∈ Z with fn 6= 0 we have to show that n = 0. Since

Lα | fn|= |Lα fn|= |e−n fn|= | fn|

and α is irrational, | fn| is a constant (nonzero) function. Now take θ ∈ [0,1) ra-tionally independent from α and consider the Koopman operator Lθ ∈L (L2[0,1])associated with the translation by θ on [0,1). For gn := fn/(Lθ fn) we then have

Lα gn = Lα

fn

Lθ fn=

Lα fn

Lθ Lα fn=

e−n fn

e−n(θ)e−nLθ fn= en(θ)gn.

By Exercise 10 the eigenvalues of Lα are of the form e2πimα for some Z, so en(θ) =e2πimα = em(α) for some m ∈ Z. But since θ is rationally independent from α , itfollows that n = 0 and thus F = f0⊗e0. Since Lα f0 = f0 and ϕα is ergodic, we havethat f0 and hence F is constant.

Since λ2 is strictly positive, the TDS (K;ψα) is even strictly ergodic. So by Corol-lary 10.9 we obtain the next result.

Corollary 10.17. For irrational α ∈ R consider the mod 1 translation system([0,1);α) and the corresponding skew rotation (K;ψα) on K = [0,1)× [0,1). Then(K;ψα) is strictly ergodic ergodic and its associated Koopman operator is meanergodic on C(K).

10.5 Application: Equidistribution

A sequence (αn)∞n=0 ⊆ [0,1) is called equidistributed in [0,1) if

limn→∞

card j : 0≤ j < n, α j ∈ [a,b]n

= b−a

for all a,b with 0 ≤ a < b ≤ 1. In other words, the relative frequency of the first nelements of the sequence falling into an arbitrary fixed interval [a,b] converges tothe length of that interval (independently of its location).


In this section we show how mean ergodicity of the Koopman operator on variousspaces can be used to study equidistribution of some sequences.

A classical theorem from [Weyl (1916)] gives an important example of an equidis-tributed sequence and is a quantitative version of Kronecker’s theorem in Example2.33.

Theorem 10.18 (Weyl). Let α ∈ [0,1)\Q. Then the sequence (nα (mod1))∞n=0 is

equidistributed in [0,1).

We consider the translation system ([0,1);α) as in Example 2.6. Recall from Exam-ple 2.13 that this TDS is isomorphic to the rotation system (T;a), where a = e2πiα .As we saw in the previous section, the Koopman operator La is mean ergodic onC(T), hence so is the Koopman operator Lα of ([0,1);α) on C([0,1)). The latterspace is, however, too small for our purposes.Instead, a better candidate to work with seems to be R[0,1], the space of bounded,1-periodic functions on R that are Riemann integrable over compact intervals, be-cause this space contains characteristic functions of intervals. Equipped with thesup-norm, R[0,1] becomes a Banach space, and the Koopman operator T associatedwith the translation x 7→ x+α leaves R[0,1] invariant, i.e, restricts to an isometricisomorphism of R[0,1].

We shall use Riemann’s criterion of integrability: a bounded real-valued functionf : [0,1]→ R is Riemann integrable if and only if for every ε > 0 there exist stepfunctions gε , hε such that

gε ≤ f ≤ hε and∫ 1

0(hε −gε)(t)dt ≤ ε.

Clearly, if f is 1-periodic (i.e., if f (0) = f (1)), then gε and hε can be chosen 1-periodic as well.

Proposition 10.19. Let α ∈ [0,1)\Q. Then the Koopman operator associated withthe translation system ([0,1);α) is mean ergodic on R[0,1], and the mean ergodicprojection is given by

P f =∫ 1

0f (t)dt ·1 ( f ∈ R[0,1]).

Proof. We denote as usual the Koopman operator by T and its Cesaro averagesby An, n ∈ N. We shall identify functions defined on [0,1) with their 1-periodicextension to R.

Let χ be any characteristic function of an interval (open, closed or half-open)contained [0,1), and let ε > 0. A moment’s thought reveals that there exist continu-ous 1-periodic functions gε , hε on R satisfying

gε ≤ χ ≤ hε and∫ 1

0(hε −gε)(t)dt ≤ ε.

10.5 Application: Equidistribution 159

By Proposition 10.10 and by the isomorphism of the rotation on the torus and themod 1 translation on [0,1), Angε converges uniformly to (

∫ 10 gε(t)dt) ·1, while Anhε

converges uniformly to (∫ 1

0 hε(t)dt) ·1. Since Angε ≤Anχ ≤Anhε for all n∈N, weobtain (∫ 1

0gε(t)dt− ε

)·1≤ Anχ ≤

(∫ 1

0hε(t)dt + ε

)·1

for sufficiently large n. So we have for such n that

∥∥Anχ−∫ 1

0χ(t)dt ·1

∥∥∞≤ 2ε

by the choice of gε and hε . Therefore

‖ · ‖∞− limn→∞

Anχ =

(∫ 1

0χ(t)dt

)·1.

Clearly the convergence on these characteristic functions extends to all 1-periodicstep functions. Take now a real-valued f ∈ R[0,1]. As noted above, for ε > 0 wefind 1-periodic step functions gε , hε with gε ≤ f ≤ hε and

∫ 10 (hε−gε)(t)dt ≤ ε . By

a similar argument as above we then obtain

‖ · ‖∞− limn→∞

An f =(∫ 1

0f (t)dt

)·1.

By using the mean ergodicity of Lα on R[0,1] we can now prove Weyl’s theorem.

Proof of Weyl’s Theorem 10.18. Denote αn := nα (mod1) and let

f =

1[a,b] if b < 1,10∪[a,b] if b = 1.

Then f ∈ R[0,1] and

An f (0) =1n

n−1

∑j=0

L jα f (0) =

1n

n−1

∑j=0

f (α j) =card j : 0≤ j < n, α j ∈ [a,b]

n.

By Proposition 10.19, Lα is mean ergodic on R[0,1] and the Cesaro averages con-verge uniformly to P f . In particular

An f (0)→∫ 1

0f (s)ds = b−a.

Note that the last step of the proof consisted essentially in showing that if


1n

n−1

∑j=0

f (α j)→∫ 1

0f (s)ds (n→ ∞)

for all f ∈ R[0,1], then the sequence (αn)n is equidistributed. The following is aconverse to this fact, its proof is left as Exercise 9.

Proposition 10.20. A sequence (αn)∞n=0 ⊆ [0,1) is equidistributed if and only if

limn→∞

1n

n−1

∑j=0

f (α j) =∫ 1

0f (s)ds

holds for every bounded and Riemann integrable (equivalently, for every continu-ous) function f : [0,1]→ C.

This result applied to exponential functions s 7→ e2πins was the basis of Weyl’soriginal proof. More on this circle of ideas can be found in [Hlawka (1979)] or[Kuipers and Niederreiter (1974)].

By using the mean ergodicity of the Koopman operator corresponding to the skewrotation from Corollary 10.17, we obtain the following result on equidistribution.

Proposition 10.21. Let α ∈ [0,1) \Q. Then the sequence (n2α (mod1))∞n=0 is

equidistributed in [0,1], i.e.,

limn→∞

card j : 0≤ j < n, j2α ∈ [a,b] (mod1)n

= b−a

holds for all a,b with 0≤ a < b≤ 1.

Proof. Consider the TDS (K;ψ2α) as in Section 10.4, whose Koopman operatorT := L2α is now known to be mean ergodic on C(K) by Corollary 10.17. In partic-ular, the Cesaro means An of T converge to the mean ergodic projection P in thesup-norm, hence pointwise. We saw in Corollary 10.9 that P f =

∫K f dλ2 for every

f ∈ C(K). Let now g ∈ C([0,1)) be arbitrary and set f (x,y) := g(y). It is easy to seethat one has ψn

2α(α,0) = ((2n+1)α,n2α). So the convergence

1n

n−1

∑j=0

g(n2α (mod1)) = An f (α,0)→

∫K

f dλ =∫ 1

0g(s)ds

and Proposition 10.20 conclude the proof.

Exercises

1 (Mean Ergodicity on C(K)). Let (K;ϕ) be a TDS with an invariant probabilitymeasure µ . Suppose that

Exercises 161

limn→∞

1n

n−1

∑j=0

f (ϕ j(x)) =∫

Kf dµ

for every f ∈C(K) and every x∈K. Show that the Koopman operator Tϕ on C(K) ismean ergodic and that (K,µ;ϕ) is ergodic. (Hint: Use the Dominated ConvergenceTheorem to prove weak mean ergodicity and then use part (iii) of Theorem 8.20.)

2 (Markov Operators). Let K be a compact topological space. A bounded linearoperator T : C(K)→ C(K) is called a Markov operator if it is positive (i.e., f ≥ 0implies T f ≥ 0) and satisfies T 1 = 1.Let T be a Markov operator on C(K). Show that ‖T‖ = 1 and that there exists aprobability measure µ ∈M1(K) such that∫

KT f dµ =

∫K

f dµ for all f ∈ C(K).

(Hint: Imitate the proof of the Krylov–Bogoljubov theorem.)

3. A linear functional ψ on `∞ = `∞(N) with the following properties is called aBanach limit.

(i) ψ is positive, i.e., (xn)n∈N0 ∈ `∞, xn ≥ 0 imply ψ(xn)n∈N0 ≥ 0.

(ii) ψ is translation invariant, i.e., ψ(xn)n∈N0 = ψ(xn+1)n∈N.

(iii) ψ(1) = 1, where 1 denotes the constant 1 sequence.

Prove the following assertions.

a) A Banach limit ψ is continuous, i.e., ψ ∈ (`∞)′.

b) If (xn)n∈N0 ∈ `∞ is periodic with period k ≥ 1, then for each Banach limit ψ

one has

ψ(xn) =1k

k−1

∑j=0

x j.

c) If (xn) ∈ c and ψ is a Banach limit, then ψ(xn) = limn→∞ xn.

d) If (xn) ∈ `∞ is Cesaro convergent (i.e., cn := 1n ∑

n−1j=0 x j converges), then for

each Banach limit ψ one has

ψ(xn) = limn→∞

1n

n−1

∑j=0

x j.

e) There exist Banach limits. (Hint: Imitate the proof of the Krylov–Bogoljubovtheorem.)For any (xn)n∈N ∈ `∞(R) and α ∈ [liminfn→∞ xn, limsupn→∞ xn] there is aBanach limit ψ with ψ(xn) = α (see Remark 10.3.3)).

4. Let K := [0,1] and ϕ(x)= 0 for all x∈ [0,1]. Show that (K;ϕ) is uniquely ergodic,but not minimal.


5. Take K = z ∈ C : |z| ≤ 1 and ϕa(z) = az for some a ∈ T satisfying an 6= 1 forall n ∈ N. Find the ergodic measures for ϕa.

6. For the doubling map on K = [0,1) (cf. Exercise 2.10) and the tent map on [0,1](cf. Exercise 2.11) answer the following questions:

1) Is the TDS uniquely ergodic?

2) Is the Koopman operator on C(K) mean ergodic?

7. Let a ∈ T with an 6= 1 for all n ∈ N. Show that the Koopman operator T = Tϕa

induced by the rotation ϕa is not mean ergodic on the space B(T) of bounded, Borelmeasurable functions endowed with sup-norm. (Hint: Take a 0-1-sequence cn∞

n=1which is not Cesaro convergent and consider the characteristic function of the setan : cn = 1.)

8. Let K,L be compact spaces and let ϕ : K → L be continuous. Let T := Tϕ :C(L)→ C(K) be the associated Koopman operator. Show that T ′µ = ϕ∗µ is thepush-forward measure for every µ ∈M(K). Then show that |T ′µ| ≤ T ′ |µ| for everyµ ∈M(K), e.g., by using the definition of the modulus of a measure in AppendixB.9.


10. Let α ∈ [0,1) be an irrational number and consider the shift ϕα : x 7→ x +α (mod1) on [0,1]. Show that for the point spectrum of the Koopman operatorT = Lα on L2[0,1] we have σp(T ) = e2πimα : m ∈ Z. (Hint: Use a Fourier expan-sion as in the proof of Proposition 7.15.)

11. Let (K;ϕ) be a TDS and let L ⊆ K be a compact subset with ϕ(L) ⊆ L. Let R :C(K)→ C(M) be the restriction operator. Show that the dual operator R′ : M(L)→M(K) is an isometric lattice homomorphism satisfying

R′(M1ϕ(L)) = µ ∈M1

ϕ(K) : supp(µ)⊆ L.

(Hint: First use Tietze’s theorem to show that R′ is an isometry; then use Exercise 8and the isometry property of R′ to show that |R′µ|= |µ| for every µ ∈M(L); finallyuse Proposition 5.6 to show that R′(Mϕ(L))⊆Mϕ(K).)

12. Show the following result:

Proposition. Let (Ks;ϕ) be the maximal surjective subsystem of a TDS (K;ϕ) (seeCorollary 2.24), and let R : C(K)→C(Ks) be the restriction mapping. Then the dualoperator R′ : M(Ks)→M(K) restricts to an isometric lattice isomorphism

R′ : Mϕ(Ks)→Mϕ(K).

Chapter 11The Pointwise Ergodic Theorem

While von Neumann’s mean ergodic theorem is powerful and far reaching, it doesnot actually solve the original problem of establishing that “time mean equals spacemean” for a given MDS (X;ϕ). For this we need the pointwise limit

limn→∞

1n

n−1

∑j=0

f (ϕ j(x))

for states x ∈ X and observables f : X → R. Of course, by Theorem 8.10 this limitequals the “space mean”

∫X f dµ for each observable f only if the system is er-

godic. Moreover, due to the presence of null-sets we cannot expect the convergenceto hold for all points x ∈ X . Hence we should ask merely for convergence almosteverywhere.

Shortly after and inspired by J. von Neumann’s mean ergodic theorem,G.D. Birkhoff in [Birkhoff (1931)] proved the desired result.

Theorem 11.1 (Birkhoff). Let (X ,Σ ,µ;ϕ) be an MDS and let f ∈ L1(X). Thenthe limit

limn→∞

1n

n−1

∑j=0

f (ϕ j(x))

exists for µ-almost every x ∈ X.

Birkhoff’s theorem is also called the Pointwise (or: Individual) Ergodic Theorem.Using the Koopman operator T = Tϕ and its Cesaro averages An = An[T ] we obtainby Birkhoff’s theorem that for every f ∈ L1(X), the sequence (An f )n∈N convergespointwise µ-almost everywhere. Since we already know that T is mean ergodic, theCesaro averages An f converge in L1-norm to the projection onto the fixed spaceof T . Hence Birkhoff’s theorem in combination with Theorem 8.10 (v) implies thefollowing characterisation of ergodic MDSs.

163

164 11 The Pointwise Ergodic Theorem

Corollary 11.2. An MDS (X;ϕ), X = (X ,Σ ,µ), is ergodic if and only if for every(“observable”) f ∈ L1(X) one has “time mean equals space mean”, i.e.,

limn→∞

1n

n−1

∑j=0

f (ϕ j(x)) =∫

Xf dµ

for µ-almost every (“state”) x ∈ X.

As in the case of von Neumann’s theorem, Birkhoff’s result is operator theoreticin nature. We therefore proceed as in Chapter 8 and use an abstract approach.

11.1 Pointwise Ergodic Operators

Definition 11.3. Let X be a measure space and let 1 ≤ p ≤ ∞. A bounded linearoperator T on Lp(X) is called pointwise ergodic if the limit

limn→∞

An f = limn→∞

1n

n−1

∑j=0

T j f

exists µ-almost everywhere for every f ∈ Lp(X).

Birkhoff’s theorem says that Koopman operators arising from MDSs are pointwiseergodic. We shall obtain Birkhoff’s theorem from a more general result which goesback to [Hopf (1954)] and [Dunford and Schwartz (1958)]. Recall that an operatorT on L1(X) is called a Dunford–Schwartz operator (or a complete contraction) ifone has

‖T f‖1 ≤ ‖ f‖1 and ‖T f‖∞≤ ‖ f‖

∞

for all f ∈ L∞∩L1. The Hopf–Dunford–Schwartz result then reads as follows.

Theorem 11.4 (Pointwise Ergodic Theorem). Let X be a measure space and letT be a positive Dunford–Schwartz operator on L1(X). Then T is pointwise ergodic.

Before we turn to the proof of Theorem 11.4 let us discuss some further results.Dunford and Schwartz [Dunford and Schwartz (1958)] have shown that one can

omit the condition of positivity from Theorem 11.4. This is due to the fact that fora Dunford– Schwartz operator T there always exists a positive Dunford-Schwartzoperator S≥ 0 dominating it, in the sense that |T f | ≤ S | f | for all f ∈ L1(X). On theother hand, a general positive contraction on L1(X) need not be pointwise ergodic. Afirst example was given in [Chacon (1964)]. Shortly after, A. Ionescu Tulcea provedeven that the class of positive isometric isomorphisms on L1[0,1] which are notpointwise ergodic is “rich” in the sense of category [IonescuTulcea (1965)].

One can weaken the condition on L∞-contractivity, though. E.g., Hopf has shownthat in place of T 1 ≤ 1 (cf. Exercise 8.9) one may suppose that there is a strictly

11.1 Pointwise Ergodic Operators 165

positive function f such that T f ≤ f [Krengel (1985), Theorem 3.3.5]. For generalpositive L1-contractions there is the following result from [Krengel (1985), Theorem3.4.9].

Theorem 11.5 (Stochastic Ergodic Theorem). Let X be a finite measure spaceand let T be a positive contraction on L1(X). Then the Cesaro averages (An[T ] f )nconverge in measure for every f ∈ L1(X).

If we pass to Lp spaces with 1 < p < ∞, the situation improves. Namely, buildingon [IonescuTulcea (1964)] Akcoglu established in [Akcoglu (1975)] the followingcelebrated result.

Theorem 11.6 (Akcoglu’s Ergodic Theorem). Let X be a measure space and letT be a positive contraction on Lp(X) for some 1 < p < ∞. Then T is pointwiseergodic.

For p = 2 and T self-adjoint, this is due to Stein and has an elementary proof,see [Stein (1961b)]. In the general case the proof is quite involved and beyond thescope of this book, see [Krengel (1985), Section 5.2] or [Kern et al. (1977)] and[Nagel and Palm (1982)]. Burkholder [Burkholder (1962)] has shown that if p = 2,the condition of positivity cannot be omitted. (The question whether this is true alsofor p 6= 2 seems still to be open.)

Let us return to Theorem 11.4 and its proof. For simplicity, suppose that X =(X ,Σ ,µ) is a finite measure space, and T a Dunford–Schwartz operator on X. ByTheorem 8.24 we already know that T is mean ergodic, whence

L1(X) = fix(T )⊕ ran(I−T ).

Since L∞(X) is dense in L1(X), the space

F := fix(T )⊕ (I−T )L∞(X)

is still dense in L1(X). As before, we write An := An[T ] for n ∈ N.

Lemma 11.7. In the situation described above, the sequence (An f )n∈N is ‖·‖∞

-convergent for every f ∈ F.

Proof. Write f = g+(I−T )h with g ∈ fix(T ) and h ∈ L∞(X). Then as already seenbefore, An f = g+(1/n)(h−T nh), and since supn ‖T nh‖

∞≤ ‖h‖

∞, it follows that

‖An f −g‖∞→ 0 for n→ ∞.

By the lemma we have a.e.-convergence of (An f )n∈N for every f from the densesubspace F of L1(X). What we need is a tool that allows us to pass from F to itsclosure. This is considerably more difficult than in the case of norm convergence,and will be treated in the next section.


11.2 Banach’s Principle and Maximal Inequalities

Let X = (X ,Σ ,µ) be a measure space, 1≤ p≤ ∞, and let (Tn)n∈N be a sequence ofbounded linear operators on E := Lp(X). The statement “limn→∞ Tn f exists almosteverywhere” can be reformulated as

limsupk,l→∞

|Tk f −Tl f | := infn∈N

supk,l≥n|Tk f −Tl f |= 0, (11.1)

where suprema and infima are taken in the complete lattice L0 := L0(X;R) (seeSection 7.2).

If (11.1) is already established for f from some dense subspace F of E, onewould like to infer that it holds for every f ∈ E. To this purpose we consider theassociated maximal operator T ∗ : E→ L0 defined by

T ∗ f := supn∈N|Tn f | ( f ∈ E).

Note that T ∗ f ≥ 0 and T ∗(α f ) = |α|T ∗ f for every f ∈ E and α ∈C. Moreover, theoperator T ∗ is only subadditive, i.e., T ∗( f +g)≤ T ∗ f +T ∗g for all f ,g ∈ E.

Definition 11.8. We say that the sequence of operators (Tn)n satisfies an (abstract)maximal inequality if there is a function c : (0,∞)→ [0,∞) with limλ→∞ c(λ ) = 0such that

µ [T ∗ f > λ ]≤ c(λ ) (λ > 0, f ∈ E, ‖ f‖ ≤ 1).

The following result shows that an abstract maximal inequality is exactly whatwe need.

Proposition 11.9 (Banach’s Principle). Let X = (X ,Σ ,µ) be a measure space,1 ≤ p < ∞, and let (Tn)n∈N be a sequence of bounded linear operators on E =Lp(X). If the associated maximal operator T ∗ satisfies a maximal inequality, thenthe set

F :=

f ∈ E : (Tn f )n∈N is a.e.-convergent

is a closed subspace of E.

Proof. Since the operators Tn are linear, F is a subspace of E. To see that it is closed,let f ∈ E and g ∈ F . For any natural numbers k, l we have

|Tk f −Tl f | ≤ |Tk( f −g)|+ |Tkg−Tlg|+ |Tl(g− f )| ≤ 2T ∗( f −g)+ |Tkg−Tlg| .

Taking the limsup in L0 with k, l→ ∞ (see (11.1)) one obtains

h := limsupk,l→∞

|Tk f −Tl f | ≤ 2T ∗( f −g)+ limsupk,l→∞

|Tkg−Tlg|= 2T ∗( f −g)

since g ∈ F . For λ > 0 we thus have [h > 2λ ]⊆ [T ∗( f −g)> λ ] and hence

11.2 Banach’s Principle and Maximal Inequalities 167

µ [h > 2λ ]≤ µ [T ∗( f −g)> λ ]≤ c(

λ

‖ f −g‖

).

If f ∈F we can make ‖ f −g‖ arbitrarily small, and since limt→∞ c(t) = 0, we obtainµ [h > 2λ ] = 0. Since λ > 0 was arbitrary, it follows that h = 0. This shows thatf ∈ F , whence F is closed.

Maximal Inequalities for Dunford–Schwartz Operators

In the following X = (X ,Σ ,µ) denotes a general measure space, and T denotes apositive Dunford–Schwartz operator. By Theorem 8.23, T is contractive for the p-norm on L1∩Lp for each 1≤ p≤ ∞, and by a standard approximation T extends ina coherent way to a positive contraction on each space Lp(X) for 1≤ p < ∞.

Since the measure is not necessarily finite, it is unclear, however, whether Textends to L∞. What we will use are the following simple observations.

Lemma 11.10. Let 1≤ p < ∞ and 0≤ f ∈ Lp(X;R) and λ > 0. Then the followingassertions hold.

a) µ [ f > λ ]≤ λ−p ‖ f‖pp < ∞.

b) ( f −λ )+ ∈ L1∩Lp.

c) T f −λ ≤ T ( f −λ )+.

Proof. a) Let A :=[ f > λ ]. Then λ p1A ≤ f p1A, and integrating proves the claim.For b) use the same set A to write

( f −λ )+ = ( f −λ )1A = f 1A−λ1A

Since µ(A) < ∞, the claim follows. Finally note that | f − ( f −λ )+| ≤ λ , which iseasily checked by distinguishing what happens on the sets A and Ac. Since T is aDunford–Schwartz operator we obtain

T f −T ( f −λ )+ ≤∣∣T ( f − ( f −λ )+)

∣∣≤ λ ,

and c) is proved.

We now turn to the so-called “maximal ergodic theorem”. For 0 ≤ f ∈ Lp (with1≤ p < ∞) and λ > 0 we write

Sk f := ∑k−1j=0T j f , A∗n f = max

1≤k≤nAk f , Mλ

n f := max1≤k≤n

(Sk f − kλ ).

Then[A∗n f > λ ] =

[Mλ

n f > 0]⊆⋃n

k=1[Sk f > kλ ]

has finite measure and (Mλn f )+ ∈ L1∩Lp, by Lemma 11.10.a) and b) above.


Theorem 11.11 (Maximal Ergodic Theorem). Let T be a positive Dunford–Schwartz operator on some space L1(X), and let p ∈ [1,∞) and 0 ≤ f ∈ Lp(X).Then for each λ > 0

µ [A∗n f > λ ]≤ 1λ

∫[A∗n f>λ ]

f dµ.

Proof. Fix 2≤ k ≥ n. Then by Lemma 11.10.c)

Sk f − kλ = f −λ +T Sk−1 f − (k−1)λ ≤ f −λ +T (Sk−1 f − (k−1)λ )+

≤ f −λ +T (Mλn f )+.

Taking the maximum we obtain Mλn f ≤ f −λ +T (Mλ

n f )+. Now we integrate andestimate∫

X(Mλ

n f )+ dµ =∫[Mλ

n f>0]Mλ

n f dµ ≤∫[Mλ

n f>0]f −λ dµ +

∫X

T (Mλn f )+ dµ

≤∫[Mλ

n f>0]f −λ dµ +

∫X(Mλ

n f )+ dµ

by the L1-contractivity of T . It follows that

λ µ [A∗n f > λ ] = λ µ[Mλn f > 0]≤

∫[Mλ

n f>0]f dµ =

∫[A∗n f>λ ]

f dµ,

which concludes the proof.

Corollary 11.12 (Maximal Inequality). Let T be a positive Dunford–Schwartzoperator on some L1(X), and let p ∈ [1,∞) and f ∈ Lp(X). Then

µ [A∗ f > λ ]≤ λ−p ‖ f‖p

p (λ > 0).

Proof. By the Maximal Ergodic Theorem 11.11 and Holder’s inequality

µ [A∗n | f |> λ ]≤ 1λ

∫[A∗n| f |>λ ]

f dµ ≤ 1λ‖ f‖p µ [A∗n | f |> λ ]1/p′ ,

which leads to

µ [A∗n | f |> λ ]1/p ≤‖ f‖p

λ.

Now, since T is positive, each An is positive and hence max1≤k≤n |Ak f | ≤ A∗n | f |. Itfollows that

µ

[max

1≤k≤n|Ak f |> λ

]≤ µ [A∗n | f |> λ ]≤ λ

−p ‖ f‖pp .

We let n→ ∞ and the claim is proved.

11.2 Banach’s Principle and Maximal Inequalities 169

We remark that there is a better estimate in the case 1 < p < ∞, see Exercise7. However, we shall not need that in the following, where we turn to the proof ofTheorem 11.4.

Proof of the Pointwise Ergodic Theorem

Let, as before, T be a positive Dunford–Schwartz operator on some L1(X). If themeasure space is finite we saw in Lemma 11.7 that An f converges almost every-where for all f from the dense subspace F = fix(T )⊕ (I− T )L∞, so by virtue ofBanach’s principle and the Maximal Ergodic Theorem, also for each f ∈ L1(X).

In the general case, consider the assertion

limn→∞

An f exists pointwise almost everywhere. (11.2)

We shall establish (11.2) for larger and larger classes of functions f by virtue ofBanach’s principle. Note that T is coherently defined simultaneously on all spacesLp with 1≤ p < ∞. If we want to explicitly consider its version on Lp we shall writeTp; note again that Tp is a contraction.First of all, it is clear that (11.2) holds if T f = f . It also holds if f ∈ (I−T )(L1∩L∞),because then An f → 0 uniformly by the L∞-contractivity of T (cf. Lemma 11.7).Next, let 1 < p < ∞. We know by the Mean Ergodic Theorem that

Lp = fixTp + ran(I−Tp).

Since a maximal inequality holds for Lp (Corollary 11.12) and L1 ∩L∞ is dense inLp, (11.2) holds for all f ∈ Lp. Finally, since a maximal inequality holds for L1

(Corollary 11.12 again) and L2∩L1 is dense in L1, (11.2) holds for all f ∈ L1.

Remark 11.13. The Maximal Ergodic Theorem and a related result, called“Hopf’s Lemma” (Exercise 8) are from [Hopf (1954)] generalising results from[Yosida and Kakutani (1939)]. A slightly weaker form was already obtained by[Wiener (1939)]. Our proof is due to [Garsia (1965)].

The role of maximal inequalities for almost everywhere convergence results isknown at least since [Kolmogoroff (1925)] and is demonstrated impressively in[Stein (1993)]. Employing a Baire category argument one can show that an abstractmaximal inequality is indeed necessary for pointwise convergence results of quitea general type [Krengel (1985), Ch. 1, Theorem 7.2]; a thorough study of this con-nection has been carried out in [Stein (1961a)].Finally, we recommend [Garsia (1970)] for more results on almost everywhere con-vergence.


11.3 Applications

Weyl’s Theorem Revisited

Let α ∈ [0,1]\Q. In Chapter 10 we proved Weyl’s theorem stating that the sequence(nα (mod1))n∈N is equidistributed in [0,1). The mean ergodicity of the involvedKoopman operator with respect to the sup-norm (see Proposition 10.19) implies thefollowing stronger statement.

Corollary 11.14. If α ∈ [0,1]\Q, then for every interval B = [a,b]⊆ [0,1] we have

1n

card

j ∈ [0,n)∩N0 : x+ jα (mod1) ∈ B→ b−a

uniformly for x ∈ [0,1].

The Pointwise Ergodic Theorem accounts for the analogous statement allowingfor general Borel set B in place of just intervals. However, one has to pay the priceof loosing convergence everywhere. Denoting by λ the Lebesgue measure on R weobtain the following result.

Corollary 11.15. If α ∈ [0,1]\Q, then for every Borel set B⊆ [0,1] we have

1n

card

j ∈ [0,n)∩N0 : x+ jα (mod1) ∈ B→ λ (B)

for almost every x ∈ [0,1].


Borel’s Theorem on (Simply) Normal Numbers

A number x ∈ [0,1] is called simply normal (in base 10) if in its decimal expansion

x = 0,x1x2x3 . . . x j ∈ 0, . . . ,9, j ∈ N,

each digit appears asymptotically with frequency 1/10. The following goes back to[Borel (1909)].

Theorem 11.16 (Borel). Almost every number x ∈ [0,1] is simply normal.

Proof. First of all note that the set of numbers with nonunique decimal expansionis countable. Let ϕ(x) =: 10x (mod1) for x ∈ [0,1). As in Example 12.3, the MDS([0,1],λ ;ϕ) is isomorphic to the Bernoulli shift (Section 5.1.5)

B(1/10, . . . ,1/10) = (W +10 ,Σ ,µ;τ).

The isomorphism is induced by the point isomorphism (modulo null sets)

11.3 Applications 171

W +10 → [0,1], (x1,x2, . . .) 7→ 0,x1x2 . . . .

(For the more details on isomorphism we refer to Section 12.1 below.) Sincethe Bernoulli shift is ergodic (Proposition 6.16), the MDS ([0,1],Λ ,λ ;ϕ), Λ theLebesgue σ -algebra, is ergodic as well. Fix a digit k ∈ 0, . . . ,9 and consider

A :=

x ∈ [0,1) : x1 = k= [k/10,(k+1)/10),

so λ (A) = 1/10. Then

card1≤ j ≤ n : x j = kn

=1n

n−1

∑j=0

1A(ϕj(x))→

∫[0,1]

1A dλ = λ (A) = 1/10

for almost every x ∈ [0,1] by Corollary 11.2.

We refer to Exercise 6 for the definition and analogous property of normal numbers.

The Strong Law of Large Numbers

In Chapter 10 we briefly pointed at a connection between the (Mean) Er-godic Theorem and the (weak) law of large numbers. In this section we shallmake this more precise and prove the following classical result, going back to[Kolmogoroff (1933)]. We freely use the terminology common in probability the-ory, cf. [Billingsley (1979)].

Theorem 11.17 (Kolmogorov). Let (Ω ,F,P) be a probability space, and let(Xn)n∈N ⊆ L1(Ω ,F,P) be a sequence of independent and identically distributed realrandom variables. Then

limn→∞

1n(X1 + · · ·+Xn) = E(X1) P-almost surely.

Proof. Since the X j are identically distributed, ν := PX j (the distribution of X j) is aBorel probability measure on R, independent of j, and

E(X1) =∫R

t dν(t)

is the common expectation. Define the product space

X := (X ,Σ ,µ) :=(RN,

⊗N Bo(R),

⊗N ν

).

As mentioned in Chapter 5, the left shift τ is a measurable transformation of (X ,Σ)and µ is τ-invariant. The MDS (X;τ) is ergodic, what can be shown in exactly thesame way as it was done for the Bernoulli shift (Proposition 6.16).

For n∈N let Yn : X→R be the nth projection and write g :=Y1. Then Yj+1 = gτ j

for every j ≥ 0, hence Corollary 11.2 yields that


limn→∞

1n(Y1 + · · ·+Yn) = lim

n→∞

1n ∑

n−1j=0(g τ

j) =∫

Xgdµ

pointwise µ-almost everywhere. Note that g∗µ = ν and hence∫X

gdµ =∫R

t dν(t).

It remains to show that the (Yn)n∈N are in a certain sense “the same” as the origi-nally given (Xn)n∈N. This is done by devising an injective lattice homomorphism

Φ : L0(X;R)→ L0(Ω ,F,P;R)

which carries Yn to Φ(Yn) = Xn for every n ∈ N. Define

ϕ : Ω → X , ϕ(ω) := (Xn(ω))n∈N (ω ∈Ω).

Since (Xn)n∈N is an independent sequence, the push-forward measure satisfiesϕ∗P = µ . Let Φ = Tϕ : f 7→ f ϕ be the Koopman operator induced by ϕ map-ping L0(X;R) to L0(Ω ,F,P;R). The operator Φ is well-defined since ϕ is measure-preserving.

By construction, ΦYn = Yn ϕ = Xn for each n ∈ N. Moreover, Φ is clearly ahomomorphism of lattices (see Chapter 7) satisfying

supn∈N

Φ(

fn)= Φ

(supn∈N

fn)

and infn∈N

Φ(

fn)= Φ( inf

n∈Nfn)

for every sequence ( fn)n∈N ⊆ L0(X;R). Since the almost everywhere convergenceof a sequence can be described in purely lattice theoretic terms involving only count-able suprema and infima (cf. also (11.1)), one has

limn→∞

fn = f µ-a.e. ⇒ limn→∞

Φ( fn) = Φ( f ) P-almost surely.

This, for fn := (1/n)(Y1 + · · ·+Yn) and f := E(X1)1, concludes the proof.

Remark 11.18. By virtue of the same product construction one can show that theMean Ergodic Theorem implies a general weak law of large numbers.

Final Remark: Birkhoff vs. von Neumann

From the point of view of statistical mechanics, Birkhoff’s theorem seems to outrunvon Neumann’s. By virtue of the Dominated Convergence Theorem and the dense-ness of L∞ in L2, the latter is even a corollary of the former (cf. the presentationin [Walters (1982), Corollary 1.14.1]). Reed and Simon take a moderate viewpointwhen they write in [Reed and Simon (1972), p.60] (annotation in square brackets bythe authors).

Exercises 173

This [i.e., the pointwise ergodic] theorem is closer to what one wants to justify [in?] sta-tistical mechanics than the von Neumann theorem, and it is fashionable to say that the vonNeumann theorem is unsuitable for statistical mechanics. We feel that this is an exaggera-tion. If we had only the von Neumann theorem we could probably live with it quite well.Typically, initial conditions are not precisely measurable anyway, so that one could well as-sociate initial states with measures f dµ where

∫f dµ = 1, in which case the von Neumann

theorem suffices. However, the Birkhoff theorem does hold and is clearly a result that weare happier to use in justifying the statement that phase-space averages and time averagesare equal.

However, von Neumann’s theorem inspired the operator theoretic concept of meanergodicity and an enormous amount of research in the field of asymptotics of dis-crete (and continuous) operator semigroups with tantamount applications to variousother fields. Certainly it would be too much to say that Birkhoff’s theorem is over-rated, but von Neumann’s theorem should not be underestimated either.

Exercises

1. Let K be a compact topological space, let T be a Markov operator on C(K) (seeExercise 2), and let µ ∈M1(K) be such that T ′µ ≤ µ . Show that T extends uniquelyto a positive Dunford–Schwartz operator on L1(K,µ).

2. Let X be a measure space, 1≤ p < ∞, and let (Tn)n∈N be a sequence of boundedlinear operators on E = Lp(X). Moreover, let T be a bounded linear operator on E.If the associated maximal operator T ∗ satisfies a maximal inequality, then the set

C :=

f ∈ E : Tn f → T f almost everywhere

is a closed subspace of E.

3. Prove Corollaries 11.14 and 11.15.

4. Let α ∈ [0,1) be an irrational number and consider the shift ϕα : x 7→ x +α (mod1) on [0,1]. Show that for the point spectrum of the Koopman operatorT = Tϕα

on L2[0,1] we have σp(T ) = e2πimα : m ∈ Z. (Hint: Use a Fourier ex-pansion as in the proof of Proposition 7.15.)

5. Let (K;ϕ) be a TDS with Koopman operator T = Tϕ and associated Cesaro aver-ages An = An[T ], n ∈ N. Let µ be a ϕ-invariant probability measure on K. A pointx ∈ K is called generic for µ if

limn→∞

(An f )(x) =∫

Kf dµ (11.3)

for all f ∈ C(K).

a) Show that x ∈ K is generic for µ if (11.3) holds for each f from a densesubset of C(K).


b) Show that in the case that C(K) is separable and µ is ergodic, µ-almostevery x ∈ K is generic for µ . (Hint: Apply Corollary 11.2 to every f from acountable dense set D⊆ C(K).)

6. A number x ∈ [0,1] is called normal (in base 10) if every finite combination(of length k) of the digits 0,1, . . . ,9 appears in the decimal expansion of x withasymptotic frequency 10−k. Prove that almost all numbers in [0,1] are normal.

7 (Dominated Ergodic Theorem). Let X = (X ,Σ ,µ) be a measure space and letf ∈ L0

+(X). Show that ∫X

f dµ =∫

∞

0µ [ f > t ] dt

and derive from this that ‖ f‖pp =

∫∞

0 pt p−1µ [ | f |> t ] dt for all f ∈ L0(X) for 1 ≤p < ∞.

Now let T be a Dunford–Schwartz operator on L1(X) with its Cesaro averages(An)n∈N and the associated maximal operator A∗. Use the Maximal Ergodic Theo-rem to show that

‖A∗ f‖p ≤p

p−1‖ f‖p (1 < p < ∞, f ∈ Lp(X)).

8 (Hopf’s Lemma). For a positive contraction T on L1(X) define

Sn f := ∑n−1j=0 T j f and Mn f := max

1≤k≤nSk f ( f ∈ L1(X;R))

for n ∈ N. Prove that then ∫[Mn f≥0 ]

f dµ ≥ 0

for every f ∈ L1(X;R).(Hint: Replace λ = 0 in the proof of the Maximal Ergodic Theorem 11.11 and realisethat only the L1-contractivity is needed.)

Chapter 12Isomorphisms and Topological Models

In Chapter 10 we showed how a TDS (K;ϕ) gives rise to an MDS by choosing aϕ-invariant measure on K. The existence of such a measure is guaranteed by theKrylov–Bogoljubov theorem. In general there are many invariant measures, but

We also investigated how minimality of the TDS is reflected in properties ofthe constructed MDS. It is our aim now to go in the other direction: To a givenMDS construct some TDS (sometimes called topological model) and an invari-ant measure so that the resulting MDS is isomorphic to the original one. By doingthis, methods from TDS theory will be available, an we may gain further insight toMDSs. Thus, by switching back and forth between the measure theoretic and thetopological situation, we can deepen our understanding of dynamical systems. Inparticular, this procedure will be carried out in Chapter 17.

Let us first discuss the question when two MDSs can be considered as identical, i.e.,isomorphic.

12.1 Point Isomorphisms and Factor Maps

Isomorphisms of topological systems were defined in Chapter 2. The correspondingdefinition for measure-preserving systems is more subtle and is the subject of thepresent section.

Definition 12.1. Let (X;ϕ) and (Y;ψ) be two MDSs. A point factor map or apoint homomorphism is a measure-preserving map θ : X → Y such that

ψ θ = θ ϕ µ-almost everywhere.

We shall indicate this by writing

θ : (X;ϕ)→ (Y;ψ).

175

176 12 Isomorphisms and Topological Models

A point isomorphism is a factor map which is essentially invertible (see Definition6.3). The two MDSs are called point isomorphic if there is a point isomorphismbetween them.

In other words, a measure-preserving map θ : X → Y is a factor map if the diagram

X X

Y Y

-ϕ

?θ

?θ

-ψ

commutes almost everywhere.

Let θ : X → Y be a point isomorphism. Recall that by Lemma 6.4 an essentialinverse η of θ is uniquely determined up to equality almost everywhere and it isalso measure-preserving. Moreover, since θ ϕ = ψ θ almost everywhere and η

is measure-preserving, it follows from Exercise 5.2 that

ϕ η = η θ ϕ η = η ψ θ η = η ψ

almost everywhere. To sum up, if θ is a point isomorphism with essential inverse η ,then η is a point isomorphism with essential inverse θ .

Remarks 12.2. 1) The composition of point factor maps/isomorphisms of MDSsis again a point isomorphism/factor map, see Exercise 1. In particular, it fol-lows that point isomorphy is an equivalence on the class of MDSs.

2) Sometimes an alternative characterisation of point isomorphic MDSs is usedin the literature, e.g., in [Einsiedler and Ward (2011), Definition 2.7]. See Ex-ercise 2.

Now let us give some examples of point isomorphic systems.

Example 12.3 (Doubling Map ∼= Bernoulli Shift). The doubling map

ϕ(x) := 2x (mod1) =

2x, 0≤ x < 1/2;2x−1, 1/2≤ x≤ 1

on [0,1] is point isomorphic to the one-sided Bernoulli shift B(1/2,1/2).

Proof. We define

Ψ : 0,1N→ [0,1], Ψ(x) := ∑∞

j=1x j

2 j .

Then Ψ is a pointwise limit of measurable maps, hence measurable. On the otherhand, let

Φ : [0,1]→0,1N, [Φ(a)]n := b2nac (mod2) ∈ 0,1.

12.1 Point Isomorphisms and Factor Maps 177

In order to see that Φ is measurable and measure-preserving, we fix a1, . . . ,an ∈0,1 and note that

Φ−1[x j = a j : 1≤ j ≤ n ] =

n

∑j=1

a j

2 j +[0,2−n].

The claim follows since the cylinder sets form a generator of the σ -algebra on0,1N. Next, we note that Ψ Φ = id except on the null set 1, since the mapΦ just produces a dyadic expansion of a number in [0,1). Conversely, Φ Ψ = idexcept on the null set

N :=⋃

n≥1

⋂k≥n

[xk = 1 ]⊆ 0,1N

of all 0,1-sequences that are eventually constant 1. Finally, it is an easy compu-tation to show that τ Φ = Φ ϕ everywhere on [0,1].

Example 12.4 (Tent Map ∼= Bernoulli Shift). The tent map

ϕ(x) =

2x, 0≤ x < 1/2;2−2x, 1/2≤ x≤ 1

on [0,1] is point isomorphic to the one-sided Bernoulli shift B(1/2,1/2).

Proof. We define Φ = (Φn)n≥0 : [0,1]→0,1N0 via

Φn(x) :=

0, 0≤ ϕn(x)< 1

2 ,

1 12 ≤ ϕn(x)≤ 1.

Then it is clear that Φ is measurable. By induction on n ∈ N0 one proves that thegraph of ϕn on a dyadic interval of the form [( j− 1)2−n, j2−n] is either a line ofslope 2n starting in 0 and rising to 1 (if j is odd), or a line of slope −2n starting at 1and falling down to 0 (if j is even). Applying ϕ once more, we see that

ϕn+1(x) = 0 if and only if x =

j2n ( j = 0, . . . ,2n).

Again by induction on n ∈ N0, we can prove that given a0, . . . ,an ∈ 0,1 the set

n⋂j=0

[Φ j(x) = a j ] = Φ−1(a0× ·· ·×an×∏

k>n0,1

)is a dyadic interval (closed, half-open or open) of length 2−(n+1). Since the algebraof cylinder sets is generating, it follows that Φ is measure-preserving. Moreover, ifτ denotes the shift on 0,1N0 , then clearly τ Φ = Φ ϕ everywhere, whence Φ

is a point homomorphism.An essential inverse of Φ is constructed as follows. Define


A0n :=

[0≤ ϕ

n ≤ 12

], A1

n :=[ 1

2 ≤ ϕn ≤ 1

](n ∈ N0).

Then there is a one-to-one correspondence between dyadic intervals of length2−(n+1) and finite sequences (a0, . . . ,an) ∈ 0,1n+1 given by

[(k−1)2−(n+1),k2−(n+1)] =n⋂

j=0

Aa jj . (12.1)

Let Ψ : 0,1N0 → [0,1] be defined by

Ψ(a)=∞⋂

j=0

Aa jj .

If x ∈ [0,1], then x ∈ AΦn(x)n by definition of Φ , and hence Ψ(Φ(x)) = x. This shows

that Ψ Φ = id.In order to show that Ψ is measurable, suppose first that a,b are 0−1-sequences

such that a 6= b but Ψ(a) = x = Ψ(b). Then there is a minimal n ≥ 0 such thatbn 6= an. Since x∈ Aan

n ∩Abnn , it follows that ϕn(x) = 1/2. Hence x is a dyadic rational

of the form j2−(n+1) with j odd. Moreover, ϕk(x) = 0 for all k ≥ n+ 2, and thisforces ak = bk = 0 for all k ≥ n+ 2. It follows that a and b both come from thecountable set N of sequences that are eventually constant 0.

Now, given a dyadic interval with associated (a0, . . . ,an) as in (12.1), we have that

a0× . . .an×∏k>n0,1 ⊆Ψ

−1

(n⋂

j=0

Aa jj

)⊆ N∪

(a0× . . .an×∏

k>n0,1

).

Since N is countable, every subset of N is measurable, and so Ψ−1(⋂n

j=0 Aa jj

)is

measurable. And since dyadic intervals form a ∩-stable generator of the Borel σ -algebra of [0,1], Ψ is measurable.

Finally, suppose that b := Φ(Ψ(a)) 6= a for some a∈ 0,1N0 . Then, since Ψ Φ =id everywhere, we have Ψ(b) =Ψ(a) and hence a ∈ N as seen above. This meansthat [Φ Ψ 6= id ]⊆ N is a null set, whence Φ Ψ = id almost everywhere.

Example 12.5. The baker’s transformation

ϕ : X → X , ϕ(x,y) :=

(2x,y/2), 0≤ x < 1/2;(2x−1,(y+1)/2), 1/2≤ x≤ 1

on X = [0,1]× [0,1] is isomorphic to the two-sided Bernoulli B(1/2,1/2)-shift.The proof is left as Exercise 3.

12.2 Algebra Isomorphisms of Measure Preserving Systems 179

We indicated in Chapter 6 that for the study of an MDS (X;ϕ), the relevant featureis the action of ϕ on the measure algebra and not on the set X itself. This motivatesa notion of “isomorphism” that is based on the measure algebra, too.

12.2 Algebra Isomorphisms of Measure Preserving Systems

Recall the notions of a lattice and a lattice homomorphism from Section 7.1. Alattice (V,≤) is called a distributive lattice ‘ if the distributive laws

x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z), x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)

hold for all x,y,z ∈ V . If V has a greatest element > and least element ⊥, then acomplement of x ∈V is an element y ∈V such that

x ∨ y => and x ∧ y =⊥ .

A Boolean algebra is a distributive lattice (V,≤) with > and ⊥ such that every el-ement has a complement. Such a complement is then unique, see [Birkhoff (1948),Thm. X.1], and is usually denoted by xc. An (abstract) measure algebra is a com-plete Boolean algebra V together with a map µ : V → [0,1] such that

1) µ(>) = 1,

2) µ(x) = 0 if and only if x =⊥,

3) µ is σ -additive in the sense that

(xn)n∈N ⊆V, xn ∧ xm =⊥ (n 6= m) ⇒ µ

(∨n∈Nxn

)= ∑n∈Nµ(xn).

A lattice homomorphism Θ : V →W of (abstract) measure algebras (V,µ),(W,ν)is called a (measure algebra) homomorphism if ν(Θ(x)) = µ(x) for all x ∈V .

By the results of Chapter 6.1, the measure algebra Σ(X) associated with a measurespace X is an abstract measure algebra (see also Example 7.1.4). Moreover, the mapϕ∗ acts as a measure algebra homomorphism on Σ(X). In this way, (Σ(X),µ;ϕ∗)can be viewed as a “measure algebra dynamical system” and one can form the cor-responding notion of a homomorphism of such systems. This leads to the followingdefinition.

Definition 12.6. An (algebra) homomorphism of two MDSs (X;ϕ) and (Y;ψ) is ahomomorphism

Θ : (Σ(Y),µY)→ (Σ(X),µX)

of the corresponding measure algebras such that Θ ψ∗ = ϕ∗ Θ , i.e., the diagram


Σ(X) Σ(X)

Σ(Y) Σ(Y)

-ϕ∗

-ψ∗

6Θ

6Θ

is commutative. An (algebra) isomorphism is a bijective homomorphism. TwoMDSs are (algebra) isomorphic if there exists an (algebra) isomorphism betweenthem.

Note that by Exercise 5 each measure algebra homomorphism is an embedding,isometric with respect to the canonical metric coming from the measure(s).

Remark 12.7. Let θ : (X;ϕ)→ (Y;ψ) be a point factor map. Then the associatedmap

θ∗ : (Σ(Y),µY;ψ

∗)→ (Σ(X),µX;ϕ∗)

(see Section 6.1) is a measure algebra homomorphism. If θ is a point isomorphism,then θ ∗ is an algebra isomorphism.

The following example shows that not every algebra isomorphism is induced bya point isomorphism as in Remark 12.7.

Example 12.8. Take X = 0, Σ = /0,X, µ(X) = 1, ϕ = id and (Y;ψ) with Y =0,1, Σ ′ = /0,Y, µY(Y ) = 1, ψ = id. The two MDSs are not point isomorphic,but (algebra) isomorphic.

Let us call two probability spaces X and Y point isomorphic or algebra isomor-phic if (X;idX ) and (Y;idY ) are point isomorphic or algebra isomorphic, respec-tively. Example 12.8 shows that these concepts may differ in general. In certain sit-uations, however, the distinction between “algebra isomorphic” and “point isomor-phic” is unnecessary. Indeed, J. von Neumann proved in [von Neumann (1932a)]that two compact metric (Borel) probability spaces are algebra isomorphic if andonly if they are point isomorphic. (For a proof and further reading we refer to[Royden (1963), Sec. 14.5] or Bogachev [Bogachev (2007), Ch. 9]). From this onecan deduce the following classical result.

Theorem 12.9 (Von Neumann). Two MDSs on compact metric probability spacesare isomorphic if and only if they are point isomorphic.

Markov Embeddings

In the remaining part of this section we shall see that (measure) algebra homomor-phisms correspond in a natural way to certain operators on the associated L1-spaces.

Definition 12.10. An operator S : L1(X)→ L1(Y) is called a Markov operator if

12.2 Algebra Isomorphisms of Measure Preserving Systems 181

S≥ 0, S1 = 1, and∫

YS f =

∫X

f for all f ∈ L1(X).

If, in addition, one has

|S f |= S | f | for every f ∈ L1(Y).

then S is called a Markov embedding.

Note that S is a Markov operator if and only if S ≥ 0, S1 = 1 and S′1 = 1. AMarkov operator is a Markov embedding if it is a homomorphism of Banach lattices.It is easy to see that in this case ‖S f‖1 = ‖ f‖1 for each f ∈ L1, so each Markovembedding is an isometry.

Markov operators form the basic operator class in ergodic theory and shall playa central role below. We shall look at them in more detail in Chapter 13. For nowwe only remark that of course each Koopman operator associated with an MDS is aMarkov embedding.A Markov embedding S is called a Markov isomorphism or an isometric latticeisomorphism if it is surjective. In this case, S is bijective and S−1 is a Markov em-bedding, too. The next theorem clarifies the relation between Markov embeddingsand measure algebra homomorphisms.

Theorem 12.11. Each Markov embedding S : L1(Y)→ L1(X) induces a map Θ :Σ(Y)→ Σ(X) via

S1A = 1ΘA for all A ∈ Σ(Y), (12.2)

and Θ is a (measure) algebra homomorphism.Conversely let Θ : Σ(Y)→ Σ(X) be a measure algebra homomorphism. Then thereis a unique bounded operator S : L1(Y)→L1(X) satisfying (12.2), and S is a Markovembedding.Finally, Θ is surjective if and only if S is a Markov isomorphism.

Proof. As S is a positive operator, it maps real-valued functions to real-valued func-tions. Now, a moment’s thought reveals that a real-valued function f ∈ L1 is a char-acteristic function if and only if f ∧ (1− f ) = 0. Hence S maps characteristic func-tions on Y to characteristic functions on X , which gives rise to a map

Θ : Σ(X)→ Σ(Y) with S1A = 1ΘA (A ∈ Σ(Y)).

It then is easy to see that Θ is a measure algebra homomorphism (Exercise 6).For the converse, let Θ : Σ(Y)→ Σ(X) be a measure algebra homomorphism. Fora step function f on Y we choose a representation

f = ∑nj=1α j1A j

with pairwise disjoint A j ∈ ΣY, µY(A j)> 0, and α j ∈ C. Then we define

S f := ∑nj=1α j1ΘA j .


From the properties of Θ it follows in a standard way that S is well-defined andsatisfies S1 = 1,

∫X S f =

∫Y f , |S| = S | f | and hence ‖S f‖1 = ‖ f‖1 all step func-

tions f . By approximation, S extends uniquely to an isometry of the L1-spaces. Thisextension is clearly a Markov embedding.Finally, if Θ is surjective, then the set Step(X) of step functions is contained in therange of S, and since S is an isometry and Step(X) is dense in L1(X), S is surjective.Conversely, if S is surjective, it is bijective and S−1 is also a Markov embedding.Hence if B ∈ Σ(X) then S−11B = 1A for some A ∈ Σ(Y), and therefore Θ(A) = B.Consequently, Θ is surjective.

Remark 12.12. If (X;ϕ) is a measure-preserving system, then its Koopman oper-ator Tϕ on L1(X) is a Markov embedding, and the associated map on the measurealgebra is just ϕ∗. By Definition 6.2, the system is invertible if ϕ∗ is bijective, whichby Theorem 12.11 is the case if and only if the Koopman operator Tϕ is a Markovisomorphism.

Suppose that (X;ϕ) and (Y;ψ) are measure-preserving systems, and Θ ,S is apair of corresponding maps

Θ : (Σ(Y),µY)→ (Σ(X),µX) S : L1(Y)→ L1(X)

related by S1A = 1Θ(A) for A ∈ Σ(Y) as before. Then

STψ 1A = S1ψ∗A = 1Θ(ψ∗A) and Tϕ S1A = Tϕ 1ΘA = 1ϕ∗(ΘA).

Therefore, Θ is an algebra homomorphism of the two MDSs if and only if its asso-ciated operator S : L1(Y)→ L1(X) satisfies STψ = Tϕ S, i.e., if the diagram

L1(X) L1(X)

L1(Y) L1(Y)

-Tϕ

-Tψ

6S

6S (12.3)

commutes. In yet other words: S intertwines the Koopman operators of the systems.In this case, we call S a Markov embedding of the dynamical systems and write

S : (L1(Y);Tψ)→ (L1(X);Tϕ)

The following is an immediate consequence.

Corollary 12.13. Two measure-preserving systems (X;ϕ) and (Y;ψ) are algebraisomorphic if and only if there is a Markov isomorphism S : L1(Y)→ L1(X) thatsatisfies STψ = Tϕ S.

A Markov isomorphisms restricts to a lattice isomorphism between the correspond-ing Lp spaces, whence one can obtain the corresponding characterisation of isomor-phism of MDSs in terms of the Koopman operators on the Lp spaces.

12.3 Topological Models 183

Remark 12.14. Corollary 12.13 should be compared with the following conse-quence of Theorems 4.13 and 7.18:

For two TDSs (K;ϕ) and (L;ψ) the following assertions are equivalent.

(i) (K;ϕ) and (L;ψ) are isomorphic.

(ii) There is an C∗-algebra isomorphism Φ : C(K)→C(L) with Tψ Φ =Φ Tϕ .

(iii) There is an Banach lattice isomorphism Φ : C(K)→ C(L) with Φ1 = 1 andTψ Φ = Φ Tϕ .

12.3 Topological Models

A topological MDS is any MDS of the form (K,µ;ϕ), where (K;ϕ) is a TDSand µ is an invariant (Baire) probability measure. The topological MDS (K,µ;ϕ) iscalled metric if K is endowed with a metric inducing its topology. And it is calledfaithful if supp(µ) = K, i.e., if the canonical map C(K)→ L∞(K,µ) is injective.The following is a consequence of Lemma 10.7.

Lemma 12.15. If (K,µ;ϕ) is a faithful topological MDS, then ϕ(K) = K, i.e.,(K;ϕ) is a surjective TDS.

Theorem 10.2 of Krylov and Bogoljubov tells that every TDS (K;ϕ) has at leastone invariant probability measure, and hence gives rise to at least one topologicalMDS. (By the lemma above, this topological MDS cannot be faithful if (K;ϕ) isnot a surjective system. But even if the TDS is surjective and uniquely ergodic, thearising MDS need not be faithful, as Exercise 9 shows.) Conversely, one may ask:

Is every measure-preserving system (algebra) isomorphic to a topologicalone?

Let us define a (faithful) topological model of an MDS (X;ϕ) to be any (faithful)topological MDS (K,µ;ψ) together with a Markov isomorphism

Φ : (L1(K,µ);Tψ)→ (L1(X);Tϕ)

of dynamical systems. We shall abbreviate this by writing

Φ : (K,µ;ψ)→ (X;ϕ)

in the following. Then the question above can be rephrased as: Does every MDShave a topological model? In the following we shall answer this question.

Suppose that (X;ϕ) is a MDS, with Koopman operator T = Tϕ and let A⊆L∞(X)be a C∗-subalgebra. (Recall that this means that A is norm-closed and conjugationinvariant, and that 1 ∈ A.) By the Gelfand–Naimark Theorem 4.21, there is a com-pact space K and a (unital) C∗-algebra isomorphism Φ : C(K)→ A. By the Rieszrepresentation theorem there is a unique probability measure µ ∈M1(K) such that

184 12 Isomorphisms and Topological Models∫K

f dµ =∫

XΦ f ( f ∈ C(K)). (12.4)

By construction, the measure µ has full support. By Theorem 7.18 one has in addi-tion

|Φ f |= Φ | f | ( f ∈ C(K)), (12.5)

and this yields

‖Φ f‖L1(X) =∫

XΦ | f | =

∫K| f | dµ = ‖ f‖L1(K,µ)

for every f ∈ C(K), i.e., Φ is an L1-isometry. Consequently, Φ extends uniquely toan isometric embedding

Φ : L1(K,µ)→ L1(X)

with range ranΦ = clL1(A), the L1-closure of A. Moreover, it follows by approxi-mation and (12.4) and (12.5) that Φ is a Markov embedding.

So far, we have not looked at the dynamics yet. So suppose in addition that A isTϕ -invariant, i.e., Tϕ(A)⊆ A. Then

Φ−1Tϕ Φ : C(K)→ C(K)

is an algebra homomorphism. Hence, by Theorem 4.13 there is a unique continuousmap ψ : K→ K such that Φ−1Tϕ Φ = Tψ . Moreover, the measure µ is ψ-invariant,since ∫

Kf ψ dµ =

∫K

Φ−1Tϕ Φ f dµ =

∫X

Tϕ Φ f =∫

XΦ f =

∫K

f dµ

for every f ∈ C(K). It follows that (K,µ;ψ) is a faithful topological MDS, andthat Φ : L1(K,µ)→ L1(X) is a Markov embedding that intertwines the Koopmanoperators, i.e., a Markov embedding of the dynamical systems.

12.3 Topological Models 185

L1(X)

L1(K,µ) clL1(A)

C(K) A

C(K) A

K X

K Xϕψ

Tϕ

Tψ

Φ

'

Φ

'

Φ

'

Φ

We have proved the nontrivial part of the following theorem. (The remaining part isleft as Exercise 10.)

Theorem 12.16. Let (X;ϕ) be a measure-preserving system with Koopman opera-tor Tϕ . Then A⊆ L∞(X) is a Tϕ -invariant C∗-subalgebra if and only if there exists afaithful topological MDS (K,µ;ψ) and a Markov embedding Φ : L1(K,µ)→ L1(X)with Tϕ Φ = ΦTψ and such that A = Φ(C(K)).

Let us call an invariant subalgebra A as in Theorem 12.16 full if A is dense inL1(X). In this case

Φ : (L1(K,µ);Tψ)→ (L1(X);Tϕ)

is a Markov isomorphism of dynamical systems, and we have constructed a (faith-ful) topological model of (X;ϕ). For the choice A := L∞(X) we hence obtain adistinguished model of (X;ϕ), which will be studied in more detail in Section 12.4below. For now let us answer the question posed above.

Corollary 12.17. Every measure-preserving system is (algebra) isomorphic to afaithful topological MDS.

Note that in the construction above we can choose each full subalgebra, unique-ness of a model cannot be expected. Rather, one may construct by a diligent choiceof A a model with “nice” properties. Here is a first example for this.

Theorem 12.18 (Metric Models). An MDS (X;ϕ) has a metric model if and onlyif L1(X) is a separable Banach space.

Proof. Let (K,µ;ψ) be a metric model for (X;ϕ). By Theorem 4.7, C(K) is a sep-arable Banach space, and as any dense subset of C(K) is also dense in L1(K,µ), thelatter space must be separable as well.


Conversely, suppose that L1(X) is separable, and let M ⊆ L1(X) be a countabledense set. Since L∞ is dense in L1 we can approximate each member of M by asequence in L∞, and hence we may suppose without loss of generality that 1 ∈M ⊆ L∞. By passing to

⋃n≥0 T n

ϕ (M) we may further suppose that M is Tϕ -invariant.Let A := clL∞alg(M) be the smallest C∗-subalgebra of L∞ containing M. Then A isseparable (Exercise 11), Tϕ -invariant, and full. Applying Theorem 12.16 yields atopological model Φ : (K,µ;ψ)→ (X;ϕ) with Φ(C(K)) = A. Since Φ : C(K)→ Ais an isomorphism of C∗-algebras, C(K) is a separable Banach space. By Theorem4.7, K is metrisable.

12.4 The Stone Representation

Let (X;ϕ) be any given MDS. Then we can apply the construction preceding The-orem 12.16 to the algebra A = L∞(X), yielding a distinguished topological model(K,µ;ψ) of the whole system (X;ϕ). This MDS is called its Stone representationor its Stone model.

The name derives from an alternative description of the compact space K. Con-sider the measure algebra V := Σ(X), which is a Boolean algebra (see Section 12.2).By the Stone representation theorem (see, e.g., [Birkhoff (1948), Sec. IX.9]) thereexists a unique compact and totally disconnected space KV such that V is isomorphicto the Boolean algebra of all clopen subsets of KV . The compact space KV is calledthe Stone representation space of the Boolean algebra Σ(X), and one can provethat K and KV are actually homeomorphic. By Example 7.1.4 the Boolean algebraΣ(X) is complete, in which case the Stone representation theorem asserts that thespace K ' KV is extremally disconnected. We shall prove this fact directly using theBanach lattice C(K).

Proposition 12.19. The compact space K is extremally disconnected, i.e., the clo-sure of every open set is open.

Proof. We know that L∞(X) and C(K) are isomorphic Banach lattices. Since L∞(X)is order complete (Remark 7.11), so is C(K). Let G ⊆ K be an open set. Considerthe function

f := inf

g : g ∈ C(K), g(x)≥ 1G(x) for all x ∈ K,

where the infimum is to be understood in the order complete lattice C(K), hence f iscontinuous. If x 6∈G, then by Urysohn’s Lemma 4.2 there is a real-valued continuousfunction g vanishing on a small neighbourhood of x and greater than 1 on G, sof (x) = 0. If x ∈ G, then each g appearing in the above infimum is 1 on a fixedneighbourhood of x, so f (x) = 1. Since f is continuous, we have f (x) = 1 for allx ∈ G. This implies f = 1G, hence G is open.

Recall that the measure (positive functional) µ on K is induced via the isomor-phism

12.4 The Stone Representation 187

C(K)∼= L∞(X)

and hence is strictly positive, i.e., has full support supp(µ) = K. Moreover, µ isorder continuous, i.e., if F ⊆ C(K)+ is a ∧ -stable set such that infF = 0, theninf f∈F 〈 f ,µ〉= 0 (see Exercise 11). In the following it will be sometimes convenientto use the regular Borel extension of the (Baire) measure µ .

Lemma 12.20. The µ-null sets in K are precisely the nowhere dense subsets of K.

Proof. We shall use that C(K) is order complete, extremally disconnected and thatµ induces a strictly positive order continuous linear functional on C(K); moreover,we shall use that a characteristic function 1U is continuous if and only if U is aclopen subset of K.Let F ⊆ K be a nowhere dense set. Since a set is nowhere dense if and only if itsclosure is nowhere dense, we may suppose without loss of generality that F is closedand has empty interior. Consider the continuous function f defined as

f := inf

1U : F ⊆U, U is clopen

in the (order complete) lattice C(K). Then trivially f ≥ 0 and we claim that actuallyf = 0. Indeed, if f 6= 0 there is some x ∈ K \F with f (x)> 0 (since K \F is dense).Then the compact sets x and F can be separated by disjoint open sets and byProposition 12.19 we find a clopen set U ⊇ F with x 6∈U . It follows by definition off that f ≤ 1U , which in turn implies that f (x) = 0, a contradiction.We now can use that µ is order continuous and

µ(F)≤ inf

µ(U) : F ⊆U, U is clopen= inf

F⊆U clopen〈1U ,µ〉= 〈 f ,µ〉= 0,

whence F is a µ-null set.For the converse implicaton, suppose that A ∈ Bo(K) is a µ-null set. By regular-ity we find a decreasing sequence (Un)n∈N of open sets containing A such thatlimn→∞ µ(Un) = µ(A) = 0. By what was proved above, the nowhere dense closedsets ∂Un have µ-measure 0, so µ(Un) = µ(Un). The closed set B :=

⋂n∈NUn con-

tains A and satisfies µ(B) = 0 since

µ(B)≤ µ(Un)→ 0 as n→ ∞.

By the strict positivity of µ , B has empty interior, whence A is nowhere dense.

We now can prove the following result, showing in particular that constructingStone models repeatedly does not lead to something new.

Proposition 12.21. The space is L∞(K,µ) is canonically isomorphic to C(K), i.e.,the canonical map C(K)→ L∞(K,µ) is bijective. In other words, every equivalenceclass f ∈ L∞(K,µ) contains a (unique) continuous function.

Proof. Since µ is strictly positive, the canonical map J : C(K)→ L∞(K,µ) (map-ping each continuous function onto its equivalence class modulo equality µ-almosteverywhere) is an isometry (Exercise 5.11).


Let now A ∈ Bo(K) be arbitrary. By regularity there are open sets Gn ⊇ A so thatfor H =

⋂n Gn we have A ⊆ H and µ(H \A) = 0. Since ∂Gn is nowhere dense, by

Lemma 12.20 it is of µ-measure 0, and we may suppose that Gn are clopen, whenceH is closed. Also by regularity there are compact sets Kn ⊆ A so that for F =

⋃n Kn

we have F ⊆ A and µ(A\F) = 0. Since ∂Kn is nowhere dense, we obtain the openset G =

⋃n Kn with µ(A \G) = 0. From G ⊆ A ⊆ H and from the fact that H is

closed we conclude that G is a clopen set with ν(G4A) = 0. Therefore 1A coincidesµ-almost everywhere with the continuous function 1G.Since step functions are dense in L∞(K,µ) and J is an isometry, J is an isomorphism.

The next question is whether ergodicity of (X;ϕ) reappears in its Stone repre-sentation. We note the following result.

Proposition 12.22. Let (X;ϕ) be an MDS, let A ⊆ L∞(X) be an invariant C∗-subalgebra, and let (K,µ;ψ) be a faithful topological MDS and

Φ : (L1(K,µ),Tψ)→ (L1(X),Tϕ)

a Markov embedding of dynamical systems with Φ(C(K)) = A. Then (K,µ;ψ) isstrictly ergodic if and only if A∩fix(Tϕ) = C1 and Tϕ is mean ergodic on A.

Proof. By construction Tψ is mean ergodic on C(K) if and only if Tϕ is mean er-godic on A. The fixed spaces are related by

fix(Tψ ∩C(K)) = Φ−1(fix(Tψ)∩A).

Hence, the assertions follows from Theorem 10.6 and the fact that µ is strictly pos-itive.

Note that by Corollary 10.9 a strictly ergodic TDS is minimal. If we apply Propo-sition 12.22 to A = L∞(X), we obtain the following result.

Corollary 12.23. If an MDS (X;ϕ) is ergodic and its Koopman operator is meanergodic on L∞(X), then its Stone representation TDS is strictly ergodic (and henceminimal).

In the next section we shall see that mean ergodicity on L∞ is a very strong as-sumption: Essentially no interesting MDS has this property. However, by virtue ofProposition 12.22 one may try to find a strictly ergodic model of a given ergodicsystem (X;ϕ) by looking at suitable full subalgebras A of L∞(X). This is the topicof the following section.

12.5 Mean Ergodic Subalgebras

In Chapter 8 we showed that a Koopman operator of an MDS (X;ϕ) is always meanergodic on Lp(X) for 1 ≤ p < ∞. In this section we are interested in the missing

12.5 Mean Ergodic Subalgebras 189

case p = ∞. However, mean ergodicity cannot be expected on the whole of L∞. Infact, the Koopman operator of an ergodic rotation on the torus is not mean ergodicon L∞(T,dz), see Exercise 12. The next proposition shows that this is no exception.

Proposition 12.24. For an ergodic MDS (X;ϕ) the following assertions are equiv-alent.

(i) The Koopman operator Tϕ is mean ergodic on L∞(X).

(ii) L∞(X) is finite dimensional.

Proof. Since a finite-dimensional Banach space is reflexive, by Theorem 8.22 everypower-bounded operator thereon is mean ergodic. Therefore the implication (ii)⇒(i) follows.For the converse we use the Stone representation TDS (K;ψ) with its (unique) in-variant measure µ . This TDS is minimal by Corollary 12.23. For an arbitrary x ∈ K,the orbit

orb+(x) =

ψn(x) : n ∈ N0

is dense in K, so by Lemma 12.20 µ(orb+(x)) > 0. Since the orbit is an at mostcountable set, there is n ∈ N0 such that the singleton ψn has positive measureα := µψn(x)> 0. But then, by the ψ-invariance of µ ,

µψn+k(x)= µ

[ψ

k ∈ ψn+k(x)]≥ µψn(x)= α

for each k ≥ 0. Since the measure is finite, the orbit orb+(x) must be finite. Butsince it is dense in K, it follows that K = orb+(x) is finite. Consequently, L∞(K,µ)is finite-dimensional.

Having seen that T = Tϕ is (in general) not mean ergodic on L∞(X), we look fora smaller (but still full) subalgebra A of L∞(X) on which mean ergodicity is guaran-teed. (Let us call such a subalgebra A a mean ergodic subalgebra.) In the case ofan ergodic rotation on the torus there are natural examples of such subalgebras, e.g.A = C(T) (Corollary 10.12) or A = R(T) := t 7→ f (e2πit) : f ∈ R[0,1], the spaceof Riemann integrable functions on the torus (Proposition 10.19). But, of course,the question arises, whether such a choice is always possible.

In any case, each mean ergodic subalgebra A has to be contained in the closedspace

fix(Tϕ)⊕ (I−Tϕ)L∞(X),

since by Theorem 8.5 this is the largest subspace of L∞ on which the Cesaro averagesof Tϕ converge. Would this be an appropriate choice for an algebra? Here is anotherbad news.

Proposition 12.25. For every ergodic MDS (X;ϕ) the following assertions areequivalent.

(i) lin1⊕ (I−Tϕ)L∞ is a C∗-subalgebra of L∞(X).

(ii) L∞(X) is finite dimensional.


(iii) The Koopman operator Tϕ is mean ergodic on L∞(X).

Proof. By Proposition 12.24 (ii) and (iii) are equivalent. Let A := lin1 ⊕(I−Tϕ)L∞, which is a closed subspace of L∞. Since (X;ϕ) is ergodic, one hasfixTϕ = lin1 by Proposition 7.14, and hence (iii) implies (i).Conversely, suppose that (i) holds. We identify (X;ϕ) with its Stone representation(K,µ;ϕ). Under this identification A is a closed ∗-subalgebra of C(K) containing 1.If we can prove that A separates the points of K, by the Stone–Weierstraß Theorem4.4 it follows that A = C(K), and therefore (iii).

Suppose first that (K;ϕ) has a fixed point x ∈ K. Then f (x) = 0 for every f ∈ran(I− Tϕ) and hence for every f ∈ ran(I− Tϕ) (closure in sup-norm). For f ∈ran(I− Tϕ) ⊆ A we have | f |2 = f f ∈ A and | f |2(x) = 0. Since x is a fixed point,An| f |2(x) = 0 for every n ∈ N. But Tϕ is mean ergodic on A, so the Cesaro meansAn| f |2 converge in sup-norm. By Theorem 8.10, the limit is the constant function∫

K | f |2dµ ·1, hence∫

K | f |2 dµ = 0. From the strict positivity of µ we obtain | f |2 = 0

and hence f = 0. This means that (I− Tϕ)L∞ = 0, i.e., Tϕ = I. By ergodicity,dimL1 = 1, i.e., (ii).

Now suppose that (K;ϕ) does not have any fixed points. We shall show thatalready ran(I−Tϕ) separates the points of K. Take two different points x,y ∈ K. Weneed to find a continuous function f ∈ C(K) such that

f (x)− f (y) 6= f (ϕ(x))− f (ϕ(y)).

Since x 6= ϕ(x) and y 6= ϕ(y) such a function is found easily by an application ofUrysohn’s lemma: If ϕ(x) = ϕ(y), let f be any continuous function separating x andy; and if ϕ(x) 6= ϕ(y), let f ∈ C(K) be such that f (x) = f (ϕ(y)) = 0 and f (y) =f (ϕ(x)) = 1.

After these rather negative results it becomes clear that our task consists infinding “large” subalgebras contained in 〈1〉 ⊕ (I−Tϕ)L∞. This was achieved byR. I. Jewett [Jewett (1970)] (in the weak mixing case, cf. Chapter 9) and W. Krieger[Krieger (1972)].

Theorem 12.26 (Jewett–Krieger).

a) Let (X;ϕ) be an ergodic MDS with Koopman operator Tϕ . Then there existsa Tϕ -invariant C∗-subalgebra of L∞(X), dense in L1(X), on which Tϕ is meanergodic.

b) Every ergodic MDS (X;ϕ) with ΣX countably generated is isomorphic to anMDS determined by a strictly ergodic TDS on a totally disconnected compactmetric space.

The proof is, unfortunately, beyond the scope of this book.

Exercises 191

Exercises

1. Let (X j,Σ j,µ j;ϕ j), j = 1,2,3 be given MDSs, and let θ1 : X1 → X2 and θ2 :X2 → X3 be point factor maps/isomorphisms. Show that θ2 θ1 is a point factormap/isomorphism, too. (Hint: Exercise 6.2.)

2. Prove the following alternative characterisation of point isomorphic MDSs.

Proposition. For MDSs (X;ϕ) and (Y;ψ) the following assertions are equiva-lent.

(i) (X;ϕ) and (Y;ψ) are point isomorphic.

(ii) There are sets A ∈ Σ and A′ ∈ Σ ′ and a bijection θ : A→ A′ such that thefollowing hold.

1) µ(A) = 1 = µ ′(A′),

2) θ is bijective, bi-measurable and measure-preserving,

3) A⊆ ϕ−1(A), A′ ⊆ ψ−1(A′) and θ ϕ = ψ θ on A.

(Hint: For the implication (i)⇒ (ii), define

A :=⋂n≥0

[η θ ϕn = ϕ

n ]∩[ψn θ = θ ϕn ] ∈ Σ

and A′ ∈ Σ ′ likewise, cf. also Exercise 6.3.)

3. a) Consider the Bernoulli shift B(1/2,1/2) = (W +2 ,Σ ,µ;τ) and endow [0,1]

with the Lebesgue measure. Show that the measure spaces (W2,Σ ,µ) and([0,1],Λ ,λ ), Λ the Lebesgue σ -algebra, are point isomorphic. Prove thatB(1/2,1/2) and ([0,1],Λ ,λ ;ϕ), ϕ the doubling map, are point isomorphic(see Example 5.1.2).

b) Prove the analogous statements for the baker’s transformation (Example5.1.1) and the two-sided Bernoulli-shift B(1/2,1/2) = (W2,Σ ,µ;τ).

(Hint: Write the numbers in [0,1] in binary form.)

4. Let ϕ be the tent map and ψ be the doubling map on [0,1]. Show that

ϕ : ([0,1],λ ;ψ)→ ([0,1],λ ;ϕ)

is a point homomorphism of MDSs.

5. a) Let (V,µ) be an abstract measure algebra. Show that

dµ(x,y) := µ((x ∧ yc) ∨ (y ∧ xc) (x,y ∈V )

is a metric on V .

b) Let Θ : (V,µ)→ (W,ν) be a homomorphism of measure algebras. Show thatΘ(⊥) =⊥, Θ(>) => and Θ(xc) =Θ(x)c for all x ∈V . Then show that Θ isisometric for the metric(s) dµ ,dν defined as in a).


c) Let Θ : (V,µ)→ (W,ν) be a surjective homomorphism of measure algebras.Show that Θ is bijective, and Θ−1 is a measure algebra homomorphism aswell.

6. Work out in detail the proof of Theorem 12.11.

7. Show that ergodicity and invertibility of MDSs are (algebra) isomorphism invari-ants.

8. Let (X;ϕ) and (Y;ψ) be two MDSs. Show that if S is a Markov isomorphismof the corresponding L1-spaces, the respective measure algebra isomorphism Θ is aσ -algebra isomorphism, i.e., preserves countable disjoint unions.

9. Let K = Z∪∞ be the one-point compactification of Z and let ϕ : K → K begiven by

ϕ(x) :=

x+1 if x ∈ Z,∞ if x = ∞.

Show that δ∞ is the only ϕ-invariant probability measure on K, so (K;ϕ) is uniquelyergodic, surjective, but not minimal. (Cf. Example 2.27.)

10. Let (X;ϕ) be a measure-preserving system with Koopman operator Tϕ , let(K,µK ;ϕK) be a faithful topological MDS, and let Φ : L1(K,µ) → L1(X) be aMarkov embedding of the dynamical systems. Prove that A := Φ(C(K)) is a Tϕ -invariant C∗-subalgebra. (Hint: Use Theorem 7.18.)

11. Let A be a C∗-algebra let M ⊆ A be countable subset of A, and let B := clalg(M)be the smallest C∗-subalgebra of A containing M. Show that B is separable.

12. Let a∈T with an 6= 1 for all n∈N. Show that the Koopman operator La inducedby the rotation with a is not mean ergodic on L∞(T,dz). (Hint: Consider M := an :n ∈ N and

I :=[ f ] ∈ L∞(T,dz) : f ∈ B(T) vanishes on a neighbourhood of M

.

Show that I is an ideal of L∞(T,dz) with 1 /∈ J := I. Conclude that there is ν ∈L∞(T,dz)′ which vanishes on J and satisfies 〈1,ν〉 = 1. Use this property for T ′nν

and A′nν and then employ a weak*-compactness argument.)

13. Consider the rotation system (T, dz;a) for some a ∈ T.

a) Show that R(T) is a full C∗-subalgebra of L∞(T).b) Show that if the system is ergodic, then there is precisely one normalised

positive invariant functional on R(T) (namely, f 7→∫

f dz).

c) Show that on L∞(T, dz) functionals as in b) abound.

14. Give an example of an MDS (X;ϕ) such that the space L∞(X) is infinite dimen-sional and the Koopman operator Tϕ is mean ergodic thereon.

15. Illustrate the Jewett–Krieger theorem on an irrational rotation on the torus T.

Chapter 13Markov Operators

In this chapter we shall have a closer look at Markov operators introduced in Section12.2 above. In the whole chapter X,Y,Z are given probability spaces. Recall fromDefinition 12.10 that an operator S : L1(X)→ L1(Y) is a Markov operator if itsatisfies

S≥ 0, S1 = 1, and∫

YS f =

∫X

f ( f ∈ L1(X)),

the latter being equivalently expressed as S′1 = 1. We denote the set of all Markovoperators by

M(X;Y) :=

S : L1(X)→ L1(Y) : S is Markov

and abbreviate M(X) := M(X;X). The standard topology on M(X;Y) is the weakoperator topology (see Appendix C.8).

There is an obvious conflict of terminology with Markov operators betweenC(K)-spaces defined in Exercise 10.2. Markov operators between L1-spaces, be-sides being positive and one-preserving also preserve the integral, or equivalently:T ′ as an operator between L∞-spaces is also positive and one-preserving. Hence our“Markov operators” should rather be called bi-Markov operators.

This would also be in coherence with the finite dimensional situation. Namely,if Ω = 1, . . . ,d is a finite set with probability measure µ = 1/d ∑

dj=1 δ j, and

operators on L1(Ω ,µ) = Cd are identified with matrices, then a matrix represents aMarkov operator in the sense of definition above if and only if it is bistochastic (or:doubly stochastic).

However, our definition reflects the common practice in the field, see[Glasner (2003), Definition 6.12], so we stick to it. As operator theorists we regretthe conflicting meanings of the term “Markov operator”, but at least in this book,where Markov operators on C(K)-spaces play a minor role, no problems shouldarise.

193

194 13 Markov Operators

13.1 Examples and Basic Properties

Let us begin with a collection of examples.

Examples 13.1. a) The operator 1⊗1 defined by

(1⊗1) f := 〈 f ,1〉 ·1 =

(∫X

f)

1 ( f ∈ L1(X))

is a Markov operator from L1(X) to L1(Y). It is the unique Markov operatorS ∈M(X;Y) with dimran(S) = 1, and is called the trivial Markov operator.

b) The identity mapping I is a Markov operator, I ∈M(X).

c) Clearly, every Markov embedding as defined in Section 12.2 is a Markovoperator. In particular, every Koopman operator Tϕ associated with an MDS(X;ϕ) is a Markov operator.

The following theorem summarises the basic properties of Markov operators.

Theorem 13.2. a) Composition of Markov operators yields a Markov opera-tor. The set M(X;Y) of all Markov operators is a compact convex subset ofL (L1(X);L1(Y)) with the weak operator topology. In case X = Y it is asubsemigroup.

b) Every Markov operator S ∈M(X;Y) is a Dunford–Schwartz operator, i.e., itrestricts to a contraction on each space Lp(X) for 1≤ p≤ ∞:

‖S f‖p ≤ ‖ f‖p ( f ∈ Lp(X), 1≤ p≤ ∞).

c) The adjoint S′ of a Markov operator S : L1(X)→ L1(Y) extends uniquely toa Markov operator S′ : L1(Y)→ L1(X). The mapping

M(X;Y)→M(Y;X), S 7→ S′

is affine and continuous. Moreover, one has (S′)′ = S. If R ∈M(Y;Z) is an-other Markov operator, then (RS)′ = S′R′.

Proof. Assertion a) is straightforward. For b) note first that S ∈M(X;Y) is an L1-contraction. Indeed, by Lemma 7.5 we have |S f | ≤ S | f |, and integrating yields

‖S f‖1 =∫

Y|S f | ≤

∫Y

S | f |=∫

X| f |= ‖ f‖1

for every f ∈L1. Furthermore, S is an L∞-contraction, since |S f | ≤ S | f | ≤ ‖ f‖∞

S1=‖ f‖

∞1 for every f ∈ L∞. The rest is then Theorem 8.23.

For c), take 0≤ g ∈ L∞(Y). Then∫X

S′g · f =∫

Yg ·S f ≥ 0 whenever 0≤ f ∈ L∞(X).

13.1 Examples and Basic Properties 195

Hence S′ is a positive operator. It follows that |S′g| ≤ S′ |g| (Lemma 7.5) and hence∥∥S′g∥∥

1 =∫

X

∣∣S′g∣∣≤ ∫X

S′ |g|=∫

Y|g| ·S1 =

∫Y|g|= ‖g‖1

for every g ∈ L∞(Y). So S′ is an L1-contraction, and as such has a unique extensionto a contraction S′ : L1(Y)→ L1(X). Since the positive cone of L∞ is dense in thepositive cone of L1, S′ is indeed a positive operator. An easy argument shows that

(S′)′ = S∣∣L∞(X)

,

in particular (S′)′1 = S1 = 1, whence S′ is a Markov operator. The proofs for theremaining assertions are left as Exercise 1.

By definition, the adjoint S′ of a Markov operator S ∈M(X;Y) satisfies

〈S f ,g〉Y =⟨

f ,S′g⟩

X

for f ∈ L1(X) and g ∈ L∞(Y). In order to extend this to other pairs of functions fand g, we need the following lemma.

Lemma 13.3. Let 1≤ p < ∞ and let f ,g ∈ Lp(X). Then f ·g ∈ Lp(X) if and only ifthere are sequences ( fn)n,(gn)n ⊆ L∞(X) such that

‖ fn− f‖p ,‖gn−g‖p→ 0, supn‖ fn ·gn‖p < ∞.

In this case one can choose ( fn)n and (gn)n with the additional properties

| fn| ≤ | f | , |gn| ≤ |g| , ‖ fngn− f g‖p→ 0.

Proof. For one implication, pass to subsequences that converge almost everywhereand then apply Fatou’s lemma to conclude that the product belongs to Lp. For theconverse use the approximation

fn := f ·1[ | f |≤n ], gn := f ·1[ |g|≤n ]

and the dominated convergence theorem.

With the help of Lemma 13.3 we can relate the Markov adjoint and the Hilbertspace adjoint.

Proposition 13.4. Let S ∈M(X;Y) and f ∈ L1(X), g ∈ L1(Y). If g · S | f | ∈ L1(Y),then f ·S′ |g| ∈ L1(X) and one has∫

YS f ·g =

∫X

f ·S′g.

Moreover, S′∣∣L2 =

(S∣∣L2

)∗, the Hilbert space adjoint of S∣∣L2 .


Proof. Take gn ∈ L∞ with |gn| ≤ |g| and gn→ g almost everywhere. Hence gnS f →gS f in L1, by the dominated convergence theorem. On the other hand, we haveS′gn→ S′g and∥∥ f S′gn

∥∥1 ≤

∫X| f |S′ |gn|=

∫Y

(S | f |

)|gn| ≤ ‖g ·S | f |‖1

for all n ∈ N. Hence, by Lemma 13.3, f · S′ |g| ∈ L1(X) and f · S′gn → f · S′g inL1(X). Therefore, 〈S f ,g〉Y = 〈 f ,S′g〉X as claimed. The remaining assertion followsfrom the computation(

S′ f∣∣g)L2 =

∫X

S′ f ·g =∫

Xf ·Sg =

∫X

f ·Sg = ( f |Sg)L2

for f ∈ L2(X) and g ∈ L2(Y).

Remark 13.5. Since L2 is dense in L1, the mapping

M(X;Y)→L(L2(X);L2(Y)

), S 7→ S

∣∣L2

preserves convex combinations, is continuous and injective. Hence we can switchback and forth between the L1-setting and the L2-setting.

Finally in this section we describe a convenient method of constructing Markovoperators, already employed in Section 12.3 above. The simple proof is left as Ex-ercise 2.

Proposition 13.6. . Let K be a compact space and S : C(K)→L1(X) a linear opera-tor satisfying S≥ 0 and S1 = 1. Then there is a unique µ ∈M1(K), namely µ = S′1,such that S extends to a Markov map S∼ : L1(K,µ)→ L1(X). If S is injective, thenµ has full support.

What we mean by “extension” here is of course that S= S∼J, where J : C(K)→L1(K,µ) is the canonical map assigning to f ∈C(K) its equivalence class J f moduloequality µ-almost everywhere. We usually suppress reference to this map and writeagain simply S in place of S∼ when it is safe to do so.

13.2 Embeddings, Factor Maps and Isomorphisms

Recall from Section 12.2 that a Markov operator is an embedding if it is a latticehomomorphism. Let

Emb(X;Y) :=

S : S ∈M(X;Y) is a Markov embedding

denote the set of Markov embeddings. We can now give a comprehensive character-isation.

13.2 Embeddings, Factor Maps and Isomorphisms 197

Theorem 13.7. For a Markov operator S ∈ M(X;Y) the following assertions areequivalent.

(i) S( f ·g) = S f ·Sg for all f ,g ∈ L∞(X).

(ii) S′S = IX.

(iii) There is T ∈M(Y;X) such that T S = IX.

(iv) ‖S f‖p = ‖ f‖p for all f ∈ Lp(X) and some/all 1≤ p < ∞.

(v) |S f |p = S | f |p for all f ∈ L∞(X) and some/all 1≤ p < ∞.

(vi) S ∈ Emb(X;Y), i.e., |S f |= S | f | for all f ∈ L1(X).

Moreover, (ii) implies also that ‖S f‖∞= ‖ f‖

∞for all f ∈ L∞(X).

Proof. If (i) holds, then 〈S f ,Sg〉=∫(S f ) · (Sg) =

∫S( f ·g) =

∫f ·g = 〈 f ,g〉 for all

f ,g ∈ L∞(X). This means that S′S f = f for all f ∈ L∞(X), whence (ii) follows byapproximation. The implication (ii)⇒ (iii) is trivial. If (iii) holds, then for f ∈Lp(X)and 1≤ p≤ ∞ on has ‖ f‖p = ‖T S f‖p ≤ ‖S f‖p ≤ ‖ f‖p, hence ‖S f‖p = ‖ f‖p.

Suppose that (iv) holds for some p ∈ [1,∞) and f ∈ Lp(X). Then 0≤ |S f |p ≤ S | f |pby Theorem 7.19 with g = 1. Integrating yields

‖ f‖pp = ‖S f‖p

p =∫

Y|S f |p ≤

∫Y

S | f |p =∫

X| f |p = ‖ f‖p

p .

Hence |S f |p = S | f |p, i.e., (v) is true. If we start from (v) and use the identity for fand | f |, we obtain

|S f |p = S | f |p =∣∣S | f |∣∣p = (S | f |)p

.

Taking pth roots yields |S f | = S | f | for every f ∈ L∞(X), and by density we obtain(vi). Finally, the implication (vi)⇒ (i) follows from the second part of Theorem7.18.

Remark 13.8. Property (i) can be strengthened to

(i’) S( f ·g) = S f ·Sg for all f ,g ∈ L1(Y) such that f ·g ∈ L1(Y).

This follows from (i) and Lemma 13.3.

Factor Maps

A Markov operator P ∈ M(X;Y) is called a (Markov) factor map if P′ is anembedding. Here is a characterisation of factor maps.

Theorem 13.9. For a Markov operator P ∈ M(X;Y) the following assertions areequivalent.

(i) P is a factor map, i.e., P′ is an embedding.

(ii) PP′ = IY.


(iii) There is T ∈M(Y;X) such that PT = IY.

(iv) P((P′ f ) ·g) = f · (Pg) for all f ∈ L∞(Y), g ∈ L∞(X).

Proof. If (i) holds, then P′ is an embedding, whence I = (P′)′P′ = PP′, i.e., (ii) fol-lows. The implication (ii)⇒ (iii) is trivial. If PT = I then taking adjoints we obtainT ′P′ = I′ = I, hence by Theorem 13.7 we conclude that P′ is an embedding, and thatis (i).Finally, observe that (iv) implies (ii) (just take g = 1 and apply a density argument).On the other hand, if P′ is an embedding, we have by Theorem 13.7.(i),∫

YP(P′ f ·g) ·h =

∫X

P′ f ·g ·P′h =∫

XP′( f h) ·g =

∫Y

f ·Pg ·h.

for f ,h ∈ L∞(Y) and g ∈ L∞(X). This is the weak formulation of (iv).

Remark 13.10. Assertion (iv) can be strengthened to

(iv’) P((P′ f ) ·g) = f · (Pg) ( f ∈ L1(Y), g ∈ L1(X), P′ f ·g ∈ L1(X)).

Proof. By denseness, the claim is true if g∈ L∞(X). In the general case approximategn→ g a.e., with gn ∈ L∞(X) and so that |gn| ≤ |g|. Then P′ f ·gn→ P′ f ·g a.e. and∣∣P′ f ·gn

∣∣≤ ∣∣P′ f ·g∣∣hence P′ f ·gn→ P′ f ·g by the dominated convergence theorem. This implies that

f ·Pgn = P(P′ f ·gn)→ P(P′ f ·g)

in L1(Y). But Pgn→ Pg in L1(Y) and by passing to an a.e.-convergent subsequenceone concludes (iv’).

Isomorphisms

Recall from Section 12.2 that a surjective Markov embedding S ∈M(X;Y) is calledMarkov isomorphism. The next is a characterisation of such embeddings.

Corollary 13.11. For a Markov operator S ∈M(X;Y) the following assertions areequivalent.

(i) There are T1,T2 ∈M(Y;X) such that T1S = IX and ST2 = IY.

(ii) S is an embedding and a factor map.

(iii) S and S′ are both embeddings (both factor maps).

(iv) S is a surjective embedding, i.e., a Markov isomorphism.

(v) S is an injective factor map.

(vi) S is bijective and S−1 is positive.

13.3 Markov Projections 199

In this case S−1 = S′.

Proof. By Theorems 13.7 and 13.9 it is clear that (i)–(iii) are equivalent, and anyone of them implies (iv) and (v). Suppose that (iv) holds. Then S′S = I and S issurjective. Hence it is bijective and S−1 = S′ is a Markov operator, whence (i). Theproof of (v)⇒ (i) is similar. Clearly (i)–(v) all imply (vi). Conversely, suppose that(vi) holds, i.e., S is bijective and S−1 ≥ 0. Then

|S f | ≤ S | f |= S∣∣S−1S f

∣∣≤ SS−1 |S f |= |S f | ,

for each f ∈ L1(X), whence S is an embedding, i.e., (iv).

A Markov automorphism S of X is a self-isomorphism of X. We introduce thefollowing notations

Iso(X;Y) :=

S : S ∈M(X;Y) S is a Markov isomorphism,

Aut(X) :=

S : S ∈M(X) S is a Markov automorphism.

We note that even if S is a bijective Markov operator, its inverse S−1 need not bepositive, hence not a Markov operator. As an example consider the bistochastic ma-trix

S :=(

1/3 2/3

2/3 1/3

)with inverse S−1 =

(−1 2

2 −1

),

which is not positive. This shows that one cannot drop the requirement of positivityof S−1 in (ii) of Corollary 13.11.

13.3 Markov Projections

We turn to our last class of distinguished Markov operators. A Markov operatorQ ∈M(X) is called a Markov projection if it is a projection, i.e., Q2 = Q. Alter-natively, Markov projections are called conditional expectations, for reasons thatwill become clear below (Remark 13.17).

The following lemma characterises Markov projections in terms of the Hilbertspace L2(X).

Lemma 13.12. Every Markov projection Q ∈ M(X) is self-adjoint, i.e., satisfiesQ′ = Q and restricts to an orthogonal projection on the Hilbert space L2(X). Con-versely, if Q is an orthogonal projection on L2(X) such that Q≥ 0 and Q1 = 1, thenit extends uniquely to a Markov projection on L1(X).

Proof. If Q is a Markov projection, it restricts to a contractive projection on theHilbert space H := L2(X). Hence, by Theorem 13.31 in the supplement below, Q isan orthogonal projection and in particular self-adjoint. Then Q = Q′ by Proposition13.4.


Suppose that Q is an orthogonal projection on H, satisfying Q≥ 0 and Q1= 1. ThenQ is self-adjoint and hence for f ∈ L2 one has

‖Q f‖1 =∫

X|Q f | ≤

∫X

Q | f |=∫

X| f | ·Q1 =

∫X| f |= ‖ f‖1 .

By approximation Q extends to a positive contraction on L1 which is clearly aMarkov operator. It follows that Q2 = Q, i.e., Q is a Markov projection.

Corollary 13.13. Every Markov projection Q ∈M(X) is uniquely determined by itsrange ran(Q).

Proof. Let P,Q be Markov projections with ran(P) = ran(Q). Then

ran(P|L2) = ran(P)∩L2 = ran(Q)∩L2 = ran(Q|L2).

Hence P = Q on L2 (Corollary 13.32.c)), and thus on L1 by approximation.

Before we continue, let us note the following useful fact, based on the L2-theory oforthogonal projections.

Corollary 13.14. Let X,Y be probability spaces, and let S ∈ M(X;Y) and Q ∈M(Y) such that Q is a projection and QS is an embedding. Then QS = S.

Proof. Let f ∈ L2(X). Since QS is an embedding, it is isometric on L2, and hence

‖ f‖2 = ‖QS f‖2 ≤ ‖S f‖2 ≤ ‖ f‖2 .

Hence QS f = S f by Corollary 13.32.a) below. Since L2 is dense in L1, QS = Sfollows.

We now aim at a characterisation of Markov projections. Our main auxiliaryresult is the following.

Proposition 13.15. Let X = (X ,Σ ,µ) be a probability space. Then the followingassertions hold.

a) The range F := ran(Q) of a Markov projection Q ∈M(X) is a Banach sub-lattice of L1(X) containing 1.

b) If F is a Banach sublattice of L1(X) containing 1, then

ΣF :=

A ∈ Σ : 1A ∈ F

is a sub-σ -algebra of Σ , and F = L1(X ,ΣF ,µ).

c) If Σ ′ is a sub-σ -algebra of Σ , then the canonical injection

J : L1(X ,Σ ′,µ)→ L1(X ,Σ ,µ), J f = f

is a Markov embedding.

13.3 Markov Projections 201

d) If S : L1(Y)→ L1(X) is a Markov embedding, then Q := SS′ is a Markovprojection with ran(Q) = ran(S).

Proof. a) Obviously 1 = Q1 ∈ F . For f = Q f ∈ F we have

| f |= |Q f | ≤ Q | f | , i.e., Q | f |− | f | ≥ 0.

Therefore, since Q2 = Q and Q is a Markov operator,

0≤∫

X

(Q | f |− | f |

)=∫

XQ(Q | f |− | f |

)=∫

X

(Q2−Q

)| f |= 0.

This implies that | f |= Q | f | ∈ F , whence F is a vector sublattice. Since F is closed(being the range of a bounded projection), it is a Banach sublattice.b) It is easily seen that ΣF is a sub-σ -algebra of Σ . Now, the inclusionL1(X ,ΣF ,µ) ⊆ F is clear, the step functions being dense in L1. For the converseinclusion, let 0 ≤ f ∈ F . Then 1[ f>0 ] = supn∈N n f ∧ 1 ∈ F . Applying this to( f − c1)+ ∈ F yields that 1[ f>c ] ∈ F for each c ∈ R. Hence f is ΣF -measurable.Since F is a sublattice, F ⊆ L1(X ,ΣF ,µ), as claimed.c) is trivial.d) Since S is an embedding, S′S = I (Theorem 13.7) and hence Q2 = (SS′)(SS′) =S(S′S)S′ = SS′ = Q. By Q = SS′ it is clear that ran(Q) ⊆ ran(S), but since QS =SS′S = S, we also have ran(S)⊆ ran(Q) as claimed.

Suppose Q ∈M(X) is a Markov projection. Following the steps a)–d) in Propo-sition 13.15 yields

Q 7→ F = ran(Q) 7→ ΣF 7→ JJ′ = P

and P is a Markov projection with ran(P) = ran(J) = L1(X ,ΣF ,µ) = F = ran(Q).By Corollary 13.13, it follows that P = Q.

If, however, we start with a Banach sublattice F of L1(X) containing 1 and followthe steps from above, we obtain

F 7→ ΣF 7→ Q = JJ′ 7→ ran(Q),

with ran(Q) = ran(J) = L1(X ,ΣF ,µ) = F .

Finally, start with a sub-σ -algebra Σ ′ ⊆ Σ and follow the above steps to obtain

Σ′ 7→ Q = JJ′ 7→ F = ran(Q) = ran(J) = L1(X ,Σ ′,µ) 7→ ΣF .

One has A ∈ ΣF if and only if 1A ∈ L1(X ,Σ ′,µ), which holds if, and only if,µ(A4B) = 0 for some B ∈ Σ ′. Hence Σ ′ = ΣF if and only if Σ ′ is relatively com-plete, which means that it contains all µ-null sets. (Note that ΣF as in Proposition13.15.b) is always relatively complete.)

We have proved the following characterisation result.


Theorem 13.16. Let X = (X ,Σ ,µ) be a probability space. The assignments

Q 7→ P = Q|L2 , Q 7→ ran(Q), Q 7→ Σ′ =

A ∈ Σ : Q1A = 1A

constitute one-to-one correspondences between the following four sets of objects.

1) Markov projections Q on L1(X).

2) Orthogonal projections P on L2(X) satisfying P≥ 0 and P1 = 1.

3) Banach sublattices F ⊆ L1(X) containing 1.

4) Relatively complete sub-σ -algebras Σ ′ of Σ .

In the following we sketch the connection of all these results with probability theory.

Remark 13.17 (Conditional Expectations). Let Σ ′ be a sub-σ -algebra of Σ , andlet Q be the associated Markov projection, i.e., the unique Markov projection Q2 =Q ∈M(X) with

ran(Q) = L1(X ,Σ ′,µ).

Let us abbreviate Y := (X ,Σ ′,µ). Then, for A ∈ Σ ′ and f ∈ L1(X), we have∫A

Q f dµ =∫

X1A · JJ′ f =

∫Y

1A · J′ f =∫

X(J1A) · f =

∫A

f dµ.

This means that Q = E(·|Σ ′) is the conditional expectation operator, common inprobability theory [Billingsley (1979), Section 34].

Let us turn to characterisation of Markov projections in the spirit of Theorems13.7 and 13.9.

Theorem 13.18. For a Markov operator Q ∈ M(X) the following assertions areequivalent.

(i) Q2 = Q, i.e., Q is a Markov projection.

(ii) Q = SS′ for some Markov embedding S.

(iii) Q = P′P for some Markov factor map P.

(iv) Q(Q f ·g) = Q f ·Qg for all f ,g ∈ L∞(X).

Proof. By the duality of embeddings and factor maps, (ii) and (iii) are equivalent.The equivalence of (i) and (ii) follows from Proposition 13.15 and the remarks fol-lowing it. The implication (iii)⇒ (iv) follows from

Q(Q f ·g) = P′P(P′P f ·g) = P′[P(P′(P f ) ·g

)]= P′(P f ·Pg)

= P′P f ·P′Pg = Q f ·Qg

by virtue of Theorems 13.7 and 13.9. The implication (iv)⇒ (i) is proved by spe-cialising g = 1 and by denseness of L∞(X) in L1(X).

Remark 13.19. Assertion (iv) can be strengthened to

13.4 Factors and their Models 203

(iv’) Q(Q f ·g) = Q f ·Qg for all f ,g ∈ L1(X) such that Q f ·g ∈ L1(X).

This follows, as in Remarks 13.10 and 13.8, by approximation.

Example 13.20 (Mean Ergodic Projections). Every Markov operator T ∈M(X),X = (X ,Σ ,µ), is a Dunford–Schwartz operator. Hence, it is mean ergodic (Theorem8.24), i.e., the limit

PT := limn→∞

An[T ] = limn→∞

1n∑

n−1j=0T j

exists in the strong operator topology. The operator PT is a projection with ran(PT )=fix(T ). Since T is a Markov operator, so is PT , i.e., PT is a Markov projection. ByProposition 13.15, fix(T ) is a Banach sublattice. Note that by Corollary 8.7 we have

fix(T ) = fix(T ′) and PT = PT ′ .

The corresponding σ -algebra is

Σfix =

A ∈ Σ : T 1A = 1A

and the mean ergodic projection PT coincides with the conditional expectation PT =E(·|Σfix). (Cf. Remark 8.9 for the special case that T is a Koopman operator.)

13.4 Factors and their Models

We now apply the result of the previous section to dynamical systems. We beginwith an auxiliary result employing the notation from above.

Lemma 13.21. Let X,Y be probability spaces, and let P ∈M(X) and Q ∈M(Y) beMarkov projections with ranges F = ran(P) and G = ran(Q). Then for a Markovoperator T ∈M(X;Y) the following equivalences are true.

a) T (F)⊆ G ⇔ QT P = T P.

b) T (F)⊆ G, T ′(G)⊆ F ⇔ QT = T P ⇔ PT ′ = T ′Q.

Proof. The equivalence a) is trivial. For the equivalences in b) we use a) and notethat T F ⊆G and T ′G⊆ F are equivalent to QT P = T P and PT ′Q = T ′Q. By takingadjoints in the latter equality we obtain QT P = QT . This implies QT = T P. Thelast equivalence is trivial.

Remark 13.22. Under the hypotheses of Lemma 13.21, let ΣF and ΣG be the asso-ciated sub-σ -algebras of ΣX and ΣY, respectively. Furthermore, suppose that T = Tϕ

is the Koopman operator of a measure-preserving measurable map ϕ : Y → X .Then it is easy to see that the equivalent assertions in Lemma 13.21.a) are further

equivalent to ϕ being ΣG-ΣF -measurable. If ϕ is essentially invertible, then

T ′ = T−1 = Tϕ−1 ,


and hence the statements in b) are equivalent to ϕ being ΣG-ΣF -measurable and ϕ−1

being ΣF -ΣG-measurable. See Exercise 4.

We can now introduce one of the main notions in structural ergodic theory. Afactor of an MDS (X;ϕ) is any Tϕ -invariant Banach sublattice of L1(X) containing1. A factor F is called bi-invariant if F is also T ′ϕ -invariant.

The following characterises factors in terms of Markov projections. Its proof isstraightforward from Lemma 13.21 above.

Theorem 13.23. For an MDS (X;ϕ) with Koopman operator T the following asser-tions are true.

a) The mapQ 7→ F = ran(Q)

establishes a one-to-one correspondence between the factors F of (X;ϕ) andthe Markov projections Q ∈M(X) satisfying T Q = QT Q.

b) If the MDS (X;ϕ) is invertible, then under the given map in a) the bi-invariant factors F correspond to Markov projections satisfying T Q = QT .

Equivalently, by Remark 13.22 above and the results of the previous sections onecan describe factors in terms of sub-σ -algebras.

Theorem 13.24. For an MDS (Ω ,Σ ,µ;ϕ), X = (X ,Σ ,µ), the following assertionshold.

a) There is a one-to-one correspondence between factors F of (X;ϕ) and rela-tively complete sub-σ -algebras ΣF of Σ with respect to which ϕ is measur-able, given by

F 7→ ΣF :=

A ∈ Σ : 1A ∈ F

and Σ′ 7→ FΣ ′ := L1(X ,Σ ′,µ).

b) If ϕ is essentially invertible, then F is a bi-invariant factor if and only if bothϕ and ϕ−1 are ΣF -measurable. Alternatively, F is bi-invariant if it is also afactor of the MDS (X;ϕ−1).

Finally, let us characterise factors in the spirit of Chapter 12, i.e., in the frameworkof topological MDSs.

Theorem 13.25. Let (X;ϕ) be an MDS with Koopman-operator T = Tϕ , and letF ⊆ L1(X). Then the following assertions are equivalent.

(i) F is a factor of (X;ϕ).

(ii) F = clL1(A) for some Tϕ -invariant C∗-subalgebra A⊆ L∞(X).

(iii) F = ran(S) for some Markov embedding of MDSs

S : (L1(Y);Tψ)→ (L1(X);Tϕ)

and some MDS (Y;ψ).


(iv) F = ran(Φ) for some Markov embedding of MDSs

Φ : (L1(K,µ);Tψ)→ (L1(X);Tϕ)

and some faithful topological MDS (K,µ;ψ).

Proof. The implications (iv)⇒ (iii)⇒ (i) are trivial, and (ii)⇒ (iv) is Theorem12.16.For the implication (i)⇒ (ii) let F be a Tϕ -invariant Banach sublattice of L1(X)containing 1. We claim that A := F ∩ L∞ satisfies the requirements. Indeed, A iscertainly closed in L∞ and Tϕ -invariant as well as conjugation invariant. Moreover,it contains 1 and it is a vector sublattice of L∞. Hence, by Theorem 7.18, A is alsoa subalgebra. It remains to show that A is L1-dense in F . Since F is a sublattice, itsuffices to show that each 0≤ f ∈ F can be approximated by elements of A. But thiscan be done simply by taking fn := f ∧ n1. So (ii), and hence the whole theorem isproved.

Let F be a factor of the MDS (X;ϕ). Then each embedding S as in (iii) of The-orem 13.25 is called a model of the factor. If the model is given as a faithful topo-logical MDS (K,ν ;ψ) as in (iv), then it is called a topological model of the factor.A straightforward way to obtain a model for F is to form the MDS (XF ;ϕ) on theprobability space

XF := (X ,ΣF ,µX), ΣF =

A ∈ ΣX : 1A ∈ F.

However, this is in general not a topological model, even if the original system istopological.

Let S : (L1(Y);Tψ)→ (L1(X);Tϕ) be a model of the factor F = ran(S), S∈M(X).Then the Markov adjoint P := S′ is a factor map satisfying

T ′ψ PTϕ = P.

Indeed, from STψ = Tϕ S it follows T ′ψ P = T ′ψ S′ = S′T ′ϕ = PT ′ϕ , and hence T ′ψ PTϕ =PT ′ϕ Tϕ = P.

It is convenient to denote this situation in a short-hand way with

P : (X;ϕ)→ (Y;ψ), (13.1)

understanding that P is actually an operator between the L1-spaces and not a pointmap on the underlying sets. The factor F can be reconstructed from P via F =ran(P′) = ran(S) and the associated conditional expectation is Q = PP′. So all theessential information is contained in the factor map P, and it is common to abuselanguage and simply call the whole situation (13.1) a factor.

Examples 13.26. 1) If θ : (X;ϕ)→ (Y;ψ) is a point factor map (Definition 12.1)then its associated Koopman operator Tθ : (L1(Y),Tψ) → (L1(X),Tϕ) is a


Markov embedding of MDSs, and hence constitutes a factor. The associatedfactor map is T ′

θ, so we can denote this situation also with

T ′θ : (X;ϕ)→ (Y;ψ)

by our convention above.

2) For each (X;ϕ), the fixed space F = fix(Tϕ) is a factor (Theorem 8.8), andon each faithful topological model for it the dynamics is trivial. If the MDSis ergodic, i.e., dimfix(Tϕ) = 1, then F has a one-point space K = /0 as atopological model.

Exercise 5 shows that in Theorem 13.25 one can replace the role of L1(X) by Lp(X)for any 1≤ p < ∞.

Remark 13.27. Suppose that the factor F is not only Tϕ -invariant, but satisfiesTϕ(F) = F . Then, equivalently, Tϕ restricts to an isomorphism on F , and in theconstruction of a topological model one can assume Tϕ(A) = A. Equivalently, thedynamics ψ on the representation space K is invertible. Hence the factors on whichthe dynamics is invertible are characterized by admitting invertible (topological)models.

If, actually, the MDS (X;ϕ) is itself invertible, then it is natural to consider onlybi-invariant factors, i.e., factors that are also invariant under T−1

ϕ . As seen above,these are characterised by admitting invertible models. In a context where one ex-clusively considers invertible MDSs it is hence convenient (and very common) torestrict the name “factor” to include the bi-invariance.

Supplement: Operator Theory on Hilbert Space

In this supplement we review some facts from elementary Hilbert space operatortheory. In the following H and K denote Hilbert spaces with inner product denotedby ( · | ·). The Hilbert space adjoint of a linear operator T : K → H is denoted byT ∗ : H→ K, see Appendix C.10.

We begin a useful lemma concerning the relation of weak and strong convergencein a Hilbert space.

Lemma 13.28. For a net ( fα)α in a Hilbert space H and some f ∈ H the followingassertions are equivalent.

(i) ‖ fα − f‖→ 0.

(ii) fα → f weakly, and ‖ fα‖→ ‖ f‖.

Proof. The equivalence follows from the standard identity

‖ fα − f‖2 = ‖ fα‖2 +‖ f‖2−2Re( fα | f )


Recall that an operator T : H→ K is an isometry if ‖T f‖= ‖ f‖ for all f ∈ H. Weobtain the following straightforward consequence of Lemma 13.28.

Corollary 13.29. On the set

Iso(H) :=

T ∈L (H) : ‖T f‖= ‖ f‖ for all f ∈ H

of isometries, the weak and the strong operator topologies coincide.

Here is a useful characterisation of isometries.

Lemma 13.30. For an operator T ∈L (H;K) the following assertions are equiva-lent.

(i) T is an isometry, i.e., ‖T f‖= ‖ f‖ for all f ∈ H.

(ii) (T f |T g) = ( f |g) for all f ,g ∈ H.

(iii) T ∗T = I.

Proof. (i)⇒ (ii): From the polarisation identity (see Section C.10) it follows thatRe(T f |T g) = Re( f |g) for all f ,g ∈ H. Replacing g by ig we obtain (ii). Theimplications (ii)⇒ (iii)⇒ (i) are straightforward.

Recall that an operator P ∈L (H) is called an orthogonal projection if P2 = Pand ran(P)⊥ ran(I−P). Here is a characterisation.

Theorem 13.31. Let H be a Hilbert space and P ∈L (H). Then the following as-sertions are equivalent.

(i) P = SS∗ for some Hilbert space K and some linear isometry S : K→ H.

(ii) P2 = P and ‖P‖ ≤ 1.

(iii) ran(I−P)⊥ ran(P).

(iv) P2 = P = P∗.

(v) ran(P) is closed, and P = JJ∗ with J : ran(P)→ H the canonical inclusionmap.

In this case, ran(S) = ran(P) in (i).

Proof. (i)⇒ (ii): By Lemma 13.30 above, S∗S = I, whence P2 = (SS∗)(SS∗) =S(S∗S)S∗ = SS∗ = P. Also, ‖P‖ = ‖S∗S‖ ≤ ‖S∗‖‖S‖ = 1. Finally, ran(S) ⊆ ran(P)is clear from P = SS∗, and the converse inclusion follows from PS = SS∗S = S.

(ii)⇒ (iii): Let f ∈ ran(P), g ∈ H. Then for c ∈ C we have by (ii)

‖ f − cPg‖2 = ‖P( f − cg)‖2 ≤ ‖ f − cg‖2 .

Expanding the scalar products yields

2Rec( f |g−Pg)≤ |c|2 (‖g‖2−‖Pg‖2).


Now we specialise c = t( f |g−Pg) with t > 0, and obtain

2t∣∣( f |g−Pg)

∣∣2 ≤ t2∣∣( f |g−Pg)∣∣2(‖g‖2−‖Pg‖2) for all t > 0,

and that implies f ⊥ g−Pg as was to be proved.

(iii)⇒ (iv): Let f ,g ∈ H. Then by (iii)∥∥P f −P2 f∥∥2

= (P( f −P f ) |(I−P)P f ) = 0,

whence P2 = P. Moreover, by (ii) and symmetry

(P f |g) = (P f |g−Pg)+(P f |Pg) = (P f |Pg) = · · ·= ( f |Pg)

which means that P∗ = P.(iv)⇒ (v): Let Q := JJ∗. Since ran(J) = ran(P) by construction, PJ = J, and hencePQ = Q and then QP = Q by (iv). On the other hand, since J is an isometry, QJ =J(J∗J) = J, whence QP = P. It follows that Q = P.

The implication (v)⇒ (i) is trivial.

Corollary 13.32. a) For an orthogonal projection P ∈ L (H) and f ∈ H wehave

P f = f ⇔ ‖P f‖= ‖ f‖ .

b) For orthogonal projections P,Q ∈L (H) we have

ran(P)⊆ ran(Q) ⇔ PQ = P ⇔ QP = P.

c) Every orthogonal projection P on H is uniquely determined by its rangeran(P).

Proof. a) Since ran(I−P)⊥ ran(P), by Pythagoras’ lemma

‖ f‖2 = ‖P f‖2 +‖(I−P) f‖2

for every f ∈ H.b) It is clear that ran(P)⊆ ran(Q) is equivalent with QP = P. But since Q and P areself-adjoint, the latter is equivalent with P = P∗ = (QP)∗ = P∗Q∗ = PQ.c) If P,Q are orthogonal projections on H with ran(P) = ran(Q), then by b) we haveP = PQ = Q.

A self-adjoint operator T ∈L (H) is called positive semi-definite if (T f | f )≥0 for all f ∈ H. In this case, the mapping ( f ,g)→ (T f |g) defines a semi-innerproduct on H. The square of a self-adjoint operator is clearly positive semi-definite.More generally, if T is positive definite and S ∈L (H), then ST S∗ is positive defi-nite, too. If T is positive semi-definite, so is each power T n. Moreover, each orthog-onal projection is positive semi-definite since


(P f | f ) =(

P2 f∣∣ f)= (P f |P f ) = ‖P f‖2 ≥ 0

for every f ∈ H.

Proposition 13.33. Let T ∈L (H) be a positive semi-definite contraction. Then

S f := limn→∞

T n f

exists in H for each f ∈ H and yields a positive semi-definite contraction S.

Proof. Since T is a contraction, we have

(T f | f )≤ ‖T f‖ · ‖ f‖ ≤ ( f | f ) ,

hence the operator I−T is positive semi-definite. From the identity

T (I−T ) = (I−T )T (I−T )+T (I−T )T

and since T and I−T are self-adjoint, we conclude that T (I−T ) is positive semi-definite.

By what is proved above, for k ∈ N0 and f ∈ H we can write(T 2k+1 f

∣∣ f)=(

T T k f∣∣T k f

)≤(

T k f∣∣T k f

)=(

T 2k f∣∣ f),

and, since T (I−T ) is positive semi-definite,(T 2k f

∣∣ f)=(

T 2T k−1 f∣∣T k−1 f

)≤(

T T k−1 f∣∣T k−1 f

)=(

T 2k−1 f∣∣ f).

Hence for fixed f ∈H the sequence ((T n f | f ))n∈N decreasing and positive, so con-vergent. Moreover, since T is self-adjoint, we have

‖T n f −T m f‖2 =(

T 2n f∣∣ f)+(

T 2m f∣∣ f)−2Re(T n+m f | f )→ 0,

which shows that (T n f )n∈N is a Cauchy sequence. The properties of the limit oper-ator S follow easily.

If P,Q are orthogonal projections on H, then F := ran(P)∩ ran(Q) is a closedsubspace of H. Here is how to compute the orthogonal projection onto F .

Proposition 13.34. Let P,Q ∈ L (H) be orthogonal projections. Then the stronglimit

S f := limn→∞

(PQ)n f

exists in H for every f ∈H, and S is the orthogonal projection onto ran(P)∩ ran(Q).

Proof. Let f ∈H and define T := QPQ. Then T is a positive semi-definite contrac-tion and hence R f := limn→∞ T n f exists by Proposition 13.33, and R is self-adjoint.By construction QT n = T nQ = T n for each n ∈ N, whence QR = RQ = R follows.


We can also conclude QPR = R. Now, since (PQ)n = PT n−1, we have that hence(PQ)n → PR =: S strongly. Since P,Q,R are contractions, so is S. It also follows,that (PQ)2n → S2, whence S = S2. This shows that S is an orthogonal projection,and RP = PR. We have PS = S = S∗ = SP and QS = QPR = QRP = RP = S, and asa consequence ran(S)⊆ ran(P)∩ (Q). But if f ∈ ran(P)∩ ran(Q), then (QP)n f = ffor each f and hence S f = f . It follows that ran(P)∩ ran(Q) = ran(S), which con-cludes the proof.

Exercises

1. Prove the remaining statements of Theorem 13.2.


3. For T ∈L (H) a positive definite operator prove that its power T n, n ∈ N are allpositive definite.

4. Prove the claims made in Remark 13.22.

5. Let (X;ϕ) be an MDS, 1 ≤ p < ∞, and E ⊆ Lp(X) with 1 ∈ E. Show that thefollowing assertions are equivalent.

(i) clL1(E) is a factor of (X;ϕ).

(ii) E is a Tϕ -invariant Banach sublattice of Lp(X).

(iii) E = clLp(A) for some Tϕ -invariant C∗-subalgebra of L∞(X).

(iv) E = Φ(Lp(Y)) for some embedding

Φ : (L1(Y);Tψ)→ (L1(X);Tϕ)

of measure-preserving systems.

Chapter 14Compact Semigroups and Groups

Compact groups were introduced already in Chapter 2 as nice examples of TDSs. InChapter 5 they reappeared, endowed with their Haar measure, as simple examples ofMDSs. As we shall see later in Chapter 17, these examples are by no means artificial:within a structure theory of general dynamical systems they form the fundamentalbuilding blocks.

Another reason to study groups (or even semigroups) is that they are present inthe mere setting of general dynamical systems already. If (K;ϕ) is an invertibleTDS, then ϕn : n ∈ Z is an Abelian group of transformations of K and—passingto the Koopman operator—T n

ϕ : n∈Z is a group of operators on C(K). Analogousremarks hold for MDSs, of course. If the dynamical system is not invertible, thenone obtains semigroups instead of groups (see definition below), and indeed, manynotions and results from Chapters 2 and 3 can be generalised to general (semi)groupactions on compact spaces (see, e.g., [Gottschalk and Hedlund (1955)]).

In this chapter we shall present some facts about compact semigroups andgroups. We concentrate on those that are relevant for a deeper analysis of dy-namical systems, e.g., for the decomposition theorem to come in Chapter 16.Our treatment is therefore far from being complete and the reader is referredto [Hofmann and Mostert (1966)], [Hewitt and Ross (1979)], [Rudin (1990)] and[Hofmann and Morris (2006)] for further information.

14.1 Compact Semigroups

A semigroup is a nonempty set S with an operation S×S→ S, (s, t) 7→ st := s · t—usually called multiplication—which is associative, i.e., one has r(st) = (rs)t for allr,s, t ∈ S. If the multiplication is commutative (i.e., st = ts for all s, t ∈ S), then thesemigroup is called Abelian. An element e of a semigroup S is called an idempotentif e2 = e, a zero element if es = se = e for all s ∈ S, and a neutral element ifse= es= s for all s∈ S. It is easy to see that, though there may be many idempotents,but at most one neutral/zero element in a semigroup. The semigroup S is called a

211

212 14 Compact Semigroups and Groups

group if it has a neutral element e and for every s ∈ S the equation sx = xs = e hasa solution x. It is easy to see that this solution is unique, called the inverse s−1 ofs. (We assume the reader to be familiar with the fundamentals of group theory andrefer, e.g., to [Lang (2005)].)

For subsets A,B⊆ S and elements s ∈ S we write

sA = sa : a ∈ A, As = as : a ∈ A, AB := ab : a ∈ A, b ∈ B.

A right ideal of S is a nonempty subset J such that JS⊆ J. Similarly one defines leftideals. A subset which is simultaneously a left and a right ideal is called a (two-sided) ideal. Obviously, in an Abelian semigroup this condition reduces to JS ⊆ J.A subsemigroup of S is a nonempty subset H ⊆ S such that HH ⊆ H.

A right (left, two-sided) ideal of a semigroup S is called minimal if it is minimalwith respect to set inclusion within the set of all right (left, two-sided) ideals. Theintersection of all ideals

K(S) :=⋂J : J ⊆ S, J ideal

is called the Sushkevich kernel of S. It is either empty, or the (unique) minimalideal. (Indeed, if J, I are minimal ideals of S, then /0 6= JI ⊆ I ∩ J is an ideal, andhence I = I∩ J = J by minimality.)

Minimal right (left) ideals are either disjoint or equal. An idempotent element ina minimal right (left) ideal is called minimal idempotent and has special properties.

Lemma 14.1 (Minimal Idempotents). Let S be semigroup and e ∈ S with e2 = e.Then the following assertions are equivalent.

(i) eS (or Se) is a minimal right ideal (or left ideal).

(ii) e is contained in some minimal right ideal (left ideal).

(iii) eSe is a group (with neutral element e).

In this case, e is minimal in the set of idempotents with respect to the ordering

p≤ q ⇔ pq = qp = p.

Proof. (i)⇒ (ii) is trivial. For the proof of (ii)⇒ (iii) let R be a minimal right idealwith e ∈ R, and left G := eSe. Clearly e is a neutral element in G. Let s ∈ S be arbi-trary. Then eseR⊆ R is also a right ideal, and by minimality eseR = R. In particular,there is x ∈ R such that esex = e. It follows that g := ese ∈ G has the right inverseh := exe ∈ G. Applying the same reasoning to x we find that also h has a right in-verse, say k. But then hg = hge = hghk = hk = e, and hence h is also the left inverseof g.(iii)⇒ (i): Suppose that G := eSe is a group, and let R ⊆ eS be a right ideal. Taker = es∈R. Then there is x∈ S such that e= (ese)(exe) = rexe∈R. It follows eS⊆R,so R = eS.

14.1 Compact Semigroups 213

Suppose f is an idempotent and f ≤ e, i.e., e f = f e = f = f 2. Then f S⊆ eS and byminimality f S = eS. Hence there is x ∈ S such that f x = e2 = e. But then f = f e =f 2x = f x = e.

As we are doing analysis, semigroups are interesting to us only if endowed with atopology which in some sense is related to the algebraic structure. One has differentpossibilities here.

Definition 14.2. A semigroup endowed with a topology is called a left-topologicalsemigroup if for each a ∈ S the left multiplication with a, i.e., the mapping

S→ S, s 7→ as

is continuous. Similarly, S is called a right-topological semigroup if for each a ∈ Sthe right multiplication with a, i.e., the mapping

S→ S, s 7→ sa

is continuous. If both left and right multiplications are continuous, S is a semitopo-logical semigroup. And S is a topological semigroup if the multiplication mapping

S×S→ S, (s, t) 7→ st

is continuous. A topological group is a topological semigroup that is algebraicallya group and such that the inversion mapping s 7→ s−1 is continuous.

Clearly an Abelian semigroup is left-topological if and only if it is semitopolog-ical. In a later section we shall see that βN (familiar already from Chapter 4) is anexample of a left-topological but not semitopological semigroup.

One says that in a semitopological semigroup the multiplication is separately con-tinuous, whereas in a topological semigroup it is jointly continuous. Of course thereare examples of semitopological semigroups that are not topological, i.e., such thatthe multiplication is not jointly continuous (see Exercise 3). The example particu-larly interesting to us is L (E), E a Banach space, endowed with the weak operatortopology. By Proposition C.18 and Example C.18 this is a semitopological semi-group, which in general is not topological.

Any semigroup can be made into a topological one if endowed with the discretetopology. This indicates that such semigroups may be too general to study. However,compact left-topological semigroups (i.e., ones whose topology is compact) exhibitsome amazing structure, which we shall study now.

Theorem 14.3 (Ellis). In a compact left-topological semigroup, every right idealcontains a minimal right ideal. Every minimal right ideal is closed and contains atleast one idempotent.

Proof. Note that sS is a closed right ideal for any s ∈ S. If R is any right ideal andx ∈ R then xS ⊆ R, and hence any right ideal contains a closed one. If R is evenminimal, then we must have R = xS, and R is itself closed.


Now if J0 is a given right ideal, then let M be the set of all closed right idealsof S contained in J0. Then M is nonempty and partially ordered by set inclusion.Moreover, any chain C in M has a lower bound

⋂C , since this is nonempty by

compactness. By Zorn’s lemma, there is a minimal element R in M . If J ⊆ R isalso a right ideal and x ∈ J, then xS ⊆ J ⊆ R, and xS is a closed right ideal. Byconstruction xS = R, whence J = R, too.

Finally, we find by an application of Zorn’s lemma as before a nonempty closedsubsemigroup H of R which is minimal within all closed subsemigroups of R. Ife ∈ H then eH is a closed subsemigroup of R, contained in H, whence eH = H.The set t ∈ H : et = e is then nonempty and closed, and a subsemigroup of R.By minimality, it must coincide with H, and hence contains e. This yields e2 = e,concluding the proof.

Lemma 14.4. In a compact left-topological semigroup, the Sushkevich kernel satis-fies

K(S) =⋃

R : R minimal right ideal6= /0. (14.1)

Moreover, an idempotent of S is minimal if and only if it is contained in K(S).

Proof. To prove “⊇”, let I be an ideal and J a minimal right ideal. Then JI ⊆ J∩ I,so J ∩ I is a nonempty right ideal. By minimality J = J ∩ I, i.e., J ⊆ I. For “⊆” itsuffices to show that the right-hand side of (14.1) is an ideal. Let R be any minimalright ideal, x ∈ S, and R′ ⊆ xR another right ideal. Then /0 6= y : xy ∈ R′∩R isa right ideal, whence by minimality equal to R. But this means that xR ⊆ R′ and itfollows that xR is also minimal, whence contained in the right-hand side of (14.1).

The remaining statement follows from (14.1) and Lemma 14.1.

We can now state and prove the main result.

Theorem 14.5. Let S be a compact semitopological semigroup. Then the followingassertions are equivalent.

(i) Every minimal right ideal is a minimal left ideal.

(ii) There is a unique minimal right ideal and a unique minimal left ideal.

(iii) There is a unique minimal idempotent.

(iv) The Sushkevich kernel K(S) is a group.

The conditions (i)–(iv) are satisfied in particular if S is Abelian.

Proof. Clearly, (ii) implies (i), by Lemma 14.4 and symmetry.(i)⇒ (iv): Let R be a minimal right ideal and let e ∈ R be an idempotent, whichexists by Theorem 14.3. By hypothesis, R is also a left ideal, i.e., an ideal. HenceK(S)⊆ R⊆ K(S) (by Lemma 14.4), yielding R = K(S). Since R is minimal as a leftideal and a right ideal, R = eS = Se, from which it follows that K(S) = R = eSe, agroup by Lemma 14.1, hence (iv).(iv)⇒ (iii): A minimal idempotent has to lie in K(S) by Lemma 14.4, and so mustcoincide with the unique neutral element of K(S).

14.1 Compact Semigroups 215

(iii)⇒ (ii): Since different minimal right ideals are disjoint, and each one containsan idempotent, there can be only one. By symmetry, the same is true for left ideals,whence (ii) follows.

Suppose that S is a compact semitopological semigroup that satisfies the condi-tions (i)–(iv) of Theorem 14.5, so that K(S) is a group. Then K(S) is also compact,as it coincides with the unique right (left) ideal, closed by Theorem 14.3. The fol-lowing celebrated result of Ellis (see [Ellis (1957)] or [Hindman and Strauss (1998),Section 2.5]) states that in this case the group K(S) is already topological.

Theorem 14.6 (Ellis). Let G be a semitopological group whose topology is locallycompact. Then G is a topological group.

As mentioned, combining Theorem 14.5 with Ellis’ result we obtain the following.

Corollary 14.7. Let S be a compact semitopological semigroup that satisfies theequivalent conditions of Theorem 14.5, e.g., suppose S Abelian. Then its minimalideal K(S) is a compact (topological) group.

One can ask for which non-Abelian semigroups S the minimal ideal K(S) is a group.One class of examples is given by the so-called amenable semigroups, for detailssee [Day (1957a)] or [Paterson (1988)]. Another one, and very important for op-erator theory, are the weakly closed semigroups of contractions on certain Banachspaces, see Section 16.1 below.

Proof of Ellis’ Theorem in the Metrisable Case

The proof of Theorem 14.6 is fairly involved. The major difficulty is deducingjoint continuity from separate continuity. By making use of the group structureit is enough to find one single point in G×G at which multiplication is contin-uous. This step can be achieved by applying a more general result of Namioka[Namioka (1974)] that asserts that under appropriate assumptions a separately con-tinuous function has many points of joint continuity, see also [Kechris (1995)] or[Todorcevic (1997)]. For our purpose the following formulation suffices. Recallfrom Appendix A that in a Baire space, Baire’s Category Theorem A.10 is valid.

Proposition 14.8. Let Z,Y be metric spaces, X a Baire space and let f : X×Y → Zbe a separately continuous function. For every b ∈ Y , the set

x ∈ X : f is not continuous at (x,b)

is of first category in X.

Proof. a) Let b ∈ Y be fixed. For n,k ∈ N define

Xb,n,k :=⋂

y∈B(b, 1k )

x ∈ X : d( f (x,y), f (x,b))≤ 1

n

.


By the continuity of f in the first variable, we obtain that the sets Xb,n,k are closed.From the continuity of f in the second variable, we infer that for all n ∈ N the setsXb,n,k cover X , i.e.,

X =⋃k∈N

Xb,n,k =⋃k∈N

Xb,n,k ∪⋃k∈N

∂Xb,n,k.

(Here Xb,n,k denotes the interior of Xb,n,k.) Since X is Baire space and ∂Xb,n,k arenowhere dense, it follows by Theorem A.11 that the open set⋃

k∈NXb,n,k

is dense in X . This yields, again by Theorem A.11, that

Xb :=⋂

n∈N

⋃k∈N

Xb,n,k

is a dense Gδ set. If a ∈ Xb, then for all n ∈ N there is k ∈ N with a ∈ Xb,n,k. Sowith an appropriate open neighbourhood U of a we even obtain U ⊆ Xb,n,k. Sincef is continuous in the first variable, we can take the neighbourhood U such thatd( f (a,b),d(x,b)) ≤ 1

n holds for all x ∈U . Now, if x ∈U ⊆ Xb,n,k and y ∈ B(b, 1k ),

we obtain

d( f (x,y), f (a,b))≤ d( f (x,y), f (x,b))+d( f (x,b), f (a,b))≤ 2n.

Hence f is continuous at (a,b) ∈ X ×Y if a ∈ Xb. Since Xb is a dense Gδ set, theassertion follows.

Proof of Ellis’ Theorem (compact, metrisable case). Apply Proposition 14.8 to themultiplication to obtain joint continuity at one point in G×G. Then, by translation,the joint continuity everywhere follows.We prove that the inversion mapping g 7→ g−1 is continuous on G. Let gn → g.It suffices to show that there is a subsequence of (g−1

n )n∈N convergent to g−1. Bycompactness, we find a subsequence (g−1

nk)k∈N convergent to some h as k→ ∞. By

the joint continuity of the multiplication 1 = gnk g−1nk→ gh, whence h = g−1.

Remark 14.9. A completely different proof of Ellis’ theorem in the compact (butnot necessarily metrisable) case is based on Grothendieck’s theorem about weakcompactness in C(K)-spaces. Roughly sketched, one first constructs Haar measurem on the compact semitopological group G, then represents G by the right regularrepresentation on the Hilbert space H = L2(G,m), and finally uses the fact that onthe set of unitary operators on H the weak and strong operator topologies coincide.As multiplication is jointly continuous for the latter topology and the representationis faithful, G is a topological group. A detailed account can be found in AppendixE, see in particular Theorem E.12.

14.2 Compact Groups 217

As we have seen, by passing to the minimal ideal we are naturally led fromcompact semitopological semigroups to compact topological groups. Their study isthe subject of the next section.

14.2 Compact Groups

In the following, G always denotes a topological group with neutral element 1 = 1G.For a set A ⊆ G we write A−1 := a−1 : a ∈ A and call it symmetric if A−1 = A.Note that A is open if and only if A−1 is open since the inversion mapping is ahomeomorphism of G. In particular, every open neighbourhood U of 1 contains asymmetric open neighbourhood of 1, namely U ∩U−1.

Since the left and right multiplications by any given element g ∈ G are homeo-morphisms of G, the set of open neighbourhoods of the neutral element 1 completelydetermines the topology of G. Indeed, U is an open neighbourhood of 1 if and onlyif Ug (or gU) is an open neighbourhood of g ∈ G.

The following are the basic properties of topological groups needed later.

Lemma 14.10. For a topological group G the following statements hold.

a) If V is an open neighbourhood of 1, then there is 1∈W open, symmetric withWW ⊆V .

b) If H is a topological group and ϕ : G→ H is a group homomorphism, thenϕ is continuous if and only if it is continuous at 1G.

c) If H is a normal subgroup of G, then the factor group G/H is a topologicalgroup with respect to the quotient topology.

d) If H is a (normal) subgroup of G, then H is also a (normal) subgroup of G.

Proof. To prove a) we use that the multiplication is (jointly) continuous to find 1∈Uopen with UU ⊆V . Then W :=U ∩U−1 has the desired property. For b), c) and d)see Exercise 2 and Exercise 4.

Recall that a topological group is called a (locally) compact group if its topologyis (locally) compact. Important (locally) compact groups are the additive groups Rd

and the (multiplicative) tori Td , d ∈N. Furthermore, every group G can be made intoa locally compact group, denoted by Gd, by endowing it with the discrete topology.

Proposition 14.11. Let G be a compact group, and let f ∈ C(G). Then f is uni-formly continuous, i.e., for any ε > 0 there is a symmetric open neighbourhood Uof 1 such that gh−1 ∈U implies | f (g)− f (h)|< ε .

Proof. Fix ε > 0. By continuity of f , for each g ∈ G the set

Vg :=

z ∈ G : | f (zg)− f (g)|< ε/2


is an open neighbourhood of 1. By Lemma 14.10.a) there is a symmetric open neigh-bourhood Wg of 1 such that WgWg ⊆ Vg. Now G =

⋃g∈G Wgg, and hence by com-

pactness G =⋃

g∈F Wgg for some finite set F ⊆ G. Define

U :=⋂

g∈F

Wg,

which is a symmetric open neighbourhood of 1. For xy−1 ∈U we find g ∈ F suchthat y ∈ Wgg ⊆ Vgg. Hence x ∈ Uy ⊆ WgWgg ⊆ Vgg. Therefore xg−1,yg−1 ∈ Vg,which implies that

| f (x)− f (y)| ≤ | f (x)− f (g)|+ | f (y)− f (g)|< ε.

Proposition 14.11 yields a second proof for a fact already established in Proposi-tion 10.11. Recall that for an element a ∈ G we denote by La and Ra the Koopmanoperators associated with the left and right rotations by a, respectively. That is

(La f )(x) := f (ax), (Ra f )(x) := f (xa) (x ∈ G, f : G→ C).

Corollary 14.12. Let G be a compact group and let f ∈ C(G). Then the mappings

G→ C(G), a 7→ La f (14.2)G→ C(G), a 7→ Ra f (14.3)

are continuous.

Proof. Let ε > 0 and choose U as in Proposition 14.11. If ab−1 ∈ U ,then (ax)(bx)−1 = ab−1 ∈ U for every x ∈ G. Hence ‖La f −Lb f‖

∞=

supx∈G | f (ax)− f (bx)| ≤ ε . This proves the first statement. To prove the second,apply the first to the function f∼(x) := f (x−1) and note that x 7→ x−1 is a homeo-morphism of G.

We remark that Corollary 14.12 allows to prove that the operators La and Ra onC(G) are mean ergodic, see Corollary 10.12.

The Haar Measure

As explained in Chapter 5, on any compact group there is a unique rotation invariant(Baire) probability measure m, called the Haar measure. In this section, we shallprove this for the Abelian case; a proof in the general case is in Appendix E.4.

Theorem 14.13 (Haar Measure). On a compact group G there is a unique leftinvariant Baire probability measure. This measure is also right invariant and inver-sion invariant and is strictly positive.

Proof. It is easy to see that if µ ∈M1(G) is left (right) invariant, then µ ∈M1(G),defined by

14.3 The Character Group 219

〈 f , µ〉=∫

Gf (x−1)dµ(x) ( f ∈ C(G))

is right (left) invariant. On the other hand, if µ is left invariant and ν is right invari-ant, then by Fubini’s theorem for f ∈ C(G)∫

Gf (x)dµ(x) =

∫G

∫G

f (x)dµ(x)dν(y) =∫

G

∫G

f (xy)dµ(x)dν(y)

=∫

G

∫G

f (xy)dν(y)dµ(x) =∫

G

∫G

f (y)dν(y)dµ(x) =∫

Gf (y)dν(y),

and hence µ = ν . This proves uniqueness, and that every left invariant measure isalso right invariant and inversion invariant.Suppose that m is such an invariant probability measure. To see that it is strictly pos-itive, let 0 6= f ≥ 0. Then the sets Ua :=[La f > 0 ] cover G, hence by compactnessthere is a finite set F ⊆ G such that G⊆

⋃a∈F [La f > 0 ]. Define g := ∑a∈F La f and

c := infx∈G h(x)> 0. Then g≥ c1, whence

c =∫

Gc1dm≤

∫G

gdm = ∑a∈F

∫K

La f dm = ∑a∈F

∫K

f dm.

It follows that 〈 f ,m〉> 0.Only existence is left to show. As announced, we treat here only the case that G isAbelian. Then the family (L′a)a∈G of adjoints of left rotations is a commuting familyof continuous affine mappings on M1(K), which is convex and weakly∗ compact,see Section 10.1. Hence the Markov–Kakutani Theorem 10.1 yields a common fixedpoint m, which is what we were looking for.

Remark 14.14. A proof for the non-Abelian case can be based on a more re-fined fixed-point theorem, say of Ryll-Nardzewski, see [Ryll-Nardzewski (1967)]or [Dugundji and Granas (2003), §7.9]. Our proof in Appendix E.4 is based on theclassical construction, see e.g. [Rudin (1991)]. However, we start there from slightlymore general hypotheses in order to prove Ellis’ theorem, see Theorem E.10.

14.3 The Character Group

An important tool in the study of locally compact Abelian groups is its charactergroup. A character of a locally compact group G is a continuous homomorphismχ : G→ T. The set of characters

G∗ :=

χ : χ character of G

is an Abelian group with respect to pointwise multiplication and neutral element 1;it is called the character group (or dual group).


For many locally compact Abelian groups the dual group can be described ex-plicitly. For example, the character group of R is given by

R∗ =

t 7→ e2πiα t : α ∈ R' R,

and the character group of T' [0,1) by

T∗ = z 7→ zn : n ∈ Z '

t 7→ e2πint : n ∈ Z' Z

(see Examples 14.28 and Exercise 7). In these examples, the character group seemsto be quite “rich”, and an important result says that this is actually a general fact.

Theorem 14.15. Let G be a locally compact Abelian group. Then its dual group G∗

separates the points of G, i.e., for every 1 6= a ∈ G there is a character χ ∈ G∗ suchthat χ(a) 6= 1.

We shall not prove this theorem in its full generality, but shall treat (eventually)the two cases most interesting to us, namely that G is discrete and that G is compact.For discrete groups, the proof reduces to pure algebra, see Proposition 14.24 at theend of this section. The other case is treated in Chapter 16 where we develop therepresentation theory of compact groups to some extent. There we shall prove themore general fact that the finite-dimensional unitary representations of a compactgroup separate the points (Theorem 16.39). Theorem 14.15 for compact Abeliangroups is then a straightforward consequence, see Theorem 16.43.

We proceed with exploring the consequences of Theorem 14.15. The proof of thefirst one is Exercise 5.

Corollary 14.16. Let G be a locally compact Abelian group and let H be a closedsubgroup of G. If g 6∈H, then there is a character χ ∈G∗ with χ(g) 6= 1 and χ|H = 1.

We now turn to compact groups. For such groups one has G∗ ⊆ C(G) ⊆ L2(G).The following is a preliminary result.

Proposition 14.17 (Orthogonality). Let G be a compact Abelian group. Then G∗

is an orthonormal set in L2(G).

Proof. Let χ1,χ2 ∈ G∗ be two characters and consider χ := χ1χ2. Then

α := (χ1 |χ2 ) =∫

Gχ1χ2 dm =

∫G

χ(g)dm(g) =∫

Gχ(gh)dm(g) = χ(h)α

for every h ∈ G. Hence either χ = 1 (in which case χ1 = χ2) or α = 0.

By the orthogonality of the characters we obtain the following.

Proposition 14.18. Let G be a compact Abelian group and let X ⊆ G∗ be a subsetseparating the points of G. Then the subgroup 〈X〉 generated by X is equal to G∗.Moreover, lin〈X〉 is dense in the Banach space C(G).

14.3 The Character Group 221

Proof. Consider A = lin〈X〉 which is a conjugation invariant subalgebra(!) of C(G)separating the points of G. So by the Stone–Weierstraß Theorem 4.4 it is dense inC(G). If there is χ ∈ G∗ \ 〈X〉, take f ∈ A with ‖χ− f‖∞ < 1. Then by Proposition14.17 χ is orthogonal to A and we obtain the following contradiction

1 > ‖ f −χ‖2L2 = ‖ f‖2

L2 − ( f |χ )− (χ | f )+‖χ‖2L2 = 1+‖ f‖2

L2 ≥ 1.

For a compact Abelian group, any linear combinations of characters is called atrigonometric polynomial.

Corollary 14.19. For a compact Abelian group G the following statements hold.

a) The set linG∗ of trigonometric polynomials is dense in C(G).

b) The dual group G∗ forms an orthonormal basis of L2(G).

Proof. a) follows from Proposition 14.18 with X =G∗, since G∗ separates the pointsof G by Theorem 14.15. A fortiori, linG∗ is dense in L2(G), and hence b) followsfrom Proposition 14.17.

Topology on the Character Group

Let G be a locally compact Abelian group. In this section we describe a topologyturning G∗ into a locally compact group as well.

Consider the product space TG of all functions from G to T and endow it withthe product topology (Appendix A.5). Then by Tychonov’s Theorem A.5, this is acompact space. It is an easy exercise to show that it is actually a compact group withrespect to the pointwise operations. The subspace topology on G∗ ⊆ TG, called thepointwise topology, makes G∗ a topological group. However, only in exceptionalcases this topology is (locally) compact.

It turns out that a better choice is the topology of uniform convergence on compactsets, called the compact-open topology. This also turns G∗ into a topological group,and unless otherwise specified, we always take this topology on G∗.

Examples 14.20. 1) If the group G is compact, the compact-open topology is thesame as the topology inherited from the norm topology on C(G).

2) If the group G is discrete, then the compact-open topology is just the pointwisetopology.

One has the following nice duality.

Proposition 14.21. Let G be a locally compact Abelian group. If G is compact, thenG∗ is discrete; and if G is discrete, then G∗ is compact.

Proof. Suppose that G is compact. Take χ ∈ G∗, then χ(G) ⊆ T is a compact sub-group of T. Therefore, if χ satisfies


‖1−χ‖∞ = supg∈G|1−χ(g)|< dist(1,e2πi/3) =

√3

then we must have χ(G) = 1, and hence χ = 1. Consequently, 1 is an openneighbourhood of 1, and G∗ is discrete.Suppose that G is discrete. Then by Example 14.20.2 the dual group G∗ carries thepointwise topology. But it is clear that a pointwise limit of homomorphims is againa homomorphism. This means that G∗ is closed in the compact product space TG,hence compact.

It is actually true that G∗, endowed with the compact-open topology, is a lo-cally compact Abelian group whenever G is a locally compact Abelian group, see[Hewitt and Ross (1979), §23].

The next is an auxiliary result, whose proof is left as Exercise 9. Note that analgebraic isomorphism Φ : G→H between topological groups G,H is a topologicalisomorphism if Φ and Φ−1 are continuous.

Proposition 14.22. Let G and H be locally compact Abelian groups. Then the prod-uct group G×H with the product topology is a locally compact topological group.For any χ ∈ G∗ and ψ ∈ H∗ we have χ ⊗ψ ∈ (G×H)∗, where χ ⊗ψ(x,y) :=χ(x)ψ(y). Moreover, the mapping

Ψ : G∗×H∗→ (G×H)∗, (χ,ψ) 7→ χ⊗ψ

is a topological isomorphism.

Supplement: Characters of Discrete Abelian Groups

In this supplement we shall examine the character group of a discrete Abelian group.This is, of course, pure algebra. Our first result states, in algebraic terms, that thegroup T is an injective Z-module.

Proposition 14.23. Let G be an Abelian group, H ⊆G a subgroup, and let ψ : H→T be a homomorphism. Then there is a homomorphism χ : G→ T extending ψ .

Proof. Consider the following collection of pairs

M :=(K,ϕ) : K ⊆ G subgroup, ϕ : K→ T homomorphism, ϕ|H = ψ

.

This set is partially ordered by the following relation: (K1,ϕ1) ≤ (K2,ϕ2) if andonly if K1 ⊆ K2 and ϕ2|K1 = ϕ1. By Zorn’s lemma, as the chain condition is clearlysatisfied, there is a maximal element (K,ϕ) ∈M . We prove that K = G.

If x∈G\K, then we construct an extension of ϕ to K1 := 〈K∪x〉 contradictingmaximality. We have K1 = xnh : n ∈ Z, h ∈ K. If no power xn for n ≥ 2 belongsto K, then the mapping

14.4 The Pontryagin Duality Theorem 223

ϕ1 : K1→ T, ϕ1(xih) := ϕ(h), (i ∈ Z, h ∈ K)

is a well-defined group homomorphism. If xn ∈ K for some n ≥ 2, then take thesmallest such n. Let α ∈ T be some nth root of ϕ(xn) and set ϕ1(xih) := α iϕ(h) fori∈Z, h∈K. This mapping is well-defined: If xih = x jh′ and i > j, then xi− jh = h′ ∈K and xi− j ∈K, so ϕ(h′) = ϕ(xi− jh) = ϕ(xi− j)ϕ(h). By assumption, n divides i− j,so ϕ(xi− jh) = α i− jϕ(h). Of course, ϕ1 : K1→ T is a homomorphism extending ϕ

(hence ψ), so (K1,ϕ1) ∈M with (K1,ϕ1)> (K,ϕ), a contradiction.

As a corollary we obtain the version of Theorem 14.15 for the discrete case.

Proposition 14.24. If G is a (discrete) Abelian group, then the characters separatethe points of G.

Proof. Let x ∈ G, x 6= 1. If there is n ≥ 2 with xn = 1, then take 1 6= α ∈ T an nth

root of unity. If the order of x is infinite, then take any 1 6= α ∈ T. For k ∈ Z setψ(xk) = αk. Then ψ is a nontrivial character of 〈x〉. By Proposition 14.23 we canextend it to a character χ on G that separates 1 and x.

14.4 The Pontryagin Duality Theorem

For a locally compact Abelian group G consider the compact space TG∗ which en-dowed with the pointwise multiplication is a compact group. Define the mapping

Φ : G→ TG∗ , Φ(g)χ = χ(g). (14.4)

It is easy to see that Φ is a continuous homomorphism, and by Theorem 14.15 it isinjective. By Exercise 10, Φ(g) is actually a character of the dual group G∗ for anyg ∈ G.

Lemma 14.25. The mapping

Φ : G→ G∗∗ := (G∗)∗

is continuous.

We prove this lemma for discrete or compact groups only. The general case requiressome more efforts, see [Hewitt and Ross (1979), §23].

Proof. If G is discrete, then Φ is trivially a continuous mapping. If G is compact,then G∗ is discrete. So the compact sets in G∗ are just the finite ones, and the topol-ogy on G∗∗ is the topology of pointwise convergence. So Φ is continuous.

The following fundamental theorem of Pontryagin asserts that G is topolog-ically isomorphic to its bi-dual group G∗∗ under the mapping Φ . Again, weprove this theorem only for discrete or compact groups; for the general case see[Hewitt and Ross (1979), Theorem 24.8] or [Folland (1995), (4.31)].


Theorem 14.26 (Pontryagin Duality Theorem). For a locally compact Abeliangroup G the mapping Φ : G→ G∗∗ defined in (14.4) is a topological isomorphism.

Proof. It remains to prove that Φ is surjective and its inverse in continuous.First, suppose that G is discrete. Then, by Proposition 14.21, G∗ is compact and

the subgroup ranΦ ⊆ G∗∗ clearly separates the points of G∗. By Proposition 14.18we have ranΦ = G∗∗. Since both G and G∗∗ are discrete, the mapping Φ is a home-omorphism.

Let now G be compact. Then G∗ is discrete and G∗∗ is compact by Proposition14.21. Since Φ is continuous and injective, ranΦ is a compact, hence closed sub-group of G∗∗. If ranΦ 6= G∗∗, then by Corollary 14.16 there is a character γ 6= 1of G∗∗ with γ|ranΦ = 1. Now, since G∗ is discrete, by the first part of the proof wesee that there is χ ∈ G∗ with γ(ϕ) = ϕ(χ) for all ϕ ∈ G∗∗. In particular, for g ∈ Gand ϕ = Φ(g) we have χ(g) = Φ(g)χ = γ(Φ(g)) = 1, a contradiction. Since G iscompact and Φ continuous onto G∗∗, it is actually a homeomorphism (see AppendixA.7).

An important message of Pontryagin’s Theorem is that the dual group determinesthe group itself. This is stated as follows.

Corollary 14.27. Two locally compact Abelian groups are topologically isomorphicif and only if their duals are topologically isomorphic.

Examples 14.28. 1) For n ∈ Z define

χn : T→ T, χn(z) := zn (z ∈ T).

Clearly, each χn is a charachter of T, and (n 7→ χn) is a (continuous) homo-morphism of Z into T∗. Since χ1 separates the points of T, Proposition 14.18tells that this homomorphism is surjective, i.e.,

T∗ =

χn : n ∈ Z' Z.

2) By inductive application of Proposition 14.22 and by part 1) we obtain

(Td)∗ =(z 7→ zk1

1 zk22 · · ·z

kdd ) : ki ∈ Z, i = 1, . . . ,d

' Zd .

3) The dual group Z∗ is topologically isomorphic to T. This follows from part 1)and Pontryagin’s Theorem 14.26.

4) The mapping Ψ : R→R∗, Ψ(α) = (t 7→ e2πiαt) is a topological isomorphism.We leave the proof as Exercise 7.

Let us return to the mapping Φ from G into TG∗ defined in (14.4). Since this latterspace is compact, the closure bG of ranΦ in TG∗ is compact. Then bG is a compactgroup (see Exercise 2.b) and G is densely and continuously embedded in bG by Φ ;bG is called the Bohr-compactification of G.

14.5 Application: Ergodic Rotations and Kronecker’s Theorem 225

Proposition 14.29. Let G be a locally compact Abelian group and bG its Bohr-compactification. Then the following assertions hold.

a) If G is compact, thenbG = (G∗)∗ ' G.

b) The dual group of bG is topologically isomorphic to the dual group of G, butendowed with the discrete topology, i.e.,

(bG)∗ ' (G∗)d.

c) The Bohr-compactification bG is obtained by taking the dual of G, endowingit with the discrete topology and taking the dual again, i.e.,

bG' (G∗)∗d.

Proof. a) Since Φ is continuous, Φ(G) is compact, and hence closed in TG∗ . SobG = Φ(G) = (G∗)∗.b) For χ ∈G∗ the projection πχ : TG∗ → T, πχ(γ) = γ(χ), is a character of TG∗ andhence of bG, by restriction. We claim that the mapping

(G∗)d→ (bG)∗, χ 7→ πχ

is a (topological) isomorphism. Since both spaces carry the discrete topology, itsuffices to show that it is an algebraic isomorphism. Now, if χ,χ ′ ∈ G∗ and g ∈ G,then

(πχ πχ ′)(Φ(g)) = πχ(Φ(g)) ·πχ ′(Φ(g)) = Φ(g)χ ·Φ(g)χ ′ = χ(g) ·χ ′(g)= (χχ

′)(g) = Φ(g)(χχ′) = πχχ ′(Φ(g)).

Hence, πχ πχ ′ = πχχ ′ on Φ(G), and therefore on bG, by continuity. If πχ = 1, then

1 = πχ(Φ(g)) = · · ·= χ(g) for all g ∈ G,

whence χ = 1. Finally, note that X := πχ : χ ∈ G∗ is a subgroup of (bG)∗ thatseparates the points (trivially). Hence, by Proposition 14.18 it must be all of (bG)∗.c) follows from b) and Pontryagin’s Theorem 14.26.

14.5 Application: Ergodic Rotations and Kronecker’s Theorem

We now use the duality theory of compact/discrete groups in order to characterisedynamical properties of group rotations.

Theorem 14.30. Let G be a compact Abelian group with Haar measure m, let a∈Gand consider the rotation MDS (G,m;a) with Koopman operator T on L2(G). Thenthe peripheral point spectrum of T is


σp(T ) = χ(a) : χ ∈ G∗.

Moreover the following assertions are equivalent.

(i) The rotation system (G,m;a) is ergodic.

(ii) χ(a) = 1 implies χ = 1 for each χ ∈ G∗, i.e., a separates G∗.

(iii) The rotation system (G;a) is topologically transitive/minimal.

Proof. If χ ∈ G∗, then T χ = χ(a)χ , whence χ(a) ∈ σp(T ). Conversely, supposethat λ ∈ σp(T ) and f ∈ L2(G) is such that T f = λ f . By Corollary 14.19 we canwrite f = ∑χ∈G∗ ( f |χ )χ , apply T and obtain

∑χ∈G∗

λ ( f |χ )χ = λ f = T f = ∑χ∈G∗

χ(a)( f |χ )χ.

Therefore, (λ −χ(a))( f |χ ) = 0 for all χ ∈G∗. If f 6= 0 then at least one ( f |χ ) 6=0, whence χ(a) = λ .Suppose that (ii) holds. Then if T f = f it follows from the previous argument withλ = 1 that f ⊥ χ for each χ 6= 1. Hence dimfix(T ) = 1, i.e., (i). On the other handif (i) holds and χ(a) = 1, then χ ∈ fix(T ), whence χ = 1. This proves (ii).By Theorem 10.13 (i) is equivalent to (iii).

We now are in the position to give the hitherto postponed proof of Kronecker’stheorem (Theorem 2.35) Actually we present two proofs: One exploiting the factthat the characters form an orthonormal basis and another making use of the Bohr-compactification.

Theorem 14.31 (Kronecker). For a = (a1, . . . ,ad)∈Td the rotation system (Td ;a)is topologically transitive (=minimal) if and only if a1,a2, . . . ,ad are linearly inde-pendent in the Z-module T, i.e.,

k1,k2, . . . ,kd ∈ Z, ak11 ak2

2 · · ·akdn = 1 ⇒ k1 = k2 = · · ·= kd = 0.

First Proof. By Example 14.28.2) each character of Td has the form z 7→ zk1 · · ·zkd .Hence the Z-independence of a1, . . . ,ad is precisely assertion (ii) in Theorem 14.30for G = Td .

Second Proof. If ak11 ak2

2 · · ·akdn = 1 for 0 6= (k1,k2, . . . ,kd) ∈ Zd , then the set

A :=

x ∈ Td : xk11 xk2

2 · · ·xkdd = 1

is nonempty, closed, and invariant under the rotation by a. If ki 6= 0, there are exactlyki points in A of the form (1, . . . ,1,xi,1, . . . ,1), whence A 6= Td . Hence, the TDS(Td ;a) is not minimal.For the converse implication suppose that a1,a2, . . . ,ad are linearly independent inthe Z-module T, and recall from Example 14.28 that Z∗ and T are topologicallyisomorphic, so we may identify them.

Exercises 227

Let b := (b1, . . . ,bd) ∈ Td , and consider G = 〈a1,a2, . . . ,ad〉. Since a1, . . . ,anis linearly independent in the Z-module (= Abelian group) T, there exists a Z-linearmapping (= group homomorphism)

ψ : G→ T with ψ(ai) = bi for i = 1, . . . ,D.

By Proposition 14.23 below, there is a group homomorphism χ : T→ T extendingψ . Obviously χ ∈ (Td)

∗, which by Proposition 14.29 is topologically isomorphic tobZ. Now take ε > 0. By the definition of bZ and its topology, we find k ∈ Z suchthat

|bi−aki |=

∣∣χ(ai)−Φ(k)(ai)|< ε

for i = 1, . . . ,n (recall that Φ(k)(z) = zk for z ∈ T ' Z∗). Therefore orb(1) =(ak

1,ak2, . . . ,a

kd) : k ∈ Z is dense in Td .

Exercises

1. Show that in a compact left-topological semigroup S an element s ∈ S is idempo-tent if and only if it generates a minimal closed subsemigroup.

2. a) Let S be a semitopological semigroup and let H ⊆ S be a subsemigroup.Show that if H is a subsemigroup (ideal) then H is a subsemigroup (ideal),too. Show that if H is Abelian then also H is.

b) Show that if G is a topological group and H is a (normal) subgroup of G thenH is a (normal) subgroup as well.

3. Consider S := R∪∞, the one-point compactification of R, and define thereonthe addition t +∞ := ∞+ t := ∞, ∞+∞ := ∞ for t ∈ R. Show that S is a compactsemitopological but not a topological semigroup. Determine the minimal ideal.

4. Prove assertions b), c) in Lemma 14.10. Prove also the following generalisationof Proposition 14.11: If G and H be topological groups, G is compact and f : G→His continuous, then f is even uniformly continuous in the sense that for any V openneighbourhood of 1H there is an open neighbourhood U of 1G such that gh−1 ∈Uimplies f (g) f (h)−1 ∈V .

5. Prove Corollary 14.16 for

a) G a compact or discrete Abelian group,∗b) G a general locally compact Abelian group.

(Hint: Prove that the factor group G/H is a locally compact (in particular Hausdorff)Abelian group and apply Theorem 14.15 to this group.)

6. Let G be a topological group and H ⊆ G an open topological subgroup. Showthat H is closed. Show that the connected component G0 of 1∈G is a closed normalsubgroup.


7. Show that the dual group of R is as claimed in Section 14.3.

8. Determine the dual of the group A2 of dyadic integers. (Hint: Consider the TDSfrom Exercise 3.8. Show that ϕ2n

(0)→ 0, n→ ∞ (0 denotes the neutral element inA2). For χ ∈ A∗2, set a = χ(1) and prove that a2n → 1 as n→ ∞.)

9. Prove Proposition 14.22. (Hint: To see surjectivity of Ψ , for given ϕ(G×H)∗

define the characters χ(g) = ϕ(g,1H) and ψ(h) = ϕ(1G,h) for g ∈ G and h ∈ H.)

10. For Φ : G→ TG∗ defined in (14.4) prove the following assertions.

a) Φ maps into G∗∗.

b) For general locally compact Abelian groups Φ : G→ G∗∗ is continuous.

Chapter 15Topological Dynamics Revisited

15.1 The Cech–Stone Compactification

15.2 The Space of Ultrafilters

15.3 Multiple Recurrence

15.4 Applications to Ramsey Theory

Exercises

229

Chapter 16The Jacobs–de Leeuw–GlicksbergDecomposition

In what follows we shall apply the results of Chapter 14 to compact semigroupsof bounded linear operators on a Banach space E. We recall from Appendix C.8the following definitions: The strong operator topology on L (E) is the coarsesttopology that renders all evaluation mappings

L (E)→ E, T 7→ T x (x ∈ E)

continuous. The weak operator topology is the coarsest topology that renders allmappings

L (E)→ C, T 7→⟨T x,x′

⟩(x ∈ E, x′ ∈ E ′)

continuous, where E ′ is the dual space of E. Note that if E = H is a Hilbert space,in the latter definition we could have written

L (H)→ C, T 7→ (T x |y) (x,y ∈ H),

these two properties being equivalent by virtue of the Riesz–Frechet Theorem C.20.We write L s(E) and L w(E) to denote the space L (E) endowed with the strong

and the weak operator topology, respectively. To simplify terminology, we shallspeak of weakly/strongly closed, open, relatively compact, compact etc. sets whenwe intend the weak/strong operator topology.2 In order to render the notation morefeasible, we shall write clwT , clsT for the closure of a set T ⊆L (E) with respectto the weak/strong operator topology. For a subset A ⊆ E we shall write clσ A todenote its closure in the σ(E,E ′)-topology.

Note that a subset T ⊆L (E) is a semigroup (with respect to operator multipli-cation) if it is closed under this operation. The operator multiplication is separatelycontinuous with respect to both the strong and the weak operator topologies (cf. Ap-pendix C.8). Hence—in the terminology of Chapter 14—both L s(E) and L w(E)are semitopological semigroups with respect to the operator multiplication. The set

2 This should not be confused with the corresponding notion for the weak topology of L (E) as aBanach space, a topology which will not appear in this book.

231

232 16 The Jacobs–de Leeuw–Glicksberg Decomposition

of contractionsB1 = B1(E) :=

T ∈L (E) : ‖T‖ ≤ 1

is a closed subsemigroup (in both topologies); and this subsemigroup has jointlycontinuous multiplication in the strong operator topology. In the following we shallconcentrate mainly on L w(E).

16.1 Weakly Compact Operator Semigroups

Our main tool in identifying (relatively) weakly compact subsets of operators is thefollowing result.

Theorem 16.1. Let E be a Hilbert space, or, more general, a reflexive Banach space.Then for each M ≥ 0 the closed norm-ball

BM(E) :=

T ∈L (E) : ‖T‖ ≤M

is compact in the weak operator topology.

Proof. We shall give the proof in the Hilbert space case and leave the general caseto Exercise 1. Without loss of generality we may take M = 1. Let BH := x ∈ H :‖x‖ ≤ 1 be the closed unit ball of H. Consider the injective mapping

Φ : B1→ ∏x,y∈BH

C, Φ(T ) :=((T x |y)

)x,y∈BH

.

We endow the target space with the product topology (Appendix A.5), and B1 withthe weak operator topology, so Φ is a homeomorphism onto its image Φ(B1). Weclaim that this image is actually closed. Indeed, suppose (Tα)α to be a net in B1such that

c(x,y) := limα

(Tα x |y) exists for all x,y ∈ BH .

Then this limit actually exists for all x,y ∈ H, the mapping c : H ×H → C issesquilinear and |c(x,y)| ≤ M ‖x‖‖y‖ for all x,y ∈ H. By Corollary C.22 there isT ∈L (H) such that c(x,y) = (T x |y) for all x,y ∈H. It follows that ‖T‖ ≤M, andhence T ∈B1 and Φ(T ) = limα Φ(Tα).

Finally, note that Φ(B1) ⊆∏x,y∈BHz ∈ C : |z| ≤ 1, which is compact by Ty-chonoff’s theorem A.5. Since Φ(B1) is closed, it is compact as well, and since Φ

is a homeomorphism onto its image, B1 is compact as claimed.

If we restrict to the case M = 1 in the previous result, we obtain the following.

Corollary 16.2. Let E be a Hilbert space or, more general, a reflexive Banach space.Then the set of contractions B1 := T ∈L (E) : ‖T‖ ≤ 1 is a compact semitopo-logical semigroup in the weak operator topology, with unit element I.

16.1 Weakly Compact Operator Semigroups 233

Let H be a Hilbert space. Recall from Corollary 13.29 that on the set

Iso(H) =

T ∈L (H) : T is an isometry⊆B1(H)

the weak and the strong operator topologies coincide. Since the operator multiplica-tion is jointly continuous on B1(H) for the strong operator topology, we obtain thatIso(H) is a topological semigroup and the subgroup of unitary operators

U(H) :=

T ∈L (H) : T is unitary

is a topological group (with identity I) with respect to the weak (=strong) opera-tor topology. Hence we obtain almost for free the following special case of Ellis’Theorem 14.6.

Theorem 16.3. Let S ⊆B1(H) be a weakly compact subsemigroup of contractionson a Hilbert space H. If S is algebraically a group, then it is a compact topologicalgroup, and the strong and weak operator topologies coincide on S .

Proof. Let P be the unit element of S . Then P2 = P and ‖P‖ ≤ 1, hence P is anorthogonal projection onto the Hilbert space K := ran(P). Since S = PS for eachS ∈S , the space K is invariant under S and hence

Φ : S → U(K), S 7→ S|K

is a well defined homomorphism of groups. Since S = SP , this homomorphism isinjective. Clearly Φ is also continuous, and since S is compact, Φ is a homeomor-phism onto its image Φ(S ). But on Φ(S ) as a subgroup of U(K) the strong andweak operator topologies coincide, and since S = SP for each S ∈S , this holds onS as well. In particular, the multiplication is jointly continuous. Since the inversionmapping is continuous on Φ(S ), this must be so on S as well, and the proof iscomplete.

Let now X = (X ,Σ ,µ) be a probability space. Then the semitopological semi-group of contractions on L1(X) is, in general, not weakly compact. However, if werestrict to those operators which are also L2-contractive, we can use the Hilbert spaceresults from above.

Lemma 16.4. Let X be a probability space, and let E := L1(X). Then the semigroup

S :=

T ∈L (E) : ‖T‖ ≤ 1 and ‖T f‖2 ≤ ‖ f‖2 ∀ f ∈ L2(X)

of simultaneous L1- and L2-contractions is weakly compact. Moreover, on S theweak (strong) operator topologies of L (L1) and L (L2) coincide.

Proof. We can consider S as a subset of B1(L2). As such, it is weakly closed sincethe L1-contractivity of an L2-contraction T is expressible in weak terms by∣∣∣∫

X(T f ) ·g

∣∣∣≤ ‖ f‖1 ‖g‖∞


for all f ,g ∈ L∞. By Corollary 16.2 S is weakly compact. The weak (strong) oper-ator topologies on L (L1) and L (L2) coincide on S , since S is norm-bounded inboth spaces and L2 is dense in L1. Hence the claim is proved.

Trivially, if one intersects a weakly compact semigroup S ⊆ L (E) with anyweakly closed subsemigroup of L (E), then the result is again a weakly compactsemigroup. With the help of Exercises 2 and 3 and the interpolation Theorem 8.23,the following result is immediate.

Corollary 16.5. Let X be a finite measure space. Then the following sets are weaklycompact subsemigroups of L (L1).

1) The set of all Dunford–Schwartz operators.

2) The set of all positive Dunford–Schwartz operators.

3) The set of Markov operators

M(L1) :=

T ∈L (L1) : T ≥ 0, T 1 = 1, T ′1 = 1.

Moreover, on each of these sets the weak operator topologies of L (L1) and L (L2)coincide. In particular, the semigroup

Emb(X) =

S ∈M(X) : S Markov embedding

of Markov embeddings (see Definition 12.10) is a topological semigroup with re-spect to the weak (=strong) operator topology.

By combining Corollary 16.5 with Theorem 16.3 above, we obtain the followingimportant corollary.

Corollary 16.6. Let S ⊆M(L1(X)) be a weakly compact subsemigroup of Markovoperators on some probability space X. If S is algebraically a group, then it isa compact topological group. Moreover, the weak and strong operator topologiescoincide on S .

By actually looking at the proof of Theorem 16.3, we can say much more. Inthe situation of Corollary 16.6 let Q be the unit element of S . Then Q is a Markovprojection, and we can choose freely a model for F := ran(Q), i.e., a probabilityspace Y and a Markov embedding J ∈ M(X;Y) such that ran(J) = F (Theorem13.18). Then we can consider the mapping

Φ : S →M(Y), S 7→ J−1(S|F)J

which is a topological isomorphism onto its image Φ(S ). The latter is a compacttopological subgroup of Aut (Y), the group of Markov automorphisms on Y.

Finally, let us state a rather trivial case of weakly compact operator (semi)groups.

Proposition 16.7. Let G be a compact group. Then the sets of left and right rotations

16.1 Weakly Compact Operator Semigroups 235

L =

La : a ∈ G

and R =

Ra : a ∈ G

are strongly compact subgroups of B1(C(K)), topologically isomorphic to G.

Proof. The map a 7→ Ra, G→ B1(C(K)) is a group homomorphism, continuousfor the strong operator topology by Corollary 14.12. Since G is compact, R is atopological isomorphism onto its image R. The case of the left rotations is similar,by looking at the group homomorphism a 7→ La−1 .

More on Compact Operator Semigroups

Let E be a Banach space. Then the following result is useful to recognise relativelyweakly compact subsets of L (E).

Lemma 16.8. Let E be a Banach space, let T ⊆L (E), and let D⊆ E be a subsetsuch that linD is dense in E. Then the following assertions are equivalent.

(i) T is relatively weakly compact, i.e, relatively compact in the weak operatortopology.

(ii) T x = T x : T ∈T is relatively weakly compact in E for each x ∈ E.

(iii) T is norm-bounded and T x = T x : T ∈ T is relatively weakly compactin E for each x ∈ D.

Proof. (i)⇒ (ii): For fixed x ∈ E the mapping L σ (E)→ (E,σ(E,E ′)), T 7→ T x,is continuous and hence carries relatively compact subsets of L σ (E) to relativelyweakly compact subsets of E.

(ii)⇒ (iii): The first statement follows from the Principle of Uniform Boundedness,the second is a trivial consequence of (ii).

(iii)⇒ (i): Let (Tα)α ⊆ T be some net. We have to show that it has a subnet, con-verging weakly to some bounded operator T ∈L (E). Consider the product space

K := ∏x∈D

T xσ,

which is compact by Tychonov’s Theorem A.5. Then

α 7→ (Tα x)x∈D

is a net in K, hence there is a subnet (Tβ )β , say, such that T x := Tβ x exists inthe weak topology for every x ∈ D. By linearity, we may suppose without loss ofgenerality that D is a linear subspace of E, whence T : D→ E is linear. From thenorm-boundedness of T it now follows that ‖T x‖ ≤ M ‖x‖ for all x ∈ D, whereM := supS∈T ‖S‖. Since D is norm-dense in E, T extends uniquely to a boundedlinear operator on E, i.e., T ∈L (E).


It remains to show that Tβ → T in L σ (E). Take y∈ E,x∈D and x′ ∈ E ′. Then fromthe estimate∣∣⟨Tβ y−Tβ y,x′

⟩∣∣≤ ∣∣⟨Tβ (y− x),x′⟩∣∣+ ∣∣⟨Tβ x−T x,x′

⟩∣∣+ ∣∣⟨T(x− y),x′⟩∣∣

≤ 2M∥∥x′∥∥ ·∥∥y− x

∥∥+ ∣∣⟨Tβ x−T x,x′⟩∣∣

we obtainlimsup

β

∣∣⟨Tβ y−Tβ y,x′⟩∣∣≤ 2M

∥∥x′∥∥ ·∥∥y− x

∥∥.Since for fixed y the latter can be made arbitrarily small, the proof is complete.

It is clear that Theorem 16.1 about reflexive spaces follows from Lemma 16.8, asthe closed unit ball of a reflexive space is weakly compact.

Remarks 16.9. 1) For historical reasons, a relatively weakly compact semi-group T ⊆ L (E) is sometimes called weakly almost periodic (e.g., in[Krengel (1985)]). Analogously, relatively strongly compact semigroups aretermed (strongly) almost periodic.

2) If T ⊆L (E) is relatively strongly compact, then it is relatively weakly com-pact, and both topologies coincide on the (strong=weak) closure of T . InExercise 4 it is asked to establish the analogue of Lemma 16.8 for relativestrong compactness.

3) A semigroup T ⊆L (E) is relatively weakly compact if and only if conv(T )is relatively weakly compact. (This follows from Lemma 16.8 together withthe Kreın–Smulian Theorem C.11.) The same is true for the strong operatortopology.

4) Let T ∈L (E) be a single operator. The semigroup generated by T is

TT :=

T n : n ∈ N0=

I,T,T 2, . . .

Suppose that TT is relatively weakly compact. Equivalently, by Lemma 16.8,

clσ

T n f : n ∈ N0

is weakly compact for each f ∈ E.

Then T is certainly power-bounded. By 3), i.e., by the Kreın–Smulian Theo-rem C.11,

conv

T n f : n ∈ N0

is weakly compact for each f ∈ E.

In particular, for each f ∈ E the sequence (An[T ] f )n∈N has a weak clusterpoint, whence by Proposition 8.18 it actually converges in norm. To sum up,we have shown that if T generates a relatively weakly compact semigroup,then T is mean ergodic.The converse statement fails to hold. In fact, Example 8.27 exhibits a multi-plication operator T on c, the space of convergent sequences, such that T is

16.2 The Jacobs–de Leeuw–Glicksberg Decomposition 237

mean ergodic but T 2 is not. However, T 2 generates a relatively weakly com-pact semigroup as soon as T does so.

We close this section with a classical result. If S is a semigroup and a ∈ S, thenas in the group case we denote by La,Ra the left and the right rotation by a definedas (La f )(x) = f (ax) and (Ra f )(x) = f (xa) for f : S→ C and x ∈ S.

Theorem 16.10. Let S be a compact semitopological semigroup. Then the semi-group of left rotations L = La : a∈ S is a weakly compact semigroup of operatorson C(S).

Proof. By Lemma 16.8 we only have to check whether for fixed f ∈ C(S) the or-bit M := La f : a ∈ S is weakly compact in C(S). By Grothendieck’s TheoremE.5, this is equivalent to M being compact in Cp(S), the space C(S) endowed withthe pointwise topology. But by separate continuity of the multiplication of S, themapping

S→ Cp(S), a 7→ La f

is continuous, and since S is compact, its image is compact.

16.2 The Jacobs–de Leeuw–Glicksberg Decomposition

Let E be a Banach space and let T ⊆L (E) be a semigroup of operators on E. Then

S := clw(T ),

the weak operator closure of S , is a semitopological semigroup. We call the semi-group T JdLG-admissible if S is weakly compact and contains a unique minimalidempotent, i.e., satisfies the equivalent conditions of Theorem 14.5. (Note that byTheorem 14.3 every weakly compact operator semigroup contains at least one min-imal idempotent.)

The following gives a list of important examples.

Theorem 16.11. Let E be a Banach space and T ⊆L w(E) a semigroup of opera-tors on E. Then T is JdLG-admissible in the following cases.

a) T is Abelian and relatively weakly compact.

b) E is a Hilbert space and T consists of contractions.

c) E = L1(X), X a probability space, and T ⊆ M(X), i.e., T consists ofMarkov operators.

Proof. The case a) was already mentioned in Theorem 14.5. (Note that by Exercise14.2 the closure S of T is Abelian, too.) The case c) can be reduced to b) viaCorollary 16.5. To prove b) suppose that P and Q are minimal idempotents in S =clw(T ). Then P and Q are contractive projections, hence orthogonal projections


(Theorem 13.31). Since S Q is a minimal left ideal (Lemma 14.1) and S PQ is aleft ideal contained in S Q, it follows that these left ideals are identical. Hence thereis an operator S ∈S such that SPQ = Q. Now let f ∈H be arbitrary. Since S and Pare contractions, it follows that

‖Q f‖= ‖SPQ f‖ ≤ ‖PQ f‖ ≤ ‖Q f‖ .

So ‖PQ f‖ = ‖Q f‖, and since P is an orthogonal projection, by Corollary 13.32we obtain PQ f = Q f . Since f ∈ H was arbitrary, we conclude that PQ = Q, i.e.,ran(Q) ⊆ ran(P). By symmetry ran(P) = ran(Q), and since orthogonal projectionsare uniquely determined by their range, it follows that P = Q.

With a little more effort one can show the following.

Theorem 16.12. If E and its dual space E ′ both are strictly convex, then every rel-atively weakly compact semigroup of contractions on E is JdLG-admissible.

Proof. (Sketch) Recall that a Banach space E is strictly convex if ‖ f‖ = ‖g‖ = 1and f 6= g implies that ‖ f +g‖ < 2. If P is a contractive projection on a strictlyconvex space E, then it has the property a) of Corollary 13.32, see Exercise 5. Hencein the proof above we conclude that PQ = Q. By taking right ideals in place of leftideals, we find an operator T ∈S such that PQT = P, and taking adjoints yieldsP′ = T ′Q′P′. If E ′ is also strictly convex, it follows as before that P′ = Q′P′, i.e.,P = PQ = Q.

The theorem is applicable to spaces Lp(X) with 1 < p < ∞, since these are strictlyconvex and Lp′ = Lq with p−1 + q−1 = 1. Note that these spaces are reflexive,whence contraction semigroups are automatically relatively weakly compact.

Corollary 16.13. Every semigroup of contractions on a space Lp(X), X some mea-sure space and 1 < p < ∞, is JdLG-admissible.

Remark 16.14. Theorem 16.12 is due to de Leeuw and Glicksberg[de Leeuw and Glicksberg (1961), Corollary 4.14]. There one can find even acharacterisation of JdLG-admissible semigroups in terms of the amenability oftheir weak closures. See [Day (1957b)] and [Paterson (1988)] for more about thisnotion.

JdLG-admissible Semigroups

Let again E be a Banach space, T ⊆L (E) a semigroup of operators and

S = clw(T )

its weak operator closure. In the following we suppose that T is JdLG-admissible,i.e., S is weakly compact and contains a unique minimal idempotent Q.

16.2 The Jacobs–de Leeuw–Glicksberg Decomposition 239

By Theorem 14.5, S has a unique nonempty minimal ideal

G := K(S ) = QS = S Q = QS Q (16.1)

which is a group with unit element Q. Since Q2 = Q, Q is a projection, and hencethe space E decomposes into a direct sum

E = Er(T )⊕Es(T ) = ranQ⊕kerQ,

called the Jacobs–de Leeuw–Glicksberg decomposition (JdLG-decomposition forshort) of E associated with T . This decomposition goes back to [Jacobs (1956)],[de Leeuw and Glicksberg (1959)] and [de Leeuw and Glicksberg (1961)]. Thespace Er := Er(S ) is called the reversible part of E and its elements the re-versible vectors; the space Es := Es(T ) is called the (almost weakly) stable partand its element the almost weakly stable vectors. This terminology will becomeclear shortly. Moreover, we shall see below that for T = TT = T n : n ∈ N0 con-sisting of the iterates of one single operator T ∈L (E), we have Es(T ) = Es(T ), asintroduced in Section 9.2.

Proposition 16.15. In the situation above, Er and Es are invariant under S .

Proof. For S ∈S we have QS = S Q, hence there are operators S1,S2 ∈S suchthat QS = S1Q and SQ = QS2. Therefore, for f = Q f ∈ Er = ran(Q) we have S f =SQ f = QS2 f ∈ Er, and for f ∈ Es = ker(Q) we have QS f = S1Q f = 0, whenceS f ∈ Es.

By this proposition we may restrict the operators from S to the invariant sub-spaces Er and Es and obtain

Tr :=

T |Er : T ∈T, Sr :=

S|Er : S ∈S

⊆L (Er)

Ts :=

T |Es : T ∈T, Ss :=

S|Es : S ∈S

⊆L (Es).and

By Exercise 6, the restriction maps

S 7→ S|Er and S 7→ S|Es

are continuous for the weak operator topologies. Since S is compact and S =clw(T ), it follows that

Sr = clw(Tr) and Ss = clw(Ts).

Let us look at an example.

Example 16.16 (Mean Ergodic Decomposition). Let T ∈L (E) and suppose thatthe semigroup

TT =

I,T,T 2,T 3 . . .

generated by T is relatively compact in L w(E). This implies in particular that T ispower-bounded. Moreover, by Remark 16.9.3 the semigroup


S := conv(TT ) = conv

I,T,T 2,T 3 . . .

is weakly compact. By Remark 16.9.4, the operator T is mean ergodic, and one hasthe mean ergodic decomposition

E = fix(T )⊕ ran(I−T )

with ergodic projection P = PT . Since P is the strong limit of the Cesaro meansAn[T ], it belongs to S . But since PT = T P = P, one has S P = P. Hence P isthe minimal ideal of S , and the JdLG-decomposition for S = conv(TT ) coincideswith the mean ergodic decomposition for T .

Example 16.16 shows that we do not obtain new information when applying theJdLG-decomposition to the semigroup conv(TT ). This is different when we applyit to the semigroup TT itself.

Example 16.17 (The Kronecker Factor). Let (X;ϕ) be an MDS with Koopmanoperator T = Tϕ on E := L1(X). Since the convex semigroup M(X) of Markovoperators is weakly compact (Theorem 13.2.a)), the semigroup TT = T n : n ∈N0generated by T is relatively weakly compact. Since this semigroup is Abelian, itsclosure

ST := clw(TT )

is Abelian, too, and hence TT is JdLG-admissible, cf. Theorem 16.11. We obtain acorresponding JdLG-decomposition

E = Er⊕Es = ran(Q)⊕ker(Q),

where Q ∈ S is the unique minimal idempotent. Since Q is a Markov operator,it is a Markov projection, and hence Er = ran(Q) is a factor of (X;ϕ), called theKronecker factor. Clearly fix(T )⊆ fix(S) for each S ∈S , so in particular

fix(T )⊆ fix(Q) = Er.

On the other hand, if Q f = 0 then 0 ∈ clσT n f : n ∈ N0 (see Proposition 16.18below). Hence 0 ∈ fix(T )∩convT n f : n ∈N0, and by Proposition 8.18 it followsthat f ∈ ran(I−T ). That is to say

Es ⊆ ran(I−T ).

In particular, we have∫

X f = 0 for each f ∈ Es.

16.3 The Almost Weakly Stable Part

We now investigate the two parts of the JdLG-decomposition separately. Let us firstdeal with the “(almost weakly) stable” part.

16.3 The Almost Weakly Stable Part 241

Proposition 16.18. Let E = Er⊕Es be the JdLG-decomposition of a Banach spaceE with respect to a JdLG-admissible semigroup T ⊆L (E) with weak closure S :=clw(T ). Then the following assertions hold.

a) Ts is a JdLG-admissible semigroup on Es, with weak operator closure Ss.

b) The minimal idempotent of Ss is the zero operator 0 ∈Ss, and its minimalideal is K(Ss) = 0.

c) The elements of Es are characterised by

u∈Es ⇐⇒ 0∈ clσTu : T ∈T ⇐⇒ Su= 0 for some S ∈S .

Proof. Let Q be the minimal idempotent of S . Then Q restricts to 0 on Es, andhence a) and b) are clear.

For c), note first that for each u∈ E the mapping T 7→ Tu is continuous from L w(E)to (E,σ(E,E ′)). Since S = clw(T ) is compact, one has clσ (T u) = (clwT )u =S u. This proves the second equivalence of c).

If u ∈ Es = kerQ, then it follows immediately that 0 = Qu ∈S u. Conversely, sup-pose that Su = 0 for some S ∈S . Since QS ∈ G = K(S ) and G is a group withunit element Q, there is R ∈ G such that RQS = Q. Hence Qu = RQSu = 0, whenceu ∈ Es.

A JdLG-admissible semigroup T on a Banach space E is called almost weaklystable if E = Es. Equivalently, T is almost weakly stable if 0 is in the weak closureof each orbit Tu : T ∈T , u ∈ E.

In the following we shall investigate more closely the special case of the semi-group

TT = I,T,T 2,T 3, . . .

generated by a single operator T ∈L (E). We suppose that this semigroup is rela-tively weakly compact. (Recall that this is automatic if T is power bounded and Eis reflexive, or if T is a Markov operator on some L1(X).) Then S := clw(T ) isAbelian, whence T is JdLG-admissible, and we can form the JdLG-decompositionE = Er⊕Es associated with TT . The following theorem states in particular that thealmost weakly stable subspace Es coincides with Es(T ), introduced in Section 9.2.

Recall from Chapter 9, page 132, that a sequence (xn)n in a topological spaceconverges in density to x, in notation D-limn→ ∞xn = x, if there is a subsequenceJ ⊆ N with density d(J) = 1 such that limn∈J xn = x.

Theorem 16.19. Let T generate a relatively weakly compact semigroup TT on aBanach space E, and let M ⊆ E ′ be norm-dense in the dual space E ′. Then for x∈ Ethe following assertions are equivalent.

(i) D-limn→∞

⟨T nx,x′

⟩= 0 for all x′ ∈M.

(ii) D-limn→∞

⟨T nx,x′

⟩= 0 for all x′ ∈ E ′.


(iii) limn→∞

1n∑

n−1j=0

∣∣⟨T jx,x′⟩∣∣= 0 for all x′ ∈M.

(iv) limn→∞

1n∑

n−1j=0

∣∣⟨T jx,x′⟩∣∣= 0 for all x′ ∈ E ′.

(v) supx′∈E ′,‖x′‖≤1

1n

n−1

∑k=0

∣∣∣⟨T kx,x′⟩∣∣∣ −→

n→∞0.

(vi) 0 ∈ clσT nx : n ∈ N, i.e., x ∈ Es(T ).

(vii) There exists a subsequence (n j) j∈N ⊆ N such that T n j x→ 0 weakly as j→∞.

(viii) D-limn T nx = 0 weakly.

Proof. We note that T is necessarily power-bounded. For such operators the pair-wise equivalence of (i)–(v) has been shown already in Theorem 9.12 and Proposition9.14. For the remaining part note first that the implications (viii)⇒ (vii)⇒ (vi) aretrivial.For the proof of the remaining assertions we need some preparatory remarks. DefineF := linT nx : n ≥ 0, which is then a closed separable subspace of E. Since F isalso weakly closed, the set K := clσT nx : n ∈N0 (closure in E) is contained in Fand hence a weakly compact subset of F . (By the Hahn–Banach theorem the weaktopology of F is induced by the weak topology on E.) Hence, in the following wesuppose that E is separable.

Then, by the Hahn–Banach theorem we can find a sequence (x′j) j∈N ⊆ E ′ with‖x′j‖ = 1 for all j ∈ N and such that x j : j ∈ N separates the points of E (seeExercise 8), and define

d(y,z) :=∞

∑j=1

2− j ∣∣⟨y− z,x′j⟩∣∣ (y,z ∈ E). (16.2)

Then d is a metric on E and continuous for the weak topology. Since K is weaklycompact, d is a metric for the weak topology on K.

We can now turn to the proof of the remaining implications.(v)⇒ (viii): It follows from (v) and the particular form (16.2) of the metric on K that

1n∑

nj=0d(T jx,0)→ 0 as n→ ∞.

The Koopman–von Neumann Lemma 9.13 yields D-limn d(T nx,0) = 0, which is(viii).

(vi)⇒ (vii): Note that

0 ∈ clσ

T nx : n ∈ N0=

x,T x, . . . ,T k−1x∪ clσ

T nx : n≥ k

for every k ∈ N. Since T kx = 0 implies T mx = 0 for all m≥ k, one has

16.3 The Almost Weakly Stable Part 243

0 ∈⋂k∈N

clσT nx : n≥ k,

that is, 0 is a weak cluster point of (T nx)n∈N0 . Since K is metrisable, (vii) follows.

(vii)⇒ (v): This is similar to the proof of Proposition 9.14. By passing to the equiv-alent norm ‖|y|‖ := supn∈N0

‖T ny‖ (Exercise 9.3) one may suppose that T is a con-traction. Hence the (weakly∗) compact set B′ := x′ ∈ E ′ : ‖x′‖ ≤ 1 is invariantunder T ′, and we obtain a TDS (B′;T ′) with associated Koopman operator S onC(B′). Let f (x′) := |〈x,x′〉|. Then by hypothesis, there is a subsequence (n j) j suchthat Sn j f → 0 pointwise on B′. By the dominated convergence theorem, Sn j f → 0weakly. Hence 0 ∈ fix(S)∩convSn f : n≥ 1 and by Proposition 8.18, implication(iv)⇒ (i), it follows that An[S] f → 0 in the norm of C(B′), and this is (v).

Weakly Mixing Systems Revisited

Let (X;ϕ) be an MDS with Koopman operator T = Tϕ on E = L1(X), and recallthat the semigroup TT generated by T is JdLG-admissible, cf. Example 16.17. Thecorresponding JdLG-decomposition is

E = Er⊕Es,

where Er is the Kronecker factor and Es = Es(T ) by Theorem 16.19. Therefore, wecan extend the list of characterisations of weakly mixing systems given in Chapter9. For convenience we include also the equivalences obtained there.

Theorem 16.20. Let (X;ϕ) by an MDS with associated Koopman operator T = Tϕ

on E := Lp(X), p ∈ [1,∞). Then the following assertions are equivalent.


(ii) For every f ∈ Lp(X), g ∈ Lq(X) there is a subsequence J ⊆ N of densityd(J) = 1 with

limn∈J

∫X(T n f ) ·g =

(∫X

f)·(∫

Xg).

(iii)1n

n−1

∑k=0

∣∣∣⟨T k f ,g⟩−〈 f ,1〉 · 〈1,g〉

∣∣∣ −→n→∞

0 for all f ∈ Lp, g ∈ Lq.

(iv) For each f ∈ E with∫

X f = 0 one has f ∈ Es(T ).

(v) fix(T ) = lin1 and ran(I−T )⊆ Es(T ).

(vi) E = lin1⊕Es(T ).

(vii) The kth iterate MDS (X;ϕk) is weakly mixing for some/each k ∈ N.

(viii) (X⊗Y;ϕ×ψ) is weakly mixing for some/each weakly mixing system (Y;ψ).

(ix) (X⊗X;ϕ×ϕ) is weakly mixing.

(x)∫

Xf ·1 ∈ clσT n f : n ∈ N0 for each f ∈ E.


(xi) For each f ∈ Lp(X) there is subsequence J ⊆ N such that

limn∈J

∫X(T n f ) ·g→

(∫X

f)·(∫

Xg)

for all g ∈ Lq(X).

(xii) For each f ∈ Lp(X) there is subsequence J ⊆N of density d(J) = 1 such that

limn∈J

∫X(T n f ) ·g→

(∫X

f)·(∫

Xg)

for all g ∈ Lq(X).

(xiii) The Kronecker factor of (X;ϕ) is trivial, i.e., equals lin1.

Proof. The assertions (i)–(ix) are equivalent by Theorem 9.16 and Corollaries 9.18and 9.21.Now note that with h := f −〈 f ,1〉 · 1, assertions (x)–(xii) can be rewritten equiva-lently as:

(x) 0 ∈ clσT nh : n≥ 0.(xi) limn∈J T nh = 0 weakly for some subsequence J ⊆ N.(xii) D-limn∈N T nh = 0 weakly.

By Theorem 16.19, each of these assertions is equivalent to (ii). Finally, (xiii) isequivalent to (vi) since Es(TT ) = Es(T ), by what was shown above.

We shall now investigate the reversible part in the JdLG-decomposition. In thecase of a Koopman operator, we shall obtain a characterisation of the Kroneckerfactor that ultimately leads to yet another (and finally: spectral) characterisation ofweakly mixing systems in Corollary 16.35 below.

16.4 Compact Group Actions on Banach Spaces

We now turn to the reversible part of a Jacobs–de Leeuw–Glicksberg decomposition.Again let us suppose that T is a JdLG-admissible semigroup of operators on aBanach space E.

Proposition 16.21. Let E = Er⊕Es be the JdLG-decomposition of a Banach spaceE with respect to a JdLG-admissible semigroup T ⊆L (E) with weak closure S :=clw(T ). Then the following assertions hold.

a) Tr is a JdLG-admissible semigroup on Er, with weak operator closure Sr.

b) The minimal idempotent of Sr is the identity operator I∈Sr, and its minimalideal is K(Ss) = Sr.

c) The reversible part Er is given by

16.4 Compact Group Actions on Banach Spaces 245

Er =

u ∈ E : v ∈ clσ (T u) ⇒ u ∈ clσ (T v)

=

u ∈ E : v ∈S u ⇒ u ∈S v.

Hence a vector u ∈ E is reversible if and only if whenever some vector v can bereached by the action of S starting from u, then one can return to u again by theaction of S .

Proof. Let Q be the unique minimal idempotent in S , so Er = ran(Q). If u ∈ Er,then S u =S Qu = G u. Hence if v ∈S u, then there is R ∈ G with v = Ru. Since Gis a group, there is S∈G with SR=Q. Hence u=Qu= SRu= Sv∈S v. Conversely,if u ∈ E is such that there is S ∈S with the property that SQu = u, then u = SQu =QSu ∈ ranQ.

The next result opens the door for the methods of harmonic analysis.

Theorem 16.22. The group G is a compact topological group, the strong and theweak operator topology coincide on G , and the restriction map

G →Sr, S 7→ S|Er

is a topological isomorphism of compact groups.

Proof. For the special case that E is a Hilbert space and T is a semigroup of con-tractions, this is Theorem 16.3. In the case that E = L1(X), X some probabilityspace, and T ⊆M(X), it is Corollary 16.6. In the general case, that G is a com-pact group follows from Ellis’ Theorem 14.6. The assertion about the topologicalisomorphism is then evident.

To prove that the strong and the weak operator topologies coincide on G we needsome more sophisticated tools, to be developed below.

Continuous Representations of Compact Groups

To complete the proof of Theorem 16.22 and to study the reversible part, it is con-venient to pass to a little more abstract framework. Let G be a compact group withHaar measure m and neutral element 1. A representation of G on a Banach spaceE is any mapping π : G→L (E) satisfying

1) π1 = I,

2) πxy = πxπy (x,y ∈ G).

The representation π is called weakly continuous, if the mapping

G→ C, x 7→⟨πxu,u′

⟩is continuous for all u ∈ E, u′ ∈ E ′. In other words, π : G→L w(E) is continuous.In contrast, π is called (strongly) continuous if π : G→L s(E) is continuous, i.e.,each mapping


G→ E, x 7→ πxu (u ∈ E)

is continuous for the norm topology on E.

Lemma 16.23. Let π : G→L (E) be a representation of a compact group E on aBanach space E. Then the following assertions are equivalent.

(i) π is strongly continuous.

(ii) The mappingG×E→ E, (x,u) 7→ πxu

is continuous.

(iii) π is weakly continuous and on π(G) = πx : x ∈ G ⊆L (E) the weak andstrong operator topologies coincide.

Proof. (i)⇔ (ii): Suppose that (i) holds. By the uniform boundedness principle M :=supx∈G ‖πx‖ < ∞. Let (xα)α ⊆ G and (uα)α ⊆ E are nets with xα → x ∈ G anduα → u ∈ E. Then

‖πxαuα −πxu‖ ≤M ‖uα −u‖+‖πxα

u−πxu‖→ 0.

The implication (ii)⇒ (i) is trivial.(i)⇔ (iii): Suppose that (i) holds. Then clearly π is weakly continuous. Also, K :=π(G) is compact in the strong operator topology. Since the weak operator topologyis still Hausdorff, the two topologies must coincide on K (Proposition A.4). Theconverse implication (iii)⇒ (i) is trivial.

In the following let π : G→L (E) be a weakly continuous representation of thecompact group G on a Banach space E. It follows from the uniform boundednessprinciple that supx∈G ‖πx‖< ∞. For f ∈ L1(G) we define the integral

π f :=∫

Gf (x)πx dx (16.3)

as an operator E→ E ′′ by⟨π f u,u′

⟩:=∫

Gf (x)

⟨πxu,u′

⟩dx (u ∈ E, u′ ∈ E ′).

Our first goal is to show that actually π f u ∈ E for each u ∈ E.

Lemma 16.24. In the setting described above, one has π f u ∈ E for each f ∈ L1(G)and u ∈ E.

Proof. We may suppose that f ≥ 0 and∫

G f = 1. By the Kreın–Smulian TheoremC.11 the set

K := conv

πxu : x ∈ G⊆ E

is weakly compact. Let u′′ ∈ E ′′ \E. Then, by the Hahn–Banach theorem, E ′ sepa-rates u′′ from K. This means that there is u′ ∈ E ′ and c ∈ R such that


Re⟨πxu,u′

⟩≤ c < Re

⟨u′′,u′

⟩(x ∈ G).

Multiplying with f (x) and integrating yields

Re⟨π f u,u′

⟩≤ c < Re

⟨u′′,u′

⟩,

which shows that π f u 6= u′′.

Remark 16.25. We refer to [Rudin (1991), Theorem 3.27] for more information onweak integration. In special cases the use of the Kreın–Smulian theorem in the proofof Lemma 16.24 can be avoided. For instance, if E = H is a Hilbert space, one canappeal to Corollary C.22 from elementary Hilbert space theory. The same holds ifE contains a densely embedded Hilbert space such that π restricts to a (weakly)continuous representation on H.

By Lemma 16.24, π f ∈L (E) for each f ∈ L1(G). A simple estimate yields∥∥π f∥∥≤ (sup

x∈G

∥∥πx∥∥) ·∥∥ f1

∥∥, (16.4)

showing that π : L1(G)→L (E) is a bounded linear mapping.

Lemma 16.26. In the situation above, we have E = linπ f u : f ∈ C(G), u ∈ E.

Proof. Let u′ ∈ E ′ such that for all u ∈ E and f ∈ C(G) one has⟨π f u,u′

⟩= 0. For

fixed u ∈ E this means that∫G

f (x)⟨πxu,u′

⟩dx = 0 for all f ∈ C(G),

whence 〈πxu,u′〉 = 0 m-almost everywhere. But the Haar measure is strictly posi-tive, whence 〈πxu,u′〉 = 0 for all x ∈ G and u ∈ E. Letting x = 1 yields u′ = 0, andthe claim follows by a standard application of the Hahn–Banach theorem.

Corollary 16.27. Every weakly continuous representation π : G→L (E) of a com-pact group G on a Banach space E is strongly continuous.

Proof. We define

F :=

u ∈ E : the map G→ E, x 7→ πxu is continuous.

Then F is a closed subspace of E by the uniform boundedness of the operators πx,x ∈ G. If f ∈ L1(G) we have⟨

πxπ f u,u′⟩=⟨π f u,π ′xu′

⟩=∫

Gf (y)

⟨πyu,π ′xu′

⟩dy =

∫G

f (y)⟨πxyu,u′

⟩dy

=∫

Gf (x−1y)

⟨πyu,u′

⟩dy.

This shows that πxπ f u = πτx f u, where


(τx f )(y) = f (x−1y) (x,y ∈ G, f ∈ L1(G)).

But the mappingG→ L1(G), x→ τx f

is continuous, and hence ran(π f )⊆ F . It follows that F = E by Lemma 16.26.

We are now in the position to conclude the proof we have left unfinished.Proof of Theorem 16.22. Let G be the minimal ideal of the weak operator closureS = clw(T ) of a JdLG-admissible semigroup T ⊆L (E). We know already thatG is a compact topological group, by Ellis’ theorem. Let Q be the unit element inG , i.e., the unique minimal idempotent in S , and let Er := ran(Q) be the reversiblepart. The restriction mapping

π : G →L (Er), πS := S|Er

is a weakly and hence, by Corollary 16.27, a strongly continuous representationof G . By Lemma 16.23 on π(G ) = Sr the strong and weak operator topologiescoincide. Since S = SQ for each S ∈ G , the strong and weak operator topologiescoincide also on G , and that remained to be proved.

To obtain further insight into the structure of Er we shall employ some factsfrom the theory of unitary representations of compact groups. (For convenience,these facts are presented in the supplement below.) A (finite-dimensional) unitaryrepresentation of a compact group G is any continuous group homomorphism

χ : G→ U(n)

where U(n) is the group of unitary n×n-matrices. We write χ = (χi j)i, j and callthe χi j ∈ C(G) the associated coordinate functions. A unitary representation isirreducible if the canonical action of χ(G) on Cn has no invariant subspaces apartfrom 0 and Cn.

Let π : G→L (E) be any continuous representation of G on some Banach spaceE. A tuple (e j)

nj=1 is called a (finite) unitary system for (π,E) if the e1, . . . ,en ∈ E

are linearly independent, F := line1, . . . ,en is π(G)-invariant and the correspond-ing matrix representation χ : G→ Cn×n defined by

πxei = ∑nj=1χi j(x)e j ( j = 1, . . . ,n)

is unitary, i.e., satisfies χ(x) ∈U(n) for all x ∈G. A unitary system (e j)nj=1 is called

irreducible if F does not have any nontrivial π(G)-invariant subspaces.

Lemma 16.28. Let π : G→L (E) be a continuous representation of the compactgroup G on some Banach space E.

a) Let χ : G→ U(n) be a finite-dimensional unitary representation of G andu ∈ E. Then the finite-dimensional space linπχi j u : i = 1, . . . ,n is π(G)-invariant.


b) Let F ⊆ E be a finite-dimensional π(G)-invariant subspace. Then there is aunitary system (e j)

nj=1 for (π,E) with F = line1, . . . ,en.

Proof. a) We note first that

(τxχi j)(y) = χi j(x−1y) = ∑nk=1χik(x−1)χk j(y) (x,y ∈ G).

Hence πxπχi j u = πτxχi j u = ∑nk=1 χik(x−1)πχk j u.

b) Take any inner product ( · | ·)F on F , and define

(u |v) :=∫

G(πxu |πxv)F dx (u,v ∈ F).

Then ( · | ·) is an inner product since

(u |u) =∫

G‖πxu‖2

F dx = 0 ⇒ ‖u‖F = ‖π1u‖F = 0.

With respect to this new inner product, each πx acts isometrically by the rotationinvariance of the Haar measure. Hence any orthonormal basis e1, . . . ,en of F withrespect to this inner product is a unitary system for (π,E) spanning F .

We now arrive at the main result of this section.

Theorem 16.29. Let π : G→L (E) be a continuous representation of the compactgroup G on some Banach space E. Then

E = cl⋃

F : F is a π(G)-invariant subspace of E, dimF < ∞

= lin

e j : (e j)nj=1 (irreducible) unitary system, j = 1, . . . ,n, n ∈ N

.

Proof. By Lemma 16.28 each π(G)-invariant finite-dimensional subspace F of Eis the span of a unitary system. Since one can decompose each unitary representa-tion orthogonally into irreducible subspaces (see supplement below), the second setequality is clear. To prove the first, note that by Theorem 16.36 from the supple-ment below, the linear span of the coordinate functions of finite-dimensional uni-tary representations of G form a dense set in C(G). Since by (16.4) the mappingL1(G)→L (E), f 7→ π f , is bounded we have

π f u ∈ lin

πχi j u : n ∈ N, χ unitary representation of G on Cn, i, j = 1, . . . ,n

for every f ∈ C(G). Hence, by Lemma 16.28 we obtain

π f u ∈ cl⋃

F : F is a π(G)-invariant subspace of E, dimF < ∞

for each f ∈ C(G) and u ∈ E, and we conclude the proof by an appeal to Lemma16.26 above.


By Proposition 16.42 below, the irreducible unitary representations of a com-pact Abelian group coincides with its character group G∗. Hence we arrive at thefollowing corollary.

Corollary 16.30. If π : G → L (E) is a continuous representation of a compactAbelian group G, then

E = lin

u ∈ E : ∃χ ∈ G∗ with πxu = χ(x)u ∀ x ∈ G.

We refer to Exercise 14 for a more information about representations of compactAbelian groups.

16.5 The Reversible Part Revisited

The preceding considerations can now be used to obtain a new characterisation ofthe reversible part of a JdLG-admissible semigroup T of operators on a Banachspace E.

Similar to group representations, let us call a tuple (e j)nj=1 of vectors of E a

(finite) unitary system for a semigroup T ⊆L (E) if (e1, . . . ,en) is linearly inde-pendent, F := line1, . . . ,en is T -invariant, and the corresponding matrix repre-sentation of T on F defined by

Tei = ∑nj=1χi j(T )e j (i = 1, . . . ,n, T ∈T ).

satisfies χ(T )∈U(n) for each T ∈T . Furthermore, a unitary system (e j)nj=1 for T

is called irreducible if F does not have any nontrivial subspaces invariant under theaction of T .

Theorem 16.31 (Jacobs–de Leeuw–Glicksberg). Let E = Er⊕Es be the JdLG-decomposition of a Banach space E with respect to a JdLG-admissible semigroupT ⊆L (E). Then

Er = linu : (e j) j (irred.) unitary sys. for T , u = e j for some j.

Proof. We use the previous terminology, i.e., S = clw(T ) is the weak operatorclosure of T , Q is the minimal idempotent, Er = ran(Q) and G := S Q = QS .

Let B := (e1, . . . ,en) be some unitary system for T and denote by F :=line1, . . . ,en the generated subspace. As F is closed and invariant under T , itmust be invariant under S as well. Let χ(S) ∈Cn×n be the matrix representation ofS|F with respect to the basis B. Then χ is continuous and a homomorphism of semi-groups. By hypothesis, χ(T )⊆U(n), and since the latter is closed, χ(S)∈U(n) forall S ∈S . In particular, χ(Q) ∈U(n), and since Q is an idempotent, χ(Q) = In, then×n identity matrix. This means that Q acts as the identity on F , whence F ⊆ Er.

16.5 The Reversible Part Revisited 251

For the converse we first note that the compact group G acts continuously on Er.Hence, by Theorem 16.29, Er is the closed linear span of all irreducible unitarysystems for this representation. But it is clear that each unitary system (e j)

nj=1 in Er

for G is a unitary system for T , since T = T Q on Er for each T ∈ T . Further, if itis irreducible for the G -action, then it is irreducible for the T -action as well, sinceG ⊆ clw(T ).

We refer to Exercise 9 for a converse to Theorem 16.31. In the case of Abeliansemigroups we obtain the following corollary.

Corollary 16.32. Let E be a Banach space, and let T ⊆L (E) be an Abelian rela-tively weakly compact semigroup. Then

Er = lin

u ∈ E : ∀T ∈T ∃λ ∈ T : Tu = λu.

Example 16.33. Let H be a Hilbert space and let T ⊆ L (H) be semigroup ofcontractions. Since the set of contractions is compact in L w(H) by Corollary 16.2,T is relatively weakly compact. The associated JdLG-decomposition H = Hr⊕Hsis orthogonal since the JdLG-projection Q is a contraction, see Proposition 13.31.The group Sr is hence a group of invertible contractions, i.e., of unitary operators.In particular, each T ∈T restricts to a unitary operator on Hr.However, the reversible part can be trivial even if T is itself a unitary group. Tosee this, take H := `2(Z) and T the (right or left) shift. Then T is a unitary operatorand hence T := T n : n ∈ Z is a relatively weakly compact unitary group. ButT n f → 0 weakly for every f ∈ H, whence H = Hs and Hr = 0.

Finally, we apply our results to the semigroup TT generated by a single operatorT ∈L (E).

Theorem 16.34. Let T ∈ L (E) generate a relatively weakly compact semigroupTT . Then the following assertions hold.

a) Er(T ) is spanned by the eigenvectors associated with unimodular eigenval-ues of T , i.e.,

Er(T ) = lin

u ∈ E : ∃λ ∈ T with Tu = λu.

b) fix(T )⊆ Er(T ) and Es(T )⊆ ran(I−T ).

c) Es(T ) = ran(I−T ) if and only if σp(T )∩T= 1.d) E = Es(T ) if and only if σp(T )∩T= /0.

Proof. a) is an immediate consequence of Corollary 16.32 since if Tu = λu, thenT nu = λ nu for all n ∈ N0.

b) The inclusion fix(T ) ⊆ Er follows from a), but is actually elementary since theJdLG-projection Q is in clwTT . The second inclusion can be seen from Corollary9.15 together with Theorem 16.19. However, one can use a more direct argument:


If u ∈ Es then 0 ∈ clσT nu : n ∈ N0. But then 0 ∈ convT nu : n ∈ N0 and henceu = u−0 ∈ ran(I−T ) by Proposition 8.18.

c) follows from b) and a); and d) follows from a) directly.

Let us apply Theorem 16.34 to the Koopman operator of an MDS (X;ϕ).

Corollary 16.35. Let T be the Koopman operator on L1(X) of an MDS (X;ϕ). Thenthe following assertions hold.

a) The Kronecker factor is spanned by the eigenfunctions of T associated withunimodular eigenvalues.

b) If the system is ergodic, each such eigenfunction is contained in L∞(G), infact even unimodular up to a constant.

c) The system is weakly mixing if and only if it is ergodic and σp(T )∩T= 1.

Proof. a) holds by Theorem 16.34 and the definition of the Kronecker factor inExample 16.17. b) has been treated in Proposition 7.17. To prove c), note that byTheorem 16.20.(v) a system is weakly mixing if and only if fix(T ) = lin1 andran(I−T )⊆ Es(T ). Hence c) follows from Theorem 16.34, part b) and c).

We note that c) is in fact a spectral characterisation of weakly mixing systems, sincethe system is ergodic if and only if fix(T ) = lin1.

Supplement: Unitary Representations of Compact Groups

In the following G is a compact group with Haar measure m. For E = H a Hilbertspace a representation π : G→L (E) is called unitary if it is strongly continuous,and πx−1 = π∗x for each x ∈ G. That is to say, each operator πx is a unitary operatoron H. A unitary representation (π,H) is called finite-dimensional if dimH < ∞.

If (π,Hπ) and (ρ,Hρ) are two unitary representations of G, an operator A : Hπ →Hρ is called intertwining if Aπx = ρxA for all x ∈ G. A closed subspace F of His called reducing if it is invariant under the action of G. Since π(G) ⊆ L (H)is invariant under taking adjoints, a closed subspace F is reducing if and only itsorthogonal complement F⊥ is reducing, and this happens precisely then, when theorthogonal projection P onto F is intertwining.

Every reducing subspace F gives rise to a subrepresentation by restriction:πF

x := πx|F . A unitary representation is called irreducible if 0 and H are theonly reducing subspaces. By virtue of an induction argument one sees easily that afinite-dimensional unitary representation decomposes orthogonally into irreducibleones (however not in a unique manner).


Coordinate Functions

After a choice of orthonormal basis, a unitary representation on a finite-dimensionalHilbert space H is the same as a group homomorphism

ψ : G→ U(n)

the matrix group of unitary n×n matrices. A continuous function f ∈ C(K) is calleda coordinate function if there is a continuous unitary representation ψ : G→ U(n)and some indices i, j with f = ψi j. Equivalently, a coordinate function is any func-tion of the form f (x) =

(πxei

∣∣e j), where π : G→L (H) is a continuous unitary

representation and H is a finite-dimensional Hilbert space with orthonormal ba-sis (e j) j. If the representation is irreducible, f is called an irreducible coordinatefunction. The aim of the present section is mainly to prove the following centralresult.

Theorem 16.36. Let G be a compact group. Then the linear span of irreduciblecoordinate functions is dense in C(G).

The proof of this theorem is rather lengthy. First, we recall from the above thatany finite-dimensional unitary representation decomposes orthogonally into irre-ducible representations. By picking an orthonormal basis subordinate to this decom-position, we see that any coordinate function is a linear combination of irreduciblecoordinate functions. So it remains to prove that the linear span of coordinate func-tions is dense in C(G). The strategy is, of course, to employ the Stone–WeierstrassTheorem 4.4. The following auxiliary result takes the first step.

Lemma 16.37. The product of two coordinate functions is again a coordinate func-tion. The pointwise conjugate of a coordinate function is a coordinate function.

Proof. Let π : G→ U(n) and ρ : G→ U(m) be two unitary representations of thecompact group G. Denote πx = (πi j(x))i, j and ρx = (ρkl(x))k,l . Define χ : G →Cnm×nm by

χx = (χαβ (x))α,β , χαβ (x) := πi j(x)ρkl(x) (α = (i,k), β = (k, j));

and π : G→ Cn×n byπx := (πi j(x))i, j.

Then, by direct verification, χ : G→ U(nm) and π : G→ U(n) are again unitaryrepresentations of G (Exercise 10).

Remark 16.38. The constructions in the preceding proof can be formulated moreperspicuously in the language of abstract representation theory. Namely, let (π,H)and (ρ,K) be unitary representations of G then one can form the tensor productrepresentation (W,H⊗K) by Wx := πx⊗ρx, and the contragredient representation(π,H ′) on the dual space H ′ of H by πx := π ′x−1 . The matrix representations of theseconstructions lead to the formulation above.


It follows from Lemma 16.37 that the linear span of the coordinate functionsis a conjugation invariant subalgebra of C(G). Since the constant function 1 is acoordinate functions (of the trivial representation πx := I on C1), it remains to showthat the coordinate functions separate the points.

Theorem 16.39. The finite-dimensional unitary representations (equivalently, theset of coordinate functions) of a compact group G separate the points of G.

The proof of Theorem 16.39 needs some technical preparations.

Convolution and the Left Regular Representation

For f ∈ L1(G) and x ∈ G we define

f ∗(y) := f (y−1)and (τx f )(y) := f (x−1y) (y ∈ G). (16.5)

By invariance of the Haar measure and simple algebra

τ : G→L (Lp(G)), x 7→ τx

is a representation of G, called the left regular representation of G on Lp(G). In thecase p = 2 this representations is unitary. Moreover, it follows easily from Corollary14.12 that this representation is continuous. Hence, for f ∈ L1(G) we can define asin (16.3) the integral

τ f :=∫

Gf (x)τx dx

in the weak sense, see Remark 16.25. Applying this operator to g∈ L1(G) yields theconvolution

f ∗g := τ f (g) =∫

Gf (x)τxgdx.

If g ∈ C(G), then the map x 7→ τxg is even continuous in C(G) (Corollary 14.12),whence we can view τ as a continuous representation of G on C(G). It follows thatin this case f ∗g = τ f (g) ∈ C(G) again, and we can evaluate pointwise to obtain

( f ∗g)(y) =∫

Gf (x)(τxg)(y)dx =

∫G

f (x)g(x−1y)dx (16.6)

for y∈G, by the invariance of the Haar measure. We note the following useful facts.

Lemma 16.40. For f ,g,h ∈ L1(G) and x ∈ G the following statements are true.

a) ‖ f ∗g‖1 ≤ ‖ f‖1 ‖g‖1.

b) ( f ∗g)∗ = g∗ ∗ f ∗.

c) τxτ f g = ττx f g

d) (h∗ f )∗g = h∗ ( f ∗g).


Furthermore, if f ,g ∈ L2(G) then f ∗g ∈ C(G) and

‖ f ∗g‖∞≤ ‖ f‖2 ‖g‖2 , ‖ f ∗g‖2 ≤ ‖ f‖2 ‖g‖1 , ( f ∗ f ∗)(1) = ‖ f‖2

2 .

Proof. a) By (16.4) we have

‖ f ∗g‖1 =∥∥τ f g

∥∥≤ ∥∥τ f∥∥∥∥g

∥∥1 ≤

∥∥ f∥∥

1

∥∥g∥∥

1,

since each τx, x ∈ G, is an isometry on L1(G).b) For f ,g ∈ C(G) this is a simple computation; for general f ,g ∈ L1(G) use anapproximation argument and a).c) This has been established already in the proof of Corollary 16.27 for a generalrepresentation.d) Take u ∈ L∞(G). Then by c)⟨

τhτ f g,u⟩=∫

Gh(x)

⟨τxτ f g,u

⟩dx =

∫G

h(x)⟨ττx f g,u

⟩dx =

⟨ττh f g,u

⟩since v 7→ 〈τvg,u〉 is a continuous functional on L1(G).For the remaining part we note first that f ∗ g ∈ C(G) if f ,g are both continuous.The estimate ‖ f ∗g‖

∞≤ ‖ f‖2 ‖g‖2 is an easy consequence of the Cauchy–Schwarz

inequality and the invariance of the Haar measure. The other norm estimate followsfrom

‖ f ∗g‖2 = ‖( f ∗g)∗‖2 = ‖g∗ ∗ f ∗‖2 =

∥∥τg∗ f ∗∥∥

2 ≤ ‖g∗‖1 ‖ f ∗‖2 = ‖ f‖2 ‖g‖1

where we used again (16.4), the fact that that τ is a unitary representation on L2(G),and the reflection invariance of the Haar measure. The statements for general f ,g ∈L2(G) follow by approximation with continuous functions.

Our major tool for proving Theorem 16.39 is the following result.

Proposition 16.41. Let 0 6= f ∈ L2(G). Then there is a finite-dimensional unitaryrepresentation (π,H) of G such that π f 6= 0.

Proof. Let h := f ∗ f ∗. Then h∗ = h, h ∈ C(G) and h(1) = ‖ f‖22 6= 0, by Lemma

16.40. Since h is continuous and the Haar measure is strictly positive, ‖h‖2 6= 0.Define the operator A on L2(G) by Au := u∗h. By Lemma 16.40, A is bounded with‖A‖ ≤ ‖h‖1. Moreover, it is easy to see that A is self-adjoint (Exercise 11). We notethat

τ f A f ∗ = τ f ( f ∗ ∗h) = ( f ∗ ( f ∗ ∗h)) = ( f ∗ f ∗)∗h = h∗h.

Since h∗ = h, (h ∗ h)(1) = ‖h‖22 > 0, and as before it follows that τ f A f ∗ 6= 0. In

particular, A 6= 0.Next, we note that A is an integral operator with kernel k(y,x) := h(x−1y), since


(Au)(y) = (u∗h)(y) =∫

Gu(x)h(x−1y)dx =

∫G

k(y,x)u(x)dx (y ∈ G).

It follows that A is compact. (One can employ different reasonings here, see Exercise12.) By the Spectral Theorem C.23, A can be written as a convergent sum

A = ∑λ∈ΛλPλ

where Λ ⊆C\0 is a finite or countable set, for each λ ∈Λ the eigenspace Eλ :=ker(λ I−A) satisfies 0 < dimEλ < ∞, and Pλ is the orthogonal projection onto Eλ .(Note that since A 6= 0, Λ 6= /0.)Finally, A intertwines τ with itself, since by Lemma 16.40.c) we have

τxAu = τx(u∗h) = (τxu)∗h = Aτxu (u ∈ L2(G)).

This implies that each eigenspace Eλ is invariant under τx, and hence gives riseto the (finite-dimensional) subrepresentation πλ

x := τx|Eλ. We claim that there is at

least one λ ∈Λ with πλf 6= 0. Indeed, since τ f A f ∗ 6= 0 we must have λτ f Pλ f ∗ 6= 0

for some λ . But on Eλ we have τ f = πλf , and hence the claim is proved. Taking

(π,H) := (πλ ,Eλ ) concludes the proof of the proposition.

Proof of Theorem 16.39. Let a,b ∈ G with a 6= b. We have to devise a finite-dimensional unitary representation (π,H) of G with πa 6= πb. By passing to ab−1

we may suppose a 6= 1 and we aim for a representation with πa 6= I. We can picka neighbourhood U of 1 with aU ∩U = /0 and a continuous function 0 ≤ u ∈ C(G)with u(1) > 0 and supp(u) ⊆ U . Hence the function f := u− τau is nonzero. ByProposition 16.41 there is a finite-dimensional unitary representation (π,H) of Gwith π f 6= 0. This implies that

0 6= π f = πu−τau = πu−πτau = πu−πaπu,

whence πa 6= I.

Compact Abelian Groups

Let us consider the special situation that the compact group G is Abelian. Let (π,H)be a finite-dimensional irreducible unitary representation of G and let x ∈ G. SincedimH <∞, πx must have an eigenvalue λ (x)∈T. Since G is Abelian, πx intertwinesπ with itself, and hence ker(λ (x)I−πx) is a reducing subspace. By irreducibility, itmust be all of H and hence πx = λ (x)I. But then every one-dimensional subspace isreducing, whence dimH = 1. We have proved the following.

Proposition 16.42. A (finite-dimensional) unitary represention of a compactAbelian group is irreducible if and only if it is one-dimensional.

Exercises 257

Note that the unitary group of C1 is U(1) = T, and a one-dimensional repre-sentation of G is the same as a continuous group homomorphism G→ T, i.e., acharacter of G. The set of characters is G∗, the dual group of G, introduced in Sec-tion 14.3. The following is hence an immediate consequence of Proposition 16.42and Theorem 16.36.

Theorem 16.43. For a compact Abelian group G the dual group G∗ separates thepoints of G. The set linG∗ of trigonometric polynomials is a dense subalgebra ofC(G).

Remarks 16.44. 1) It follows from Theorem 16.29 that each infinite-dimensional (continuous) representation of a compact group G has a nontriv-ial finite-dimensional invariant subspace. In particular each irreducible unitaryrepresentation of G is finite-dimensional.

2) Two unitary representations (π,H) and (ρ,K) are called equivalent if there isa unitary operator U : H→ K that intertwines the representations. The Peter–Weyl theorem states that L2(G) decomposes orthogonally

L2(G)∼=⊕

αEα ⊕·· ·⊕Eα︸︷︷︸

dim(Eα )-times

into finite-dimensional subspaces Eα such that (1) each Eα reduces the leftregular representation, (2) the induced subrepresentation (τ,Eα) is irreducible,(3) each irreducible representation of G equivalent to precisely one (τ,Eα).We refer to [Deitmar and Echterhoff (2009), Chapter 7] for further details.

Exercises

1. Give a proof of Theorem 16.1 for a reflexive Banach space E. (Hint: Mimic theHilbert space proof.)

2. Let E be a Banach space. Show that the following sets are closed subsemigroupsof L w(E).

a) T ∈L (E) : T Q = QT, for some fixed operator Q ∈L (E).

b) T ∈L (E) : T f = f, for some fixed element f ∈ E.

3. Let E = C(K) or E = Lp(X) for some 1 ≤ p ≤ ∞. Show that the following setsare closed subsemigroups of L w(E).

a) T ∈L (E) : T ≥ 0.b) T ∈L (E) : T f = T f ∀ f ∈ E.c) T ∈L (E) : T ≥ 0, T h≤ h for some fixed element h ∈ ER.


4. Prove the analogue of Lemma 16.8 for the strong operator topology, i.e., the fol-lowing.

Let E be a Banach space, let T ⊆L (E), and let D⊆ E be a subset such that linDis dense in E. Then the following assertions are equivalent.

(i) T is relatively strongly compact, i.e, relatively compact in the strong opera-tor topology.

(ii) T x = T x : T ∈T is relatively compact in E for each x ∈ E.

(iii) T is norm-bounded and T x = T x : T ∈T is relatively compact in E foreach x ∈ D.

5. Let E be a strictly convex Banach space, and let P ∈L (E) be a projection with‖P‖ ≤ 1. Show that for any f ∈ E

‖P f‖= ‖ f‖ ⇐⇒ P f = f .

(Hint: Write P f = P( f+P f

2

).)

6. Let E be a Banach space, and F ⊆ E a closed subspace. Show that the set

L F(E) =

T ∈L (E) : T (F)⊆ F

is weakly closed and the restriction mapping

L F(E)→L (F), T 7→ T |F

is continuous for the weak and strong operator topologies. (Hint: Use the Hahn–Banach theorem.)

7. Let A be an algebra over R and let S ⊆ A be multiplicative, i.e., S · S ⊆ S. Showthat conv(S) is multiplicative, too. Show that if st = ts holds for all s, t ∈ S then italso holds for all s, t ∈ conv(S).

8. Let E be a separable Banach space. Show that there is a sequence (x′j) j∈N ⊆ E ′

with ‖x′j‖ = 1 for all j ∈ N and such that x j : j ∈ N separates the points of E.(Hint: Use the Hahn–Banach theorem.)

9. Let E be a Banach space, and let T ⊆L (E) be a norm-bounded subsemigroup.Recall from page 250 the notion of a (finite) unitary system for T . Suppose that

E = lin

u ∈ E : ∃ fin. unit. sys. (e j) j for T with u ∈ line1, . . . ,e j

Show that T is relatively strongly compact, and that clwT = clsT is a compactsubgroup of L (E) with the identity operator as neutral element (cf. Proposition17.1).

10. Let π : G→U(n) and ρ : G→U(m) be two unitary representations of the com-pact group G. Denote πx = (πi j(x))i, j and ρx = (ρkl(x))k,l . Define χ : G→ Cnm×nm

by

Exercises 259

χx = (χαβ (x))α,β , χαβ (x) := πi j(x)ρkl(x) (α = (i,k), β = (k, j));

and π : G→ Cn×n byπx := (πi j(x))i, j.

Show that χ : G→U(nm) and π : G→U(n) are again unitary representations of G.

11. Let u,v ∈ L2(G) and h ∈ L1(G). Show that

(u∗ |v∗ )L2 = (v |u)L2 and (u∗h |v) = (u |v∗h∗ ) .

Conclude that the operator A in the proof of Proposition 16.41 is self-adjoint.(Hint: Show the assertion for h ∈ C(G) first and then employ an approximationargument.)

12. Let K be a compact space, and let µ ∈M(K) be a positive measure. Further, letk ∈ C(K×K) and define

(T f )(y) :=∫

Kk(y,x) f (x)µ(dx) ( f ∈ L2(K,µ), y ∈ K).

Show that T is a compact operator through the following ways:

a) k can be approximated uniformly on K×K by finite linear combinations offunctions of the form f ⊗g, f ,g ∈ C(K), cf. the proof of Proposition 10.11.Show that this leads to an approximation in operator norm of T by finite rankoperators.

b) Realise that T is a Hilbert–Schmidt operator and as such is compact, see[Young (1988), Sec. 8.1] or [Deitmar and Echterhoff (2009), Sec. 5.3].

13. Let π : G→L (E) be a continuous representation of a compact group G on aBanach space E. For f ∈ L1(G) define π f as in (16.3). Show that

π f πg = π f∗g

where f ∗g denotes convolution of f ,g as in Section (16.6). Then show that if E =His a Hilbert space,

(π f )∗ = π f ∗

where f ∗(x) = f (x−1) as in (16.5).

14. Let G be a compact Abelian group and let π : G→L (E) be a continuous rep-resentation of G. For a character χ ∈ G∗ consider the operator Pχ := πχ , i.e.,

Pχ u =∫

Gχ(x)πxudx (u ∈ E).

a) Show that χ ∗η = (χ |η )η for any two characters χ,η ∈ G∗. Then showthat


Pχ Pη = δχη Pη (χ,η ∈ G∗).

b) Show that ran(Pχ) = u ∈ E : πxu = χ(x)u for all x ∈ G and

I = ∑χ∈G∗Pχ

the sum being convergent in the strong sense.

c) Show that if E = H is a Hilbert space, then each Pχ is self-adjoint and

H =⊕

χ∈G∗ranPχ

is an orthogonal decomposition of H.

Chapter 17Dynamical Systems with Discrete Spectrum

In the previous chapter we obtained important information on the structure ofweakly compact semigroups of linear operators on Banach spaces. The central ex-ample is the closure S in the weak operator topology of a semigroup TT := T n :n ∈ N0 generated by a single operator T ∈ L (E), E a Banach space. In such acompact semitopological semigroup we have found a smallest ideal K(S ), whichis actually a group. The unit element Q of this group is a projection inducing theJdLG-decomposition E = Er⊕Es.

In this chapter, with applications to dynamical systems in mind, we confine ourattention to the case when E = Er, i.e., Q = I is the identity operator. Equivalently,we suppose that K(S )=S consists of invertible operators only. By Theorem 16.34this implies that E is spanned by the eigenvectors of T corresponding to unimodulareigenvalues. In the first section we study this property.

17.1 Operators with Discrete Spectrum

Let E be a Banach space. An operator T ∈L (E) is said to have discrete spectrumon a Banach space E if it is power-bounded and E is spanned by the eigenvectors

of T corresponding to unimodular eigenvalues, i.e.,

E = lin

x : ∃λ ∈ T : T x = λx. (17.1)

The first result states that an operator with discrete spectrum generates a relativelycompact semigroup.

Proposition 17.1. Suppose T ∈L (E) has discrete spectrum on the Banach spaceE. Then the semigroup

T :=

T n : n ∈ N0

261

262 17 Dynamical Systems with Discrete Spectrum

is relatively strongly compact. Moreover, on the (strong=weak) closure S of T thetwo topologies coincide, and S is a topological group with the identity operator asneutral element.

Proof. Note that Exercise 16.9 contains a more general statement, and the followingproof works also there. By Exercise 16.4 we only have to prove that for all u ∈ E, ua linear combination of eigenvectors corresponding to unimodular eigenvalues, theorbit T nu : n ∈ N

is relatively compact in E. For such elements, however, the

orbit is a bounded subset of a finite dimensional subspace and, as such, is relativelycompact. The remaining assertions follow from the results in Section 16.4.

As a consequence an operator with discrete spectrum is mean ergodic, by Example16.16. The next proposition provides some detail about operators acting on the Lp

scale.

Proposition 17.2. Let X be a probability space and T ∈ L1(X) be a Dunford–Schwartz operator. If T has discrete spectrum on Lp(X) for some p ∈ [1,∞), then ithas discrete spectrum on spaces in the Lp-scale, p ∈ [1,∞).

Proof. If T has discrete spectrum on Lp for some p ∈ [1,∞), then it has discretespectrum on Lr for all r ∈ [1, p], since for such r the space Lr is densely and con-tinuously embedded in Lp. Therefore we may suppose that T has discrete spectrumon L1. Then we have to prove that it has discrete spectrum on Lr for all r ∈ (1,∞).By Theorem 16.1 we know that T generates a relatively weakly compact semigroupin L (E), E := Lr(X). Hence, by the results from Section 16.2 we obtain the JdLG-decomposition E = Es⊕Er. If f ∈ Es, then by Theorem 16.19 there is a sequencesuch that T n j f → 0 weakly in E, but this implies T n j f → 0 weakly in L1(X), prov-ing that T cannot have discrete spectrum on L1.

The aim is now to examine dynamical systems whose Koopman operator has dis-crete spectrum. The chief example for such dynamical systems is again the grouprotation.

Example 17.3. Let G be a compact Abelian group and let T := La be the Koopmanoperator induced by the rotation with a ∈ G. Since every character χ ∈ G∗ is aneigenfunction of T corresponding to the eigenvalue χ(a) ∈ T and since linG∗ isdense in C(G) (Proposition 14.18), T has discrete spectrum on C(G). A fortiori,T has discrete spectrum also on Lp(G) for every 1 ≤ p < ∞, where µ is the Haarmeasure on G.

A TDS (K;ϕ) is said to have discrete spectrum if its Koopman operator T onC(K) has discrete spectrum. If (X;ϕ) is an MDS, then by Proposition 17.2 above,its Koopman operator has discrete spectrum on Lp(X) for some p ∈ [1,∞) if theKoopman operator has this property for all p in this range. In this case, we say thatthe MDS has discrete spectrum.

If we have a topological or a measure-preserving dynamical system with discretespectrum, its Koopman operator T is a bijective isometry and hence satisfies σ(T )⊆

17.2 Monothetic Groups 263

T by Proposition C.19. If in addition the TDS is minimal (or the MDS is ergodic),then the eigenspaces of T are all one-dimensional, by Theorem 4.19 (or Proposition7.17). We shall show that the dynamical system is completely determined (up toisomorphism) by this spectral information. Indeed, it will turn out that the dynamicalsystem is essentially a minimal/ergodic rotation on a compact group. To begin with,let us have a closer look at this model situation.

17.2 Monothetic Groups

A topological group G is called monothetic if there is an element a∈G such that thecyclic subgroup 〈a〉= an : n∈Z is dense in G; in this case a is called a generatingelement of G. By Exercise 14.2 monothetic groups are necessarily Abelian. Thenext proposition yields some information about locally compact monothetic groups.

Proposition 17.4 (Weil’s Lemma). Let G be a locally compact monothetic groupwith generating element a ∈ G (hence G is Abelian). Then G is either topologicallyisomorphic to Z or compact. In the latter case (and only then) the set an : n ∈N0is dense in G.

Proof. Suppose that G and Z are not isomorphic. We prove that the set an : n∈Nis dense in G. Let U be a nonempty open subset in G. By assumption there is n ∈ Zwith an ∈U . Take a symmetric neighbourhood of 1 with anV ⊆U . If ak ∈ V holdsonly for finitely many k ∈ Z, then 〈a〉 would be discrete and infinite, and hence Gwould be isomorphic to Z, a contradiction. This shows that there is k ∈ N such thatn+ k ≥ 0 and an+k ∈ anV ⊆U , yielding the required denseness assertion.

Still assuming that G and Z are not isomorphic, we show that G is compact. LetU be a symmetric open neighbourhood of 1 ∈ G such that U is compact. Since wealready saw that A := an : n ∈ N0 is dense in G, we obtain that G = AU = AU .This means that for every g ∈ G there is n ∈ N0 with an ∈ gU . Take the minimal nwith this property and denote it by ng. By the compactness of U there is m∈N0 suchthat for the finite set F := a0,a1, . . . ,am one has U ⊆ FU . By definition we haveang ∈ gU ⊆ gFU , i.e., one has ang− j ∈ gU for some j ≤ m. From the minimalityof ng ∈ N0 we obtain ng − j ≤ 0, i.e., ng ≤ k. This proves that g ∈ a j−ngU−1 =a j−ngU ⊆ FU , so G = FU , which is a compact set. The assertion is thus proved.

For compact groups we have the next characterisations of being monothetic.

Proposition 17.5. For a compact group G with Haar measure m and an elementa ∈ G the following statements are equivalent.

(i) The group G is monothetic with generating element a.

(ii) The cyclic subgroup 〈a〉 := an : n ∈ Z is dense in G.

(iii) The TDS (G;a) is minimal.

(iv) The MDS (G,m;a) is ergodic.


(v) The element a separates G∗.

Proof. The equivalence of the first four statement was already stated in Theorem10.13 (together with a longer list of equivalences). By continuity, two characters areequal if they coincide on a dense subset, hence (ii) implies (v). By Corollary 14.16if a separates G∗, then 〈a〉 ⊆ G must be dense.

An immediate consequence of this and Proposition 17.1 is the following.

Corollary 17.6. Suppose T ∈L (E) has discrete spectrum on the Banach space E.Then the weak closure

G := clw

T n : n ∈ N0⊆L w(E)

is a compact monothetic group and the rotation TDS (G ;T ) is minimal.

We now characterise the duals of compact monothetic groups.

Proposition 17.7. Let G be a compact group. If G is monothetic with generatingelement a ∈ G, then G∗ is (algebraically) isomorphic to the group

G∗(a) :=

χ(a) : χ ∈ G∗⊆ T.

Conversely, if G∗ is algebraically isomorphic to a subgroup of T, then G is mono-thetic.

Proof. Suppose that a generates G. Then the set G∗(a) is a subgroup of T and theevaluation mapping G∗ 3 χ 7→ χ(a) is a surjective group homomorphism. Further-more, it is even injective by Proposition 17.5.

For the converse, suppose that γ : G∗→ T is an injective homomorphism, hence γ isa character of G∗. By Pontryagin’s Theorem 14.26 there is a ∈ G with γ(χ) = χ(a)for all χ ∈ G∗. We claim that 〈a〉 is dense in G. Indeed, if H := cl〈a〉 6= G, then byCorollary 14.16 there exists a character χ 6= 1 of G being constant 1 on 〈a〉. But thenγ is not injective, a contradiction.

Recall from Section 14.5 the following interpretation of the set G∗(a).

Proposition 17.8. Let G be a compact Abelian group with Haar measure m andfix a ∈ G. Consider T := La the Koopman operator of the rotation by a consideredeither as an operator on E = C(G) or on E = Lp(G), p ∈ [1,∞). Then

G∗(a) = σp(T ).

Proof. For the case when T is considered on E =L2(G), this was proved in Theorem14.30. Since the characters are continuous and are eigenvectors of T , the statementis also proved for the case E = C(G). Hence also the inclusion G∗(a) ⊆ σp(T ) isevident for any of the spaces Lp. Suppose λ 6∈ G∗(a) is an eigenvalue with an Lp-eigenvector f 6= 0. Since the characters belong to C(G) ⊆ Lq(G), q the conjugate

17.3 Dynamical Systems with Discrete Spectrum 265

exponent, we obtain 0 = 〈T f −λ f ,χ〉= (χ(a)−λ )〈 f ,χ〉. The assumptions imply〈 f ,χ〉= 0 for all χ ∈G∗ which in turn yields f = 0, a contradiction (use that linG∗

is dense in Lp).

The next result characterises isomorphic minimal/ergodic group rotations.

Proposition 17.9. For two monothetic compact groups G and H with generatingelements a ∈ G and b ∈ H the following statements are equivalent.

(i) The TDSs (G;a) and (H;b) are isomorphic.

(ii) The MDSs (G,mG;a) and (H,mH ;b) are isomorphic.

(iii) There is a topological group isomorphism Φ : G→ H with Φ(a) = b.

(iv) G∗(a) = H∗(b).

Proof. (iii)⇒ (i), (ii): If Φ is a topological group isomorphism with Φ(a) = b, thenΦ is also an isomorphism of the dynamical systems.(i) or (ii)⇒ (iv): If the dynamical systems are isomorphic, the Koopman operators onthe corresponding spaces are similar. Hence they have the same spectral properties,in particular they have the same point spectrum. So (iv) follows from Proposition17.8.(iv)⇒ (iii): If G∗(a) = H∗(b), then by Proposition 17.7 the discrete groups G∗ andH∗ are isomorphic. Recall that this isomorphism is given by

G∗ 3 χ 7→ χ(a) = η(b) 7→ η ∈ H∗.

By Pontryagin’s Theorem 14.26 this implies that G and H are topologically isomor-phic under Φ : G→H, where Φ(g)∈H is the unique element with η(Φ(g)) = χ(g)if χ(a) = η(b), χ ∈ G∗, η ∈ H∗. Now Φ(a) = b, and the assertion follows.

Example 17.10. Let H be an arbitrary subgroup of Td. By Proposition 17.7, thecompact group G := H∗ is monothetic with some generating element a ∈ G. Asshown before, the group rotation (G;a) is minimal and has discrete spectrum (seeProposition 17.5 and Example 17.3). Its Koopman operator T on C(G) has pointspectrum

σp(T ) =

χ(a) : χ ∈ G∗ = H∗∗ = H= H,

by Theorem 14.30 and Proposition 17.7.

The next section shows that this example is quite general.

17.3 Dynamical Systems with Discrete Spectrum

In this section, we study dynamical systems with discrete spectrum, both in thetopological and in the measure-preserving situation.


Minimal TDSs with Discrete Spectrum

We first confine our attention to the topological case. Our aim is to give a completedescription (up to isomorphism) of minimal TDSs with discrete spectrum.

If two TDSs are isomorphic, the Koopman operators on the corresponding C(K)-spaces are similar and hence have the same spectral properties. In particular, theyhave the same point spectrum. This shows that the point spectrum of the Koopmanoperator is an isomorphism invariant of TDSs. On the other hand, Proposition 17.9and Proposition 17.8 show that within the class of minimal group rotations, the pointspectrum completely determines the system up to isomorphism. Therefore we saythat it is a complete isomorphism invariant for this class. Our aim in this sectionis to show that it is a complete isomorphism invariant even for the (larger) class ofminimal TDS with discrete spectrum. This is done by showing that every minimalTDS with discrete spectrum is essentially a minimal group rotation.

Theorem 17.11. Let (K;ϕ) be a minimal TDS with discrete spectrum. Then it isisomorphic to a minimal rotation TDS on a compact monothetic group.

Proof. By Corollary 17.6 the Koopman operator T = Tϕ on C(K) generates a com-pact group

G := clw

T n : n ∈ N0= cls

T n : n ∈ N0

⊆L s(C(K)).

We now show that (K;ϕ) is isomorphic to the rotation TDS (G ;T ). The operator Tand hence every S ∈ G is a Banach algebra homomorphism on C(K). Therefore, byTheorem 4.13, there exist unique continuous mappings

ψS : K→ K

S f = f ψS for every S ∈ G , f ∈ C(K).such that

We also obtain

ϕ = ψT and ψS1S2 = ψS1 ψS2 for all S1,S2 ∈ G .

Pick x0 ∈ K and define

Ψ : G → K by Ψ(S) := ψS(x0) for S ∈ G .

We show that this map yields an isomorphism between (G ;T ) and (K;ϕ). Indeed,let S,R ∈ G , f ∈ C(K) and ε > 0 be given. Then

| f (Ψ(S))− f (Ψ(R))|= | f (ψS(x0))− f (ψR(x0))|= |S f (x0)−R f (x0)| ≤ ε,

whenever ‖S f −R f‖∞ ≤ ε . This shows the continuity of f Ψ for every f ∈ C(K)and, by Lemma 4.12, also the continuity of Ψ .

Observe that Ψ(G ) is a closed ϕ-invariant subset of K. Since (K;ϕ) is minimal, itfollows that Ψ(G ) = K, i.e., Ψ is surjective.

17.3 Dynamical Systems with Discrete Spectrum 267

To prove injectivity, suppose that Ψ(S1) =Ψ(S2) for some S1,S2 ∈ G . Then

ψS1(ϕn(x0)) = ψS1(ψ

nT (x0)) = ψ

nT (ψS1(x0)) = ψ

nT (ψS2(x0)) = ψS2(ϕ

n(x0))

for all n ∈ N0. By minimality, orb+(x0) = K. From the continuity of ψS1 and ψS2we conclude that ψS1 = ψS2 , which yields S1 = S2.

Since ϕ = ψT , the diagramG G

K K

-S 7→T S

?Ψ

?Ψ

-ϕ

commutes, and the proof is hence complete.

The above representation theorem and the characterisation of isomorphic mini-mal group rotations in Proposition 17.9 yield the next theorem.

Theorem 17.12. Two minimal TDSs (K1;ϕ1) and (K2;ϕ2) with discrete spectrumare isomorphic if and only if the corresponding Koopman operators have the samepoint spectrum.

Proof. Of course, operators induced by isomorphic TDSs always have the samepoint spectrum. For the converse, suppose that (K1;ϕ1) and (K2;ϕ2) are minimalTDSs having Koopman operators T1 := Tϕ1 , T2 := Tϕ2 with discrete spectrum andthe same point spectrum. The two TDSs are isomorphic to group rotations (G1;a1)and (G2;a2) by Theorem 17.11. But the Koopman operators of these TDSs have thesame point spectrum, so by Proposition 17.9 they are isomorphic.

Combining this theorem with Example 17.10 we see that for each subgroup H ⊆ Tthere is a (up to isomorphism) unique minimal TDS (K;ϕ) with discrete spectrumwhose Koopman operator has exactly H as its point spectrum.

Ergodic MDSs with Discrete Spectrum

If (X;ϕ), X = (X ,Σ ,µ), is an MDS with discrete spectrum, then the Koopmanoperator T := Tϕ is a bijective isometry on L2(X), hence unitary. Analogously towhat we have done above, we show that the isomorphism of two ergodic MDSs withdiscrete spectrum depends only on the point spectrum of the Koopman operators,and each ergodic MDS with discrete spectrum is isomorphic to an ergodic rotationon a compact group.

The approach is the following: Given an MDS (X;ϕ) with discrete spectrum, weconstruct a topological model (K,ν ;ψ) with corresponding Koopman operator hav-ing discrete spectrum in the space C(K). Then we can use the already established


TDS-result. The main point is to choose the model (i.e., the corresponding subalge-bra A ⊆ L∞(X), see Section 12.3) carefully so that the TDS becomes minimal andhas discrete spectrum.

The following result from [Halmos and von Neumann (1942)] is the fundamentalstep in the “classification” of ergodic MDSs with discrete spectrum.

Theorem 17.13 (Halmos–von Neumann). Let (X;ϕ) be an ergodic MDS suchthat the Koopman operator T := Tϕ has discrete spectrum in Lp(X) for some 1 ≤p < ∞. Then (X;ϕ) is isomorphic to an ergodic rotation on a compact monotheticgroup endowed with the Haar measure.

Proof. Every eigenfunction f ∈Lp(X) of T belongs to L∞(X) by Corollary 16.35.b).Since the product of two eigenfunctions is again an eigenfunction, the linear hull of

f ∈ Lp(X) : T f = λ f for some λ ∈ T

is a conjugation invariant subalgebra of L∞(X), and its closure A in L∞(X) is a com-mutative C∗-algebra. By the Gelfand–Naimark Theorem 4.21 there exists a compactspace K and an isometric ∗-isomorphism Φ : A→ C(K). The restriction of T to Ais an algebra homomorphism on A, therefore so is S := Φ T Φ−1 on C(K). ByTheorem 4.13, S is induced by some continuous mapping ψ : K→ K.We show that (K;ψ) is a minimal TDS with discrete spectrum. Since T |A has dis-crete spectrum on the space A, also S has discrete spectrum on the space C(K) andis mean ergodic thereon. We have 1 = dimfix(T ) = dimfix(S), i.e., ψ is uniquelyergodic. The measure ν on K defined by

〈 f ,ν〉 :=⟨Φ−1 f ,µ

⟩( f ∈ C(K))

is ψ-invariant and strictly positive, because µ is ϕ-invariant and strictly positive.This means that ψ is strictly ergodic, and by Proposition 10.9 the TDS (K;ψ) isminimal.We can now apply Theorem 17.12 to (K;ψ) and obtain the existence of a compactmonothetic group G with generating element a such that (K;ψ) is isomorphic undersome homeomorphism θ : G→ K to the rotation TDS (G;a). This isomorphisminduces an algebra isomorphism Tθ : C(K)→ C(G). We can sum up the above inthe following commutative diagram.

A C(K) C(G)

A C(K) C(G)?

T

-Φ

?

S

-Tθ

?

La

-Φ

-Tθ

where (La) f (g) := f (ag) for f ∈ C(G) (the Koopman operator of the rotation bya). Now A, C(K) and C(G) are dense subspaces in Lp(X), Lp(K,ν) and Lp(G), re-

17.4 Examples 269

spectively (where we take the Haar measure on G). Since (T ′θ)−1ν is an invariant

measure for the minimal TDS (G;a), it coincides with the Haar measure. There-fore Tθ is isometric for the corresponding Lp-norms. The Koopman operators T , Laand Tθ are all isometric lattice isomorphisms if considered on the correspondingLp-spaces (see Chapter 7). Similarly, Φ is isometric by the construction of ν . ByTheorem 7.18, A is Banach lattice and Φ : A→ C(K) is a lattice isomorphism. Bydenseness we can extend Φ to a positive isometry (hence a lattice isomorphism)Φ : Lp(X)→ Lp(K,µ). We obtain that the diagram

Lp(X) Lp(G)

Lp(X) Lp(G)

-Tθ Φ

?Tϕ

?Tψ

-Tθ Φ

is commutative, hence by Corollary 12.13 and its subsequent paragraph the claim isproved.

As in the topological case we deduce from the above theorem that ergodic MDSswith discrete spectrum are completely determined by their point spectrum, i.e., pointspectrum is a complete isomorphism invariant for this class of MDSs. The Halmos–von Neumann theorem is a crucial step in the study of the isomorphism problem ofMDSs, i.e., in the quest for complete isomorphism invariants for dynamical systems.A historical account can be found in [Redei and Werndl (2012)].

Corollary 17.14. Two ergodic MDSs with Koopman operators having discrete spec-trum in Lp, 1 ≤ p < ∞, are isomorphic if and only if the Koopman operators havethe same point spectrum.

A further type of isomorphism of MDSs can be defined as follows. Two MDSs(X;ϕ) and (Y;ψ) are called spectrally isomorphic if there is a Hilbert space iso-morphism S : L2(X)→ L2(Y) intertwining the Koopman operators, i.e., STϕ = Tψ Swith the corresponding Koopman operators on the L2-space. Isomorphic MDSs arespectrally isomorphic by Corollary 12.13 and in particular by the remark followingit. Since Koopman operators of spectrally isomorphic MDSs have the same pointspectrum the next result is an immediate consequence.

Corollary 17.15. Two ergodic MDSs with discrete spectrum are isomorphic if andonly if they are spectrally isomorphic.

17.4 Examples

1. Rotations on Tori

The d-tori G := Td are monothetic groups for d ∈N. Indeed, Kronecker’s Theorem14.31 describes precisely their generating elements. However, we can even form


“infinite dimensional” tori: Let I be a nonempty set, then the product G := TI withthe product topology and the pointwise operations is a compact group. The nexttheorem shows when this group is monothetic. The proof relies on the existence ofsufficiently many rationally independent elements in T.

Theorem 17.16. Consider the compact group G := TI , I a nonempty set. Then G ismonothetic if and only if card(I)≤ card(T).

Proof. If G is monothetic, then by Proposition 17.7 we have that card(G∗) ≤card(T). Since the projections πι : TI → T, ι ∈ I, are all different characters ofG, we obtain card(I)≤ card(G∗)≤ card(T).

Now suppose that card(I)≤ card(T). Take a Hamel basis B of R over Q such that1 ∈B. Then card(B) = card(T), so there is an injective function f : I→B \1.Define a := (aι)ι∈I := (e2πi f (ι))ι∈I ∈ TI . If U ⊆ TI is a nonempty open rectangle,then by Kronecker’s Theorem 14.31 we obtain an ∈U for some n ∈ Z.

The above product construction is very special and makes heavy use of the struc-ture of T. In fact, products of monothetic groups in general may not be monothetic(see Exercise 1). Notice also that these monothetic tori TI are all connected (as theproduct of connected spaces).

2. Dyadic Integers

Consider the measure space ([0,1),Bo,λ ), λ the Lebesgue measure on [0,1), anddefine

ϕ(x) := x− 2k−32k if x ∈

[2k−22k ,

2k−12k

)(k ∈ N).

Then ([0,1),Bo,λ ;ϕ) is an MDS (see Exercise 5.6). Let T := Tϕ be its Koopmanoperator which is unitary on L2([0,1),Bo,λ ). Define

gm := 1[0,2−m)

fm :=2m−1

∑k=0

e−2πik2m T kgm (m ∈ N0).and

Notice that T 2mgm = gm and that fm 6= 0 is an eigenvector of T , since

T fm =2m−1

∑k=0

e−2πik2m T k+1gm = e

2πi2m

2m

∑k=1

e−2πik2m T kgm = e

2πi2m fm,

that is e2πi2m ∈ σp(T ). Since T is multiplicative, it also follows that T f n

m = e2πin2m f n

m.Moreover, as T is unitary, f n

m and f n′m′ are orthogonal in L2 whenever m 6= m′ or

0≤ n 6= n′ ≤ 2m−1. Some computation gives

17.4 Examples 271

gm =1

2m

2m−1

∑k=0

f km.

So, as lingm : m ∈ N0 is dense in L2, lin f km : m ∈ N0, 0 ≤ k < 2m− 1 is also

dense, hence f km : m ∈N0, 0≤ k < 2m−1 forms an orthonormal basis in L2. This

shows that T has discrete spectrum and that

σp(T ) =

e2πik2m : m ∈ N0, 0≤ k ≤ 2m−1

.

By the usual argument of expanding a vector f ∈ fix(T ) in the above orthonor-mal basis, we can prove that f is constant, hence by Theorem 7.14 the MDS([0,1),Bo,λ ;ϕ) is ergodic (cf. proof of Theorem 14.30). Now the Halmos–vonNeumann Theorem 17.13 tells that there is compact group such that the MDS isisomorphic to a group rotation. In our case, however, this group can be describedconcretely. Those who solved Exercise 14.8 might have noticed that σp(T ) is actu-ally algebraically isomorphic to the dual group of the dyadic integers A (see Exam-ple 2.11). In fact, for 1 = (1,0,0, . . .) ∈ A one has

σp(T ) =

e2πik2m : m ∈ N0, 0≤ k ≤ 2m−1

=

χ(1) : χ ∈ A∗= σp(Tϕ1).

So by Corollary 17.14 ([0,1),Bo,λ ;ϕ) is isomorphic to (A,Σ ,µ;1), Σ the (product)Borel algebra, µ the Haar measure on A and ϕ1 the rotation by 1 (notice that therotation on A is ergodic because it is minimal as orb+(1) is dense in A).In contrast to the tori in Example 17.4.1, this group A is totally disconnected, i.e.,every nonempty connected subset of A is a singleton (because it is a product ofdiscrete spaces).

3. Bernoulli Shifts

The Bernoulli shift B(p0, p1, . . . , pk−1) = (W +k ,Σ ,µ;τ), discussed in Example

5.1.5, is ergodic, but—except the trivial case—does not have discrete spectrum.This can be seen as follows. Denote by T := Tτ the Koopman operator on L2 andrecall from Proposition 6.16 that

limn→∞

∫W +

k

T n1A ·1B dµ = limn→∞

µ(τ∗nA∩B) = µ(A)µ(B) =∫

W +k

1A ·1B dµ

holds for all A,B ∈ Σ . It is now a routine argument to show that this holds for anyf ,g ∈ L2(W +

k ,Σ ,µ) replacing 1A and 1B. If f ∈ ker(λ I−T ) for some λ ∈ T, then

limn→∞

λn 〈 f ,g〉= 〈 f ,g〉

follows for all g ∈ L2. But this can only happen if f = 0 or λ = 1. This showsthat σp(T )∩T = 1, and the ergodicity implies dimfix(T ) = 1, so T cannot have


discrete spectrum (except dimL2 = 1). The next example puts the present one in amore general perspective.

4. Weak mixing vs. Discrete Spectrum

Let (X;ϕ) be an MDS and consider its Koopman operator T := Tϕ , say, on E :=L2(X). Since E is a Hilbert space and T is a contraction, the semigroup generatedby T is JdLG-admissible, hence the space can be decomposed as

E := Er⊕Es.

The restriction Tr := T |Er has discrete spectrum on Er, whereas Ts := T |Es is almostweakly stable on Es. Furthermore, by Theorem 16.20, the MDS is weakly mixing ifand only if

E = lin

1⊕Es,

i.e., if the reversible part is lin1. These yield the following evident fact.

Proposition 17.17. Let (X;ϕ) be an MDS that has discrete spectrum and is weaklymixing, then L2(X) = lin1, hence this space is one-dimensional.

The above proposition is an instance of what is called by T. Tao the “dichotomybetween structure and randomness”, see [Tao (2007)].

Exercises

1. Show that a monothetic group is commutative. Describe all finite monotheticgroups.

2. Prove that if G is a closed subgroup of T, then G = T or G is finite cyclic group.

3. Prove that (T;a) and (T;a−1), a ∈ T, are isomorphic.

4. Show that if T ∈L (E), E a Banach space, is Cesaro-bounded, then T is meanergodic with

ran(I−T ) = lin

x ∈ E : ∃1 6= λ ∈ T : T x = λx

5. Let T be a power-bounded operator on a Banach space E and let A⊆ T such thatthe linear hull of ⋃

λ∈A

ker(λ I−T )

is dense in E. Show that σp(T )⊆ A.

6. Let (X;ϕ) be an ergodic MDS whose Koopman operator T := Tϕ has discretespectrum on H := L2(X). Prove that if dimH = ∞, then σ(T ) =T. (We note that thesame is true even without the assumption of “discrete spectrum”.)

Exercises 273

7. Determine the isomorphism classes of rotations on T.

8. Give an example of two nonisomorphic but spectrally isomorphic MDSs (X;ϕ)and (Y;ψ).

Chapter 18A Glimpse at Arithmetic Progressions

In this chapter we shall discuss some of the many applications of ergodic theory to(combinatorial) number theory. An arithmetic progression of length k ∈ N is a setA of natural numbers of the form

a,a+n,a+2n, . . . ,a+(k−1)n=

a+ jn : 0≤ j < k

for some a,n ∈ N. The number a is called the starting point and n is called thedistance of this arithmetic progression. An arithmetic progression of length k isalso called a k-term arithmetic progression.

One aspect of combinatorial number theory deals with the question how “big” asubset A ⊆ N must be in order to ensure that it contains infinitely many arithmeticprogressions of a certain length, or even of arbitrary length. The following result ofvan der Waerden [van der Waerden (1927)] is the classical example for a statementof this kind.

Theorem 18.1 (Van der Waerden). If N = A1 ∪ . . . ∪ Am, then there is j ∈1, . . . ,m such that

∀k ∃a ∃n : a,a+n,a+2n, . . . ,a+(k−1)n ∈ A j.

In other words, if you colour the natural numbers with finitely many colours, thenthere are arbitrarily long monochromatic arithmetic progressions, i.e., progressionsall of whose members carry the same colour. (However, in general one cannot finda monochromatic progression of infinite length.)

Many years later, a major extension was proved by Szemeredi in[Szemeredi (1975)] using the concept of the upper density

d(A) = limsupn→∞

card(A∩1, . . . ,n)n

of a subset A⊆ N (cf. Section 9.2 and Exercise 9.1).

275

276 18 A Glimpse at Arithmetic Progressions

Theorem 18.2 (Szemeredi). Any subset A⊆ N with d(A)> 0 contains arbitrarilylong arithmetic progressions.

In the 1970’s it was discovered that the results of van der Waerden and Szemeredi(and many others) could be obtained also by “dynamical” methods, i.e., by transfer-ring them to statements about dynamical systems. Major contributions came fromFurstenberg who was able to recover van der Waerden’s theorem by methods fromtopological dynamics (see [Furstenberg (1981)], [Pollicott and Yuri (1998)]). Lateron he used ergodic theory to obtain and extend the result of Szemeredi.

His ideas building heavily on the structure theory of ergodic MDSs were furtherdeveloped by many authors and finally led to one of the most striking results in thisarea so far.

Theorem 18.3 (Green–Tao). The set of prime numbers contains arbitrarily longarithmetic progressions.

We refer to [Green and Tao (2008)] and in particular to Tao’s ICM lecture[Tao (2007)] for the connection with ergodic theory.

The discovery of such a beautiful structure within the chaos of prime numberswas preceded and accompanied by various numerical experiments. Starting from thearithmetic progression 7,37,67,97,127,157 (of length 6 and distance 30) the actualworld record (since 16 March, 2012 by J. Fry2 is a progression of length 26 startingat

3 486 107 472 997 423

with distance371 891 575 525 470.

However striking, the Green–Tao theorem still falls short of the following auda-cious conjecture formulated by Erdos and Turan in 1936.

Conjecture 18.4 (Erdos–Turan). Let A⊆ N be such that ∑a∈A(1/a) = +∞. Then Acontains arbitrarily long arithmetic progressions.

The Green–Tao theorem is—even in its ergodic theoretic parts—far beyond thescope of this book. However, we shall describe the major link between “densityresults” (like Szemeredi’s) and ergodic theory and apply it to obtain weaker, butnevertheless stunning results (the theorems of Roth and Furstenberg–Sarkozy, seebelow). Again, our major operator theoretic tool is the JdLG-decomposition.

18.1 The Furstenberg Correspondence Principle

Let us fix a subset A ⊆ N with positive upper density d(A) > 0. For given k ∈ Nconsider the following statement.

2 http://users.cybercity.dk/˜dsl522332/math/aprecords.htmThe quest for a 27-term arithmetic progression still continues.

18.1 The Furstenberg Correspondence Principle 277

There exist a,n ∈ N such that a,a+n,a+2n, . . .a+(k−1)n ∈ A. (APk)

Our goal is to construct an associated dynamical system that allows us to reformulate(APk) in ergodic theoretic terms.

Consider the compact metric space W +2 = 0,1N0 and the left shift τ on it. A

subset B⊆ N0 can be identified with a point in W +2 via its characteristic function:

B⊆ N0 ←→ 1B ∈W +2 .

We further defineK := τn1A : n ∈ N0 ⊆ W +

2 .

Then K is a closed τ-invariant subset of W +2 , i.e., (K;τ) is a (forward transitive)

TDS. The setM :=

(xn)

∞n=0 ∈ K : x0 = 1

is open and closed in K, and hence so is every set τ− j(M) ( j ∈ N0). Note that wehave

n ∈ A if and only if τn(1A) ∈M. (18.1)

We now translate (APk) into a property of this dynamical system. By (18.1), weobtain for a ∈ N and n ∈ N0 the following equivalences.

a,a+n, . . .,a+(k−1)n ∈ A

⇐⇒ τa1A ∈M, τ

nτ

a1A ∈M, . . . , τ(k−1)n

τa1A ∈M

⇐⇒ τa1A ∈M∩ τ

−n(M)∩ . . .∩ τ−(k−1)n(M).

Since each τ− j(M) is open and τa1A : a ∈N is dense in K, we have that (APk) isequivalent to

∃n ∈ N : M∩ τ−n(M)∩ . . .∩ τ

−(k−1)n(M) 6= /0. (18.2)

The strategy to prove (18.2) is now to turn the TDS into an MDS by choosing aninvariant measure µ in such a way that for some n ∈ N the set

M∩ τ−n(M)∩ . . .∩ τ

−(k−1)n(M)

has positive measure. Now, since d(A)> 0, there is a subsequence (n j) j∈N ⊆N suchthat

limj→∞

1n j

n j

∑k=1

1A(k) = d(A)> 0.

With this subsequence we define a sequence of probability measures (µ j) j∈N on Kby

µ j(B) :=1n j

n j

∑k=1

δτk1A

(B) (B ∈ Ba(K), j ∈ N),


where δτk1A

stands for the Dirac measure at τk1A. By using (18.1) we have

µ j(M) =1n j

n j

∑k=1

1A(k)→ d(A) as j→ ∞. (18.3)

The metrisability of the compact set M1(K) for the weak∗-topology implies thatthere exists a subsequence (which we again denote by (µ j) j∈N) weakly∗ convergingto some probability measure µ on K. Since M is open and closed, the characteristicfunction 1M : K→ R is continuous, hence (18.3) implies that µ(M) = d(A).

To show that µ is τ-invariant, take f ∈ C(K) and note that

∫K( f τ)dµ j−

∫K

f dµ j =1n j

n j

∑k=1

(f (τk+11A)− f (τk1A)

)=

1n j

(f (τn j+11A)− f (τ1A)

)for every j ∈ N. By construction of µ , the first term in this chain of equations con-verges to ∫

K( f τ)dµ−

∫K

f dµ

as j→ ∞, the last one converges to 0 since n j → ∞, implying that µ is τ-invariant(cf. Remark 10.3.3).

In this way we obtain a topological MDS (K,µ;τ) such that µ(M) = d(A) > 0.Furthermore, we may suppose without loss of generality that this MDS is ergodic.Indeed,

M1τ(K) = convµ ∈M1

τ(K) : µ is ergodic

by (10.1). Since 1M ∈ C(K) and since by the above there is at least one µ ∈M1τ(K)

with∫

K 1M dµ = µ(M) > 0, there must also be an ergodic measure with this prop-erty.

Consider now the associated Koopman operator T := Tτ on L2(K,µ). The exis-tence of a k-term arithmetic progression in A, i.e., (APk), is certainly ensured by

∃n ∈ N : 1M ·T n1M · · ·T (k−1)n1M 6= 0

or the even stronger statement

limsupN→∞

1N

N−1

∑n=0

∫K

1M · (T n1M) · · ·(T (k−1)n1M)dµ > 0. (18.4)

Hence we have established the following result from [Furstenberg (1977)].

Theorem 18.5 (Furstenberg Correspondence Principle). If for every ergodicMDS (X;ϕ) the Koopman operator T := Tϕ on L2(X) satisfies the condition

18.2 A Multiple Ergodic Theorem 279

limsupn→∞

1N

N−1

∑n=0

∫X

f · (T n f ) · · ·(T (k−1)n f ) > 0 (18.5)

for all 0 < f ∈ L∞(X), then (APk) holds for every A⊆ N with d(A)> 0.

(Here and in the following we write f > 0 when we mean f ≥ 0 and f 6= 0.) In[Furstenberg (1977)] it was shown that (18.5) is indeed true for every ergodic MDSand every k∈N. This completed the ergodic theoretic proof of Szemeredi’s theorem.As a matter of fact, Furstenberg proved the truth of the apparently stronger version

liminfn→∞

1N

N−1

∑n=0

∫X

f · (T n f ) · · ·(T (k−1)n f ) > 0 (18.6)

of (18.5) As we shall see in a moment, this makes no difference because, actually,the limit exists.

18.2 A Multiple Ergodic Theorem

Furstenberg, in his correspondence principle, worked with expressions of the form

1N

N−1

∑n=0

∫X

f · (T n f ) · · ·(T (k−1)n f ) =∫

Xf · 1

N

N−1

∑n=0

(T n f ) · · ·(T (k−1)n f ).

Recently, B. Host and B. Kra in [Host and Kra (2005b)] discovered a surprising factregarding the multiple Cesaro sums on the right-hand side.

Theorem 18.6 (Host–Kra). Let (X;ϕ) be an ergodic MDS, and consider the Koop-man operator T := Tϕ on L2(X). Then the limit of the multiple ergodic averages

limN→∞

1N

N−1

∑n=0

(T n f1) · (T 2n f2) · · ·(T (k−1)n fk−1) (18.7)

exists in L2(X) for all f1, . . . , fk−1 ∈ L∞(X) and k ≥ 2.

Note that for k = 2 the Host–Kra theorem is just von Neumann’s Mean ErgodicTheorem 8.1 (and holds even for arbitrary f1 ∈ L2(X)). We shall give a proof of theHost–Kra theorem for the case k = 3 here. The general case is the subject of Chapter20.

Our major tool is the JdLG-decomposition of E := L2(X) with respect to thesemigroup TT into

L2(X) = ran(Q)⊕ker(Q) = Er⊕Es,

see Example 16.17. The space Er is the Kronecker factor and the projection


Q : L2(X)→ Er

is a Markov projection. Let us recall some facts from Chapters 13 and 16.

Lemma 18.7 (Properties of the Kronecker Factor). The space Er ∩L∞(X) is aclosed subalgebra of L∞(X) and

Q( f ·g) = (Q f ) ·g (18.8)

for all f ∈ L∞(X) and g ∈ Er∩L∞(X). Moreover, Er is generated by eigenfunctionscorresponding to unimodular eigenvalues of T .

In order to prove that

limN→∞

1N

N−1

∑n=0

(T n f ) · (T 2ng) (18.9)

exists for every f ,g ∈ L∞(X), we may split f according to the JdLG-decompositionand consider the cases

(1) f ∈ Es and (2) f ∈ Er

separately. Case (1) is covered by the following lemma, which actually yieldsslightly more information. Notice that the proof is very similar to that of “weaklymixing of all orders”, i.e., Theorem 9.25.

Lemma 18.8. Let f ,g ∈ L∞(X) such that f ∈ Es or g ∈ Es. Then

limN→∞

1N

N−1

∑n=0

(T n f ) · (T 2ng) = 0

in L2(X).

Proof. The proof is an application of the van der Corput Lemma. We let un :=(T n f ) · (T 2ng) and write

(un |un+m ) =∫

X(T n f ) · (T 2ng) · (T n+m f ) · (T 2n+2mg)

=∫

XT n[( f T m f ) ·T n(gT 2mg)

]=∫

X( f T m f ) ·T n(gT 2mg).

Hence, for fixed m ∈ N

1N

N−1

∑n=0

(un |un+m ) =∫X

( f T m f )1N

N−1

∑n=0

T n(gT 2mg) →∫X

f T m f ·∫X

gT 2mg

18.2 A Multiple Ergodic Theorem 281

as N→ ∞ since (X;ϕ) is ergodic. Therefore we have

γm : = limN→∞

∣∣∣ 1N

N−1

∑n=0

(un |un+m )∣∣∣= ∣∣∣∫

X( f T m f ) ·

∫X(gT 2mg)

∣∣∣=∣∣∣( f |T m f )

(g∣∣T 2mg

)∣∣∣.If f ∈ Es, by Theorem 16.19, we obtain that

0≤ 1N

N−1

∑m=0

γm ≤1N

N−1

∑m=0

∣∣( f |T m f )∣∣ · ‖g‖2

∞→ 0 as N→ ∞.

In the case that g ∈ Es, the reasoning is similar. Now, the van der Corput Lemma9.24 implies that

1N

N−1

∑n=0

(T n f ) · (T 2ng) =1N

N−1

∑n=0

un → 0.

Proof of Theorem 18.6, case k = 3: By Lemma 18.8 we may suppose that f ∈ Er.Let g ∈ L∞(X) be fixed, and define (SN)N∈N ⊆L (E) by

SN f :=1N

N−1

∑n=0

(T n f ) · (T 2ng).

If f is an eigenfunction of T with eigenvalue λ ∈ T, then

SN f = f1N

N−1

∑n=0

(λT 2)ng

holds. Since the operator λT 2 is mean ergodic as well (see Theorem 8.6), we havethat (SN f )N∈N converges as N→ ∞. This yields that (SN f )N∈N converges for every

f ∈ lin

h : T h = λh for some λ ∈ T,

which is a dense subset of Er by Theorem 16.34. Since the estimate ‖SN‖ ≤ ‖g‖∞ isvalid for all N ∈ N, we obtain the convergence of (SN)N∈N on the whole Er.

In order to obtain Theorem 18.2 it remains to show that the limit in the Host–Kratheorem is positive whenever 0 < f = f1 = . . .= fk−1.

Proposition 18.9. In the setting of Theorem 18.6

limN→∞

1N

N−1

∑n=0

f · (T n f ) · (T 2n f ) · · ·(T (k−1)n f ) > 0

whenever 0 < f ∈ L∞(X).


Again, we prove this result for k = 3 only. Our argument follows Furstenberg[Furstenberg (1981), Theorem 4.27].

Proof. It suffices to show that

limN→∞

1N

N−1

∑n=0

∫X

f · (T n f ) · (T 2n f ) > 0. (18.10)

We have f = fs + fr with fr := Q f ∈ Er and fs = f −Q f ∈ Es. Since Q is a Markovoperator and f > 0, it follows that fr > 0 as well. Let g,h ∈ L∞(X) be arbitrary.Since Er ⊥ Es in L2(X) it follows from Lemma 18.7 that∫

Xf · (T ng) · (T 2nh) = 0

whenever two of the functions f ,g,h are in Er and the remaining one is from Es. ByLemma 18.8 we have

1N

N−1

∑n=0

∫X

f · (T ng) · (T 2nh) → 0

if at least one of the functions g,h is from Es. So we have

limN→∞

1N

N−1

∑n=0

∫X

f · (T n f ) · (T 2n f ) = limN→∞

1N

N−1

∑n=0

∫X

fr · (T n fr) · (T 2n fr).

We may therefore suppose without loss of generality that f = fr.Take now 0 < ε <

∫X f 3 and suppose that ‖ f‖∞ ≤ 1, again without loss of gen-

erality. We show that there is some subset B ⊆ N with bounded gaps (also calledrelatively dense or syndetic; see Definition 3.9.b) such that∫

Xf · (T n f ) · (T 2n f ) >

∫X

f 3 − ε for all n ∈ B. (18.11)

Recall that this property of B means that there is L ∈ N such that every interval oflength L−1 intersects B.

Since f ∈ Er, there exists ψ of the form ψ = c1g1 + . . .+ c jg j for some eigen-functions g1, . . . ,g j of T corresponding to eigenvalues λ1, . . . ,λ j ∈ T such that‖ f −ψ‖2 < ε

6 . Observe first that for every δ > 0 there is a relatively dense setB⊆ N such that

|λ n1 −1|< δ , . . . , |λ n

j −1|< δ for all n ∈ B.

Indeed, if we consider the rotation by (λ1, . . . ,λ j) on T j, this is precisely the almostperiodicity of the point (1, . . . ,1) ∈ T j (see Proposition 3.12). Now, by taking asuitable δ > 0 depending on the coefficients c1, . . .c j, we obtain

18.3 The Furstenberg–Sarkozy Theorem 283

‖T nψ−ψ‖2 <

ε

12

‖T n f − f‖2 ≤ ‖T n f −T nψ‖2 +‖T n

ψ−ψ‖2 +‖ψ− f‖2 <ε

2and

for all n ∈ B. Moreover, for n ∈ B one has

‖T 2n f − f‖2 ≤ ‖T 2n f −T 2nψ‖2 +‖T 2n

ψ−T nψ‖2 +‖T n

ψ−ψ‖2 +‖ψ− f‖2

≤ 2‖ f −ψ‖2 +2‖T nψ−ψ‖2 <

ε

2.

Altogether we obtain∣∣∣∫X

f ·T n f ·T 2n f −∫X

f 3∣∣∣≤ ∫

X

f ·T n f · |T 2n f − f | +∫X

f 2 · |T n f − f |

≤ ‖ f‖2∞

(‖T 2n f − f‖2 +‖T n f − f‖2

)< ε,

and (18.11) is proved. Since B∩ [ jL,( j+1)L) 6= /0 for all j ∈ N0, it follows that

1LN

LN−1

∑n=0

∫X

f ·T n f ·T 2n f ≥ 1LN

LN−1

∑n∈B,n=0

∫X

f ·T n f ·T 2n f

≥ 1L

(∫X

f 3 − ε

)> 0.

Combining the above results leads to the classical theorem of Roth, a precursorof Szemeredi’s Theorem 18.2.

Theorem 18.10 (Roth). Let A ⊆ N have positive upper density. Then A containsinfinitely many arithmetic progressions of length 3.

The full proof of Szemeredi’s theorem is based on an inductive proce-dure involving “higher-oder” decompositions of a JdLG-flavour. See for in-stance [Furstenberg (1977), Furstenberg (1981)], [Petersen (1989)], [Tao (2008a)],[Green (2008)] or [Einsiedler and Ward (2011)].

18.3 The Furstenberg–Sarkozy Theorem

We continue in the same spirit, but instead of arithmetic progressions we now con-sider pairs of the form a,a+n2 for a,n ∈ N.

Theorem 18.11 (Furstenberg–Sarkozy). Let A ⊆ N have positive upper density.Then there exist a,n ∈ N such that a,a+n2 ∈ A.

For a more general statement we refer to [Furstenberg (1981), Proposition 3.19].In order to prove the above theorem we first need an appropriate correspondenceprinciple.


Theorem 18.12 (Furstenberg Correspondence Principle for Squares). If for ev-ery ergodic MDS (X;ϕ) and its Koopman operator T := Tϕ on L2(X) the condition

limsupN→∞

1N

N−1

∑n=0

∫X

f · (T n2f ) > 0 (18.12)

is satisfied for all 0 < f ∈ L∞(X), then Theorem 18.11 holds.

The proof is analogous to the one of Theorem 18.5. To show (18.12) we begin withthe following “nonconventional” ergodic theorem.

Theorem 18.13 (Mean Ergodic Theorem for Squares). Let T be a contractionon a Hilbert space H. Then the Cesaro square averages

SNu :=1N

N−1

∑n=0

T n2u

converge for every u ∈ H as N→ ∞.

Proof. We suppose here in addition that T is an isometry and leave the proof ofthe general case to the reader. By the JdLG-decomposition it suffices to prove theassertion for u ∈ Hr and u ∈ Hs separately.Take first u ∈ Hr satisfying Tu = e2πiα u for some α ∈ [0,1). Then we have

SNu =1N

N−1

∑n=0

e2πiαn2u.

If α is irrational, this sequence converges to 0 by the equidistribution of (αn2)n (seePropositions 10.20 and 10.21). If α ∈ Q, say dα ∈ N0 for some d ∈ Z \ 0, thenone can see that

1N

N−1

∑n=0

e2πiαn2 → 1d

d−1

∑j=0

e2πiα j2 (N→ ∞).

Since each SN ise contractive and since the linear span of all eigenvectors corre-sponding to unimodular eigenvalues is dense in Hr (see Theorem 16.34), we obtainconvergence for every u ∈ Hr.

Take now u ∈ Hs and define un := T n2u. In order to use the van der Corput lemma

we calculate

(un |un+m ) =(

T n2u∣∣∣T (n+m)2

u)=(

u∣∣∣T 2nm+m2

u)=(

T ∗m2u∣∣∣(T 2m)nu

),

where we have used that T is an isometry. This implies for any fixed m ∈ N that

18.3 The Furstenberg–Sarkozy Theorem 285

γm = limsupN→∞

∣∣∣ 1N

N−1

∑n=0

(un |un+m )∣∣∣≤ limsup

N→∞

1N

N−1

∑n=0

∣∣∣(T ∗m2u∣∣∣(T 2m)nu

)∣∣∣≤ 2m limsup

N→∞

12mN

2mN−1

∑j=0

∣∣∣(T ∗m2u∣∣∣T ju

)∣∣∣= 0,

since u ∈ Hs, where the last equality follows from Theorem 16.19. The van derCorput Lemma 9.24 concludes the proof.

So we see that, in particular, for an MDS (X;ϕ) and its Koopman operator T := Tϕ

the limit

limN→∞

1N

N−1

∑n=0

f ·T n2f (18.13)

exists in L2(X) for every f ∈ L∞(X). To finish the proof of Theorem 18.11 it remainsto show that it is positive whenever f is.

Proposition 18.14. Let (X;ϕ) be an ergodic MDS. Then the limit (18.13) is positivefor every 0 < f ∈ L∞(X).

The proof is analogous to that of Proposition 18.9 and again uses the decomposi-tion L2(X) = Es⊕Er, the vanishing of the above limit on Es and a relative densityargument on Er.

Remarks 18.15. 1) A sequence (nk)k∈N is called a Poincare sequence (see, e.g.,[Furstenberg (1981), Def. 3.6]) if for every MDS (X;ϕ), X = (X ,Σ ,µ), andA ∈ Σ with µ(A)> 0 one has

µ(A∩ϕ−nk(A))> 0 for some k ∈ N.

Poincare’s Theorem 6.10 tells that (n)n∈N is a Poincare sequence, hence theterminology. One can prove that Proposition 18.14 remains valid even for notnecessarily ergodic MDSs, so we obtain that (n2)n∈N is a Poincare sequence.

2) The above results remain true if one replaces the polynomial p(n) = n2 by anarbitrary polynomial p with integer coefficients and p(0) = 0, see Furstenberg[Furstenberg (1981), pp. 66–75].

Final remarks

There are many generalisations and modifications of the above results. We onlyquote a few examples. The first is from [Furstenberg and Weiss (1996)].

Theorem 18.16 (Furstenberg–Weiss). Let (X;ϕ) be an MDS with associatedKoopman operator T := Tϕ on X = L2(X). Then the limit of the multiple ergodicaverages

limN→∞

1N

N−1

∑n=0

T an f ·T bng (18.14)


exists in L2(X) for all f ,g ∈ L∞(X) and a,b ∈ N.

Remark 18.17. In [Furstenberg and Weiss (1996)] it is also shown that the averages

limN→∞

1N

N−1

∑n=0

T n f ·T n2g

converge in L2(X) for all f ,g ∈ L∞(X). Together with the positivity of this limit forf = g > 0 and the Furstenberg correspondence principle this yields the existence of(many) triples a,a+n,a+n2 in any A⊆ N with d(A)> 0.

Host and Kra [Host and Kra (2005a)] proved convergence of multiple ergodicaverages for strongly ergodic systems with powers p1(n), . . . , pk−1(n) replacingn, . . . ,(k− 1)n in (18.7), where p1, . . . , pk−1 are arbitrary polynomials with integercoefficients. This leads to the existence of a subset of the form a,a+ p1(n), . . . ,a+pk−1(n) in every set A⊆ N with positive upper density.

Remark 18.18. The results of Szemeredi and Furstenberg–Sarkozy presented inthis Chapter remain true in a slightly strengthened form. One can replace the upperdensity d by the so-called upper Banach-density defined as

Bd(A) := limsupm,n→∞

card(A∩m, . . . ,m+n)n+1

.

The actual change to be carried out in order to obtain these generalisations is inFurstenberg’s Correspondence Principle(s), see [Furstenberg (1981)].

Exercises

Chapter 19Joinings

19.1 Dynamical Systems and their Joinings

19.2 Inductive Limits and the Minimal Invertible Extension

19.3 Relatively Independent Joinings

Supplement: Tensor Products and Projective Limits

Exercises

287

Chapter 20The Host–Kra–Tao Theorem

20.1 Idempotent Classes

20.2 The Existence of Sated Extensions

20.3 The Furstenberg Self-Joining

20.4 Proof of the Host–Kra–Tao Theorem

Exercises

289

Chapter 21More Ergodic Theorems

21.1 Polynomial Ergodic Theorems

21.2 Weighted Ergodic Theorems

21.3 Wiener–Wintner Theorems

21.4 Even More Ergodic Theorems

1. Subsequential ergodic theorems

2. Entangled ergodic theorems

3. Multiple ergodic theorems

4. Bourgain return time theorem

Exercises

291

Appendix ATopology

A.1 Metric Spaces

A metric space is a pair (Ω ,d) consisting of a nonempty set Ω and a functiond : Ω ×Ω → R with the following properties:

1) d(x,y)≥ 0, and d(x,y) = 0 if and only if x = y.

2) d(x,y) = d(y,x).

3) d(x,y)≤ d(x,z)+d(z,y) (triangle inequality).

Such a function d is called a metric on Ω . If instead of 1) we require only d(x,x)= 0for all x ∈Ω , we have a semi-metric. For A⊆Ω and x ∈Ω we define

d(x,A) := inf

d(x,y) : y ∈ A,

called the distance of x from A. By a ball with center x and radius r > 0 we meaneither of the sets

B(x,r) :=

y ∈Ω : d(x,y)< r,

B(x,r) :=

y ∈Ω : d(x,y)≤ r.

A set O⊆Ω is called open if for each x ∈O there is a ball B⊆O with centre x andradius r > 0. A set A⊆Ω is called closed if Ω \A is open. The ball B(x,r) is open,and the ball B(x,r) is closed for any x ∈Ω and r > 0.

A sequence (xn)n∈N in Ω is convergent to the limit x ∈ Ω (in short: xn → x),if for all ε > 0 there is n0 ∈ N with d(xn,x) < ε for n ≥ n0. Limits are unique, i.e.,if xn → x and xn → y, then x = y. For a subset A ⊆ Ω the following assertions areequivalent.

(i) A is closed.

(ii) if x ∈Ω and d(x,A) = 0, then x ∈ A.

(iii) if (xn)n∈N ⊆ A and xn→ x ∈Ω , then x ∈ A.

293

294 A Topology

A Cauchy sequence (xn)n∈N in Ω is a sequence with the property that for all ε >0 there is n0 ∈ N with d(xn,xm) < ε for n,m ≥ n0. Each convergent sequence isa Cauchy sequence. If conversely every Cauchy sequence is convergent, then themetric d as well as the metric space (Ω ,d) is called complete.

A.2 Topological Spaces

Let Ω be a set. A set system O ⊆ P(Ω) is called a topology on Ω if it has thefollowing three properties:

1) /0,Ω ∈ O;

2) O1, . . . ,On ∈ O, n ∈ N ⇒ O1∩·· ·∩On ∈ O;

3) Oι ∈ O, ι ∈ I ⇒⋃

ι∈I Oι ∈ O.

A topological space is a pair (Ω ,O), where Ω is a set and O⊆ P(Ω) is a topologyon Ω . A subset O⊆ Ω is called open if O ∈ O. A subset A⊆ Ω is called closed ifAc = Ω \A is open. By de Morgans’ laws, finite unions and arbitrary intersectionsof closed sets are closed.

If (Ω ,d) is a metric space, the family O := O ⊆ Ω : O open of open sets(as defined in A.1) is a topology, called the topology induced by the metric[Willard (2004), Thm. 2.6].

A topological space is called metrisable if there exists a metric that induces thetopology. It is called completely metrisable if it is metrisable by a complete metric.Not every topological space is metrisable, cf. Section A.7 below. On the other hand,two different metrics may give rise to the same topology. In this case the metrics arecalled equivalent. It is possible in general that a complete metric is equivalent toone which is not complete.

If O,O′ are both topologies on Ω and O′ ⊆ O, then O is called finer than O′ andO′ coarser than O. On each set there is a coarsest topology, namely O = /0,Ω,called the trivial topology; and a finest topology, namely O = P(Ω), called thediscrete topology. The discrete topology is metrisable, e.g., by the discrete metricd(x,y) := 1x(y) = δxy (Kronecker delta). Clearly, a topology on Ω is the discretetopology if and only if every singleton x is an open set.

If (Oι)ι is a family of topologies on Ω , then⋂

ι Oι is again a topology. Hence ifE⊆ P(Ω) is any set system one can consider

τ(E) :=⋂

O : E⊆ O⊆ P(Ω), O is a topology,

the coarsest topology in which each member of E is open. It is called the topologygenerated by E.

Let (Ω ,O) be a topological space. A neighbourhood in Ω of a point x ∈Ω is asubset U ⊆ Ω such that there is an open set O ⊆ Ω with x ∈ O ⊆U . A set is open

A.2 Topological Spaces 295

if and only if it is a neighbourhood of all of its points. If A is a neighbourhood ofx, then x is called an interior point of A. The set of all interior points of a set A isdenoted by A and is called the interior of A. The closure of a subset A⊆Ω is

A :=⋂

F : A⊆ F ⊆Ω , F closed,

which is obviously the smallest closed set that contains A. Alternatively, the closureof A is sometimes denoted by

clA or clO A

especially when one wants to stress the particular topology considered.The (topological) boundary of A is the set ∂A := A\A. For x∈Ω one has x∈ A

if and only if every neighbourhood of x has nonempty intersection with A. If (Ω ,d)is a metric space and A⊆Ω , then A = x : d(x,A) = 0, and x ∈ A if and only if xis the limit of a sequence in A.

A subset A of a topological space (Ω ,O) is called dense in Ω if A = Ω . Atopological space Ω is called separable if there is a countable set A⊆Ω which isdense in Ω .

If Ω ′ ⊆Ω is a subset, then the subspace topology on Ω ′ (induced by the topol-ogy O on Ω ) is given by OΩ ′ := Ω ′∩O : O ∈O. If O is induced by a metric d onΩ , then the restriction of d to Ω ′×Ω ′ is a metric on Ω ′ and this metric induces thesubspace topology there.

A subspace of a separable metric space is again separable, but the analogousstatement for general topological spaces is false [Willard (2004), §16].

A topological space Ω is called Hausdorff if any two points x,y∈Ω can be sep-arated by disjoint open neighbourhoods, i.e., there are U,V ∈O with U ∩V = /0 andx ∈U , y ∈ V . A subspace of a Hausdorff space is Hausdorff and each metric spaceis Hausdorff. If O, O′ are two topologies on Ω , O finer than O′ and O′ Hausdorff,then also O is Hausdorff. In a Hausdorff space each singleton x is a closed set.

If (Ω ,O) is a topological space and x ∈Ω , then one calls

U(x) := U ⊆Ω : U is a neighbourhood of x

the neighbourhood filter of x ∈ Ω . A system U ⊆ U(x) is called an (open) neigh-bourhood base for x or a fundamental system of (open) neighbourhoods of x,if for each O ∈ U(x) there is some U ∈ U with x ∈ U ⊆ O (and each U ∈ U isopen). In a metric space (Ω ,d), the family of open balls B(x,1/n), n ∈N, is an openneighbourhood base of x ∈Ω .

A base B ⊆ O for the topology O on Ω is a system such that each open setcan be written as a union of elements of B. If B is a base for the topology of Ω ,then U ∈ B : x ∈U is an open neighbourhood base for x, and if Ux is an openneighbourhood base for x, for each x ∈ Ω , then B :=

⋃x∈Ω Ux is a base for the

296 A Topology

topology. For example, in a metric space (Ω ,d) the family of open balls is a basefor the topology.

A topological space is called second countable if it has a countable base for itstopology.

Lemma A.1. Every second countable space is separable, and every separable met-ric space is second countable.

A proof is in [Willard (2004), Thm. 16.11]. A separable topological space need notbe second countable [Willard (2004), §16].

Let (Ω ,O) be a topological space and let Ω ′ ⊆ Ω . A point x ∈ Ω ′ is calledan isolated point of Ω ′ if the singleton x is open. Further, x ∈ Ω is called anaccumulation point of Ω ′ if it is not isolated in Ω ′∪x for the subspace topology.Equivalently, every neighbourhood of x in Ω contains points of Ω ′ different from x.A point x ∈Ω is a cluster point of the sequence (xn)n∈N ⊆Ω if any neighbourhoodof x contains infinitely many members of the sequence, i.e., if

x ∈⋂

n∈N

xk : k ≥ n

.

If Ω is a metric space, then x a cluster point of a sequence (xn)n∈N ⊆Ω if and onlyif x is the limit of a subsequence (xkn)n∈N.

A.3 Continuity

Let (Ω ,O), (Ω ′,O′) be topological spaces. A mapping f : Ω →Ω ′ is called contin-uous at x ∈Ω , if for each (open) neighbourhood V of f (x) in Ω ′ there is an (open)neighbourhood U of x such that f (U)⊆V .

The mapping f is called continuous if it is continuous at every point. Equiva-lently, f is continuous if and only if the inverse image f−1(O) of each open (closed)set O ∈ O′ is open (closed) in Ω . If we want to indicate the used topologies, we saythat f : (Ω ,O)→ (Ω ′,O′) is continuous.

If Ω is endowed with the discrete topology or Ω ′ is endowed with the trivialtopology, then every mapping f : Ω →Ω ′ is continuous.

For metric spaces, the continuity of f : Ω → Ω ′ (at x) is the same as sequentialcontinuity (at x), i.e., f is continuous at x∈Ω if and only if f (xn)→ f (x) wheneverxn→ x in Ω .

A mapping f : Ω →Ω ′ is called a homeomorphism if f is bijective and both fand f−1 are continuous. Two topological spaces are called homeomorphic if thereis a homeomorphism that maps one onto the other. The Hausdorff property, separa-bility, metrisability, complete metrisability and the property of having a countablebase are all preserved under homeomorphisms.

A mapping f : (Ω ,d) → (Ω ′,d′) between two metric spaces is called uni-formly continuous if for each ε > 0 there is δ > 0 such that d(x,y) < δ implies

A.4 Inductive and Projective Topologies 297

d′( f (x), f (y))< ε for all x,y ∈Ω . Clearly, a uniformly continuous mapping is con-tinuous. If A⊆ Ω then the distance function d(·,A) from Ω to R is uniformly con-tinuous since the triangle inequality implies that

|d(x,A)−d(y,A)| ≤ d(x,y) (x,y ∈Ω).

A.4 Inductive and Projective Topologies

Let (Ωι ,Oι), ι ∈ I, be a family of topological spaces, let Ω be a nonempty set, andlet fι : Ωι →Ω , ι ∈ I, be given mappings. Then

Oind :=

A⊆Ω : f−1ι (A) ∈ Oι for all ι ∈ I

.

is a topology on Ω , called the inductive topology with respect to the family ( fι)ι∈I .It is the finest topology such that all the mappings fι become continuous. A mappingg : (Ω ,Oind)→ (Ω ′,O′) is continuous if and only if all the mappings g fι : Ωι →Ω ′, ι ∈ I, are continuous.

As an example of an inductive topology we consider a topological space (Ω ,O)and a surjective map f : Ω → Ω ′. Then the inductive topology on Ω ′ with respectto f is called the quotient topology. In this case Ω ′ is called a quotient space of Ω

with respect to f , and f is called a quotient mapping.A common instance of this situation is when Ω ′ = Ω/∼ is the set of equivalence

classes with respect to some equivalence relation ∼ on Ω , and f : Ω →Ω/∼ is thenatural mapping, i.e., f maps each point to its equivalence class. A set A⊆ Ω/∼ isopen in Ω/∼ if and only if

⋃A, the union of the elements in A, is open in Ω .

Let (Ωι ,Oι), ι ∈ I, be a family of topological spaces, let Ω be a nonempty set,and let fι : Ω →Ωι , ι ∈ I, be given mappings. The projective topology on Ω withrespect to the family ( fι)ι∈I is

Oproj := τ

f−1ι (Oι) : Oι ∈ Oι for all ι ∈ I

⊆ P(Ω).

This is the coarsest topology for which all the functions fι become continuous. Abase for this topology is

Bproj := f−1ι1

(Oι1)∩·· ·∩ f−1ιn (Oιn) : n ∈ N, Oιk ∈ Oιk for all k = 1, . . . ,n.

A mapping g : (Ω ′,O′)→ (Ω ,Oproj) is continuous if and only if all the functionsfι g : Ω ′→Ωι , ι ∈ I, are continuous.

An example of a projective topology is the subspace topology. Indeed, if Ω is atopological space and Ω ′ ⊆ Ω is a subset, then the subspace topology on Ω ′ is theprojective topology with respect to the inclusion mapping Ω ′→Ω .

298 A Topology

Ω Ω ′

Ωι

g

fι g fι

Ω ′ Ω

Ωι

g

fιfι g

Fig. A.1 Continuity for the inductive respectively for the projective topol-ogy

A.5 Product Spaces

Let (Ωι)ι∈I be a nonempty family of nonempty topological spaces. The producttopology on

Ω := ∏ι∈I

Ωι =

x : I→⋃

Ωι : x(ι) ∈Ωι ∀ ι ∈ I

is the projective topology with respect to the canonical projections πι : Ω → Ωι .Instead of x(ι) we usually write xι . A base for this topology is formed by the opencylinder sets

Aι1,...,ιn := x = (xι)ι∈I : xιk ∈ Oιk for k = 1, . . . ,n

for ι1, . . . , ιn ∈ I, n ∈ N and Oιk open in Ωιk . For the product of two (or finitelymany) spaces we also use the notation Ω ×Ω ′ and the like. Convergence in theproduct space is the same as coordinatewise convergence. If Ωι = Ω for all ι ∈ I,then we use the notation Ω I for the product space.

If (Ω j,d j), j ∈ I is a family of at most countably many (complete) metric spaces,then their product ∏ j∈I Ω j is (completely) metrisable. E.g., if I ⊆N one can use thethe metric

d(x,y) := ∑j∈I

d(x j,y j)

2 j(1+d(x j,y j))(x,y ∈∏ j∈I Ω j).

It follows from the triangle inequality that in a metric space (Ω ,d) one has

|d(x,y)−d(u,v)| ≤ d(x,u)+d(y,v) (x,y,u,v ∈Ω).

This shows that the metric d : Ω ×Ω → R is uniformly continuous.

A.7 Compactness 299

A.6 Spaces of Continuous Functions

For Ω a topological space the sets C(Ω) and Cb(Ω) of all continuous, respectivelybounded and continuous functions f : Ω → K are algebras over K with respect topointwise multiplication and addition (K stands for R or C). The unit element is 1,the function which is constant with value 1. For f ∈ Cb(Ω) one defines

‖ f‖∞

:= sup| f (x)| : x ∈Ω

.

Then ‖·‖∞

is a norm on Cb(Ω), turning it into a commutative unital Banach algebra.Convergence with respect to ‖·‖

∞is called uniform convergence. In the case that

Ω is a metric space, the set

BUC(Ω) :=

f ∈ Cb(Ω) : f is uniformly continuous

is a closed subalgebra of Cb(Ω).

For a general topological space Ω , the space C(Ω) may be quite “small”. Forexample, if Ω carries the trivial topology, the only continuous functions thereon arethe constant ones. In “good” topological spaces, the continuous functions separatethe points, i.e., for every x,y ∈ Ω such that x 6= y there is f ∈ C(Ω) such thatf (x) 6= f (y). (Such spaces are necessarily Hausdorff.) An even stronger property iswhen continuous functions separate closed sets. This means that for every pair ofdisjoint closed subsets A,B⊆Ω there is a function f ∈ C(Ω) such that

0≤ f ≤ 1, f (A)⊆ 0, f (B)⊆ 1.

If (Ω ,d) is a metric space and A ⊆ Ω , then the function d(·,A) is uniformly con-tinuous; moreover, it is zero precisely on A. Hence by considering functions of thetype

f (x) :=d(x,B)

d(x,A)+d(x,B)(x ∈Ω)

for disjoint closed sets A,B⊆Ω , one obtains the following.

Lemma A.2. On a metric space (Ω ,d), bounded uniformly continuous functionsseparate closed sets.

A.7 Compactness

Let Ω be a set. A collection F ⊆ P(Ω) of subsets of Ω is said to have the finiteintersection property if

F1∩·· ·∩Fn 6= /0

for every finite subcollection F1, . . . ,Fn ∈ F, n ∈ N.

300 A Topology

A topological space (Ω ,O) has the Heine–Borel property if every collection ofclosed sets with the finite intersection property has a nonempty intersection. Equiv-alently, Ω has the Heine–Borel property if every open cover of Ω has a finite sub-cover.

A topological space is called compact if it is Hausdorff and has the Heine–Borelproperty. (Note that this deviates slightly from the definition in [Willard (2004),Def. 17.1].) A subset Ω ′ ⊆Ω of a topological space is called compact if it is com-pact with respect to the subspace topology. A compact set in a Hausdorff spaceis closed, and a closed subset in a compact space is compact [Willard (2004),Thm. 17.5]. A relatively compact set is a set whose closure is compact.

Lemma A.3. a) Let Ω be a Hausdorff topological space, and let A,B ⊆ Ω bedisjoint compact subsets of Ω . Then there are disjoint open subsets U,V ⊆Ω

such that A⊆U and B⊆V .

b) Let Ω be a compact topological space, and let O ⊆ Ω be open and x ∈ O.Then there is an open set U ⊆Ω such that x ∈U ⊆U ⊆ O.

A proof of a) is in [Lang (1993), Prop. 3.5]. For b) apply a) to A = x and B = Oc.

Proposition A.4. Let (Ω ,O) be a compact space.

a) If f : Ω →Ω ′ is continuous and Ω ′ Hausdorff, then f (A) = f (A) is compactfor every A⊆Ω . If in addition f is bijective, then f is a homeomorphism.

b) If O′ is another topology on Ω , coarser than O but still Hausdorff, thenO= O′.

c) Every continuous function on Ω is bounded, i.e., Cb(Ω) = C(Ω).

For the proof see [Willard (2004), Thm.s 17.7 and 17.14].

Theorem A.5 (Tychonov). Suppose (Ωι)ι∈I is a family of nonempty topologicalspaces. Then the product space Ω = ∏ι∈I Ωι is compact if and only if each Ωι , ι ∈ Iis compact.

The proof rests on Zorn’s lemma and can be found in [Willard (2004), Thm. 17.8]or [Lang (1993), Thm. 3.12].

Theorem A.6. For a metric space (Ω ,d) the following assertions are equivalent.

(i) Ω is compact.

(ii) Ω is complete and totally bounded, i.e., for each ε > 0 there is a finite setF ⊆Ω such that Ω ⊆

⋃x∈F B(x,ε).

(iii) Ω is sequentially compact, i.e., each sequence (xn)n∈N ⊆ Ω has a conver-gent subsequence.

Moreover, if Ω is compact, it is separable and has a countable base, and C(Ω) =Cb(Ω) = BUC(Ω).

For the proof see [Lang (1993), Thm. 3.8–Prop. 3.11].

A.8 Connectedness 301

Theorem A.7 (Heine–Borel). A subset of Rd is compact if and only if it is closedand bounded.

A proof can be found in [Lang (1993), Cor. 3.13].if each of

Theorem A.8. For a compact topological space K the following assertions areequivalent.

(i) K is metrisable.

(ii) K is second countable.

The proof is based on Urysohn’s metrisation theorem [Willard (2004), Thm. 23.1].

A Hausdorff topological space (Ω ,O) is locally compact if each of its points hasa compact neighbourhood. Equivalently, a Hausdorff space is locally compact if therelatively compact open sets form a base for its topology.

If Ω is locally compact, A⊆Ω is closed and O⊆Ω is open, then A∩O is locallycompact [Willard (2004), Thm. 18.4].

A compact space is (trivially) locally compact. If Ω is locally compact but notcompact, and if p is some point not contained in Ω , then there is a unique compacttopology on Ω ∗ := Ω ∪p such that Ω is a dense open set in Ω ∗ and the subspacetopology on Ω coincides with its original topology. This construction is called theone-point compactification of Ω [Willard (2004), Def. 19.2].

A.8 Connectedness

A topological space (Ω ,O) is connected if Ω = A∪B with disjoint open (closed)sets A,B ⊆ Ω implies that A = Ω or B = Ω . Equivalently, a space is connected ifand only if the only closed and open (abbreviated clopen) sets are /0 and Ω itself.A subset A ⊆ Ω is called connected if it is connected with respect to the subspacetopology. A disconnected set (space) is one which is not connected.

A continuous image of a connected space is connected. The connected subsets ofR are precisely the intervals (finite or infinite).

A topological space (Ω ,O) is called totally disconnected if the only connectedsubsets in Ω are the trivial ones that is /0 and the singletons x, x ∈Ω .

Subspaces and products of totally disconnected spaces are totally disconnected[Willard (2004), Thm. 29.3]. Every discrete space is totally disconnected. Conse-quently, products of discrete spaces are totally disconnected. An example of a notdiscrete, compact, uncountable space is the Cantor set C. One way to construct C isto remove the open middle third from the unit interval, and then in each step removethe open middle third of each of the remaining intervals. In the limit, one obtains

C =

x ∈ [0,1] : x = ∑∞

n=1a j3 j , j ∈ 0,2

.

302 A Topology

The space C is homeomorphic to 0,1N via the map x 7→ (a j/2) j∈N[Willard (2004), Ex. 19.9.c].

Theorem A.9. A compact space is totally disconnected if and only if the set ofclopen subset of Ω is a base for its topology.

The proof is in [Willard (2004), Thm. 29.7].

A topological space (Ω ,O) is called extremally disconnected if the closure ofevery open set G ∈ O is open. Equivalently, a space is extremally disconnected ifeach pair of disjoint open sets have disjoint closures.

Open subspaces of extremally disconnected spaces are extremally disconnected.Every discrete space is extremally disconnected, and an extremally disconnectedmetric space is discrete. In particular, the Cantor set is not extremally disconnected,and this shows that products of extremally disconnected spaces need not be ex-tremally disconnected [Willard (2004), §15G].

A.9 Category

A subset A of a topological space Ω is called nowhere dense if its closure has emptyinterior, i.e., (A) = /0. A set A is called of first category in Ω if it is the union ofcountably many nowhere dense subsets of Ω . Clearly, countable unions of sets offirst category are of first category.

A set A is called of second category in Ω if it is not of first category. Equiva-lently, A is of second category if, whenever A⊆

⋃n∈N An then one of the sets An∩A

must contain an interior point.

Sets of first category are considered to be “small”, whereas sets of second cat-egory are “large”. A topological space (Ω ,O) is called a Baire space if everynonempty open subset of Ω is “large”, i.e., of second category in Ω . The proofof the following result is in [Willard (2004), Cor. 25.4].

Theorem A.10. Each locally compact space and each complete metric space is aBaire space.

A countable intersection of open sets in a topological space Ω is called a Gδ -set;and a countable union of closed sets of Ω is called a Fσ -set. In a metric space, everyclosed set A is a Gδ -set since A =

⋂n∈Nx ∈ Ω : d(x,A) < 1/n. The following

result follows easily from the definitions.

Theorem A.11. Let Ω be a Baire-space.

a) An Fσ -set is of first category if and only if it has empty interior.

b) A Gδ -set is of first category if and only if it is nowhere dense.

c) A countable intersection of dense Gδ -sets is dense.

d) A countable union of Fσ -sets with empty interior has empty interior.

A.11 Nets 303

A.10 Polish Spaces

A topological space Ω is called a Polish space if it is separable and completelymetrisable.

Lemma A.12. Any Gδ -subset of a Polish space is again a Polish space.

The proof is in [Willard (2004), Thm. 24.12].

Theorem A.13. Let Ω be a locally compact space. Then Ω is Polish if and only if itis second countable.

Proof. If Ω is Polish, then it is separable and metrisable, so it is second countable.Conversely, if it is second countable, then so is its one-point compactification Ω ∗.By Theorem A.8, this is metrisable, and hence a Polish space by Theorem A.6. ThenΩ is also Polish because it is an open subset of a Polish space.

A.11 Nets

A directed set is a set Λ together with a relation ≤ on Λ with the following proper-ties:

1) α ≤ α ,

2) α ≤ β and β ≤ γ imply that α ≤ γ ,

3) for every α,β ∈Λ there is γ ∈Λ such that α,β ≤ γ .

The relation≤ is called a direction on Λ . Typical examples of directed sets include

a) N with the natural meaning of ≤.

b) N with the direction given by divisibility: n≤ m ⇔ n|m.

c) U(x), the neighbourhood filter of a given point x in a topological space Ω ,with the direction given by reversed set inclusion: U ≤V ⇔ V ⊆U .

Let (Ω ,O) be a topological space. A net in Ω is a map x : Λ →Ω , where (Λ ,≤)is a directed set. In the context of nets one usually writes xα in place of x(α), andx = (xα)α∈Λ ⊆Ω .

A net (xα)α∈Λ ⊆Ω is convergent to x ∈Ω , written xα → x, if

∀U ∈ U(x) ∃α0 ∈Λ ∀α ≥ α0 : xα ∈U.

In this case the point x ∈Ω is called the limit of the net (xα)α . (Note that if Λ = Nwith the natural direction and Ω is a metric space, this coincides with the usualnotion of a limit of a sequence.) If Ω is Hausdorff, then a net in Ω can have at mostone limit.

A subnet of a net (xα)α∈Λ ⊆ Ω is each net (xλ (α ′))α ′∈Λ ′ , where (Λ ′,≤) is adirected set and λ : Λ ′→Λ is increasing and cofinal. The latter means that for each

304 A Topology

α ∈ Λ there is α ′ ∈ Λ ′ such that α ≤ λ (α ′). A subnet of a sequence need not be asequence any more [Willard (2004), Ex. 11B].

All the fundamental notions of set theoretic topology can be characterized interms of nets, see [Willard (2004), Chap. 11] and [Willard (2004), Thm. 17.4].

Theorem A.14. a) If x ∈ Ω and A ⊆ Ω , then x ∈ A if and only there is a net(xα)α∈Λ ⊆ A such that xα → x.

b) If A ⊆ Ω , then A is closed if an only if it contains the limit of each net in A,convergent in Ω .

c) For a net (xα)α∈Λ ⊆Ω and x ∈Ω the following assertions are equivalent.

(i) Some subnet of (xα)α∈Λ is convergent to x.

(ii) x is a cluster point of (xα)α∈Λ , i.e.,

x ∈⋂

α∈Λxβ : β ≥ α.

d) A Hausdorff space Ω is compact if and only if each net in Ω has a convergentsubnet.

e) The map f : Ω → Ω ′ is continuous at x ∈ Ω if and only if xα → x impliesf (xα)→ f (x) for every net (xα)α∈Λ ⊆Ω .

f) Let fι : Ω → Ωι , ι ∈ I. Then Ω carries the projective topology with respectto the family ( fι)ι∈I if and only if for each net (xα)α∈Λ ⊆Ω and x ∈Ω onehas

xα → x ⇔ fι(xα)→ fι(x) ∀ ι ∈ I.

In particular, f) implies that in a product space Ω = ∏ι Ωι a net (xα)α∈Λ ⊆ Ω

converges if and only if it converges in each coordinate ι ∈ I.

Let (Ω ,d) be a metric space. A Cauchy net in Ω is a net (xα)α∈Λ ⊆Ω such that

∀ε > 0 ∃α ∈Λ : α ≤ β ,γ ⇒ d(xβ ,xγ)< ε.

Every subnet of a Cauchy net is a Cauchy net. Every convergent net is a Cauchy net,and if a Cauchy net has a convergent subnet, then it is convergent.

Theorem A.15. In a complete metric space every Cauchy net converges.

See [Willard (2004), Thm. 39.4] for a proof.

Appendix BMeasure and Integration Theory

We begin with some general set theoretic notions. Let X be a set. Then its powerset is denoted by

P(X) := A : A⊆ X.

The complement of A ⊆ X is denoted by Ac := X \A, and its characteristic func-tion is

1A(x) :=

1 (x ∈ A)0 (x /∈ A)

for x ∈ X . One often writes 1 in place of 1X if the reference set X is understood. Fora sequence (An)n ⊆ P(X) we write An A if

An ⊇ An+1 and⋂

n∈N An = A.

Similarly, An A is short for

An ⊆ An+1 and⋃

n∈N An = A.

A family (Aι)ι ⊆P(X) is called pairwise disjoint if ι 6= η implies that Aι ∩Aη = /0.A subset E ⊆ P(X) is often called a set system over X . A set system is called ∩-stable (∪-stable, \-stable) if A,B∈ E implies that A∩B (and A∪B, A\B) belongs toE as well. If E is a set system, then any mapping µ : E→ [0,∞] is called a (positive)set function. Such a set function is called σ -additive if

µ

(∞⋃

n=1

An

)=

∞

∑n=1

µ(An)

whenever (An)n∈N ⊆ E is pairwise disjoint and⋃

n An ∈ E. Here we adopt the con-vention that

a+∞ = ∞+a = ∞ (−∞ < a≤ ∞).

305

306 B Measure and Integration Theory

A similar rule holds for sums a+(−∞) where a ∈ [−∞,∞). The sum ∞+(−∞) isnot defined. Other conventions for computations with the values ±∞ are:

0 ·(±∞) = (±∞) ·0 = 0, α ·(±∞) = (±∞) ·α =±∞ β ·(±∞) = (±∞) ·β =∓∞

for −∞ < β < 0 < α . If f : X → Y is a mapping and B⊆ Y then we denote

[ f ∈ B ] := f−1(B) :=

x ∈ X : f (x) ∈ B.

Likewise, if P(x1, . . . ,xn) is a property of n-tuples (x1, . . . ,xn)∈ (Y )n and f1, . . . , fn :X → Y are mappings, then we write

[P( f1, . . . , fn) ] :=

x ∈ X : P( f1(x), . . . , fn(x)) holds.

For example, for f ,g : X → Y we abbreviate [ f = g ] := x ∈ X : f (x) = g(x).

B.1 σ -Algebras

Let X be any set. A σ -algebra over X is a set system Σ ⊆ P(X), such that thefollowing hold.

1) /0,X ∈ Σ .

2) If A,B ∈ Σ then A∪B,A∩B,A\B ∈ Σ .

3) If (An)n∈N ⊆ Σ , then⋃

n∈N An,⋂

n∈N An ∈ Σ .

If a set system Σ satisfies merely 1) and 2), it is called an algebra, and if Σ satisfiesjust 2) and /0 ∈ Σ , then it is called a ring. A pair (X ,Σ) with Σ being a σ -algebraover X is called a measurable space.

Clearly, /0,X is the smallest and P(X) is the largest σ -algebra over X . Anarbitrary intersection of σ -algebras over the same set X is again a σ -algebra. Hencefor E⊆ P(X) one can form

σ(E) :=⋂

Σ : E⊆ Σ ⊆ P(X), Σ a σ -algebra,

the σ -algebra generated by E. It is the smallest σ -algebra that contains all sets fromE. If Σ = σ(E), we call E a generator of Σ .

If X is a topological space, the σ -algebra generated by all open sets is called theBorel σ -algebra Bo(X). By 1) and 2), Bo(X) contains all closed sets as well. A setbelonging to Bo(X) is called a Borel set or Borel measurable.

As an example consider the extended real line X = [−∞,∞]. This be-comes a compact metric space via the (order-preserving) homeomorphism arctan :[−∞,∞]→ [−π/2,π/2]. The Borel algebra Bo([−∞,∞]) is generated by (α,∞] :α ∈ R.

B.2 Measures 307

A Dynkin system (also called λ -system) on a set X is a subset D ⊆ P(X) withthe following properties:

1) X ∈D.

2) If A,B ∈D and A⊆ B then B\A ∈D.

3) If (An)n∈N ⊆D and An A, then A ∈D.

Dynkin systems play an important technical role in measure theory due to the fol-lowing result, see [Bauer (1990), p.8] or [Billingsley (1979), Thm.3.2].

Theorem B.1 (Dynkin). If D is a Dynkin system and E ⊆ D is ∩-stable, thenσ(E)⊆D.

3

B.2 Measures

Let X be a set and Σ ⊆ P(X) a σ -algebra over X . A (positive) measure is a σ -additive set function

µ : Σ → [0,∞].

In this case the triple (X ,Σ ,µ) is called a measure space and the sets in Σ are calledmeasurable sets. If µ(X)< ∞, the measure is called finite. If µ(X) = 1, it is calleda probability measure and (X ,Σ ,µ) is called a probability space. Suppose E⊆ Σ

is given and there is a sequence (An)n ⊆ E such that

µ(An)< ∞ (n ∈ N) and X =⋃

n∈NAn,

then the measure µ is called σ -finite with respect to E. If E= Σ , we simply call itσ -finite.

From the σ -additivity of the measure one derives the following properties:

a) (Finite Additivity) µ( /0) = 0 and

µ(A∪B)+µ(A∩B) = µ(A)+µ(B) (A,B ∈ Σ).

b) (Monotonicity) A,B ∈ Σ , A⊆ B ⇒ µ(A)≤ µ(B).

c) (σ -Subadditivity) (An)n ⊆ Σ ⇒ µ(⋃

n∈N An)≤ ∑∞n=1 µ(An).

See [Billingsley (1979), p.134] for the elementary proofs. The following is an ap-plication of Dynkin’s theorem, see [Billingsley (1979), Thm.10.3].

Theorem B.2 (Uniqueness Theorem). Let Σ = σ(E) with E being ∩-stable. Letµ,ν be two measures on Σ , both σ -finite with respect to E. If µ and ν coincide onE, they are equal.


B.3 Construction of Measures

An outer measure on a set X is a mapping

µ∗ : P(X)→ [0,∞]

such that µ∗( /0) = 0 and µ∗ is monotone and σ -subadditive. The following result al-lows to obtain a measure from an outer measure, see [Billingsley (1979), Thm.11.1].

Theorem B.3 (Caratheodory). Let µ∗ be an outer measure on the set X. Define

M(µ∗) :=

A⊆ X : µ∗(H) = µ

∗(H ∩A)+µ∗(H \A) ∀H ⊆ X.

Then M(µ∗) is a σ -algebra and µ∗∣∣M(µ∗) is a measure on it.

The set system E ⊆ P(X) is called a semi-ring if it satisfies the following twoconditions:

(i) E is ∩-stable and /0 ∈ E.

(ii) If A,B ∈ E, then A\B is a disjoint union of members of E.

An example of such a system is E = (a,b] : a ≤ b ⊆ P(R). If E is a semi-ring,then the system of all disjoint unions of members of E is a ring.

Theorem B.4 (Hahn). Let E ⊆ P(X) and let µ : E → [0,∞] satisfy /0 ∈ E andµ( /0) = 0. Then µ∗ : P(X)→ [0,∞] defined by

µ∗(A) := inf

∑n∈N µ(En) : (En)n ⊆ E, A⊆

⋃n

En

(A ∈ P(X))

is an outer measure. Moreover, if E is a semi-ring and µ : E→ [0,∞] is σ -additiveon E, then σ(E)⊆M(µ∗) and µ∗|E = µ .

See [Billingsley (1979), p.140] for a proof. One may summarise these results inthe following way: If a set function on a semi-ring E is σ -additive on E then it has aextension to a measure on σ(E). If in addition X is σ -finite with respect to E, thenthis extension is unique.

Sometimes, for instance in the construction of infinite products, it is convenientto work with the following criterion from [Billingsley (1979), Thm.10.2].

Lemma B.5. Let E be an algebra over a set X, and let µ : E→ [0,∞) be a finitelyadditive set function with µ(X) < ∞. Then µ is σ -additive on E if and only if foreach decreasing sequence (An)n ⊆ E, An /0, one has µ(An)→ 0.

B.4 Measurable Functions and Mappings

Let (X ,Σ) and (Y,Σ ′) be measurable spaces. A mapping ϕ : X → Y is called mea-surable if

B.5 The Integral of Positive Measurable Functions 309

[ϕ ∈ A ] ∈ Σ for all A ∈ Σ′.

(It suffices to check this condition for each A from a generator of Σ ′.) We denote by

M(X ;Y ) =M(X ,Σ ;X ,Σ ′)

the set of all measurable mappings between X and Y . For the special case Y = [0,∞]we write

M+(X) :=

f : X → [0,∞] : f is measurable.

A composition of measurable mappings is measurable. For A ∈ Σ its characteristicfunction 1A is measurable. If X ,Y are topological spaces and ϕ : X → Y is con-tinuous, then it is Bo(X)-Bo(Y ) measurable. The following lemma summarises thebasic properties of positive measurable functions, see [Billingsley (1979), Sec.13].

Lemma B.6. Let (X ,Σ ,µ) be a measure space. Then the following assertions hold.

a) If f ,g ∈M+(X),α ≥ 0, then f g, f +g,α f ∈M+(X).

b) If f ,g ∈M(X ;R) and α,β ∈ R, then f g,α f +βg ∈M(X ;R).c) If f ,g : X → [−∞,∞] are measurable, then − f ,min f ,g,max f ,g are

measurable.

d) If fn : X → [−∞,∞] is measurable for each n ∈ N then supn fn, infn fn aremeasurable.

A simple function on a measure space (X ,Σ ,µ) is a linear combination of char-acteristic functions of measurable sets. [Billingsley (1979), Thm.13.5].

Lemma B.7. Let f : X → [0,∞] be measurable. Then there exists a sequence ofsimple functions ( fn)n such that

0≤ fn ≤ fn+1 f (pointwise as n→ ∞),

and such that the convergence is uniform if f is bounded.

B.5 The Integral of Positive Measurable Functions

Given a measure space (X ,Σ ,µ) there is a unique mapping

M+(X)→ [0,∞], f 7→∫

Xf dµ,

called the integral, such that the following assertions hold for f , fn,g ∈M+(X),α ≥ 0, and A ∈ Σ :

a)∫

X1A dµ = µ(A);


b) f ≤ g ⇒∫

Xf dµ ≤

∫X

gdµ;

c)∫

X( f +αg)dµ =

∫X

f dµ +α

∫X

gdµ;

d) If 0≤ f1 ≤ f2 ≤ . . . and fn→ f pointwise, then∫

Xf dµ = lim

n→∞

∫X

fn dµ .

See [Rana (2002), Sec.5.2] or [Billingsley (1979), Sec.15]. Assertion d) is called theMonotone Convergence Theorem or the theorem of Beppo Levi.

If (X ,Σ ,µ) is a measure space, (Y,Σ ′) a measurable space and ϕ : X → Y mea-surable, then a measure is defined on Σ ′ by

[ϕ∗µ](B) := µ[ϕ ∈ B] (B ∈ Σ).

The measure ϕ∗µ is called the image of µ under ϕ , or the push-forward of µ alongϕ . If µ is finite or a probability measure, so is ϕ∗µ . If f ∈M+(Y ), then∫

Yf d(ϕ∗µ) =

∫X( f ϕ)dµ.

B.6 Product Spaces

If (X1,Σ1) and (X2,Σ2) are measurable spaces, then the product σ -algebra on theproduct space X1×X2 is

Σ1⊗Σ2 := σ

A×B : A ∈ Σ1, B ∈ Σ2.

If E j is a generator of Σ j with X j ∈ E j for j = 1,2, then

E1×E2 :=

A×B : A ∈ E1, B ∈ E2

is a generator of Σ1⊗Σ2. If (X ,Σ) is another measurable space, then a mapping f =( f1, f2) : X → X1×X2 is measurable if and only if the projections f1 = π1 f , f2 =π2 f are both measurable. If f : (X1×X2,Σ1⊗Σ2)→ (Y,Σ ′) is measurable, thenf (x, ·) : X2→ Y is measurable for every x ∈ X1, see [Billingsley (1979), Thm.18.1].

Theorem B.8 (Tonelli). Let (X j,Σ j,µ j), j = 1,2, be σ -finite measure spaces andf ∈M+(X1×X2). Then the functions

F1 :X1→ [0,∞], x 7→∫

X2

f (x, ·)dµ2

F2 :X2→ [0,∞], y 7→∫

X1

f (·,y)dµ1

are measurable and there is a unique measure µ1⊗µ2 such that

B.6 Product Spaces 311∫X1

F1 dµ1 =∫

X1×X2

f d(µ1⊗µ2) =∫

X2

F2 dµ2.

For a proof see [Billingsley (1979), Thm.18.3]. The measure µ1⊗ µ2 is called theproduct measure of µ1 and µ2. Note that for the particular case F = f1⊗ f2 with

( f1⊗ f2)(x1,x2) := f1(x1) · f2(x2) ( f j ∈M+(X j),x j ∈ X j ( j = 1,2)),

we obtain∫

X1×X2

( f1⊗ f2)d(µ1⊗µ2) =

(∫X1

f1 dµ1

) (∫X2

f2 dµ2

).

Let us turn to infinite products. Suppose ((Xι ,Σι))ι∈I is a collection of measur-able spaces. Consider the product space

X := ∏ι∈I

Xι

with corresponding projections πι : X → Xι . A set of the form⋂ι∈F

[πι ∈ Aι ] = ∏ι∈F

Aι ×∏ι /∈F

Xι

with F ⊆ I finite and sets Aι ∈ Σι for ι ∈ F , is called a (measurable) cylinder. Let

E :=⋂

ι∈F[πι ∈ Aι ] : F ⊆ I finite, Aι ∈ Σι , (ι ∈ F)

be the semi-ring(!) of all cylinders. Then⊗

ι∈I

Σι := σ(E)

is called the product σ -algebra of the Σι .

Theorem B.9 (Infinite Products). Let (Xι ,Σι) with ι ∈ I be a collection of proba-bility spaces. Then there is a unique probability measure µ :=

⊗ι µι on Σ =

⊗ι Σι

such thatµ

(⋂ι∈F

[πι ∈ Aι ])= ∏ι∈F µι(Aι)

for every finite subset F ⊆ I and sets Aι ∈ Σι , ι ∈ F.

The proof is based on Lemma B.5, see [Hewitt and Stromberg (1969), Chap.22] or[Halmos (1950), Sec.38]. In the case that I = N is countable, Theorem B.9 is aconsequence of the Ionescu-Tulcea theorem, to be treated next.

The Ionescu Tulcea’s Theorem

For a measurable space (X ,Σ) we denote by M+(X ,Σ) the set of all positive and byM1(X ,Σ) the set of all probability measures on (X ,Σ). There is a natural σ -algebraΣ on M+(X ,Σ), the smallest such that each mapping


M+(X ,Σ)→ [0,∞], ν 7→ ν(A) (A ∈ Σ)

is measurable.Let (X j,Σ j), j = 1,2 be measurable spaces. A measure kernel from X1 to X2 is a

measurable mapping µ : X2→M+(X1,Σ1). Such a kernel µ can also be interpretedas a mapping of two variables

µ : X2×Σ → [0,∞],

and we shall do so when it seems convenient. If µ(y, ·)∈M1(X1,Σ1) for each y∈ X2then µ is called a probability kernel.

Let (X ,Σ) be another measurable space, and let µ : X2→M+(X1,Σ1) be a kernel.Then there is a Koopman operator

Tµ : M+(X×X1)→M+(X×X2)

(Tµ f )(x,x2) :=∫

X1

f (x,x1)µ(x2,dx1) (x ∈ X ,x2 ∈ X2).

The operator Tµ is additive and positively homogeneous, and if fn f pointwiseon X1 then Tµ fn Tµ f pointwise on X2. Moreover,

Tµ( f ⊗g) = ( f ⊗1) ·Tµ(1⊗g) ( f ∈M+(X),g ∈M+(X2)).

Conversely, each operator T : M+(X×X1)→M+(X×X2) with these properties isof the form Tµ , for some kernel µ .

If µ : X2→M+(X1) and ν : X3→M+(X2) are kernels, then Tν Tµ = Tη for

η(x3,A) :=∫

X2

µ(x2,A)ν(x3,dx2) (x3 ∈ X3,A ∈ Σ1).

Kernels can be used to construct measures on infinite products. Let (Xn,Σn),n ∈ N, be measurable spaces, and let X := ∏n∈N Xn be the Cartesian product, withthe projections πn : X → Xn. The natural σ -algebra over X is⊗

nΣn := σ

π−1n (An) : n ∈ N, An ∈ Σn

.

A generating algebra is

A :=

An×∏k>n

Xk : n ∈ N, An ∈ Σ1⊗ . . .⊗Σn

,

the algebra of cylinder sets.

Theorem B.10 (Ionescu Tulcea). [Ethier and Kurtz (1986), p.504] Let (Xn,Σn),n ∈ N, be measurable spaces, let

µn : X1×·· ·×Xn−1→M1(Xn) (n ∈ N, n≥ 2)

B.7 Null Sets 313

be probability kernels, and let µ1 be a probability measure on X1. Let

X (n) := X1×·· ·×Xn with Σ(n) = Σ1⊗ . . .⊗Σn.

Let, for n≥ 1, Tn : M+(X (n),Σ (n))→M+(X (n−1),Σ (n−1)) be given by

(Tn f )(x(n−1)) =∫

Xn

f (x(n−1),xn)µn(x(n−1),dxn) (x(n−1) ∈ X (n−1)).

Then there is a unique probability measure ν on X (∞) := ∏n∈N Xn such that∫X(∞)

f (x1, . . . ,xn)dν(x1, . . .) = T1T2 . . .Tn f ( f ∈M+(X1×·· ·×Xn))

for every n ∈ N.

An important special case of the Ionescu Tulcea theorem is the construction ofthe infinite product measure in the case that the index set I is countable. Indeed, ifone has a probability measure νn on (Xn,Σn), for each n∈N and applies the IonescuTulcea theorem with µn ≡ νn, then the measure ν of the theorem satisfies

(π1, . . . ,πn)∗ν = ν1⊗ . . .⊗νn (n ∈ N).

That means, ν =⊗

n νn is the product of the νn.

B.7 Null Sets

Let (X ,Σ ,µ) be a measure space. A set A ⊆ X is called a null set if there is a setN ∈ Σ such that A⊆N and µ(N) = 0. (In general a null set need not be measurable).Null sets have the following properties:

a) If A is a null set and B⊆ A, then B is also a null set.

b) If each An is a null set, then⋃

n∈N An is a null set.

The following results shows the connection between null sets and the integral, see[Billingsley (1979), Thm.15.2].

Lemma B.11. Let (X ,Σ ,µ) be a measure space and let f : X → [−∞,∞] be mea-surable. Then the following assertions hold.

a)∫

X | f |dµ = 0 if and only if the set [ f 6= 0 ] = [ | f |> 0 ] is a null set.

b) If∫

X | f | dµ < ∞, then the set [ | f |= ∞ ] is a null set.

One says that two functions f ,g are equal µ-almost everywhere (abbreviated by“ f = g a.e.” or “ f ∼µ g”) if the set [ f 6= g ] is a null set. More generally, let P be aproperty of points of X . Then P holds almost everywhere or for µ-almost all x ∈ Xif the set


x ∈ X : P does not hold for x

is a µ-null set. If µ is understood, we leave out the reference to it.For each set E, the relation ∼µ (“is equal µ-almost everywhere to”) is an equiv-

alence relation on the space of mappings from X to E. For such a mapping f wesometimes denote by [ f ] its equivalence class, in situations when notational clarityis needed. If µ is understood, we write simply ∼ instead of ∼µ .

If (E,d) is a separable metric space with its Borel σ -algebra, we denote by

L0(X ;E) := L0(X ,Σ ,µ;E) :=M(X ;E)/∼

the space of equivalence classes of measurable mappings modulo equality almosteverywhere. Prominent examples here are E =R, E = [0,∞] or E = [−∞,∞]. In thecase E = C we abbreviate L0(X) := L0(X ,Σ ,µ;C) if Σ and µ are understood.

B.8 The Lebesgue Spaces

Let X := (X ,Σ ,µ) be a measure space. For f ∈ L0(X) we define

‖ f‖∞

:= inf

t > 0 : µ [ | f (·)|> t ] = 0

and, if 1≤ p < ∞,

‖ f‖p :=(∫

X| f (·)|p dµ

) 1p

.

Then we let

Lp(X) := Lp(X ,Σ ,µ;C) :=

f ∈ L0(X) : ‖ f‖p < ∞

for 1 ≤ p ≤ ∞. The space L1(X) is called the space of (equivalence classes of)integrable functions. The following is in [Rudin (1987), Chap.3].

Theorem B.12. Let (X ,Σ ,µ) be a measure space and 1≤ p≤∞. Then the followingassertions hold.

a) The space Lp(X) is a Banach space with respect to the norm ‖·‖p.

b) If 1≤ p<∞ and fn→ f in Lp(X), then there is g∈Lp(X) and a subsequence( fnk)k∈N such that

∣∣ fnk

∣∣≤ g a.e. for all k ∈ N and fnk → f pointwise a.e..

c) If 1 ≤ p < ∞ and if ( fn)n ⊆ Lp(X) is such that fn→ f a.e. and there is g ∈Lp(X) such that | fn| ≤ g a.e. for all n ∈N, then f ∈ Lp(X), and ‖ fn− f‖p→0.

Assertion c) is called the Dominated Convergence Theorem or Lebesgue’s theo-rem.

B.8 The Lebesgue Spaces 315

The Integral

If f ∈ L1(X ,Σ ,µ) then the functions (Re f )±,(Im f )± are positive and have finiteintegral. Hence one can define the integral of f by∫

Xf dµ :=

∫X(Re f )+ dµ−

∫X(Re f )− dµ + i

[∫X(Im f )+ dµ−

∫X(Im f )− dµ

].

The integral is a linear mapping L1(X ,Σ ,µ)→ C and satisfies∣∣∣∣∫Xf dµ

∣∣∣∣≤ ∫X| f | dµ,

see [Rudin (1987), Chap.1].

Theorem B.13 (Averaging Theorem). Let S ⊆ C be a closed subset and supposethat either (X ,Σ ,µ) is σ -finite or 0 ∈ S. Let f ∈ L1(X) such that

1µ(A)

∫A

f dµ ∈ S

for all A ∈ Σ such that 0 < µ(A)< ∞. Then f (·) ∈ S almost everywhere.

A proof is in [Lang (1993), Thm.IV.5.15]. As a corollary one obtains that if∫

A f = 0for all A of finite measure, then f = 0 almost everywhere.

Approximation

The following is an immediate consequence of Lemma B.7.

Theorem B.14. Let (X ,Σ ,µ) be a measure space. Then the space lin1A : A ∈ Σof simple functions is dense in L∞(X).

Let (X ,Σ ,µ) be a measure space, and let E ⊆ Σ . The space of step functions(with respect to E) is

Step(X ,E,µ) := lin

1B : B ∈ E, µ(B)< ∞.

We abbreviate Step(X) := Step(X ,Σ ,µ). By using the approximation f 1[ | f |≤n ]→ fand Theorem B.14 we obtain an approximation result for Lp.

Theorem B.15. For any measure space (X ,Σ ,µ) and 1≤ p < ∞, the space Step(X)is dense in Lp(X).

Theorem B.15 combined with a Dynkin system argument yields the following.

Lemma B.16. Let (X ,Σ ,µ) be a finite measure space. Let E ⊆ Σ be ∩-stable withX ∈ E and σ(E) = Σ . Then Step(X ,E,µ) is dense in Lp(X) for each 1≤ p < ∞.


Theorem B.17. Let (X ,Σ ,µ) be any measure space, and let E⊆ Σ be ∩-stable withσ(E) = Σ . Suppose further that X is σ -finite with respect to E. Then Step(X ,E,µ)is dense in Lp(X) for each 1≤ p < ∞.

A related result, proved by a Dynkin system argument, holds on the level of sets.

Lemma B.18. Let (X ,Σ ,µ) be a finite measure space, and let E⊆ Σ be an algebrawith σ(E) = Σ . Then for each A ∈ Σ and each ε > 0 there is B ∈ E such thatµ(A4B)< ε .

See also [Billingsley (1979), Thm.11.4] and [Lang (1993), Section VI.§6].

Fubini’s Theorem

Consider two σ -finite measure spaces (X j,Σ j,µ j), j = 1,2, and their product

(X ,Σ ,µ) = (X1×X2,Σ1⊗Σ2,µ1⊗µ2).

Let E := A1×A2 : A j ∈ Σ j,µ(A j)< ∞ ( j = 1,2) be the set of measurable rect-angles. Then E satisfies the conditions of Theorem B.17.

Corollary B.19. The space lin1A1⊗1A2 : A j ∈Σ j, µ(A j)<∞ ( j = 1,2) is densein Lp(X) for 1≤ p < ∞.

Combining this with Tonelli’s theorem yields the following famous theorem[Lang (1993), Thm.8.4].

Theorem B.20 (Fubini). Let f ∈ L1(X1×X2). Then for µ1-almost every x ∈ X1,f (x, ·) ∈ L1(X2) and with

F :=(

x 7→∫

X2

f (x, ·)dµ2

)(defined almost everywhere on X1) one has F ∈ L1(X1); moreover,∫

X1

F dµ1 =∫

X1

∫X2

f (x,y)dµ2(y)dµ1(x) =∫

X1⊗X2

f d(µ1⊗µ2).

B.9 Complex Measures

A complex measure on a measurable space (X ,Σ) is a mapping µ : Σ → C whichis σ -additive and satisfies µ( /0) = 0. If the range of µ is contained in R, µ is calleda signed measure. For a complex measure µ one defines its total variation |µ| by

|µ|(A) := inf ∞

∑n=1|µ(An)| : (An)n∈N ⊆ Σ pairwise disjoint, A =

⋃nAn

B.9 Complex Measures 317

for A ∈ Σ . Then |µ| is a positive finite measure, see [Rudin (1987), Thm.6.2]. Withrespect to the norm ‖µ‖1 := |µ|(X), the space of complex measures on (X ,Σ) is aBanach space.

Let µ be a complex measure on a measurable space (X ,Σ). For a step functionf = ∑

nj=1 x j1A j ∈ Step(X ,Σ , |µ|) one defines

∫X

f dµ =n

∑j=1

x jµ(A j)

as usual, and shows (using finite additivity) that this does not depend on the repre-sentation of f . Moreover, one obtains∣∣∣∫

Xf dµ

∣∣∣≤ ∫X| f | d|µ|= ‖ f‖L1(|µ|) ,

whence the integral has a continuous linear extension to all of L1(X ,Σ , |µ|).

Appendix CFunctional Analysis

In this appendix we review some notions and facts from functional analy-sis but refer to [Conway (1990)], [Dunford and Schwartz (1958)], [Rudin (1987)],[Rudin (1991)], [Megginson (1998)] or [Schaefer (1980)] for more information.

C.1 Banach Spaces

Let E be a vector space over K, where K=R or C. A semi-norm on E is a mapping‖·‖ : E→ R satisfying

‖x‖ ≥ 0, ‖λx‖= |λ |‖x‖ , and ‖x+ y‖ ≤ ‖x‖+‖y‖

for all x,y ∈ E, λ ∈K.A semi-norm is a norm if ‖x‖ = 0 happens if and only if x = 0. A norm on a

vector space defines a metric via

d(x,y) := ‖x− y‖ (x,y ∈ E)

and hence a topology, called the norm topology. It follows directly from the axiomsof the norm that the mappings

E×E→ E, (x,y) 7→ x+ y and K×E→ E, (λ ,x) 7→ λx

are continuous. From the triangle inequality it follows that∣∣‖x‖−‖y‖∣∣≤ ‖x− y‖ (x,y ∈ E),

which implies that also the norm mapping itself is continuous.A normed space is a vector space together with a norm on it, and a Banach

space if the metric induced by the norm is complete. If E is a normed space, itsclosed unit ball is

319

320 C Functional Analysis

BE :=

x ∈ E : ‖x‖ ≤ 1.

The set BE is compact if and only if E is finite-dimensional.

Recall that a metric space is separable if it contains a countable dense set. Anormed space E is separable if and only if there is a countable set A such that itslinear span

linA =

∑nj=1 λ ja j : n ∈ N, λ j ∈K, a j ∈ A ( j = 1, . . . ,n)

is dense in E.

A subset A ⊆ E of a normed space is called (norm) bounded if there is c > 0such that ‖a‖ ≤ c for all a ∈ A. Every compact subset is norm-bounded since thenorm mapping is continuous.

If E is a Banach space and F ⊆ E is a closed subspace, then the quotient spaceX := E/F becomes a Banach space under the quotient norm

‖x+F‖Y := inf‖x+ f‖ : f ∈ F

.

The linear mapping q : E→ X , q(x) := x+F , is called the canonical surjection orquotient map.

C.2 Banach Algebras

An algebra is a (real or complex) linear space A together with an associative bilinearmapping

A×A→ A ( f ,g) 7→ f g

called multiplication and a distinguished element e ∈ A satisfying

ea = ae = a for all a ∈ A,

called the unit element. (Actually, one should call A a “unital” algebra then, butall the algebras we have reason to consider are unital, so we include the existenceof a unit element into our notion of “algebra” for convenience.) An algebra A isnondegenerate if A 6= 0, if and only if e 6= 0. An algebra A is commutative iff g = g f for all f ,g ∈ A.

An algebra A which is also a Banach space is called a Banach algebra if

‖ab‖ ≤ ‖a‖‖b‖ for all a,b ∈ A.

In particular, multiplication is continuous. If e 6= 0 then ‖e‖ ≥ 1.An element a ∈ A of an algebra A is called invertible if and only if there is an

element b such that ab = ba = e. This element, necessarily unique, is then called theinverse of a and denoted by a−1.

C.2 Banach Algebras 321

A linear map T : A→ B between two algebras A,B with unit elements eA,eB,respectively, is called multiplicative if

T (ab) = (Ta)(T b) for all a,b ∈ A,

and an algebra homomorphism if in addition T (eA) = eB. An algebra homomor-phism is an algebra isomorphism if it is bijective. In this case, its inverse is alsoan algebra homomorphism.

Let A be a commutative Banach algebra with unit e. An (algebra) ideal of A is alinear subspace I ⊆ A satisfying

f ∈ I, g ∈ A ⇒ f g ∈ I.

Since the multiplication is continuous, the closure of an ideal is again an ideal. Anideal I of A is called proper if I 6= A, if and only if e /∈ I. For an element a ∈ A, theideal Aa is called the principal ideal generated by a. It is the smallest ideal thatcontains a. Then a is not invertible if and only if aA is proper. A proper ideal I of Ais called maximal if

I ⊆ J ⊆ A ⇒ J = I or J = A

for any ideal J of A.If T : A→ B is a continuous algebra homomorphism, then kerT is a closed ideal.

Conversely, if I is closed ideal of A, then the quotient space A/I becomes a Banachalgebra with respect to the multiplication

(x+ I)(y+ I) := (xy+ I) (x,y ∈ A)

and unit element e+I. Moreover, the canonical surjection q : A→A/I is a continuousalgebra homomorphism with kerq = I.

An involution on a complex Banach algebra A is a map x 7→ x∗ satisfying

(x∗)∗ = x, (x+ y)∗ = x∗+ y∗, (λx)∗ = λx∗, (xy)∗ = y∗x∗

for all x,y ∈ A, λ ∈ C. In an algebra with involution one has e∗ = e since

e∗ = e∗e = (e∗e)∗∗ = (e∗e∗∗)∗ = (e∗e)∗ = (e∗)∗ = e.

An algebra homomorphism T : A→ B between algebras with involution is called a∗-homomorphism if

T (x∗) = (T x)∗ for all x ∈ A.

A bijective ∗-homomorphism is called a ∗-isomorphism.A Banach algebra with involution is a C∗-algebra if ‖x‖2 = ‖x∗x‖ for all x ∈ A.

It follows that ‖x∗‖= ‖x‖ for every x ∈ A. If A is a nondegenerate C∗-algebra, then‖e‖= 1.


C.3 Linear Operators

A linear mapping T : E → F between two normed spaces is called bounded ifT (BE) is a norm-bounded set in F . The linear mapping T is bounded if and only ifit is continuous if and only if it satisfies a norm estimate of the form

‖T x‖ ≤ c‖x‖ (x ∈ E) (C.1)

for some c≥ 0 independent of x ∈ E. The smallest c such that (C.1) holds is calledthe operator norm of T , denoted by ‖T‖. One has

‖T‖= sup‖T x‖ : x ∈ BE

.

This defines a norm on the space

L (E;F) :=

T : E→ F : T is a bounded linear mapping,

which is a Banach space if F is a Banach space. If E,F,G are normed spaces andT ∈L (E;F), S∈L (F ;G), then ST := ST ∈L (E;G) with ‖ST‖≤ ‖S‖‖T‖. Theidentity operator I is neutral with respect to multiplication of operators, and clearly‖I‖= 1 if E 6= 0. In particular, if E is a Banach space, the space

L (E) := L (E;E)

is a Banach algebra with unit element I.

Linear mappings are also called operators. Associated with an operator T : E→F are its kernel and its range

kerT :=

x ∈ E : T x = 0, ranT :=

T x : x ∈ E

.

One has kerT = 0 if and only if T is injective. If T is bounded, its kernel kerT isa closed subspace of E.

A subset T ⊆ L (E;F) that is a bounded set in the normed space L (E;F)is often called uniformly (norm) bounded. The following result—sometimescalled the Banach–Steinhaus theorem—gives an important characterisation. (See[Rudin (1987), Thm. 5.8] for a proof.)

Theorem C.1 (Principle of Uniform Boundedness). Let E,F be Banach spacesand T ⊆L (E;F). Then T is uniformly bounded if and only if for each x ∈ E theset T x : T ∈T is a bounded subset of F.

A bounded linear mapping T ∈L (E;F) is called contractive or a contractionif ‖T‖ ≤ 1. It is called isometric an isometry if ‖T x‖ = ‖x‖ holds for all x ∈ E.Isometries are injective contractions. An operator P ∈L (E) is called a projectionif P2 = P. In this case, Q := I−P is also a projection, and P induces a direct sumdecomposition

C.4 Duals, Bi-Duals and Adjoints 323

E = ranP ⊕ ran(I−P) = ranP ⊕ kerP

of E into closed subspaces. Conversely, whenever E is a Banach space and onehas a direct sum decomposition X = F ⊕E into closed linear subspaces, then theassociated projections X → F , X → E are bounded. This is a consequence of theClosed Graph Theorem [Rudin (1987), p.114, Ex.16].

An operator T ∈ L (E;F) is called invertible or an isomorphism (of normedspaces) if T is bijective and T−1 is also bounded. If E,F are Banach spaces,the boundedness of T−1 is automatic by the following important theorem[Rudin (1987), Thm.5.10].

Theorem C.2 (Inverse Mapping Theorem). If E,F are Banach spaces and T ∈L (E;F) is bijective, then T is an isomorphism.

A linear operator T ∈L (E;F) is compact if it maps bounded sets in E to rela-tively compact sets in F . Equivalently, whenever (xn)n∈N⊆E is a bounded sequencein E, the sequence (T xn)n∈N ⊆ F has a convergent subsequence. The set of compactoperators

C (E;F) := T ∈L (E;F) : T is compact

is an (operator-)norm closed linear subspace of L (E;F) that contains all finite-rank operators. Moreover, a product of operators ST is compact whenever one of itsfactors is compact. See [Conway (1990), VI.3].

C.4 Duals, Bi-Duals and Adjoints

For a normed space E its dual space is E ′ := L (E;K). Since K is complete, thisis always a Banach space. Elements of E ′ are called (bounded linear) functionals.One frequently writes 〈x,x′〉 in place of x′(x) for x∈ E,x′ ∈ E ′ and calls the mapping

E×E ′→K, (x,x′) 7→⟨x,x′⟩

the canonical duality.

Theorem C.3 (Hahn–Banach). Let E be a Banach space, F ⊆ E a linear subspaceand f ′ ∈ F ′. Then there exists x′ ∈ E ′ such that x′ = f ′ on F and ‖x′‖= ‖ f ′‖.

For a proof see [Rudin (1987), Thm. 5.16]. A consequence of the Hahn–Banachtheorem is that E ′ separates the points of E. Another consequence is that the normof E can be computed as

‖x‖= sup∣∣⟨x,x′⟩∣∣ :

∥∥x′∥∥≤ 1

. (C.2)

A third consequence is that E is separable whenever E ′ is.


Given normed spaces E,F and an operator T ∈L (E;F) we define its adjointoperator T ′ ∈L (F ′;E ′) by T ′y′ := y′ T , y′ ∈ F ′. Using the canonical duality thisjust means ⟨

T x,y′⟩=⟨x,T ′y′

⟩(x ∈ E,y′ ∈ F ′).

The map (T 7→ T ′) : L (E;F)→ L (F ′;E ′) is linear and isometric, and one has(ST )′ = T ′S′ for T ∈L (E;F) and S ∈L (F ;G).

The space E ′′ is called the bi-dual of E. The mapping

E→ E ′′, x 7→ 〈x, ·〉=(x′ 7→

⟨x,x′⟩)

is called the canonical embedding. The canonical embedding is a linear isometry(by (C.2)), and hence one can consider E to be a subspace of E ′′. Given T ∈L (E;F)one has T ′′|E = T . If the canonical embedding is surjective, in sloppy notation:E = E ′′, the space E is called reflexive.

C.5 The Weak∗ Topology

Let E be a Banach space and E ′ its dual. The coarsest topology on E ′ that makes allmappings

x′ 7→⟨x,x′⟩

(x ∈ E)

continuous, is called the weak∗ topology (or: σ(E ′,E)-topology) on E ′ (cf. Ap-pendix A.4). A fundamental system of open (and convex) neighbourhoods for apoint y′ ∈ E ′ is given by the sets

x′ ∈ E ′ : max1≤ j≤d

∣∣⟨x j,x′− y′⟩∣∣< ε

(d ∈ N, x1, . . . ,xd ∈ E, ε > 0).

Since the weak∗ topology is the projective topology with respect to the functions(〈x, ·〉)x∈E , a net (x′α)α ⊆ E ′ converges weakly∗ (that is, in the weak∗ topology) tox′ ∈ E ′ if and only if 〈x,x′α〉 → 〈x,x′〉 for every x′ ∈ E ′ (cf. Theorem A.14).

Theorem C.4 (Banach–Alaoglu). Let E be a Banach space. Then the dual unitball BE ′ = x′ ∈ E ′ : ‖x′‖ ≤ 1 is weakly∗ compact.

The proof is a more or less straightforward application of Tychonov’s Theorem A.5,see [Rudin (1991), Thm. 3.15]. Since the weak∗ topology is usually not metris-able, one cannot in general test continuity of mappings or compactness of setsvia criteria using sequences. Therefore, the following theorem is often useful, see[Dunford and Schwartz (1958), V.5.1].

Theorem C.5. Let E be a Banach space. Then the weak∗ topology on the dual unitball BE ′ is metrisable if and only if E is separable.

C.6 The Weak Topology 325

Recall that E can be considered (via the canonical embedding) as a norm closedsubspace of E ′′. The next theorem shows in particular that E is σ(E ′′,E ′)-dense inE ′′, see [Dunford and Schwartz (1958), V.4.5].

Theorem C.6 (Goldstine). Let E be a Banach space. Then the closed unit ball BEof E is σ(E ′′,E ′)-dense in the closed unit ball of E ′′.

C.6 The Weak Topology

Let E be a Banach space and E ′ its dual. The coarsest topology on E that makes allmappings

x 7→⟨x,x′⟩

(x′ ∈ E ′)

continuous is called the weak topology (or: σ(E,E ′)-topology) on E (cf. AppendixA.4). A fundamental system of open (and convex) neighbourhoods of the point y∈Eis given by the sets

x ∈ E : max1≤ j≤d

∣∣⟨x− y,x′j⟩∣∣< ε

(d ∈ N, x′1, . . . ,x

′d ∈ E ′, ε > 0).

Since the weak topology is the projective topology with respect to the functions(〈·,x′〉)x′∈E ′ , a net (xα)α ⊆ E converges weakly (that is, in the weak topology) tox ∈ E if and only if 〈xα ,x′〉 → 〈x,x′〉 for every x′ ∈ E ′ (Theorem A.14). When weview E ⊆ E ′′ via the canonical embedding as a subspace and endow E ′′ with theweak∗ topology, then the weak topology on E coincides with the subspace topology.If A⊆ E is a subset of E then its closure in the weak topology is denoted by clσ A.

Theorem C.7 (Mazur). Let E be a Banach space and let A ⊆ E be convex. Thenthe norm closure of A coincides with its weak closure.

See [Rudin (1991), Thm. 3.12] or [Dunford and Schwartz (1958), V.3.18].

The weak topology is not metrisable in general. Fortunately, there is a series ofdeep results on metrisability and compactness for the weak topology, facilitating itsuse. The first is the analogue of Theorem C.5, see [Dunford and Schwartz (1958),V.5.2].

Theorem C.8. The weak topology on the closed unit ball of a Banach space E ismetrisable if and only if the dual space E ′ is norm separable.

The next theorem allows using sequences in testing of weak compactness, seeAppendix E, Theorem E.17 below, or [Dunford and Schwartz (1958), V.6.1] or themore recent [Vogt (2010)].

Theorem C.9 (Eberlein–Smulian). Let E be a Banach space. Then A⊆ E is (rel-atively) weakly compact if and only if A is (relatively) weakly sequentially compact.


Although the weak topology is usually far from being metrisable, thefollowing holds, see the proof of Theorem 16.19 in the main text or[Dunford and Schwartz (1958), V.6.3].

Theorem C.10. The weak topology on a weakly compact subset of a separable Ba-nach space is metrisable.

The next result is often useful when considering Banach-space valued (weak)integrals.

Theorem C.11 (Kreın–Smulian). The closed convex hull of a weakly compact sub-set of a Banach space is weakly compact.

For the definition of the closed convex hull see Section C.7 below. For a proof seeAppendix E, Theorem E.7 below or [Dunford and Schwartz (1958), V.6.4].

Finally we state a characterisation of reflexivity, which is a mere combination ofthe Banach–Alaoglu theorem and Goldstine’s theorem.

Theorem C.12. A Banach space is reflexive if and only if its closed unit ball isweakly compact.

For a proof see [Dunford and Schwartz (1958), V.4.7]. In particular, every boundedset is relatively weakly compact if and only if the Banach space is reflexive.

C.7 Convex Sets and their Extreme Points

A subset A⊆ E of a real vector space E is called convex if, whenever a,b ∈ A thenalso ta+(1− t)b∈ A for all t ∈ [0,1]. Geometrically this means that the straight linebetween a and b is contained in A whenever a and b are. Inductively one shows thata convex set contains arbitrary convex combinations

t1a1 + · · ·+ tnan (0≤ t1, . . . , tn ≤ 1, t1 + · · ·+ tn = 1)

whenever a1, . . . ,an ∈ A. Intersecting convex sets yields a convex set, and so for anyset B⊆ E there is a smallest convex set containing B, its convex hull

conv(B) =⋂

A : B⊆ A, A convex.

Alternatively, conv(B) is the set of all convex combinations of elements of B.

A vector space E endowed with a topology such that the operations of additionE×E → E and scalar multiplication K×E → E are continuous, is called a topo-logical vector space. In a topological vector space, the closure of a convex set isconvex. We denote by

conv(B) := conv(B)

C.8 The Strong and the Weak Operator Topology on L (E) 327

the closed convex hull of B⊆ E, which is the smallest closed convex set containingB.

A topological vector space is called locally convex if it is Hausdorff and thetopology has a base consisting of convex sets. Examples are the norm topology of aBanach space E, the weak topology on E and the weak∗ topology on E ′.

Let A ⊆ E be a convex set. A point p ∈ A is called an extreme point of A ifit cannot be written as a convex combinations of two points of A distinct from p.In other words, whenever a,b ∈ A and t ∈ [0,1] such that p = ta+(1− t)b, thena = b = p. Let us denote the set of extreme points of a convex set A with exA.

Theorem C.13 (Kreın–Milman). Let E be a locally convex space and let /0 6= K ⊆E be compact and convex. Then K is the closed convex hull of its extreme points:K = conv(exK). In particular, exK is not empty.

For a proof see [Rudin (1991), Thm. 3.23]. Since for a Banach space E the weak∗

topology on E ′ is locally convex, the Kreın–Milman theorem applies in particularto weakly∗ compact convex subsets of E ′.

Theorem C.14 (Milman). Let E be a locally convex space and let /0 6= K ⊆ E becompact. If convK is also compact, then K contains the extreme points of convK.

A proof is in [Rudin (1991), Thm. 3.25]. Combining Milman’s result with theKreın–Smulian Theorem C.11 yields the following.

Corollary C.15. Let E be a Banach space and let K ⊆ E be weakly compact. ThenK contains the extreme points of convK.

C.8 The Strong and the Weak Operator Topology on L (E)

Let E be a Banach space. Besides the norm topology there are two other canonicaltopologies on L (E). The strong operator topology is the coarsest topology suchthat all evaluation mappings

L (E)→ E, T 7→ T x (x ∈ E)

are continuous. Equivalently, it is the projective topology with respect to the family(T 7→ T x)x∈E . A fundamental system of open (and convex) neighbourhoods of T ∈L (E) is given by the sets

S ∈L (E) : max1≤ j≤d

∥∥T x j−Sx j∥∥< ε

(d ∈ N, x1, . . . ,xd ∈ E, ε > 0).

A net of operators (Tα)α ⊆L (E) converges strongly (i.e., in the strong operatortopology) to T if and only if

Tα x→ T x for every x ∈ E.


We denote by L s(E) the space L (E) endowed with the strong operator topology,and the closure of a subset A⊆L (E) in this topology is denoted by cls A. The strongoperator topology is metrisable on (norm-)bounded sets if E is separable.

The weak operator topology is the coarsest topology such that all evaluationmappings

L (E)→K, T 7→⟨T x,x′

⟩(x ∈ E, x′ ∈ E ′)

are continuous. Equivalently, it is the projective topology with respect to the family(T 7→ 〈T x,x′〉)x∈E,x′∈E ′ . A fundamental system of open (and convex) neighbour-hoods of T ∈L (E) is given by the sets

S ∈L (E) : max1≤ j≤d

∣∣⟨(T −S)x j,x′j⟩∣∣< ε

(d ∈ N, x1, . . . ,xd ∈ E, x′1, . . . ,x

′d ∈ E ′, ε > 0).

A net of operators (Tα)α converges to T in the weak operator topology if and onlyif ⟨

Tα x,x′⟩→⟨T x,x′

⟩for every x ∈ E, x′ ∈ E ′.

The space L (E) considered with the weak operator topology is denoted by L σ (E),and the closure of a subset A⊆L (E) in this topology is denoted by clσ A. The weakoperator topology is metrisable on norm-bounded sets if E ′ is separable.

The following is a consequence of the Uniform Boundedness Principle.

Proposition C.16. Let E be a Banach space and let T ⊆L (E). Then the followingassertions are equivalent.

(i) T is bounded for the weak operator topology, i.e., supT∈T |〈T x,x′〉|< ∞ forall x ∈ E and x′ ∈ E ′.

(ii) T is bounded for the strong operator topology, i.e., supT∈T ‖T x‖ < ∞ forall x ∈ E.

(iii) T is uniformly bounded, i.e., supT∈T ‖T‖< ∞.

The following simple property of strong operator convergence is very useful.

Proposition C.17. For a uniformly bounded net (Tα)α ⊆L (E) and T ∈L (E) thefollowing assertions are equivalent.

(i) Tα → T in L s(E).

(ii) Tα x→ T x for all x from a (norm-)dense set D⊆ E.

(iii) Tα x→ T x uniformly in x from (norm-)compact subsets of E.

We now consider continuity of the multiplication

(S,T ) 7→ ST (S,T ∈L (E))

for the operator topologies considered above.

C.9 Spectral Theory 329

Proposition C.18. The multiplication on L (E) is

a) jointly continuous for the norm topology in L (E);

b) jointly continuous on bounded sets for the strong operator topology;

c) separately continuous for the weak and strong operator topology.

The multiplication is in general not jointly continuous for the weak operator topol-ogy. Indeed, if T is the right shift on `2(Z) given by T ((xk)k∈Z) := (xk−1)k∈Z.Then (T n)n converges to 0 weakly, and the same holds for the left shift T−1. But(T T−1)n = I does not converge to 0.

C.9 Spectral Theory

Let E be an at least one-dimensional complex Banach space and T ∈L (E). Theresolvent set ρ(T ) of T is the set of all λ ∈ C such that the operator λ I− T isinvertible. The function

ρ(T )→L (E), λ 7→ R(λ ,T ) := (λ I−T )−1

is called the resolvent of T . Its complement σ(T ) := C \ρ(T ) is called the spec-trum of T . The resolvent set is an open subset of C, and given λ0 ∈ ρ(T ) one has

R(λ ,A) = ∑∞

n=0(λ0−λ )nR(λ0,T )n+1 for |λ −λ0|< ‖R(λ0,T )‖−1.

In particular, dist(λ0,σ(T ))≥ ‖R(λ0,T )‖−1, showing that the norm of the resolventblows up when λ0 approaches a spectral point.

Every λ ∈ C with |λ | > ‖T‖ is contained in ρ(T ), and in this case R(λ ,T ) isgiven by the Neumann series

R(λ ,T ) := ∑∞

n=0 λ−(n+1)T n.

In particular, one has r(T )≤ ‖T‖, where

r(T ) := sup|λ | : λ ∈ σ(T )

is the spectral radius of T . An important result states that the spectrum is alwaysa nonempty compact subset of C and the spectral radius can be computed by theformula

r(T ) = limn→∞‖T n‖1/n = inf

n∈N‖T n‖1/n.

(See [Rudin (1991), Thm 10.13] for a proof.) An operator is called power-boundedif supn ‖T n‖< ∞. From the spectral radius formula it follows that the spectrum of apower-bounded operator is contained in the closed unit disc. For isometries one hasmore precise information.


Proposition C.19. If E is a complex Banach space and T ∈L (E) is an isometry,then exactly one of the following two cases holds.

1) T is not surjective and σ(T ) = z : |z| ≤ 1 is the full closed unit disc.

2) T is an isomorphism and σ(T )⊆ T.

An important part of the spectrum is the point spectrum

σp(T ) :=

λ ∈ C : (λ I−T ) is not injective.

Each λ ∈ σp(T ) is called an eigenvalue of T . For λ ∈ σp(T ) the closed subspace

ker(λ I−T )

is called the corresponding eigenspace, and every nonzero element 0 6= x ∈ker(λ I− T ) is a corresponding eigenvector. An eigenvalue λ is called simple ifits eigenspace is one-dimensional.

C.10 Hilbert Spaces

An inner product on a complex vector space E is a mapping ( · | ·) : E×E→C thatis sequilinear, i.e., satisfies

( f +λg |h) = ( f |h)+λ (g |h) (C.3)

(h | f +λg) = (h | f )+λ (h |g) ( f ,g,h ∈ E, λ ∈ C), (C.4)

and symmetric, i.e., satisfies

( f |g) = (g | f ) ( f ,g ∈ E)

and positive definite, i.e., satisfies

0 6= f ⇒ ( f | f )> 0.

The norm associated with an inner product on E is

‖ f‖ :=√

( f | f ). (C.5)

For inner product and norm the the following elementary identities hold:

a) ‖ f +g‖2 = ‖ f‖2 +‖g‖2 +2Re( f |g);b) 4Re( f |g) = ‖ f +g‖2−‖ f −g‖2;

c) 2‖ f‖2 +2‖g‖2 = ‖ f +g‖2 +‖ f −g‖2.

Here b) is called the polarisation identity and a) is called the parallelogram iden-tity.

C.10 Hilbert Spaces 331

In an inner product space two vectors f ,g are called orthogonal, in symbols:f ⊥ g, if ( f |g) = 0. For a set A⊆ E one defines

A⊥ :=

f ∈ E : f ⊥ g ∀ g ∈ A.

Two sets A,B⊆ E are called orthogonal, in symbols A⊥ B if f ⊥ g for all f ∈ A,g∈B. Property a) above accounts for Pythagoras’ lemma

f ⊥ g ⇐⇒ ‖ f +g‖2 = ‖ f‖2 +‖g‖2 .

That (C.5) indeed defines a norm on E is due to the Cauchy–Schwarz inequality

|( f |g)| ≤ ‖ f‖ ‖g‖ ( f ,g ∈ E).

A inner product space is a vector space E together with an inner product on it. Aninner product space (E,( · | ·)) is a Hilbert space if E is a Banach space with respectto the norm (C.5).

The following result is fundamental in the theory of Hilbert spaces. For a proofsee [Rudin (1991), Theorem 4.12] or [Conway (1990), Theorem I.3.4].

Theorem C.20 (Riesz–Frechet). Let H be a Hilbert space and let ψ : H → C bea bounded linear functional on H. Then there is a unique g ∈ H such that ψ( f ) =( f |g) for all f ∈ H.

An important consequence of the Riesz–Frechet theorem is that a net ( fα)α in Hconverges weakly to a vector f ∈ H if and only if ( fα − f |g)→ 0 for each g ∈ H.

Corollary C.21. Each Hilbert space is reflexive. In particular, the closed unit ballof a Hilbert space is weakly compact.

Also the following result is based on the Riesz–Frechet theorem.

Corollary C.22. Let H,K be Hilbert spaces and let b : H×K→C be a sesquilinearmap (see (C.3),(C.4)) such that there is M ≥ 0 with

|b( f ,g)| ≤M ‖ f‖ ‖g‖ ( f ∈ H,g ∈ K).

Then there is a unique linear operator S : H→ K with

b( f ,g) = (S f |g) ( f ∈ H,g ∈ K).

For a given bounded linear operator T : K → H there is hence a unique linearoperator T ∗ : H→ K such that

(T ∗ f |g)K = ( f |T g)H ( f ∈ H,g ∈ K).

The operator T ∗ is called the (Hilbert space) adjoint of T .

An operator T ∈L (H) is called self-adjoint if T ∗ = T . Self-adjoint operatorshave a nice spectral theory, in particular if they are compact.


Theorem C.23 (Spectral Theorem). Let T be a compact self-adjoint operator on aHilbert space H. Then H has an orthonormal basis consisting of eigenvectors of T ,and every eigenspace corresponding to a nonzero eigenvector is finite-dimensional.

More precisely, A is the norm convergent sum

A = ∑λ∈ΛλPλ (C.6)

where Λ ⊆C\0 is a finite or countable set, for each λ ∈Λ the eigenspace Eλ :=ker(λ I−A) satisfies 0 < dimEλ < ∞ and Pλ is the orthogonal projection onto Eλ .

It is easy to see that the eigenspaces Eλ are pairwise orthogonal. A proof of theSpectral Theorem can be found in [Conway (1990), II.5.1].

Appendix DThe Riesz Representation Theorem

In this appendix we give a proof for the Riesz Representation Theorem 5.7. In thewhole section, K denotes a nonempty compact topological space and Ba(K) andBo(K) denote the σ -algebra of Baire and Borel sets of K, respectively, i.e.,

Ba(K) = σ[ f > 0 ] : 0≤ f ∈ C(K)

and Bo(K) = σ

O : O⊆ K open

.

Moreover, M(K) denotes the space of complex Baire measures on K. It will eventu-ally become clear that each such Baire measure has a unique extension to a regularBorel measure.

Each µ ∈M(K) induces a linear functional

dµ : C(K)→ C, dµ( f ) := 〈 f ,µ〉 :=∫

Kf dµ.

This functional is bounded (by ‖µ‖M) since

|〈 f ,µ〉| ≤∫

K| f | d|µ| ≤ ‖ f‖

∞|µ|(K) = ‖ f‖

∞‖µ‖M

for each f ∈ C(K). In this way we obtain a linear and contractive map

d : M(K)→ C(K)′, µ 7→ dµ = 〈·,µ〉 (D.1)

of M(K) into the dual space of C(K). The statement of the Riesz representationtheorem is that this map is an isometric isomorphism.

D.1 Uniqueness

Let us introduce the space

BM(K) :=

f : K→ C : f is bounded and Baire measurable.

333

334 D The Riesz Representation Theorem

This is a commutative C∗-algebra with respect to the sup-norm. Moreover, it isclosed under the so-called bp-convergence. We say that a sequence ( fn)n∈N of func-tions on K converges boundedly and pointwise (or: bp-converges) to a function fif

(1) supn∈N‖ fn‖∞

< ∞ and (2) fn(x)→ f (x) for each x ∈ K.

The following result is very useful in extending results from continuous to generalBaire measurable functions.

Theorem D.1. Let K be a compact topological space, and let E ⊆BM(K) such that(1) C(K)⊆ E and (2) E is closed under bp-convergence. Then E = BM(K).

Proof. By taking the intersection of all subsets E of BM(K) with the properties (1)and (2), we may suppose that E is the smallest one, i.e., is contained in each of them.We show first that E is a linear subspace of BM(K). For f ∈ BM(K) we define

E f :=

g ∈ E : f +g ∈ E.

If f ∈ C(K), then E f has the properties (1) and (2), whence E ⊆ E f . This meansthat C(K)+E ⊆ E. In particular, it follows that E f has properties (1) and (2) evenif f ∈ E. Hence E ⊆ E f for each f ∈ E, whence E +E ⊆ E. For λ ∈ C the set

g ∈ E : λg ∈ E

has properties (1) and (2), whence λE ⊆ E. This establishes that E is a linear sub-space of BM(K).

In the same way one can show that if f ,g∈ E, then | f | ∈ E and f g∈ E. It followsin view of condition (2) that the set

F :=

A ∈ Ba(K) : 1A ∈ E

is a σ -algebra. If 0≤ f ∈C(K) then n f ∧ 1 1[ f>0 ] pointwise. Hence [ f > 0 ]∈ Σ ,and therefore Σ =Ba(K). By standard measure theory, see Lemma B.7, lin1A : A∈Ba(K) is sup-norm dense in BM(K), so by (2) we obtain E = BM(K).

As a consequence we recover Lemma 5.5.

Corollary D.2 (Uniqueness). Let K be a compact topological space. If µ,ν arecomplex Baire measures on K such that∫

Kf dµ =

∫K

f dν for all f ∈ C(K),

then µ = ν .

Proof. a) Define E := f ∈ BM(K) :∫

f dµ =∫

f dν. By dominated convergence,E satisfies (1) and (2) in Theorem D.1, hence E = BM(K).

D.2 The Space C(K)′ as a Banach Lattice 335

To prove that the mapping d defined in (D.1) is isometric, we need a refinedapproximation result.

Lemma D.3. Let 0≤ µ ∈M(K). Then C(K) is dense in Lp(K,µ) for each 1≤ p <∞. More precisely, for f ∈ L∞(K,µ) there is a sequence of continuous functionsfn ∈ C(K) such that fn→ f µ-a.e. and | fn| ≤ ‖ f‖L∞ for all n ∈ N.

Proof. Let 1≤ p < ∞ and let Y be the Lp-closure of C(K) with respect to µ . DefineE :=BM(K)∩Y . Then E satisfies (1) and (2) of Theorem D.1, whence E =BM(K).But this implies that L∞(K,µ)⊆ Y , whence Y = Lp(K,µ).

For the second assertion, let 0 6= f ∈ L∞(K,µ). Then, by the part already proved,there is a sequence (gn)n∈N of continuous functions such that gn→ f in L1. Passingto a subsequence we may suppose that gn→ f µ-almost everywhere. Now define

fn :=

‖ f‖

∞gn/ |gn| on [ |gn| ≥ ‖ f‖

∞],

gn on [ |gn| ≤ ‖ f‖∞].

Then fn is continuous, | fn| ≤ ‖ f‖∞

, and fn→ f µ-almost everywhere.

Now the claimed isometric property of the mapping d is in reach.

Corollary D.4. The map d : M(K)→ C(K)′ is isometric.

Proof. Fix µ ∈M(K). The functional dµ is bounded by 1 with respect to the normof L1(K, |µ|):

|〈 f ,µ〉| ≤ 〈| f | , |µ|〉= ‖ f‖L1(|µ|)

for f ∈ C(K). Hence there is g ∈ L∞(K, |µ|) such that

〈 f ,µ〉=∫

Kf gd|µ| for all f ∈ C(K).

Let h := g/ |g| on the set where g 6= 0, and h = 0 elsewhere. By Lemma D.3 fromabove there is a sequence of continuous functions fn ∈ C(K) with ‖ fn‖∞

≤ 1 andfn→ h in L1(K, |µ|). But then

〈 fn,µ〉=∫

Kfngd|µ| →

∫K

hgd|µ|= |µ|(K) = ‖µ‖M .

This implies that ‖µ‖M ≤ ‖dµ‖C(K)′ , whence d is isometric.

D.2 The Space C(K)′ as a Banach Lattice

In this section we reduce the problem of proving the surjectivity of the mapping dfrom (D.1), i.e., the representation problem, to positive functionals. We do this byshowing that each bounded linear functional on C(K) can be written as a linear com-bination of positive functionals. Actually we shall show more: C(K)′ is a complex


Banach lattice. (See Definition 7.2 for the terminology and recall from Example7.3.2 that C(K) is a complex Banach lattice.)

Let us recall that a linear functional λ : C(K)→C is called positive, written λ ≥ 0,if

λ ( f )≥ 0 for all 0≤ f ∈ C(K).

This means that λ is a positive operator in the sense of Section 7.1.

Lemma D.5. Each positive linear functional λ on C(K) is bounded with norm‖λ‖= λ (1).

Proof. This is actually a general fact from Banach lattice theory, see Lemma 7.5,but we give a direct proof here. Let f ∈C(K) and determine c∈T such that cλ ( f ) =|λ ( f )|. Then

|λ ( f )|= Re(cλ ( f )) = Reλ (c f ) = λ (Rec f )≤ λ | f | ≤ λ (1) ‖ f‖∞.

Inserting f = 1 concludes the proof.

Our aim is to show that the notion of positivity from the above turns C(K)′ into aBanach lattice. In order to to decompose C(K)′ naturally into a real and an imaginarypart, for λ ∈ C(K)′ we define λ ∗ ∈ C(K) by

λ∗( f ) = λ ( f ) ( f ∈ C(K))

and subsequently

Reλ :=12(λ +λ

∗) and Imλ :=12i(λ −λ

∗).

A linear functional λ ∈ C(K)′ is called real if λ = λ ∗, and

C(K)′R :=

λ ∈ C(K)′ : λ = λ∗

is the R-linear subspace of real functionals. Then we have

λ ∈ C(K)′R ⇔ λ ( f ) ∈ R for all 0≤ f ∈ C(K).

Moreover, λ = Reλ + i Imλ for each λ ∈ C(K)′, and hence

C(K)′ = C(K)′R⊕ iC(K)′R

is the desired decomposition. Clearly, each positive linear functional is real, and by

λ ≤ µDef.⇔ λ ( f )≤ µ( f ) for each 0≤ f ∈ C(K)

a partial order is defined in C(K)′R turning it into a real ordered vector space.In the next step we construct the modulus mapping, and for the we need a tech-

nical tool.


Theorem D.6 (Partition of Unity). Let K be a compact space, and let O1, . . . ,On⊆K be open subsets such that

K = O1∪·· ·∪On.

Then there are functions 0≤ f j ∈ C(K), j = 1, . . . ,n, with

∑nj=1 f j = 1 and supp( f j)⊆ O j for all j = 1, . . . ,n. (D.2)

Proof. We first construct compact sets A j ⊆ O j for j = 1, . . . ,n with

K =n⋃

j=1

A j. (D.3)

By hypothesis, for each x ∈ K there is j(x) ∈ 1, . . . ,n such that x ∈ O j(x). Sincesingletons are closed, Lemma 4.1 yields a closed set Bx and an open set Ux such that

x ∈Ux ⊆ Bx ⊆ O j(x) (x ∈ K).

By compactness, there is a finite set F ⊆ K such that K ⊆⋃

x∈F Ux. Define

A j :=⋃

Ax : x ∈ F, j(x) = j

( j = 1, . . . ,n).

Then every A j is closed, A j ⊆ O j and (D.3) holds as required.

In the second step we apply Urysohn’s Lemma 4.2 to obtain continuous functionsg j ∈ C(K) such that

1A j ≤ g j ≤ 1O j and supp(g j)⊆ O j ( j = 1, . . . ,n).

Now note that ∑nk=1 gk ≥ ∑

nk=1 1Ak ≥ 1, whence the well-defined continuous func-

tionsf j :=

g j

∑nk=1 gk

( j = 1, . . . ,n)

are positive and satisfy ∑nj=1 f j = 1. Moreover, supp( f j)⊆ supp(g j)⊆ O j for each

j = 1, . . . ,n, as required.

The sequence ( f j)nj=1 yielded by the theorem above are called partition of unity

subordinate to the covering O1, . . . ,On.

For 0≤ f ∈ C(K) we consider the “order-ball”

B f :=

g ∈ C(K) : |g| ≤ f

around 0 with “radius” f . Based on the partitions of unity, we can prove the follow-ing result.

Lemma D.7. For a compact space K the following assertions hold.


a) Given 0≤ f ∈ C(K), the set of functions g = ∑nj=1 α j f j with

n ∈ N, 0≤ f j ∈ C(K), α j ∈ T ( j = 1, . . . ,n), ∑nj=1 f j ≤ f

is dense in B f .

b) Given 0≤ f1, f2 . . . , fn ∈ C(K), the set of functions g = ∑nj=1 g j with

g j ∈ C(K),∣∣g j∣∣≤ f j ( j = 1, . . . ,n)

is dense in B f1+···+ fn .

Proof. a) Let |g| ≤ f and ε > 0. For each x∈K let αx ∈T such that αx |g(x)|= g(x).By compactness of K there are finitely many points x1, . . . ,xn ∈K such that the opensets

U j :=[∣∣g−αx j |g|

∣∣< ε]

( j = 1, . . . ,n)

cover K. Let (ψ j)nj=1 be a subordinate partition of unity and define f j := |g|ψ j ≥ 0.

Then∑

nj=1 f j = |g| ≤ f and

∣∣∣g−∑nj=1 αx j f j

∣∣∣≤ ε.

b) Fix |g| ≤ f1 + · · ·+ fn and ε > 0. The open sets

U1 :=[ |g|> 0 ] and U2 :=[ |g|< ε ]

cover K. Take a subordinate partition of unity ψ and 1−ψ and define

g j :=

ψg f j

f1+···+ fnon supp(ψ),

0 on [ψ = 0 ]

for j = 1, . . . ,n. Then g1 +g2 + · · ·+gn = ψ2g,∣∣g j∣∣≤ f j on K and g j ∈ C(K). The

proof is concluded by ∣∣∣g−∑nj=1 g j

∣∣∣= |(1−ψ)| |g| ≤ ε1.

Finally, we can define the modulus of a functional λ ∈ C(K)′ by

|λ |( f ) := sup|λ (g)| : g ∈ C(K), |g| ≤ f

= sup

g∈B f

|λ (g)| (D.4)

for 0 ≤ f ∈ C(K). This is a finite number since we have |λ |( f ) ≤ ‖λ‖‖ f‖∞< ∞.

On a moments reflection, we also notice that

‖λ‖= |λ |(1) (λ ∈ C(K)′). (D.5)

The following is the key step.

Lemma D.8. For λ ∈ C(K)′, the mapping |λ | defined by (D.4) on the positive coneon C(K) extends (uniquely) to a bounded linear functional on C(K).


Proof. The positive homogeneity of |λ | is obvious from the definition. We turn toprove additivity of |λ |. Take f1, f2 ≥ 0 and |g1| ≤ f1 and |g2| ≤ f2. Then for certainc1,c2 ∈ T

|λ (g1)|+ |λ (g2)|= |c1λ (g1)+ c2λ (g2)|= |λ (c1g1 + c2g2)| ≤ |λ |( f1 + f2)

since |c1g1 + c2g2| ≤ f1 + f2. By varying g1,g2 we obtain

|λ | f1 + |λ | f2 ≤ |λ |( f1 + f2).

For the converse inequality take |g| ≤ f1 + f2 and ε > 0. By Lemma D.7 we find|g1| ≤ f1 and |g2| ≤ f2 such that ‖g− (g1 +g2)‖∞

≤ ε . This yields

|λ (g)| ≤ |λ (g1 +g2)|+ ε ‖µ‖ ≤ |λ | f1 + |λ | f2 + ε ‖λ‖ .

Letting ε 0 and varying g yields |λ |( f1 + f2)≤ |λ | f1 + |λ | f2 as desired.

Finally, we extend |λ | first to all of C(K;R) by

|λ |( f ) := |λ |( f+)−|λ |( f−) ( f ∈ C(K;R)),

and then to C(K) by

|λ |( f ) := |λ |(Re f )+ i |λ |(Im f ) ( f ∈ C(K)).

To prove that in this way |λ | becomes a linear mapping on all of C(K) is tedious butstraighforward, and we omit the proof.

We note that by definition

|λ ( f )| ≤ |λ | | f | for all f ∈ C(K).

It follows that if λ is a real functional, then λ ≤ |λ |. Hence λ = λ − (|λ |−λ ) is adifference of positive functionals.

Corollary D.9. Every bounded real functional on C(K) is the difference of two pos-itive linear functionals. The positive linear functionals generate C(K)′ as a vectorspace.

In order to complete the proof that C(K)′ is a Banach lattice we note that by (D.5)we have

|λ | ≤ |µ| ⇒ ‖λ‖ ≤ ‖µ‖ ,

and hence the norm on C(K)′ is a lattice norm. The next result connects the modulusand the real lattice structure, characteristic for complex vector lattices, see Definition7.2 and [Schaefer (1974), Definition II.11.3].

Proposition D.10. For λ ∈ C(K)′ one has |λ | = supc∈T Re(cλ ) in the ordering ofC(K)′R.


Proof. Fix c ∈ T and f ≥ 0. Then

[Re(cλ )] f =12(cλ + cλ

∗)( f ) =12(cλ ( f )+ cλ ( f )

)= Re[cλ ( f )]≤ |λ ( f )| ≤ |λ | f .

On the other hand, suppose that ν satisfies Re(cλ ) ≤ ν for all c ∈ T. Let g =

∑nj=1 α j f j ∈ B f be as in Lemma D.7, and let c j ∈ T be such that c jλ ( f j) =

∣∣λ ( f j)∣∣

for each j. Then

|λ (g)| ≤ ∑nj=1

∣∣α j∣∣ ∣∣λ ( f j)

∣∣= ∑nj=1

∣∣λ ( f j)∣∣

= ∑nj=1

(Re(c jλ )

)( f j)≤ ∑

nj=1 ν( f j)≤ ν( f ).

By Lemma D.7 we obtain that |λ (g)| ≤ ν( f ) for all g∈ B f , and hence |λ | ≤ ν . Thiscompletes the proof.

D.3 Representation of Positive Functionals

By Corollary D.9, to complete the proof of the Riesz representation theorem it suf-fices to establish the following result.

Theorem D.11 (Riesz–Markov). Let λ be a positive linear functional on the Ba-nach space C(K). Then λ is bounded and there is a unique positive regular Borelmeasure µ ∈M(K) such that

λ ( f ) =∫

Kf dµ for all f ∈ C(K).

Our presentation is a modification of the treatments in [Rudin (1987), Chapter 2]and [Lang (1993), Chapter IX].

Let K be a compact topological space, and let λ : C(K)→ C be a positive lin-ear functional. We shall construct a positive regular Borel measure µ ∈M(K) withλ = 〈·,dµ〉. Such representing measure is necessarily unique: If µ,ν are both reg-ular Borel measures representing λ , then there Baire restrictions must coincide byCorollary D.2, and then they must be equal by Proposition 5.6.

The program for the construction of the measure µ is now as follows. First, weconstruct an outer measure µ∗. Next, we show that every open set is µ∗-measurablein the sense of Caratheodory. This yields the measure µ on the Borel sets. Then weshow that µ is regular and, finally, that λ is given by integration against µ .

For an open set O⊆ K we define

µ∗(O) := sup

λ ( f ) : f ∈ C(K), 0≤ f ≤ 1, supp( f )⊆ O

.

Then µ∗( /0) = 0, µ∗ is monotone, and µ∗( /0)≤ µ∗(K) = λ (1).

D.3 Representation of Positive Functionals 341

Lemma D.12. The map µ∗ is σ -subadditive on open sets.

Proof. Suppose that O =⋃

k∈N Ok, and take 0≤ g ∈ C(K) such that 0≤ g≤ 1 andsupp(g)⊆O. Then the compact set supp(g) is covered by the collection of open setsOk, and hence there is n ∈ N such that

supp(g)⊆n⋃

j=1

O j.

Take a partition of unity f0, . . . , fn on K subordinate to the cover O0 := K \ supp(g),O1, . . . ,On, see Theorem D.6. Then each g j := f · f j has support within O j and

g = g ·∑mj=0 f j = ∑

mj=1 g j.

It follows that

λ (g) = ∑mj=1 λ (g j)≤ ∑

mj=1 µ

∗(O j)≤ ∑∞

j=1 µ∗(O j).

Taking the supremum on the left-hand side yields µ∗(O) ≤ ∑

∞

j=1 µ∗(O j) as

claimed.

We now extend the definition of µ∗ to all subsets A⊆ K by

µ∗(A) := inf

µ∗(O) : A⊆ O⊆ K, O open

. (D.6)

Recall from Appendix B.3 the notion of an outer measure.

Lemma D.13. The set function µ∗ : P(K)→ [0,λ (1)] is an outer measure.

Proof. By Lemma D.12, µ∗ coincides with the Hahn-extension to P(K) of its re-striction to open sets, see Appendix B.4. Hence it is an outer measure.

By Caratheodory’s Theorem B.3, the outer measure µ∗ is a measure on the σ -algebra

M(µ∗) :=

A⊆ K : ∀ H ⊆ K : µ∗(H)≥ µ

∗(H ∩A)+µ∗(H \A)

of “µ∗-measurable” sets. Our next aim is to show that each open set is µ∗-measurable. We need the following auxiliary result.

Lemma D.14. If A,B⊆ K such that A∩B = /0, then µ∗(A∪B) = µ∗(A)+µ∗(B).

Proof. Fix ε > 0 and take an open set O of K such that

µ∗(O)≤ µ

∗(A∪B)+ ε.

Then take, by Lemma 4.1, open sets U,V of K such that

A⊆U, B⊆V, U ∩V = /0.


Now take continuous functions 0 ≤ f ,g ≤ 1 such that supp( f ) ⊆ U ∩ O andsupp(g)⊆V ∩O. Then 0≤ f +g≤ 1 and

supp( f +g)⊆ supp( f )∪ supp(g)⊆ (U ∩O)∪ (V ∩O)⊆ O.

Hence λ ( f )+λ (g) = λ ( f +g) ≤ µ∗(O) ≤ µ∗(A∪B)+ ε . By varying f and g weconclude that

µ∗(A)+µ

∗(B)≤ µ∗(U ∩O)+µ

∗(V ∩O)≤ µ∗(A∪B)+ ε,

and this finishes that proof.

The next result says that µ∗ is regular on open sets.

Lemma D.15. One has µ∗(O) = supµ∗(L) : L⊆O, L compact for each open setO⊆ K.

Proof. Fix t < µ∗(O) and f ∈ C(K) such that 0 ≤ f ≤ 1, L := supp( f ) ⊆ O andt < λ ( f ). Then for any open set U ⊇ L we have supp( f ) ⊆U , and hence λ ( f ) ≤µ∗(U). As U was arbitrary, λ ( f )≤ µ∗(L), and this concludes the proof.

We have all tools in our hands to take the next step.

Lemma D.16. Every open set is µ∗-measurable.

Proof. Let O be an open set, and let H ⊆ K be an arbitrary subset. We need to showthat

µ∗(H)≥ µ

∗(H ∩O)+µ∗(H \O). (D.7)

Fix an open set U ⊇ H and a compact set L ⊆U ∩O. Then H \O ⊆ Oc, which isdisjoint from L. Hence by Lemma D.14 we obtain

µ∗(U)≥ µ

∗(L∪ (H \O)) = µ∗(L)+µ

∗(H \O).

By Lemma D.15 it follows that

µ∗(U)≥ µ

∗(U ∩O)+µ∗(H \O)≥ µ

∗(H ∩O)+µ∗(H \O).

By taking the infimum with respect to U ⊇ H we arrive at (D.7).

At this stage we conclude from Caratheodory’s Theorem B.3 that the restrictionµ of µ∗ to the Borel algebra of K is a (finite) measure of total mass µ(K) = λ (1).

Lemma D.17. The measure µ is regular.

Proof. Define

D :=

B ∈ Bo(K) : B has the regularity properties as in (5.3).

D.3 Representation of Positive Functionals 343

By Lemma D.15, Σ contains all open sets. We shall show that D is a Dynkin system.Since the open sets form a ∩-stable generator of the Borel algebra, it follows thatD= Bo(K).

Clearly /0,K ∈ D. If A,B ∈ D with A ⊆ B and ε > 0 then there are L ⊆ A ⊆ O,L′ ⊆ B ⊆ O′, L,L′ compact and O,O′ open, such that µ(O \ L) + µ(O′ \ L′) < ε .Then

L′ \O⊆ B\A⊆ O′ \L, L′ \O compact, O′ \L open.

Moreover, (O′ \L)\ (L′ \O)⊆ (O\L)∪ (O′ \L′) and hence

µ((O′ \L)\ (L′ \O))≤ µ(O\L)+µ(O′ \L′)< ε.

Since ε > 0 was arbitrary, it follows that B \A ∈ D. Finally, suppose that An ∈ D

and An A. Find Ln ⊆ An ⊆On, Ln compact and On open, such that µ(On \Ln)< ε .Without loss of generality we may suppose that Ln. Let O :=

⋃n∈N On, and find

N such that µ(O\ON)< ε . Then LN ⊆ A⊆ O and

µ(O\LN)≤ µ(O\ON)+µ(ON \LN)≤ 2ε.

It follows that A ∈D, and that concludes the proof.

Finally, we conclude the proof of the Riesz–Markov Theorem D.11 by showing thatµ induces the functional λ .

Lemma D.18. We have λ ( f ) =∫

K f dµ for all f ∈ C(K).

Proof. By considering real and imaginary part separately we may suppose that f isreal-valued. Then there are real numbers a,b such that K =[a≤ f < b ]. Fix ε > 0and take points

a = t0 < t1 < · · ·< tn = b

with t j− t j−1 < ε . The sets

A j :=[

t j−1 < f ≤ t j]

( j = 1, . . . ,n)

form a partition of K into disjoint Baire sets, and the step function

g :=n

∑j=1

t j1A j satisfies f ≤ g≤ f + ε1.

For each j we choose s j < t j. Then the open sets[

s j−1 < f < t j], j = 1, . . . ,n, cover

K. Let ( f j)nj=1 be a partition of unity on K subordinate to this cover. Then f f j ≤ t j f j

for each j, hence

λ ( f ) =n

∑j=1

λ ( f f j)≤n

∑j=1

t jλ ( f j)≤n

∑j=1

t jµ[

s j−1 < f < t j].

(The last inequality here comes from the definition (D.6) of µ .) Now we let s j−1t j−1 for each j = 1, . . . ,n and obtain


λ ( f )≤n

∑j=1

t jµ[

t j−1 ≤ f < t j]=∫

Kgdµ ≤

∫K

f dµ + εµ(K).

Since ε > 0 was arbitrary, this implies that λ ( f )≤∫

K f dµ . Finally, we replace f by− f and arrive at λ ( f ) =

∫K f dµ .

With the Riesz theorem we have established an isometric isomorphism betweenthe dual space C(K)′ and the space M(K) of complex Baire (regular Borel) mea-sures on K, allowing us to identify functionals with measures. This may lead to theproblem that the symbol |µ| now has two meanings, depending on whether we inter-pret µ as a functional (Section D.2 or as a measure (Appendix B.9.) The next resultshows that this ambiguity is only virtual. More precisely, it says that the isometricisomorphism

d : M(K)→ C(K)′

from the Riesz’ representation theorem is a lattice isomorphism.

Lemma D.19. For µ ∈M(K) one has |dµ|= d |µ|.

Proof. For |g| ≤ f ∈ C(K) we have

|(dµ)(g)|=∣∣∣∣∫K

gdµ

∣∣∣∣≤ ∫K|g| d|µ| ≤ (d |µ|)( f ).

Hence, |dµ| ≤ d |µ|. For the converse, let ν be the unique positive regular Borelmeasure on K such that dν = |dµ|. Then |

∫K f dµ| ≤

∫K | f | dν for all f ∈ C(K). By

virtue of Theorem D.1 we obtain that∣∣∣∣∫Kf dµ

∣∣∣∣≤ ∫K| f | dν

holds even for all f ∈ BM(K). By the definition of the measure |µ| (Appendix B.9)it follows that |µ| ≤ ν , and hence d |µ| ≤ dν = |dµ|.

Appendix ETheorems of Eberlein, Grothendieck, and Ellis

Let K be a compact space. We denote by Cp(K) the space C(K) endowed with thepointwise topology, i.e., as a topological subspace of

X := CK = ∏x∈KC

with the product topology. If M ⊆ Cp(K) then its closure in X is denoted byMp, so that Mp ∩ C(K) is its closure in Cp(K). We shall also speak of p-open/closed/dense/. . . subsets of C(K) to refer to these notions in the pointwisetopology.

We begin with a central auxiliary result.

Lemma E.1. Let M ⊆ C(K), and let (xm)m∈N ⊆ K and x ∈ K such that

f (xm)→ f (x) (E.1)

for every f ∈M. Then (E.1) holds for every f ∈Mp∩C(K).

Proof. To begin with, let A :=⋂

j∈N xm : m≥ j be the set of accumulation pointsof the sequence (xn)n. By assumption, for each g ∈M we have

g(A)⊆⋂

j∈N

g(xm) : m≥ j

= g(x),

i.e., g is constant with value g(x) on A. Hence, by the definition of the pointwisetopology, it follows that f (A)⊆ f (x) for every f ∈Mp.

On the other hand, if f ∈ C(K) and s ∈ C is an accumulation point of ( f (xn))n,then, by compactness of K and the continuity of f , we have s ∈ f (A). Combiningthis with the previous observation, we see that for f ∈Mp∩C(K) the point f (x) isthe unique accumulation point of the sequence ( f (xn))n, and hence its limit.

345

346 E Theorems of Eberlein, Grothendieck, and Ellis

E.1 Separability and Metrisability

Every compact metrisable space is separable, but the converse is not true. The nextresult shows that compact subsets of Cp(K), however, are special in this respect.

Theorem E.2. Let A ⊆ Cp(K) be compact. Then A is separable if and only if A ismetrisable.

Proof. For the nontrivial implication suppose that A is separable, and let M = fn :n ∈ N ⊆ A be p-dense in A. The function

d : K×K→ R+, d(x,y) := ∑∞

n=1| fn(x)− fn(y)|‖ fn‖∞

2n .

is a continuous pseudo-metric on K. By looking at the d-balls Ux :=[d(x, ·)< 1/n ]and using the compactness of K one finds a countable set xm : m ∈ N ⊆ K that isd-dense in K.

Now define cm := 1+ sup f∈A | f (xm)|, which is finite by the compactness of A, and

e : A×A→ R+, e( f ,g) := ∑∞

m=1| f (xm)−g(xm)|

2mcm

for f ,g ∈ A. We claim that e is a metric inducing the topology of A. Clearly, e iscontinuous, hence by compactness it suffices to show that e is indeed a metric. Thetriangle inequality is trivial. Suppose that e( f ,g) = 0. Then f (xm) = g(xm) for allm ∈ N. Let x ∈ K be arbitrary. By passing to a subsequence we my suppose thatd(xm,x)→ 0, i.e., fn(xm)→ fn(x) for all n ∈ N. By Lemma E.1, it follows thatf (xm)→ f (x) and g(xm)→ g(x). Hence f (x) = g(x).

Recall that a subset A of a topological space X is called relatively compact inX if its closure A is compact. Note that A⊆ C(K) is relatively compact in Cp(K) ifand only if the pointwise closure Ap is compact and contained in C(K).

Theorem E.3 (Eberlein). Let M ⊆ C(K) and f ∈ Mp ∩C(K). Then there is acountable M0 ⊆ M such that f ∈ M0

p. If in addition M is relatively compact inCp(K), then f is the pointwise limit of a sequence in M.

Proof. For n,m ∈ N and x := (x1, . . . ,xn) ∈ Kn take gx ∈ M such that∣∣gx(x j)− f (x j)∣∣< 1/m for j = 1, . . .n. Then

x ∈Ux :=n

∏j=1

[ |gx− f |< 1/m ] ,

which is an open subset of Kn. By compactness, there is a finite set Fn,m ⊆ Kn suchthat Ux : x ∈ Fn,m is a cover of Kn. Then M0 :=

⋃n,mgx : x ∈ Fn,m is countable

and its pointwise closure contains f .

E.2 Grothendieck’s Theorem 347

To prove the second statement, suppose that M is relatively compact in Cp(K).Then M0

p is a separable and compact subset of Cp(K), whence by Theorem E.2 itis metrisable. Consequently, there is a sequence in M0 converging to f .

A subset A of a topological space X is called sequentially closed in X if it con-tains the limit of each sequence ( fn)n that converges in X .

Corollary E.4. If A⊆C(K) is relatively compact and sequentially closed in Cp(K),then it is compact.

E.2 Grothendieck’s Theorem

The next result connects the pointwise topology on C(K) with its weak topologyC(K) as a Banach space. To facilitate argumentation, we shall write (A, p) and (A,w)to denote the pointwise and weak topologies on A⊆ C(K), respectively.

Theorem E.5 (Grothendieck). Let M ⊆ C(K) be norm-bounded. Then the follow-ing statements are equivalent.

(i) M is weakly compact.

(ii) M is compact in Cp(K).

If (i) and (ii) hold, then the two mentioned topologies coincide on M.

Proof. Note first that the identity mapping

id : (C(K),w)→ (C(K), p)

is continuous, and both topologies are Hausdorff. Hence (i) implies (ii) and bothtopologies coincide on M.

Conversely, suppose that M is p-compact in C(K). It suffices to show that id :(M, p)→ (M,w) is continuous. To this end, let A ⊆M be weakly closed in M andf ∈ Ap. Then f ∈M ⊆ C(K) by hypothesis. By Eberlein’s Theorem E.3 there is asequence ( fn)n ⊆ A such that fn→ f pointwise. By the Riesz Representation The-orem 5.7 and dominated convergence, we obtain fn→ f weakly. Since A is weaklyclosed in M, we obtain that f ∈ A. Consequently, A is p-closed.

Corollary E.6. For a norm-bounded set M ⊆ C(K) the following statements areequivalent.

(i) M is relatively weakly compact.

(ii) M is relatively compact in Cp(K).

If (i) and (ii) hold, then Mp= Mw and that the two mentioned topologies coincide

on M.


Proof. Let A :=Mw and B=Mp. Then A⊆B. If (i) holds, then A is weakly compact,hence by Theorem E.5 it is p-compact. This implies that A = B and that the twotopologies coincide on A, hence also on M.

Conversely, if (ii) holds then by Theorem E.5 B is weakly compact. Hence A⊆ B isalso weakly compact, i.e., (i) holds.

E.3 The Theorem of Kreın–Smulian

Grothendieck’s theorem has an enormous range of applications. As an example, wegive a quite simple proof of the following result, see Theorem C.11.

Theorem E.7 (Kreın–Smulian). Let K ⊆ X be a weakly compact subset of a Ba-nach space X. Then conv(K) is weakly compact.

Proof. Let B := x′ ∈ X ′ : ‖x′‖ ≤ 1 be the dual unit ball. Then B is weakly∗ com-pact, by the Banach–Alaoglu theorem. The map

ϕ : (B,w∗)→ Cp(K) x′ 7→ x′∣∣K

is continuous and its image C := ϕ(B) is norm-bounded. By Grothendieck’s the-orem, C is weakly compact and the pointwise topology coincides with the weaktopology on C. It follows that ϕ : (B,w∗)→ (C(K),w) is continuous. For µ ∈M(K)we define f = Φ(µ) by

f (x′) :=∫

Kϕ(x′)dµ (x′ ∈ X ′).

Then f ∈ X ′′ and f∣∣B is weakly∗ continuous. By Proposition E.8 below, f ∈ X .

Hence Φ : M(K)→ X is a well-defined bounded linear mapping, weakly∗-to-weakcontinuous. Since the set M1(K) of probability measures is a weakly∗ compact andconvex set, its image Φ(M1(K)) is weakly compact and convex. But Φ(δx) = x forevery x ∈ K, whence K ⊆Φ(M1(K)). Consequently

conv(K)⊆Φ(M1(K))

and since the latter is weakly compact, the proof is complete.

Our proof of the Kreın–Smulian theorem still rests on an auxiliary result fromthe theory of locally convex vector spaces. For A ⊆ X , where X is a locally convexspace, the polar is defined as

A :=

x′ ∈ X ′ :∣∣⟨x′,x⟩∣∣≤ 1 for all x ∈ A

.

The bipolar theorem [Conway (1990), Theorem V.1.8]—a simple consequence ofthe Hahn–Banach separation theorems [Conway (1990), Cor. VI.3.10]—states that

E.4 Ellis’ Theorem and the Haar Measure on a Compact Group 349

A = absconv(A)

where on the X ′ one takes the weak∗, i.e., σ(X ,X ′)-topology. (The set absconv(A)is the closed absolutely convex hull of A, i.e., the the intersection of all closedabsolutely convex set containing A.) We can now state the result, interesting in itsown right.

Theorem E.8. Let X be a Banach space and f ∈ X ′′ be such that its restriction toB := x′ ∈ X ′ : ‖x′‖ ≤ 1 is weakly∗ continuous. Then f ∈ X.

Proof. For the proof we regard X as a subset of X ′′ and consider the dual pair(X ′′,X ′) of locally convex spaces, each with the topology induced by the other. Letε > 0. Then the assumption implies that there are vectors x1, . . . ,xn ∈ X and δ > 0such that ∥∥x′

∥∥≤ 1,∣∣⟨x′,x j

⟩∣∣≤ δ ( j = 1, . . . ,n) ⇒∣∣ f (x′)∣∣≤ ε. (E.2)

By scaling, we may suppose that δ = 1. Let

U :=

x′ ∈ X ′ :∣∣⟨x′,x j

⟩∣∣≤ 1 for j = 1, . . . ,n=

x1, . . . ,xn

= K,

where K = absconvx1, . . . ,xn. Then (E.2) simply says that (1/ε) f ∈ (U ∩ B).Now,

(U ∩B) = (K∩BX ′′) = (K∪BX ′′)

= absconv(K∪BX ′′)⊆ K +BX ′′

where the closure is in the weak∗ topology on X ′′. But K is compact, and henceK +BX ′′ is already weakly∗ closed. This yields (1/ε) f ∈ K +BX ′′ , and hence thereis x ∈ K ⊆ X such that ‖ f − εx‖ ≤ ε . As ε > 0 was arbitrary and X is complete, itfollows that f ∈ X .

E.4 Ellis’ Theorem and the Haar Measure on a Compact Group

A semigroup S endowed with a topology is called a semitopological semigroup ifthe multiplication mapping

S×S→ S (a,b) 7→ ab

is separately continuous. A group G endowed with a topology is a topologicalgroup if multiplication is jointly continuous and also inversion is continuous. Equiv-alently, the mapping

G×G→ G, (a,b) 7→ ab−1

is continuous.

An important theorem of R. Ellis states that a compact semitopological semi-group that is algebraically a group must be a topological group. Another important


theorem, due to A. Haar, states that on a compact topological group there existsa unique invariant probability measure, called the Haar measure. In this sectionwe shall prove both results, however, in reverse order. We shall first construct theHaar measure (on a compact semitopological semigroup which is algebraically agroup) and afterwards prove Ellis’ theorem. The cornerstones for the proofs areGrothendieck’s theorem and the Kreın–Smulian theorem.

Construction of the Haar Measure

Let G be a compact semitopological semigroup, which is algebraically a group. Wedenote by

La,Ra : C(G)→ C(G), (La f )(x) := f (ax), (Ra f )(x) := f (xa)

the left and the right rotation by a ∈ G. The operators La and Ra are isometric iso-morphisms on C(G). Since the multiplication is separately continuous, for f ∈C(G)the mapping

G→ Cp(G) a 7→ La f

is continuous. Hence, by Grothendieck’s Theorem E.5, the orbitLa f : a ∈ G

is weakly compact. By the Kreın–Smulian Theorem E.7, its closed convex hull

K f := conv

La f : a ∈ G

is weakly compact, too. (The closure here is the same in the weak and in normtopology, by Mazur’s Theorem C.7.) Grothendieck’s theorem, again, implies thatK f is p-compact.

The first step now consists in finding a constant function c f ∈K f . To achieve this,let us define the oscillation

osc(g) := supx,y∈G

|g(x)−g(y)|

of g ∈ C(G). Then g is constant if and only if osc(g) = 0. Note that

g ∈ K f implies Kg ⊆ K f and osc(g)≤ osc( f ).

Now we let s := infg∈K f osc(g). For every n ∈ N, the setg ∈ K f : osc(g)≤ s+ 1

n

is not empty and p-closed, whence also weakly closed, by Grothendieck’s theorem.By compactness, the intersection of these sets is not empty, hence there is g ∈ K f

E.4 Ellis’ Theorem and the Haar Measure on a Compact Group 351

such that osc(g) = s. The following lemma now shows that in case f is real-valued,s = 0, i.e., g is constant.

Lemma E.9. Let g∈C(K;R) and osc(g)> 0. Then there exists h∈Kg with osc(h)<osc(g).

Proof. After scaling and shifting we may suppose that −1≤ g≤ 1 and osc(g) = 2.By compactness, there are a1, . . . ,an ∈ G such that

[g≥ 1/2 ]⊆⋃n

j=1a−1

j [g <−1/2 ] .

Define h := 1n+1 (g+∑

nj=1 La j g) ∈ Kg. Then clearly −1≤ h. If g(x)≥ 1/2 then there

exists at least on a j such that g(a jx)<−1/2. Hence

h(x)≤ 1n+1

(1− 1

2+(n−1)

)=

n− 1/2n+1

.

If g(x)< 1/2, then

h(x)≤ 1n+1

(12+n)=

n+ 1/2n+1

.

It follows that h≤ (n+ 12 )/(n+1) and hence osc(h)< 2 = osc(g).

By decomposing a function f into real and imaginary parts, we conclude thateach set K f contains a constant function c f . Of course, the same argument is validfor right rotations, and hence for f ∈ C(K) there is a constant function

c′f ∈ K′f := conv

Ra f : a ∈ G.

Claim: We have c f = c′f .

Proof. By construction there are sequences Sn ∈ convLa : a ∈ G and Tn ∈convRa : a ∈ G such that Sn f → c f and Tn f → c′f . Hence TmSn f → c f as n→ ∞

andTmSn = SnTm f → c′f as m→ ∞.

But since these convergences are in norm and all the operators are contractions, theclaim follows.

It follows that c f is the unique constant function in K f as well as the uniqueconstant function in K′f . We shall write

c f = P f .

It is clear that P1 = 1 and P f ≥ 0 whenever f ≥ 0. To show that PRa = P for a ∈G,note that since left and right rotations commute, we have

P f = RaP f ∈ Ra(K f ) = KRa f ,

whence P f = PRa f , by uniqueness. Analogously, P f = PLa for every a ∈ G.


It remains to show that P is linear. Clearly P is R-homogeneous, and P f =P(Re f )+ iP(Im f ) by uniqueness. It follows that P is C-homogeneous.

Claim: P( f +g) = P f +Pg.

Proof. Let ε > 0. Then there are S,T ∈ convLa : a ∈ G such that‖S f −P f‖ , ‖T Sg−PSg‖ ≤ ε . Note that T P f = P f and PSg = Pg. Hence

‖T S( f +g)− (P f +Pg)‖ ≤ ‖T S f −P f‖+‖T Sg−Pg‖= ‖T (S f −P f )‖+‖T Sg−PSg‖ ≤ 2ε,

since T is a contraction. It follows that P f + Pg ∈ K f+g, and hence P( f + g) =P f +Pg by uniqueness.

Since P f is a constant function for each f ∈ C(K) we may write

P f = m( f )1 ( f ∈ C(K)).

Then m is a positive, linear functional, invariant under left and right rotations. Wehave proved the major part of the following theorem.

Theorem E.10 (Haar Measure). Let G be a compact semitopological semigroupwhich is algebraically a group. Then there is a unique probability measure m on Gthat is invariant under all left rotations. This measure has the following additionalproperties:

a) m is strictly positive, i.e., supp(m) = G.

b) m is invariant under right rotations.

c) m is invariant under inversion.

Proof. Existence was proved above, as well as the invariance under right rotations.For uniqueness, suppose that µ is a probability measure on G that is invariant underall left rotations. Given f ∈ C(G), it follows that 〈 f ,µ〉= 〈g,µ〉 for all g ∈ K f , andhence

〈 f ,µ〉= 〈P f ,µ〉= m( f )〈1,µ〉= m( f ).

In order to see that m is strictly positive, let 0 ≤ f ∈ C(G) such that f 6= 0. DefineUx :=[Lx f > 0 ] for x ∈ K and note that the sets Ux cover K. By compactness, thereare x1, . . . ,xn ∈ K such that K =

⋃nj=1 Ux j . Hence the function g := ∑

nj=1 Lx j f is

strictly positive, so that there exists c > 0 such that c1≤ g. It follows that

c = m(c1)≤m(g) = ∑nj=1m(Lx j f ) = nm( f ),

which implies that m( f )> 0.Finally, let S : C(G)→ C(G) be the reflection mapping defined by (S f )(x) :=

f (x−1). Since m is right-invariant, the probability measure µ := S′m ∈M(G) is leftinvariant, and hence by uniqueness µ = m. This is assertion c).

The unique probability measure m from Theorem E.10 is called the Haar measureon G.

E.5 Sequential Compactness and the Eberlein–Smulian Theorem 353

Ellis’ Theorem

For a Hilbert space H let

Iso(H) :=

T ∈L (H) : T ∗T = I

and U(H) :=

U ∈L (H) : U∗ =U−1be the semigroup of isometries and the unitary group, respectively.

Lemma E.11. For a Hilbert space H, the strong and the weak operator topologiescoincide on Iso(H). The unitary group U(H) is a topological group with respect tothis topology.

Proof. By an elementary computation one shows that

‖S f −T f‖2 = 2Re(T f −S f |T f ) ( f ∈ H)

whenever S,T ∈ Iso(H). The first claim follows directly from this. By PropositionC.18 the second claim is true.

We are now in a position to prove Ellis’ theorem.

Theorem E.12 (Ellis). Let G be a compact semitopological semigroup. If G isalgebraically a group, it is a compact group.

Proof. By Theorem E.10 we can employ the Haar measure m on G. Let H := L2(G)and let

R : G→L (H), a 7→ Ra

be the right regular representation of G on H. Then R is a homomorphism of G intothe unitary group U(H) on H. Also, R is injective, by Urysohn’s lemma and sincem is strictly positive.

We claim that R is continuous with respect to the weak operator topology on H.To prove this claim, we have to show that for f ,g ∈ H the map

ψ = ψ f ,g : G→ C, ψ(a) := (Ra f |g) =∫

G(Ra f ) ·g dm

is continuous. By denseness of C(K) in H we may suppose that f ∈ C(K). Thenby Grothendieck’s Theorem E.5 the mapping a 7→ La f is continuous for the weaktopology on C(K), so the claim is established.

By Lemma E.11, the weak and the strong operator topologies coincide, and U(H)is a topological group for this latter topology. Since the right regular representationis a homeomorphism onto its image, the theorem is proved.

E.5 Sequential Compactness and the Eberlein–Smulian Theorem

A Hausdorff topological space X is called countably compact if every sequence( fn)n ⊆ X has a cluster point, i.e.,

354 E Theorems of Eberlein, Grothendieck, and Ellis⋂n fk : k ≥ n 6= /0.

And it is called sequentially compact if every sequence has a convergent subse-quence. Furthermore, a subset A of a topological space X is called relatively count-ably compact if every sequence in A has a cluster point in X and relatively se-quentially compact in X if every sequence in A has a subsequence that convergesin X .

It is clear that a (relatively) compact or sequentially compact subset is also (rela-tively) countably compact. However, the converses are false in general. Also, neithernotion—(relative) compactness and (relative) sequential compactness—implies theother. Eberlein’s Theorem E.2 is the key to the important fact, that for subsets ofCp(K) these notions all coincide.

Lemma E.13. Let A⊆Cp(K) be (relatively) compact. Then A is (relatively) sequen-tially compact.

Proof. By hypothesis, B := Ap= Ap ∩C(K) is compact. Let ( fn)n ⊆ A be any se-

quence. Then Y := fn : n ∈ Npis a separable compact set in Cp(K). By Theorem

E.2, Y is metrisable. Consequently, there is f ∈ Y and a subsequence ( fkn)n suchthat fkn → f pointwise.

The next theorem is again due to Grothendieck.

Theorem E.14 (Grothendieck). A subset A ⊆ Cp(K) is relatively compact if andonly if it is relatively countably compact.

Proof. If A is relatively compact, it is relatively countably compact. For the con-verse, suppose that A is relatively countably compact and let B := Ap ⊆ CK . Forx ∈ K the set f (x) : f ∈ A is relatively countably compact in C, hence bounded.By Tychonoff’s theorem, it follows that B is compact, and it remains to show thatB⊆ C(K).

Suppose towards a contradiction that there exists g ∈ B \C(K). Then there isy ∈ K at which g fails to be continuous, so there ist ε > 0 such that y is an accumu-lation point of Z :=[ |g(y)−g| ≥ ε ]. We shall recursively construct points xn ∈ K,functions fn and open neighbourhoods Un of y such that

1) fn ∈ A;

2) | fn(xm)−g(xm)|< ε/2n for m < n;

3) | fn(y)−g(y)|< ε/2n;

4) y ∈Un ⊆Un ⊆Un−1∩[ | fn− fn(y)|< ε/2n ];

5) xn ∈Un∩Z.

Suppose that x j, f j, and U j are constructed for 1 ≤ j < n. (This is a vacuous con-dition for n = 1.) Since g is in the p-closure of A, one can find fn satisfying 1)–3).Since fn is continuous and by the induction hypothesis, Un−1∩[ | fn− fn(y)|< ε/2n ]is an open neighbourhood of y, hence one can find another open neighbourhood

E.5 Sequential Compactness and the Eberlein–Smulian Theorem 355

Un of y satisfying 4). Finally, since y is an accumulation point of Z, one can pickxn ∈Un∩Z, i.e., we have 5).

Now, by relative countable compactness of A in Cp(K), there is an accumulationpoint f ∈ Cp(K) of fn : n≥ 1. Since K is compact, there is also an accumulationpoint x of xm : m≥ 1.

By 4) and 5), xm : m≥ n ⊆Un and hence x ∈Un for each n. It follows that

| fn(x)− fn(y)|< ε/2n (n≥ 1).

Since by 3) also | fn(y)−g(y)| < ε/2n for all n ∈ N, and by choice of f we obtainf (x) = g(y).

On the other hand, from 2) we obtain f (xm)= g(xm) for each m∈N. Since xm ∈ Zwe also have |g(xm)−g(y)| ≥ ε , and this leads to | f (xm)−g(y)| ≥ ε for all m ∈ N.By choice of x we arrive at | f (x)−g(y)| ≥ ε , a contradiction.

Corollary E.15. For a subset A⊆ Cp(K) the following assertions are equivalent.

(i) A is relatively compact in Cp(K).

(ii) A is relatively countably compact in Cp(K).

(iii) A is relatively sequentially compact in Cp(K).

This is just a combination of Lemma E.13 and Theorem E.14.

Corollary E.16. For a subset A⊆ Cp(K) the following assertions are equivalent.

(i) A is compact in Cp(K).

(ii) A is countably compact in Cp(K).

(iii) A is sequentially compact in Cp(K).

Proof. By Lemma E.13 (i) implies (iii), and (iii) trivially implies (ii). Suppose that(ii) holds. Then A is relatively compact by Grothendieck’s Theorem E.14, and se-quentially closed. Hence A is compact by Corollary E.4.

The Theorem of Eberlein–Smulian

For (our) convenience, we state and prove only the restricted version of theEberlein–Smulian theorem as stated already in Appendix C.6.

Theorem E.17 (Eberlein–Smulian). Let X be a Banach space. Then A⊆ X is (rel-atively) weakly compact if and only if A is (relatively) weakly sequentially compact.

In this case, every f ∈ Aw is the weak limit of a sequence in A.

Proof. Let K := x′ ∈ X ′ : ‖x′‖ ≤ 1 with the weak∗ topology inherited from X ′.Then K is compact by the Banach–Alaoglu Theorem. Let Φ : X → C(K) be thecanonical map which maps an element x ∈ X to 〈·,x〉

∣∣K . Then

Φ : (X ,w)→ Cp(K)


is a homeomorphism onto its image. If A is relatively weakly compact, then A isweakly compact, and hence Φ(A) is compact in Cp(K). By Corollary E.16, Φ(A)is sequentially compact in Cp(K). Since Φ is a homeomorphism, A is seqentiallycompact. In particular, A is relatively sequentially compact.

Conversely, suppose that A is relatively weakly sequentially compact. Then Φ(A)is relatively sequentially compact in Φ(X), and a fortiori in Cp(K). By CorollaryE.15 it follows that Φ(A) is relatively compact in Cp(K). By Eberlein’s Theo-rem E.3, every f ∈ Φ(A)

pis the p-limit of a sequence (Φ(an))n∈N for some se-

quence (an)n ⊆ A. By hypothesis, there is a ∈ X such that an → a weakly. Hencef = Φ(a) ∈ Φ(X). This shows that Φ(A)

p ⊆ X , hence Φ(A)p= Φ(A). Since Φ is

a homeomorphism onto its image, A is relatively compact.Finally, suppose that A is sequentially weakly compact. Then Φ(A) is sequen-

tially compact in Cp(K), whence by Corollary E.16, it is compact. Since Φ is ahomeomorphism onto its image, A is weakly compact as well.

Bibliography

M. Akcoglu and L. Sucheston[1972] On operator convergence in Hilbert space and in Lebesgue space, Pe-

riod. Math. Hungar. 2 (1972), 235–244. Collection of articles dedi-cated to the memory of Alfred Renyi, I.

M. A. Akcoglu[1975] A pointwise ergodic theorem in Lp-spaces, Canad. J. Math. 27 (1975),

no. 5, 1075–1082.

A. Baker[1984] A Concise Introduction to the Theory of Numbers, Cambridge Univer-

sity Press, Cambridge, 1984.

H. Bauer[1981] Probability Theory and Elements of Measure Theory, Academic Press

Inc. [Harcourt Brace Jovanovich Publishers], London, 1981. 2nd edi-tion of the translation by R. B. Burckel from the 3rd German edition,Probability and Mathematical Statistics.

[1990] Maß- und Integrationstheorie, de Gruyter Lehrbuch, Walter de Gruyter& Co., Berlin, 1990.

V. Bergelson[2004] Some historical comments and modern questions around the Ergodic

Theorem, Dynamics of Complex Systems, Kyoto, 2004, pp. 1–11.

357

358 Bibliography

P. Billingsley[1979] Probability and Measure, Wiley Series in Probability and Mathemat-

ical Statistics, John Wiley & Sons, New York-Chichester-Brisbane,1979.

G. D. Birkhoff[1912] Quelques theoremes sur le mouvement des systemes dynamiques, S.

M. F. Bull. 40 (1912), 305–323 (French).[1931] Proof of the ergodic theorem, Proc. Nat. Acad. Sci. U. S. A. 17 (1931),

656–660 (English).

G. Birkhoff[1948] Lattice Theory, revised ed., American Mathematical Society Collo-

quium Publications, vol. 25, American Mathematical Society, NewYork, N. Y., 1948.

J. R. Blum and D. L. Hanson[1960] On the mean ergodic theorem for subsequences, Bull. Amer. Math.

Soc. 66 (1960), 308–311.

V. I. Bogachev[2007] Measure Theory. Vol. I, II, Springer-Verlag, Berlin, 2007.

L. Boltzmann[1885] Ueber die Eigenschaften monocyklischer und anderer damit ver-

wandter Systeme, J. Reine Angew. Math. 98 (1885), 68–94 (German).

E. Borel[1909] Les probabilites denombrables et leurs applications arithmetiques,

Palermo Rend. 27 (1909), 247–271 (French).

S. G. Brush[1971] Milestones in mathematical physics. Proof of the impossibility of er-

godic systems: the 1913 papers of Rosenthal and Plancherel, Trans-port Theory Statist. Phys. 1 (1971), no. 4, 287–298.

D. L. Burkholder[1962] Semi-Gaussian subspaces, Trans. Amer. Math. Soc. 104 (1962), 123–

131.

Bibliography 359

R. V. Chacon[1964] A class of linear transformations, Proc. Amer. Math. Soc. 15 (1964),

560–564.[1967] A geometric construction of measure preserving transformations,

Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berke-ley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part2, Univ. California Press, Berkeley, Calif., 1967, pp. 335–360.

[1969] Weakly mixing transformations which are not strongly mixing, Proc.Amer. Math. Soc. 22 (1969), 559–562.

J. B. Conway[1990] A Course in Functional Analysis, 2nd ed., Graduate Texts in Mathe-

matics, vol. 96, Springer-Verlag, New York, NY, 1990.

M. M. Day[1957a] Amenable semigroups, Ill. J. Math. 1 (1957), 509–544 (English).[1957b] Amenable semigroups, Illinois J. Math. 1 (1957), 509–544.

K. de Leeuw and I. Glicksberg[1959] Almost periodic compactifications, Bull. Amer. Math. Soc. 65 (1959),

134–139.[1961] Applications of almost periodic compactifications, Acta Math. 105

(1961), 63–97.

A. Deitmar and S. Echterhoff[2009] Principles of Harmonic Analysis, Universitext, Springer, New York,

2009.

M. Denker, C. Grillenberger, and K. Sigmund[1976] Ergodic Theory on Compact Spaces, Lecture Notes in Mathematics,

vol. 527, Springer-Verlag, Berlin, 1976.

R. Derndinger, R. Nagel, and G. Palm[1987] Ergodic Theory in the Perspective of Functional Analysis, unpublished

book manuscript, 1987.

J. Dugundji and Granas[2003] Fixed Point Theory, Springer Monographs in Mathematics, Springer-

Verlag, New York, 2003.

360 Bibliography

N. Dunford and J. T. Schwartz[1958] Linear Operators. I. General Theory, Pure and Applied Mathematics,

vol. 7, Interscience Publishers, Inc., New York, 1958. With the assis-tance of W. G. Bade and R. G. Bartle.

P. Ehrenfest and T. Ehrenfest[1912] Begriffliche Grundlagen der statistischen Auffassung in der Mechanik,

Encykl. d. Math. Wissensch. IV 2 II, Heft 6, 1912 (German).

M. Einsiedler and T. Ward[2011] Ergodic Theory with a View Towards Number Theory, Graduate Texts

in Mathematics, vol. 259, Springer-Verlag, London, 2011.

T. Eisner[2010] Stability of Operators and Operator Semigroups, Operator Theory:

Advances and Applications, vol. 209, Birkhauser Verlag, Basel, 2010.

R. Ellis[1957] Locally compact transformation groups, Duke Math. J. 24 (1957),

119–125.

K.-J. Engel and R. Nagel[2000] One-Parameter Semigroups for Linear Evolution Equations, Graduate

Texts in Mathematics, vol. 194, Springer-Verlag, Berlin, 2000.

S. N. Ethier and T. G. Kurtz[1986] Markov Processes, Wiley Series in Probability and Mathematical

Statistics: Probability and Mathematical Statistics, John Wiley & SonsInc., New York, 1986.

G. B. Folland[1995] A Course in Abstract Harmonic Analysis, Studies in Advanced Math-

ematics, CRC Press, Boca Raton, FL, 1995.[1999] Real Analysis, 2nd ed., Pure and Applied Mathematics (New York),

John Wiley & Sons Inc., New York, 1999. A Wiley-Interscience Pub-lication.

H. Furstenberg[1961] Strict ergodicity and transformation of the torus, Amer. J. Math. 83

(1961), 573–601.

Bibliography 361

[1981] Recurrence in Ergodic Theory and Combinatorial Number Theory,Princeton University Press, Princeton, N.J., 1981. M. B. Porter Lec-tures.

[1977] Ergodic behavior of diagonal measures and a theorem of Szemeredion arithmetic progressions, J. Analyse Math. 31 (1977), 204–256.

H. Furstenberg and B. Weiss

[1996] A mean ergodic theorem for (1/N)∑Nn=1 f (T nx)g(T n2

x), Convergencein ergodic theory and probability (Columbus, OH, 1993), Ohio StateUniv. Math. Res. Inst. Publ., vol. 5, de Gruyter, Berlin, 1996, pp. 193–227.

G. Gallavotti[1975] Ergodicity, ensembles, irreversibility in Boltzmann and beyond, J.

Statist. Phys. 78 (1975), 1571–1589.

A. M. Garsia[1965] A simple proof of E. Hopf’s maximal ergodic theorem, J. Math. Mech.

14 (1965), 381–382.[1970] Topics in Almost Everywhere Convergence, Lectures in Advanced

Mathematics, vol. 4, Markham Publishing Co., Chicago, Ill., 1970.

E. Glasner[2003] Ergodic Theory via Joinings, Mathematical Surveys and Monographs,

vol. 101, American Mathematical Society, Providence, RI, 2003.

W. H. Gottschalk and G. A. Hedlund[1955] Topological Dynamics, American Mathematical Society Colloquium

Publications, Vol. 36, American Mathematical Society, Providence, R.I., 1955.

B. Green[2008] Lectures on Ergodic Theory, Part III,

http://www.dpmms.cam.ac.uk/ bjg23/ergodic-theory.html, 2008.

B. Green and T. Tao[2008] The primes contain arbitrarily long arithmetic progressions, Ann. of

Math. (2) 167 (2008), no. 2, 481–547.

362 Bibliography

P. R. Halmos[1944] In general a measure preserving transformation is mixing, Ann. of

Math. (2) 45 (1944), 786–792.[1950] Measure Theory, D. Van Nostrand Company, Inc., 1950.[1956] Lectures on Ergodic Theory, Publications of the Mathematical Society

of Japan, no. 3, The Mathematical Society of Japan, 1956.

P. R. Halmos and J. von Neumann[1942] Operator methods in classical mechanics. II, Ann. of Math. (2) 43

(1942), 332–350.

E. Hewitt and K. A. Ross[1979] Abstract Harmonic Analysis. Vol. I, 2nd ed., Grundlehren der Math-

ematischen Wissenschaften [Fundamental Principles of MathematicalSciences], vol. 115, Springer-Verlag, Berlin, 1979.

E. Hewitt and K. Stromberg[1969] Real and Abstract Analysis. A Modern Treatment of the Theory of

Functions of a Real Variable, 2nd printing corrected, Springer-Verlag,New York, 1969.

N. Hindman and D. Strauss[1998] Algebra in the Stone-Cech Compactification, de Gruyter Expositions

in Mathematics, vol. 27, Walter de Gruyter & Co., Berlin, 1998.

E. Hlawka[1979] Theorie der Gleichverteilung, Bibliographisches Institut, Mannheim,

1979.

K. H. Hofmann and S. A. Morris[2006] The Structure of Compact Groups, augmented ed., de Gruyter Studies

in Mathematics, vol. 25, Walter de Gruyter & Co., Berlin, 2006.

K. H. Hofmann and P. S. Mostert[1966] Elements of Compact Semigroups, Charles E. Merrill Books Inc,

Columbus, OH, 1966.

E. Hopf[1954] The general temporally discrete Markoff process, J. Rational Mech.

Anal. 3 (1954), 13–45.

Bibliography 363

B. Host and B. Kra[2005a] Convergence of polynomial ergodic averages, Israel J. Math. 149

(2005), 1–19.[2005b] Nonconventional ergodic averages and nilmanifolds, Ann. of Math.

(2) 161 (2005), no. 1, 397–488.

A. Ionescu Tulcea[1964] Ergodic properties of isometries in Lp spaces, 1 < p < ∞, Bull. Amer.

Math. Soc. 70 (1964), 366–371.[1965] On the category of certain classes of transformations in ergodic the-

ory, Trans. Amer. Math. Soc. 114 (1965), 261–279.

K. Jacobs[1956] Ergodentheorie und fastperiodische Funktionen auf Halbgruppen,

Math. Z. 64 (1956), 298–338.

R. I. Jewett[1970] The prevalence of uniquely ergodic systems, J. Math. Mech. 19

(1969/1970), 717–729.

L. K. Jones and V. Kuftinec[1971] A note on the Blum–Hanson theorem, Proc. Amer. Math. Soc. 30

(1971), 202–203.

L. K. Jones and M. Lin[1976] Ergodic theorems of weak mixing type, Proc. Amer. Math. Soc. 57

(1976), no. 1, 50–52.

S. Kakutani[1938] Iteration of linear operations in complex Banach spaces, Proc. Imp.

Acad. 14 (1938), no. 8, 295–300.[1943] Induced measure preserving transformations, Proc. Imp. Acad. Tokyo

19 (1943), 635–641.

S. Kalikow and R. McCutcheon[2010] An Outline of Ergodic Theory, Cambridge Studies in Advanced Math-

ematics, vol. 122, Cambridge University Press, Cambridge, 2010.

364 Bibliography

A. Katok and B. Hasselblatt[1995] Introduction to the Modern Theory of Dynamical Systems, Encyclope-

dia of Mathematics and its Applications, vol. 54, Cambridge Univer-sity Press, Cambridge, 1995. With a supplementary chapter by Katokand Leonardo Mendoza.

A. S. Kechris[1995] Classical Descriptive Set Theory, Graduate Texts in Mathematics, vol.

156, Springer-Verlag, New York, 1995.

J. L. Kelley[1975] General Topology, Graduate Texts in Mathematics, Springer-Verlag,

New York, 1975. Reprint of the 1955 edition [Van Nostrand, Toronto,Ont.].

M. Kern, R. Nagel, and G. Palm[1977] Dilations of positive operators: construction and ergodic theory,

Math. Z. 156 (1977), no. 3, 265–277.

A. Kolmogoroff[1933] Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Math.

2, Nr. 3, IV + 62 S , 1933 (German).

A. N. Kolmogoroff[1925] Sur les fonctions harmoniques conjugees et les series de Fourier,

Fund. Math. 7 (1925), 23–28.

B. Koopman and J. von Neumann[1932] Dynamical systems of continuous spectra, Proc. Natl. Acad. Sci. USA

18 (1932), 255–263 (English).

B. Kra[2006] The Green-Tao theorem on arithmetic progressions in the primes: an

ergodic point of view, Bull. Amer. Math. Soc. (N.S.) 43 (2006), no. 1,3–23 (electronic).

[2007] Ergodic methods in additive combinatorics, Additive combinatorics,CRM Proc. Lecture Notes, vol. 43, Amer. Math. Soc., Providence, RI,2007, pp. 103–143.

Bibliography 365

U. Krengel[1985] Ergodic Theorems, de Gruyter Studies in Mathematics, vol. 6, Walter

de Gruyter & Co., Berlin, 1985. With a supplement by Antoine Brunel.

W. Krieger[1972] On unique ergodicity, Proceedings of the Sixth Berkeley Symposium

on Mathematical Statistics and Probability (Univ. California, Berkeley,Calif., 1970/1971), Vol. II: Probability theory (Berkeley, Calif.), Univ.California Press, 1972, pp. 327–346.

L. Kuipers and H. Niederreiter[1974] Uniform Distribution of Sequences, Pure and Applied Mathematics,

Wiley-Interscience [John Wiley & Sons], New York, 1974.

K. Kuratowski[1966] Topology. Vol. I, Academic Press Inc., New York, 1966. New edition,

revised and augmented. Translated from the French by J. Jaworowski.

S. Lang[1993] Real and Functional Analysis, 3rd ed., Graduate Texts in Mathematics,

vol. 142, Springer-Verlag, New York, 1993.[2005] Algebra, 3rd ed., Graduate Text in Mathematics, vol. 211, Springer-

Verlag, 2005.

E. R. Lorch[1939] Means of iterated transformations in reflexive vector spaces, Bull.

Amer. Math. Soc. 45 (1939), 945–947.

M. Mathieu[1988] On the origin of the notion ’ergodic theory’, Expo. Math. 6 (1988),

373–377.

R. E. Megginson[1998] An Introduction to Banach Space Theory, Graduate Texts in Mathe-

matics, vol. 183, Springer-Verlag, New York, NY, 1998.

V. Muller and Y. Tomilov[2007] Quasisimilarity of power bounded operators and Blum–Hanson prop-

erty, J. Funct. Anal. 246 (2007), no. 2, 385–399.

366 Bibliography

R. Nagel and G. Palm[1982] Lattice dilations of positive contractions on Lp-spaces, Canad. Math.

Bull. 25 (1982), no. 3, 371–374.

R. J. Nagel[1973] Mittelergodische Halbgruppen linearer Operatoren, Ann. Inst. Fourier

(Grenoble) 23 (1973), no. 4, 75–87.

R. J. Nagel and M. Wolff[1972] Abstract dynamical systems with an application to operators with dis-

crete spectrum, Arch. Math. (Basel) 23 (1972), 170–176.

I. Namioka[1974] Separate continuity and joint continuity, Pacific J. Math. 51 (1974),

515–531.

G. Palm[1976] Entropie und Erzeuger in dynamischen Verbanden, Z. Wahrschein-

lichkeitstheorie und Verw. Gebiete 36 (1976), no. 1, 27–45.

A. L. Paterson[1988] Amenability, Mathematical Surveys and Monographs, vol. 29, Ameri-

can Mathematical Society, Providence, RI, 1988.

K. Petersen[1989] Ergodic Theory, Cambridge Studies in Advanced Mathematics, vol. 2,

Cambridge University Press, Cambridge, 1989. Corrected reprint ofthe 1983 original.

R. R. Phelps[1966] Lectures on Choquet’s Theorem, D. Van Nostrand Co., Inc., Princeton,

N.J.-Toronto, Ont.-London, 1966.

M. Pollicott and M. Yuri[1998] Dynamical Systems and Ergodic Theory, London Mathematical Soci-

ety Student Texts, vol. 40, Cambridge University Press, Cambridge,1998.

Bibliography 367

R. A. Raimi[1964] Minimal sets and ergodic measures in βN−N, Bull. Amer. Math. Soc.

70 (1964), 711–712.

I. K. Rana[2002] An Introduction to Measure and Integration, 2nd ed., Graduate Studies

in Mathematics, vol. 45, American Mathematical Society, Providence,RI, 2002.

M. Redei and C. Werndl[2012] On the history of the isomorphism problem of dynamical systems with

special regard to von Neumann’s contribution, Archive for History ofExact Sciences 66 (2012), no. 1, 71–93.

M. Reed and B. Simon[1972] Methods of Modern Mathematical Physics. I: Functional Analysis,

Academic Press Inc., New York, San Francisco, London, 1972.

V. Rohlin[1948] A “general” measure-preserving transformation is not mixing, Dok-

lady Akad. Nauk SSSR (N.S.) 60 (1948), 349–351.

A. Rosenthal[1988] Strictly ergodic models for noninvertible transformations, Israel J.

Math. 64 (1988), no. 1, 57–72.

H. L. Royden[1963] Real Analysis, The Macmillan Co., New York, 1963.

W. Rudin[1987] Real and Complex Analysis, 3rd ed., McGraw-Hill Book Co., New

York, NY, 1987.[1990] Fourier Analysis on Groups, paperback ed., John Wiley & Sons., New

York etc., 1990. Wiley Classics Library; A Wiley-Interscience Publi-cation.

[1991] Functional Analysis, 2nd ed., International Series in Pure and AppliedMathematics, McGraw-Hill Book Co., New York, NY, 1991.

368 Bibliography

C. Ryll-Nardzewski[1967] On fixed points of semigroups of endomorphisms of linear spaces,

Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berke-ley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part1, Univ. California Press, Berkeley, Calif., 1967, pp. 55–61.

H. H. Schaefer[1974] Banach Lattices and Positive Operators, Die Grundlehren der math-

ematischen Wissenschaften [Fundamental Principles of Mathemati-cal Sciences], vol. 215, Springer-Verlag, Berlin-Heidelberg-New York,1974.

[1980] Topological Vector Spaces, 4th corrected printing, Graduate Texts inMathematics, vol. 3, Springer-Verlag, New York-Heidelberg-Berlin,1980.

C. E. Silva[2008] Invitation to Ergodic Theory, Student Mathematical Library, vol. 42,

American Mathematical Society, Providence, RI, 2008.

E. M. Stein[1961a] On limits of seqences of operators, Ann. of Math. (2) 74 (1961), 140–

170.[1961b] On the maximal ergodic theorem, Proc. Nat. Acad. Sci. U.S.A. 47

(1961), 1894–1897.[1993] Harmonic Analysis: Real-Variable Methods, Orthogonality, and Os-

cillatory Integrals, Princeton Mathematical Series, vol. 43, PrincetonUniversity Press, Princeton, NJ, 1993. With the assistance of TimothyS. Murphy.

E. Szemeredi[1975] On sets of integers containing no k elements in arithmetic progression,

Acta Arith. 27 (1975), 199–245. Collection of articles in memory ofJuriı Vladimirovic Linnik.

T. Tao[2007] The dichotomy between structure and randomness, arithmetic progres-

sions, and the primes, International Congress of Mathematicians. Vol.I, Eur. Math. Soc., Zurich, 2007, pp. 581–608.

[2008a] Topics in Ergodic Theory,http://terrytao.wordpress.com/category/254a-ergodic-theory/, 2008.

Bibliography 369

[2008b] The van der Corput trick, and equidistribution on nilmanifolds, Topicsin Ergodic Theory, 2008.http://terrytao.wordpress.com/2008/06/14/the-van-der-corputs-trick-and-equidistribution-on-nilmanifolds.

S. Todorcevic[1997] Topics in Topology, Lecture Notes in Mathematics, vol. 1652,

Springer-Verlag, Berlin, 1997.

B. L. van der Waerden[1927] Beweis einer Baudetschen Vermutung, Nieuw Archief 15 (1927), 212–

216 (German).

H. Vogt[2010] An Eberlein-Smulian type result for the weak∗ topology, Arch. Math.

(Basel) 95 (2010), no. 1, 31–34.

J. von Neumann[1932a] Einige Satze uber messbare Abbildungen, Ann. of Math. (2) 33 (1932),

no. 3, 574–586.[1932b] Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci. USA 18

(1932), 70–82 (English).

P. Walters[1982] An Introduction to Ergodic Theory, Graduate Texts in Mathematics,

vol. 79, Springer-Verlag, New York, 1982.

H. Weyl[1916] Uber die Gleichverteilung von Zahlen mod. Eins, Math. Ann. 77

(1916), no. 3, 313–352.

N. Wiener[1939] The ergodic theorem, Duke Math. J. 5 (1939), no. 1, 1–18.

E. Wigner[1960] The unreasonable effectiveness of mathematics in the natural sciences,

Comm. Pure Appl. Math. 13 (1960), 1–14.

S. Willard[2004] General Topology, Dover Publications Inc., Mineola, NY, 2004.

Reprint of the 1970 original [Addison-Wesley, Reading, MA].

370 Bibliography

K. Yosida[1938] Mean ergodic theorem in Banach spaces, Proc. Imp. Acad. 14 (1938),

no. 8, 292–294.

K. Yosida and S. Kakutani[1939] Birkhoff’s ergodic theorem and the maximal ergodic theorem, Proc.

Imp. Acad., Tokyo 15 (1939), 165–168.

N. Young[1988] An introduction to Hilbert space, Cambridge Mathematical Textbooks,

Cambridge University Press, Cambridge, 1988.

Index

absolute value, see modulusaccumulation point, 296algebra, 320

C∗-algebra, 321abstract measure algebra, 179Boole algebra, 179disc algebra, 57homomorphism, 321isomorphism, 321isomorphism of MDSs, 180isomorphism of measure spaces, 180measure algebra, 75of sets, 306

almost everywhere, 313almost periodic

sequence, 49strongly, 236weakly, 236

almost stable subspace, 134almost weakly stable, 134, 239

semigroup, 241alphabet, 8amenable semigroup, 215, 238arithmetic progression, 275asymptotic density, 132automorphism

of a TDS, 12

Baire algebra, 66Baire measure, 66Baire space, 302baker’s transformation, 61, 178Banach algebra, 320Banach density, 286Banach lattice, 95Banach limit, 161Banach space, 319

Bernoulli shift, 64bi-dual

group, 223of a normed space, 324

bi-invariant set (in a TDS), 15bi-Markov operator, 193block, 16Bohr-compactification, 224Borel algebra, 66, 306Borel measure, 66bounded gaps, 30bp-covergence, 334

Cantor set, 8, 301Cantor system, 8Cech–Stone compactification, 49ceiling, 86Cesaro mean, 108Cesaro-bounded, 117character

group, 219of T, 48of a group, 257of an Abelian group, 219orthogonality, 220

character group, 219characteristic function, 305clopen set, 301closed absolutely convex hull, 349cluster point, 296

of a net, 304compact

countably, 353relatively countably compact, 354relatively sequentially compact, 354relatively weakly compact, 326relatively weakly sequentially compact, 326

371

372 Index

sequentially, 354compact group, 9, 217compact space, 300compactification

Bohr, 224one-point, 49, 301two-point, 18

complement (in a lattice), 179complete

contraction, 119isomorphism invariant, 266lattice, 92metric space, 294order complete Banach lattice, 97relatively complete sub-σ -algebra, 201

conditional expectation, 112, 199operator, 202

conditional probability, 81congruence (TDS), 12conjugate exponent, 125conjugation

invariant, 40of elements (in a vector lattice), 95

convergence in density, 132convolution on L1(G), 254coordinate function, 248, 253cylinder set, 311, 312

d-torus, 13, 269decomposition

Jacobs–de Leeuw–Glicksberg, 239mean ergodic, 109, 239

densityasymptotic, 132Banach density, 286convergence in density, 132lower, 143upper, 143

Diophantine approximation, 33Dirac functional, 43directed set, 303disconnected

extremally, 302totally, 301

discrete spectrumdyadic integers, 270dynamical system, 262operator, 261rotation on torus, 269

disjoint union of TDSs, 16doubling map, 24, 62, 176

TDS, 24dual group, see character groupdual of a normed space, 323

Dunford–Schwartz operator, 119dyadic adding machine, 11dyadic integers, 11, 270dynamical system, 2

measure-preserving, 61topological, 7topological MDS, 183with discrete spectrum, 262

Dynkin system, 307

eigenvalueof an operator, 330peripheral, 47

equidistributed sequence, 157Ergode, 1Ergodenhypothese, 107ergodic

group rotation, 154hypothesis, 107Markov shift, 84, 115MDS, 82MDS (characterisation), 100measure, 149rotation on torus, 101strongly, 136uniquely, 150

ergodic hypothesis, 4ergodic theorem

Akcoglu’s, 165dominated, 174individual, 163maximal, 168Mean ergodic theorem, 110pointwise, 163, 164stochastic, 165von Neumann’s mean ergodic theorem, 108

ergodisches System, 1essential inverse of MDSs, 76essentially

equal sets, 75invertible (MDS), 76

excluded block, 17extremally disconnected, 302extreme point, 327

factor, 205bi-invariant, 204Kronecker, 240Markov operator, 197model, 205of an MDS, 204

finite intersection property, 299first category, 302fixed space, 47, 100, 108

Index 373

forward (topologically) transitive, 18full subalgebra, 185functional

Dirac, 43evaluation, 43linear, 323order continuous, 187positive, 336

Furstenberg Correspondence, 278, 284

Gauss map, 63Gδ -set, 302Gelfand map, 51Gelfand space, 49, 50generating element, 263generator of a σ -algebra, 306group, 212

Abelian, 211bi-dual, 223character, 219compact, 9, 217dual, 219dyadic integers, 11Heisenberg, 10inverse element, 212left rotation, 9locally compact group, 217monothetic, 263representation, 245topological, 9, 213unitary, 353

group extension of a TDS, 15group representation, 245

finite dimensional, 252left regular, 254strongly continuous, 245unitary, 248weakly continuous, 245

group rotation, 9ergodic, 154MDS, 70minimal, 154TDS, 9

group translation TDS, 9

Haar measure, 70, 218, 352Heisenberg group, 10Hilbert space, 330homeomorphism, 296homogeneous system, 13homomorphism

of algebras, 321of Banach lattices, 96of lattices, 93

of MDSs, 179of measure algebras, 179of TDSs, 11point homomorphism of MDSs, 175

idealin a Banach lattice, 98in a semigroup, 212in an algebra, 321maximal, 321principal, 321proper, 321

idempotent, 211induced transformation, 85infimum in a lattice, 92infinite product probability space, 311infinitely recurrent set, 78integral, 309intertwining, 182, 252invariant measure, 60invariant set

in a TDS, 15in an MDS, 82

inverseessential (MDS), 76in a Banach algebra, 320

invertibleMDS, 76TDS, 7

involution, 321irrational rotation, 20irreducible

matrix, 21positive operator, 99row-stochastic matrix, 114unitary representation, 248, 252

isolated point, 296isometric TDS, 29isomorphism∗-isomorphism, 321algebra isomorphism of measure spaces, 180Markov, 181of algebras, 321of lattices, 93of MDSs, 180of TDSs, 12of topological groups, 222point isomorphism of MDSs, 176point isomorphism of measure spaces, 180spectral, 269

isomorphism invariant, 266isomorphism problem, 269iterate MDS, 128

374 Index

Jacobs–de Leeuw–Glicksberg decomp., 239reversible part, 239stable part, 239

JdLG-decomposition, see Jacobs–deLeeuw–Glicksberg decomposition

Koopman operator, 37, 45, 91Kronecker factor, 240

lattice, 92Banach lattice, 95distributive, 179

law of large numbersstrong, 171weak, 113

left ideal (in a semigroup), 212left rotation, 9left rotation operator, 48left shift, 8Lemma

Fekete, 56Hopf, 174Kakutani–Rokhlin, 85Koopman–von Neumann, 133Pythagoras’, 331Urysohn, 38van der Corput, 139Weil, 263

letters, 8limit

of a net, 303projective, 11

linear span, 320locally compact group, 217locally compact space, 301locally convex space, 327

Markovembedding, 181, 196embedding of MDSs, 182isomorphism, 181measure, 64operator, 149, 161, 193shift, 65

Markov operator, 180automorphism, 199embedding, 196factor map, 197isomorphism, 198projection, 199trivial, 194

Markov shift, 65transition matrix, 65

matrix

aperiodic, 130irreducible, 21, 114primitive, 130row-stochastic, 114row-substochastic, 8strictly positive, 114transition, 21, 65

maximal ideal, 321in an algebra, 44

maximal inequality, 166maximal operator, 166MDS, 61

baker’s transformation, 61discrete spectrum, 262doubling map, 62ergodic, 82factor, 204faithful topological, 183Gauss map, 63group rotation, 70homogeneous system, 71invariant set, 82invertible, 76iterate, 128metric, 183product, 65right rotation system, 70skew product, 65spectrally isomorphic, 269tent map, 62topological, 183

mean ergodicoperator, 109, 116projection, 109, 203subalgebra, 189

measurablecylinder set, 311mapping, 308rectangle, 316set, 307

measurable space, 60, 306measure, 307

Baire, 66Borel, 66complex, 316ergodic, 149Haar, 70, 218, 352invariant, 60inversion invariant, 70Markov, 64outer, 308probability, 307product, 311push-forward, 60, 310

Index 375

regular, 66regular Borel, 66right-invariant, 70signed, 316strictly positive, 69, 151support, 68total variation, 316

measure algebra, 49, 75measure space, 307measure-preserving dyn. system, see MDSmeasure-preserving map, 60metric, 293

discrete, 294model, 185space, 293

metrisabilityof weak topology, 325of weak∗ topology, 324of weakly compact sets, 326

minimalgroup rotation, 154subsystem, 28TDS, 28

minimal ideal (in a semigroup), 212minimal idempotent, 212mixing

Bernoulli shift, 126Markov shift, 129strongly, 126weakly, 132weakly of all orders, 138, 142weakly of order k, 138

modelof a factor, 205Stone, 186topological, 183topological model of a factor, 205

modulusin a lattice, 95of a functional, 338

Morse sequence, 36multiplicative (linear map), 321

net, 303Neumann series, 52norm

order continuous, 97strictly monotone, 97

normalnumber, 174simply normal number, 170

nowhere dense, 302null set, 313

one-dimensional torus, 9one-sided shift, 8operator

adjoint, 324bi-Markov, 193contraction, 322Dunford–Schwartz, 119intertwining (representations), 252isometry, 207, 322Koopman, 37, 45, 91left rotation, 48Markov, 149, 161, 180, 193maximal, 166mean ergodic, 109, 116pointwise ergodic, 164positive, 96, 336positive semi-definite, 208power-bounded, 329projection, 322with discrete spectrum, 261

operator semigroupweakly compact, 232, 234

orbit, 17forward, 17total, 17

order continuousfunctional, 187norm, 97

order interval, 97order of a subshift, 17orthogonal projection, 207oscillation, 350

partition of unity, 337peripheral point spectrum, 47Poincare sequence, 285point

almost periodic, 30generic, 173infinitely recurrent, 30periodic , 30recurrent, 30

point factor map, 175point spectrum

of an operator, 330peripheral, 47

pointwise topology (on the dual group), 221polar (of a set), 348polarisation identity, 330positive functional, 336positive operator, 96

Holder inequality, 103irreducible, 99reducible, 99

376 Index

positive semi-definite operator, 208power-bounded, 119principal ideal, 321Principle

Banach, 166Furstenberg Correspondence, 278, 284of determinism, 2of uniform boundedness, 322pigeonhole, 36

probabilitymeasure, 307

probability space, 307product

σ -algebra, 310measure, 311of MDSs, 65of TDSs, 13of TDSs (infinite), 14of TDSs (skew), 14topology, 298

projectionmean ergodic, 109, 203orthogonal, 207

quotientmapping, 297space, 320

random literature, 89rational rotation, 20recurrent set, 78reducible positive operator, 99reducing subspace, 252reflexive space, 324relatively compact, 300relatively complete sub-σ -algebra, 201relatively dense, see bounded gaps, 143, 282representation

Gelfand, 50of a group, 245Stone, 186Stone representation space, 186

return timeexpected, 84of almost periodic points, 30time of first return, 81

reversible part, 239right ideal (in a semigroup), 212Rokhlin tower, 86rotation, 9

irrational, 20rational, 20TDS, 9torus, 9, 269

row-stochastic matrix, 114irreducible, 114

row-substochastic matrix, 8

σ -additive, 305σ -algebra, 306

invariant, 112second category, 302semigroup, 211

JdLG-admissible, 237Abelian, 211amenable, 238compact left-topological, 213generated by a single operator, 236ideal, 212idempotent element, 211left ideal, 212left-topological, 213right ideal, 212right-topological, 213semitopological, 213, 349subsemigroup, 212weakly compact, 232

separablemetric space, 320topological space, 295

separately continuous, 349separating, 299

subalgebra, 40subset, 118

sequencealmost periodic, 49equidistributed, 157Morse, 36Poincare, 285

shift, 8Bernoulli, 64, 176, 177, 271left, 8left (operator), 123Markov, 65one-sided, 8right (operator), 123TDS, 8two-sided, 8two-sided Bernoulli, 178

skew productcocyle, 15of MDSs, 65of TDSs, 14

skew rotation, 15, 156spectral radius

formula, 54in unital algebras, 51of an operator, 329

Index 377

spectrally isomorphic, 269spectrum

in unital algebras, 51of an operator, 329

stable∧ , ∨ -stable, 92∩, ∪, \-stable, 305

stable set in a TDS, 15step function, 315Stone representation, 186strictly convex Banach space, 238strictly ergodic TDS, 152strictly positive measure, 69, 151strongly

ergodic, 136mixing, 126

strongly compact operator group, 245subalgebra, 40

conjugation invariant, 40full, 185in C(K), 40mean ergodic, 189

sublattice, 92subsemigroup, 212subshift, 16

of finite type, 17subsystem

maximal surjective, 16, 162minimal, 28of a TDS, 15

support of a measure, 68, 151supremum in a lattice, 92Sushkevich kernel, 212symmetric set (in a group), 217syndetic, see bounded gaps, 282

TDS, 7automorphism, 12bi-invariant set, 15conjugation, see isomorphism of TDSsdiscrete spectrum, 262disjoint union, 16doubling map, 24dyadic adding machine, 11extension, 11factor, 11factor map, 11homogeneous, 13homomorphism, 11invariant set, 15invertible, 7isometric, 29isomorphism, 12minimal, 28

skew rotation, 15strictly ergodic, 152subshift, 16subsystem, 15surjective, 7tent map, 24uniquely ergodic, 150

tent map, 24, 62, 177MDS, 62TDS, 24

TheoremAkcoglu’s ergodic theorem, 165Banach’s principle, 166Banach–Alaoglu, 324Banach–Steinhaus, 322Bipolar theorem, 348Birkhoff’s pointwise ergodic theorem, 163Birkhoff’s recurrence theorem, 32Blum–Hanson, 130Borel, 170Caratheodory, 308Dirichlet, 36Doinated convergence theorem, 314Dominated ergodic theorem, 174Dynkin, 307Eberlein, 346Eberlein–Smulian, 325, 355Ellis (minimal idempotents), 213Ellis (topological groups), 215, 353Existence of Haar measure, 218, 352Fubini, 316Furstenberg, 156Furstenberg–Sarkozy, 283Furstenberg–Weiss, 285Gelfand–Mazur, 53Goldstine, 325Green–Tao, 4, 276Grothendieck, 347, 354Hahn, 308Hahn–Banach, 323Halmos–von Neumann, 268Heine–Borel, 301Host–Kra, 279Inverse Mapping, 323Ionescu Tulcea, 312Jewett–Krieger, 190Jones–Lin, 134Kac, 84Kolmogorov’s law of large numbers, 171Kreın–Smulian, 326, 348Kreın–Milman, 327Kronecker, 20, 226Krylov–Bogoljubov, 69, 148Markov–Kakutani, 148

378 Index

Maximal ergodic theorem, 168Maximal inequality, 168Mazur, 325Mean ergodic theorem for squares, 284Milman, 327Monotone convergence theorem, 310Perron, 72, 114Peter–Weyl, 257Poincare recurrence theorem, 78Pointwise ergodic theorem, 164Pontryagin duality theorem, 224Principle of Uniform Boundedness, 322Riesz’ representation theorem, 68Riesz–Frechet, 331Riesz–Markov, 340Roth, 283Spectral theorem, 332Stochastic ergodic theorem, 165Stone–Weierstraß, 40Szemeredi, 5, 276Tietze, 39Tonelli, 310Tychonov, 300Uniqueness theorem for measures, 307Van der Waerden, 275von Neumann (isomorphism of MDSs), 180von Neumann’s mean ergodic theorem, 108Weyl, 158

time mean, 3topological dynamical system, see TDS

backward, 7forward, 7

topological group, 9, 213topological MDS, 183topological model, 183topological semigroup, 213topological space, 294

connected, 301disconnected, 301metrisabile, 294normal, 38Polish, 303separable, 42, 295

topological vector space, 326topologically transitive, 18topology, 294

compact, 300compact-open (on the dual group), 221discrete, 294Hausdorff, 295inductive, 297pointwise, 345pointwise (on the dual group), 221product, 298

projective, 297quotient, 297strong operator, 231, 327subspace, 297weak, 325weak operator, 231, 328weak∗, 324

torus, 9d-torus, 13, 269monothetic, 269one-dimensional, 9rotation, 269

total orbit, 17totally disconnected, 271, 301transition matrix, 21

of a Markov shift, 65translation, see rotationtranslation mod 1, 9translation system, 9trigonometric polynomial, 221two-sided shift, 8

uniformly continuous, 217, 296uniquely ergodic TDS, 150unit element, 320unitary

group, 353representation, 248system, 248system for a semigroup, 250

unitary group, 233unitary representation, 252

equivalent, 257irreducible, 252

upper Banach density, 286

vector latticeimaginary part, 95positive cone, 94, 95positive element, 94real, 94real part, 95

weakly compactoperator group, 245operator semigroup, 232

weakly mixing, 132of all orders, 138, 142of order k, 138

weakly stable, 128words, 8

zero element (in a semigroup), 211

Symbol Index

379

operator theoretic aspects of ergodic theoryhaase/files/efhn-july2012.pdf · ergodic theory has its...

Documents