
Dynamics and numbers

Lecture notes for the workshop

Analysis and Dynamics in Number Theory

Simon Kristensen

December 13, 2018

Contents

Preface

1 The Birkhoff ergodic theorem
1.1 From abstract dynamics to ergodic theory
1.2 Ergodic theorems

2 Markov shifts and the base b-map
2.1 Base b-maps, coin flips and shift maps
2.2 Isomorphic systems
2.3 Borel's theorem on normal numbers

3 Rotations of the circle and unique ergodicity
3.1 Unique ergodicity

4 The Gauss map and the Gauss measure
4.1 Dynamical methods

Bibliography


Preface

These are the notes for my short course at the Abdus Salam School of Mathematical Sciences during the Second Winter Workshop on Advanced Topics in Mathematics in December 2018.

The course is concerned with the interplay between arithmetic, in particular numeration, and dynamical systems, in particular ergodic theory. While the theory of numeration systems and arithmetic reflects the fundamentals of our understanding of the real numbers, ergodic theory has its roots in physics and Boltzmann's ergodic hypothesis. Thus, it is perhaps at first glance surprising that the two are in any way related. We will however see that they are in fact intimately related, and that arithmetical phenomena can be explained via the ergodic theorem.

The course is organized in four lectures. In the first lecture, we give a brief introduction to ergodic theory, completely independent of arithmetical questions. The remaining three lectures each study a dynamical system of arithmetic origin and the interplay between its arithmetic and dynamical properties. The first is concerned with the base $b$-map, which encodes the base $b$-expansion of a real number. The second is concerned with rotations of a circle through an angle which is an irrational multiple of $2\pi$. The third is concerned with the so-called Gauss map, which encodes the expansion of a real number as a simple continued fraction.

Each lecture comes with a set of exercises.

I am greatly indebted to the work of Manfred Einsiedler and Tom Ward, who wrote the excellent monograph Ergodic theory: with a view towards number theory [3]. Most of the material in these lectures can be found in that book, and any participant in the workshop wishing to dig deeper into these topics should definitely read it. Some background in analysis is required. A good reference is [6].


1 The Birkhoff ergodic theorem

We will describe the fundamental concepts of ergodic theory and give a proof of the von Neumann ergodic theorem as well as the Birkhoff ergodic theorem.

1.1 From abstract dynamics to ergodic theory

Let us begin with an abstract definition of a dynamical system. By a dynamical system, we mean a set $X$ and a semigroup with identity $G$ acting on $X$, i.e. for each $g \in G$, there is a map $f_g : X \to X$, and semigroup multiplication corresponds to composition of maps, so that for $g_1, g_2 \in G$, $f_{g_1 g_2} = f_{g_1} \circ f_{g_2}$. Furthermore, if $e \in G$ denotes the identity, $f_e$ is the identity map.

The setup given here is too general to do anything useful with, so we start making restrictions. First, consider the semigroup. It is natural to think of a dynamical system as the points of $X$ moving around as time passes. Therefore, we are inclined to think of the semigroup as time. We will measure time either by $\mathbb{N}_0$ (discrete time) or by $\mathbb{R}_{\ge 0}$ (continuous time), both as additive semigroups. Most of the systems considered here are with discrete time. The reader should be aware that this is already a significant restriction. Many interesting dynamical systems, even from the point of view of number theory, are multi-parameter systems.

If the semigroup in question is in fact a group, this forces the maps by which it acts on $X$ to be invertible. In this case, we have an action of $\mathbb{Z}$ or $\mathbb{R}$ in the one-parameter setting. Both these situations will occur in these notes.

Now for the set $X$ (the phase space). Having restricted ourselves to one-parameter actions, we define the orbit of a point $x \in X$ to be the set $\{f_g(x) : g \in G\}$. In dynamics, we are typically interested in the behaviour of the orbits, but unless more structure is imposed on $X$, studying the dynamical system given by $X$ and $G$ would be a rather dull affair.

Usually, a set has a lot more structure. Furthermore, the structure should be reflected in the maps by which $G$ acts. If $X$ is a topological space and $G$ acts by continuous maps (or homeomorphisms in the invertible case), this places our system in the realm of topological dynamics, where one typically studies topological properties of orbits (density, limit points, etc.). If $X$ is a smooth manifold and $G$ acts by smooth maps or diffeomorphisms, we are studying differentiable dynamics, and the problems mostly boil down to the study of differential equations.

Most of the systems considered here will fall into at least one of these two categories, but these structures will not be our focus. Instead, we will insist that our phase space is a probability space, and that the maps considered preserve the underlying probability measure (we will be more precise in a little while). This will lead to a study of statistical properties of the orbits.

We now restrict to the exact situation to be studied here. We will let $(X, \mathcal{B}, \mu)$ be a probability space and $T : X \to X$ be some surjective map. The semigroup $\mathbb{N}_0$ acts on $X$ by iterations of the map $T$, i.e. $T_n = T^n$, where $T^n$ denotes composition of $T$ with itself $n$ times.

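Definition 1.1. Let $(X, \mathcal{B}, \mu)$ be a probability space. A map $T : X \to X$ is said to be measure preserving if for any set $A \in \mathcal{B}$, $\mu(T^{-1}A) = \mu(A)$.

Likewise, if $(X, \mathcal{B})$ is a measurable space and $T : X \to X$ is a map, we say that a measure $\mu$ on $(X, \mathcal{B})$ is $T$-invariant if $T$ is measure preserving with respect to $\mu$.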

Measure preserving maps are the building blocks of what we will be working with. Let us begin by giving a criterion for a measure to be $T$-invariant.

Lemma 1.2. A measure $\mu$ on $X$ is $T$-invariant if and only if
\[
\int f \, d\mu = \int f \circ T \, d\mu
\]
for all $f \in L^\infty(\mu)$. In this case, the integral identity holds for all $f \in L^1(\mu)$ and indeed for $f \in L^p(\mu)$ for any $p \in [1, \infty]$.

Proof. Suppose first that the integral identity holds and let $A \in \mathcal{B}$. Let $f = \chi_A$, the indicator function of $A$. Then,
\[
\mu(A) = \int \chi_A \, d\mu = \int \chi_A \circ T \, d\mu = \int \chi_{T^{-1}A} \, d\mu = \mu(T^{-1}A),
\]
so that $\mu$ is $T$-invariant.

Suppose now that $\mu$ is $T$-invariant. By the above calculation, the integral identity holds true for $\chi_A$ for any $A \in \mathcal{B}$, and hence by linearity of the integral for any simple function. Let $(f_n)$ be a sequence of simple functions increasing to some $f \in L^p(\mu)$. Then, the sequence $(f_n \circ T)$ increases to $f \circ T$, and the result follows by monotone convergence.
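As a numerical illustration of Lemma 1.2, the following sketch compares the two integrals for the doubling map $x \mapsto 2x \bmod 1$ on $[0, 1)$ with Lebesgue measure, which is known to be invariant under this map. The choice of map, the test function $f(x) = x^2$ and the helper names are assumptions made for the example and are not part of the lemma.

```python
def T(x):
    """Doubling map on [0, 1); an illustrative choice of measure preserving map."""
    return (2.0 * x) % 1.0


def midpoint_integral(h, n=100_000):
    """Approximate the Lebesgue integral of h over [0, 1) by the midpoint rule."""
    return sum(h((k + 0.5) / n) for k in range(n)) / n


def f(x):
    """Test function f(x) = x^2, whose integral over [0, 1) is 1/3."""
    return x * x


lhs = midpoint_integral(f)                   # approximates the integral of f
rhs = midpoint_integral(lambda x: f(T(x)))   # approximates the integral of f o T
print(lhs, rhs)
```

Both values should agree with $1/3$ up to the discretisation error of the midpoint rule.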

Applying the operation on the right hand side of the integral identity of Lemma 1.2 with $p = 2$ suggests that some operator on $L^2(\mu)$ is relevant, and indeed this is the case. We define an operator
\[
U_T : L^2(\mu) \to L^2(\mu) \quad \text{by} \quad U_T(f) = f \circ T.
\]
It is easy to see that this is an isometry. Indeed, for $f_1, f_2 \in L^2(\mu)$,
\[
\langle U_T f_1, U_T f_2 \rangle = \int f_1 \circ T \cdot \overline{f_2 \circ T} \, d\mu = \int f_1 \overline{f_2} \, d\mu = \langle f_1, f_2 \rangle.
\]
The middle equality follows by Lemma 1.2. If furthermore $T$ is invertible, $U_T$ becomes invertible and hence a unitary operator.

We conclude this section with the definition of ergodicity. If $(X, \mathcal{B}, \mu)$ is a probability space and $T : X \to X$ is a measure preserving transformation, it is entirely possible that the space may be decomposed into two or more $T$-invariant components. In an ergodic system, this cannot happen, at least not in a non-trivial way. We formalize this in a definition.

Definition 1.3. A measure preserving transformation $T : X \to X$ of a probability space $(X, \mathcal{B}, \mu)$ is ergodic if for any $A \in \mathcal{B}$, if $T^{-1}A = A$, then $\mu(A) \in \{0, 1\}$.

Likewise, if $T : X \to X$ is a transformation of a measurable space $(X, \mathcal{B})$, then a measure $\mu$ is said to be ergodic if $T$ is an ergodic transformation of the probability space $(X, \mathcal{B}, \mu)$.

A set satisfying the condition that $T^{-1}A = A$ is said to be $T$-invariant. In the exercises, you will prove that an equivalent definition of ergodicity requires that $\mu(A) \in \{0, 1\}$ whenever $\mu(T^{-1}A \triangle A) = 0$. Here, $A \triangle B = (A \setminus B) \cup (B \setminus A)$, the symmetric set difference. Such sets are called almost $T$-invariant.

Yet another characterization of ergodicity, which is useful to us, is in terms of $T$-invariant functions, i.e. functions such that $f = f \circ T$ almost everywhere.

Proposition 1.4. Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measure preserving transformation. $T$ is ergodic if and only if any (complex valued) $T$-invariant function is constant almost everywhere.


Proof. We first assume that $T$ is ergodic and assume that $f$ is $T$-invariant. Then, the real and imaginary parts of $f$ are also $T$-invariant, so without loss of generality, $f$ is real valued. For a fixed $k \in \mathbb{Z}$ and $n \in \mathbb{N}$, let
\[
A_n^k = \{x \in X : f(x) \in [\tfrac{k}{n}, \tfrac{k+1}{n})\}.
\]
Note that
\[
A_n^k \triangle T^{-1}A_n^k \subseteq \{x \in X : f(x) \neq f \circ T(x)\},
\]
so since $f$ is $T$-invariant and $T$ is ergodic, by Exercise 1.1, $\mu(A_n^k) \in \{0, 1\}$.

Now, for any $n \in \mathbb{N}$, $X = \bigcup_{k \in \mathbb{Z}} A_n^k$, and since $\mu(X) = 1$, $\mu(A_n^k) = 1$ for exactly one value of $k$, $k(n)$, say. But then,
\[
\mu\left( \bigcap_{n=1}^{\infty} A_n^{k(n)} \right) = 1,
\]

and $f$ is evidently constant on this set, since the values of $f$ at any two of its points differ by less than $1/n$ for every $n$. Hence $f$ is constant almost everywhere.

Conversely, suppose that any $T$-invariant function is constant almost everywhere and let $A \in \mathcal{B}$ satisfy $\mu(T^{-1}A \triangle A) = 0$. Let $f = \chi_A$. Then, $f$ is $T$-invariant, and hence constant almost everywhere. As it is an indicator function, this constant is either 0 or 1, so that $\mu(A) \in \{0, 1\}$.

It follows that $T$ is ergodic if and only if 1 is an eigenvalue of $U_T$ of multiplicity 1. This is deduced in the exercises.

1.2 Ergodic theorems

Boltzmann's ergodic hypothesis states roughly that the space mean of a process is equal to its time mean with probability 1. This is the content of the various ergodic theorems which we will now discuss. Of course, this must be made precise, and we must show that the hypothesis of a map being ergodic implies these properties. Our first result is the von Neumann ergodic theorem, which is a statement about convergence in $L^2(\mu)$. Recall that a measure preserving transformation $T$ induces an isometric operator $U_T$ on $L^2(\mu)$.

Theorem 1.5 (The von Neumann ergodic theorem). Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measure preserving transformation. Let $P_T$ denote the orthogonal projection operator onto the closed subspace of invariant functions,
\[
I = \{g \in L^2(\mu) : U_T g = g\}.
\]
Then, for every $f \in L^2(\mu)$,
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f \to P_T f
\]
for $N \to \infty$, where the convergence is in $L^2(\mu)$.

We give some initial remarks. For our purposes, the result is not very interesting, but it does give some hope of an interesting result. Namely, convergence in $L^2(\mu)$ is not appropriate for the problems we will be considering, but rather we would like convergence almost surely. This is our next project. Secondly, if the underlying transformation is ergodic, then by Proposition 1.4, the subspace $I$ is one-dimensional and consists only of equivalence classes of constant functions.

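Proof. We begin by identifying the orthogonal complement to $I$. We claim that this is the closure of the set $B = \{U_T g - g : g \in L^2(\mu)\}$, or equivalently that $I = B^\perp$. To see this, suppose first that $f \in I$, so that
\[
U_T f = f.
\]
Then, for any $g \in L^2(\mu)$,
\[
\langle f, U_T g - g \rangle = \langle U_T f, U_T g \rangle - \langle f, g \rangle = 0,
\]
since $U_T$ is an isometry, so that $f \in B^\perp$.

Now suppose that $f \in B^\perp$. Then, for any $g \in L^2(\mu)$, $\langle U_T g, f \rangle = \langle g, f \rangle$, so that
\[
\langle g, U_T^* f \rangle = \langle U_T g, f \rangle = \langle g, f \rangle,
\]
so that $U_T^* f = f$, where $U_T^*$ denotes the adjoint operator to $U_T$ as usual. Consequently,
\begin{align*}
\|U_T f - f\|_2^2 &= \langle U_T f - f, U_T f - f \rangle \\
&= \|U_T f\|_2^2 - \langle f, U_T f \rangle - \langle U_T f, f \rangle + \|f\|_2^2 \\
&= 2\|f\|_2^2 - \langle U_T^* f, f \rangle - \langle f, U_T^* f \rangle = 0,
\end{align*}
so that $f = U_T f$.

It follows from orthogonal decomposition that $L^2(\mu) = I \oplus \overline{B}$, so that any $f \in L^2(\mu)$ can be written
\[
f = P_T f + h,
\]
where $h \in \overline{B}$. It now suffices to prove that for such an $h$,
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n h \to 0
\]
in $L^2(\mu)$. If $h = U_T g - g \in B$, this is trivial, as the sum telescopes and
\[
\left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n (U_T g - g) \right\|_2 = \frac{1}{N} \|U_T^N g - g\|_2 \to 0.
\]
If on the other hand $h \in \overline{B} \setminus B$, we pick a sequence $(h_i)$ in $B$ converging to $h$ and apply an approximation argument. We leave this for an exercise.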
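For reference, the telescoping step in the proof can be written out as follows. This is only a spelled-out version of the computation above; the final bound uses that $U_T$ is an isometry, so that $\|U_T^N g\|_2 = \|g\|_2$.

```latex
\[
\sum_{n=0}^{N-1} U_T^n (U_T g - g)
  = \sum_{n=0}^{N-1} \bigl( U_T^{n+1} g - U_T^n g \bigr)
  = U_T^N g - g,
\qquad
\frac{1}{N} \| U_T^N g - g \|_2
  \le \frac{1}{N} \bigl( \| U_T^N g \|_2 + \| g \|_2 \bigr)
  = \frac{2}{N} \| g \|_2 .
\]
```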

We now define the ergodic averages of a function $f$ to be
\[
A_N(f) = \frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n.
\]
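The ergodic averages are straightforward to compute along a single orbit. The sketch below does this for a rotation of the circle $x \mapsto x + \alpha \bmod 1$; the rotation number $\alpha = \sqrt{2} - 1$, the indicator of $[0, 1/2)$ and the function names are hypothetical choices for illustration. Rotations are only studied in a later lecture, so the observed behaviour is an illustration and not a proof.

```python
import math


def ergodic_average(f, T, x, N):
    """Return A_N(f)(x) = (1/N) * sum_{n=0}^{N-1} f(T^n x)."""
    total = 0.0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N


ALPHA = math.sqrt(2) - 1  # hypothetical irrational rotation number


def rotation(x):
    """Rotation of the circle [0, 1) by ALPHA."""
    return (x + ALPHA) % 1.0


def indicator(x):
    """Indicator function of the interval [0, 1/2)."""
    return 1.0 if x < 0.5 else 0.0


for N in (10, 100, 1000, 10000):
    print(N, ergodic_average(indicator, rotation, 0.1, N))
```

The printed averages can be compared with the integral of the indicator function, which is $1/2$.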

The von Neumann ergodic theorem states that in $L^2$, the sequence with general term $A_N(f)$ converges to $P_T f$. We would like a pointwise statement rather than a statement on convergence in $L^2$-norm. As a first stepping stone, we extend the result to $L^1(\mu)$.

Corollary 1.6. Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measure preserving transformation. For any $f \in L^1(\mu)$, the ergodic averages $A_N(f)$ converge in $L^1(\mu)$ to a $T$-invariant function $f' \in L^1(\mu)$.

Proof. Suppose first that $g \in L^\infty(\mu) \subseteq L^2(\mu)$. Then, $A_N(g)$ converges in $L^2(\mu)$ to some $g' \in L^2(\mu)$. Furthermore, $g' \in L^\infty(\mu)$. Indeed, $\|A_N(g)\|_\infty \le \|g\|_\infty$, so that for any $B \in \mathcal{B}$,
\[
|\langle A_N(g), \chi_B \rangle| \le \|g\|_\infty \mu(B).
\]
Since $A_N(g)$ converges in $L^2(\mu)$ to $g'$, this implies that
\[
|\langle g', \chi_B \rangle| \le \|g\|_\infty \mu(B)
\]
for any $B \in \mathcal{B}$. But then, $\|g'\|_\infty \le \|g\|_\infty$, and $g' \in L^\infty(\mu)$.

Now, $\|\cdot\|_1 \le \|\cdot\|_2$ in a probability space, so $A_N(g) \to g'$ for $g \in L^\infty(\mu)$, where the convergence is in $L^1(\mu)$.

Now, let $\epsilon > 0$. Recall that $L^\infty(\mu)$ is a dense subset of $L^1(\mu)$ in the $L^1$-metric. Hence, if $f \in L^1(\mu)$, we may take $g \in L^\infty(\mu)$ with $\|f - g\|_1 < \epsilon$. Averaging and recalling that $T$ preserves $\mu$,
\[
\|A_N(f) - A_N(g)\|_1 < \epsilon.
\]
Since $g \in L^\infty(\mu)$, there is a $g' \in L^\infty(\mu)$ and an $N_0 \in \mathbb{N}$, such that
\[
\|A_N(g) - g'\|_1 < \epsilon
\]
whenever $N \ge N_0$. Let $N, N' \ge N_0$. Then, by the triangle inequality,
\begin{align*}
\|A_N(f) - A_{N'}(f)\|_1 &\le \|A_N(f) - A_N(g)\|_1 + \|A_N(g) - g'\|_1 \\
&\quad + \|g' - A_{N'}(g)\|_1 + \|A_{N'}(g) - A_{N'}(f)\|_1 < 4\epsilon.
\end{align*}
Hence, the sequence of ergodic averages $(A_N(f))$ is a Cauchy sequence in $L^1(\mu)$, which is complete by the Riesz–Fischer theorem. Hence, it is convergent.

Finally, to see that the limiting function is $T$-invariant, note that
\[
\|A_N(f) \circ T - A_N(f)\|_1 \le \frac{2}{N} \|f\|_1,
\]
since the left hand side telescopes.

For our purposes, $L^1$-convergence is also not sufficient. After all, we will mostly be interested in characteristic functions of measurable sets, which are evidently in all $L^p$-spaces. We would really like to say something about almost sure convergence. To accomplish this, we will need another two auxiliary results, namely the Maximal Inequality and the Maximal Ergodic Theorem.

Proposition 1.7 (The Maximal Inequality). Let $U : L^1(\mu) \to L^1(\mu)$ be a positive linear operator with $\|U\|_{\mathrm{op}} \le 1$, and let $f \in L^1(\mu)$ be real valued. Define $f_0 = 0$ and
\[
f_n = \sum_{k=0}^{n-1} U^k f
\]
for $n \ge 1$. Finally, let $F_N = \max\{f_n : 0 \le n \le N\}$. Then, for all $N \in \mathbb{N}$,
\[
\int_{\{x : F_N(x) > 0\}} f \, d\mu \ge 0.
\]

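Proof. Note first that $F_N \in L^1(\mu)$. Now,
\[
U F_N + f \ge U f_n + f = f_{n+1}
\]
for all $n = 0, 1, \ldots, N$, since $U$ is a positive operator and since $F_N \ge f_n$. Hence,
\[
U F_N + f \ge \max_{1 \le n \le N} f_n.
\]
Now, let $P = \{x : F_N(x) > 0\}$ be the domain of integration. For $x \in P$, since $f_0 = 0$,
\[
U F_N + f \ge \max_{1 \le n \le N} f_n = \max_{0 \le n \le N} f_n = F_N,
\]
so that
\[
f(x) \ge F_N(x) - U F_N(x).
\]
Note now that $F_N(x) \ge 0$, so $U F_N(x) \ge 0$ as the operator $U$ is positive. Finally, for $x \notin P$, $F_N(x) = 0$. Combining these inequalities,
\begin{align*}
\int_P f \, d\mu &\ge \int_P F_N \, d\mu - \int_P U F_N \, d\mu = \int_X F_N \, d\mu - \int_P U F_N \, d\mu \\
&\ge \int_X F_N \, d\mu - \int_X U F_N \, d\mu = \|F_N\|_1 - \|U F_N\|_1 \ge 0,
\end{align*}
where the latter inequality follows as $\|U\|_{\mathrm{op}} \le 1$.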

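Theorem 1.8 (The Maximal Ergodic Theorem). Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be measure preserving. Let $g \in L^1(\mu)$ be real valued and for $\alpha \in \mathbb{R}$, define
\[
E_\alpha = \left\{ x \in X : \sup_{n \ge 1} A_n(g)(x) > \alpha \right\}.
\]
Then,
\[
\alpha \mu(E_\alpha) \le \int_{E_\alpha} g \, d\mu \le \|g\|_1.
\]
Furthermore, if $T^{-1}A = A$, then
\[
\alpha \mu(E_\alpha \cap A) \le \int_{E_\alpha \cap A} g \, d\mu.
\]
To prove the last statement, just restrict the system to $(A, \mathcal{B}|_A, \frac{1}{\mu(A)} \mu|_A)$.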

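Proof. The proof is simple in view of the maximal inequality. Let $f = g - \alpha$ and $Uf = f \circ T$. Then,
\[
E_\alpha = \bigcup_{N=0}^{\infty} \{x : F_N(x) > 0\},
\]
and the maximal inequality implies that
\[
\int_{E_\alpha} g \, d\mu - \alpha \mu(E_\alpha) = \int_{E_\alpha} (g - \alpha) \, d\mu = \int_{E_\alpha} f \, d\mu \ge 0.
\]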

Finally, we can prove that the convergence of the ergodic averages is in fact almost sure. This is the main content of the Birkhoff Ergodic Theorem, which will be a key player in the coming lectures.


Theorem 1.9 (The Birkhoff Ergodic Theorem). Let $(X, \mathcal{B}, \mu)$ be a probability space, let $T : X \to X$ be measure preserving, and let $f \in L^1(\mu)$. Then, the sequence $(A_N(f))$ converges in $L^1$ and almost surely to a $T$-invariant function $f^* \in L^1(\mu)$. Furthermore,
\[
\int f^* \, d\mu = \int f \, d\mu.
\]
If $T$ is ergodic,
\[
f^*(x) = \int f \, d\mu
\]
for $\mu$-almost every $x \in X$.

Proof. Note first that on splitting $f$ in real and imaginary parts, it suffices to prove the result for a real valued function $f$. Hence, we may define
\[
f^*(x) = \limsup_{N \to \infty} A_N(f)(x), \qquad f_*(x) = \liminf_{N \to \infty} A_N(f)(x).
\]
We must show that these two functions agree almost surely.

First, note that
\begin{align*}
\frac{N+1}{N} A_{N+1}(f)(x) &= \frac{N+1}{N} \left( \frac{1}{N+1} \sum_{n=0}^{N} f(T^n x) \right) \\
&= \frac{1}{N} \sum_{n=0}^{N-1} f(T^n(Tx)) + \frac{1}{N} f(x) = A_N(f)(T(x)) + \frac{1}{N} f(x).
\end{align*}
Considering subsequences converging to the limsup respectively the liminf on either side, we easily deduce in four steps that both $f^*$ and $f_*$ are $T$-invariant.
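The identity relating $A_{N+1}(f)(x)$ and $A_N(f)(Tx)$ is purely algebraic, so it can be checked with exact rational arithmetic. A minimal sketch, assuming the map $T(x) = 3x \bmod 1$ and the function $f(x) = x^2$, both of which are hypothetical choices made only for this check:

```python
from fractions import Fraction


def T(x):
    """Illustrative map T(x) = 3x mod 1; exact when x is a Fraction."""
    return (3 * x) % 1


def f(x):
    """Illustrative test function."""
    return x * x


def A(x, N):
    """Ergodic average A_N(f)(x) computed exactly."""
    total = Fraction(0)
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N


x, N = Fraction(1, 7), 5
lhs = Fraction(N + 1, N) * A(x, N + 1)
rhs = A(T(x), N) + f(x) / N
print(lhs == rhs)
```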

Now, let $\alpha > \beta$ be rational numbers and consider the set
\[
E_\alpha^\beta = \{x \in X : f_*(x) < \beta, \ f^*(x) > \alpha\}.
\]
Note that since $f_*$ and $f^*$ are both $T$-invariant, so is the set $E_\alpha^\beta$. Let $E_\alpha$ be the set from the Maximal Ergodic Theorem with $f$ in place of $g$. Then, clearly $E_\alpha^\beta \subseteq E_\alpha$, and the Maximal Ergodic Theorem implies that
\[
\int_{E_\alpha^\beta} f \, d\mu \ge \alpha \mu(E_\alpha^\beta).
\]
Repeating the argument with $-f$ in place of $f$, we similarly find that
\[
\int_{E_\alpha^\beta} f \, d\mu \le \beta \mu(E_\alpha^\beta),
\]
so that
\[
\alpha \mu(E_\alpha^\beta) \le \beta \mu(E_\alpha^\beta).
\]
Since $\beta < \alpha$, this implies that $\mu(E_\alpha^\beta) = 0$.

Now,
\[
\{x \in X : f_*(x) < f^*(x)\} = \bigcup_{\substack{\alpha, \beta \in \mathbb{Q} \\ \beta < \alpha}} E_\alpha^\beta,
\]
which is a countable union of null sets. Hence, $f_*(x) = f^*(x)$ for almost every $x \in X$.

To see that the almost sure limit $f^*$ agrees with the $L^1$-limit from the corollary to the von Neumann Ergodic Theorem, Exercise 1.4 implies that since $A_N(f)$ converges to $f'$ in $L^1$, there is a subsequence $A_{N_k}(f)$ which converges almost everywhere to $f'$. But this limit must agree with $f^*$ almost everywhere, and hence, the two are the same, at least almost everywhere.

Finally, the equation with the integral follows immediately on noting that
\[
\int A_N(f) \, d\mu = \int f \, d\mu.
\]
If $T$ is ergodic, the only $T$-invariant functions are constant almost everywhere, and the final statement follows.

Exercises

Exercises to Section 1.1

1.1 Show that ergodicity of $T$ on $(X, \mathcal{B}, \mu)$ is equivalent to the property that for any $A \in \mathcal{B}$, if $\mu(T^{-1}A \triangle A) = 0$ then $\mu(A) \in \{0, 1\}$. (Hint: Show that the limsup set $\bigcap_{N=0}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A)$ satisfies the invariance property of the definition and has the same measure as $A$.)

1.2 Show that $T$ is ergodic if and only if 1 is an eigenvalue of $U_T$ of multiplicity 1.

Exercises to Section 1.2

1.3 Complete the proof of the von Neumann ergodic theorem by carrying out the approximation argument.

1.4 Let $(X, \mathcal{B}, \mu)$ be a measure space. Show that if $(f_n)$ is a sequence in $L^p(\mu)$ ($1 \le p < \infty$) which converges to $f \in L^p(\mu)$, then there is a subsequence $(f_{n_k})$ which converges to $f$ pointwise almost surely. (Hint: Take a subsequence $(f_{n_k})$ with $\|f_{n_k} - f\|_p^p < k^{-2-p}$ and show that
\[
\mu\left\{ x \in X : |f_{n_k}(x) - f(x)| > \tfrac{1}{k} \right\} < \frac{1}{k^2}.
\]
Now apply the Borel–Cantelli lemma.)

1.5 Deduce the following corollary of the Birkhoff ergodic theorem: Let $(X, \mathcal{B}, \mu)$ be a probability space, let $T : X \to X$ be measure preserving, and let $E \in \mathcal{B}$. Then almost every $x \in E$ returns to $E$ infinitely often, i.e. there is a set $F \subseteq E$ with $\mu(F) = \mu(E)$, such that for each $x \in F$, there is an increasing sequence of natural numbers $(n_k)$, such that $T^{n_k} x \in E$ for all $k$. This is the Poincaré recurrence theorem, which can also be proved independently of the ergodic theorem.

2 Markov shifts and the base b-map

A particularly simple example of an ergodic system is the base-$b$ map, which encodes the digit distribution of the initial point in an orbit. We will prove the ergodicity (and more) of this map and continue with some arithmetic consequences; in particular an ergodic proof of Émile Borel's classical result that almost all numbers are absolutely normal.

2.1 Base b-maps, coin flips and shift maps

It is well known that given a real number $x \in [0, 1)$ and an integer $b \in \mathbb{Z}$ with $b \ge 2$, we can express $x$ in the form
\[
x = \sum_{i=1}^{\infty} \frac{a_i}{b^i}, \qquad a_i \in \{0, 1, \ldots, b-1\}.
\]
As it turns out, there is an underlying dynamical system to this expansion. To see this, note that we may read off the first digit $a_1$ by splitting up the unit interval $[0, 1)$ into $b$ disjoint intervals of equal size, namely $[\frac{j}{b}, \frac{j+1}{b})$. If $x \in [\frac{j}{b}, \frac{j+1}{b})$, then $a_1 = j$. Thus, we define the function $a : [0, 1) \to \{0, 1, \ldots, b-1\}$ by
\[
a(x) = j \quad \text{for } x \in [\tfrac{j}{b}, \tfrac{j+1}{b}).
\]
Finding $a_2$ requires a little more work. For a real number $x$, let $[x]$ denote the integer part of $x$ (or the floor function), and let $\{x\}$ denote the fractional part of $x$, i.e. $\{x\} = x - [x]$. Multiplying $x$ by $b$, we find that
\[
bx = b \sum_{i=1}^{\infty} \frac{a_i}{b^i} = a_1 + \sum_{i=1}^{\infty} \frac{a_{i+1}}{b^i}.
\]
Taking fractional parts,
\[
\{bx\} = \sum_{i=1}^{\infty} \frac{a_{i+1}}{b^i},
\]

and thus $a_2 = a(\{bx\})$. We may of course repeat the procedure to find $a_3$, $a_4$ and so on.

To put this into the dynamical context, define the map $T_b : [0, 1) \to [0, 1)$ by $T_b(x) = \{bx\}$. In this way, we find that
\[
x = \sum_{i=1}^{\infty} \frac{a(T_b^{i-1} x)}{b^i},
\]
so that a base $b$-expansion of $x$ is fully determined by the orbit of $x$ under $T_b$ together with the test function $a$. The map $T_b$ acts as a shift map on the digits of $x$: it removes the first digit and shifts the rest up.
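The procedure above translates directly into a short program. The sketch below reads off digits by iterating $T_b$ and applying the digit function $a$; it uses exact rational arithmetic, since floating point iterates of $T_b$ lose all precision after a few dozen steps. The function names are choices made for this illustration.

```python
from fractions import Fraction
from math import floor


def T_b(x, b):
    """Base-b map T_b(x) = {b x} on [0, 1)."""
    return (b * x) % 1


def a(x, b):
    """Digit function: a(x) = j when x lies in [j/b, (j+1)/b)."""
    return floor(b * x)


def digits(x, b, n):
    """First n base-b digits a(x), a(T_b x), ..., a(T_b^(n-1) x)."""
    out = []
    for _ in range(n):
        out.append(a(x, b))
        x = T_b(x, b)
    return out


print(digits(Fraction(1, 7), 10, 6))   # [1, 4, 2, 8, 5, 7]
print(digits(Fraction(1, 3), 2, 4))    # [0, 1, 0, 1]
```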

When we say a base $b$-expansion and not the base $b$-expansion, this is entirely on purpose. Indeed, the base $b$-expansion of a real number is not always unique, as is easily seen by considering the decimal expansion of $1/10$, for which
\[
\frac{1}{10} = \sum_{i=2}^{\infty} \frac{9}{10^i}.
\]
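The second expansion of $1/10$ can be checked by computing partial sums exactly: the gap between $1/10$ and the sum of the terms up to index $n$ is $10^{-n}$, which tends to $0$. A small sketch, with a helper name chosen for the illustration:

```python
from fractions import Fraction


def partial_sum(n):
    """Exact value of the sum of 9 / 10^i for i = 2, ..., n."""
    return sum(Fraction(9, 10**i) for i in range(2, n + 1))


for n in (2, 5, 10):
    gap = Fraction(1, 10) - partial_sum(n)
    print(n, gap)
```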

Of course, one could consider the shift map on the full set of sequences, i.e. one could ignore the fact that a number may have more than one base $b$-expansion. From this point of view, we consider a finite alphabet with $b$ letters, $\Lambda = \{0, 1, \ldots, b-1\}$, and as the phase space of our dynamical system we consider the set of sequences with elements from $\Lambda$, namely $X = \Lambda^{\mathbb{N}}$. Equipping the space with the shift map $\sigma : X \to X$, given by
\[
\sigma\left( (a_i)_{i=1}^{\infty} \right) = (a_{i+1})_{i=1}^{\infty},
\]
we obtain a dynamical system in the sense first defined. Indeed, we have a map and an action of the semigroup $\mathbb{N}$.
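On finite prefixes, the relation between the shift and the base $b$-map can be tested directly: the digits of $T_b(x)$ are the digits of $x$ with the first one removed. The sketch below checks this for one rational number; the names `digits` and `shift` and the sample point $3/7$ are hypothetical choices for the example.

```python
from fractions import Fraction
from math import floor


def T_b(x, b):
    """Base-b map T_b(x) = {b x} on [0, 1)."""
    return (b * x) % 1


def digits(x, b, n):
    """First n base-b digits of x, read off along the orbit of x under T_b."""
    out = []
    for _ in range(n):
        out.append(floor(b * x))
        x = T_b(x, b)
    return out


def shift(seq):
    """Shift map on a finite prefix: drop the first entry."""
    return seq[1:]


x, b = Fraction(3, 7), 10
print(digits(T_b(x, b), b, 8) == shift(digits(x, b, 9)))
```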

To get sufficient structure on the shift space $X$ to apply any additional theory, we do the following. First, if we equip $\Lambda$ with the discrete topology, it is entirely natural to put the product topology on $X$, so the dynamical system is certainly a topological dynamical system. By Tychonoff's Theorem, the space is compact, and in Exercise 2.1, you will show that in fact this topology is metrizable in a very concrete way.

Our objective in these notes is to study ergodic theory, so we will now construct a family of measures on shift spaces. Let $\mu$ be some measure on $\Lambda = \{0, 1, \ldots, b-1\}$, i.e. any function from $\Lambda$ to $[0, 1]$, such that $\sum_{k \in \Lambda} \mu(k) = 1$. On $X = \Lambda^{\mathbb{N}}$, this induces the (infinite) product measure $\mu = \prod_{i \in \mathbb{N}} \mu$, called a Bernoulli measure.

A very concrete interpretation is the following. Let $b = 2$, so that $\Lambda = \{0, 1\}$, and let $\mu$ be the uniform probability distribution on $\{0, 1\}$, which assigns measure $\frac{1}{2}$ to each element. This probability space corresponds to flipping a fair coin. The measure on the product space $\Lambda^{\mathbb{N}}$ corresponds to repeatedly flipping this coin an infinite number of times.

It is thus not at all surprising that the shift map is ergodic for this particular system. Indeed, the Strong Law of Large Numbers says that for a process of independent and identically distributed random variables,
\[
\frac{1}{N} \sum_{n=1}^{N} X_n \to E(X_1)
\]
almost surely. A random variable is nothing but an integrable function on $\{0, 1\}$ (so any function will do). Reading this in terms of the shift map, the left hand side is just the ergodic averages of some such function, while the right hand side is the integral of this function with respect to $\mu$. We have recovered the Birkhoff ergodic theorem for this particular system.

Of course, there is nothing in our setup requiring that the coin is fair! It may just be that the coin comes up heads 3/4 of the time, and we would get a different distribution. However, the Strong Law of Large Numbers would remain valid. We will return to this issue, but for now let us summarize the discussion on shift spaces in a definition.
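The coin flipping picture is easy to simulate. The sketch below averages the outcomes of a possibly biased coin, which is the ergodic average of the function reading off the first coordinate of a sequence. The pseudo-random generator stands in for a "typical" point of the shift space; this, the seed and the function name are assumptions of the illustration.

```python
import random


def coin_flip_average(p_heads, n_flips, seed=0):
    """Average of n_flips independent flips with P(heads) = p_heads (heads = 1)."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips


print(coin_flip_average(0.5, 100_000))    # fair coin, expected value 1/2
print(coin_flip_average(0.75, 100_000))   # biased coin, expected value 3/4
```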

Definition 2.1. Let $\Lambda$ be a finite set and let $X = \Lambda^{\mathbb{N}}$ be the set of all (one-sided) sequences with elements from $\Lambda$. Let $\mu$ be a probability measure on $\Lambda$ and let $\mu = \prod_{n \in \mathbb{N}} \mu$ be the infinite product measure on $X$. Note that this is certainly a Borel probability measure. The probability space $(X, \mathcal{B}, \mu)$ together with the shift map $\sigma : X \to X$ is called a Bernoulli shift.

2.2 Isomorphic systems

It appears that proving that a Bernoulli shift is ergodic should be a simple matter in view of the above remarks on the Strong Law of Large Numbers. However, our objective is to study numbers, and in particular the base $b$-expansions of real numbers, and the base $b$-map $T_b$ is not a Bernoulli shift, due to the non-uniqueness of base $b$-expansions.

We now show that from the point of view of ergodic theory, the base $b$-map on $[0, 1)$ with Lebesgue measure and the Bernoulli shift on $b$ letters with the uniform measure are indistinguishable. For this, we need the notion of isomorphic dynamical systems. A map $\phi : X \to Y$ between measure spaces $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ is measure preserving if for any $A \in \mathcal{B}_Y$, we have $\phi^{-1}(A) \in \mathcal{B}_X$ with $\mu(\phi^{-1}(A)) = \nu(A)$.

Definition 2.2. Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be probability spaces and let $T : X \to X$ and $S : Y \to Y$ be measure preserving transformations of $X$ and $Y$ respectively. The two measure preserving systems are said to be isomorphic if there exist sets $X' \subseteq X$ and $Y' \subseteq Y$ with $\mu(X') = \nu(Y') = 1$ such that $TX' \subseteq X'$ and $SY' \subseteq Y'$ and an invertible, measure preserving map $\phi : X' \to Y'$ such that
\[
\phi \circ T(x) = S \circ \phi(x)
\]
for all $x \in X'$.

Note that if two systems are isomorphic and one is ergodic, then immediately so is the other one.

Proposition 2.3. Let $b \in \mathbb{Z}$ with $b \ge 2$ and let $\lambda$ denote the Lebesgue measure restricted to $[0, 1)$. The system $([0, 1), \mathcal{B}, \lambda)$ with the map $T_b : [0, 1) \to [0, 1)$ is isomorphic to the Bernoulli shift $\{0, 1, \ldots, b-1\}^{\mathbb{N}}$ based on the uniform probability distribution.

Proof. We first observe that the $\sigma$-algebra in the shift space $\{0, 1, \ldots, b-1\}^{\mathbb{N}}$ is generated by cylinder sets, i.e. sets of the form
\[
C(c_1, c_2, \ldots, c_M) = \{(a_i)_{i=1}^{\infty} : a_i = c_i \text{ for } i = 1, 2, \ldots, M\},
\]
where $M \in \mathbb{N}$ and $c_1, c_2, \ldots, c_M \in \{0, 1, \ldots, b-1\}$ are fixed. Also, observe that the $\sigma$-algebra $\mathcal{B}$ in $[0, 1)$ is generated by the $b$-adic intervals,
\[
\left[ \frac{j}{b^k}, \frac{j+1}{b^k} \right),
\]
where $k \in \mathbb{N}$ and $j \in \{0, 1, \ldots, b^k - 1\}$ are fixed. Hence, it suffices to provide an invertible map between sets of this type, which preserves the measure.

The map $\phi$ is the obvious one, which takes the base $b$-expansion generated in the beginning of the chapter and maps to the sequence of digits. The domain of $\phi$ will thus be $[0, 1)$. The image is not all of $\{0, 1, \ldots, b-1\}^{\mathbb{N}}$, as we are only hitting one particular base $b$-expansion whenever there are more. This is easily dealt with, as we know (see e.g. [4]) that the only way in which a number can have more than one base $b$-expansion is if it has expansions ending in an infinite number of 0's and in an infinite number of $(b-1)$'s. This happens only for countably many numbers, so $\{0, 1, \ldots, b-1\}^{\mathbb{N}} \setminus \phi([0, 1))$ is countable. Since the Bernoulli measure is evidently non-atomic, a countable set has measure 0, and we now have a map between full subsets of the interval $[0, 1)$ and the Bernoulli shift. It is evident from the construction that the map intertwines the shift map and the base $b$-map, $T_b$.


It remains for us to prove that the map preserves the measure. Let $M \in \mathbb{N}$ and $c_1, c_2, \ldots, c_M \in \{0, 1, \ldots, b-1\}$ be fixed and consider the preimage of the cylinder set $C(c_1, c_2, \ldots, c_M)$ under $\phi$. This is the set
\[
\left\{ x \in [0, 1) : x = \sum_{i=1}^{M} \frac{c_i}{b^i} + \sum_{i=M+1}^{\infty} \frac{a_i}{b^i}, \ a_i \in \{0, 1, \ldots, b-1\} \right\},
\]
an interval of length $b^{-M}$. But this is also the Bernoulli measure of the original cylinder $C(c_1, c_2, \ldots, c_M)$. This completes the proof.
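The interval corresponding to a cylinder set can be written down explicitly: its left endpoint is the finite sum determined by the prescribed digits, and its length is $b^{-M}$. A minimal sketch in exact arithmetic, with the function name and the sample block $(1, 4, 2)$ chosen for illustration:

```python
from fractions import Fraction


def cylinder_interval(c, b):
    """Endpoints (left, right) of the set of x in [0, 1) whose first len(c) base-b digits are c."""
    left = sum(Fraction(ci, b ** (i + 1)) for i, ci in enumerate(c))
    return left, left + Fraction(1, b ** len(c))


left, right = cylinder_interval((1, 4, 2), 10)
print(left, right, right - left)   # 71/500 143/1000 1/1000
```

The length $1/1000 = 10^{-3}$ agrees with the measure of the cylinder under the uniform Bernoulli measure on three prescribed decimal digits.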

With the above proposition in place, in order to prove that $T_b$ is ergodic with respect to the Lebesgue measure, it suffices to prove that Bernoulli shifts are ergodic. We do this now.

Theorem 2.4. Bernoulli shifts are ergodic.

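Proof. The cylinder sets generate the $\sigma$-algebra $\mathcal{B}$. This means that for any set $B \in \mathcal{B}$ and any $\epsilon > 0$, there is an $N \in \mathbb{N}$ and a set $F \subseteq \Lambda^N$, such that the set
\[
A = \{(a_i)_{i=1}^{\infty} : (a_1, a_2, \ldots, a_N) \in F\}
\]
satisfies that $\mu(A \triangle B) < \epsilon$. Let $\epsilon \in (0, 1)$, let $B$ be $\sigma$-invariant and pick such an $A$.

Then, for $M > N$,
\[
\sigma^{-M}A = \{(a_i)_{i=1}^{\infty} : (a_{M+1}, a_{M+2}, \ldots, a_{M+N}) \in F\}.
\]
Note that the entries of the sequences in $\sigma^{-M}A$ and $A$ on which restrictions are imposed by the structure of $A$ are at a disjoint set of indices. Hence, since the measure is a product measure,
\[
\mu(\sigma^{-M}A \setminus A) = \mu(\sigma^{-M}A \cap (X \setminus A)) = \mu(\sigma^{-M}A)\mu(X \setminus A) = \mu(A)\mu(X \setminus A),
\]
since $\sigma$ is measure preserving.

Since $B$ is assumed to be $\sigma$-invariant, $\mu(B \triangle \sigma^{-1}B) = 0$. Hence,
\[
\mu(\sigma^{-M}A \triangle B) = \mu(\sigma^{-M}A \triangle \sigma^{-M}B) = \mu(A \triangle B) < \epsilon,
\]
so that
\[
\mu(\sigma^{-M}A \triangle A) < 2\epsilon.
\]
But then,
\[
\mu(\sigma^{-M}A \setminus A) \le \mu(A \setminus \sigma^{-M}A) + \mu(\sigma^{-M}A \setminus A) = \mu(\sigma^{-M}A \triangle A) < 2\epsilon.
\]
Now, finally
\begin{align*}
\mu(B)\mu(X \setminus B) &< (\mu(A) + \epsilon)(\mu(X \setminus A) + \epsilon) \\
&= \mu(A)\mu(X \setminus A) + \epsilon\mu(A) + \epsilon\mu(X \setminus A) + \epsilon^2 \\
&< \mu(\sigma^{-M}A \setminus A) + 3\epsilon < 5\epsilon.
\end{align*}
Since $\epsilon$ is arbitrary, this shows that $\mu(B)\mu(X \setminus B) = 0$, so that $\mu(B) \in \{0, 1\}$.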
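The key step of the proof is the product formula $\mu(\sigma^{-M}A \setminus A) = \mu(A)\mu(X \setminus A)$ for sets $A$ depending only on the first $N$ coordinates. For small alphabets this can be verified by brute force, since both sets only depend on the first $M + N$ coordinates. The sketch below does so in exact arithmetic; the biased measure on two letters, the set $F$ and all function names are hypothetical choices for the illustration.

```python
from fractions import Fraction
from itertools import product


def word_measure(word, p):
    """Product measure of the cylinder set given by a finite word."""
    m = Fraction(1)
    for letter in word:
        m *= p[letter]
    return m


def measure_of(event, length, p):
    """Measure of the set of sequences whose first `length` letters satisfy `event`."""
    words = product(range(len(p)), repeat=length)
    return sum(word_measure(w, p) for w in words if event(w))


p = [Fraction(1, 4), Fraction(3, 4)]   # hypothetical biased coin on the alphabet {0, 1}
F = {(0, 1), (1, 1)}                   # hypothetical set F of words of length N
N, M = 2, 3                            # M > N, as in the proof


def in_A(w):
    """Membership in A: the first N letters form a word in F."""
    return w[:N] in F


def in_shifted_A(w):
    """Membership in sigma^(-M) A: letters M+1, ..., M+N form a word in F."""
    return w[M:M + N] in F


mu_A = measure_of(in_A, M + N, p)
mu_diff = measure_of(lambda w: in_shifted_A(w) and not in_A(w), M + N, p)
print(mu_diff, mu_A * (1 - mu_A))
```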

Of course, Theorem 2.4 opens a whole new can of worms! It says that any Bernoulli shift is ergodic. Thus, if we start with a different measure on $\{0, 1, \ldots, b-1\}$, we obtain an entirely different ergodic system (the resulting measures on the shift space are clearly different). It is reasonable to ask whether such a new measure gives rise to an ergodic measure on $[0, 1)$. The answer is yes! Suppose we have two measurable spaces $(X, \mathcal{B}_X)$ and $(Y, \mathcal{B}_Y)$ with a measurable bijection $\phi : X \to Y$ which intertwines automorphisms $T : X \to X$ and $S : Y \to Y$ in the sense used to define isomorphic systems. Suppose further that $T$ preserves a measure $\mu$ on $X$. We may then define a measure $\nu$ on $Y$ by letting
\[
\nu(A) = \mu(\phi^{-1}(A))
\]
for any $A \in \mathcal{B}_Y$. Thus, there are infinitely many ergodic measures for the map $T_b$. We now examine this fact further.

Let us digress a little into functional analysis. Riesz' Representation Theorem provides a correspondence between positive measures and positive linear functionals on $C_c(X)$, the space of continuous functions with compact support on $X$: Two measures $\mu$ and $\nu$ are equal as measures if
\[
\int f \, d\mu = \int f \, d\nu
\]
for all $f \in C_c(X)$. As such, the set of measures is a subset of the dual space $C_c(X)^*$ to the Banach space $C_c(X)$ endowed with the weak$^*$-topology.

Now, the set of probability measures is a subset of the unit ball (in fact of the unit sphere) of $C_c(X)^*$. The unit ball is compact by the Banach–Alaoglu Theorem, and it is easily checked that being invariant for a map $T : X \to X$ is a closed and convex property. In other words, the set of preserved measures forms a compact and convex subset of $C_c(X)^*$.

Working a little harder, one finds that the extremal points of this set are exactly the ergodic measures for $T$. One might think that such a set is a simple thing to study, but this is far from the case. In Exercise 2.2, you will construct a compact and convex subset of a Banach space, inside which the extremal points are dense. This is generically the case for dynamical systems. The set of ergodic measures is very badly behaved.

Before proceeding, let us mention that the set of preserved measures may be the empty set. This will not happen to us, and the existence of preserved (and hence ergodic) measures is guaranteed if the space $X$ considered is a compact metric space (for us the circle) and the map $T$ is continuous.

2.3 Borel's theorem on normal numbers

Let us finally derive a result in number theory from the work done so far. Let $b \ge 2$ be an integer. A real number $x \in [0, 1)$ is said to be normal to base $b$ if all finite blocks of digits occur with the expected frequency in a base $b$ expansion of $x$, i.e. if for all $M \in \mathbb{N}$ and all $(c_1, c_2, \ldots, c_M) \in \{0, 1, \ldots, b-1\}^M$,
\[
\lim_{N \to \infty} \frac{\#\{n \le N : (a_{n+1}, a_{n+2}, \ldots, a_{n+M}) = (c_1, c_2, \ldots, c_M)\}}{N} = b^{-M}. \tag{2.1}
\]
A number is normal if it is normal to every integer base $b \ge 2$.

The following result is due to Émile Borel [2].

Theorem 2.5. Lebesgue almost all real numbers in $[0, 1)$ are normal.

Proof. First, fix a base $b$ and a block of digits, $(c_1, c_2, \dots, c_M) \in \{0, 1, \dots, b-1\}^M$. Define the set

\[ A = \left\{ x = \sum_{i=1}^{\infty} \frac{a_i}{b^i} : (a_1, a_2, \dots, a_M) = (c_1, c_2, \dots, c_M) \right\}. \]

Note that this is an interval of length $b^{-M}$. Let $f = \chi_A$, the characteristic function of this interval. Clearly, this is a function in $L^1(\lambda)$, and since the map $T_b : [0, 1) \to [0, 1)$ is ergodic with respect to $\lambda$, the Birkhoff ergodic theorem implies that

\[ \frac{1}{N} \sum_{n=0}^{N-1} f(T_b^n x) \to \int f \, d\lambda \]

for $\lambda$-almost every $x \in [0, 1)$. Now, note that $f(T_b^n x) = 1$ if and only if $(a_{n+1}, a_{n+2}, \dots, a_{n+M}) = (c_1, c_2, \dots, c_M)$, and $f(T_b^n x) = 0$ otherwise; and clearly $\int f \, d\lambda = b^{-M}$. Hence, the above equation is nothing but (2.1) in disguise, so this particular block of digits occurs with the expected frequency.


Since the block of digits was arbitrary, and there are only countably many possible blocks, the set of numbers for which any one of these limits does not exist or differs from the expected value is also a null set. This shows that almost all numbers are normal to base $b$.

Finally, to prove normality of almost all numbers, note that the set of numbers which fail to be normal to base $b$ is a null set. As there are only countably many bases, the union of all of these sets is itself a null set. Everything in its complement is normal, and the complement is full with respect to Lebesgue measure.
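The Birkhoff averages in the proof can be explored numerically. The following is a minimal sketch, not part of the original notes: it assumes that a Lebesgue-typical $x$ may be modelled by a sequence of independent, uniformly distributed base $b$ digits (the coin-flip picture of Section 2.1), and the function name `birkhoff_average` is hypothetical.

```python
import random


def birkhoff_average(digits, block, N):
    """Average of the indicator f = chi_A along the first N shifts.

    f(T_b^n x) = 1 exactly when digits n+1, ..., n+M of x equal `block`;
    with 0-based lists this is the slice digits[n:n+M].
    """
    M = len(block)
    hits = sum(1 for n in range(N) if digits[n:n + M] == block)
    return hits / N


random.seed(0)  # fixed seed so that the run is reproducible
b, block, N = 2, [1, 0, 1], 100_000
# Assumption of this sketch: i.i.d. uniform digits stand in for a typical x.
digits = [random.randrange(b) for _ in range(N + len(block))]
print(birkhoff_average(digits, block, N), b ** -len(block))
```

The two printed numbers are the empirical frequency of the block and the expected value $b^{-M}$. A simulation of this kind only illustrates the theorem; it says nothing about any specific number.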

We have only considered numbers between 0 and 1 at this point, but since the requirement to be normal is an asymptotic one, anything taking place in the integer part of a number will have no effect on the combinatorially defined normality. The result makes it a reasonable question whether any nice and well-known number, such as $\pi$, $e$, $\sqrt{2}$ or $\log 2$, can be shown to be normal. These are all open (and seemingly very difficult) problems! It is conjectured that all irrational algebraic numbers are normal, but very little is known about this. Classical transcendental constants such as $\pi$ are also well out of reach.

One can certainly construct normal numbers to a specific base. A famous example is Champernowne’s number, which in base 10 is given by

0.1234567891011 . . .
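The digits of Champernowne’s number are easy to generate, so the frequencies in (2.1) can be inspected directly. The sketch below is an illustration only; the helper names `champernowne_digits` and `block_frequency` are hypothetical, and the cut-off of 200,000 digits is an arbitrary choice.

```python
from itertools import count, islice


def champernowne_digits():
    """Yield the base-10 digits of 0.123456789101112..."""
    for k in count(1):
        for ch in str(k):
            yield int(ch)


def block_frequency(digits, block):
    """Fraction of starting positions at which `block` occurs in `digits`."""
    m = len(block)
    positions = len(digits) - m + 1
    hits = sum(1 for n in range(positions) if digits[n:n + m] == block)
    return hits / positions


digits = list(islice(champernowne_digits(), 200_000))
for block in ([7], [1, 2], [0, 0]):
    # empirical frequency next to the expected value 10^(-M)
    print(block, block_frequency(digits, block), 10 ** -len(block))
```

For this number the convergence is slow, since the leading digits of the concatenated integers are far from uniform on any finite range. The empirical frequencies should therefore be read as rough indications only.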

Numbers normal to every base can also be constructed, though this involves a lot more technique. In recent years, a construction has been given which generates the $n$’th binary digit of such a number in close to quadratic time [1]. The number does not, however, resemble anything from other parts of mathematics.

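Exercises

Exercises to Section 2.1

2.1 Let $\Lambda$ be a finite set. Show that on $X = \Lambda^{\mathbb{N}}$, the function $d : X \times X \to \mathbb{R}_{\ge 0}$ given by

\[ d\big((a_i)_{i=1}^{\infty}, (b_i)_{i=1}^{\infty}\big) = \begin{cases} 0 & \text{if } (a_i)_{i=1}^{\infty} = (b_i)_{i=1}^{\infty}, \\ 2^{-\min\{i \in \mathbb{N} : a_i \neq b_i\}} & \text{otherwise,} \end{cases} \]

defines a metric on $X$ (here, the minimum of the empty set $\emptyset$ is set to be equal to 0). Show that this metric induces the product topology of the discrete topology on $\Lambda$.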
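Before proving anything, it can help to experiment with the metric. The sketch below is a numerical sanity check and not a solution to the exercise: it works with finite prefixes instead of infinite sequences (an assumption forced by the computer), and it tests the strong triangle inequality $d(a, c) \le \max\{d(a, b), d(b, c)\}$ on all binary words of length 4.

```python
from itertools import product


def d(a, b):
    """Distance 2^(-i), where i is the first (1-based) index with a_i != b_i.

    a and b are equal-length tuples, i.e. finite prefixes standing in for
    infinite sequences; equal prefixes get distance 0 in this sketch.
    """
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return 2.0 ** (-i)
    return 0.0


# Check the strong triangle inequality on all binary words of length 4.
words = list(product((0, 1), repeat=4))
ok = all(d(a, c) <= max(d(a, b), d(b, c))
         for a in words for b in words for c in words)
print(ok)
```

A check over finitely many words proves nothing about $X$ itself, but it does suggest which inequality to aim for in the proof.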


Exercises to Section 2.2

2.2 In this exercise, you will construct the so-called Poulsen simplex, see also [7]. The construction takes place in $\ell^2(\mathbb{R})$, but simplexes of this form are the typical shape of the set of preserved measures for dynamical systems.

First some notation. For a convex set $S$, let $E(S)$ denote the extremal points in $S$. Let $e_j \in \ell^2(\mathbb{R})$ denote the sequence which has 1 as its $j$’th entry and 0’s everywhere else. Finally, let $E_n$ denote the subspace spanned by $\{e_1, e_2, \dots, e_n\}$.

(a) Construct a sequence of simplexes $S_n$ with the following properties:
    (1) $S_n \subseteq E_n$ for every $n$.
    (2) $S_n \subseteq S_m$ and $E(S_n) \subseteq E(S_m)$ for $m > n$.
    (3) $P_n S_m = S_n$ for $m > n$.
    (4) For any $\epsilon > 0$, there is an $n \in \mathbb{N}$ such that the maximal distance from any $x \in S_n$ to $E(S_n)$ is at most $\epsilon$.
    (Hint: Start with a line segment of length $2^{-1}$ in $E_1$. Then pick finitely many points such that any point in the segment is within a distance of $2^{-2}$ of these points. Finally, stick on an orthogonal direction on each of these points (one direction for each point) and take convex hulls successively. Repeat!)

(b) With $S_n$ constructed, let

\[ T_n = P_n^{-1}(S_n) \quad \text{and} \quad S = \bigcap_{n=1}^{\infty} T_n. \]

    Show that
    (1) $T_n \supseteq E_n$ for $n < m$.
    (2) $P_n T_m = S_n$ for $n < m$.
    (3) $P_n S = S_n$ for all $n$.
    (4) The set $\bigcup_{n=1}^{\infty} E(S_n)$ is dense in $S$.

(c) Show that $E(S)$ is dense inside $S$. (Hint: It is enough to prove that $E(S_n) \subseteq E(S)$.)

(d) Show that $S$ is a simplex, i.e. that any set of the form

\[ A = S \cap (qS - a), \quad q > 0, \]

    containing at least 2 points is itself a homothetic copy of $S$, i.e.

\[ A = rS + b, \quad r > 0. \]


Exercises to Section 2.3

2.3 In this exercise, we look at a particular instance of the so-called $\beta$-expansions of Parry [5] and Rényi [8]. Most questions are open-ended, and you are encouraged to play around with the example.

Let $\beta = (1 + \sqrt{5})/2$, the Golden Ratio, and consider the map $T_\beta : [0, 1) \to [0, 1)$. Show that the dynamical system is conjugate to the subspace of the shift space on two elements given by

\[ \Sigma_\beta = \left\{ (a_i)_{i=1}^{\infty} \in \{0, 1\}^{\mathbb{N}} : a_i = 1 \Rightarrow a_{i+1} = 0 \right\}. \]

Does this have a meaning in the representation of real numbers in the form

\[ x = \sum_{i=1}^{\infty} \frac{a_i}{\beta^i}, \quad a_i \in \{0, 1\}? \]

Can you say anything about the ergodic properties of the map? What does it mean for a number to be normal to base $\beta$?

The shift space in this exercise is a so-called sub-shift of finite type. These are defined by prohibiting a finite number of sub-words from the full shift. In this case, no sequence can contain the subword 11. Feel free to play around with finding a condition on $\beta > 1$ implying that $T_\beta : [0, 1) \to [0, 1)$ is conjugate to a sub-shift of finite type.
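One way to play around with the example is to compute digits. The sketch below assumes that $T_\beta(x) = \beta x \bmod 1$ and that the $i$’th digit is the integer part of $\beta T_\beta^{i-1}(x)$ (the map is not defined in this section, so this is an assumption). To avoid rounding problems, numbers are stored exactly as $p + q\sqrt{5}$ with rational $p$ and $q$; the names `sign_q5` and `golden_digits` are hypothetical.

```python
from fractions import Fraction


def sign_q5(p, q):
    """Exact sign of p + q*sqrt(5) for rational p and q."""
    if p >= 0 and q >= 0:
        return 0 if (p == 0 and q == 0) else 1
    if p <= 0 and q <= 0:
        return -1
    # p and q have opposite signs: compare p^2 with 5*q^2
    if p > 0:
        return 1 if p * p > 5 * q * q else -1
    return 1 if 5 * q * q > p * p else -1


def golden_digits(x, n):
    """First n digits of a rational x in [0, 1) under the assumed map T_beta."""
    p, q = Fraction(x), Fraction(0)  # current point is p + q*sqrt(5)
    digits = []
    for _ in range(n):
        # multiply by beta = (1 + sqrt(5)) / 2
        p, q = (p + 5 * q) / 2, (p + q) / 2
        digit = 1 if sign_q5(p - 1, q) >= 0 else 0
        p -= digit  # subtract the integer part
        digits.append(digit)
    return digits


print(golden_digits(Fraction(1, 2), 9))  # prints [0, 1, 0, 0, 1, 0, 0, 1, 0]
```

For $x = 1/2$ the orbit is periodic with period 3, which gives the repeating pattern 010. In every run, a digit 1 should be followed by a digit 0, in line with the definition of $\Sigma_\beta$.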
