
The Ergodic Hierarchy

Forthcoming in Stanford Encyclopedia of Philosophy

Joseph Berkovitz

Roman Frigg

Frederick Kronz

1. Introduction

The so-called ergodic hierarchy (EH) is a central part of ergodic theory. It is a hierarchy of properties that dynamical systems can possess. Its five levels are ergodicity, weak mixing, strong mixing, Kolmogorov, and Bernoulli. Although EH is a mathematical theory, its concepts have been widely used in the foundations of statistical physics, accounts of randomness, and discussions about the nature of chaos. We introduce EH and discuss its applications in these fields.

2. Dynamical Systems

The object of study in ergodic theory is a dynamical system. We first introduce the basic

concepts with a simple example, from which we abstract the general definition of a dynamical

system, a fundamental concept of modern ergodic theory. For a brief history of the modern

notion of a dynamical system and the associated concepts of EH see Appendix A.

A lead ball hangs from the ceiling on a spring. We pull it down a bit and let it go.

The ball begins to oscillate. The mechanical state of the ball is completely determined by a

specification of its position x and momentum p; that is, if we know x and p, then we know all

Page 2: The Ergodic Hierarchy

2

that there is to know about the ball. If we now conjoin x and p in one vector space we obtain

the so-called phase space of the system (sometimes also referred to as ‘state space’).1 This is

illustrated in Figure 1 for a two-dimensional phase space of the state of a ball moving up and

down (i.e. the phase space has one dimension for the ball’s position and one for its

momentum).

Figure 1: The motion of a ball on a spring.

Each point of X represents a state of the ball (because it gives the ball’s position and

momentum). Accordingly, the time evolution of the ball’s state is represented by a line in X, a so-

called phase space trajectory (from now on ‘trajectory’), showing where in phase space the

system was at each instant of time. For instance, let us assume that at time t=0 the ball is

located at point x1 and then moves to x2, where it arrives at time t=5. This motion is represented in X by the line segment connecting the corresponding phase points γ1 and γ2. In other words, the motion of

the ball is represented in X by the motion of a point representing the ball’s (instantaneous)

state, and all the states that the ball is in over the course of a certain period of time jointly

1 The use of the term ‘space’ in physics might cause confusion. On the one hand the term is used in its ordinary

meaning to refer to the three-dimensional space of our everyday experience. On the other hand, an entire class of

mathematical structures are referred to as ‘spaces’ even though they have nothing in common with the space of

everyday experience (except some abstract algebraic properties, which is why these structures earned the title

‘spaces’ in the first place). Phase spaces are abstract mathematical spaces.



form a trajectory. The motion of this point has a name: it is the phase flow φ_t.² The phase flow tells us where the ball is at some later time t, if we specify where it is at t=0; or, metaphorically speaking, φ_t drags the ball’s state around in X so that the movement of the state represents the motion of the real ball. In other words, φ_t is a mathematical representation of the system’s time evolution. The state of the ball at time t=0 is commonly referred to as the initial condition. φ_t then tells us, for every point in phase space, how this point evolves if it is chosen as an initial condition. In our concrete example point γ1 is the initial condition and we have γ2 = φ_5(γ1). More generally, let us call the ball’s initial condition γ0 and let γ(t) be its state at some later time t. Then we have γ(t) = φ_t(γ0). This is illustrated in Figure 2a.

Figure 2: Evolution in phase space.

Since φ_t tells us for every point in X how it evolves in time, it also tells us how sets of points move around. For instance, choose an arbitrary set A in X; then φ_t(A) is the image of A after t time units under the dynamics of the system. This is illustrated in Fig. 2b. Considering sets of points rather than only points is important when we think about physical applications of this mathematical formalism. We can never determine the exact initial condition of a ball bouncing on a spring. No matter how precisely we measure γ0, there will always be some measurement error. So what we really want to know in practical applications is not how a

2 Note that the time dimension of the ball’s motion is not an explicit part of the phase space.



precise mathematical point evolves, but rather how a set of points around the initial condition γ0 evolves. In our example of the ball the evolution is ‘tame’, in that the set keeps its original

shape. As we will see below, this is not always the case.

An important feature of X is that it is endowed with a so-called measure μ. We are familiar

with measures in many contexts: from a mathematical point of view, the length that we

attribute to a part of a line, the surface we attribute to a part of a plane, and the volume we

attribute to a segment of space are measures. A measure is simply a device to attribute a ‘size’

to a part of a space (in everyday contexts one, two, or three dimensional). Although X is an

abstract mathematical space, the leading idea of a measure remains the same: it is a tool to

quantify the size of a set. So we say that the set A has measure μ(A) in much the same way

as we say that a certain collection of points of ordinary space (for instance the ones that lie on

the inside of a bottle) have a certain volume (for instance one litre).

There are many different measures in our daily lives: we can measure length in meters or in

yards; we can measure surfaces in square meters or acres; and we can measure volume in

litres or gallons. The same is the case in abstract mathematical spaces, where we can also

introduce many different measures. One of these measures is particularly important, namely

the so called Lebesgue measure. This measure has an intuitive interpretation: it is just a

precise formalisation of the measure we commonly use in geometry. The interval [0, 2] has

Lebesgue measure 2 and the interval [3, 4] has Lebesgue measure 1. In two dimensions, a square whose sides have length 2 has Lebesgue measure 4; etc. Although this sounds simple, the

mathematical theory of measures is rather involved. We state the basics of measure theory in

Appendix B and avoid appeal to technical issues in measure theory in what follows.

The essential elements in the discussion so far were the phase space X, the time evolution φ_t, and the measure μ. And these are also the ingredients for the definition of an abstract dynamical system. An abstract dynamical system is a triple [X, μ, T_t], where {T_t : t ranges over all instants of time} is a family of automorphisms, i.e. a family of transformations of X onto itself with the property that T_{t1+t2}(x) = T_{t2}(T_{t1}(x)) for all x in X (Arnold and Avez 1968, 1); we say more about time below.3 In the above example X is the phase space of the ball’s motion, μ is the Lebesgue measure, and T_t is φ_t.

So far we have described tT as giving the time evolution of a system. Now let us look at this

from a more mathematical point of view: the effect of tT is that it assigns to every point in X

another point in X after t time units have elapsed. In the above example γ1 is mapped onto γ2 under φ_t after t = 5 seconds. Hence, from a mathematical point of view the time

evolution of a system consists in a mapping of X onto itself, which is why the above

definition takes tT to be a family of mappings of X onto itself. Such a mapping is a

prescription that tells you for every point x in X on which other point in X it is mapped

(from now on we use x to denote any point in X , and it no longer stands, as in the above

example, for the position of the ball). A mapping that takes X onto itself is called an

automorphism of X.

The systems studied in ergodic theory are forward deterministic. This means that if two

identical copies of that system are in the same state at one instant of time, then they must be in

the same state at all future instants of time. Intuitively speaking, this means that for any given

time there is only one way in which the system can evolve forward. For a discussion of

determinism see Earman (1986).

It should be pointed out that no particular interpretation is intended in an abstract dynamical

system. We have motivated the definition with an example from mechanics, but dynamical

systems are not tied to that context. They are mathematical objects in their own right, and as

such they can be studied independently of particular applications. This makes them a versatile

tool in many different domains. In fact, dynamical systems are used, among others, in fields

as diverse as physics, biology, geology, and economics. In population biology, for instance,

the points in X are taken to represent the number of animals in a population, and the map T

gives the change in number over time.

3 Sometimes a fourth component is mentioned in the definition: a sigma algebra Σ. Although in certain circumstances it is convenient to add Σ, it is not strictly necessary, since the main purpose of Σ is to provide a basis to define the measure μ, and so Σ is always present in the background when there is a measure μ and it is not necessary to mention it explicitly. For a discussion of sigma algebras and measures see Appendix B.


There are many different kinds of dynamical systems. The three most important distinctions

are the following.

Discrete versus continuous time. We may consider discrete instants of time or a continuum of

instants of time. For ease of presentation, we shall say in the first case that time is discrete and

in the second case that time is continuous. This is just a convenient terminology that has no

implications for whether time is fundamentally discrete or continuous. In the above example

with the ball time was continuous (it was taken to be a real number). But often it is convenient

to regard time as discrete. If time is continuous, then t is a real number and the family of automorphisms is {T_t : t ∈ R}, where R are the real numbers. If time is discrete, then t lies in the set Z = {..., −2, −1, 0, 1, 2, ...}, and the family of automorphisms is {T_t : t ∈ Z}. In order to indicate that we are dealing with a discrete family rather than a continuous one we sometimes replace ‘T_t’ with ‘T_n’; this is just a notational convention of no conceptual importance.4 In

such systems the progression from one instant of time to the next is also referred to as a ‘step’.

In population biology, for instance, we often want to know how a population grows over a

typical breeding time (e.g. one month). In mathematical models of such a population the

points in X represent the size of a population (rather than the position and the momentum of

a ball, as in the above example), and the transformation T_n represents the growth of the population after n time units. A simple example would be T_n(x) = x + n, where x now is just a point in X (and not, as above, the position of a ball).

Discrete families of automorphisms have the interesting property that they are generated by one mapping. As we have seen above, all automorphisms satisfy T_{t1+t2}(x) = T_{t2}(T_{t1}(x)). From this it follows that T_n(x) = T_1^n(x), that is, T_n is the n-th iterate of T_1. In this sense T_1 generates {T_t : t ∈ Z}; or, in other words, {T_t : t ∈ Z} can be ‘reduced’ to T_1. For this reason one often drops the subscript ‘1’, simply calls the map ‘T’, and writes the dynamical system as the triple [X, μ, T], where it is understood that T = T_1.

4 By using R and Z we assume that time extends to the past as well as to the future, and we also assume that the

time evolution is reversible. This need not be the case and these assumptions can be relaxed in different ways.

Nothing in what follows depends on this.
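To make the generating property concrete, here is a minimal Python sketch (the function names are ours, and the generating map T_1(x) = x + 1 is taken from the simple population example above): applying T_1 n times in a row reproduces the n-th iterate T_n.

    # Illustrative sketch: a discrete dynamical system is generated by a single
    # map T_1, with T_n obtained by applying T_1 n times in a row.
    def T_1(x):
        return x + 1          # the simple growth map of the population example

    def T_n(x, n):
        for _ in range(n):    # T_n = T_1 composed with itself n times
            x = T_1(x)
        return x

    print(T_n(10, 5))         # 15: five breeding periods after a population of 10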


For ease of presentation we use discrete transformations from now on. The definitions and

theorems we formulate below carry over to continuous time without further ado, and where this is

not the case we explicitly say so and treat the two cases separately.

Measure preserving versus non-measure preserving transformations. Roughly speaking, a

transformation is measure preserving if the size of a set (like set A in the above example) does

not change over the course of time: a set can change its form but it cannot shrink or grow

(with respect to the measure). Formally, T is a measure-preserving transformation on X if

and only if (iff) for all sets A in X: μ(A) = μ(T⁻¹(A)), where T⁻¹(A) is the set of points that gets mapped onto A under T; that is T⁻¹(A) = {x ∈ X : T(x) ∈ A}.5 From now on we also

assume that the transformations we consider are measure preserving.6
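The measure-preservation condition can be checked numerically for simple cases. The following sketch is illustrative only: the ‘doubling map’ T(x) = 2x mod 1 on the unit interval is a standard example that is not discussed above. It estimates μ(T⁻¹(A)) as the fraction of uniformly sampled points whose image lands in A, which should match μ(A).

    # Illustrative check of measure preservation for the doubling map.
    # The fraction of uniformly sampled points x with T(x) in A estimates
    # mu(T^{-1}(A)), which for a measure-preserving map equals mu(A).
    import random

    def T(x):
        return (2 * x) % 1.0

    a, b = 0.2, 0.5                     # the test set A = [0.2, 0.5), mu(A) = 0.3
    samples = 100_000
    hits = sum(1 for _ in range(samples) if a <= T(random.random()) < b)
    print(hits / samples)               # comes out close to 0.3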

Differentiable versus (merely) measurable dynamics. A further issue concerns the question of

where T comes from. In the case of classical mechanics, we obtain T as the solution of the

equation of motion. In the introductory example, for instance, we would start by writing down

the forces acting on the ball, then plug these into Newton’s equation of motion (a differential

equation), and then solve that equation. The solution to the equation is φ_t, and since it is the solution to a differential equation it is differentiable. Systems whose flows are differentiable are differentiable systems. Those systems that are not differentiable are (merely) measurable since they have a measure defined on them (namely μ), which can be used to measure sets at all times.7 Since being differentiable is a strong assumption that many systems don’t satisfy,

in what follows we shall not assume that systems are differentiable (and when we do we shall

mention it explicitly).

5 Strictly speaking A has to be measurable. In what follows we always assume that the sets we consider are

measurable. This is a technical assumption that has no bearing on the issues that follow since the relevant sets

are always measurable. 6 First appearances notwithstanding, this is not a substantial restriction. Systems in statistical mechanics are all

measure preserving. Some systems in chaos theory are not measure preserving, but if such a system is chaotic on a certain part of the phase space (which can be an attractor or an interval, for instance), then there is an invariant measure on this part and EH is applicable with respect to that measure. For a discussion of this point

see Werndl (2009b). 7 There are also other types of systems; e.g. topological ones. These are not considered here since the concepts of

EH are essentially tied to there being a measure.


In sum, from now on, unless stated otherwise, we consider discrete measure preserving

transformations which can but need not be differentiable.

In order to introduce the concept of ergodicity we have to introduce the phase and the time

mean of a function f on X . Mathematically speaking, a function assigns each point in X a

number. If the numbers are always real the function is a real-valued function; and if the

numbers may be complex, then it is a complex-valued function. Intuitively we can think of

these numbers as physical quantities of interest. Recalling the example of the bouncing ball,

f could for instance assign each point in the phase space X the kinetic energy the system

has at that point; in this case we would have f = p²/2m, where m is the mass of the ball.

For every function we can take two kinds of averages. The first is the infinite time average f*. The general idea of a time average is familiar from everyday contexts. You play the

lottery on three consecutive Saturdays. On the first you win $10; on the second you win

nothing; and on the third you win $50. Your average gain is ($10 + $0 + $50)/3 = $20.

Technically speaking this is a time average. This simple idea can easily be put to use in a

dynamical system: follow the system’s evolution over time (and remember that we are now

assuming time is discrete), take the value of the relevant function at each step, add the values,

and then divide by the number of steps. This yields (1/k) Σ_{i=0}^{k−1} f(T_i(x0)), where Σ_{i=0}^{k−1} f(T_i(x0)) is just an abbreviation for f(x0) + f(T_1(x0)) + f(T_2(x0)) + ... + f(T_{k−1}(x0)). This is the finite time average for f after k steps. If the system’s state continues to evolve infinitely and we keep tracking the system forever, then we get the infinite time average:

f* = lim_{k→∞} (1/k) Σ_{i=0}^{k−1} f(T_i(x0)),

where the symbol ‘lim’ (from Latin ‘limes’, meaning border or limit) indicates that we are letting time tend towards infinity (in mathematical symbols: k → ∞). One point deserves special attention, since it will become crucial later on: the presence of x0 in the above expression.

Time averages depend on where the system starts; i.e. they depend on the initial condition. If

the process starts in a different state, the time average may well be different.
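A small numerical illustration of finite time averages may be helpful. The map used below, the irrational rotation T(x) = (x + c) mod 1 on [0, 1], is a standard ergodic example and is not taken from the text; f(x) = x is an illustrative choice. For short runs the average visibly depends on the initial condition; for long runs it settles down to the space average of f.

    # Finite time averages (1/k) * [f(x0) + f(T_1(x0)) + ... + f(T_{k-1}(x0))]
    # along an orbit of the irrational rotation T(x) = (x + c) mod 1.
    import math

    c = math.sqrt(2) % 1.0

    def T(x):
        return (x + c) % 1.0

    def f(x):
        return x

    def time_average(x0, k):
        total, x = 0.0, x0
        for _ in range(k):
            total += f(x)
            x = T(x)
        return total / k

    for x0 in (0.1, 0.7):
        print(x0, time_average(x0, 10), time_average(x0, 100_000))
    # The short averages differ with the initial condition x0; the long ones
    # both approach 0.5, the space average of f on [0, 1].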


Next we have the space average f̄. Let us again start with a colloquial example: the average

height of the students in a particular school. This is easily calculated: just take each student’s

height, add up all the numbers, and divide the result by the number of students we have.

Technically speaking this is a space average. In the example the students in the school

correspond to the points in X; and the fact that we count each student once (we don’t, for

instance, take John’s height into account twice and omit Jim’s) corresponds to the choice of a

measure that gives equal ‘weight’ to each point in X . The transformation T has no pendant

in our example, and this is deliberate: space averages have nothing to do with the dynamics of

the system (that’s what sets them off from time averages). The general mathematical

definition of the space average is as follows:

f̄ = ∫_X f(x) dμ,

where ∫_X is the integral over the phase space X.8 If the space consists of discrete elements,

like the students of the school (they are ‘discrete’ in that you can count them), then the

integral becomes equivalent to a sum like the one we have when we determine the average

height of a population. If X is continuous (as the phase space above) things are a bit more

involved.
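The cell-by-cell approximation of the integral described in footnote 8 is easy to carry out in code. The following sketch (with X = [0, 1], the Lebesgue measure, and the illustrative choice f(x) = x², whose exact space average is 1/3) computes the space average as the sum f(x_1)μ(c_1) + ... + f(x_m)μ(c_m) over ever finer grids.

    # The space average as the cell sum f(x_1)mu(c_1) + ... + f(x_m)mu(c_m),
    # here for X = [0, 1] with the Lebesgue measure and f(x) = x**2.
    def space_average(f, m):
        cell = 1.0 / m                                  # mu(c_i) = 1/m per cell
        return sum(f((i + 0.5) * cell) * cell for i in range(m))

    print(space_average(lambda x: x ** 2, 10))          # coarse grid: ~0.3325
    print(space_average(lambda x: x ** 2, 10_000))      # fine grid: ~0.3333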

3. Ergodicity

With these concepts in place, we can now define ergodicity.9 A dynamical system [X, μ, T] is ergodic iff

8 The basic idea of an integral is the following: slice up the space into m small cells c_1, ..., c_m (e.g. by putting a grid on it), then choose a point in each cell and take the value of f for that point. Then multiply that value with the size of the cell (its measure) and add them all up: f(x_1)μ(c_1) + ... + f(x_m)μ(c_m), where x_1 is a point in c_1 etc. Now we start making the cells smaller (and as a result we need more of them to cover X) until they become infinitely small (in technical terms, we take the limit). That is the integral. Put simply, the integral is just f(x_1)μ(c_1) + ... + f(x_m)μ(c_m) for infinitely small cells.

9 The concept of ergodicity has a long and complex history. For an account of this history see Sklar (1993, Ch. 2).


f* = f̄

for all complex-valued Lebesgue integrable functions f almost everywhere, meaning for

almost all initial conditions. The first qualification, ‘for all complex-valued Lebesgue

integrable functions’, is usually satisfied for the functions that are of interest in science. The

second qualification, ‘almost everywhere’, is non-trivial and is the source of a famous

problem in the foundations of statistical mechanics, the so-called ‘measure zero problem’ (to

which we turn in Section 4). So it is worth unpacking carefully what this condition involves.

Not all sets have a finite size. In fact, there are sets of measure zero. This may sound abstract

but is very natural. Take a ruler and measure the length of certain objects. You will find, for

instance, that your pencil is 17cm long – in the language of mathematics this means that the

one-dimensional Lebesgue measure of the pencil is 17. Now measure a geometrical point and

answer the question: how long is the point? The answer is that such a point has no extension

and so its length is zero. In mathematical parlance: a set consisting of a geometrical point is a

measure zero set. The same goes for a set of two geometrical points: also two geometrical

points together have no extension and hence have measure zero. Another example is the

following: you have a device to measure the surface of objects in a plane. You find out that an

A4 sheet has a surface of 623.7 square centimetres. Then you are asked what the surface of a

line is. The answer is: zero. Lines don’t have surfaces. So with respect to the two dimensional

Lebesgue measure lines are measure zero sets.

In the context of ergodic theory, ‘almost everywhere’ means, by definition, ‘everywhere in X

except, perhaps, in a set of measure zero’. That is, whenever a claim is qualified as ‘almost

everywhere’ it means that it could be false for some points in X, but these taken together have

measure zero. Now we are in a position to explain what the phrase means in the definition of

ergodicity. As we have seen above, the time average (but not the space average!) depends on the initial condition. If we say that f* = f̄ almost everywhere we mean that those initial conditions for which it turns out to be the case that f* ≠ f̄ taken together form a set of

measure zero – they are like a line in the plane.

Armed with this understanding of the definition of ergodicity, we can now discuss some

important properties of ergodic systems. Consider a subset A of X . For instance, thinking

again about the example of the oscillating ball, take the left half of the phase space. Then


define the so-called characteristic function of A, f_A, as follows: f_A(x) = 1 for all x in A and f_A(x) = 0 for all x not in A. Plugging this function into the definition of ergodicity yields: f_A* = μ(A). This means that the proportion of time that the system’s state spends in set A is proportional to the measure of that set. To make this even more intuitive, assume that the phase space is normalised: μ(X) = 1 (this is a very common and unproblematic assumption). If we then choose A so that μ(A) = 1/2, then we know that the system spends half of the time in A; if μ(A) = 1/4 it spends a quarter of the time in A; etc. As we will see below, this

property of ergodic systems plays a crucial role in certain approaches to statistical mechanics.
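This sojourn-time property can be illustrated numerically. The sketch below again uses the irrational rotation T(x) = (x + c) mod 1 as a stand-in ergodic system (it is not an example from the text) and counts the fraction of time an orbit spends in a set A with μ(A) = 1/4.

    # Fraction of time an orbit spends in the set A = [0, 1/4), mu(A) = 1/4,
    # for the irrational rotation T(x) = (x + c) mod 1 on [0, 1].
    import math

    c = math.sqrt(2) % 1.0

    def T(x):
        return (x + c) % 1.0

    def fraction_of_time_in_A(x0, steps, a=0.0, b=0.25):
        x, hits = x0, 0
        for _ in range(steps):
            if a <= x < b:
                hits += 1
            x = T(x)
        return hits / steps

    print(fraction_of_time_in_A(0.3, 200_000))          # roughly 0.25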

Since we are free to choose A as we wish, we immediately get another important result: a

system can be ergodic only if its trajectory may access all parts of X of positive measure, i.e.

if the trajectory passes arbitrarily close to any point in X infinitely many times as time tends

towards infinity. And this implies that the phase space of ergodic systems is what is called

‘irreducible’ or ‘inseparable’: every set invariant under T (i.e. every set that is mapped onto

itself under T ) has either measure 0 or 1. As a consequence, X cannot be divided into two or

more subspaces (of non-zero measure) that are invariant under T . Conversely a non-ergodic

system is reducible. A reducible system is schematically illustrated in Figure 3.


Fig. 3: Reducible system: no point in region P evolves into region Q and vice versa.

Finally, we would like to state a theorem that will become important in Section 5. One can

prove that a system is ergodic iff


lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(T_k(A) ∩ B) = μ(A)μ(B)    (E)

holds for all subsets A and B of X. Although this condition does not have an immediate

intuitive interpretation, we will see later on that it is crucial in understanding the kind of

randomness we find in ergodic systems.

4. The Ergodic Hierarchy

It turns out that ergodicity is only the bottom level of an entire hierarchy of dynamical

properties. This hierarchy is called the ergodic hierarchy, and the study of this hierarchy is the

core task of a mathematical discipline called ergodic theory. This choice of terminology is somewhat misleading: ergodicity is only the bottom level of this hierarchy, so EH contains much more than ergodicity, and the scope of ergodic theory stretches far beyond ergodicity. Ergodic theory (thus understood) is part of dynamical systems theory, which

studies a wider class of dynamical systems than ergodic theory (in particular ones that have no

measure at all).

EH is a nested classification of dynamical properties. The hierarchy is typically represented as

consisting of the following five levels:

Bernoulli ⊂ Kolmogorov ⊂ Strong Mixing ⊂ Weak Mixing ⊂ Ergodic

The diagram is intended to indicate that all Bernoulli systems are Kolmogorov systems, all

Kolmogorov systems are strong mixing systems, and so on. Hence all systems in EH are

ergodic. However, the converse relations need not hold: not all ergodic systems are weak

mixing, and so on. A system that is ergodic but not weak mixing is referred to in what follows

as merely ergodic and similarly for the next three levels.10

10 Sometimes EH is presented as having another level, namely C-systems (also referred to as Anosov systems or

completely hyperbolic systems). Although interesting in their own right, C-systems are beyond the scope of this

review. They do not have a unique place in EH and their relation to other levels of EH depends on details, which

we cannot discuss here. Paradigm examples of C-systems are located between K- and B-systems; that is, they are

K-systems but not necessarily B-systems. The cat map, for instance, is a C-system that is also a K-system (Lichtenberg & Liebermann, 1992, p. 307); but there are K-systems such as the so-called stadium billiard which are not C-systems (Ott, 1993, p. 262). Some C-systems preserve a smooth measure (where ‘smooth’ in this context means absolutely continuous with respect to the Lebesgue measure), in which case they are Bernoulli systems. But not all C-systems have smooth measures. It is always possible to find other measures such as SRB (Sinai, Ruelle, Bowen) measures. However, matters are more complicated in such cases, as such C-systems need not be mixing and a fortiori they need not be K- or B-systems (Ornstein & Weiss, 1991, pp. 75–82).


Mixing can be intuitively explained by the following example, first used by Gibbs in

introducing the concept of mixing. Begin with a glass of water, then add a shot of scotch; this

is illustrated in Fig. 4a. The volume C of the cocktail (scotch + water) is μ(C) and the volume of scotch that was added to the water is μ(S), so that in C the concentration of scotch is μ(S)/μ(C).


Fig. 4: Mixing

Now stir. Mathematically, stirring is represented by the time evolution T , meaning that T (S)

is the region occupied by the scotch after one unit of mixing time. Intuitively we say that the

cocktail is thoroughly mixed, if the concentration of scotch equals μ(S)/μ(C) not only with

respect to the whole volume of fluid, but with respect to any region V in that volume. Hence,

the drink is thoroughly mixed at time n if μ(T_n(S) ∩ V)/μ(V) = μ(S)/μ(C) for any volume V (of non-zero measure). Now assume that the volume of the cocktail is one unit: μ(C) = 1 (which we can do without loss of generality since there is always a unit system in which the volume of the glass is one). Then the cocktail is thoroughly mixed iff



μ(T_n(S) ∩ V)/μ(V) = μ(S) for any region V (of non-zero measure). But how large must n be before the stirring ends with the cocktail well stirred? We now suppose that the bartender takes infinitely long to thoroughly mix the drink, so that mixing is achieved just in the infinite limit: lim_{n→∞} μ(T_n(S) ∩ V)/μ(V) = μ(S) for any region V (of non-zero measure). If we now associate the glass with the phase space X and replace the scotch S and the volume V with two arbitrary subsets A and B of X, then we get without further ado the general definition of what is called strong mixing (often also referred to just as ‘mixing’): a system is strong mixing iff

lim_{n→∞} μ(T_n(A) ∩ B) = μ(A)μ(B)    (S-M)

for all subsets A and B of X. This requirement for mixing can be relaxed a bit by allowing for

fluctuations. That is, instead of requiring that the cocktail reach a uniform state of being

mixed, we now only require that it be mixed on average. In other words, we allow that

bubbles of either scotch or water may crop up every now and then, but they do so in a way

that these fluctuations average out as time tends towards infinity. This translates into

mathematics in a straightforward way. The deviation from the ideally mixed state at some time n is μ(T_n(A) ∩ B) − μ(A)μ(B). The requirement that the average of these deviations vanishes inspires the notion of weak mixing. A system is weak mixing iff

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |μ(T_k(A) ∩ B) − μ(A)μ(B)| = 0    (W-M)

for all subsets A and B of X. The vertical strokes denote the so-called absolute value; for instance: |−5| = |5| = 5. One can prove that there is a strict implication relation between the three dynamical properties we have introduced so far: strong mixing implies weak mixing, but not vice versa; and weak mixing implies ergodicity, but not vice versa. Hence, strong mixing is a stronger condition than weak mixing, and weak mixing is a stronger condition than ergodicity.
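The decay expressed in (S-M) can be observed numerically. The sketch below uses the baker’s transformation introduced later in this section, which is strong mixing; for an invertible measure-preserving T, μ(T_n(A) ∩ B) equals the probability that a uniformly chosen point lies in A while its n-th image lies in B, which is what the Monte Carlo estimate computes. The sets A and B below are chosen for illustration only.

    # Monte Carlo illustration of the strong mixing condition (S-M) for the
    # baker's transformation. Here A = B is the strip x < 0.1, so
    # mu(A)*mu(B) = 0.01, while mu(A cap B) = 0.1 at n = 0.
    import random

    def baker(x, y):
        if x < 0.5:
            return 2 * x, y / 2
        return 2 * x - 1, y / 2 + 0.5

    def in_set(x, y):                      # the strip x < 0.1, measure 0.1
        return x < 0.1

    def correlation(n, samples=200_000):
        hits = 0
        for _ in range(samples):
            x0, y0 = random.random(), random.random()
            x, y = x0, y0
            for _ in range(n):
                x, y = baker(x, y)
            if in_set(x0, y0) and in_set(x, y):
                hits += 1
        return hits / samples

    for n in range(6):
        print(n, correlation(n))           # decays from about 0.1 towards 0.01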

The next higher level in EH are K-systems. Unlike in the cases of ergodic and mixing

systems, there is unfortunately no intuitive way of motivating the standard definition of such

systems, and the definition is such that one cannot read off from it what the characteristics of K-systems are (we state this definition in Appendix C). The least unintuitive way to present K-

systems is via a theorem due to Cornfeld et al (1982, 283), who prove that a dynamical


system is a K-system iff it is K-mixing. A system is K-mixing iff for any subsets

A_0, A_1, ..., A_r of X (where r is a natural number of your choice) the following condition holds:

lim_{n→∞} sup_{B ∈ σ(n,r)} |μ(B ∩ A_0) − μ(B)μ(A_0)| = 0,    (K-M)

where σ(n,r) is the minimal σ-algebra generated by the set {T_k(A_j) : k ≥ n; j = 1, ..., r}. It is far

from obvious what this so-called sigma algebra is and hence the content of this condition is

not immediately transparent. We will come back to this issue in Section 6 where we provide

an intuitive reading of this condition. What matters for the time being is its similarity to the

mixing condition. Strong mixing is, trivially, equivalent to lim_{n→∞} |μ(T_n(A) ∩ B) − μ(A)μ(B)| = 0.

So we see that K-mixing adds something to strong mixing.

In passing we would like to mention another important property of K-systems: one can prove

that K-systems have positive Kolmogorov-Sinai entropy (KS-entropy); for details see

Appendix C. The KS-entropy itself does not have an intuitive interpretation, but it relates

three other concepts of dynamical systems theory in an interesting way, and these do have intuitive interpretations. First, Lyapunov exponents are a measure of how fast two originally

nearby trajectories diverge on average, and they are often used in chaos theory to characterise

the chaotic nature of the dynamics of a system. Under certain circumstances (essentially, the

system has to be differentiable and ergodic) one can prove that a dynamical system has a positive KS-entropy if and only if it has positive Lyapunov exponents (Lichtenberg and Liebermann 1992, 304). In such a system initially arbitrarily close trajectories diverge exponentially. This result is known as Pesin’s theorem. Second, the algorithmic complexity

of a sequence is the length of the shortest computer programme needed to reproduce the

sequence. Some sequences are simple; e.g. a string of a million ‘1’s is simple: the programme needed to reproduce it basically is ‘write “1” a million times’, which is very short. Others are complex: there is no pattern in the decimal expansion of the number π that one could exploit, and so a programme reproducing that expansion essentially reads ‘write 3.14…’, which is as long as the decimal expansion of π itself. In the discrete case a trajectory can be represented

as a sequence of symbols of this kind (it is basically a list of states). It is then the case that if a

system is a K-system, then its KS-entropy equals the algorithmic complexity of almost all its

trajectories (Brudno 1978). This is known as Brudno’s theorem (Alekseev and Yakobson

1981). Third, the Shannon entropy is a common measure for the uncertainty of a future


outcome: the higher the entropy the more uncertain we are about what is going to happen.

One can then prove that, given certain plausible assumptions, the KS-entropy is equivalent to

a generalised version of the Shannon entropy, and can hence be regarded as a measure for the

uncertainty of future events given past events (Frigg 2004).
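As a rough numerical illustration of the connection between Lyapunov exponents and the KS-entropy, the sketch below estimates the Lyapunov exponent of the logistic map x → 4x(1 − x), a standard chaotic example that is not discussed in the text; the estimate comes out close to ln 2 ≈ 0.693, which is also the KS-entropy of this map.

    # Estimating the Lyapunov exponent of the logistic map x -> 4x(1-x) by
    # averaging log|T'(x)| along an orbit.
    import math

    def T(x):
        return 4 * x * (1 - x)

    def dT(x):
        return 4 - 8 * x

    x, steps, acc = 0.123456, 100_000, 0.0
    for _ in range(steps):
        acc += math.log(abs(dT(x)))
        x = T(x)
        if x == 0.0 or x == 1.0:           # guard against rounding collapse
            x = 1e-6
    print(acc / steps)                     # approximately 0.693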

Bernoulli systems mark the highest level in EH. In order to define Bernoulli systems we first have to introduce the notion of a partition of X (sometimes also ‘coarse graining of X’). A partition of X is a division of X into different parts (so-called ‘atoms of the partition’) so that these parts don’t overlap and jointly cover X (i.e. they are mutually exclusive and jointly exhaustive). For instance, in Figure 1 there is a partition on the phase space that has two atoms (the left and the right part). More formally, α = {α_1, ..., α_m} is a partition of X (and the α_i its atoms) iff (i) the intersection of any two atoms of the partition is the empty set, and (ii) the union of all atoms is X. Furthermore it is important to notice that a partition remains a partition under the dynamics of the system. That is, if α is a partition, then T_n α = {T_n α_1, ..., T_n α_m} is a partition too for all n.

There are of course many different ways of partitioning a phase space, and so there exist

different partitions. In what follows we are going to study how different partitions relate to

each other. An important concept in this connection is independence. Let α and β be two partitions of X. By definition, these partitions are independent iff μ(α_i ∩ β_j) = μ(α_i)μ(β_j) for all atoms α_i of α and all atoms β_j of β. We will explain the intuitive meaning of this

definition (and justify calling it ‘independence’) in Section 5; for the time being we just use it

as a formal definition.

With these notions in hand we can now define a Bernoulli transformation: a transformation T

is a Bernoulli transformation iff there exists a partition α of X so that the images of α under T at different instants of time are independent; that is, the partitions ..., T_{−1}α, T_0α, T_1α, ... are all

independent.11 In other words, T is a Bernoulli transformation iff

11 To be precise, a second condition has to be satisfied: α must be T-generating (Mañé 1983, 87). However, what matters for our considerations is the independence condition.


μ(α_i ∩ α_j) = μ(α_i)μ(α_j)    (B)

for all atoms α_i of T_k α and all atoms α_j of T_l α, for all k ≠ l. We then refer to α as the Bernoulli partition, and we call a dynamical system [X, μ, T] a Bernoulli system if T is a

Bernoulli automorphism, i.e. a Bernoulli transformation mapping X onto itself.

Let us illustrate this with a well-known example, the baker’s transformation (so named

because of its similarity to the kneading of dough). This transformation maps the unit square

onto itself. Using standard Cartesian coordinates the transformation can be written as follows:

T(x, y) = (2x, y/2) for 0 ≤ x < 1/2, and T(x, y) = (2x − 1, y/2 + 1/2) for 1/2 ≤ x ≤ 1.

In words, for all points (x, y) in the unit square that have an x-coordinate smaller than 1/2, the transformation T doubles the value of x and halves the value of y. For all the points (x, y) that have an x-coordinate greater than or equal to 1/2, T transforms x into 2x − 1 and y into y/2 + 1/2. This

is illustrated in Fig. 5a.

Fig. 5a: The baker’s transformation.

Now regard the two areas shown above as the two atoms of a partition α = {α_1, α_2}. It is then easy to see that α and Tα are independent: μ(α_1 ∩ Tα_2) = μ(α_1)μ(Tα_2) = 1/4, and similarly for all other atoms of α and Tα. This is illustrated in Figure 5b.


Fig. 5b: The independence of α and Tα.

One can prove that independence holds for all other iterates of α as well. So the baker’s transformation together with the partition α is a Bernoulli transformation.
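The independence claim can be checked by simulation. In the sketch below, α_1 is the left half and α_2 the right half of the unit square; a point lies in T(α_2) exactly if its preimage under the baker’s transformation lies in α_2, so a Monte Carlo estimate of μ(α_1 ∩ T(α_2)) should come out close to 1/4.

    # Monte Carlo check that mu(alpha_1 cap T(alpha_2)) = 1/4 for the baker's
    # transformation, where alpha_1 is the left and alpha_2 the right half of
    # the unit square.
    import random

    def baker_inverse(x, y):
        if y < 0.5:
            return x / 2, 2 * y            # came from the left half
        return (x + 1) / 2, 2 * y - 1      # came from the right half

    samples, hits = 200_000, 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        px, _ = baker_inverse(x, y)
        if x < 0.5 and px >= 0.5:          # in alpha_1 and in T(alpha_2)
            hits += 1
    print(hits / samples)                  # close to 0.25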

In the literature Bernoulli systems are often introduced using so-called shift maps (or

Bernoulli shifts). Those readers not familiar with these can skip the rest of this section without

loss for what follows; for the other readers we here briefly indicate how shift maps are related

to Bernoulli systems with the example of the baker’s transformation (for a more general

discussion see Appendix D). Choose a point in the unit square and write its x and y co-ordinates as binary numbers: x = 0.a1a2a3... and y = 0.b1b2b3..., where all the ai and bi are either 0 or 1. Now put both strings together back to back with a dot in the middle to form one infinite string: S = ...b3b2b1.a1a2a3..., which may represent the state of the system just as a ‘standard’ two-dimensional vector does. Some straightforward algebra then shows that T(0.a1a2a3..., 0.b1b2b3...) = (0.a2a3a4..., 0.a1b1b2b3...). From this we see that in our ‘one string’ representation of the point the operation of T amounts to shifting the dot one position to the right: TS = ...b3b2b1a1.a2a3a4... Hence, the baker’s transformation is equivalent to a shift on an infinite string of zeros and ones.12
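The shift picture can also be checked directly in code. The sketch below uses finite binary strings standing in for the infinite expansions: it applies the baker’s transformation to a point and compares the result with the string obtained by moving the dot one place to the right.

    # The baker's transformation as a shift on binary strings (finite strings
    # stand in for the infinite expansions).
    def to_bits(z, n=16):
        bits = []
        for _ in range(n):                 # first n binary digits of z in [0, 1)
            z *= 2
            bits.append(int(z))
            z -= int(z)
        return bits

    def from_bits(bits):
        return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

    def baker(x, y):
        if x < 0.5:
            return 2 * x, y / 2
        return 2 * x - 1, y / 2 + 0.5

    x, y = 0.71875, 0.3125                 # x = 0.10111, y = 0.0101 in binary
    a, b = to_bits(x), to_bits(y)
    # Moving the dot one place to the right: the first digit of x becomes the
    # new first digit of y, and the remaining digits of x shift up by one.
    a_new, b_new = a[1:], [a[0]] + b[:-1]
    x_new, y_new = baker(x, y)
    print(x_new, from_bits(a_new))         # both print 0.4375
    print(y_new, from_bits(b_new))         # both print 0.65625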

There are two further notions that are crucial to the theory of Bernoulli systems, the property

of being weak Bernoulli and very weak Bernoulli. These properties play a crucial role in

showing that certain transformations are in fact Bernoulli. The example of the baker’s

12 A formal isomorphism proof can be found in Cornfeld et al. (1982, 9-10).


transformation is one of the few examples that have a geometrically simple Bernoulli

partition, and so one often cannot prove directly that a system is a Bernoulli system. One then

shows that a certain geometrically simple partition is weak Bernoulli and uses a theorem due

to Ornstein to the effect that if a system is weak Bernoulli then there exists a Bernoulli

partition for that system. The mathematics of these notions and the associated proofs of

equivalence are intricate and a presentation of them is beyond the scope of this entry. The

interested reader is referred to Ornstein (1974) or Shields (1973).

5. The Ergodic Hierarchy and Statistical Mechanics

The concepts of EH, and in particular ergodicity itself, play important roles in the foundation

of statistical mechanics (SM). In this section we review what these roles are.

A discussion of statistical mechanics faces an immediate problem. Foundational debates in

many other fields of physics can take as their point of departure a generally accepted

formalism. Not so in SM. Unlike, say, quantum mechanics and relativity theory, SM has not

yet found a generally accepted theoretical framework, let alone a canonical formulation.13

What we find in SM is a plethora of different approaches and schools, each with its own programme and mathematical apparatus.14 However, all these schools use (slight variants of) either of two theoretical frameworks, one of which can be associated with Boltzmann (1877) and the other with Gibbs (1902), and can thereby be classified either as ‘Boltzmannian’ or ‘Gibbsian’. For this reason we divide our presentation of SM into two parts, one for each of

these families of approaches.

Before delving into a discussion of these theories, let us briefly review the basic tenets of SM

by dint of a common example. Consider a gas that is confined to the left half of a box. Now

we remove the barrier separating the two halves of the box. As a result, the gas quickly

disperses, and it continues to do so until it homogeneously fills the entire box. The gas has

approached equilibrium. This raises two questions. First, how is equilibrium characterised?

13 A similar situation exists for quantum field theory, which has a number of inequivalent formulations including

(to name just a few) the canonical, algebraic, axiomatic, and path integral frameworks. 14 For detailed reviews of SM see Frigg (2008), Sklar (1993) and Uffink (2007). Those interested in the long and

intricate history of SM are referred to Brush (1976) and von Plato (1994).


That is, what does it take for a system to be in equilibrium? Second, how do we characterise

the approach to equilibrium? That is, what are the salient features of the approach to

equilibrium and what features of a system make it behave in this way? These questions are

addressed in two subdisciplines of SM: equilibrium SM and non-equilibrium SM.

There are two different ways of describing processes like the spreading of a gas.

Thermodynamics (TD) describes the system using a few macroscopic variables (in the case

of the gas pressure, volume and temperature) and characterises both equilibrium and the

approach to equilibrium in terms of the behaviour of these variables, while completely

disregarding the microscopic constitution of the gas. As far as TD is concerned matter could

be a continuum rather than consisting of particles – it just would not make any difference. For

this reason TD is a ‘macro theory’.

The cornerstone of TD is the so-called Second Law of TD. This law describes one of the

salient features of the above process: its unidirectionality. We see gases spread – i.e. we see

them evolving towards equilibrium – but we never observe gases spontaneously reverting to

the left half of a box – i.e. we never see them move away from equilibrium when left alone.

And this is not a specific feature of gases. In fact, not only gases but also all other macroscopic

systems behave in this way, irrespective of their specific makeup. This fact is enshrined in the

Second Law of TD, which, roughly, states that transitions from equilibrium to non-

equilibrium states cannot occur in isolated systems, which is the same as saying that entropy

cannot decrease in isolated systems (where a system is isolated if it has no interaction with its

environment: there is no heat exchange, no one is compressing the gas, etc.).

But there is an altogether different way of looking at that same gas. The gas consists of a large

number of gas molecules (a vessel on a laboratory table contains something like 10^23 molecules). These molecules bounce around under the influence of the forces exerted onto

them when they crash into the walls of the vessel and collide with each other. The motion of

each molecule under these forces is governed by laws of classical mechanics in the same way

as the motion of the bouncing ball. So rather than attributing some macro variables to the gas

and focussing on them, we could try to understand the gas’ behaviour by studying the

dynamics of its micro constituents.


This also raises the question of how the two ways of looking at the gas fit together. Since

neither the thermodynamic nor the mechanical approach is in any way privileged, both have

to lead to the same conclusions. Statistical mechanics is the discipline that addresses this task.

From a more abstract point of view we can therefore also say that SM is the study of the

connection between micro-physics and macro-physics: it aims to account for a system’s

macro behaviour in terms of the dynamical laws governing its microscopic constituents. The

term ‘statistical’ in its name is owed to the fact that, as we will see, a mechanical explanation

can only be given if we also introduce probabilistic elements into the theory.

5.1 Boltzmannian SM

We first introduce the main elements of the Boltzmannian framework and then turn to the use

of ergodicity in it. Every system can possess various macrostates M_1, ..., M_k. These

macrostates are characterised by the values of macroscopic variables, in the case of a gas

pressure, temperature, and volume.15 In the introductory example one macro-state corresponds

to the gas being confined to the left half, another one to it being spread out. In fact, these two

states have special status: the former is the gas’ initial state; the latter is the gas’ equilibrium

state. We label the states M_I and M_Eq respectively.

It is one of the fundamental posits of the Boltzmann approach that macrostates supervene on

microstates, meaning that a change in a system’s macrostate must be accompanied by a

change in its microstate (for a discussion of supervenience see McLaughlin and Bennett 2005,

and references therein). For instance, it is not possible to change the pressure of a system and

at the same time keep its micro-state constant. Hence, to every given microstate x there

corresponds exactly one macrostate. Let us refer to this macrostate as M(x). This

determination relation is not one-to-one; in fact many different x can correspond to the same

macrostate. We now group together all microstates x that correspond to the same macro-state,

which yields a partitioning of the phase space in non-overlapping regions, each corresponding

to a macro-state. For this reason we also use the same letters, M_1, ..., M_k, to refer to macro-

states and the corresponding regions in phase space. Two macrostates are of particular

15 It is a common assumption in the literature on Boltzmannian SM that there is a finite number of macrostates that a system can possess. We should point out, however, that this assumption is based on an idealisation if the

relevant macro variables are continuous. In fact, we obtain a finite number of macrostates only if we coarse-grain

the values of the continuous variables.


importance: the macrostate at the beginning of the process, which is also referred to as the

‘past state’, and the equilibrium state. For this reason we introduce special labels for them,

M_p and M_eq, respectively, and choose the numbering of the macrostates so that M_1 = M_p and M_k = M_eq (which, trivially, we always can). This is illustrated in Figure 6a.

Figure 6: The macrostate structure of X.

We are now in a position to introduce the Boltzmann entropy. To this end recall that we have

a measure μ on the phase space that assigns to every set a particular volume, hence a fortiori also to macrostates. With this in mind, the Boltzmann entropy of a macro-state M_j can be defined as S_B(M_j) = k_B log[μ(M_j)], where k_B is the Boltzmann constant. The important feature of the logarithm is that it is a monotonic function: the larger μ(M_j), the larger its

logarithm. From this it follows that the largest macro-state also has the highest entropy!

One can show that, at least in the case of dilute gases, the Boltzmann entropy coincides with

the thermodynamic entropy (in the sense that both have the same functional dependence on

the basic state variables), and so it is plausible to say that the equilibrium state is the macro-

state for which the Boltzmann entropy is maximal (since TD posits that entropy be maximal

for equilibrium states). By assumption the system begins in a low entropy state, the initial

state M_I (the gas being squeezed into the left half of the box). The problem of explaining the

approach to equilibrium then amounts to answering the question: why does a system

originally in M_I eventually move into M_Eq and then stay there? (see Figure 6b)


In the 1870s Boltzmann offered an important answer to this question.16 At the heart of his

answer lies the idea of assigning probabilities to macrostates according to their size. So Boltzmann adopted the following postulate: p(M_j) = c μ(M_j) for all j = 1, ..., k, where c is a

normalisation constant assuring that the probabilities add up to one. Granted this postulate, it

follows immediately that the most likely state is the equilibrium state (since the equilibrium

state occupies the largest chunk of the phase space). From this point of view it seems natural

to understand the approach to equilibrium as the evolution from an unlikely macrostate to a

more likely macrostate and finally to the most likely macro-state. This, Boltzmann argued,

was a statistical justification of the Second Law of TD.
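As a toy illustration of this postulate (the macrostate volumes below are invented for the purpose and do not come from the text), one can compute the probabilities p(M_j) = c μ(M_j) and the corresponding Boltzmann entropies and see that the equilibrium macrostate dominates.

    # Toy illustration of p(M_j) = c * mu(M_j) and S_B(M_j) = k_B log mu(M_j)
    # (entropies below are in units of k_B; the volumes are invented numbers).
    import math

    volumes = {"M_p": 1e-12, "M_2": 1e-8, "M_3": 1e-4, "M_eq": 1.0}
    c = 1.0 / sum(volumes.values())                     # normalisation constant

    for name, mu in volumes.items():
        print(name, c * mu, math.log(mu))
    # The equilibrium macrostate has by far the largest volume, hence the
    # highest Boltzmann entropy and a probability close to 1.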

But Boltzmann knew that simply postulating p(M_j) = c μ(M_j) would not solve any problems

unless the postulate could be justified in terms of the dynamics of the system. This is where

ergodicity enters the scene. As we have seen above, ergodic systems have the property of

spending a fraction of time in each part of the phase space that is proportional to its size (with

respect to μ). As we have also seen, the equilibrium state is the largest macrostate. And what

is more, the equilibrium state in fact is much larger than the other states. So if we assume that

the system is ergodic, then it is in equilibrium most of the time! It is then natural to interpret

p(M_j) as a time average: p(M_j) is the fraction of time that the system spends in state M_j

over the course of time. Now all the elements of Boltzmann’s position are on the table: (a)

partition the phase space of the system in macrostates and show that the equilibrium state is

by far the largest state; (b) adopt a time average interpretation of probability; (c) assume that

the system in question is ergodic. It then follows that the system is most likely to be found in

equilibrium, which justifies the Second law.

Three objections have been levelled against this line of thought. First, it is pointed out that

assuming ergodicity is too strong in two ways. The first is that it turns out to be extremely

difficult to prove that the systems of interest really are ergodic. Contrary to what is sometimes

asserted, not even a system of n elastic hard balls moving in a cubic box with hard reflecting

walls has been proven to be ergodic for arbitrary n; it has been proven to be ergodic only for

n ≤ 4. To this charge one could reply that what looks like defeat to some is in fact just a

16 Uffink (2004, 2007) provides an overview of the tangled development of Boltzmann’s constantly changing

views. Frigg (2009a) discusses probabilities in Boltzmann’s account.


challenge and progress in mathematics will eventually resolve the issue, and there is at least

one recent result that justifies optimism: Simanyi (2004) shows that a system of n hard balls

on a torus of dimension 3 or greater is ergodic, for an arbitrary natural number n.

The second way in which ergodicity seems to be too strong is that even if eventually we can

come by proofs of ergodicity for the relevant systems, the assumption is too strong because

there are systems that are known not to be ergodic and yet they behave in accordance with the

Second Law. Bricmont (2001) investigates the Kac Ring Model and a system of n uncoupled

anharmonic oscillators of identical mass, and points out that both systems exhibit

thermodynamic behaviour and yet they fail to be ergodic. Hence, ergodicity is not necessary

for thermodynamic behaviour. But Earman and Redei (1996, p. 70) and van Lith (2001a, p.

585) argue that if ergodicity is not necessary for thermodynamic behaviour, then ergodicity

cannot provide a satisfactory explanation for this behaviour. Either there must be properties

other than ergodicity that explain thermodynamic behaviour in cases in which the system is

not ergodic, or there must be an altogether different explanation for the approach to

equilibrium even for systems which are ergodic.

In response to this objection, Vranas (1998) argues that most systems that fail to be ergodic

are ‘almost ergodic’ in a specifiable way, and this is good enough. We discuss Vranas’

approach below when discussing Gibbsian SM since that is the context in which he has put

forward his suggestion. Frigg (2009b) suggested exploiting the fact that almost all

Hamiltonian systems are non-integrable, and that these systems have so-called Arnold webs,

i.e. large regions of phase space on which the motion of the system is ergodic. Lavis (2005)

re-examined the Kac ring model and has pointed out that although the system is not ergodic, it

has an ergodic decomposition (roughly, there exists a partition of the system’s phase space

and the dynamics is ergodic in each atom of the partition), which is sufficient to guarantee the

approach to equilibrium. He has also challenged the assumption, implicit in the above

criticism, that providing an explanation for the approach to equilibrium amounts to identifying

one (and only one!) property that all systems have in common. In fact, it may be the case that

different properties are responsible for the approach to equilibrium in different systems, and

there is no reason to rule out such explanations. This squares well with Bricmont’s (2001)

own observation that what drives a system of anharmonic oscillators to equilibrium is some

kind of mixing in the individual degrees of freedom. In sum, the tenor of all these responses is


that even though ergodicity simpliciter does not have the resources to explain the approach to equilibrium, a somewhat qualified use of EH does.

The second objection is that even if ergodicity obtains, this is not sufficient to give us what

we need. As we have seen above, ergodicity comes with the qualification ‘almost

everywhere’. This qualification is usually understood as suggesting that sets of measure zero

can be ignored without detriment. The idea is that points falling in a set of measure zero are

‘sparse’ and can therefore be neglected. The question of whether or not this move is

legitimate is known as the ‘measure zero problem’.

Simply neglecting sets of measure zero seems to be problematic for various reasons. First,

sets of measure zero can be rather ‘big’; for instance, the rational numbers have measure zero

within the real numbers. Moreover, a set of measure zero need not be (or even appear)

negligible if sets are compared with respect to properties other than their measures. For

instance, we can judge the ‘size’ of a set by its cardinality or Baire category rather than by its

measure, which leads us to different conclusions about the set’s size (Sklar 1993, pp. 182-88).

It is also a mistake to assume that an event with measure zero cannot occur. In fact, having

measure zero and being impossible are distinct notions. Whether or not the system at some

point was in one of the special initial conditions for which the space and time mean fail to be

equal is a factual question that cannot be settled by appeal to measures; pointing out that such

points are scarce in the sense of measure theory does not do much, because it does not imply

that they are scarce in the world as well.

In response two things can be said. First, discounting sets of measure zero is standard practice in

physics and the problem is not specific to ergodic theory. So unless there is a good reason to

suspect that specific measure zero states are in fact important, one might argue that the onus

of proof is on those who think that discounting them in this case is illegitimate. Second, the

fact that SM works in so many cases suggests that they indeed are scarce.

The third criticism is rarely explicitly articulated, but it is clearly in the background of

contemporary Boltzmannian approaches to SM such as Albert's (2000), who rejects Boltzmann's starting point, namely the postulate $p(M_j) = c\,\mu(M_j)$. Instead Albert introduces

another postulate, essentially providing transition probabilities between two macrostates

conditional on the so-called Past Hypothesis, the posit that the universe came into existence in


a low entropy state (the Big Bang). Albert then argues that in such an account ergodicity

becomes an idle wheel, and hence he rejects it as completely irrelevant to the foundations of

SM. This, however, may well be too hasty. Although it is true that ergodicity simpliciter

cannot justify Albert's probability postulate, another dynamical assumption is needed in order

for this postulate to be true (Frigg 2009a). We don’t know yet what this assumption is, but EH

may well be helpful in discussing the issue and ultimately formulating a suitable condition.

5.2 Gibbsian SM

At the beginning of the Gibbs approach stands a radical rupture with the Boltzmann

programme. The object of study for the Boltzmannians is an individual system, consisting of a

large but finite number of micro constituents. By contrast, within the Gibbs framework the

object of study is a so-called ensemble. An ensemble is an imaginary collection of infinitely

many copies of the same system (they are the same in that they have the same phase space,

dynamics and measure), but which happen to be in different states. An ensemble of gases, for

instance, consists of infinitely many copies of the same gas which are, however, in different

states: one is concentrated in the left corner of the box, one is evenly distributed, etc. It is

important to emphasise that ensembles are fictions, or ‘mental copies of the one system under

consideration’ (Schrödinger 1952, 3). Hence, it is important not to confuse ensembles with

collections of micro-objects such as the molecules of a gas!

The instantaneous state of one system of the ensemble is specified by one point in its phase

space. The state of the ensemble as a whole is therefore specified by a density function ρ on the system's phase space. From a technical point of view, ρ is a function just like the function f that we encountered in Section 2. We furthermore assume that ρ is a probability density, reflecting the probability of finding the state of a system chosen at random from the entire ensemble in a region R of the phase space: $p(R) = \int_R \rho \, d\mu$. To make this more intuitive consider the following simple

example. You play a special kind of darts: you fix a plank to the wall, which serves as your

dart board. For some reason you know that the probability of your dart landing at a particular

place on the board is given by the curve shown in Figure 7. You are then asked what the

probability is that your next dart lands in the left half of the board. The answer is 1/2 since

one half of the surface underneath the curve is on the left side. In SM R plays the role of a


particular part of the board (in the example here the left half), and ρ gives the probability, not of a dart landing there, but of finding a system there.

Figure 7: Dart board.

The importance of this is that it allows us to calculate expectation values. Assume that the

game is such that you get one Pound if the dart hits the left half and three Pounds if it lands on

the right half. What is your average gain? The answer is 1 × 1/2 Pound + 3 × 1/2 Pounds = 2 Pounds. This is the expectation value. The same idea is at work in SM. Physical magnitudes like, for instance, pressure are associated with functions f on the phase space. We then calculate the expectation value, which, in general, is given by $\langle f \rangle = \int f \rho \, d\mu$. In the context of

Gibbsian SM these expectation values are also referred to as phase averages or ensemble

averages. They are of central importance because it is a fundamental posit of Gibbsian SM that these values are what we observe in experiments! So if you want to use the formalism to

make predictions, you first have to figure out what the probability distribution is, then find

the function f corresponding to the physical quantity you are interested in, and then calculate

the phase average. None of these steps is easy in practice, and working physicists spend

most of their time doing these calculations. However, these difficulties need not occupy us if

we are interested in the conceptual issues underlying this ‘recipe’.
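To make the 'recipe' concrete, here is a minimal Python sketch of its last step, computing a phase average for an assumed one-dimensional toy density and observable (our own illustration; the density, observable and sample size are not taken from the text):

    import numpy as np

    # Toy illustration of the Gibbsian 'recipe' (not any specific physical system):
    # a one-dimensional 'phase space' x in [0, 1], an assumed probability density rho,
    # and an assumed observable f. The phase average is the integral of f*rho over x.

    rng = np.random.default_rng(0)

    def rho(x):
        # an assumed normalised density on [0, 1], peaked on the left half
        return 2.0 - 2.0 * x

    def f(x):
        # the 'payoff': 1 on the left half of the board, 3 on the right half
        return np.where(x < 0.5, 1.0, 3.0)

    # Monte Carlo estimate of the phase average <f> = integral of f(x) * rho(x) dx
    x = rng.uniform(0.0, 1.0, size=200_000)
    phase_average = np.mean(f(x) * rho(x))
    print(phase_average)   # close to 1*3/4 + 3*1/4 = 1.5 for this assumed density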

By definition, a probability distribution is stationary if it does not change over time. Given

that observable quantities are associated with phase averages and that equilibrium is defined


in terms of the constancy of the macroscopic parameters characterising the system, it is

natural to regard the stationarity of the distribution as a necessary condition for equilibrium

because stationary distributions yield constant averages. For this reason Gibbs refers to

stationarity as the ‘condition of statistical equilibrium’.

Among all stationary distributions those satisfying a further requirement, the Gibbsian

maximum entropy principle, play a special role. The Gibbs entropy (sometimes also `ensemble entropy') is defined as $S_G(\rho) = -k_B \int \rho \log(\rho)\, d\mu$. The Gibbsian maximum entropy principle then requires that $S_G(\rho)$ be maximal, given the constraints that are imposed on the system.
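As a simple illustration of the entropy functional (a discrete toy analogue of $S_G$ on an assumed grid of cells, with $k_B$ set to 1; not a model of any system discussed here), the following sketch shows that a spread-out distribution has higher entropy than a concentrated one:

    import numpy as np

    # A minimal sketch: the Gibbs entropy of a discretised density. On a grid of
    # cells with probabilities p_i, the discrete analogue is S_G = -k_B * sum p_i log p_i.
    # k_B is set to 1 for illustration.

    def gibbs_entropy(p, k_B=1.0):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                        # convention: 0 * log 0 = 0
        return -k_B * np.sum(p * np.log(p))

    uniform = np.full(10, 0.1)                # spread-out distribution
    peaked = np.array([0.91] + [0.01] * 9)    # concentrated distribution
    print(gibbs_entropy(uniform))   # log(10), about 2.303, the maximum for 10 cells
    print(gibbs_entropy(peaked))    # considerably smaller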

The last clause is essential because different constraints single out different distributions. A

common choice is to keep both the energy and the particle number in the system fixed. One

can prove that under these circumstances $S_G(\rho)$ is maximal for the so-called microcanonical

distribution (or microcanonical ensemble). If we choose to hold the number of particles

constant while allowing for energy fluctuations around a given mean value we obtain the so-

called canonical distribution; if we also allow the particle number to fluctuate around a given

mean value we find the so-called grand-canonical distribution.17

This formalism is enormously successful in that correct predictions can be derived for a vast

class of systems. But the success of this formalism is rather puzzling. The first and most

obvious question concerns the relation of systems and ensembles. The probability distribution

in the Gibbs approach is defined over an ensemble, the formalism provides ensemble

averages, and equilibrium is regarded as a property of an ensemble. But what we are really

interested in is the behaviour of a single system! What could the properties of an ensemble – a

fictional entity consisting of infinitely many mental copies of the real system – tell us about

the one real system on the laboratory table? And more specifically, why do averages over an

ensemble coincide with the values found in measurements performed on an actual physical

system in equilibrium? There is no obvious reason why this should be so, and it turns out that

ergodicity plays a central role in answering these questions.

17 For details see, for instance, Tolman (1938, Chs. 3 and 4).


Common textbook wisdom justifies the use of phase averages as follows. As we have seen the

Gibbs formalism associates physical quantities with functions f on the system’s phase space.

Making an experiment measuring one of these quantities takes time. So what measurement

devices register is not the instantaneous value of the function in question, but rather its time

average over the duration of the measurement; hence time averages are what is empirically

accessible. Then, so the argument continues, although measurements take an amount of time

that is short by human standards, it is long compared to microscopic time scales on which

typical molecular processes take place. For this reason it is assumed that the measured finite

time average is approximately equal to the infinite time average of the measured function. If

we now assume that the system is ergodic, then time averages equal phase averages. The

latter can easily be obtained from the formalism. Hence we have found the sought-after

connection: the Gibbs formalism provides phase averages which, by ergodicity, are equal to

infinite time averages, and these are, to a good approximation, equal to the finite time

averages obtained from measurements.
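The equality of time and phase averages on which this argument relies can be illustrated numerically. The sketch below uses an assumed toy system, the irrational rotation of the circle, which is known to be ergodic; the observable and the parameters are our own choices, not taken from the text:

    import numpy as np

    # A minimal numerical illustration (assumed toy system): for an ergodic map, time
    # averages along a trajectory approach the phase average. The irrational rotation
    # x -> x + alpha (mod 1) with the Lebesgue measure is ergodic.

    alpha = np.sqrt(2) - 1                       # an irrational rotation number
    f = lambda x: np.cos(2 * np.pi * x) ** 2     # an assumed observable

    x, total, n_steps = 0.1, 0.0, 100_000
    for _ in range(n_steps):
        total += f(x)
        x = (x + alpha) % 1.0

    time_average = total / n_steps
    phase_average = 0.5     # integral of cos^2(2*pi*x) over [0, 1] is 1/2
    print(time_average, phase_average)   # the two values should be close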

This argument is problematic for at least two reasons. First, from the fact that measurements

take some time it does not follow that what is actually measured are time averages. For

instance, it could be the case that the value provided to us by the measurement device is

simply the value assumed by f at the last moment of the measurement, irrespective of what the

previous values of f were (e.g. it’s simply the last pointer reading registered). So we would

need an argument for the conclusion that measurements indeed produce time averages.

Second, even if we take it for granted that measurements do produce finite time averages, then

equating these with infinite time averages is problematic. Even if the duration of the

measurement is long by experimental standards (which need not be the case), finite and

infinite averages may assume very different values. That is not to say that they necessarily

have to be different; they could coincide. But whether or not they do is an empirical question,

which depends on the specifics of the system under investigation. So care is needed when

replacing finite with infinite time averages, and one cannot identify them without further

argument.

These criticisms call for a different strategy. Two suggestions stand out. Space constraints

prevent a detailed discussion and so we will only briefly indicate what the main ideas are;

more extensive discussion can be found in references given in footnote 13.


Malament and Zabell (1980) respond to this challenge by suggesting a way of explaining the

success of equilibrium theory that still invokes ergodicity, but avoids altogether appeal to time

averages. This avoids the above mentioned problems, but suffers from the difficulty that many

systems that are successfully dealt with by the formalism of SM are not ergodic. To

circumvent this difficulty Vranas (1998) has suggested replacing ergodicity with what he calls ε-ergodicity. Intuitively a system is ε-ergodic if it is ergodic not on the entire phase space, but on a very large part of it (those parts on which it is not ergodic having measure ε, where ε is very small). The leading idea behind his approach is to challenge the commonly held

belief that even if a system is just a ‘little bit’ non-ergodic, then it behaves in a completely

‘un-ergodic’ way. Vranas points out that there is a middle ground and then argues that this

middle ground actually provides us with everything we need. This is a promising proposal,

but it faces three challenges. First, it needs to be shown that all relevant systems really are ε-ergodic. Second, the argument so far has only been developed for the microcanonical

ensemble, but one would like to know whether, and if so how, it works for the canonical and

the grand-canonical ensemble. Third, it is still based on the assumption that equilibrium is

characterised by a stationary distribution, which, as we will see below, is an obstacle when it

comes to formulating a workable Gibbsian non-equilibrium theory.

The second response begins with Khinchin's work. Khinchin (1949) pointed out that the

problems of the ergodic programme are due to the fact that it focuses on too general a class of

systems. Rather than studying dynamical systems at a general level, we should focus on those

cases that are relevant in statistical mechanics. This involves two restrictions. First, we only

have to consider systems with a large number of degrees of freedom; second, we only need to

take into account a special class of phase functions, the so-called 'sum functions'. These are functions that are a sum of one-particle functions, i.e. functions that take into account only the position and momentum of one particle. Under these assumptions Khinchin proved that as n becomes larger, the measure of those regions on the energy hypersurface18 where the time and the space means differ by more than a small amount tends towards zero. Roughly speaking, this result says that for large n the system behaves, for all practical purposes, as if it was ergodic.

18 An energy hypersurface is a hypersurface in the system's phase space on which the energy is constant.

The problem with this result is that it is valid only for sum functions, and in particular only if

the energy function of the system is itself a sum function, which usually is not the case



whenever particles interact. So the question is how this result can be generalised to more

realistic cases. This problem stands at the starting point of a research programme now known

as the thermodynamic limit, championed, among others, by Lanford, Mazur, Ruelle, and van

der Linden (see van Lith (2001) for a survey). Its leading question is whether one can still

prove 'Khinchin-like' results in the case of energy functions with interaction terms.19 Results of this kind can be proven in the limit n → ∞, if the volume V of the system also tends towards infinity in such a way that the number density n/V remains constant.

So far we have only dealt with equilibrium, and things get worse once we deal with non-

equilibrium. The main problem is that it is a consequence of the formalism that the Gibbs

entropy is a constant! This precludes a characterisation of the approach to equilibrium in

terms of increasing Gibbs entropy, which is what one would expect if we were to treat the

Gibbs entropy as the SM counterpart of the thermodynamic entropy. The standard way around

this problem is to coarse-grain the phase space, and then define the so-called coarse-grained Gibbs entropy. Put simply, coarse-graining the phase space amounts to putting a grid on the phase space and declaring that all points within one cell of the grid are indistinguishable. This procedure turns a continuous phase space into a discrete collection of cells, and the state of the system is then specified by saying in which cell it is. If we then define the Gibbs entropy on

this grid, it turns out (for purely mathematical reasons) that the entropy is no longer a constant

and can actually increase or decrease. If one then assumes that the system is mixing, it follows

from the so-called convergence theorem of ergodic theory that the coarse-grained Gibbs

entropy approaches a maximum. However, this solution is fraught with controversy, the two

main bones of contention being the justification of the coarse-graining and the assumption

that the system is mixing. Again, we refer the reader to the references given in Footnote 12 for

a detailed discussion of these controversies.
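The effect of coarse-graining can be illustrated with a toy computation. The sketch below assumes the doubling map x → 2x mod 1 as the dynamics (a standard strongly mixing example, not one discussed in the text) and tracks the coarse-grained entropy of an initially concentrated ensemble on an assumed grid of twenty cells:

    import numpy as np

    # A toy sketch (assumption: the doubling map x -> 2x mod 1, strongly mixing with
    # respect to the Lebesgue measure). We evolve an ensemble that starts concentrated
    # in a small interval and track the coarse-grained entropy on a grid of equal cells.
    # The fine-grained entropy would stay constant; the coarse-grained one increases.

    rng = np.random.default_rng(1)
    points = rng.uniform(0.0, 0.01, size=100_000)   # ensemble concentrated near 0
    n_cells = 20

    def coarse_grained_entropy(pts):
        counts, _ = np.histogram(pts, bins=n_cells, range=(0.0, 1.0))
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    for step in range(8):
        print(step, round(coarse_grained_entropy(points), 3))
        points = (2.0 * points) % 1.0   # one step of the doubling map
    # the printed values approach log(20), about 3.0, the maximum for 20 cells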

In sum, ergodicity plays a central role in many attempts to justify the posits of SM. And even

where a simplistic use of ergodicity is eventually unsuccessful, somewhat modified notions

prove fruitful in an analysis of the problem and in the search for better solutions.

19 To be more precise: what we are after is a proof for cases where there are nontrivial interaction terms, meaning

those for which there does not exist a canonical transformation that effectively eliminates such terms.


6. The Ergodic Hierarchy and Randomness

EH is often presented as a hierarchy of increasing degrees of randomness in deterministic

systems: the higher up in this hierarchy a system is placed the more random its behaviour.20

However, the definitions of different levels of EH do not make explicit appeal to randomness;

nor does the usual way of presenting EH involve a specification of the notion of randomness

that is supposed to underlie the hierarchy. So there is a question about what notion of

randomness underlies EH and in what sense exactly EH is a hierarchy of random behaviour.

20 See, for instance, Lichtenberg & Liebermann (1992), Ott (1993), and Tabor (1989).

Berkovitz, Frigg and Kronz (2006) discuss this problem and argue that EH is best understood

as a hierarchy of random behaviour if randomness is explicated in terms of unpredictability,

where unpredictability is accounted for in terms of probabilistic relevance, and different

degrees of probabilistic relevance, in turn, are spelled out in terms of different types of decay

of correlation between a system’s states at different times. Let us introduce these elements one

at a time.

Properties of systems can be associated with different parts of the phase space. In the ball

example, for instance, the property having positive momentum is associated with the right half

of the phase space; that is, it is associated with the set $\{x \in X : p > 0\}$. Generalising this idea we

say that to every subset A of a system's phase space there corresponds a property $P_A$ so that

the system possesses that property at time t iff the system’s state x is in A at t. The subset A

may be arbitrary and the property corresponding to A may not be intuitive, unlike, for

example, the property of having positive momentum. But nothing in the analysis to follow

hangs on a property being 'intuitive'. We then define the event $A_t$ as the obtaining of $P_A$ at time t.

At every time t there is a matter of fact whether $P_A$ obtains, which is determined by the

dynamics of the system. However, we may not know whether or not this is the case. We

therefore introduce epistemic probabilities expressing our uncertainty about whether $P_A$ obtains: $p(A_t)$ reflects an agent's degree of belief in $P_A$'s obtaining at time t. In the same way we can introduce conditional probabilities: $p(A_t \mid B_{t_1})$ is our uncertainty that the system has



$P_A$ at t given that it had $P_B$ at an earlier time $t_1$, where B is also a subset of the system's phase space. By the usual rule of conditional probability we have $p(A_t \mid B_{t_1}) = p(A_t \,\&\, B_{t_1}) / p(B_{t_1})$. This can of course be generalised to more than one event: $p(A_t \mid B^{1}_{t_1} \,\&\, \ldots \,\&\, B^{r}_{t_r})$ is our uncertainty that the system has $P_A$ at t given that it had $P_{B_1}$ at $t_1$, $P_{B_2}$ at $t_2$, …, and $P_{B_r}$ at $t_r$, where $B_1, \ldots, B_r$ are subsets of the system's phase space (and r a natural number), and $t_1, \ldots, t_r$ are successive instants of time (i.e. $t_1 < t_2 < \ldots < t_r < t$).

Intuitively, an event in the past is relevant to our making predictions if taking the past event

into account makes a difference to our predictions, or more specifically if it lowers or raises

the probability for a future event. In other words, $p(A_t \mid B_{t_1}) - p(A_t)$ is a measure for the relevance of $B_{t_1}$ to predicting $A_t$: $B_{t_1}$ is positively relevant if $p(A_t \mid B_{t_1}) - p(A_t) > 0$, negatively relevant if $p(A_t \mid B_{t_1}) - p(A_t) < 0$, and irrelevant if $p(A_t \mid B_{t_1}) - p(A_t) = 0$. For technical reasons it turns out to be easier to work with $p(B_{t_1})[p(A_t \mid B_{t_1}) - p(A_t)]$ – which is equivalent to $p(A_t \,\&\, B_{t_1}) - p(A_t)p(B_{t_1})$ – rather than with $p(A_t \mid B_{t_1}) - p(A_t)$, but this makes no conceptual difference since the multiplication with $p(B_{t_1})$ does not alter relevance relations. Therefore we adopt the following definition. The relevance of $B_{t_1}$ for $A_t$ is

$R(B_{t_1}, A_t) = p(A_t \,\&\, B_{t_1}) - p(A_t)p(B_{t_1})$.   (R)

The generalisation of this definition to cases with more than one set B (as above) is

straightforward.
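To see how (R) behaves in a simple case, consider the following sketch. It uses an assumed two-state Markov chain (not a system from the text) in its stationary distribution and computes the relevance of the present state for the state one time step later:

    import numpy as np

    # A minimal illustration (an assumed toy two-state Markov chain): relevance
    # R(B, A) = p(A & B) - p(A) p(B) for events defined one time step apart.

    P = np.array([[0.9, 0.1],     # transition probabilities: row = current state,
                  [0.2, 0.8]])    # column = next state
    pi = np.array([2/3, 1/3])     # stationary distribution of this chain (pi P = pi)

    # Event B: 'in state 0 now'; event A: 'in state 0 one step later'
    p_B = pi[0]
    p_A = pi[0]                      # stationarity: same marginal at every time
    p_A_and_B = pi[0] * P[0, 0]
    R = p_A_and_B - p_A * p_B
    print(R)   # positive: being in state 0 now raises the probability of state 0 next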

Relevance serves to explicate unpredictability. Intuitively, the less relevant past events are for

$A_t$, the less predictable the system is. This basic idea can then be refined in various ways.

First, the type of unpredictability we obtain depends on the type of events to which (R) is

applied. For instance, the degree of the unpredictability of $A_t$ increases if its probability is independent not only of $B_{t_1}$ or other 'isolated' past events, but rather of the entire past. Second, the unpredictability of an event $A_t$ increases if the probabilistic dependence of that event on past events $B_{t_1}$ decreases rapidly with the increase of the temporal distance between the events. Third, the probability of $A_t$ may be independent of past events simpliciter, or it may


be independent of such events only ‘on average’. These ideas underlie the analysis of EH as a

hierarchy of unpredictability.

Before we can provide such an analysis, two further steps are needed. First, if the probabilities

are to be useful to understanding randomness in a dynamical system, the probability

assignment has to reflect the properties of the system. So we have to connect the above

probabilities to features of the system. The natural choice is the system's measure μ.21 So we postulate that the probability of an event $A_t$ is equal to the measure of the set A: $p(A_t) = \mu(A)$ for all t. This can be generalised to joint probabilities as follows:

$p(A_t \,\&\, B_{t_1}) = \mu(A \cap T_{t-t_1}B)$,   (P)

for all instants of time $t_1 < t$ and all subsets A and B of the system's phase space. $T_{t-t_1}B$ is the image of the set B under the dynamics of the system from $t_1$ to t. We refer to this postulate as the Probability Postulate (P). This is illustrated in Figure 8. Again, this condition is naturally generalised to cases of joint probabilities of $A_t$ with multiple events $B_{t_i}$. Granted (P) and its generalization, (R) reflects the dynamical properties of systems.

21 Provided that μ is normalised, which is the case in most systems studied in ergodic theory. Due to their connection to μ, some may be inclined not to interpret the $p(A_t)$ as epistemic probabilities; in fact, in the literature on ergodic theory μ is often interpreted as a time average, and so one could insist that $p(A_t)$ be a time average as well. While this could be done, it is not conducive to our analysis. Our goal is to explicate randomness in terms of degrees of unpredictability, and to this end one needs to assume that the $p(A_t)$ are epistemic probabilities. However, contra radical Bayesianism, we posit that the values of these probabilities are constrained (in the spirit of Lewis' Principal Principle) by objective facts about the system (here the measure μ). But this does not make these probabilities objective.


Figure 8: Condition (P)

Before introducing the next element of the analysis let us mention that there is a question

about whether the association of probabilities with the measure of the system is reasonable.

Prima facie, a measure on a phase space can have a purely geometrical interpretation and need

not necessarily have anything to do with the quantification of uncertainty. For instance, we

can use a measure to determine the length of a table, but this measure need not have anything

to do with uncertainty. Whether or not such an association is legitimate depends on the cases

at hand and the interpretation of the measure. However, for systems of interest in statistical physics it is natural and indeed standard to assume that the probability of the system's state being in a particular subset A of the phase space X is proportional to the measure of A.

The last element to be introduced is the notion of the correlation between two subsets A and B

of the system’s phase space, which is defined as follows:

$C(A, B) = \mu(A \cap B) - \mu(A)\mu(B)$.   (C)


If the value of $C(A, B)$ is positive (negative), there is positive (negative) correlation between A and B; if it is zero, then A and B are uncorrelated. It then follows immediately from the above that

$R(B_{t_1}, A_t) = C(T_{t-t_1}B, A)$.   (RC)

(RC) constitutes the basis for the interpretation of EH as a hierarchy of objective randomness.

Granted this equation, the subjective probabilistic relevance of the event $B_{t_1}$ for the event $A_t$ reflects objective dynamical properties of the system since for different transformations T, $R(B_{t_1}, A_t)$ will indicate different kinds of probabilistic relevance of $B_{t_1}$ for $A_t$.
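The following sketch illustrates how (P) and (RC) connect relevance to the dynamics. It assumes the doubling map T x = 2x mod 1 with the Lebesgue measure (a standard strongly mixing example, not discussed in the text) and estimates the correlation between a past and a future event by sampling; the particular sets and sample size are our own choices:

    import numpy as np

    # A minimal numerical sketch (assumed example: the doubling map T x = 2x mod 1
    # with the Lebesgue measure). Reading (P) as the probability of being in B at the
    # earlier time and in A n steps later, we estimate the relevance/correlation
    # R = p(A_t & B_t1) - p(A_t) p(B_t1) by sampling and watch it decay as n grows.

    rng = np.random.default_rng(2)
    x0 = rng.uniform(0.0, 1.0, size=500_000)   # points distributed according to the measure

    in_B = x0 < 0.3         # event B: state in [0, 0.3) at the earlier time
    mu_A, mu_B = 0.5, 0.3   # measures of A = [0, 0.5) and B = [0, 0.3)

    x = x0.copy()
    for n in range(7):
        in_A = x < 0.5                           # event A: state in [0, 0.5), n steps later
        joint = np.mean(in_B & in_A)             # estimate of p(A_t & B_t1)
        print(n, round(joint - mu_A * mu_B, 4))  # correlation/relevance; decays towards 0
        x = (2.0 * x) % 1.0                      # apply the doubling map once more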

To put (RC) to use, it is important to notice that the equations defining the various levels of

EH above can be written in terms of correlations. Taking into account that we are dealing with

discrete systems (and hence we have $T_{t-t_1}B = T^k B$, where k is the number of time steps it takes to get from $t_1$ to t), these equations read:

Ergodicity: $\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} C(T^k B, A) = 0$, for all $A, B \subseteq X$

Weak mixing: $\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \left| C(T^k B, A) \right| = 0$, for all $A, B \subseteq X$

Strong mixing: $\lim_{n \to \infty} C(T^n B, A) = 0$, for all $A, B \subseteq X$

K-mixing: $\lim_{n \to \infty} \sup_{B \in \sigma(n, r)} \left| C(B, A) \right| = 0$, for all $A, A_1, \ldots, A_r \subseteq X$

Bernoulli: $C(T^n B, A) = 0$, for all A, B of the Bernoulli partition.

Applying (RC) to these expressions, we can explicate the nature of the unpredictability that

each of the different levels of EH involves.
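Before going through the levels one by one, a small numerical contrast may help. The sketch below assumes the irrational rotation of the circle, which satisfies the ergodicity condition but neither of the mixing conditions; the sets and parameters are illustrative choices of ours, not taken from the text:

    import numpy as np

    # A small sketch contrasting ergodicity with the mixing conditions (assumed toy
    # example): the irrational rotation T x = x + alpha mod 1 is ergodic but not even
    # weakly mixing, so individual correlations C(T^k B, A) do not decay, while their
    # plain average over k does go to zero.

    alpha = np.sqrt(2) - 1          # an irrational rotation number
    mu_A = mu_B = 0.5               # A = B = [0, 0.5), the left half of the circle

    def corr(k):
        # T^k B is the arc [k*alpha, k*alpha + 0.5) mod 1; its overlap with A = [0, 0.5)
        # can be computed exactly from the shift k*alpha mod 1.
        shift = (k * alpha) % 1.0
        overlap = max(0.0, 0.5 - shift) + max(0.0, shift - 0.5)
        return overlap - mu_A * mu_B

    corrs = np.array([corr(k) for k in range(2000)])
    print(np.max(np.abs(corrs)))    # close to 0.25: individual correlations do not decay
    print(np.mean(corrs))           # close to 0: the ergodicity condition (no absolute value)
    print(np.mean(np.abs(corrs)))   # close to 0.125: the weak mixing average does not vanish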


Let us start at the top of EH. In Bernoulli systems the probabilities of the present state are

totally independent of whatever happened in the past, even if the past is only one time step

back. So knowing the past of the system does not improve our predictive abilities in the least;

the past is simply irrelevant to predicting the future. This fact is often summarised in the

slogan that Bernoulli systems are as random as a coin toss. We should emphasise, however,

that this is true only for events in the Bernoulli partition; the characterisation of a Bernoulli

system is silent about what random properties partitions other than the Bernoulli partition

have.
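The coin-toss slogan can be made vivid with a tiny simulation (an assumed sequence of independent fair coin flips, standing in for events in a Bernoulli partition; the sample size is arbitrary):

    import numpy as np

    # A minimal sketch of the 'as random as a coin toss' slogan (assumed toy example):
    # for a sequence of independent fair coin flips, past outcomes carry no
    # probabilistic relevance for future ones.

    rng = np.random.default_rng(3)
    flips = rng.integers(0, 2, size=1_000_000)   # 0 = tails, 1 = heads

    for n in (1, 2, 5):
        A = flips[n:] == 1           # event A: heads now
        B = flips[:-n] == 1          # event B: heads n steps earlier
        R = np.mean(A & B) - np.mean(A) * np.mean(B)
        print(n, round(R, 4))        # approximately 0 for every n: the past is irrelevant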

K-mixing is more difficult to analyse. We now have to tackle the question of how to

understand $\sigma(n, r)$, the minimal σ-algebra generated by the set $\{T^k A_j : k \ge n;\ j = 1, \ldots, r\}$, which we sidestepped above. What matters for our analysis is that the following types of sets are members of $\sigma(n, r)$ (ibid., 669): $T^k A_{j_0} \cap T^{k+1} A_{j_1} \cap T^{k+2} A_{j_2} \cap \ldots$, where the indices $j_i$ range over 1, …, r. Since we are free to choose the sets $A_0, A_1, \ldots, A_r$ as we please, we can always choose them so that they are the past history of the system: the system was in $A_{j_0}$ k time steps back, in $A_{j_1}$ k+1 time steps back, etc. Call this the (coarse-grained) remote past of the system ('remote' because we only consider states that are more than k time steps back). The K-

mixing condition then says that the system’s entire remote past history becomes irrelevant to

predicting what happens in the future as time tends towards infinity. Typically Bernoulli

systems are compared with K-systems by focussing on the events in the Bernoulli partition.

With respect to that partition K is weaker than Bernoulli. The difference is both the limit and

the remote history. In a Bernoulli system the future is independent of the entire past (not only

the remote past), and this is true without taking a limit (whereas in the case of K-mixing independence only obtains in the limit). However, this only holds for the Bernoulli partition;

it may or may not hold for other partitions – the definition of a Bernoulli system says nothing

about that case.22

22 We would also like to mention that the analysis of randomness in Bernoulli and K-systems is based on implications of the definitions of these systems, but they do not exhaust these definitions (or provide verbal restatements of them) because there are parts of the definitions that have not been used (in the case of Bernoulli systems the condition that there be a generating partition, and in the case of K-systems sets in $\sigma(n, r)$ other than ones of the form $T^k A_{j_0} \cap T^{k+1} A_{j_1} \cap T^{k+2} A_{j_2} \cap \ldots$). By contrast, the analyses of SM, WM, and E in the following paragraphs exhaust the respective definitions. In the case of Bernoulli this has the consequence that the characterisation given here also applies to some systems that are not ergodic. For a discussion of such cases see Werndl (2009a, section 4.2.2).


The interpretation of strong mixing is now straightforward. It says that for any two sets A and

B, having been in B k time steps back becomes irrelevant to the probability of being in A some

time in the future if time tends towards infinity. In other words, past events B become

increasingly irrelevant for the probability of A as the temporal distance between A and B

becomes larger. This condition is weaker than K-mixing because it only says that the future

is independent of isolated events in the remote past, while K-mixing implies independence of

the entire remote past history.

In weakly mixing systems the past may be relevant to predicting the future, even in the remote

past. The weak mixing condition only says that this influence has to be weak enough for it to

be the case that the absolute value of the correlations between a future event and past events

vanishes on average; but, again, this does not mean that individual correlations vanish. So in

weakly mixing systems the past can remain relevant to the future.

Ergodicity, finally, implies no decay of correlation at all. The ergodicity condition only says

that the average of the correlations (and this time without an absolute value) of all past events

with the relevant future event is zero. But this is compatible with there being strong

correlations between every instant in the past and the future, provided that positive and

negative correlations average out. So in ergodic systems the past does not become irrelevant.

For this reason ergodic systems are not random at all (in the sense of randomness introduced above). One could say that they mark, as it were, the zero point on the scale of randomness.23

23 They are not the only systems occupying the zero level. Periodic systems, for instance, are not random either. We do not discuss other non-random systems here because they are not part of EH.

7. The ‘no-application’ charge

How relevant are these insights to understanding the behaviour of actual systems? A frequently heard objection (which we have already encountered in Section 5) is that EH and more generally ergodic theory are irrelevant since most systems (including those that we are

ultimately interested in) are not ergodic at all.24

24 This bit of conventional wisdom is backed up by a theorem by Markus and Mayer (1974), which is based on KAM theory, and basically says that generic Hamiltonian dynamical systems are not ergodic.

This charge is less acute than it appears at first glance. First, it is important to emphasise that

it is not the sheer number of applications that makes a physical concept important, but whether

there are some important systems that are ergodic. And there are examples of such systems.

For example, the so-called ‘hard-ball systems’ (and some more sophisticated variants of them)

are effective idealizations of the dynamics of gas molecules, and these systems seem to be

ergodic (for details, see Berkovitz, Frigg and Kronz 2006, Section 4.2, and references

therein).

Furthermore, EH can be used to characterize randomness and chaos in systems that are not

ergodic. Even if a system as a whole is not ergodic (i.e. if it fails to be ergodic with respect to

the entire phase space X) there can be (and usually there are) subsets of X on which the

system is ergodic. This is what Lichtenberg and Liebermann (1992, p. 295) have in mind when they observe that '[i]n a sense, ergodicity is universal, and the central question is to

define the subspace over which it exists’. In fact, non-ergodic systems may have subsets that

are not only ergodic, but even Bernoulli! It then becomes an interesting question what these

subsets are, of what measure they are, and what topological features they have. These are

questions studied in parts of dynamical systems theory, most notably KAM theory. Hence,

KAM theory does not demonstrate that ergodic theory is not useful in analyzing the

dynamical behavior of real physical systems (as is often claimed). Indeed, KAM systems have

regions in which the system manifest either merely ergodic or Bernoulli behaviour, and

accordingly EH is useful for charactering the dynamical properties of such systems

(Berkovitz, Frigg and Kronz 2006, Section 4). Further, as we have mentioned in Section 5,

almost all Hamiltonian systems are non-integrable, and accordingly they have large regions of

the phase space in which their motion is ergodic-like. So EH is a useful tool in studying the

dynamical properties of systems even if the system fails to be ergodic tout court.

Another frequently heard objection is that EH is irrelevant in practice because most levels of

EH (in fact, all except Bernoulli) are defined in terms of infinite time limits and hence remain



silent about what happens in finite time. But all we ever observe are finite times and so EH is

irrelevant to physics as practiced by actual scientists.

This charge can be dispelled by a closer look at the definition of a limit, which shows that

infinite limits in fact have important implications for the dynamical behaviour of the system in

finite times. The definition of a limit is as follows (where f is an arbitrary function of time):

$\lim_{t \to \infty} f(t) = c$ iff for every $\varepsilon > 0$ there exists a $t' > 0$ so that for all $t > t'$ we have $|f(t) - c| < \varepsilon$.

In words, for every number ε, no matter how small, there is a finite time t' after which the values of f differ from c by less than ε. That is, once we are past t', the values of f never move more than ε away from c. With this in mind strong mixing, for instance, says that for a given threshold ε there exists a finite time $t_n$ (n units of time after the current time) after which $|C(T^n B, A)|$ is always smaller than ε. We are free to choose ε to be an empirically relevant margin, and so we know that if a system is mixing, we should expect the correlations between the state of the system after $t_n$ and its current state to be below ε.

mixing systems, being in a state B at some past time becomes increasingly irrelevant for its

probability of being in the state A now, as the temporal distance between A and B becomes

larger. Thus, the fact that a system is strong mixing clearly has implications for its dynamical

behaviour in finite times.
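The finite-time reading can be phrased as a small computation: given a sequence of correlation values and a threshold ε, one can ask after how many steps the values stay below the threshold. The sketch below uses an assumed, purely illustrative decaying sequence; it is not derived from any system discussed in the text:

    import numpy as np

    # A small helper illustrating the finite-time reading of the limit. Given
    # correlation values C_n and a threshold eps, find a time after which the values
    # stay below eps in absolute value.

    def settling_time(correlations, eps):
        # smallest index n such that |C_k| < eps for all k >= n
        abs_c = np.abs(np.asarray(correlations, dtype=float))
        above = np.nonzero(abs_c >= eps)[0]
        return 0 if above.size == 0 else int(above[-1]) + 1

    # an assumed, purely illustrative decaying-with-fluctuations sequence
    n = np.arange(200)
    C = 0.25 * np.exp(-0.05 * n) * np.cos(0.3 * n)
    print(settling_time(C, eps=0.01))   # finite: after this time correlations stay below eps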

Since different levels of EH correspond to different degrees of randomness, each explicated in

terms of a different type of asymptotic decay of correlations between states of systems at

different times, one might suspect that a similar pattern can be found in the rates of decay.

That is, one might be tempted to think that EH can equally be characterized as a hierarchy of

increasing rates of decay of correlations: a K-system, for instance, which exhibits exponential divergence of trajectories, would be characterized by an exponential rate of decay of correlations, while an SM-system would exhibit a polynomial rate of decay.

This, unfortunately, does not work. Natural as it may seem, EH cannot be interpreted as a

hierarchy of increasing rates of decay of correlations. It is a mathematical fact that there is no

particular rate of decay associated with each level of EH. For instance, one can construct K-

systems in which the decay is as slow as one wishes it to be. So the rate of decay is a feature

of certain properties of a system rather than of a level of EH.


8. The Ergodic Hierarchy and Chaos

The question of how to characterise chaos has been controversially discussed ever since the

inception of chaos theory; for a survey see Smith (1998, Ch. 10). An important family of

approaches defines chaos using EH. Belot and Earman (1997, 155) state that being strong

mixing is a necessary condition and being a K-system is a sufficient condition for a system to

be chaotic. The view that being a K-system is the mark of chaos and that any lower degree of

randomness is not chaotic is frequently motivated by two ideas. The first is the idea that

chaotic behaviour involves dynamical instability in the form of exponential divergence of

nearby trajectories. Thus, since a system involves an exponential divergence of nearby

trajectories only if it is a K-system, it is concluded that (merely) ergodic and mixing systems

are not chaotic whereas K- and B-systems are. It is noteworthy, however, that SM is

compatible with there being polynomial divergence of nearby trajectories and that such

divergence sometimes exceeds exponential divergence in the short run. Thus, if chaos is to be

closely associated with the rate of divergence of nearby trajectories, there seems to be no

good reason to deny that SM systems exhibit chaotic behaviour.

The second common motivation for the view that being a K-system is the mark of chaos is the

idea that the shift from zero to positive KS-entropy marks the transition from 'regular' to 'chaotic' behaviour. This may suggest that having positive KS-entropy is both a necessary and a sufficient condition for chaotic behaviour. Thus, since K-systems have positive KS-entropy

while SM systems don’t, it is concluded that K-systems are chaotic whereas SM-systems are

not. Why is KS-entropy a mark of chaos? There are three motivations, corresponding to three

different interpretations of KS-entropy. First, KS-entropy could be interpreted as entailing

dynamical instability in the sense of exponential divergence of nearby trajectories (see

Lichtenberg & Liebermann, 1992, p. 304). Second, KS-entropy could be connected to

algorithmic complexity (Brudno 1978). Yet, while such a complexity is sometimes mentioned

as an indication of chaos, it is more difficult to connect it to physical intuitions about chaos.

Third, KS-entropy could be interpreted as a generalized version of Shannon’s information

theoretic entropy (see Frigg 2004). According to this approach, positive KS-entropy entails a

certain degree of unpredictability, which is sufficiently high to deserve the title chaotic.25

25 We would like to point out that an analysis of chaos in terms of positive KS-entropy needs further qualifications. A system whose dynamics is, intuitively speaking, chaotic only on a part of the phase space can still have positive KS-entropy. A case in point is a system with X = [-1, 1] where the dynamics on [-1, 0) is the identity function and on [0, 1] the tent map. This system has positive KS-entropy, but the dynamics of the entire system is not chaotic (only the part on [0, 1] is). This problem can be circumvented by adding extra conditions, for instance that the system be ergodic (which the above system clearly is not).


In a recent paper Werndl (2009b) argues that a careful review of all systems that one

commonly regards as chaotic shows that strong mixing is the crucial criterion. So a system is

chaotic just in case it is strong mixing. As she is careful to point out, this claim needs to be

qualified: systems are rarely mixing on the entire phase space, but neither are they chaotic on

the entire phase space. The crucial move is to restrict attention to those regions of phase space

where the system is chaotic, and it then turns out that in these same regions the systems are

also strong mixing. Hence Werndl concludes that strong mixing is the hallmark of chaos. And

surprisingly this is true also of dissipative systems (i.e. systems that are not measure

preserving). These systems have attractors, and they are chaotic on their attractors rather than

on the entire phase space. The crucial point then is that one can define an invariant

(preserved) measure on the attractor and show that the system is strongly mixing with respect

to that measure. So strong mixing can define chaos in both conservative and dissipative

systems.

The search for necessary and sufficient conditions for chaos presupposes that there is a clear-

cut divide between chaotic and non-chaotic systems. EH may challenge this view, as every

attempt to draw a line somewhere to demarcate the chaotic from non-chaotic systems is bound

to be somewhat arbitrary. Ergodic systems are pretty regular, mixing systems are less regular

and the higher positions in the hierarchy exhibit still more haphazard behaviour. But is there

one particular point where the transition from ‘non-chaos’ to chaos takes place? Based on the

argument that EH is a hierarchy of increasing degrees of randomness and degrees of

randomness correspond to different degrees of unpredictability (see Section 6), Berkovitz,

Frigg and Kronz (2006, Section 5.3) suggest that chaos may well be viewed as a matter of

degree rather than an all-or-nothing affair. Bernoulli systems are very chaotic, K-systems are

slightly less chaotic, SM-systems are still less chaotic, and ergodic systems are non-chaotic.

This suggestion connects well with the idea that chaos is closely related to unpredictability.

9. Conclusion



EH is often regarded as relevant for explicating the nature of randomness in deterministic

dynamical systems. It is not clear, however, what notion of randomness this claim invokes.

The formal definitions of EH do not make explicit appeal to randomness and the usual ways

of presenting EH do not involve any specification of the notion of randomness that is

supposed to underlie EH. As suggested in Section 6, EH can be interpreted as a hierarchy of

randomness if degrees of randomness are explicated in terms of degrees of unpredictability,

which in turn are explicated in terms of conditional degrees of belief. In order for these

degrees of belief to be indicative of the system’s dynamical properties, they have to be

updated according to a system’s dynamical law. The idea is then that the different levels of

EH, except for merely ergodic systems, correspond to different kinds of unpredictability,

which correspond to different patterns of decay of correlations between their past states and

present states. Merely ergodic systems seem to display no randomness, as the correlations

between their past and present states need not decay at all.

Ergodic theory in general, and EH in particular play an important role in statistical physics. In

particular, they play an important role in the foundations of statistical mechanics (Section 5),

and EH, or some modification of it, constitutes an important measure of randomness in both

Hamiltonian and dissipative systems. It is frequently argued that EH is by and large irrelevant

for physics because real physical systems are not ergodic. But, this charge is unwarranted, and

a closer look at non-ergodic systems reveals a rather different picture. Almost all Hamiltonian

systems are non-integrable (in the sense that non-integrable systems are of the second Baire category in the class of all normalised and infinitely differentiable Hamiltonians) and therefore in large

regions of their phase space the motion is random in various ways well captured by EH.

Further, as Werndl (2009b) argues, EH could also be used to characterize randomness and

chaos in dissipative systems. So EH is a useful tool to study the dynamical properties of

various ergodic and non-ergodic systems.

Appendix

A. The Conceptual Roots of Ergodic Theory

The notion of an abstract dynamical system is both concise and effective. It focuses on certain

structural and dynamical features that are deemed essential to understanding the nature of the


seemingly random behaviour of deterministically evolving physical systems. The selected

features were carefully integrated, and the end result is a mathematical construct that has

proven to be very effective in revealing deep insights that would otherwise have gone

unnoticed. This brief note will provide some understanding of the key developments that

served to influence the choice of features involved in constructing this concept.

In his well known analysis of causation, Hume claims that the terms efficacy, agency, power,

force, energy, necessity, connection, and productive quality are nearly synonymous, and he

regards it as an absurdity to employ one in defining the rest (Hume 1978), section 1.3.14.

These are powerful claims, and when they are combined with his sceptical arguments

concerning necessary connections that he develops in that section, the result is a

metaphysically austere vision of science. However, one may regard Hume’s claims as

restricted to moral philosophy (the science of human nature) as opposed to having a broad

application that extends to natural philosophy (physics, chemistry and biology); compare

(Stroud 1977), pp. 1-16. The narrower interpretation permits regarding Hume’s claim as

peacefully co-existing with the history of mechanics, where dynamic terms such as those

above and related terms (such as those that are more fundamental) have distinctive meanings

and uses that are crucially important.

One key development in the early history of mechanics was the realization of the need to

settle on a set of fundamental quantities. For example, Descartes regarded volume and speed

as fundamental; whereas, Newton regarded mass and velocity as such. Those quantities were

then respectively used by each of them to define other important quantities, such as the notion

of quantity of motion. Descartes defined it as size (volume) times speed (Descartes 1983),

paragraph 2.36; whereas Newton defined it as mass times velocity (Newton 1687), p. 1. (See

(Cohen 1966) for further discussion of the two views and the historical relationship between

them; also, see (Dijksterhuis 1986) for discussion of them in a broader historical context.)

Both Descartes and Newton regarded a force as that which brings about a change in the

quantity of motion; compare Descartes' third law of motion (Descartes 1644), paragraph 2.40,

and Newton’s second law of motion (Newton 1687), p. 13. However, these are quite distinct

notions, and one has deeper ontological significance and substantially greater utility than the

other. In (Garber 1992), there is an excellent discussion of some of the shortcomings of

Descartes’ physics.


Although Newton’s notion of force is extremely effective, questions arose as to whether it is

the most fundamental dynamical notion on which mechanics is to be based. Eventually, it was

realized that the notion of energy is more fundamental than the notion of force. Both are

derived notions, meaning that they are defined in terms of fundamental quantities. The crucial

question is how to distinguish the derived quantities that are the most fundamental, or at least

more fundamental than the others. The answer to that question is far from straightforward.

Sometimes such determinations are made on the basis of principles (such as the principle of

virtual work or the principle of least action), or because they prove more useful than others in

solving problems or in providing deeper insight. In the history of mechanics, it was eventually

realized that it is best to adopt Hamilton’s formulation of mechanics rather than Newton’s; see

(Dugas 1988) for further discussion of the development of mechanics after Newton. In

Hamilton’s formulation, the fundamental equations of motion are defined in terms of the total

energy (kinetic plus potential energies) of a system, by contrast with the Newtonian

formulation, which defines them in terms of the sum of the total forces acting on the system. A

number of deeply important insights result from that choice, and some of those are crucial for

understanding and appreciating the elegant conciseness of the notion of an abstract dynamical

system.

One key innovation of Hamilton’s approach is the use of phase space, a 6N dimensional

mathematical space, where N is the number of particles constituting the system of interest.

The 6N dimensions are constituted by 3 spatial coordinates per particle and one “generalized”

momentum coordinate per spatial coordinate. For a single simple system (such as a particle

representing a molecule of a gas) the phase space has 6 dimensions. Each point x in phase

space represents a possible physical state (also known as a phase) of the classical dynamical

system; it is uniquely specified by an ordered list of 6 (more generally, 6N for an N particle

system) numerical values, meaning a 6 dimensional vector. Once the state is known, other

properties of the system can be determined; each property corresponds to a mathematical

function of the state (onto the set of possible property values). The time evolution of the state

of a system (and so of its properties) is governed by a special function, the Hamiltonian,

which can be determined in many cases from the forces that act on the system (and in other

ways). The Hamiltonian specifies the transformation of the state of the system over time by

way of Hamilton's equations of motion, which are a close analogue of Newton's equation, force equals mass times acceleration. It should perhaps be emphasized that the two

formulations of classical mechanics are not completely equivalent; that is to say, for many but


not all classical systems the corresponding mathematical representations of them are inter-

translatable. For further discussion, see section 1.7 of (Lanczos 1986) and section 2.5.3 of

(Torretti ).

The use of Hamilton’s formulation of the equations of motion leads to two immediate

consequences, the conservation of energy and the preservation of phase space volumes; for

more discussion, see sections 6.6 and 6.7 of (Lanczos 1986). These consequences are crucial

for understanding the foundations of ergodic theory. They are quite general, though not fully

general since some substantial assumptions (that need not be specified here) must be made to

derive them; however, a large class of important systems satisfy those assumptions. The

conservation of energy means that the system is restricted to a surface of constant energy in

phase space; more important for the foundations of ergodic theory is that most of these

surfaces are (as it turns out) compact manifolds. The time evolution of a phase space volume

that is restricted to a compact manifold has an invariant measure that is bounded, meaning

that it can be normalized to unity.

In light of the discussion above, important conceptual ties can be made to the elements that

constitute the notion of an abstract dynamical system. As noted above (in the main body of

this entry), those elements are a probability space [X,,] and a measure preserving

transformation T on [X,,]. The term X denotes an abstract mathematical space of points. It

is the counterpart to the phase space of Hamiltonian mechanics; however, it abstracts away

from the physical connections that the coordinate components have to spatial and kinematic

elements (the generalized momenta) of a classical system. The term denotes a -algebra of

subsets of X, and it is the abstract counterpart to the set of all possible phase space volumes.

The classical phase space volume is an important measure, and it is replaced by , a

probability measure on . The abstraction to a probability measure is ultimately related to the

conservation of energy and to the resulting restriction (in many cases) of the time evolution to

a compact manifold. In compact manifold cases, units may be chosen so that the total volume

of the compact manifold X is unity, in which case the volume measure on the set of sub-

volumes (the counterpart to ) is effectively a probability measure (the counterpart to ). The

phase-space-volume preserving time-evolution specified by Hamilton’s equations is replaced

by the abstract notion of a probability measure preserving transformation T on X.


To fully appreciate why the time evolution of volumes of phase space are of special interest in

ergodic theory rather than points of phase space, it is necessary to relate the discussion above

to developments in classical statistical mechanics. Classical statistical mechanics is typically

used to model systems that consist of a large number of sub-systems, such as a volume of gas.

A liter of a gas at standard temperature and pressure has on the order of 10^20 molecules, which means that the corresponding phase space has 6 x 10^20 dimensions (leaving aside other

features of the molecules such as their geometric structure, which is often done for the sake of

simplicity). For such systems, the Hamiltonian depends on both inter-particle forces as well as

external forces. As in classical mechanics, the total energy is conserved (given certain

assumptions, as noted earlier) and the time evolution preserves phase space volumes.

One important innovation in classical statistical mechanics is the use of a new notion of state

for physical systems, referred to as a macrostate (or ensemble density). This notion goes back

to Gibbs (1902), and has since been widely used (and we should emphasise that the Gibbsian

notion of a macrostate is different from the Boltzmannian, introduced in section 5.1). The

macrostate is sometimes interpreted as indicating what is known probabilistically about the

actual physical state of the system. Macrostates are represented by density functions, which

are characterized below. The actual state of the system is referred to as a microstate, and such

states are represented as phase space points (as in classical mechanics). The predominant

reason for introducing macrostates is the large number of sub-systems that constitute the

typical system of interest, such as a volume of gas; such numbers make it impossible in

practice to make a determination of the actual state of the system.

A density function is a function that is normalized to unity over the relevant space of states for

the system (meaning a surface of constant total energy). If f(x) denotes the density function

that describes the macrostate of a system, then f(x) may be used to calculate the probability

that the system is in a given volume A of phase space by integrating the density function over the specified volume, $\int_A f(x)\,dx$. Such probabilities are sometimes interpreted epistemically,

meaning that they represent what is known probabilistically about the microstate of the

system with regards to each volume of phase space. Subsets of phase space that can be

assigned a volume are known as the Lebesgue measurable sets,26 and their abstract

counterpart in ergodic theory is the σ-algebra of subsets of X. The probability measure μ is the abstract counterpart to the product of the density function and the Lebesgue measure in classical statistical mechanics.

26 Not all subsets of phase space points are measurable; see (Royden 1968, pp. 52-65) for an explanation.

It turns out that the density function may also be used to obtain information about the average

value of each physical quantity of the system with respect to any given volume of phase

space. As already noted, each physical quantity of a classical system is represented by a

function on phase space. Such functions are similar to density functions in that they must be

Lebesgue integrable; however, they need not be normalized to unity. Suppose that f(x) is the

macrostate of the system. If g(x) is one of its physical quantities, then $\int_A f(x)g(x)\,dx$ denotes the

average value of g(x) over phase space volume A.
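As a toy numerical check of these two uses of the density function (an assumed one-dimensional 'phase space' with assumed functions f and g of our own choosing; not a physical model), the probability of a region and the corresponding average can be estimated as follows:

    import numpy as np

    # Assumed toy setting: 'phase space' x in [0, 1], a density f normalised to unity,
    # an observable g, and region A = [0, 0.5). We estimate the probability of being in A
    # and the corresponding average of g over A by Monte Carlo over uniform samples.

    rng = np.random.default_rng(4)
    x = rng.uniform(0.0, 1.0, size=200_000)

    f = lambda x: 2.0 * x                  # an assumed normalised density on [0, 1]
    g = lambda x: x ** 2                   # an assumed physical quantity
    in_A = x < 0.5

    prob_A = np.mean(f(x) * in_A)          # estimate of the integral of f over A (= 0.25)
    avg_gA = np.mean(f(x) * g(x) * in_A)   # estimate of the integral of f*g over A (= 0.03125)
    print(prob_A, avg_gA)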

The time evolution of a macrostate is defined in terms of the time evolution of the

microstates. Suppose that f(x) is the macrostate of the system for some chosen initial time, and

let $T_t$ be the time evolution operator associated with the Hamiltonian for the system, which governs its time evolution from the initial time to some other time t. During that time interval, f(x) evolves to some other density function $f_t(x)$ since $T_t$ is measure preserving. It turns out that the time evolved state $f_t(x)$ corresponds to $T_t f(x)$, which is by definition equal to $f(T_t x)$.

The probability that the system is in a given volume of phase space at a given time is

determined by integrating the density function at the given time over the specified volume.

A brief discussion of some key developments in the foundations of statistical mechanics will

serve to provide a deeper appreciation for the notion of an abstract dynamical system and its

role in ergodic theory. The theory emerged as a new abstract field of mathematical physics

beginning with the ergodic theorems of von Neumann and Birkhoff in the early 1930s. The

theorems have their roots in Ludwig Boltzmann’s ergodic hypothesis, which was first

formulated in the late 1860s (Boltzmann 1868, 1871). Boltzmann introduced the hypothesis in

developing classical statistical mechanics; it was used to provide a suitable basis for

identifying macroscopic quantities with statistical averages of microscopic quantities, such as

the identification of gas temperature with the mean kinetic energy of the gas molecules.

Although ergodic theory was inspired by developments in classical mechanics, classical

statistical mechanics, and even to some extent quantum mechanics (as will be shown shortly),

it became of substantial interest in its own right and developed for the most part in an

autonomous manner.


Boltzmann's hypothesis says that an isolated mechanical system, that is, one in which total energy is conserved, will pass through every point of the energy surface in phase space (the space of possible states of the system) that corresponds to the system's total energy. Strictly speaking, the hypothesis is false; that realization came about much later

with the development of measure theory. Nevertheless the hypothesis is important due in part

to its conceptual connections with other key elements of classical statistical mechanics such as

its role in establishing the existence and uniqueness of an equilibrium state for a given total

energy, which is deemed essential for characterizing irreversibility, a central goal of the

theory. It is also important because it is possible to develop a rigorous formulation that is

strong enough to serve its designated role. Historians point out that Boltzmann was aware of

exceptions to the hypothesis; for more on that, see (von Plato 1992).

Over thirty years after Boltzmann’s formulation of the ergodic hypothesis, Henri Lebesgue

provided important groundwork for a rigorous formulation of the hypothesis in his

development of measure theory, which is based on his theory of integration. About thirty years

after that, von Neumann developed his Hilbert space formulation of quantum mechanics, which he presented in a well-known series of papers published between 1927 and

1929. That inspired Bernard Koopman to develop a Hilbert space formulation of classical

statistical mechanics (Koopman 1931). In both cases, the formula for the time evolution of the

state of a system corresponds to a unitary operator that is defined on a Hilbert space; a unitary

operator is a type of measure-preserving transformation. Von Neumann then used Koopman’s

innovation to prove what is known as the mean ergodic theorem (von Neumann 1932).

Birkhoff then drew on von Neumann's theorem as the inspiration for his own ergodic theorem (Birkhoff 1931). That von Neumann's work influenced Birkhoff even though Birkhoff's paper was published before von Neumann's is explained in (Birkhoff and Koopman 1932).

Birkhoff’s paper provides a rigorous formulation and proof of Boltzmann’s conjecture that

was put forth over sixty years earlier. The key difference is that Birkhoff’s formulation is

weaker than Boltzmann's, requiring only that almost all solutions visit every set of

positive measure in phase space in the infinite time limit. What is of particular interest here is

not Birkhoff’s ergodic theorem per se, but the abstractions that inspired it and that ultimately

led to the development of ergodic theory. For further discussion of the historical roots of

ergodic theory, see pp. 93-114 of (von Plato 1994).


In the Koopman formulation of classical mechanics a unitary operator Tt that is defined in

terms of the Hamiltonian represents time evolution. It does so in its action on the state x ∈ X of the system: If the initial state of a system is x, then at time t its state is Ttx. It can be shown that the set of operators {Tt : t ∈ ℝ} for a given Hamiltonian constitutes a mathematical group.

A set of elements together with a binary operation is a group if the following three conditions are satisfied (a simple numerical check of these conditions on a concrete family of maps is sketched after the list).

Associativity: A(BC) = (AB)C for all A, B, C.

Identity element: there is an I such that IA = AI = A for all A.

Inverse element: for each A there is a B such that AB = BA = I, where I is the identity.
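The following sketch (in Python, not part of the entry) checks these three conditions numerically for an illustrative family of maps standing in for {Tn : n ∈ ℤ}, namely the rotations Tn x = (x + na) mod 1 of the circle, with the group operation given by composition of maps.

import numpy as np

a = 0.37                                    # rotation angle (an illustrative choice)

def T(n, x):
    # The n-th element of the family: T_n x = (x + n*a) mod 1, for n in Z.
    return (x + n * a) % 1.0

def same_point(u, v):
    # Equality of points on the circle [0,1), robust to wrap-around at 1.0.
    return np.allclose(np.exp(2j * np.pi * u), np.exp(2j * np.pi * v))

x = np.linspace(0.0, 1.0, 1000, endpoint=False)

# Closure and associativity: composing rotations gives T_m(T_n x) = T_{m+n} x,
# so A(BC) = (AB)C holds for the maps in the family.
assert same_point(T(2, T(3, x)), T(5, x))
assert same_point(T(2, T(3, T(4, x))), T(5, T(4, x)))

# Identity element: T_0 is the identity map.
assert same_point(T(0, x), x)

# Inverse element: T_{-n} undoes T_n.
assert same_point(T(-3, T(3, x)), x)

print("group conditions hold for this family of rotations (numerically)")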

The strategy underlying ergodic theory is to focus on simple yet relevant models to obtain

deeper insights about notions that are pertinent to the foundations of statistical mechanics

while avoiding unnecessary technical complications. Ergodic theory abstracts away from

dynamical associations including forces, potential and kinetic energies, and the like.

Continuous time evolution is often replaced with discrete counterparts to further simplify

matters. In the discrete case, a continuous group {Tt : t ∈ ℝ} is replaced by a discrete group {Tn : n ∈ ℤ} (and, as we have seen above, the evolution of x over n units of time corresponds to the n-th iterate of the one-step map T, meaning that Tnx = Tⁿx). Other advantages of the strategy include

facilitating conceptual connections with other branches of theorizing and providing easier

access to generality. For example, the group structure may be replaced with a semi-group,

meaning that the inverse-element condition is eliminated to explore irreversible time

evolution, another characteristic feature that one hopes to capture via classical statistical

mechanics. This entry restricts attention to invertible maps, but the ease of generalizing to a

broader range of phenomena within the framework of ergodic theory is worth noting.

A. Measure Theory

A set Σ is an algebra of subsets of X if and only if the following conditions hold: the union of any pair of elements of Σ is in Σ, the complement of each element of Σ is in Σ, and the empty set ∅ is in Σ. In other words, for every A, B ∈ Σ, A∪B ∈ Σ and Ã ∈ Σ, where à denotes the set of all elements of X that are not in A. An algebra Σ of subsets of X is a σ-algebra if and only if Σ contains every countable union of its elements. In other words, if {Ai} ⊆ Σ is countable, the countable union ∪iAi is in Σ.

By definition, μ is a probability measure on Σ if and only if the following conditions hold: μ assigns each element of Σ a value in the unit interval, μ assigns X the maximum value, and μ assigns to the union of finitely or countably many disjoint elements of Σ the sum of the values that it assigns to those elements. In other words, μ: Σ → [0,1], μ(X) = 1, μ(∅) = 0, and μ(∪iBi) = ∑iμ(Bi) whenever {Bi} is finite or countable and Bj∩Bk = ∅ for each pair of distinct elements Bj and Bk of {Bi}. The probability measure μ is the abstract counterpart in ergodic theory to the density function in classical statistical mechanics.
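As a concrete illustration (not part of the appendix), the following sketch checks the algebra and probability-measure conditions just listed on a small finite example, with X = {0, 1, 2}, Σ the power set of X, and μ given by illustrative weights; on a finite set, closure under finite unions already yields a σ-algebra.

from itertools import combinations

X = frozenset({0, 1, 2})
weights = {0: 0.5, 1: 0.3, 2: 0.2}                # illustrative weights summing to 1

def powerset(s):
    # All subsets of s, as frozensets.
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

Sigma = set(powerset(X))

# Algebra conditions: the empty set is in Sigma, and Sigma is closed under
# pairwise unions and under complements.
assert frozenset() in Sigma
assert all((A | B) in Sigma for A in Sigma for B in Sigma)
assert all((X - A) in Sigma for A in Sigma)

def mu(A):
    # The measure of a set is the sum of the weights of its elements.
    return sum(weights[w] for w in A)

# Probability-measure conditions: values in [0,1], mu(X) = 1, mu(empty) = 0,
# and additivity over disjoint sets.
assert all(0.0 <= mu(A) <= 1.0 for A in Sigma)
assert abs(mu(X) - 1.0) < 1e-12 and mu(frozenset()) == 0
assert all(abs(mu(A | B) - (mu(A) + mu(B))) < 1e-12
           for A in Sigma for B in Sigma if not (A & B))

print("algebra and probability-measure conditions hold for this finite example")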

B. K-Systems

The standard definition of a K-system is the following (see Arnold and Avez 1968, p. 32, and Cornfeld et al. 1982, p. 280): A dynamical system [X, Σ, μ, T] is a K-system if and only if there is a subalgebra Σ0 ⊆ Σ such that the following three conditions hold: (1) Σ0 ⊆ TΣ0, (2) ⋁n TⁿΣ0 = Σ, (3) ⋂n T⁻ⁿΣ0 = N. In this definition, TⁿΣ0 is the sigma algebra containing the sets TⁿB (B ∈ Σ0), N is the sigma algebra consisting uniquely of sets of measure one and measure zero, ⋁n TⁿΣ0 is the smallest σ-algebra containing all the TⁿΣ0, and ⋂n T⁻ⁿΣ0 denotes the largest subalgebra of Σ which belongs to each T⁻ⁿΣ0.

The Kolmogorov-Sinai entropy of an automorphism T is defined as follows. Let the function z be:

z(x) := -x log(x) if x > 0, and z(x) := 0 if x = 0.

Now consider a partition α = {α1, ..., αr} of the probability space [X, B, μ] and let the function h(α) be

h(α) := ∑i=1..r z[μ(αi)],

the so-called 'entropy of the partition α'. Then, the KS-entropy of the automorphism T relative to the partition α is defined as

h(T, α) := limn→∞ h(α ∨ Tα ∨ ... ∨ Tⁿ⁻¹α) / n,

and the (non-relative) KS-entropy of T is defined as

h(T) := supα h(T, α),

where the supremum ranges over all finite partitions α of X.

One can now prove the following theorem (Walters 1982, p. 108; Cornfeld et al. 1982, p. 283). If [X, B, μ] is a probability space and T: X → X is a measure-preserving map, then T is a K-automorphism if and only if h(T, α) > 0 for all finite partitions α ≠ N (where N is a partition that consists of only sets of measure one and zero). Since the (non-relative) KS-entropy is defined as the supremum of h(T, α) over all finite partitions, it follows immediately that a K-automorphism has positive KS-entropy; i.e., h(T) > 0 (Cornfeld et al. 1982, p. 283; Walters 1982, p. 109). But notice that the converse is not true: there are automorphisms with a positive KS-entropy that are not K-automorphisms.
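As a concrete illustration (not part of the appendix), the following sketch computes the entropy of a partition and the limit in the definition of h(T, α) for a Bernoulli scheme B(p1, ..., pn) (defined in the next appendix), using the partition α whose cells are the cylinder sets fixing the 0-th coordinate. For the shift map the refined partition α ∨ Tα ∨ ... ∨ T^(m-1)α consists of length-m cylinder sets with product measures, so h(refined)/m is constant in m and the limit equals -∑ pi log pi, the well-known value of the KS-entropy of a Bernoulli scheme. The probabilities chosen below are merely illustrative.

import math
from itertools import product

def z(x):
    # z(x) = -x log x for x > 0 and 0 for x = 0, as in the definition above.
    return -x * math.log(x) if x > 0 else 0.0

def h(cell_measures):
    # Entropy of a partition: the sum of z over the measures of its cells.
    return sum(z(p) for p in cell_measures)

p = [0.5, 0.25, 0.25]                     # B(1/2, 1/4, 1/4), an illustrative scheme

# For the shift, the refinement over m time steps has length-m cylinder sets as
# cells, and their measures are products of the p_i.
for m in (1, 2, 5, 10):
    refined = [math.prod(c) for c in product(p, repeat=m)]
    print(f"m = {m:2d}:  h(refined)/m = {h(refined) / m:.6f}")

print("-sum p_i log p_i   =", f"{h(p):.6f}")   # the KS-entropy of B(p_1, ..., p_n)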

C. Bernoulli Systems

Let Y be a finite set of elements Y = {f1, ..., fn} (sometimes also called the 'alphabet' of the system) and let μ(fi) = pi be a probability measure on Y: 0 ≤ pi ≤ 1 for all 1 ≤ i ≤ n, and ∑i=1..n pi = 1. Furthermore, let X be the direct product of infinitely many copies of Y: X = ∏i Yi, where Yi = Y for all i. The elements of X are doubly-infinite sequences x = (xi)i∈ℤ, where xi ∈ Y for each i ∈ ℤ. As the σ-algebra C of X we choose the σ-algebra generated by all sets of the form {x ∈ X : xi = k, m ≤ i ≤ m+n} for all m ∈ ℤ, for all n ∈ ℕ, and for all k ∈ Y (the so-called 'cylinder sets'). As a measure μ on X we take the product measure ∏i μi, that is μ((xi)i∈ℤ) = ... μ(x-2)μ(x-1)μ(x0)μ(x1)μ(x2) ... The resulting process is stationary if the chance element is constant in time, that is iff for all cylinder sets μ(y : yi+1 = wi, m ≤ i ≤ m+n) = μ(y : yi = wi, m ≤ i ≤ m+n) holds. An invertible measure-preserving transformation T: X → X, the so-called shift map, is naturally associated with every stationary stochastic process: Tx = (yi)i∈ℤ where yi = xi+1 for all i ∈ ℤ. It is straightforward to see that the measure μ is invariant under T (i.e. that T is measure preserving) and that T is invertible. This construction is commonly referred to as a 'Bernoulli Scheme' and denoted by 'B(p1, ..., pn)'. From this it follows that the quadruple [X, C, μ, T] is a dynamical system.
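The construction can be made tangible with a short simulation (not part of the appendix): draw finite windows of sequences distributed according to an illustrative product measure, apply the shift map, and compare the empirical frequency of a chosen cylinder set before and after shifting with its product-measure value. The three numbers roughly agree, reflecting the fact that T preserves μ. The alphabet, probabilities, and cylinder below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)

alphabet = np.array([0, 1, 2])             # the 'alphabet' Y = {f_1, f_2, f_3}
p = np.array([0.5, 0.25, 0.25])            # mu(f_i) = p_i (illustrative values)

# Draw many finite windows of a doubly-infinite i.i.d. sequence.
n_samples, window = 100_000, 12
seqs = rng.choice(alphabet, size=(n_samples, window), p=p)

def shift(seqs):
    # The shift map T on finite windows: (Tx)_i = x_{i+1}, i.e. drop the first symbol.
    return seqs[:, 1:]

def cylinder_freq(seqs, word, start):
    # Empirical measure of the cylinder set fixing the given word from position 'start'.
    w = np.array(word)
    block = seqs[:, start:start + len(w)]
    return np.mean(np.all(block == w, axis=1))

word = (0, 2, 1)
before = cylinder_freq(seqs, word, start=0)
after = cylinder_freq(shift(seqs), word, start=0)
exact = np.prod(p[list(word)])

print(f"product measure of cylinder: {exact:.5f}")
print(f"frequency before shift:      {before:.5f}")
print(f"frequency after shift:       {after:.5f}")   # roughly the same: T preserves mu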

Bibliography

Alekseev, V. M. and Yakobson, M. V. (1981). Symbolic dynamics and hyperbolic dynamical systems. Physics Reports 75, 287–325.

Albert, D. (2000). Time and Chance. Cambridge/MA and London: Harvard University Press.

Argyris, J., Faust, G. and Haase, M. (1994). An Exploration of Chaos. Amsterdam: Elsevier.

Arnold, V. I. and Avez, A. (1968). Ergodic Problems of Classical Mechanics. New York: Wiley.

Belot, G. and Earman, J. (1997). Chaos out of order: Quantum mechanics, the correspondence principle and chaos. Studies in History and Philosophy of Modern Physics 28, 147–182.

Berkovitz, J., Frigg, R. and Kronz, F. (2006). The Ergodic Hierarchy, Randomness and Hamiltonian Chaos. Studies in History and Philosophy of Modern Physics 37, 661–691.

Birkhoff, G. D. (1931). Proof of a Recurrence Theorem for Strongly Transitive Systems, and Proof of the Ergodic Theorem. Proceedings of the National Academy of Sciences 17, 650–660.

Birkhoff, G. D. and Koopman, B. O. (1932). Recent Contributions to the Ergodic Theory. Proceedings of the National Academy of Sciences 18, 279–282.

Boltzmann, L. (1868). Studien über das Gleichgewicht der lebendigen Kraft zwischen bewegten materiellen Punkten. Wiener Berichte 58, 517–560.

Boltzmann, L. (1871). Über das Wärmegleichgewicht zwischen mehratomigen Gasmolekülen. Wiener Berichte 63, 397–418.

Boltzmann, L. (1877). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung resp. den Sätzen über das Wärmegleichgewicht. Wiener Berichte 76, 373–435. Reprinted in F. Hasenöhrl (ed.), Wissenschaftliche Abhandlungen, Vol. 2. Leipzig: J. A. Barth 1909, pp. 164–223.

Bricmont, J. (2001). Bayes, Boltzmann, and Bohm: Probabilities in physics. In Bricmont et al. (2001), pp. 4–21.

Brudno, A. A. (1978). The complexity of the trajectory of a dynamical system. Russian Mathematical Surveys 33, 197–198.

Brush, S. G. (1976). The Kind of Motion We Call Heat. Amsterdam: North Holland Publishing.

Cohen, I. B. (1966). Newton's Second Law and the Concept of Force in the Principia. Texas Quarterly 10.3, 127–157.

Cornfeld, I. P., Fomin, S. V. and Sinai, Y. G. (1982). Ergodic Theory. Berlin and New York: Springer.

Descartes, R. (1644). Principles of Philosophy. Edited by V. R. Miller and R. P. Miller. Dordrecht: D. Reidel Publishing Co. 1983.

Dijksterhuis, E. J. (1986 [1961]). The Mechanization of the World Picture. Princeton: Princeton University Press [Oxford: Oxford University Press].

Dugas, R. (1988 [1955]). A History of Mechanics. New York: Dover Publications [Neuchatel, Switzerland: Editions du Griffon].

Earman, J. (1986). A Primer on Determinism. Dordrecht: D. Reidel Publishing Company.

Earman, J. and Redei, M. (1996). Why ergodic theory does not explain the success of equilibrium statistical mechanics. British Journal for the Philosophy of Science 47, 63–78.

Frigg, R. (2004). In What Sense Is the Kolmogorov-Sinai Entropy a Measure for Chaotic Behaviour? Bridging the Gap Between Dynamical Systems Theory and Communication Theory. British Journal for the Philosophy of Science 55, 411–434.

Frigg, R. (2008). A Field Guide to Recent Work on the Foundations of Statistical Mechanics. In D. Rickles (ed.), The Ashgate Companion to Contemporary Philosophy of Physics. London: Ashgate, 99–196.

Frigg, R. (2009a). Probability in Boltzmannian Statistical Mechanics. Forthcoming in G. Ernst and A. Hüttemann (eds.), Time, Chance and Reduction: Philosophical Aspects of Statistical Mechanics. Cambridge: Cambridge University Press.

Frigg, R. (2009b). Typicality and the Approach to Equilibrium in Boltzmannian Statistical Mechanics. Forthcoming in Philosophy of Science (Supplement).

Garber, D. (1992). Descartes' Physics. In J. Cottingham (ed.), The Cambridge Companion to Descartes. Cambridge: Cambridge University Press.

Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. Woodbridge: Ox Bow Press 1981.

Hume, D. (1978). A Treatise of Human Nature. Edited by L. A. Selby-Bigge with notes by P. H. Nidditch. Oxford: Oxford University Press.

Khinchin, A. I. (1949). Mathematical Foundations of Statistical Mechanics. Mineola/NY: Dover Publications 1960.

Koopman, B. O. (1931). Hamiltonian Systems and Hilbert Space. Proceedings of the National Academy of Sciences 17, 315–318.

Lanczos, C. (1986 [1970]). The Variational Principles of Mechanics. New York: Dover Publications [Toronto: University of Toronto Press].

Lavis, D. (2005). Boltzmann and Gibbs: An attempted reconciliation. Studies in History and Philosophy of Modern Physics 36, 245–273.

Lichtenberg, A. J. and Liebermann, M. A. (1992). Regular and Chaotic Dynamics (2nd ed.). Berlin and New York: Springer.

Malament, D. and Zabell, S. (1980). Why Gibbs Phase Averages Work: The Role of Ergodic Theory. Philosophy of Science 47, 339–349.

Mañé, R. (1983). Ergodic Theory and Differentiable Dynamics. Berlin and New York: Springer.

Markus, L. and Meyer, K. R. (1974). Generic Hamiltonian Dynamical Systems Are Neither Integrable nor Ergodic. Memoirs of the American Mathematical Society. Providence, Rhode Island.

McLaughlin, B. and Bennett, K. (2008). Supervenience. The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/fall2008/entries/supervenience/>.

Newton, I. (1687). Mathematical Principles of Natural Philosophy. Edited by A. Motte and revised by F. Cajori. Berkeley: University of California Press 1934.

Ornstein, D. S. (1974). Ergodic Theory, Randomness, and Dynamical Systems. New Haven: Yale University Press.

Ott, E. (1993). Chaos in Dynamical Systems. Cambridge: Cambridge University Press.

Shields, P. (1973). The Theory of Bernoulli Shifts. Chicago: Chicago University Press.

Simanyi, N. (2004). Proof of the Ergodic Hypothesis for Typical Hard Ball Systems. Ann. Henri Poincaré 5, 203–233.

Sklar, L. (1993). Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics. Cambridge: Cambridge University Press.

Smith, P. (1998). Explaining Chaos. Cambridge: Cambridge University Press.

Stroud, B. (1977). Hume. London: Routledge and Kegan Paul.

Tabor, M. (1989). Chaos and Integrability in Nonlinear Dynamics: An Introduction. New York: Wiley.

Tolman, R. C. (1938). The Principles of Statistical Mechanics. Mineola/New York: Dover 1979.

Torretti, R. (1999). The Philosophy of Physics. Cambridge: Cambridge University Press.

Uffink, J. (2004). Boltzmann's work in statistical physics. Stanford Encyclopedia of Philosophy (Winter 2004 Edition), http://plato.stanford.edu.

Uffink, J. (2007). Compendium of the foundations of classical statistical physics. In J. Butterfield and J. Earman (eds.), Philosophy of Physics. Amsterdam: North Holland, 923–1047.

Van Lith, J. (2001a). Ergodic theory, interpretations of probability and the foundations of statistical mechanics. Studies in History and Philosophy of Modern Physics 32, 581–594.

Von Neumann, J. (1932). Proof of the Quasi-Ergodic Hypothesis. Proceedings of the National Academy of Sciences 18, 70–82.

Von Plato, J. (1992). Boltzmann's Ergodic Hypothesis. Archive for History of Exact Sciences 44, 71–89.

Von Plato, J. (1994). Creating Modern Probability. Cambridge: Cambridge University Press.

Vranas, P. (1998). Epsilon-ergodicity and the success of equilibrium statistical mechanics. Philosophy of Science 68, 688–708.

Werndl, C. (2009a). Justifying Definitions in Mathematics: Going Beyond Lakatos. Philosophia Mathematica 17, 313–340.

Werndl, C. (2009b). What Are the New Implications of Chaos for Unpredictability? Forthcoming in British Journal for the Philosophy of Science.