nld-draftctr.maths.lu.se/matematiklth/courses/fman15/media/... · the treatment of discrete{time...

NLD-draft

Mario Natiello

Jorg Schmeling

Center for Mathematical Sciences, Lund UniversityE-mail address : [email protected]

Center for Mathematical Sciences, Lund UniversityE-mail address : [email protected]

2000 Mathematics Subject Classification. Primary

Key words and phrases. dynamical systems

Abstract. Lecture Notes for FMA140 - 2010

To our families, for patiently putting up with us.

Contents

Foreword xiOrganisation of the Book xiDevelopment xiiAcknowledgments xii

Chapter 1. Basics 11.1. What is a Dynamical System? 11.2. General Features of Dynamical Systems Theory 41.3. Examples 51.4. Exercises 7

Chapter 2. Basic Ordinary Differential Equations 92.1. Preliminary Considerations 92.2. Linear and Nonlinear 172.3. Exercises 182.4. Fundamental Theory 182.5. Exercises 222.6. Dependence on Initial Conditions 23

Chapter 3. Advanced Ordinary Differential Equations 273.1. The local flow 273.2. Exercises 293.3. The Straightening–out Theorem 303.4. Differential inequalities 333.5. Types of flow–lines 343.6. More Topological Properties 373.7. Exercises 433.8. Poincare–Bendixson Theory 433.9. Summary 493.10. The Lorenz Equations 493.11. Exercises 50

Chapter 4. Discrete Dynamical Systems 514.1. Flows versus Maps 514.2. One-dimensional Maps 554.3. Exercises 68

Chapter 5. Stability of Dynamical Systems 715.1. Stability 71

vii

viii CONTENTS

5.2. Lyapunov Functions 725.3. Exercises 765.4. Structural Stability 775.5. Hyperbolicity – The Hartman-Grobman Theorem 815.6. Exercises 86

Chapter 6. Center Manifold Theory and Local Bifurcations 896.1. Introduction 896.2. The Bifurcation Programme for Fixed Points of Flows 916.3. Centre Manifold Theorems 936.4. Centre Manifold Theorems for Maps 1006.5. Computation of Centre Manifold Approximations 1016.6. Relation of Centre Manifold Theory with Bifurcations 1036.7. Exercises 104

Chapter 7. Local Bifurcations of Fixed Points 1077.1. Local Bifurcations of Vector Fields 1077.2. Local Bifurcations of Fixed Points of Maps 1157.3. Some General Comments on Local Bifurcations 1197.4. Exercises 120

Chapter 8. Chaotic Systems 1218.1. One-dimensional Chaotic Maps 1218.2. Higher-dimensional Maps 1308.3. Strange Attractors and Chaos 1448.4. Exercises 146

Chapter 9. Ergodic Theory 1499.1. Naive Measure Theory 1519.2. Invariant Measures 1539.3. The Birkhoff Ergodic Theorem 1589.4. Exercise 1609.5. Lyapunov Exponents 1609.6. Exercises 160

Appendix A. Appendix 163A.1. Basic definitions 163A.2. More Topological Concepts 168A.3. Fixed Point Theorems 183A.4. Linear Algebra and Matrix Algebra 185A.5. Linear operators 193A.6. Algebra 194A.7. The Liouville Theorem of Hamiltonian Dynamics 196A.8. Uniform Distribution 200A.9. Number Theory 201A.10. The general Birkhoff Ergodic Theorem 202

Appendix B. Notes and Answers to the Exercises 209

CONTENTS ix

Notes 209Answers to Selected Exercises 215

Bibliography 219

Foreword

The theory of dynamical systems has grown in the last 100 yearsfrom being a rather special branch of mathematics (within the realm ofordinary differential equations) into a rich and diverse topic, not onlyof mathematics but also of other sciences. The theory of dynamicalsystems is on one hand a field of its own in modern mathematics, butalso a universal interdisciplinary method that is extensively used.

The mathematical theory of dynamical systems investigates thosegeneral structures which are the basis of evolutionary processes, morespecifically time-evolution, i.e., how a general system is modified alongtime. This kind of processes is met in most scientific fields, from nu-merical simulations around number theory to the time-evolution of co-existing populations. The aim of the theory of dynamical systems is togive qualitative assertions about such processes.

Dynamical systems theory helps substantially to understand theapplicability or limitations of mathematical models. Because evolu-tionary processes are met in so many seemingly different fields, theirgeneral nature is very complex and rich. General results arising fromthe mathematical structure of a problem are often subject to specialrestrictions depending on the nature of the system in consideration.The conditions under which one can ensure (æstethically) interestingproperties are not always verifiable in the different praxes. Scientists inthis field have the double task of developing the mathematical theoryand understanding its relation with the specific features of the variousapplications.

In this course we will consider classical topics of dynamical systems(nonlinear ordinary differential equations), giving as well an overview ofmethods and results that are of more general nature. We will especiallyemphasize on the universality of this theory and try to indicate its wideapplicability.

Organisation of the Book

The Book is structured as follows. Some general results that arefrequently used in the theory of Dynamical Systems are listed in theAppendix. These include basic definitions such as those of topology,metric space, manifold, etc., and also some basic theorems such asJordan Canonical Form Theorem or Banach’s Fixed Point Theorem.Some of the results are just stated, others are also proved.

xi

xii NLD-draft

Chapter 1 introduces the basic ideas of dynamical systems, high-lighting the discrete–time and continuous–time descriptions. Chap-ters 2 and 3 discuss the basic and subsequently some more advancedideas about the solutions of systems of first–order ordinary differen-tial equations (i.e., continuous–time systems)). Chapter 4 introducesthe treatment of discrete–time systems. Chapter 5 advances in thequalitative theory of dynamical systems through the concept of stabil-ity, namely when a given singular solution can be expected to persistasymptotically. Chapter 6 deals with a cornerstone of the qualitativeanalysis of dynamical systems, the Center Manifold Theorem, prepar-ing the field for the most important tool in applications, namely LocalBifurcation Theory, which is discussed in Chapter 7. The mathematicaltools related to the popular concepts of “chaos”, “strange attractors”,etc., is presented in Chapter 8. Finally, Chapter 9 deals with a belovedclassical topic of Dynamical Systems Theory, namely Ergodic Theory.

Since this book aims to be self-contained, almost all results whichare not elementary have been stated and proved right here (with “ele-mentary” we mean results discussed in the basic Algebra and Analysis(& other) courses available at essentially any Science- or Engineeringfaculty). Hence, everything is here and the reader does not need torush and see other books in order to complete the ideas we presented.However, we provide a list of references with books (or articles) thathave inspired us, one way or the other, such as [4., 8., 13.], or wheresome of the Exercises are discussed.

Development

This book originated in Jorg’s lecture notes for a course in theFaculty of Engineering. Then Mario revised the text and added thematerial from his lecture notes. Finally, a bit of mutual revision fol-lowed.

Acknowledgments

The authors thank the colleagues at the Centre for MathematicalSciences, Lund University for having been able to discuss the contentsof the book and other interesting subjects.

Lund, MMXII.

CHAPTER 1

Basics

1.1. What is a Dynamical System?

The objects of dynamical systems theory are space, time and evo-lution.

1.1.1. Space. The mathematical description of a dynamical sys-tem is specified by giving the set of (potentially) observable states ofthe system. Each such state which characterises and identifies a systemis regarded as a point in a set X called phase space. The points areconsidered as different states of the system.

Thus, phase space may consist of the set of all possible positionand momentum values for a point mass in rectilinear movement, or ofthe intensity, phase and electric field polarisation of a laser device, the(vertical and horizontal) velocity amplitudes and temperature profileof a gas or the set of all possible numbers of predator and prey in aclosed ecological environment.

In most applications, phase space has some additional structure,other than consisting simply of points. For example, it is quite naturalto consider the distance between points to be relevant. Depending onthe specific problem, phase space can be a topological space, a measur-able space, a manifold or some other structure. E.g., in Hamiltoniandynamics the basic structure is a symplectic, even-dimensional mani-fold, while for the other examples above a proper space could be R2×S1,R3 and N2 (pairs of nonnegative integers) respectively. See AppendixA.1 for a more precise description of the basic concepts.

Throughout this book we will assume that our space is not ”toolarge”. In particular we assume that phase space is finite dimensionaland compact. Such assumptions are natural for most applications:when possible we focus on some limited features of the system (e.g.,the number of predators/prey), and the possible extension of thesevariables is most frequently finite (e.g., the amount of prey may godown to zero, we call that an extinction, but it can never grow withoutbounds: after some “large” value both our models and the supportingphysical system would collapse). These assumptions substantially helpto develop a rich theory. Unfortunately they do not allow to applythese methods directly to the theory of partial differential equationswhere the phase space is an infinite dimensional function space.

1

2 NLD-draft

1.1.2. Time. Time is the human means of characterising change.The goal in science is to characterise change away from any subjectiveopinion as much as possible, i.e., the scientist measures the variablesdefining phase space and change is recognised because these variableshave different values in different measuring situations. The conceptof time used in dynamical systems theory is more often than not thenatural time assumed in everyday life, i.e., the Galilean time given byordinary accurate clocks.

Time is modeled by an additive group (or semi–group) G and thelaw of evolution will be specified. The most important groups (semi–groups) we consider will be R, R+ (continuous time) or Z, N (discretetime). There is a fundamental difference between continuous and dis-crete time which we will discuss in the next Subsection. Although allnatural processes come to an end sooner or later, it is very practicalto assume that the group is non-compact, i.e., that time may extendtowards infinity, since this allows to define and study the notion ofasymptotic behaviour.

1.1.3. Evolution. The evolution law is meant to assign to a statex the new state xt after time t ∈ G has elapsed. For the case ofdiscrete time, t is computed in integer time-units. A time-unit is theminimal time-interval that is regarded as relevant for the problem. Forexample, in population problems it is customary to measure time in“generations”. While for bacteria a new generation may be producedin a few hours, for many birds, large mammals and insects, generationtime is usually one year. On the other hand, when the time-unit canbe made arbitrarily small, as in the case of rapidly changing processes(electrical, chemical, gravitational, etc., processes), it is practical todescribe the dynamics with continuous time. The dynamical propertiesare specified instantaneously, i.e., time is regarded as a real variable.

We note that under certain assumptions the evolution law doesnot change in time, i.e., it is not important at what moment of timewe apply the evolution law. We call such a system autonomous.Formally, we can write

xt = φt(x)

where φt : X → X is the evolution within t time-steps. Since there isno evolution in zero time steps we have for t = 0 that

φ0 = Id i.e., φ0(x) = x. (1.1)

Furthermore, it is intuitive to assume that the evolution after t + stime-steps is the same as going t steps and then continuing further sadditional steps. This can be formalised as

φt+s(x) = φs (φt(x)) . (1.2)

1.1. WHAT IS A DYNAMICAL SYSTEM? 3

Remark 1.1. In general, the phase space X has some additionalstructures. It can be an Euclidean space (or, more generally, a mani-fold), a probability space, a metric space or others. Our main object ofstudy will be Euclidean spaces. In this case it is natural to ask the evo-lution law to be consistent with the structure of the underlying phasespace, i.e., if X = Rn then it is natural to ask that the laws φt aresmooth. In general we expect that the evolution law respects andpreserves the underlying structure.

Let us consider the discrete- and continuous-time cases in detail.

• G = R (R+). In this case the dynamical system is called a flow(semi–flow). X is finite-dimensional and we are interestedin obtaining the smooth function x(t) giving the state of thesystem instantaneously for all times. We assume that the evo-lution law depends continuously on the time parameter t. Themathematically simplest case is X = Rn. The instantaneouschange in x is given by the evolution law, i.e., some contin-uous function f(x, t). For autonomous systems the evolutionlaw is given by a function f(x) of the state only. Hence, au-tonomous, finite-dimensional dynamical systems are describedby ordinary differential equations (ODE’s) of the general formx′ = f(x), where the specific meaning of these symbols is tobe given for each particular case. Formally, φt(x) reads as theintegral of f over a time-interval t with initial condition x.• G = Z,N. Describing the system in terms of its natural time-

unit, “time” takes integer values. The evolution law can thenbe represented by the symbolic expression xn+1 = F (xn), wherethe function F describes the evolution law when going from gen-eration n to the next generation (n + 1). We call F a map ormapping. Autonomous evolution means now that F does notdepend explicitly on n, but only on the state x. The prop-erty (1.2) implies that φn = F n where φ1 ≡ F . Namely,

φn(x) = φn−1(φ1(x)) = φ1 φ1 · · · φ1(x) = φn1 (x).

Moreover for G = Z we have

φ−1(φ1(x)) = φ1(φ−1(x)) = x

and φ−1 ≡ F−1 is the inverse map.

Remark 1.2. There is a deep relation between the continuous- anddiscrete-time descriptions, basically arising from the fact that Z canbe regarded as a subset of R. Suppose that within a continuous-timedescription we choose to summarise the dynamics by making “readings”of the variables at fixed intervals of time. In such a case the flow definedby f(x) induces a discrete-time dynamics with associated F (x), wheref and F are related by formal integration.

4 NLD-draft

Remark 1.3. The use of capitals for maps and lower-case for flowsis just a matter of notation. It was chosen for simplicity for the sake ofthe previous remark. In specific problems the notation is the choice ofthe researcher, the only condition being that the description is clearly,consistently and accurately specified.

1.2. General Features of Dynamical Systems Theory

1.2.1. Asymptotic Behaviour. The main feature of dynamicsdistinguishing it from other areas is that one is interested in the as-ymptotic behaviour for arbitrarily large times. This is relevant becausemany natural systems when left alone, relating to the rest of the uni-verse under controlled conditions, approach some sort of “steady state”where further change is undetectable. This steady state can be inferredfrom the mathematical description by taking limits as time “goes toinfinity”. The specific asymptotic questions depend on the underlyingstructure.

1.2.2. Stability. One of the most important properties of a dy-namical system is the concept of stability. This is a broad conceptwhich we will further specify along the book. Without some sort ofstability any model is questionable and numerical simulations are of-ten not possible or improper.

There are two main characterisations of stability. The first one isthe inner stability, meaning that the asymptotic behaviour does notdrastically change if we make a small “error” in the starting position.This is sometimes called the dependence on initial conditions. Theother notion of stability is called structural stability. It means thatthe behaviour of the entire system is not drastically changed if weslightly perturb (i.e., make a small change) the evolution law.

These facts have many important practical consequences. The firstconcept of stability allows us to make qualitative assertions about thesystem. Statements of the form “if the system is initially in this regionof phase space, it will be found in the vicinity of such-and-such stateafter a given (sufficiently long) time”.

The second concept of stability is relevant since in applications (nu-merical or theoretical) one always works with models, i.e., representa-tions of the original problem. These models do not exactly describethe underlying processes up to 100%. There are always features thatare simplified, disregarded or ignored, no matter how detailed the de-scription. A good model, however, is in some sense sufficiently close tothe underlying system. If this model is structurally stable then thereis hope that the investigations of this model will lead to almost correctanswers for the real system. The model is “not far from” the real sys-tem, and consequently the predictions of the model are not far from thereal behaviour. In particular, any numerical simulation is a (discrete)

1.3. EXAMPLES 5

dynamical system modeling the original system (and perhaps model-ing the original model!) and the theory of convergence of algorithms isactually a stability theory in this sense.

1.2.3. Classification. For practical reasons, the mathematicianhas an interest in classifying all dynamical systems. The goal is toidentify the qualitative properties of the time-evolution just by recog-nising some features in the mathematical description. Such programmeis unfeasible in such generality. However, partial results do exist. Onehas to restrict the equivalence (classifying) relations or restrict the classof systems under study. Often only a local classification is possible.

1.3. Examples

We give here some examples of the structures we will encounter.

1.3.1. Topological Dynamics. Topology is the simplest math-ematical structure where we can develop the concept of continuity.Topological properties, hence, deal with features that are persistentunder homeomorphisms (continuous, invertible, 1–to–1 maps of phasespace, e.g., coordinate changes). Hence, to establish such properties noderivatives are necessary.

In topological dynamics the phase space is usually a compact metricspace and the evolution law is given by a family of homeomorphisms.This setup establishes the basis for more restrictive situations whereother properties are built on top of the topological ones (smoothness,symplectic, algebraic, . . . ) Many notions are defined topologically,i.e., in the continuous category. Some of those are: periodic orbits,recurrence, topological entropy, structural stability, attractorsand invariant sets, . . . For example, it is clear that whether a solutionx(t) of a dynamical system defined for t ∈ [t0, t0 + T ] closes into itselfat the end of the path or not (this is, if x(t0) equals x(t0 + T ) or not)is a property that cannot depend on the choice of coordinates: it willpersist under homeomorphisms. It is remarkable that many systemsare stable in the topological but not in the smooth sense.

The analysis of asymptotic behaviour is highly related to topologi-cal properties. The simplest asymptotic feature of a system is to deter-mine its invariant sets, i.e., the set(s) of points in phase space that areunaltered by the dynamics. In this context, the interesting topologi-cal properties are transitivity and minimality. These notions aredifferent kinds of irreducibility of systems, i.e., we identify invariantregions of phase space that cannot be decomposed into a number ofsmaller independent invariant sets.

Asymptotics deals not only with invariant sets but also with theway the dynamics of a given initial condition may approach such set.One of the most important notions is the notion of attractor. Roughlyspeaking, an attractor ”collects” a great amount of trajectories, i.e.,

6 NLD-draft

many trajectories in phase space get closer and closer to the attractoras time passes by.

Last but not least, topological dynamics has a profound connectionto the theory of C∗–algebras. It also can be applied to questions in puretopology. It has a great record in the proof of the Poincare conjecturein higher dimensions.

1.3.2. Measurable Dynamics and Ergodic Theory. It is muchtoo difficult to study the behaviour of any single orbit in a complex sys-tem. A way out is to study the behaviour of typical trajectories. Thisleads to statistical considerations. We are interested in a.e. behaviourwith respect to an invariant measure. The expression a.e. stands for“almost everywhere” and it indicates properties that hold for all points(or all trajectories) of phase space except at most a “small” set. Also“small” has a specific definition: in this context it means a set of zeromeasure, describing collections of points that are in some sense “unim-portant” from a statistical point of view.

So the setup is that the phase space is a measure space (a usual spacewhere we can additionally define a measure, i.e., a way of computingthe “size” of sets) and the evolution is measure preserving.

This is the setup of a stationary stochastic process. In fact, theergodic theory contains the theory of stochastic processes. The differ-ence is that it does not deal with independence, but it has a notion ofequivalence of processes which makes the theory different. The ques-tions we are interested in are of stochastic nature. Do we have a lawof large numbers?, a central limit theorem? and so on. Many ofthese questions are studied and could be answered in an surprisinglygeneral setting.

1.3.3. Smooth Dynamical Systems. In the same way as topol-ogy deals with continuity, smoothness is a sub-category where in ad-dition it is possible to compute (infinitely many) derivatives. Herethe phase space is now a manifold and the evolution is smooth, i.e.,the solutions x(t) are not only continuous functions, but also derivablefunctions. The main advantage of the smooth structure is that we canlinearise the system locally, by taking derivatives of the equations, i.e.,much along the lines of Taylor expansions. In general, the study oflinear systems is much simpler than the corresponding study of generalsystems (also called nonlinear).

Another feature of smooth dynamics is that it unites the methodsfrom topological and measurable dynamics. It is quite striking thatthese more restrictive settings imply very strong general results in thesmooth category. Such phenomena are called rigidity.

There are several more special subclasses of smooth systems. Theseinclude symplectic systems (celestial or Hamiltonian mechanics), alge-braic systems, geodesic flows, and others.

1.4. EXERCISES 7

1.3.4. Symbolic Dynamical Systems and Coding Theory.There is an important tool which is useful in the investigations of a largeclass of smooth, topological or measurable dynamical systems. In manycases, the system can be encoded into a symbolic system. For example,suppose we can identify two sets in phase space (which we label with thenumbers 0 and 1), such that all trajectories intersect these sets as timegoes by. A simplified way to describe the dynamics could be to list theintersections of each trajectory, e.g., to give a sequence 1001101 · · · ofvisits in time. When this is possible without ambiguities for all of phasespace, each point can be associated to an (infinite) address out of analphabet (in the example, a binary alphabet) which is coherent with thedynamics. We may then study the dynamics between intersections as amap in the space of binary strings. We “transfer” the dynamical studyto a symbolic space. Under certain conditions, such space will carrythe complete information of the original system. In those situations,the study of the system is largely simplified.

On the other hand, the intensive investigations of the symbolicdynamics (dynamics on infinite strings) brought a deep insight into thestructure of those strings. This helped to develop the coding theorywhich deals with long (but finite) strings.

1.4. Exercises

Exercise 1.1. Show that in the case G = Z the evolution law isdefined by φn = φn1 (see Section 1.1).

Exercise 1.2. A linear, real-valued system x = x has the associatedflow φt(y) = ety, y ∈ R in continuous time. Describe the dynamicalsystem associated to the time-one map, i.e., the map connecting eachpoint in phase space to its image after one unit of time has passed,xn+1 = F (xn). Show both the new discrete–time flow and F (x) (seeSection 1.1).

CHAPTER 2

Basic Ordinary Differential Equations

We state here the foundations of the theory of dynamical systemsfor continuous time. We start with some basic structure and furtherstudy the question of existence of solutions for systems of ordinarydifferential equations of the first order. We end the chapter with someremarks about linear ordinary differential equations.

2.1. Preliminary Considerations

Newton stated the fundaments of mechanics, the theory of motion.His principle states that the force acting on a particle equals its masstimes the acceleration. Since acceleration is the second derivative ofthe position q of a particle we get

mq = F (q)

where m is the mass, q(t) is the position of the point at time t and F isthe acting force (here assumed to be autonomous, i.e., without explicitdependence on time). This is an example of an ordinary differentialequation (ODE).

More generally, ODE’s are equations of a function x(t) that involvethe time-derivatives x, x, · · · , t and possibly other parameters. Thehighest derivative involved is called the order of the equation. A specialbut important case of differential equations occurs when we can solvethe equation for the highest derivative. Thus, for an equation of ordern of the form dnx/dtn = f(x, x, · · · , t) we may introduce new variables

x1 = x, x2 = x, · · · , xn−1 = x(n−1)

and recast this equation in vector form, obtaining a system of differen-tial equations of first order:

d

dt(x, x1, · · · , xn−1) = v(x, x1, · · · , xn−1, t)

where v : Rn×R→ Rn is a vector function called the vector field. Incase of Newton’s equation, we get, with p = q

(q, p) =

(p,

1

mF (q)

).

The main question in systems of first-order ODE’s is to find func-tions t 7→ x(t) ∈ Rn such that x(t) = v(x(t), t). The autonomouscase occurs when the vector field v does not depend on time, i.e.,

9

10 NLD-draft

x

t

Figure 2.1. Solutions of a linear ODE for different ini-tial conditions. Case a > 0.

x(t) = v(x(t)). Such a curve t → x(t) is called an integral curveor trajectory of the vector field v (also called an orbit). We will seethat it is specified by one initial condition, x(0), being the solution ofa first-order equation. We can interpret the curve as showing how astate evolves in time.

2.1.1. Finding Differential Equations. Let us illustrate someways in which differential equations arise in different problems. Assumewe are given a family of curves. We want to find a differential equationwhich has this family as trajectories.

Example 2.1. We consider the family of curves in the plane

C1x+ (y − C2)2 = 0.

with y = y(x). C1 and C2 are two arbitrary constants specifying eachmember of the family. We differentiate twice and get:

C1 + 2(y − C2)y′ = 0

2(y′)2 + 2(y − C2)y′′ = 0.

Now we use the resulting equations to eliminate the constants. Thisgives C1 = −2(y − C2)y′ and hence, −2xy′(y − C2) + (y − C2)2 = 0.

The second equation yields y − C2 = − (y′)2

y′′. This leads to

y′ + 2xy′′ = 0.

2.1.2. Simple Examples. We first consider the linear equation

x(t) = ax(t)

with v(x) = ax. Here x ∈ R and a is a real constant, usually called aparameter of the equation.

2.1. PRELIMINARY CONSIDERATIONS 11

x

t

Figure 2.2. Solutions of a linear ODE for different ini-tial conditions. Case a < 0.

From the theory of differential equations we know that given aninitial condition x0 ∈ R, the function

x(t) = x0eat

is a solution. In this way we find solutions to any initial condition.These are the only solutions. For if y(t) is another solution with y(0) =x(0) = x0 then

d

dt

(y(t)e−at

)= y(t)e−at + y(t)

(−ae−at

)= ay(t)e−at − ay(t)e−at = 0.

Hence y(t)e−at is a constant (it has zero time derivative) and thereforey(t)e−at ≡ x0. Hence, y(t) = x0e

at. Thus, we see that the initialcondition uniquely defines the trajectory.

Let us study the nature of this solution in detail, for different valuesof a. Letting a > 0, the trajectories originating in the vicinity of 0 (forlarge negative times) tend to infinity for large positive times, exceptfor the solution x ≡ 0 (see Figure 2.1). For a < 0 the solutions ”comefrom infinity” (for large negative times) and tend to the zero solutionfor large positive times (see Figure 2.2). These two behaviours arequalitatively different. On the contrary, the behaviour among e.g.,all positive a is qualitatively similar. Indeed, defining a new variableτ = at, the behaviour for all positive values of a reduces to x(t) = x0e

τ .A similar property is observed among all negative values of a. Smallmodifications of a, that do not change its sign, yield essentially the samedynamics, the solution curves coincide exactly after rescaling time.

For a = 0 (see Figure 2.3) there is a third behaviour, qualitativelydifferent from the previous two. There is “no” dynamics, x(t) is con-stant. However, in this case any modification of a away from zero yields

12 NLD-draft

x

t

Figure 2.3. Solutions of a linear ODE for different ini-tial conditions. Case a = 0.

qualitatively different solutions. We say that the system is struc-turally unstable: any small change of the parameter a away fromzero can lead to one of the two previously classified types, and thethree types of solutions cannot be mapped to each other by rescalingthe coordinate or time. Note the substantial difference with the casesa < 0 or a > 0 where the type of solution is preserved under sufficientlysmall perturbations.

Next we consider the system of equations (vector equation)

x1(t) = a1x1(t)

x2(t) = a2x2(t).

These equations can be recasted as a linear equation in R2, withv(x) = Ax, x = (x1, x2) ∈ R2 and

A =

(a1 00 a2

), (2.1)

a diagonal 2× 2 matrix.Since both components are completely separated, the solutions are

the juxtaposition of the individual solutions, following the previousexample.

x1(t) = x1(0)ea1t x2(t) = x2(0)ea2t.

Again the solutions exist and are uniquely determined by their initialconditions (x1(0), x2(0)).

Graphical tools to describe dynamical system are widely availablein modern times, especially for 2-d systems, where relevant portionsof phase space can be thoroughly depicted on a piece of paper. Forexample, the vector field of a 2-d system can be depicted by considering


x2

x1

Figure 2.4. Graph representing the Vector field for eq.2.1, with a1 = −a2 = log 2.

a grid covering phase space and assigning to each point on the grid, asmall arrow representing the direction and length of the vector field.See Figure 2.4 for an example showing the linear vector field.

A useful tool in understanding dynamical systems is the phaseportrait, where some typical trajectories are illustrated. If the vectorfield is sufficiently smooth, trajectories are also smooth curves. SeeFigure 2.5. Since trajectories are tangent to the vector field, it is easyto recognise the relation between both graphs in 2-d systems.

Remark 2.1. The graphs of the trajectories, (x1(t), x2(t)), lie inR3 while the phase portrait lies in R2.

Remark 2.2. This system of equations can be regarded as a dy-namical system. The phase space is R2 and the time variable lies inR. Let x(0) = (x1(0), x2(0)) ∈ R2 be the state at time t = 0. Then wehave the unique solution x(t) ≡ x(t, x(0)). We can set

φt(x1(0), x2(0)) := (x1(t, x1(0)), x2(t, x2(0))) : R2 → R2.

This evolution law is called a flow. It is straightforward to check thatconditions (1.1) and (1.2) are satisfied.

Remark 2.3. For A =

(log 2 0

0 − log 2

)the solutions with non-

zero first coordinate, i.e., x1(0) 6= 0 are unbounded in the future (pos-itive times). Also, all solutions with non-zero second coordinate, i.e.,

14 NLD-draft

x2

x1

Figure 2.5. Phase portrait for eq. 2.1, with a1 =−a2 = log 2.

x2(0) 6= 0, are unbounded in negative time. We have exactly onebounded solution x(t) ≡ 0 for the initial condition x1(0) = x2(0) = 0.

If we consider the linear system with matrix −A the solution un-bounded in the future and in the past are exchanged, as compared withthe previous system and vice versa. Here we also have a qualitativelydifferent behaviour.

Remark 2.4. For the vector field A =

(0 −11 0

)we have that all

solutions are of the form

(x1(t, x1(0)), x2(t, x2(0)) = (x1(0) cos t−x2(0) sin t, x1(0) sin t+x2(0) cos t)

and hence are periodic (see Figure 2.6).The periodic case is similar to the case a = 0 in dimension one,

since solutions never become unbounded for large (positive or negative)times. Note however that in all examples so far, the solutions stayfinite after a finite amount of time has passed. This is not always thecase.

Example 2.2. Let v(x) = 1+x2. The integral curves have the formx(t) = tan(t + τ) for x(0) = tan τ . These solutions escape to infinityafter time π/2− τ (see Figure 2.7).

Remark 2.5. If A is not a diagonal matrix we can try to diagonaliseit by a linear transformation y = S−1x. If we succeed, this impliesx = Sy and x = Sy. Hence, Sy = ASy or y = S−1ASy. Since S−1ASis diagonal, the phase portrait, solutions, etc., in the new coordinatesystem y, are just that of a diagonal matrix (see Figure 2.8).


x2

x1

Figure 2.6. Linear ODE with periodic solutions.

x

−π2 τ π

2 t

Figure 2.7. Unbounded solutions diverging in finite time.

Remark 2.6. In our examples the solutions were uniquely deter-mined by their initial values. Through any point of phase space X = R2,there is a unique trajectory defining the evolution. This is a reasonableand necessary requirement to have a dynamical system. We will laterestablish conditions assuring this property.

Example 2.3. Let v(x) = 3x2/3. This system has the solutionx0(t) ≡ 0. However, for any τ ∈ R, x(t) = (t− τ)3 is also a solution,

16 NLD-draft

x2

x1

Figure 2.8. The new axes y are given by the dotted lines.

satisfying x(τ) = 0 (see Figure 2.9). Hence, there is more then onetrajectory passing through x = 0. We will soon see that the reason forthis is the non-differentiability of v at the point x = 0 (specifically, thefact that v(x) does not satisfy the Lipschitz condition at x = 0, seeSection 2.4).

x

t

Figure 2.9. When Lipschitz condition is not fulfilled,we may have non-unique solutions.

2.2. LINEAR AND NONLINEAR 17

2.2. Linear and Nonlinear

For the sake of dynamical systems theory, the autonomous linearcase is usually addressed in earlier, more elementary, mathematicalcourses. The general problem is of the type x ∈ Rn, A a real, n × n,square matrix and

x = Ax.

The case n = 1 is discussed in the first Calculus courses, and has thegeneral solution x(t) = eAtx0, as stated above. For arbitrary n thereexists a general procedure that, in principle, exhausts all details of theproblem.

1. Perform a linear change of coordinates rendering the matrix inits simplest form (call it B), this is called the Jordan canon-ical form of matrix A. In the simplest situation, B will be adiagonal matrix, but diagonalisation is not always possible fora general matrix A, see Appendix A.4 for the general case.

2. In the new coordinates y the solution has the general form

y(t) = eBty(0).

3. The computation of the matrix eBt falls in one of two cases:• If B is diagonal (i.e., all non–diagonal elements are zero),

then eBt is a diagonal matrix, with entries ebit, bi the cor-responding diagonal element of B. The problem reducesto n independent copies of the situation for n = 1. BeingA a real matrix, the bi’s may occur in complex-conjugatedpairs. Their treatment is formally the same as with realeigenvalues, only that complex exponentials occur, andpairs xi, xj of the original coordinate system will be cou-pled in the final solution together with trigonometric func-tions.• If B is not diagonal, i.e., A has a non–trivial Jordan canon-

ical form, then eBt will have, apart from (possibly complex)exponential factors, polynomials in t.

The nicest feature of linear systems is that the solutions are definedfor all times, and for all initial conditions and moreover, that apart fromthe more or less hard work required to compute the Jordan canonicalform, the solutions can in principle be completely described.

When the right hand side of a dynamical system is more compli-cated that just linear, these nice properties may fail partially or com-pletely. The theory of dynamical systems deals with establishing whatresults are possible to achieve and establishing the validity range ofsuch results (what we usually call assumptions). Along the way, manyof the results generally valid for linear systems will be repeatedly recov-ered (under specific validity conditions) and frequently used. Moreover,

18 NLD-draft

some associated linear systems will help us understanding the qualita-tive behaviour of nonlinear systems.

2.3. Exercises

Exercise 2.1. Explain how a higher–order ODE can be trans-formed into a first order ODE (Section 2.1).

Exercise 2.2. The harmonic oscillator.

(a) Recast the equation x + w02x = 0 (Newton’s equation for a

harmonic oscillator) as a dynamical system. Solve the systemfor arbitrary initial conditions.

(b) Show that any harmonic oscillator may be recasted as the samedynamical system through an adequate change of coordinates(possibly including a rescaling in time).

(c) Try similar steps for x+ βx+w02x = 0, the damped harmonic

oscillator.(d) Hooke’s law for elastic forces reads mx = −kx, where m is the

mass of a point particle, k the elastic constant (both positive)and x the elastic elongation. Recast Hooke’s law using the pre-vious steps. What is w0?

Exercise 2.3. Solve the linear system x = Ax, x ∈ R2, x0 = (3, 4),for the following cases (cf Remark 2.5):

A1 =

(−2 1

1 −2

); A2 =

(1 10 1

); A3 =

(0 20 1

).

2.4. Fundamental Theory

In this Section we want to investigate the connection between or-dinary differential equations and dynamical systems.

First we observe that any flow of a dynamical system in continuoustime φt = φ(·, t), t ∈ R defines a differential equation which is satisfiedby the trajectories t→ φ(y, t), y ∈ Rn. Namely, we set

v(x) =d

dtφ(x, t)

∣∣∣t=0

and get x = v(x); x(0) = y.The main question is under what conditions a differential equation

defines a dynamical system. We impose two conditions that are naturalfrom the applications point of view:

1. For any initial value there must be a solution.2. The solution must be unique (otherwise the evolution law would

be not uniquely specified). Hence, situations as those in Fig-ure 2.10 are forbidden.

2.4. FUNDAMENTAL THEORY 19

x

x

x

Figure 2.10. Unacceptable situations for a dynamical system.

Unicity of solutions is coupled to the concept of determinism.The first condition is ensured by continuity. This is the content of

Theorem 2.1. However, it is clear from Example 2.3 that a continuousvector field is not enough to ensure uniqueness of the solutions. Forthat, it will be necessary to demand an additional restriction.

Theorem 2.1 (Peano). Let v : Rn × R → R be continuous in aneighbourhood U×T of (x0, t0). Then there is a time interval [t0, t0 +c]such that a solution for the equation x(t) = v(x(t), t); x(t0) = x0 existsfor t ∈ [t0, t0 + c].

We will prove instead a more general theorem covering both exis-tence and uniqueness, in the more general setup of a non–autonomouscase (for the autonomous case we just skip the dependence on time ofthe vector field). Before doing so we reformulate the (non-autonomous)differential equation

x(t) = v(x(t), t); x(t0) = x0. (2.2)

By integrating with respect to t we get

x(t) = x0 +

∫ t

t0

v(x(s), s) ds. (2.3)

So any solution of (2.2) satisfies (2.3). On the other hand if v is con-tinuous we are allowed to differentiate (2.3) and therefore any solutionof (2.3) satisfies (2.2). Hence, these two equations are equivalent when-ever v is continuous. Unless otherwise stated, we will assume by defaultthat the vector field v is continuous.

Theorem 2.2 (Picard–Lindeof). Let U ∈ Rn, T ∈ R. Let v be acontinuous vector field satisfying the Lipschitz condition,

|v(x1, t)− v(x2, t)| ≤ L|x1 − x2|

on U × T . Then for any (x0, t0) ∈ U × T there are neighbourhoodsU1 × T1 ∈ U × T such that (2.2) has exactly one solution in U1 × T1.

Proof. We are going to use the Banach Fixed Point Theorem (seeAppendix A.3.1). Note that the distance |x1−x2| is a distance betweenfunctions.

20 NLD-draft

For an arbitrary point (x0, t0) in the Lipschitz domain, we considerthe map A = Ax0,t0 : C(T, U)→ C(T, U) defined by

A(x(t)) := x0 +

∫ t

t0

v(x(s), s) ds.

C(T, U) denotes the space of continuous functions on U × T . Themap A transforms continuous functions into continuous functions onthe same domain. By comparing with (2.3) we notice that any fixedpoint A(x(t)) = x(t) of this map is a solution of (2.2) and moreover allsolutions have this form. If we find a suitable complete metric spaceof functions where A is a contraction, then by Banach’s fixed pointtheorem there exists a unique fixed point for A, and hence a uniquesolution to the differential equation.

We choose (x0, t0) ∈ U1 × T0 ⊂ U × T such that diamU1 = a andsupU1×T0 |v| < K for some a,K (U1 × T0 must be chosen differentlyfrom U × T in case this set is not compact) . Now let t0 ∈ T1 ⊂ T0 besuch that diamT1 = b < a/K. Then any curve y(t) with y(t0) ∈ U1

and |y(t)| ≤ K satisfies

|y(t)− y(t0)| ≤ K|t− t0| < Kb < a.

Hence y(t) ∈ 2U1 for all t ∈ T1 (see Figure 2.11).

y(t0)1

K

y(t)

Figure 2.11. The trajectory y(t) and its bounds.

We define X to be the space of all continuous functions x : T1 →2U1, x(t0) = x0 ∈ U1 with the metric (distance)

d(x1(t), x2(t)) := ‖x1(t)− x2(t)‖∞ := supt∈T1|x1(t)− x2(t)|.

With this choice of distance, X is a complete metric space (see Appen-dix A.1). We need to prove that A maps X into itself and that it is acontraction.

2.4. FUNDAMENTAL THEORY 21

We observe that A(x(t))(t0) = x0 ∈ U1 and

|A(x(t))− A(x(t0))| ≤∣∣∣∣∫ t

t0

|v| ds∣∣∣∣ ≤ K|t− t0| < a.

This implies A(x(t))(τ) ∈ 2U1 for τ ∈ T1 and hence AX ⊂ X.Moreover,

|A(x1(t))− A(x2(t))| ≤∣∣∣∣∫ t

t0

|v(x1(s), s)− v(x2(s), s)| ds∣∣∣∣

≤ L

∣∣∣∣∫ t

t0

|x1(s)− x2(s)| ds∣∣∣∣

≤ L|t− t0| · ‖x1 − x2‖∞ ≤ Lb‖x1 − x2‖∞.

Since the right hand side is independent of t, the estimate holds alsofor ‖A(x1(t))−A(x2(t))‖∞. By making T1 smaller if necessary we mayassume that Lb < 1. Hence, A is a contraction on a complete metricspace and the Banach fixed point theorem concludes the proof.

Remark 2.7. The map A = Ax0,t0 depends continuously on x0, t0.By using the Banach fixed point theorem again we can show that thesolutions depend Lipschitz-continuously on the initial values. We willsee later that we actually have differentiable dependence.

The Banach fixed point theorem provides a recursive approximationto the exact solution.

Lemma 2.1. Let x0(t) ≡ x0 and for n ≥ 0 set xn+1(t) := A(xn(t)).Then

‖xn+1(t)− xn(t)‖∞ ≤KLnbn+1

(n+ 1)!.

Proof. We prove it by induction (see Appendix A.1.4 for a com-ment on the Induction Principle). The statement holds for n = 0 withthe estimates of the previous Theorem. For a general n the statementimplies

|xn+1(t)− xn(t)| ≤ KLn|t− t0|n+1

(n+ 1)!.

22 NLD-draft

Then, using the same estimates of the Theorem and computing theintegral in the last step, we have

|xn+2(t)− xn+1(t)| ≤∣∣∣∣∫ t

t0

|v(xn+1(s), s)− v(xn(s), s)| ds∣∣∣∣

≤ L

∣∣∣∣∫ t

t0

|xn+1(s)− xn(s)| ds∣∣∣∣

≤ KLn+1

(n+ 1)!

∣∣∣∣∫ t

t0

|s− t0|n+1 ds

∣∣∣∣≤ KLn+1

(n+ 2)!|t− t0|n+2 ≤ KLn+1bn+2

(n+ 2)!.

As an illustration, let us apply this Lemma to the class of linearsystems x = Bx, x(0) = x0. We know already that the (unique)solution is a time-exponential function. Following the Lemma,

x0(t) ≡ x0

x1(t) = x0 +

∫ t

0

Bx0 ds = x0 + tBx0

x2(t) = x0 +

∫ t

0

(x0 + sBx0) ds = x0 + tBx0 +t2

2B2x0

· · ·

xn(t) = x0 +

∫ t

0

B

((n−1∑j=0

sj

j!Bj

)x0

)ds

= x0 +n−1∑j=0

(∫ t

0

sj

j!ds ·Bj+1x0

)=

(n∑j=0

tj

j!Bj

)x0.

Hence,

x(t) =

(∞∑j=0

tj

j!Bj

)x0 = etBx0.

2.5. Exercises

Exercise 2.4. Sketch the graphs of the integral curves of the vectorfield v(x, t) = x−3t

t+3x(Section 2.1).

Exercise 2.5. Sketch the graphs of the integral curves of the vectorfield v(x, t) = t− ex (Section 2.1).

Exercise 2.6. Find defining ODE’s for the families of curves

a) x2 + Cy2 = 2yb) y2 + Cx = x3.

(Section 2.1.1).

2.6. DEPENDENCE ON INITIAL CONDITIONS 23

Exercise 2.7. Let v : R → R be Lipschitz-continuous. Prove thatany integral curve x(t) of x(t) = v(x(t)) is a monotonic function. Hint:Show that to the sets v > 0, v = 0 and v < 0 there corresponddifferent integral curves (Section 2.1.2).

Exercise 2.8. Let A be a n × n diagonal matrix with non–zeroentries on the diagonal. Prove that the space of all solutions of x = Axis an n–dimensional linear space (Section 2.1.2).

Exercise 2.9. Show that the solution x(x0, t) of a differential equa-tion as given by Picard–Lindelof theorem is Lipschitz continuous on x0

(Section 2.4).

Exercise 2.10. Construct the first three approximations of the so-lutions of

a) x(t) = t− x2; x(0) = 0b) x(t) = x2 + 3t− 1; x(1) = 1

(Section 2.4).

Exercise 2.11. Find the Taylor expansion of sin t with the help ofthe theorem of Picard–Lindelof applied to

x = −x x(0) = 0; x(0) = 1.

(Section 2.4).

2.6. Dependence on Initial Conditions

Now that we established the existence and uniqueness of the so-lutions for a dynamical system with Lipschitz right hand side, let usfurther explore the properties of such a solution. In many, if not most,models for applications in Sciences and Engineering, the right hand sidev(x, t) of the equations is not only Lipschitz but also differentiable, oreven C∞ (infinitely differentiable). We state therefore the following

Assumption: For the function v : U × T ⊂ Rn × R → Rn, weassume that there exists a continuous derivative (matrix!)

Dxv : U × T → Rn×n(R) (x, t)→(∂vi∂xj

).

Let us show that under this assumption the solutions x(x0, t) to

x(t) = v(x(t), t); x(t0) = x0

depend differentiably on x0, i.e., that the derivative Dx0x(x0, t) exists(see Figure 2.12). Let us gain some intuition around the problem. Ifthis derivative exists, how would it change in time? Does it fulfill somedifferential equation? We proceed,

24 NLD-draft

∂

∂t(Dx0x(x0, t)) = Dx0

(∂

∂tx(x0, t)

)= Dx0 (v(x(x0, t), t))

= Dxv(x, t) ·Dx0x(x0, t).

with initial condition

Dx0x(x0, t0) = Id or x(x0, t0) = x0.

x(x0 + h, t0) x(x0 + h, t)

∆x(x0, t, h)

x(x0, t0)

x(x0, t)

Figure 2.12. Modifications of the solution x(t) whenchanging the initial condition x0.

We can write this equation as the matrix differential equation

B(t) = Dxv(x(t), t) ·B(t) B(t0) = Id (2.4)

where B(t) is the n×n matrix Dx0x(x0, t). This is a linear equation andhence somewhat simpler to address. To make sense of this expressionwe must assure that such a matrix B(t) exists, i.e., that x(x0, t) isactually continuously differentiable with respect to x0.

Theorem 2.3. Under the previous assumption the derivative Dx0x(x0, t)exists and fulfills the variational equation (2.4)

Proof. Note first that we assume that v is derivable, but we wantto prove that also x is derivable with respect to x0. Fix U1, T1 asin Picard–Lindelof Theorem. Let x0 ∈ U1, t0 ∈ T1, x0 + h ∈ U1. Weconsider (see Figure 2.12)

∆x(x0, t, h) := x(x0 + h, t)− x(x0, t)

and define

A(t, h) :=

∫ 1

0

Dxv (x(x0, t) + s∆x(x0, t, h), t) ds.

2.6. DEPENDENCE ON INITIAL CONDITIONS 25

Hence, A(t, h) is an n×n matrix since it is the integral of the n×n ma-trix Dxv. Moreover, A(t, 0) = Dxv(x(t), t). Since the solutions x(·, t)are differentiable functions of time, we can compute the derivative,

d

dt(∆x(x0, t, h)) =

d

dt(x(x0 + h, t)− x(x0, t))

= v(x(x0 + h, t), t)− v(x(x0, t), t)

=

∫ 1

0

d

dsv (x(x0, t) + s∆x(x0, t, h), t) ds

=

∫ 1

0

Dxv (x(x0, t) + s∆x(x0, t, h), t) ds ·∆x(x0, h, t)

= A(t, h) ·∆(x0, t, h).

The first integral holds because of Leibniz’ rule of differentiation underthe integral sign, while the second integral holds since the dependenceof v on s occurs via x. Hence,

∆x(x0, t, h) = A(t, h) ·∆(x0, t, h) ∆x(x0, t0, h) = h

where A(t, h) is a matrix (linear operator). Moreover, A(t, h) is Lips-chitz continuous in h. Hence, for fixed h there exists a unique solutionto the matrix differential equation

M(t, h) = A(t, h)M(t, h) M(t0, h) = Id

Multiplying by h we have that the vector y(t, h) = M(t, h) · h satisfiesthe equation

y(t, h) = A(t, h)y(t, h); y(t0, h) = h.

Therefore∆x(x0, t, h) = M(t, h) · h

since both functions are solutions to the same equation with the sameinitial condition. Since M(t, h) is Lipschitz continuous in h, we havethat

lim|h|→0

∆x(x0, t, h)

|h|= lim|h|→0

M(t, h) · h|h|

= M(t, 0)

exists. This limit is Dx0x(x0, t) and satisfies the the differential equa-tion (2.4).

Corollary 2.1. If all the derivatives of the vector field up to orderk exist and are continuous then the solutions are k–times differentiablein the initial conditions.

Proof. We only have to consider the new equation

x = v(x, t)

z = Dxv(x, t) · zwhich satisfies the conditions of the previous theorem. The completeargument uses induction: If v is k times differentiable, then Dxv is

26 NLD-draft

k− 1 times differentiable. But then so is z. Hence z is (k− 1) + 1 = ktimes differentiable.

Assumption: Unless otherwise stated we will assume in what fol-lows that the vector field v is a Ck function, k ≥ 1.

CHAPTER 3

Advanced Ordinary Differential Equations

At this point we know that under suitable conditions a dynamicalsystem has a unique solution which is differentiable in time and ininitial conditions. In this chapter we will start our analysis of localproperties of the set of solutions. We will see how many classes ofsolutions we may encounter, further we will realise that there are someregions of phase space where “nothing happens”, i.e., the dynamicslooks like a transport in time without significant features. Further, wewill describe some topological (structural) features of phase space thatare helpful to identify the special sets where the dynamics does haverelevant, changing features. We end this chapter by studying propertiesof 2-dimensional dynamical systems.

3.1. The local flow

We start by noting that there is a way to unify the study of au-tonomous and non–autonomous situations.

Proposition 3.1. Any dynamical system can be recasted as anautonomous system on a larger phase space.

To achieve this we can introduce time as a new coordinate s, wheres(t) = t. Phase space is now described with coordinates (x, s). Thenew problem reads:

x = v(x, s)s = 1,

with initial conditions x(t0) = x0, s(t0) = t0, or

y = v(y)

where v = (v, 1) ∈ Rn × R = Rn+1 and y = (x, s) ∈ Rn+1. We willfocus in the sequel on autonomous systems, without loss of generality.Explicit time–dependence will be restored whenever practical.

Remark 3.1. For autonomous systems, if x(t) is a solution, so isx(t+ s):

x(t+ s) = v(x(t+ s)); x((t0 − s) + s) = x0.

Hence, we can fix the initial condition at time t0 = 0.

27

28 NLD-draft

Theorem 3.1. Let xi(t); i = 1, 2 be two solutions defined on opentime intervals Ti. Assume that their initial values coincide x1(0) =x2(0). Then x1 ≡ x2 on T1 ∩ T2.

Proof. Let b = sups ≥ 0 : x1(t) = x2(t); 0 ≤ t < s. The valuesof t where it makes sense to compare x1 and x2 belong to T1∩T2, whichis open. Hence, b ∈ T1 ∩ T2. If b ∈ T1 ∩ T2, then by continuity of thesolutions x1(b) = x2(b) (since both solutions exist for t = b). Then, bythe Picard–Lindelof theorem we can extend each solution in an uniqueway from the initial condition xi(b) for some time-interval [0, ε), ε > 0.This means that x1(b + t) = x2(b + t) for t ∈ [0, ε), thus contradictingthe definition of b. Hence, the solutions coincide on all of T1 ∩ T2.

Remark 3.2. We can also define the solution on T1 ∪ T2 as longas T1 ∩ T2 6= ∅ in a unique way:

x(t) =

x1(t) t ∈ T1

x2(t) t ∈ T2

The previous theorem guarantees that this will be the only solution onT1 ∪ T2.

We may try to extend the solution as much as possible, by splicingtogether the orbits defined on intersecting domains. This leads to thenotion of maximal integral curve. Let the solutions of a dynamicalsystem be defined on some intervals Tλ; λ ∈ Λ, all containing the pointx0:

xλ : Tλ → Rn xλ(0) = x0

We note that xλ itself may be constructed as before by gluing to-gether (smaller) time intervals Ti (this might be necessary to get timet = 0 ∈ Tλ).

Definition 3.1 (Maximal Integral Curve). We define the maximalintegral curve x(x0, t) on Tx0 = ∪λ∈ΛTλ via x(x0, t)|Tλ = xλ.

By the Picard–Lindelof theorem we have that the maximal time in-terval Tx0 is open (this was perhaps noticeable already in Theorem 3.1:if b were an interior point of the maximal time interval, we could findan ε > 0 such that we have a unique solution up to time t+ ε).

Definition 3.2 (Local Flow). The local flow is defined on A =(x0, t) ⊂ U × T : t ∈ Tx0 by

Φ(x0, t) = φt(x0) = x(x0, t)

where x(x0, t) = v(x(x0, t)) and x(x0, 0) = x0.

We summarise these findings in the following

3.2. EXERCISES 29

Theorem 3.2 (Properties of Local Flow). For the local flow,

1. A is open and the local flow is Ck.2. A ∩ (x0 × R) = Tx0 3 0.3. Φ (Φ(x0, t), s) = Φ(x0, s+ t) and Φ(x0, 0) = x0.

Proof. The only new point is that A is open. But by the theoremof Picard–Lindelof and the dependence on initial conditions we canextend the solutions at any time and at any place into small timeand space neighbourhoods. So the set A must be open.

Remark 3.3. We note that there exist several notations indicatingvery similar or the same objects. For example, the expressions Φ(x0, t),φt(x0), φ(x0, t) and x(x0, t) all denote a certain point of phase space,specifically that where the trajectory starting at x0 arrives after a timet. Although the first expression is associated to the concept of flow andthe last expression is associated to the concept of trajectory, the middleexpressions, with φ, may denote both. We will follow this tradition aslong as no ambiguity arises.

Remark 3.4. The phase space Rn is partitioned into disjoint orbitsx(x0, t) : t ∈ Tx0 (also denoted φt(x0)). If two orbits have a pointin common, i.e., x(x0, r) = x(x1, s) = y then they belong to the uniquesolution x(y, t) = x(x0, t + r) = x(x1, t + s). In short: Distinct orbitsnever cross.

Corollary 3.1. Let x : Tx0 = (a, b) → Rn be a maximal solutionwith b < ∞ and K ⊂ U ⊂ Rn a compact set. Then there is a timeτ ∈ (a, b) such that x(x0, t) 6∈ K for all t ∈ (τ, b). In other words, ifa maximal integral curve stays in a compact region it is defined for alltimes.

Proof. Assume the statement is false, i.e., that the maximal so-lution remains within K for all times in (a, b). In such a case we willarrive to a contradiction showing that the solution could be extendedbeyond b.

Since A is open it contains the set K× (−ε, ε) for some ε > 0 (sinceK is compact!) as a neighborhood of K × 0 (any neighborhood ofa compact set contains an ε-neighborhood – take the cover of K byballs B(x, ε) ⊂ A and choose a finite sub-cover). Let t be such thatb − t < ε. If x(x0, t) ∈ K then the solution could be extended up totime t+ ε > b, a contradiction. Hence, x(x0, t) 6∈ K.

3.2. Exercises

Exercise 3.1. Solve the following equations with separable vari-ables

a) xt dt+ (t+ 1) dx = 0.

30 NLD-draft

b) (t2 − 1)x+ 2tx2 = 0.

Use your previous knowledge from elementary Calculus courses.

Exercise 3.2. Solve the equation

x = x+ 2t− 3,

by applying a suitable linear transformation z = at+ bx.

Exercise 3.3. Solve the equation

(t+ 2x) dt− t dx = 0,

by applying the transformation x = ωt.

Exercise 3.4. Consider the parameter dependent equation

x = x2 + 2µt−1 x(1) = −1.

We are looking for an approximation of the solution of the form

x(t) = w0(t) + µw1(t) + µ2w2(t).

Why does such an approximation exist? Find the functions wi(t) byplugging them into the equation and comparing the coefficients of thepowers of µ.

Exercise 3.5. Let v be a Ck–vector field in R2. Assume that thecircle S1 = x ∈ R2 : ‖x‖ = 1 is a periodic trajectory.

a) How long does a trajectory through a point x0 with ‖x0‖ < 1exist?

b) Is there a stationary solution?

3.3. The Straightening–out Theorem

In this section we will see that the dynamics generated by x = v(x)close to a regular point is very simple. This means that, from the localpoint of view, only singular points are of interest.

Definition 3.3. A point x ∈ Rn is called regular point (for thevector field v) if v(x) 6= 0. Otherwise it is called singular, stationaryor equilibrium point.

Theorem 3.3. Let v ∈ Ck and x0 a regular point, i.e., v(x0) 6= 0.Then there is a Ck–coordinate change h : V → Rn in a neighborhoodV of x0 with

h∗v = en =

0...01

.

Remark 3.5. The notation h∗v indicates the map induced on thevector field v by the coordinate change h. Formally, if x = v andz = h(x), we have that z = (dz/dx) · x = Dh v, i.e., the vector field istransformed through the matrix Dhij = ∂hi/∂xj.

3.3. THE STRAIGHTENING–OUT THEOREM 31

Remark 3.6. This theorem can be interpreted as follows. After achange of coordinates, i.e., in the new coordinates, the vector field isvery simple. Its solutions are “horizontal” straight lines. This meansthat in a small neighborhood of x nothing interesting can happen (seeFigure 3.1), the dynamics is just monotonic evolution in time alongparallel flow lines.

xx

h

Figure 3.1. A change of coordinates h mapping theflow near a regular point to a standard flow.

Proof. The proof proceeds by showing that such a change of co-ordinates exists. Let1 v(x) = (v1(x), · · · , vn(x))T . Without loss of gen-erality, we may assume vn(x0) 6= 0 (some component has to be nonzerosince x0 is a regular point). Let L = y = (y1, · · · , yn)T ∈ Rn : yn = 0and S = (L + x0) ∩ U where U is chosen that vn|U 6= 0. This can bedone since by continuity of v there is a neighborhood of x0 such thatthe vector field does not vanish, see Figure 3.2.

S is a portion of a hyperplane, passing through x0 and perpendicularto the vector (0, · · · , 0, vn(x0))T . For sufficiently short time intervals|t| < ε, any initial condition x0 + y ∈ S remains within U . The set ofall trajectories of the dynamical system with initial condition x(0) =x0+y ∈ S and extending for the whole time-interval −ε < t < ε, defineshence a small open subset V ⊂ U containing S. The coordinates (y, t),with y ∈ L and −ε < t < ε describe a small cube B in Rn aroundthe origin (with Cartesian, orthonormal coordinates). After havingevolved for a time t, the trajectory starting at x0 + y arrives to thepoint x(x0 + y, t) ∈ V . We give to this point the new coordinates(y, t)T ∈ Rn (we identify t with the last coordinate of Rn = L × R).

1Given a basis, we usually prefer to denote vectors by n × 1 matrices, i.e.,column matrices.

32 NLD-draft

L

yn

U

x0

S = L+ x0

Figure 3.2. The procedure in the Straightening–out Theorem.

Hence, the change of coordinates h : V → B ⊂ Rn reads:(yt

)= h(x(x0 + y, t)) ∈ Rn.

In other words, we use the trajectories as new coordinate lines. Bythe Theorem of Picard and Lindelof, this map should be well definedand 1–to–1. Let us verify this by computing the determinant of itsinverse map at some point in B. We use hence Dh−1, the matrix offirst derivatives of h−1 with respect to each coordinate. We have, forx0 + y ∈ S,

D(h−1(y, t)) =

(∂

∂yx(x0 + y, t), x(x0 + y, t)

)where the last column of this matrix is the original vector field v, andthe first n−1 columns are the derivatives of the trajectories with respectto the initial condition. For t = 0 the trajectories lie at their initialcondition, hence h−1(y, 0) = x0 + y and the derivatives with respect toeach yi are either 0 or 1. Hence,

Dh−1(y, 0) =

1 · · · 0 v1(x0 + y)...

. . ....

...0 · · · 1 vn−1(x0 + y)0 · · · 0 vn(x0 + y)

with non–zero determinant at y = 0 (as expected) because vn(x0) 6= 0.The determinant is non–zero in all of B by continuity. Moreover, fort 6= 0 the last column of this matrix is still v. The vector field in

3.4. DIFFERENTIAL INEQUALITIES 33

the new coordinates can either be computed directly or transformingv with h. Since

v(x(x0 + y, t)) = Dh−1(y, t) · h∗v,the vector field in the new coordinates is just the unit Cartesian vectorh∗v = (0, · · · , 0, 1)T = en (the equation has unique solution since Dh−1

is nonsingular). Hence, the transformation maps the vector field intothe straight lines en + y parallel to en through L.

3.4. Differential inequalities

In this section we want to show how a vector field dominates dif-ferentiable functions. We can apply this result to prove the existenceof trajectories in a given time interval.

Theorem 3.4. Let v fulfill the assumptions of the theorem of Picard–Lindelof and x(t) be a solution of x(t) = v(x(t), t) for t ∈ R. Lety : R→ R be a differentiable function and t0 ∈ R with

y(t) ≤ v((y(t), t) and y(t0) ≤ x(t0).

Theny(t) ≤ x(t), t ≥ t0.

Proof. We will prove the situation in case the inequalities arestrict. The general situation is only slightly more difficult.

Letb = sups : y(t) ≤ x(t), t0 ≤ t ≤ s.

By continuity x(b) = y(b) if b <∞. Hence,

y(b)− x(b) < v(y(b), b)− v(x(b), b) = 0.

This implies y(b+ t) < x(b+ t) for small t since the derivative of y− xat b is negative. This contradicts the definition of b.

Example 3.1. Let v(x, t) = A(x, t) ·x with A(x, t) : Rn×R→ Rn×n

be a continuous matrix function with operator norm (see Appendix A.5)

‖A(x, t)‖ := sup‖x‖=1

‖A(x, t) · x‖ < a.

We want to investigate the lifetime of the trajectories starting at x(t0) =x0. We have

d

dt‖x(x0, t)‖2 =

d

dt<x(x0, t), x(x0, t)>= 2 <x(x0, t), x(x0, t)>

= 2 <x(x0, t), A(x(x0, t), t)x(x0, t)>

≤ 2a <x(x0, t), x(x0, t)>

= 2a‖x(x0, t)‖2.

Now we consider the differential equation in Rz = 2az

34 NLD-draft

with solutionz(t) = z(0)e2at.

The previous theorem applied to y(t) = ‖x(x0, t)‖ tells us that

‖x(x0, t)‖2 ≤ ‖x0‖2e2at.

This means that the solutions are finite at any time and hence liveforever by Corollary 3.1.

3.5. Types of flow–lines

In this section we will study the different ways a trajectory maylook like. There are not too many possibilities.

Theorem 3.5. Let x(t) be a maximal integral curve of the systemx = v(x) with Lipschitz continuous right hand side, and let T be themaximal time interval the trajectory lives. Then one of the followingholds (see Figure 3.3):

1. x(t) ≡ const and T = R, i.e., the trajectory is defined for alltimes,

2. x(t) is injective and regular; i.e., x(t) 6= 0 for t ∈ T , or3. x(t) is regular periodic; i.e., T = R and there is a minimal

positive τ with

x(t) = x(t+ τ) for all t ∈ R.

τ is called the period.

Proof. Let x(s) = v(x(s)) = 0 for some s ∈ T . Then x(t) ≡ x(s)is the unique solution for all t ∈ R with this initial condition x(s), bythe theorem of Picard and Lindelof. This is case 1.

Now let x(t) 6= 0 for all t ∈ T . Then either x(t) is injective (andwe are in case 2) or there are r < s ∈ T such that x(r) = x(s). Letτ = s− r. This yields x(r) = x(r + τ). We will see that the period isthe same along the whole trajectory. Indeed, the integral curves

x1(t) = x(t+ r) and x2(t) = x(t+ r + τ)

have the same initial condition xi(0) = x(r) = x(s). By uniqueness weget

x(t+ r) = x1(t) = x2(t) = x(t+ r + τ) for all t ∈ T .Hence, x(t) = x(t + τ); x(0) = x(τ) (“set” t = t + r). Then x(T ) ⊂x([0, τ ]), since the values repeat afterwards. This is a compact set (asthe image of a compact set under a continuous map), implying thatT = R by corollary 3.1.

We are left to show that there is a least positive period. Let

G = τ ∈ R+ : x(0) = x(τ).G 6= R+, otherwise we are in case 1 since then the trajectory would beidentically constant. Let us show that G has a least positive element.

3.5. TYPES OF FLOW–LINES 35

Since x(t) is not constant, for some t0 > 0 we have x(t0) 6= x(0). Bycontinuity there exists ε > 0 such that x(t0 + t) 6= x(0) for |t| < ε

2.

Hence, if g ∈ G, then g ≥ ε (i.e., G has a positive infimum). In otherwords, the period cannot be less than ε, since we have produced anε–interval where x(t) is different from x(0). Using the same argument,two different periods g1, g2 ∈ G fulfill |g2− g1| ≥ ε. Since τ is a period,we have ε ≤ τ . The intersection G ∩ [ε, τ ] is nonempty and finite.Hence, there exists a least element τ0 for G. Moreover, the elements ofG are of the form τ0 · N.

case 1

case 2

case 3

.

Figure 3.3. Types of flow lines

Remark 3.7. Since the set of periods (including positive and neg-ative periods) is a proper closed additive subgroup of the reals, anotherproof of the Theorem can be based on the following algebraic result: Anyproper subgroup of the reals is a lattice (see Theorem A.15), i.e., it isequal to a positive constant τ0 (the least period) times Z.

Example 3.2. We consider a special flow on

S1 × S1 = (z1, z2), zi ∈ C : |zi| = 1, i = 1, 2 = T2

given byφt(z1, z2) =

(e2πitz1, e

2πiatz2

).

We consider the integral curve through the point (1, 1); i.e., the curve

φt(1, 1) =(e2πit, e2πiat

).

We have two different cases

36 NLD-draft

a) a ∈ Q; a = p/q. This implies for time t = q

φq(1, 1) =(e2πiq, e2πip

)= (1, 1)

and we are in case 3 of Theorem 3.5 (see Figure 3.4).

Figure 3.4. A periodic orbit on the surface of a torus.

b) a 6∈ Q. Then the equation

φt(1, 1) = (1, 1)

has the only solution t = 0, since e2πit = 1 implies t ∈ Zand e2πiat = 1 implies ta = m ∈ Z But then we would havea = m

t∈ Q, which is not the case. So we are in case 2 of

Theorem 3.5 (see Figure 3.5). We see also that in this Example,case 1 of the Theorem is never attained.

Figure 3.5. Part of a non–periodic trajectory.

In the case a 6∈ Q we can say even more. Let the firstcoordinate be arbitrary but fixed, i.e., e2πib. Then the trajectoryintersects the circle (e2πib, z2) with |z2| = 1, i.e., φt(1, 1) =(e2πib, z2(t)

)if and only if t = b + k; k ∈ Z. In this case

3.6. MORE TOPOLOGICAL PROPERTIES 37

φb+k(1, 1) =(e2πib, e2πi(ab+ka)

). Hence, the trajectory of (1, 1),

regarded at fixed integer times starting at t = b, wanders aroundthe circle (e2πib, z2) in some way.

Let now the second coordinate z2 = e2πic be given. Will thetrajectory come any close to z2 ? By Kronecker’s theorem A.18the sequence ka mod 1 is dense in [0, 1) and hence the points(e2πib, e2πi(ab+ka)

)k∈N approximate the point

(e2πib, e2πic

)and any

arbitrary neighbourhood of z2 will contain be points of the se-quence. Hence, the orbit φt(1, 1) is dense, provided a is irra-tional (recall that z2 was given but arbitrary). The same holdsfor an arbitrary trajectory φt(z1, z2).

There is another method to investigate the flow of the previousExample. We fix the first circle at e.g., b = 0. We are left with a circleon the second coordinate, the set (1, z2) : |z2| = 1 of phase space. Weare looking for the first intersection of the trajectory φt(1, z) : t > 0with the circle (1, z2) : |z2| = 1. This intersection occurs for t = 1and subsequently for all integer times. This defines a map from thecircle to itself denoted by f : S1 → S1 in the following way:

f(z) =<(0, 1), φ1(1, z)>= e2πiaz,

Where <,> is a scalar product, in this case the projection onto thesecond component of the image φ1(1, z) of z. Let now z = e2πiα. Hence,

f(e2πiα) = e2πiae2πiα = e2πi(α+a).

This map is called the Poincare map of the flow (with respect to thecontrol section given by the circle). It contains all important informa-tion about this flow. It has the following properties:

1. a ∈ Q: Every orbit is periodic and the phase space decomposesinto an uncountable number of periodic (invariant) trajectories.

2. a 6∈ Q: Every orbit is dense.

In a different notation, the Poincare map corresponds to the map

f(α) = α + a mod 1,

defined on the unit interval [0, 1].

3.6. More Topological Properties

We can now introduce more fundamental concepts from DynamicalSystems theory.

Definition 3.4. A non–empty open set D in phase space is calledinvariant if for all t ∈ R, φt(D) = D ≡ φ0(D). If this holds for positivetimes, the set is called positively invariant.

The most natural invariants are trajectories, of all three kinds de-scribed by the previous Theorem.

38 NLD-draft

Definition 3.5. A system is called (topologically) transitiveif there exists a dense orbit. In this case the phase space cannot bedecomposed into closed invariant sets.

Proposition 3.2. A system f : X → X on a compact metric spaceX is transitive if and only if for any pair of non–empty open sets U, Vthere is a number N = N(U, V ) such that

φN(U) ∩ V 6= ∅.

Proof. First we assume f is transitive. Let U, V be given. Letxn = φn(x0) be a dense orbit. Then there is n1 < n2 ∈ R such thatφn1(x0) ∈ U and φn2(x0) ∈ V and hence φn2(x0) ∈ φn2−n1(U) ∩ V(the intersection is non–empty). This proves the “only if” part of thestatement.

Now we assume that part and we will prove that f is transitive (the“if” part). Let xl be a countable dense subset of X. Consider thecountable family of open balls (U 1

m(xl))m,l of radius 1/m around each

element of xl. We can number all these sets into the sequence (Un).Let V1 = U1. By the assumption, there is a n2 such that φn2(U2)∩V1 =V2 6= ∅. Using V2 and U3 we can define in the same way a set V3. Wecan continue this process using the induction principle and define thefamily V of sets Vn, n ∈ N via

Vk+1 = φnk+1(Uk+1) ∩ Vk 6= ∅.

Then Vk+1 ⊂ Vk and there is a point x ∈⋂n Vn. The orbit of this point

meets all the open sets Vn and hence is dense.

Definition 3.6. A system is called minimal if all orbits are dense.In this case the phase space does not contain any proper closed invariantset.

The next theorem shows that every system contains a minimal sub-system. Its proof uses elements of set theory glimpsed at in the Ap-pendix.

Theorem 3.6. Let f : X → X be a continuous function governing adiscrete dynamical system on a compact metric space. Then it containsa minimal invariant subset.

Proof. Let X 6= ∅ be compact and metric. Then it is also atopological space with a countable basis of open sets. Also, X itselfis compact and invariant. As a consequence of Definition 3.6, if X isnot minimal, then it contains a proper, closed, non–empty invariantset K0. In the same way, if K0 is not minimal, then it must containa proper closed invariant subset K1. If this set is not minimal we cancontinue this process and find K2. By transfinite induction we get achain

K0 ⊃ K1 ⊃ K2 ⊃ · · · ⊃ Kα ⊃ · · ·


of compact sets. The intuition of the proof goes by showing that sucha chain cannot be continued “forever”. It has to come to a stop on aminimal compact set.

For any ordinal α we find an open set Un from the basis of thetopology with Kα ⊃ Un ⊃ Kα+1. Since we have a countable basis ofthe topology the above chain must stabilise at a countable ordinal α.But then

⋂αj=0Kj is non–empty, compact, invariant and minimal since

it does not contain any proper closed subset (it may consist of just onepoint). Actually, we do not need the countability of the basis to derivethe stabilisation.

Remark 3.8. Minimality implies transitivity. The above exampleof a rotation dynamical system on the Torus is minimal (transitive)if and only if a is irrational.

Definition 3.7. A system is called mixing if for any two open setsV, U there is a number N = N(U, V ) ∈ R such that φn(U)∩V 6= ∅ forall n > N .

Remark 3.9. It is not hard to prove that mixing implies transitivity(use Proposition 3.2).

Example 3.3. The irrational rotation is not mixing. To analysethis statement it is sufficient to consider the induced map f on the unitinterval obtained from the Poincare map. Let d = ‖a‖S1 = mina, 1−aand U = V = (0, d/4). Then the property

fn(U) ∩ V = (na, na+ d/4) ∩ (0, d/4) 6= ∅implies

fn+1(U) ∩ V = (na+ a, na+ a+ d/4) ∩ (0, d/4) = ∅.Remark 3.10. We have seen that in this Example we could reduce

our considerations from a flow to a map on a lower-dimensional space.This can be done in quite generality as we will see later. This is onereason why time–discrete systems are important. It is also possible toconstruct from any time–discrete model a continuous time model. Mostof the results of this section for time–continuous systems can be restatedfor time–discrete systems without major differences.

We can now ask ourselves whether it is typical that all trajecto-ries are regular. The next Theorem indicates that this depends ontopological facts.

Theorem 3.7 (Theorem of the Hedgehog). Let M = S2, the surfaceof a sphere, and v a continuous vector field on the sphere. Then v hasat least one singular point, i.e., there exists x0 ∈ S2 with v(x0) = 0.

We defer the proof to Appendix A.2.3.2. Note that a consequenceof this Theorem is that any differential equation on the sphere has atleast one stationary solution (constant orbit, case 1 of Theorem 3.5).

40 NLD-draft

Remark 3.11. The above theorem shows that global properties im-ply the existence of stationary points. We will see later that also localproperties imply the existence of stable stationary trajectories.

3.6.1. Limit Sets. In the theory of Dynamical Systems one isinterested in asymptotic behaviour. Let us specify what we mean bythis.

Definition 3.8. Let x ∈ X a point and φt : X → X be a flow.The ω–limit set of the trajectory through x is defined as the set of allaccumulation points in the future:

ω(x) = ωφ(x) := y ∈ X : ∃tn → +∞ with φtn(x)→ y .The α–limit set of the trajectory through x is defined as the set of allaccumulation points in the past:

α(x) = αφ(x) := y ∈ X : ∃tn → −∞ with φtn(x)→ y .

We have,

Lemma 3.1. If y ∈ φt(x) then ω(y) = ω(x). Also ωφt(x) = αψt(x)where ψt = φ−t.

Proof. If y ∈ φt(x) then there is a time s such that y = φs(x).Since

limn→+∞

φtn−s(y) = limn→+∞

φtn−s (φs(x)) = limn→+∞

φtn(x)

the first statement is proved. The second statement amounts to ainterchange of names between positive and negative times, flows andlimits.

Let us identify with own names some special limit sets:

O :=⋃x∈X

ω(x) A :=⋃x∈X

α(x) L := O ∪ A

The next statement is general for all continuous flows.

Theorem 3.8. Let X be compact and φt : X → X be a continuousflow. Then for all x ∈ X we have

1. ω(x) 6= ∅,

2. ω(x) = ω(x), i.e., the ω–limit set is closed,3. the ω–limit set is invariant, i.e., φt(ω(x)) = ω(x),4. the ω–limit set is connected.

Remark 3.12. As examples below will show, compactness is essen-tial for the first and the last statement.

Proof. By compactness any sequence φtn(x) ∈ X (tn → +∞) hasan accumulation point. Such a point is by definition in the ω–limit setof x.


For the second statement we will show that the complement of ω(x)is an open set, i.e., that for each point y in this complement, a wholeopen neighbourhood Uε(y) also belongs to the complement.

If a point y ∈ X \ ω(x), then no sequence (φtn(x))tn→+∞ can con-verge to y. Hence there is a number ε > 0 such that

Uε(y) ∩ φt(x)t≥0 = ∅.

But then ω(x) ∩ Uε(y) = ∅.Let us address the third statement. Let y ∈ ω(x), φtn(x) → y

(tn → +∞) and z ∈ φt(y). In other words, we take a point y in theω–limit set of x, a sequence in the trajectory of x tending to y and apoint z in the trajectory of y. Note that the sequence φtn+t(x) also hasa limit belonging to ω(x). Then

φtn+t(x) = φt (φtn(x))→ φt(y) = z.

Hence, z ∈ ω(x) This means that φt(ω(x)) ⊂ ω(x). Now we willshow the reversed inclusion. If w ∈ ω(x) there is a sequence of timessn → +∞ such that φsn(x)→ w. Hence, for some time s, we have

φsn−s(x) = φ−s (φsn(x))→ φ−s(w) = u ∈ ω(x),

since the left hand side of the previous line also tends to the limit set.This means that w = φs(u) ∈ φs(ω(x)), so we proved that ω(x) ⊂φt(ω(x)), and therefore both sets are equal, i.e., the limit set is aninvariant set.

Finally, let us assume that the ω–limit set of a point x is not con-nected. This means it can be “divided into two parts”: there are opensets V1, V2 in X such that

ω(x) ∩ Vi 6= ∅, ω(x) ⊂ V1 ∪ V2, V1 ∩ V2 = ∅.

This is, the closure of the open sets is disjoint, the limit set is completelycontained in the union of both open sets and has nonempty intersectionwith both. If there is a time T such that φt(x) ∈ V1 for all t > T thenV2 ∩ ω(x) = ∅ since all subsequences of points in V1 that have limit,must have their limit within the closure V1. Hence, if the assumptionholds, the trajectory has to “oscillate” between V1 and V2 for arbitrarilylarge times, it cannot “settle down” to one of the sets. In addition,since V1 ∩ V2 = ∅, both sets are “separated”, not even their closureshave points in common, so there exists an ε > 0 such that the distancebetween the sets (the infimum of distances between some point in oneset and some point in the other set) d(V1, V2) = ε > 0. However,since the trajectory is connected, there will be portions of it outsideboth sets and these portions will occur repeatedly for arbitrarily largetimes. Hence, there must be a sequence of moments tn → +∞ such thatd(φtn(x), Vi) >

ε3. Since phase space is compact there is a convergent

42 NLD-draft

subsequence tnk such that

u = limk→∞

φtnk (x) 6∈ V1 ∪ V2.

Because of its definition, u belongs to ω(x) and the limit set must havepoints outside both open sets. This contradicts our assumption, hencethe limit set is connected.

Let us consider now some examples of non–compact phase spacesand realise that when the compactness hypothesis does not hold, thenthe statements of the previous Theorem need not hold either.

Example 3.4. The parallel vector field v(x, y) = (1, 0) on the planehas straight parallel lines as trajectories. They only “accumulate atinfinity”. Therefore the ω–limit set of any point is empty.

Example 3.5. The ω–limit set of the flow in Figure 3.6 is notconnected.

ω(x) = L1 ∪ L2 α(x) = y

x

y

L1

L2

.

.

Figure 3.6. A non–compact phase space may have anon–connected ω–limit set.

Definition 3.9. A closed simply connected set D such that φt(D) ⊂D for all positive times is called a trapping region.

Remark 3.13. Simply connected means that (a) Any two points ofthe set can be joined by an arc lying completely within the set and (b)Any closed loop in the set can be shrinked to a point.

3.8. POINCARE–BENDIXSON THEORY 43

Remark 3.14. A sufficient condition for D being a trapping regionis that the vector field is directed everywhere inward on the boundaryof D.

If a dynamical system has a compact trapping region D, then thereexists a neighbourhood U of D such that for all x ∈ U , ω(x) ∈ D.

Definition 3.10. A closed (positively) invariant set A such thatthere exists a neighbourhood V of A (V is open and A ⊂ V ) suchthat for every x ∈ V there exists a positive time T such that for anyneighbourhood N of A, φt(x) ∈ N for all t > T is called an attractingset.

Definition 3.11. An attractor is a transitive attracting set.

Remark 3.15. For a compact phase space, an attractor coincideswith the ω–limit set. For non–compact phase spaces, the ω–limit setmay be non–connected and there may exist many attractors. Each at-tractor however will coincide with some connected component of theω–limit set.

3.7. Exercises

Exercise 3.6. Show that every bounded trapping region D suchthat the vector field points inwards on the boundary of D contains anattracting set.

Exercise 3.7. Consider the dynamical system defined on R2,y = −yx = x(1− x)(1 + x).

• List as many invariant sets you can.• Show that there exist exactly three trajectories of type 1, i.e.,x(t) ≡ const.• Find a trapping region. There are many choices.• Show that the set x ∈ [−1, 1], y = 0 is positively invariant,

and moreover an attracting set.• Find the ω–limit set of the system and check if it is connected

(note that phase space is not compact).

3.8. Poincare–Bendixson Theory

3.8.1. The Theorem of Poincare–Bendixson. This Section isdevoted to one of the most beautiful theorems in Dynamical Systems.It is actually a topological theorem like the Theorem of the Hedgehog.It is valid on the sphere and other 2–d spaces but we will recogniselater that it is not valid on the torus by recalling Example 3.2. Toarrive to the Theorem we will need to prepare the terrain by buildingsome foundational structures.

44 NLD-draft

We consider an arbitrary Lipschitz vector field v on the sphere S2

with the assumption that the number of its singular points is finite:

#x ∈ S2 : v(x) = 0

<∞.

This is not a big restriction since “typical” vector fields have only afinite number of singular points. It is automatically fulfilled when thevector field is real analytic.

Lemma 3.2. Let Σ be a curve transversal to the vector field v, i.e.,the tangent vector (of the curve) TxΣ at a point x ∈ Σ is not parallelto v(x), for all x ∈ Σ. Consider now all forward intersections of thetrajectory of x with Σ, with their time ordering. In other words, letφt(x)t≥0 ∩ Σ = xi, i ∈ N where for xk = φtk(x) we have tk < tk+1.Then the sequence xi is monotonic on Σ, i.e.,

xk ∈ [xk−1, xk+1].

Proof. We note on passing that transversality implies that wehave a discrete set of times when the trajectory of x intersects Σ.

Σ

xi

xi−1v(x)

φt(x)

D

Figure 3.7. The curve Σ on the sphere, a trajectoryφt(x) and some vectors from the vector field v(x).

Assume the sequence of forward intersections meets at least twodistinct points on Σ. Since the vector field is transverse to the curveΣ, the vectors v(x), x ∈ Σ, they all “point” in the same direction


relative to Σ, otherwise by continuity and the mean value theorem,some vector v(x) would be tangent to Σ. Hence, the trajectory φt(x)crosses Σ always in the same direction (say “from left to right”). Letus consider two consecutive crossings, this is depicted in Figure 3.7.

By the Picard–Lindelof Theorem, the trajectory cannot cross itselfand therefore it cannot exit the region D which is bounded by theportion of the trajectory between crossings, i.e., (φt(x))ti−1≤t≤ti , andby the segment [xi−1, xi] ⊂ Σ (the vector field on Σ points inwards).Therefore xi+1 must be “below” xi along Σ and hence, xi ∈ [xi−1, xi+1].

Remark 3.16. We assumed in the picture that the vector fieldpoints inward. But actually this is not a restriction. If it points out-wards we can consider the region S2\D which then has the same pictureas Figure 3.7.

Remark 3.17. Note that if the sequence of forward intersectionsxi consists of only one point, visited infinitely many times, the tra-jectory φt(x) is periodic. If the point is visited only once, φt(x) neverreturns to Σ for t > 0.

Corollary 3.2. The number of points from the ω–limit set lyingon Σ is at most one:

# x : ω(x) ∈ Σ ≤ 1.

Proof. To prove the Corollary we will show that if this numberof points is not zero, then it is exactly one. Let y ∈ ω(x) ∩ Σ. Sincethe vector field is transversal to Σ we have v(y) 6= 0. By Theorem 3.3,the Straightening–out Theorem, there is a small neighborhood U ofy such that the vector field is horizontal (up to a coordinate trans-formation). Hence, any trajectory entering U must cross Σ. Now ify = limn→+∞ φtn(x) there are points xk ∈ φt(x)t≥0 ∩ Σ close to y.These points build a sequence as that of Lemma 3.2. Therefore, anypoint y ∈ ω(x) ∩ Σ is an accumulation point of the sequence xi. Bythe previous Lemma, this sequence is monotonic and hence it has atmost one limit point.

This limit point cannot be a singular point since the vector field istransverse to Σ.

Lemma 3.3. If ω(x) does not contain a singular point, then ω(x)is a periodic trajectory. Moreover, for any y close to x we have

ω(x) = ω(y).

Proof. Let z ∈ ω(x) and w ∈ ω(z) ⊂ ω(x). Since w is not asingular point, we can find a curve Σ through w and transversal to thevector field v. Let limn→+∞ φtn(z) = w where tn is chosen in such a

46 NLD-draft

Σ

x

y

xi

yi

xi+1

φt(x)

φt(y)

ω(x)

Figure 3.8. The points x, y, their trajectories, thecurve Σ and the limit set ω(x).

way that zn = φtn(z) ∈ Σ. But all the points zn ∈ ω(x) by invariance.So Corollary 3.2 implies that

zn = zm = w for all n,m ∈ N.

Hence the trajectory φt(z) is periodic by Theorem 3.5. Considernow another point z′ ∈ ω(x), then φt(z′) is also a periodic trajectorywithin ω(x). But the trajectory φt(x) cannot approach two differentperiodic orbits as t→ +∞. Hence, ω(x) is a single periodic trajectory.This ends the proof of the first statement.

For the second statement, see Figure 3.8. Let yi be the intersectionsφt(y) ∩ Σ. Since φt(y) does not intersect φt(x) we have yi ∈[xi, xi+1]. This implies

limi→∞

yi = limi→∞

xi

and hence the ω–limit sets of x and y coincide.

Let us now deal with the case in which the ω–limit set containssingular points.

Lemma 3.4. Let z1 6= z2 ∈ ω(x) and v(z1) = v(z2) = 0. Then thereis at most one trajectory φt(y) ⊂ ω(x) with

ω(y) = z1 α(y) = z2.


Σ1

Σ2

φt(y1)

φt(y2)

z

z2

A1

A2

D

.

..

Figure 3.9. Hypothetical picture of an assumption con-trary to Lemma 3.4.

Proof. Let us assume the contrary, i.e., that we have two dis-joint trajectories, based on the points y1 and y2, within ω(x). Referto Figure 3.9 to visualise the different elements of the proof. We haveφt(y1)∩φt(y2) = ∅, φt(y1)∪φt(y2) ⊂ ω(x) and α(yi) = z2 andω(yi) = z1. Moreover, the vector field does not vanish on any point ofφt(y1) or φt(y2). Therefore we always can find transversal curvesΣ1 and Σ2 to the trajectories φt(y1) and φt(y2). The two trajecto-ries φt(yi); i = 1, 2 and the two points z1, z2 bound a region D. Wemay assume that φt(x) ⊂ D (otherwise we consider S2\D). The tra-jectory φt(x) must accumulate on the boundary of D. Moreover, thetrajectory φt(x) must also intersect Σ1 and Σ2 transversally (sinceit comes arbitrarily close to the trajectories φt(yi); i = 1, 2). Hence,the region D is divided into two parts D = A1∪A2 as in Figure 3.9. Insuch a case, the trajectory φt(x) enters A1 and cannot leave it any-more. This contradicts the assumption that both trajectories φt(y1)and φt(y2) are in the ω–limit set of x (both parts A1 and A2 cannotbe in the ω–limit set of x).

Now we arrived finally at the theorem of Poincare and Bendixson.

Theorem 3.9 (Poincare–Bendixson). Let v be a vector field on S2

with #x : v(x) = 0 < ∞. Then for x ∈ S2 we have one of thefollowing possibilities:

1. ω(x) is a singular point2. ω(x) is a periodic trajectory

48 NLD-draft

3. ω(x) is the union of a finite number of singular points x1, · · · , xnand connecting regular trajectories φt(yij) with

α(yij) = xi ω(yij) = xj.

Moreover, for i 6= j the curve φt(yij) is unique.

Proof. If ω(x) does not contain a singular point we are in case 2by Lemma 3.3.

If ω(x) does not contain any regular point it must be a single singu-lar point (it is connected and the number of singular points is finite).

If there is a regular trajectory φt(y) ⊂ ω(x) which is not periodicthen ω(y) consists of a single point z. This is because in this case, forany transversal curve Σ we have by Lemma 3.2 that #φt(y)∩Σ ≤1. Moreover, z is a singular point, because if it were regular we couldchoose Σ such that z ∈ Σ. But this means that φt(y) intersects Σinfinitely often in the unique limit point. Therefore ω(y) is a periodictrajectory, which is a contradiction. The same holds for α(x). This isprecisely case 3 and the proof is finished.

Example 3.6. The three cases of the Poincare–Bendixson Theoremare illustrated in Figures 3.10 and 3.11. Note the difference between thedirection arrows in Figures 3.9 and 3.10.

x

x

.

.

.

Figure 3.10. Case 3 of Poincare–Bendixson Theorem,on the right, infinitely many trajectories having the sameα and ω limit set.

Remark 3.18. The proof of the Theorem builds on the constructionof an open connected set, using a portion of a trajectory and the curveΣ, see Figure 3.7. Such construction is essentially a Jordan region onthe plane, hence, the Theorem holds as well within positively invariantJordan regions of R2. On the other hand, in more complicated spacessuch as the surface of a Torus, we noticed in Example 3.2 that theω–limit for the case a /∈ Q is the whole torus.

3.10. THE LORENZ EQUATIONS 49

x

y

.

Figure 3.11. Cases 1 and 2 of Poincare–Bendixson Theorem

Remark 3.19. A trajectory connecting two different singular pointsis called a heteroclinic orbit. If both singular points coincide, thetrajectory is called a homoclinic orbit.

3.9. Summary

We have seen that locally only singular points are interesting, be-cause of the Straightening–out Theorem 3.3. We will later in the bookwork on the qualitative features of the trajectories in the vicinity ofsingular points. We have also developed structures to deal with as-ymptotic behaviour (limit sets, trapping regions, etc.). Finally withthe Poincare–Bendixson Theorem 3.9 we realise that two-dimensionalflows on relatively “simple” spaces have only simple asymptotic be-haviour.

3.10. The Lorenz Equations

A full-featured dynamical system that nevertheless has a rathersimple appearance, with only two (quadratic) nonlinear terms is givenby the Lorenz equations[10.]:

x = σ(y − x)

y = rx− y − xzz = xy − bz.

This system originated when modeling the Benard experiment, whichstudies the dynamics of a gas fluid in a small chamber (aspect ratio b =8/3) with constant temperatures T0 > T1 at the lower and upper platesrespectively, under the influence of gravity. x, y, z represent amplitudesin a simplified, truncated, model solution as well as the deviation froma linear vertical temperature profile. The fluid parameters are givenby σ, the Prandtl number (usually fixed at the value 10 for numericalillustrations) and r, the Rayleigh number. Usually, numerical and otherexperiments are done varying r in (0, 30). At this moment in the bookwe should be able to identify singular points and realise that the number

50 NLD-draft

and location of these points change with r. Advanced students maysucceed in finding a trapping region.

This model was conceived along the lines devised by Benard fora laboratory experiment with typical sizes of the order of millimeters.Some of his observations occur in the model as well. It has been claimedthat the model could be a metaphor for atmospheric behaviour, but itis too crude to describe the complicated atmospheric dynamics. Themodel was studied numerically in 1963 by Lorenz, when computerswere several orders of magnitude slower than today. The anecdote tellsthat two runs with initial conditions differing in the fourth significantfigure, soon gave rise to completely different dynamics, thus awakingthe attention of scientists towards sensitivity to initial conditions, whichwe will discuss on a later Chapter.

3.11. Exercises

Exercise 3.8. Compute all singular points and (at least portionsof) ω(x) for the system:

x = ax− y − x(x2 + y2)

y = x+ ay − y(x2 + y2),

for the cases a > 0 and a < 0.

Exercise 3.9. Compute all singular points and their dependencewith r for the Lorenz equations.

Exercise 3.10. Find a trapping region for the Lorenz equationsfor the case 0 < r < 1. Hint: Try with a region shaped as an ellip-soid. Choose the constants so that the vector field points inwards onthe surface of the ellipsoid.

Exercise 3.11. Find a trapping region for the Lorenz equationsthat holds for any (nonnegative) r. Hint: Try again with a regionshaped as an ellipsoid. Now it has to be a much larger ellipsoid than inthe previous exercise and its “centrum” will be shifted along the z-axis.

CHAPTER 4

Discrete Dynamical Systems

We investigate in this Chapter the important issue of discrete timedynamical systems. Apart from the natural connection between flowsand maps, discrete time dynamical systems are relevant in many ap-plications.

4.1. Flows versus Maps

There are several ways to obtain a map f from a flow φt. We listtwo important cases here:

4.1.1. The Time–one Map. Given a flow φt(·) : Rd → Rd, wedefine f : Rd → Rd by

f = φ1.

Thus, we obtain a new system, given by fn; n ∈ Z. This corresponds toa (time–)discretisation of the original system. Note that since this mapis simply the flow of a continuous–time dynamical system recasted atdiscrete time intervals, all properties of continuity and derivability ofthe flow are automatically inherited by the time–one map. Althoughtime–one maps are very important and indeed useful, this approachhas some limitations.

1. f is always homotopic to the identity, i.e., there is a contin-uous map F : Rd × [0, 1]→ Rd such that for each t ∈ [0, 1] themap F (·, t) : Rd → Rd is a homeomorphism satisfying F (·, 0) =id and F (·, 1) = f (two objects are homotopic if they can beconnected by a continuous family of objects of the same kind inthe way described here)1. The map F can be chosen to be φt(·).This means that the class of time-1-maps is rather restrictive.F.e. let f : T2 → T2 be the “reflection” (x, y) → (y, x). Thismap exchanges the two circles x = 0 and y = 0. But these twocircles cannot be deformed one into another by a continuousdeformation. Hence this map cannot be the time–one map of aflow.

2. Let f(x) 6= x but fn(x) = x for some n > 0. Then for all t ∈ Rwe have

fn (φt(x)) = φn+t(x) = φt (φn(x)) = φt (fn(x)) = φt(x).

1Whenever the flow φt is smoother than just C0, f will inherit its smoothnessand F can be actually “upgraded” to a diffeomorphism.

51

52 NLD-draft

This means all the points φt(x) are periodic. In particular wecannot have isolated periodic orbits!

There is also a need for studying discrete dynamical systems of afundamentally different nature than just time–one maps.

4.1.2. Orientation Preserving Maps.

Definition 4.1. A surface (2–d) in R3 is called orientable, ifwhen moving a pair of orthogonal vectors at each point along any con-tinuous closed loop on the surface, the final vectors at point x can bemapped onto the original ones (i.e., they are not the mirror image ofthe original pair). The typical non-orientable surface is the Mobiusband.

Orientation is not confined to 2–d spaces, the choice of basis vectorsin Rn defines an orientation in that space. Maps defined on orientablespaces may preserve the orientation i.e., the induced map on the tan-gent space has determinant one. In 2-d again, the image of two tangentorthogonal vectors is not the mirror image of the starting pair. In sucha case they are called orientation preserving. Both the previousexample and the next to come are standard examples of orientationpreserving maps.

4.1.3. The Poincare Map. On a (small) transversal section Σto a periodic trajectory γ we can define the following map:

f = fΣ : U ⊂ Σ→ Σ,

where fΣ(x0) is the first intersection point (for t > 0) of the trajectoryx(x0, t), x0 ∈ Σ with Σ (see Figure 4.1). This map is also calledthe First-return Map. Σ is also called the Poincare section or controlsection.

In order to have a Poincare map the flow of the original system hasto fulfill some basic requirements valid for each point x ∈ Σ:

1. Σ is orientable and transverse to the flow, i.e., there exists anormal vector n(x) and n(x) · v(x) > 0.

2. The flow has to return to Σ, i.e., there exists a positive t0(x)such that φt0(x)(x) ∈ Σ (this guarantees that returns will occurfor all real times).

3. v(x) 6= 0 on Σ (otherwise no return is possible).

By the differentiable dependence on initial conditions this map iswell-defined and as smooth as the original flow. Moreover, the Poincaremap is invertible (suitably adjusting the domain), since the Picard–Lindelof Theorem assures that each point in phase-space belongs toonly one trajectory. We note that the dimension of Σ is smaller thanthat of the original phase space: dim Σ = d− 1.

4.1. FLOWS VERSUS MAPS 53

x0

f(x0)

γ

Figure 4.1. The Poincare First-return Map.

φ12

fΣ1

Σ2

φ21

Σ1

Figure 4.2. Two equivalent control sections connectedby diffeomorphisms.

The new dynamical system generated by the Poincare map f in-herits many properties of the flow. It is widely used to understand theflow. Intuitively, the map factors out the “uninteresting” flow-directionwhere the dynamics of individual points consists only on transport fromΣ back to Σ again (compare with the Straightening–out theorem).

Let us see how this map depends on the transversal section Σ.Assume we have two properly chosen transversal sections Σ1 and Σ2.

54 NLD-draft

Then we can define two maps φ12 : Σ1 → Σ2 and φ21 : Σ2 → Σ1, bysliding along a trajectory x(x0, t) for increasing times, where x0 ∈ Σ1

or x0 ∈ Σ2, respectively until the other transversal section is met. Themaps are invertible with a proper choice of domains. Moreover, bythe smooth dependence on initial conditions both maps are diffeomor-phisms. Now we can compare the Poincare maps fΣ1 = φ21 φ12 andfΣ2 = φ12 φ21. We have

fΣ2 = φ12 fΣ1 φ−112 .

Therefore the Poincare map changes only by a smooth diffeomorphism,essentially a change of coordinates, and hence they all are (smoothly)equivalent (see Figure 4.2).

4.1.4. Suspensions. We can associate to a given map a flow whichinherits many properties of the map. This flow is called the suspen-sion of f : Rd → Rd. We consider the larger space (x, t) ∈ Rd × [0, 1]with the vector field v(x, t) = (0, 1). then we identify the “top” andthe “bottom” via

(x, 1) = (f(x), 0).

This gives a “torus-like” manifold with a smooth flow (the identificationmap is smooth) (see Figure 4.3).

f

f(x)

f(x) f 2(x)

f 2(x)x

x

v

Figure 4.3. A suspension of f generates a flow.

Recalling Example 3.2, the rotation f on the circle R/Z = Rmod 1, mapping x → x + a mod 1, has as suspension the originallinear flow on the torus.

4.2. ONE-DIMENSIONAL MAPS 55

4.1.5. Basic Topological Concepts for Maps. Many conceptsdefined in the previous chapters can be almost automatically trans-ported from flows to maps, simply by replacing continuous time bydiscrete time when relevant. Among them, we can count the conceptsof trajectory or orbit, invariant set, transitivity, minimality, α– and ω–limit sets, as well as other concepts discussed in the previous Chapter.

Some of these concepts, however, although they can be definedin almost the same way, have different properties in flows and maps.For example, fixed points both in maps and flows may be defined asinvariant sets with only one element, i.e., fn(x) = x for all n ∈ Z orφt(x) = x for all t ∈ R. However, a more functional definition for a mapis just f(x) = x, whereas in flows they are defined as x : v(x) = 0.

To recall more differences, note that trajectories with initial con-dition x0 for flows are invariant curves on phase space, while trajec-tories in maps are discrete sets of points. An invariant curve in amap may contain infinitely many trajectories within it. Also, periodicorbits in flows are closed curves whereas periodic orbits in maps arediscrete and finite sets of points (if N is the minimal positive integersuch that fN(x) = x, the periodic orbit consists of the set of pointsx, f(x), · · · , fN−1(x)).

Important subsets of phase space for a map f are Fix(f) ⊂ Per(f),the set of fixed points and the set of periodic points (of any positiveperiod N ≥ 1) respectively.

4.2. One-dimensional Maps

We will start with discrete dynamical systems in one dimension.One of the main advantages over higher dimensional systems is thatthe domain, the real numbers, is an ordered set. More generally, phasespace will be either R, [0, 1] or S1 (the unit circle, which can be re-garded as the interval [0, 1] where we identify the endpoints). First wedeal with invertible systems and show that their behaviour is not toocomplicated.

4.2.1. Invertible Systems on [0, 1]. An injective smooth systemon the interval is given by a strictly monotonic function f : [0, 1] →[0, 1]. Since any graph of a continuous function defined on [0, 1] mustintersect the diagonal any such map has at least one fixed point (recallthat on maps a fixed point is such that x = f(x)).

Theorem 4.1. Let f be an injective system. Then for any forwardorbit fn(x)n∈N the ω-limit set is a fixed point.

Proof. The set of fixed points Fix(f) for f (i.e., the set of pointswhere the graph of f intersects the diagonal (see Figure 4.4)) is non–empty (see a proof in Appendix A.3.3) and closed. Without loss ofgenerality we may assume that f is strictly monotonically increasing.

56 NLD-draft

y

x0 1

1

f(0)

f(x)

f(1)

fixed points

Figure 4.4. An injective map of the interval.

In case Fix(f) = [0, 1], i.e., f = Id, we are done. If this is not thecase, consider a subinterval free from fixed points, i.e., let x ∈ (a, b) ⊂[0, 1]\Fix(f). Let the interval (a, b) be maximal. Then by maximalityboth a and b must be fixed points, i.e., f(a) = a and f(b) = b. Wehave by monotonicity and continuity that

f((a, b)) = (f(a), f(b)) = (a, b) ⊂ [a, b].

Moreover we have for x ∈ (a, b) that on the whole subinterval (a, b) ei-ther f(x) > x or f(x) < x. Otherwise we would have points z, y ∈ (a, b)such that f(z) > z and f(y) < y (we may assume z < y without lossof generality). In such a case, for some point w ∈ (z, y) there wouldbe an intersection of the graph of f with the diagonal (by continu-ity) and hence another fixed point in (a, b). Let the first case hold,i.e.,f(x) > x. Then b > fn(x) > fn−1(x) since fn(x) = f(fn−1(x)) ∈(a, b) = (f(a), f(b)) for all n. By the convergence of bounded mono-tonic sequences on a closed interval, there is a point

x∞ = limn→∞

fn(x) ∈ [a, b].

But we have

f(x∞) = f(

limn→∞

fn(x))

= limn→∞

f (fn(x)) = limn→∞

fn+1(x) = x∞.

Therefore x∞ is a fixed point and must be equal to b, and the ω–limitset of the whole interval (a, b] is the fixed point b. The case f(x) < x


gives x∞ = a. In any case, all points that are not fixed points lie insome maximal interval and hence have a fixed point as ω–limit set.

Remark 4.1. The set of fixed points might be complicated. Thereare differentiable functions whose fixed point set can be any compactsubset of [0, 1]. On the other hand if the function f is real analytic thefixed points must be isolated and hence their number is finite.

4.2.2. Notions of Recurrence. We will see in this section thatthere is a well-developed theory of maps of the circle. This investigationstarted with Poincare. He related arbitrary diffeomorphisms of thecircle to the “linear” maps: the rotations. Let us start with somedefinitions.

Definition 4.2. A point x is called positive (negative) recur-rent if x ∈ ω(x) (x ∈ α(x)).

Remark 4.2. All periodic points are recurrent (positive and neg-ative). But there can be more. Any trajectory of the irrational torusflow is recurrent (ω(x) = α(x) = T2).

Proposition 4.1. Any point in a minimal invariant subset is re-current.

Proof. Let f |Z : Z → Z be minimal. This means that the orbit ofany z ∈ Z is dense in Z, i.e., α(z) ∪ ω(z) = Z and z is recurrent.

Definition 4.3. A point x is called non-wandering if for all openneighbourhoods U of x there is an n ∈ N such that

fn(U) ∩ U 6= ∅.The set of non-wandering points is denoted by Ω.

Theorem 4.2. Ω is closed, invariant and contains the ω– and α–limit sets of all trajectories.

Proof. Let xn ∈ Ω and xn → x. Let U be an open neighbourhoodof x. Then xn ∈ U for all large n. Let xn ∈ U . since xn ∈ Ω there isan N ∈ N such that fN(U) ∩ U 6= ∅. Hence x ∈ Ω. This proves thatΩ is closed since it contains the limits of its convergent sequences.

Let f(x) ∈ U . Then x ∈ f−1(U) and f−1(U) is open by continuity.If x ∈ Ω there is an N such that fN (f−1(U)) ∩ f−1(U) 6= ∅. Hence

∅ 6= f(fN(f−1(U)

)∩ f−1(U)

)= fN(U) ∩ U

and f(x) ∈ Ω.Let x = limk→∞ f

nk(y) ∈ U (this means that x is in the ω–limit ofsome trajectory). Then for all large k we have fnk(y) ∈ U . Hence

fnk+1(y) ⊂ U and fnk+1(y) = fnk+1−nk(fnk(y)) ⊂ fnk+1−nk(U)

and for N = nk+1 − nk we have

U ∩ fN(U) 6= ∅

58 NLD-draft

and x ∈ Ω.

Corollary 4.1. Any recurrent point is contained in Ω since itsα– and ω–limit are in Ω.

Corollary 4.2. If X is compact and f continuous then Ω 6= ∅.

Proof. This follows since the ω–limit sets are not empty by The-orem 3.8.

The next statement is a simple consequence of Theorem 3.6 andProposition 4.1.

Corollary 4.3. The set R of all recurrent points (positive andnegative) is non-empty.

Proof. Let M denote the closure of the union of all minimal sub-sets of X. Then by Proposition 4.1 and Corollary 4.1 we have

Per(f) ⊂M ⊂ R ⊂ Ω.

Recall that Fix(f) ⊂ Per(f) and it is non–empty (see Appen-dix A.3.3).

4.2.3. Circle Diffeomorphisms. Consider R as a phase spaceand also as an additive group. It has as such the proper subgroupZ. The quotient between both groups is a new space S1 = R/Z. Weunderstand this quotient as the identification of all elements of R thatdiffer in one element of Z. We end up thus with the interval [0, 1] withthe endpoints identified, since all other real numbers can be built byadding an integer to that interval. This space is equivalent to the unitcircle S1 e.g., via the map t→ e2πit. The map π : R→ R/Z = S1 givenby π(x) = x mod 1 is called a factorisation. We start by presentinga way of relating maps on different spaces.

4.2.3.1. The Rotation Number.

Definition 4.4. Lift. Let f : S1 → S1 be a continuous map of thecircle. We call F : R → R a lift of f whenever the following relationholds: f(x mod 1) = F (x) mod 1 (this can be restated as f(π(x)) =π(F (x)). Without loss of generality we assume that f is orientationpreserving (otherwise we consider f 2). F (x + 1) − F (x) = deg f iscalled the degree of f .

Figure 4.5 depicts a function f and its lift F . Before continuing toinvestigate circle diffeomorphisms we will prove some facts about lifts.

Lemma 4.1. Let f : S1 → S1 be a continuous map with lift F . Thendeg f ∈ Z and it does not depend on x nor on the choice of the lift.


0 1 0 1x xa

1 f(a−) = 1F (x)

f(0) = F (0) mod 1

F (1) = F (0) + 1

F (0)

Figure 4.5. The graph of the lift F (left) and the graphof f = F mod 1.

Proof. We have for π : R→ S1 defined as above:

π (F (x+ 1)) = f (π(x+ 1)) = f (π(x)) = π (F (x)) .

Since π(a) = π(b) ⇐⇒ a − b ∈ Z we have F (x + 1) − F (x) ∈ Z.Because F (x+ 1)−F (x) is a continuous function it must be constant,i.e., independent of x. Let G be another lift. Then

π (G(x)) = f(π(x)) = π (F (x)) .

Since G(x)−F (x) is continuous and integer valued it must be constant.In particular

G(x+ 1)− F (x+ 1) = G(x)− F (x).

Hence, there are as many different lifts of f as there are integers.In particular, we can always pick a lift that coincides with f at somepoint and by continuity at some open set in (0, 1).

Lemma 4.2. The degree depends continuously on the map f . Henceit is locally constant.

Proof. Let g : S1 → S1 be close to f in the max distance:

maxx∈S1|g(x)− f(x)| < 1

4.

We consider two lifts F,G such that they coincide with f and g at somepoint in [0, 1) and define

φ(x) = G(x)− F (x).

Then ‖φ‖∞ < 14, since if this does not hold, then by continuity there ex-

ists x0 such that |G(x0)−F (x0)| = 14. But then |g(π(x0))−f(π(x0))| ≥

14

since the latter difference differs from the former in an integer.

60 NLD-draft

For x ∈ [0, 1) we have, then,

G(x+ 1)− φ(x+ 1) = F (x+ 1) = F (x) + deg f = G(x) + deg f − φ(x)

hence,

|G(x+ 1)−G(x)− deg f | = | deg g − deg f | = |φ(x+ 1)− φ(x)|≤ |G(x+ 1)− F (x+ 1)|+ |G(x)− F (x)|

<1

4+

1

4=

1

2.

This proves the assertion.

Lemma 4.3. If f is an orientation preserving diffeomorphism, then

F (x+ 1)− F (x) = deg f = 1.

Proof. f is monotonically increasing and therefore also F (theycan be chosen to coincide on some open set in [0, 1)), so deg f > 0.F maps the interval [0, 1) continuously to an interval of width deg f ,since F (1) − F (0) = deg f . Assume deg f ≥ 2. Pick two points y1,y2 on F ([0, 1)) such that y2 = y1 + 1. These points correspond to twodifferent points on [0, 1), namely yi = F (xi), i = 1, 2. But

0 = π(F (x2)− π(F (x1)) = f(π(x2))− f(π(x1)) = f(x2)− f(x1),

and hence f is not bijective.

Theorem 4.3. Let f : S1 → S1 be an orientation preserving diffeo-morphism of the circle. Then there exists (independent of the lift)

τ(f) := lim|n|→∞

F n(x)− xn

mod 1

for all x ∈ S1 and τ(f) is independent of x. This number is called therotation number. If f has a periodic orbit, the rotation number isrational.

Proof. There is no loss of generality in taking n positive, nor inchoosing the lift with F (0) ∈ [0, 1]. If f has a periodic orbit andthe limit τ(f) exists independent of x, let us compute it using theperiodic point y of period m ≥ 1. Since the limit is assumed to exist,any infinite subsequence will have the same limit. We consider thenn = km, k → ∞. Note first that since y is periodic, Fm(y) = y + p,for some integer 0 ≤ p ≤ m. Hence,

F km(y) = Fm(Fm(· · · (Fm︸︷︷︸k times

(y)) · · · )) = y + kp

Computing the limit, we have

limkm→∞

F km(y)− ykm

= limk→∞

F km(y)− ykm

= limk→∞

(y + kp)− ykm

=p

m.

This proves that the last statement follows from the first.


Let |x − y| < 1. Then |F (x) − F (y)| < 1 (since f is injective andhence, deg f = 1). Hence, |F n(x)− F n(y)| < 1 for all n ∈ N and∣∣∣∣F n(x)− x

n− F n(y)− y

n

∣∣∣∣ ≤ 1

n(|F n(x)− F n(y)|+ |y − x|) ≤ 2

n.

Hence if the rotation number exists for x, it takes the same value forany y such that |x − y| < 1. Moreover, let y = y0 + k, 0 ≤ y0 < 1,k ∈ N. Then

F (y) =F (y0 + k)

=F (y0) + (F (y0 + 1)− F (y0)) + · · ·+ (F (y0 + k)− F (y0 + k − 1) = F (y0) + k.

Repeating this argument iterating on F , we obtain that F n(y) =F n(y0) + k and further that F n(y) − y = F n(y0 + k) − (y0 + k) =F n(y0) + k − (y0 + k) = F n(y0)− y0. Hence,∣∣∣∣F n(x)− x

n− F n(y)− y

n

∣∣∣∣ ≤ 1

n(|F n(x0)− F n(y0)|+ |y0 − x0|) ≤

2

n.

for all x, y ∈ R, proving that the rotation number is independent of x.It remains to prove that the limit exists. Let us attempt to calculatethis limit for x = 0.

Since F is monotonically increasing and deg f = 1, there exists aninteger 0 ≤ r < n such that r < F n(0) < r + 1. This shows that thesequence F k(0)/k is bounded. Repeating the argument m times, wehave mr < F nm(0) < m(r + 1). Dividing these inequalities by n andnm, we obtain

|Fnm(0)

nm− F n(0)

n| < 2

n.

Exchanging the roles of m and n we obtain a similar relation. Thus,

|Fm(0)

m− F n(0)

n| < 2

n+

2

m.

Hence, F k(0)/k is a Cauchy sequence and has therefore a limit.

Theorem 4.4. Let h : S1 → S1 be an orientation preserving home-omorphism. Then

τ(h−1 f h

)= τ(f).

Proof. Let F,H be lifts of f, h, respectively. Then H−1FH is alift of h−1fh. Since f is an orientation preserving homeomorphism themap F : R → R is strictly monotonic. We may assume (by choosingappropriate lifts) that H(0) ∈ [0, 1). As in the previous proof thisimplies that

|H(x)− x| < 2, x ∈ R.Setting x = H−1(y) we get also∣∣y −H−1(y)

∣∣ < 2, y ∈ R.

62 NLD-draft

If |x−y| < 2 then for z = (x+y)/2 we have |x−z| < 1 and |y−z| < 1.Therefore, if |x− y| < 2 we have (from the previous Theorem)

|F n(x)− F n(y)| = |F n(x)− F n(z) + F n(z)− F n(y)|≤ |F n(x)− F n(z)|+ |F n(y)− F n(z)| ≤ 2.

Hence,∣∣(H−1 F H(x))n − F n(x)

∣∣ =∣∣H−1 F n H(x)− F n(x)

∣∣≤∣∣H−1 (F n H(x))− F n H(x)

∣∣+ |F n H(x)− F n(x)|< 2 + 2 = 4.

Therefore ∣∣(H−1 F H(x))n − F n(x)

∣∣n

<4

n.

Theorem 4.5. Let f : S1 → S1 be an orientation preserving diffeo-morphism of the circle. Then

τ(f) ∈ Q ⇐⇒ f has a periodic orbit.

Proof. One direction was proved in Theorem 4.3. Let then τ(f) =pq∈ Q. Then

τ(fm) = limn→∞

Fmn(x)− xn

mod 1

= m limn→∞

Fmn(x)− xnm

mod 1 = mτ(f) mod 1.

Hence,

τ(f q) = 0.

We will show now that assuming that f q does not have fixed pointsleads to a contradiction. Let F q be the lift of f q with F q(0) ∈ [0, 1).If F q(x) − x ∈ Z then x is a fixed point. Hence, if there are no fixedpoints, 0 < F q(x)−x < 1 (the relation holds for x = 0 and F q(x)−x isa continuous function that cannot take integer values). By continuityand periodicity, there exists a δ > 0 such that

0 < δ ≤ F q(x)− x ≤ 1− δ.Hence,

nδ ≤n−1∑i=0

[F q(F qi(x)

)− F qi(x)

]= F qn(x)− x ≤ n(1− δ)

and the rotation number is δ ≤ τ(f q) ≤ 1−δ. The contradiction arisedassuming f q did not have fixed points.

If τ 6∈ Q then the orbits of f are displaced in the same way as theorbit of the rotation Rτ(f)(x) = x+ τ(f) mod 1.


Theorem 4.6. Let the rotation number of an orientation preservingcircle diffeomorphism f be irrational.

Then for n1, n2,m1,m2 ∈ Z and x ∈ R we have

n1τ +m1 < n2τ +m2 ⇐⇒ F n1(x) +m1 < F n2(x) +m2.

Proof. The function

p(x) = F n1(x) +m1 − F n2(x)−m2

is continuous and never changes its sign since otherwise p(x) = 0 im-plies that fn1(x) = fn2(x) (or fn1−n2(x) = x since f is invertible) and xwould be periodic, which is impossible since τ(f) /∈ Q. Let us assumethat p(x) < 0. The other case is similar.

We write y = F n2(x). Then

F n1−n2(y)− y < m2 −m1.

Let n ∈ N. Define inductively y0 = 0 and yi+1 = F (n1−n2)(yi), i =0, · · · , n− 1. Then

F n(n1−n2)(0) = yn =n−1∑i=0

(yi+1−yi) =n−1∑i=0

(F (n1−n2)(yi)−yi) < n(m2−m1).

Hence,

τ = limn→∞

F n(n1−n2)(0)

n(n1 − n2)<m2 −m1

n1 − n2

.

Similarly

F n1(x) +m1 > F n2(x) +m2 =⇒ n1τ +m1 > n2τ +m2.

We are now ready to prove the following theorem

Theorem 4.7 (Poincare). Let f be an orientation preserving dif-feomorphism of the circle with irrational rotation number. Then

1. f is transitive =⇒ f is topologically conjugate to Rτ(f),2. f is not transitive =⇒ there is a surjective monotonic continu-

ous map h : S1 → S1 with

h f = Rτ(f) h.This is called a semi-conjugacy.

Proof. We start by proving the semi-conjugacy part and subse-quently verify that if f is transitive then h is not only surjective butalso injective. We fix a lift F : R→ R of f , x ∈ R and write τ = τ(f).

We define a map H on the set

B := F n(x) +m : n,m ∈ Zby

H (F n(x) +m) = nτ +m.

64 NLD-draft

The set B consists of all possible lifts of the orbit of x by f . On theset B we have already the property we are interested in, namely

H F = Rτ Hsince

H F (F n(x) +m) = H(F n+1(x) +m

)= (n+ 1)τ +m

= Rτ (nτ +m) = Rτ H (F n(x) +m) .

The first equality holds since deg f = 1 and hence F (x+1)−F (x) = 1,from which it subsequently follows that F (x + m) = F (x) + m. Firstwe will extend H to B and subsequently to all R.

Lemma 4.4. H has a continuous extension to B.

Proof of the Lemma. Note that by Theorem 4.6 H is mono-tonic on B. For any z ∈ B there are sequences xn, yn ∈ B having z astheir limit and such that xn z and yn z. Since H is monotonicboth the left–hand–side and right–hand–side limits of H at z exist,independently of the choice of sequence, i.e., limn→∞H(xn) = H(z−)and limn→∞H(yn) = H(z+) exist. If these limits were different the setR \ H(B) contains the interval (H(z−), H(z+)). In the Appendix wewill prove that for τ irrational the set

H(B) = nτ +m : n,m ∈ Zis dense in R (Theorem A.18). Therefore R\H(B) does not contain anyinterval, H(z−) = H(z+) = H(z) and the limit limx→zH(x) = H(z)exists.

We are going to show now that we can extend H to the entire realline. Note on passing that H is continuous and monotonic on B andsurjective onto R (since H(B) is dense).

Consider an interval (a, b) ⊂ R\B. By surjectivity we have H(a) =H(b) so we set for all x ∈ (a, b)

H(x) = H(a) = H(b).

Then H : R→ R is monotonic, surjective and continuous. Moreover

H F = Rτ Hby monotonicity and continuity. For z = F n(x) +m ∈ B we get

H(z + 1) = H (F n(x) +m+ 1) = nτ +m+ 1 = H(z) + 1.

This holds by continuity and monotonicity for all z. Hence H can beprojected to a continuous function h = H mod 1: S1 → S1. Thiscompletes the proof of the semi-conjugacy part.

If f is transitive, i.e., there is a dense orbit (fn(x))n, the set R \Bdoes not contain any interval. Moreover, B = R and H is a bijection.


Intuitively, one may expect that an orientation preserving diffeo-morphism with irrational rotation number will behave as an irrationalrotation. The above Theorem indicates that it is almost like this, sucha map is semi-conjugate to a rotation. To assure full conjugacy, weneed only a little more structure. This part of the theory was done 50years after Poincare by Denjoy in 1932.

Theorem 4.8 (Denjoy). Let f : S1 → S1 be a diffeomorphism withirrational rotation number τ(f) and with derivative f ′(x) of boundedvariation. Then f is transitive and hence, h is a conjugacy.

Proof. For the proof we will need the following lemma. It statesthat the asymptotic behaviour of all points on the circle is the same.

Lemma 4.5. For x, y ∈ S1 we have

ωf (x) = ωf (y).

Proof of Lemma 4.5. We want to show that ω(y) ⊂ ω(x). Bythe invariance of the ω–limit set (Theorem 3.8) y ∈ ω(x) implies ω(y) ⊂ω(x). So let us assume that y ∈ S1 \ ω(x). Also by Theorem 3.8 theω–limit set of x is compact. Hence its complement is open. But opensets on the circle are special: The set S1 \ ω(x) is the union of disjointintervals

S1 \ ω(x) =⋃n∈Z

In In = (an, bn) an, bn ∈ ω(x).

The intervals (an, bn) are the connected components of S1\ω(x). There-fore their endpoints must belong to ω(x). Since ω(x) is invariant itscomplement must also be invariant and hence, we have

f

(⋃n∈Z

In

)=⋃n∈Z

In.

Since f is a diffeomorphism the image of an interval (an, bn) is alsoan interval J that (by invariance) cannot contain a point of ω(x) andwhose endpoints are in ω(x). Therefore it must be one of the In:

f((an, bn)) = (am(n), bm(n))

and inductively

fk((an, bn)) = (am(n,k), bm(n,k)).

The forward images of (an, bn) cannot return to themselves, otherwisethe map

fk : (an, bn)→ (an, bn) = (a(m(n,k), b(m(n,k))

would have a fixed point and then by Theorem 4.5 τ ∈ Q. Hence,n 6= m(n, k) for all n and k 6= 0. Since the intervals (an, bn) are alldisjoint the orbit of y has at most one point in each interval In andhence ω(y) ∩ S1 \ ω(x) = ∅. Hence, ω(y) ⊂ ω(x). Exchanging the roleof x and y we get ω(x) ⊂ ω(y).

66 NLD-draft

Coming back to the Theorem, we note that the intervals In arecalled wandering intervals. They constitute the complement of theω–limit set of any point on the circle. In particular Ω = ω(x), i.e.,the set of non-wandering points coincides with this universal ω–limitset (ω(x) ⊂ Ω and In ⊂ (S1 \ Ω)). We proceed with the proof byshowing that assuming that f is not transitive and that f ′ is of boundedvariation leads to a contradiction.

If f is not transitive there exists a nonempty interval I0 = (a0, b0) ⊂S1 \ ω(x). By Lemma 4.5, the orbit

(fk(I0)

)k

of this interval consistsof disjoint intervals and the map

fk : I0 → fk(I0) = Ik

is a diffeomorphism between these two intervals.Now we set

φ(x) = log |f ′(x)|.Then φ has bounded variation because f ′ has bounded variation and|f ′| ≥ c > 0 on the compact set S1 (log z has bounded derivative oneach interval [a, b] with 0 < a < b <∞). Let

V = var(φ) = supxi∈S1

n−1∑k=1

|φ(xk+1)− φ(xk)|.

If x, y ∈ I0 then (fk(x), fk(y)) ∩ (f l(x), f l(y)) = ∅ for k 6= l and

V ≥n−1∑k=0

∣∣φ(fk(x))− φ(fk(y))∣∣ ≥ ∣∣∣∣∣

n−1∑k=0

φ(fk(x))−n−1∑k=0

φ(fk(y))

∣∣∣∣∣=

∣∣∣∣∣logn−1∏k=0

|f ′(fk(x))| − logn−1∏k=0

|f ′(fk(y))|

∣∣∣∣∣=

∣∣∣∣log

((fn)′ (x)

(fn)′ (y)

)∣∣∣∣ .Hence,

e−V ≤∣∣(fn)′ (x)

∣∣∣∣(fn)′ (y)∣∣ ≤ eV .

On the other hand, since fn is a diffeomorphism from I0 to In we havethat

infx∈I0

∣∣(fn)′ (x)∣∣ |I0| ≤ |In| ≤ sup

x∈I0

∣∣(fn)′ (x)∣∣ |I0|

by the Mean Value Theorem. This yields that

|In|+ |I−n| ≥(

infx∈I0

∣∣(fn)′ (x)∣∣+ inf

y∈I0

∣∣∣(f−n)′ (y)∣∣∣) |I0|

≥ |I0| infx,y∈I0

√∣∣(fn)′ (x) (f−n)′ (y)∣∣

≥ |I0|√e−V = e−V/2 |I0| .


We used that a+ b ≥√ab for a, b > 0. Hence,∑

n∈Z

|In| ≥∑n∈Z

e−V/2 |I0| =∞

which is a contradiction since the total length of the disjoint intervalsIn cannot exceed the length of S1.

Remark 4.3. If f is twice differentiable then it is of bounded vari-ation.

4.2.3.2. The Denjoy Example. We are going to construct an exam-ple of an C1+δ–diffeomorphism that is not transitive. This is due toDenjoy and later to Herman. We fix a sequence

ln = |Jn| = cδ(|n|+ 1)−1/δ

where cδ =(∑

n∈Z(|n|+ 1)−1/δ)−1

.Given an irrational α, the intervals Jn are arranged in the same

order on the circle as the orbit nα and we “blow up” a point of the orbitnα to an interval Jn and “squeeze” the complement of the intervals tokeep total length equal to one. This way the complement Ω of theintervals is a compact perfect disconnected subset of the circle andhence a Cantor set (the orbit nα is dense, this implies that for any twopoints of the circle there is an n such that nα ∈ (x, y); hence, there is aninterval In between the two points and Ω is totally disconnected). Fordetails about the Cantor set we refer to Appendix A.2.1 and Chapter 6.

A standard Denjoy map f is defined by mapping the interval Jndiffeomorphically to Jn+1, with derivative 1 at the endpoints (see fig-ure 4.6).

We can choose the map f : Jn → Jn+1 to have Holder continu-ous derivative of exponent δ. For n < 0 and (x, y) = Jn we have(f(x), f(y)) = Jn+1 and

|f(x)− f(y)| = |Jn+1| ≤ supz∈Jn|f ′(z)| · |Jn| .

This gives the following bound on the derivative (we need to make thederivative only slightly larger than the ratio of the length of the twointervals). Hence, we can find a diffeomorphism f : Jn → Jn+1 whosethe derivative at any point z ∈ Jn is at most

1 ≤ |f ′(z)| ≤(

1 +1

|n|

)|Jn+1||Jn|

=

(1 +

1

|n|

)cδ(|n+ 1|+ 1)−1/δ

cδ(|n|+ 1)−1/δ

≤(

1 +1

|n|

)(1− 1

n

)−1/δ

.

68 NLD-draft

J0

J1

J−1

J2

J3

0

α−α

2α

3α

Figure 4.6. A Denjoy map.

Hence we have for x, y ∈ Jn that

|f ′(x)− f ′(y)| ≤

∣∣∣∣∣1−(

1 +1

|n|

)(1− 1

n

)−1/δ∣∣∣∣∣

≤∣∣∣∣1− (1 +

1

n

)(1− 2

δ · n

)∣∣∣∣≤ D

1

|n|+ 1= D |Jn|δ ≤ D · |x− y|δ.

Similarly we can estimate the derivative for n > 0. This defines a mapin the class C1+δ. Moreover the intervals Jn are wandering. Hence, fis not transitive.

4.3. Exercises

Exercise 4.1. Compute the solutions of the systemxn+1 = axn + b, n ≥ 0x0 = X

where a and b are constants.

Exercise 4.2. Compute the solution of the systemxn+1 = β(xn)α, n ≥ 0x0 = X

4.3. EXERCISES 69

where β and X are positive numbers (Hint: Recast the problem as in theprevious exercise by a change of variables performed taking logarithms).

Exercise 4.3. Consider the function

f(x) =ax

1 + bx

where a > 1 and the discrete dynamical systemxn+1 = f(xn), n ≥ 0x0 = X

Through a change of variables of type:x = φ(z) = z/(1 + γz) (choose some adequate γ), the system becomeslinear in z. Solve the system.

Exercise 4.4. The linear map, xn+1 = rxn was advanced centuriesago by Malthus to model population grow.

1. Find the general solution for x0 > 0.2. For which values of r the solution presents exponential growth?3. For which values of r the solution presents exponential decrease?4. Is exponential growth biologically sensible? Under which limits?

A sometimes serious, sometimes less relevant flaw of this model is thatin population problems x is a (nonnegative) integer, not just any arbi-trary real number.

Exercise 4.5. The logistic map ([11.]), xn+1 = f(xn), where f(x) =rx(1− x), r ∈ (0, 4], x ∈ [0, 1], has received an extraordinary attentionin the past.

1. Explain how this map “corrects” the problem of exponentialgrowth in Malthus model.

2. The model assumes a maximal population K, the carrying ca-pacity (hidden in this formulation). Show that the non-integer“flaw” of Malthus model reflects here as x taking rational valuesin [0, 1] with denominator K.

3. Find the location of fixed point(s) of the map as a function ofr.

4. For what values of r can we use Banach Fixed Point Theoremto compute the fixed points?

5. Repeat the two last points for the first iterated map f f , i.e.,xn+1 = f(f(xn)).

Exercise 4.6. Consider the systemx = −y + x(1− x2 − y2)y = x+ y(1− x2 − y2)

This system may be easily analysed using polar coordinates. Performthe coordinate change and verify that the set Σ = r > 0, φ = 0 isa good Poincare Section. Compute Poincare’s “First-return map” by

70 NLD-draft

picking an initial condition on Σ and calculating its time-evolution upto t = 2π. Compute the fixed points of the map.

Exercise 4.7. Show that the system y′′ + (y2 + (y′)2 − 2) + y = 0has a periodic, non-constant solution (Hint: Polar coordinates).

Exercise 4.8. Show that the map Aα : T2 → T 2 given by

Aα(x, y) = (x+ α, x+ y) mod 1

is topologically transitive if α 6∈ Q.Is this map mixing if α 6∈ Q (cf. Example 3.3, Chapter 3) ?What happens for α ∈ Q?

Exercise 4.9. Let Φt be the billiard flow in the unit square, i.e.,the trajectories flow linearly inside the square and are reflected by theelastic law (“incoming angle equals outgoing angle”). Let us fix aninitial angle and consider the Poincare map with respect to one of thesides of the square. Show that this map is equivalent to a rotation.Hint: You can calculate the map directly in elementary trigonometry.Another way is to reflect the square to get 4 copies which constitute asquare of side–length 2. By reflection the trajectory is a straight linethrough the boundary.

Exercise 4.10. Let X be a compact, perfect (i.e., no isolated points)metric space. Show that when f : X → X is a transitive homeomor-phism (recall: homeomorphism: a continuous map that is invertible;transitive: there is a point x ∈ X such that the set fn(x) : n ∈ Z isdense in X), then there is a point y ∈ X such that already its forwardorbit is dense, i.e.,

fn(y) : n ∈ N = X.

Exercise 4.11. Let f : [0, 1)→ [0, 1) be given by

f(x) = 10x mod 1.

How does the set α(x) look like (α(x) is defined here as the set ofaccumulation points of all pre-images)? What is the set α(0)?

Exercise 4.12. Let f : [0, 1)→ [0, 1) defined by

f(x) = 2x mod 1.

Is there a point x ∈ [0, 1) such that ω(x) is countable infinite?Hint: Use the binary expansion of a real number. Try to identify therepeating patterns in this expansion

Exercise 4.13. Consider the system xn+1 = 4x3n− 3xn. Show that

periodic points are dense in [−1, 1] Hint: Let x = cos θ and considerthe map θ → 3θ on the unit circle.

CHAPTER 5

Stability of Dynamical Systems

5.1. Stability

We are now interested in understanding how a dynamical systemchanges when it is “slightly” modified. This is a very important ques-tion since in experimental situations nothing is known with absoluteexactness. We are always “approximating” in one way or the other.Either in the choice of our initial conditions, that can never be exactlymatched any closer than the accuracy threshold that we can achieve,or in the choice of the model. For example, parameters that are as-sumed to be fixed, such as temperature, will not be exactly the samein two runs of the same experiment, some variation, even if it is of onepart in a million, will be unavoidable. Therefore, we need to know howthese facts influence the behaviour of the system. This complex set ofquestions is called stability.

We may think of two kinds of stability.

1. How do nearby initial conditions for a given system behaveasymptotically?

2. What happens if one modifies (perturbs) the dynamical system?

The first question is known as dependence on initial conditions, namely,whether arbitrarily close initial conditions will separate as t → ∞ ornot. The second question is part of the topic of structural stability,considering what is the fate of a given initial condition under the influ-ence of “arbitrarily close” dynamical systems. This last problem maybe subsequently divided into a local and a global part, depending onwhether the modifications occur for a small region near a singular pointor for large portions of phase space. We will be more specific below.

Focusing on the first question, let p be a singular point, i.e., v(p) = 0and x = v(x). Then x(t) ≡ p is an integral curve of the system. If x0

is close to p, what happens to the trajectory x(x0, t) for t→∞? Doesit get closer to p, away from p or is it indifferent?

Example 5.1. Let U : Rn → R be a potential (energy) (see Fig-ure 5.1). The gradient field

x(t) = −∇U(x(t))

points into the direction of maximal decrease. Hence, the trajectoriesflow “downhill”.

71

72 NLD-draft

U(x1)

U(x2)

U(x3)

.

.

Figure 5.1. Dynamics under the influence of a potential.

Definition 5.1 (Lyapunov Stability). A singular point x0 is called(Lyapunov) stable if for any neighborhood V (x0) there exists an openset U(x0) ⊂ V (x0) such that for all y ∈ U(x) and for all t > 0, thetrajectory x(y, t) ∈ V (x0). If in addition limt→∞ x(y, t) = x0, then thesingular point is called asymptotically stable. Singular points thatare not stable are called unstable.

Question: When is a singular point asymptotically stable?In the case of a gradient flow as in Figure 5.1, the points x1 and x3

are stable because any nearby trajectory flows to these points. Thereare at the local minima of U(x) and there is no possibility to flowdownhill further on. However, the point x2 is not asymptotically stable.It is called a saddle point and there are many ways to continue to flowdownhill for nearby trajectories. The projected vector field looks like inFigure 5.2 (left) for x1 and x3, while for x2 it is depicted on Figure 5.2(right).

Let us now generalise this idea. We want to find a function whichmimics the role of a potential in such a way that we can decide whethera point is stable or not. Such a function is called a Lyapunov func-tion.

5.2. Lyapunov Functions

5.2.1. Inner Lyapunov Stability.

Definition 5.2. Let x0 be a singular point of the dynamical systemx = v and D(x0) a neighbourhood of x0. A C1 function H : D(x0)→ Ris called a Lyapunov function if H(x) ≥ 0, H(x) = 0 ⇐⇒ x = x0

5.2. LYAPUNOV FUNCTIONS 73

Figure 5.2. Local flow. Left: Near a stable fixed point.Right: Near a saddle point.

and

∇H · v ≤ 0 on D.

Remark 5.1. H simulates a potential for v. The inequality in thedefinition means

d

dtH(x(x0, t)) =

n∑i=1

∂H

∂xi

dxidt

= ∇H · v ≤ 0

i.e., the function H is non-increasing and in case of strict inequalityin D \ x0, it is strictly monotonically decreasing along trajectories.

Theorem 5.1 (First Lyapunov Stability Theorem). (a) If for asingular point x0 there exists a Lyapunov function, then this point is(Lyapunov) stable. (b) If in addition ∇H · v < 0 on D \ x0, then x0

is asymptotically stable.

Proof. (a) For any neighbourhood V (x0), consider the compactball B of radius r > 0 centered at x0, with r sufficiently small suchthat B ⊂ V (x0) ∩ D. Let 2ε > 0 be the minimum value of H(x) onthe surface ∂B of B (this surface is a compact set and hence H(x)attains a minimum), and U(x0) = x ∈ B : H(x) < ε. V (x0) andU(x0) are the neighbourhoods in the definition of stable fixed point.For y0 ∈ U(x0), x(y0, t) ∈ U(x0) for all t > 0 since H(x(y0, t)) is non-increasing along trajectories. Therefore, the trajectory exists foreverand x0 is stable.

(b) It remains to show that limt→∞ x(y0, t) = x0. Since H is in thiscase monotonically decreasing along trajectories, limt→∞H(x(y0, t))exists. We will show that this limit is zero. Assume that this limitis some value a > 0 instead. Then H(x(y0, t)) > a > 0 for all t > 0.Let δ > 0 be so small that H(x) < a if |x − x0| < δ. This implies|x(y0, t) − x0| ≥ δ for all t > 0. Let W = U(x0) ∩ x : |x − x0| ≥ δ.Then we have, for some negative b,

∇H · v < b < 0

74 NLD-draft

in the region W . This implies that

d

dtH(x(y0, t)) < b < 0

and after a finite amount of time H(x(y0, t)) = 0. This is a contradic-tion since x0 is the only point where the Lyapunov function vanishesand this point does not belong to W . Hence,

limt→∞

H(x(y0, t)) = 0

what implies (by continuity) that limt→∞ x(y0, t) = x0.

5.2.2. Gradient Systems. Gradient systems are dynamical sys-tems where the vector field is the gradient of some suitable functionU . This gradient is called the potential (recall Example 5.1). Let x0

be a minimum of the potential. Then ∇U(x0) = 0 and ∇U > 0 in anexcluded neighborhood D \ x0 of x0. Let us consider U itself as aLyapunov function. Hence,

∇U · v = −|∇U |2 < 0

on V \ x0. Hence U is a Lyapunov function and x0 is asymptoticallystable.

Example 5.2. Let x = Ax and A = diag(λ1, · · · , λn) is a diagonal(real–valued) matrix. We set

H(x) = x · x =∑

x2i

then ∇H = 2(x1, · · · , xn) and

∇H · v = 2x · Ax = 2∑

λix2i

Hence the solution x ≡ 0 is asymptotically stable if all λi < 0 (in thisexample we may say “if and only if”).

Theorem 5.2 (Second Lyapunov Stability Theorem). If p is a sin-gular point and the Jacobi matrix A = Dv(p) has only negative eigen-values then p is asymptotically stable.

Proof. After shifting the origin of coordinates to the singularpoint p, we write v(x) = A(x) ·x with A : Rn → GLn(R) and A(0) = A.We prove the theorem by producing an adequate Lyapunov function.Since A(0) is negative definite, we choose the same Lyapunov functionas in the previous Example. Then

∇H · v = 2x · A(x)x = 2x · A(0)x+ 2x · o(x)x < 0

in a neighborhood of the origin because of the continuity of A(x).

Remark 5.2. The result is valid also for matrices with complexeigenvalues, provided that their real parts are negative.

5.2. LYAPUNOV FUNCTIONS 75

Remark 5.3. The previous Theorem is in the flavour of structuralstability. The non–linear system v(x) is regarded as a perturbation ofthe linear system A(0)x. It will be important to establish under whatconditions this is possible.

Example 5.3.x = ax− y + k(x2 + y2)

y = x− ay + k(x2 + y2)

with a2 < 1, k < 0. The linearisation at the fixed point (0, 0) is

Dv(0) =

(a −11 −a

)and has eigenvalues λ1,2 = ±i

√1− a2. In this case, the linear system

alone is not enough to decide stability since the eigenvalues do not havenegative real parts. However, the nonlinear part of this system is suchthat there exists a Lyapunov function:

H(x, y) = x2 − 2axy + y2 = (x− ay)2 + (1− a2)y2

Then H > 0 ⇐⇒ (x, y) 6= (0, 0) and ∇H = 2(x− ay, y− ax). Hence,

1

2∇H · v =(x− ay)(ax− y) + (x− ay)kx(x2 + y2)

+ (y − ax)(x− ay) + (y − ax)ky(x2 + y2)

=k(x2 + y2)(x2 − 2axy + y2)

=k(x2 + y2)H(x, y) < 0

for (x, y) 6= (0, 0), and we have asymptotic stability.

5.2.3. Lyapunov Functions and Trapping Regions. Recallthe definition of Trapping Region in Definition 3.6.1 and Remark 3.14.Consider a dynamical system having an asymptotically stable fixedpoint x0 and a Lyapunov function that is strictly decreasing in somedomain D(x0). Take some other point y ∈ D(x0). Then a = H(y) is apositive number.

Proposition 5.1. The equation H(x) = a denotes the boundary ofa trapping region around x0.

Proof. Since H is positive definite, H(x) = a denotes an ellipsoid,containing the point x0 in its (strict) interior. Since x0 is asymptoticallystable H is strictly decreasing and hence ∇H · v < 0 on the surface ofthe ellipsoid (and everywhere outside x0).

Remark 5.4. Trapping regions are however broader than what theabove proposition suggests, since they can be defined without requiringthat the enclosed region should have as only singularity a stable fixedpoint. To have a trapping region is sufficient to find a closed surfacehomotopic to a sphere where the vector field points inwards, regardlessof which kind of dynamics one has on the inside.

76 NLD-draft

5.2.4. Asymptotic Stability for Maps. Definition 5.1 of sta-bility and asymptotic stability can be directly translated to maps, theonly difference being the use of “discrete time” instead along with a“substitute” for taking derivatives.

The concept of Lyapunov function and the first Lyapunov Theoremcan be restated for maps as follows. Consider a C1 map xn+1 = F (xn)with a fixed point x0. By a shift of the origin of coordinates, let uslocate this origin at the fixed point, so that x0 = 0, or in other wordsF (0) = 0.

Definition 5.3. Let 0 be a singular point and D a neighbourhoodof 0. A C1 function H : D → R is called a Lyapunov function ifH(x) ≥ 0, H(x) = 0 ⇐⇒ x = 0 and

∆H(x) = H(F (x))−H(x) ≤ 0 on D.

Remark 5.5. Again, the function H is non-increasing along tra-jectories and in case of strict inequality in D\x0, it is strictly mono-tonically decreasing along trajectories.

Theorem 5.3. (a) If for the singular point x = 0 there exists a Lya-punov function, then this point is stable. (b) If in addition ∆H(x) < 0on D \ 0, then 0 is asymptotically stable.

Proof. The proof is completely analogous to that of Theorem 5.1,only that the quantity to be monitored along trajectories is now ∆H(x).

The analogue of Theorem 5.2 is valid as well. By considering time–one maps, it is clear that its valid formulation has to be such thatstability for flows automatically translates into stability of time–onemaps and vice-versa. Consider the linear case to build up some intu-ition. A linear map generates a flow φt(x0) = eAtx0. The time–onemap is: xn+1 = eAxn. The origin is a fixed point of the linear vectorfield Ax and of the time–one map. Theorem 5.2 is satisfied if and onlyif the eigenvalues of eA have absolute value smaller than unity. Hence,we state:

Theorem 5.4 (Lyapunov Stability Theorem for Maps). If p is asingular point of the C1 map xn+1 = F (xn) and the Jacobi (linearisa-tion) matrix A = DF (p) has only eigenvalues of absolute value smallerthan unity, then p is asymptotically stable.

5.3. Exercises

Exercise 5.1.Use the Lyapunov function method to show that the origin is a stablefixed point for the system

x = −2xy − x3

y = x2 − y3

5.4. STRUCTURAL STABILITY 77

Exercise 5.2. Compute a Lyapunov function for the systems:(a) x+ ω2(x+ x3) = 0.(b) x+ αx+ ω2(x+ x3) = 0.Show a trapping region for any of these systems.

Exercise 5.3. Consider the following dynamical system.

x = ax− 2y + x2

y = x+ y + xy.

For which value(s) of the parameter a is the solution (x(t), y(t)) ≡ (0, 0)asymptotically stable ?

5.4. Structural Stability

In this Section we want to investigate the second issue of stabil-ity, namely what happens when we modify the dynamical system (thevector field). In particular, we might ask whether it makes sense to lin-earise a system (this may be seen as a drastic modification, i.e., replac-ing a given v(x) by its first–order Taylor expansion, v(x0) +a(x−x0)).A linearisation can be rephrased as the “best fitting” linear map. Sincelinear systems are completely understood, it would be very interestingto profit of them to understand non–linear systems. We would like tocarry over the linear information to the non–linear system. In this sec-tion we investigate when this is possible. Let us start with an example.

Example 5.4. Recall the presentation in Chapter 2 of linear sys-tems on R. Let two linear systems x = ax be given, with constantsa = λ and a = γ. We have seen before that the solutions of eachsystem, xi(t) = xi(0)eait, are similar iff both ai have the same sign.We will call such systems similar or conjugated, inspired in the con-siderations of Chapter 4 where we said that the orientation preservingdiffeomorphisms f (with irrational rotation number τ) and Rτ (therotation by an angle τ) were conjugated if there exists a bijective con-tinuous function h with continuous inverse, such that f h = hRτ . Aconjugacy may be defined between more general objects: maps, vectorfields, etc., may be conjugated. Let us see how conjugacy works forthese linear systems.

We will show that the systems x = λx and y = γy are conjugatedby the change of coordinates

h(x) =

sg(x)|x|γ/λ, x 6= 0

0, x = 0.

For x 6= 0, consider the dynamics of y = h(x). Outside the origin h isdifferentiable and hence

y =dh

dxx =

γ

λ|x|

γλ−1λx = γh(x) = γy.

78 NLD-draft

The dynamics coincides also at the origin, which is a fixed point bothfor y and x. The coordinate transformation h is not differentiable at theorigin for γ/λ < 1 but it is continuous, injective and with continuousinverse. How do trajectories map by h?

h(x0eλt) = sg(x0)|x0|

γλ eγt = y0e

γt.

Hence h maps trajectories on trajectories.

This example motivates the following definition.

Definition 5.4. Two vector fields v, w are topologically equiv-alent (we write v ∼ w(h)) if there is a continuous invertible map (ahomeomorphism) h : Rn → Rn (called a conjugacy) which maps tra-jectories into trajectories:

h(xv(x0, t)) = xw(h(x0), t).

Theorem 5.5. Let h be a conjugacy between v and w then

1. If x0 is a singular point so is h(x0),2. If xv(x0, t) is periodic so is xw(h(x0), t).

Proof. Since xv(x0, t) ≡ x0 for all t ∈ R we have xw(h(x0), t) ≡h(x0) for all t ∈ R. This proves the first statement. For the second weonly have to notice that if τ is a period of xv(x0, t) then

xw(h(x0), t) = h(xv(x0, t)) = h(xv(x0, t+ τ)) = xw(h(x0), t+ τ)

and therefore the trajectory xw(h(x0), t) is periodic with the same pe-riod.

Remark 5.6. Different kinds of equivalences between systems maybe defined, which are suitable for different circumstances. For examplewe could speak of linear equivalence (h is a linear map), etc.

Example 5.5. In this Example we will show topological equivalencebetween systems whose trajectories appear to be quite different at a firstglance. Let v(x, y) = (x, y) and w(x, y) = (x+y,−x+y). These vectorfields have phase portraits as in Figure 5.3.

The corresponding trajectories are

xv((x0, y0), t) = et(x0, y0)

and

xw((x0, y0), t) = et(x0 cos t+ y0 sin t,−x0 sin t+ y0 cos t).

We note that each regular trajectory has a unique point of intersectionwith the unit circle S1. Hence, if (x0, y0) 6= (0, 0) there is a unique timet(x0, y0) such that xv((x0, y0), t(x0, y0)) ∈ S1. Now we define a maph : R2 → R2 by h(0, 0) = (0, 0) and

h(x0, y0) = xw (xv((x0, y0), t(x0, y0)),−t(x0, y0)) .

5.4. STRUCTURAL STABILITY 79

S1

S1

v(x, y) = (x, y) w(x, y) = (y + x, y − x)

Figure 5.3. Two exponentially growing vector fields.

This means we flow to the unit circle along the trajectories of v and thenflow “backwards” on the corresponding trajectory of w (see Figure 5.4).This “flight time” (and the whole map h) can be computed explicitly foreach initial condition: 2t = − log (x2

0 + y20).

S1

(x0, y0)

xv(x0, y0), t)

xw(xv(x0, y0), t),−t)

trajectory of v

trajectory of w

Figure 5.4. A point (x0, y0) and its image by h.

This map is differentiable everywhere except at (0, 0) by the dif-ferentiable dependence on initial conditions. It is continuous at (0, 0)(t(x0, y0) → +∞ as (x0, y0) → (0, 0) and xw((x, y),−t) → (0, 0) ast → +∞) and invertible with a continuous inverse. hence it is acoordinate transformation. We want to show now that it maps tra-jectories on trajectories. For any given time t let t0 = t(x0, y0) and

80 NLD-draft

t1 = t(xv((x0, y0), t)). Then t = t0 − t1 and with p = (x0, y0)

h(xv(p, t)) = xw(xv(xv(p, t), t1),−t1) = xw(xv(p, t+ t1),−t1)

= xw(xw(xv(p, t+ t1),−t− t1), t) = xw(xw(xv(p, t0),−t0), t)

= xw(h(p), t).

The first equality is the definition of h, the second equality holds sincemoving a time t and subsequently a time t1 is the same as moving inone sweep for the total time t + t1 = t0. The third equality followssince moving a time t1 and subsequently a time −t1 corresponds to notmoving at all. The fourth equality uses that t + t1 = t0 and the lastequation follows again from the definition of h. This shows that thetwo vector fields are conjugated.

Definition 5.5. The vector field v is called structurally stableif any vector field w within some (sufficiently small) open ball aroundv is topologically equivalent to v.

Remark 5.7. We need to give a definition of neighbourhood for thespace of vector fields or simply of nearby vector field as well. To avoidcomplications, we consider vector fields defined on Rn and simplifysome details that may be chosen differently. The Cr open ball of radiusε > 0 around a vector field v consists of all vector fields w, such thatfor any compact set K of phase space, supx∈K |Dkv(x)−Dkw(x)| < ε,for k = 0, · · · , r (the vector fields as well as their first r derivatives lieclose in the sup–norm). For structural stability we may accept at thismoment r = 0, but soon we will use r = 1 (see below).

Remark 5.8. Topological equivalence is an equivalence relation,i.e., we have

• v ∼ v(Id) (reflexive)• v ∼ w(h) implies w ∼ v(h−1) (symmetry).• v1 ∼ v2(h1) and v2 ∼ v3(h2) implies v1 ∼ v3(h2 h1) (transitiv-

ity).

Therefore, all vector fields in a neighborhood of a structurally stablefield are structurally stable.

Example 5.6. The linear flow on the torus is not structurally stablebecause periodic (rational a) and non–periodic (irrational a) flows canbe arbitrarily close to each other.

Definition 5.6. v, w are locally topologically equivalent atx, y if the conjugacy h is defined only in a neighborhood of x and h(x) =y.

Remark 5.9. The concept of nearby vector field can be understoodlocally as well. Two vector fields are locally Cr–close at x0 if thereexists a compact neighbourhood of x0 where they are close in the senseof Remark 5.7.

5.5. HYPERBOLICITY – THE HARTMAN-GROBMAN THEOREM 81

Remark 5.10. If x, y are regular points for v, w respectively thenthe vector fields are locally topologically equivalent at x, y. This fol-lows from the Straightening–out Theorem since both vector fields areconjugate to the “horizontal” vector field.

This Remark indicates that the only interesting points for localtopological equivalence are the singular ones. Away from singularpoints, the local dynamics in small regions of phase space is essen-tially the horizontal vector field dynamics. The next Theorem dealswith the case of singular points. Note that here we have to proceedbeyond continuity and consider C1 vector fields.

5.5. Hyperbolicity – The Hartman-Grobman Theorem

In this Section we will put together some ideas from inner stabilityand from structural stability.

Definition 5.7. A hyperbolic singular point (or hyperbolicfixed point) for a C1 vector field is a singular point x at which thelinearisation Dv(x) has no purely imaginary eigenvalues.

Definition 5.8. A hyperbolic singular point (or hyperbolicfixed point) for a C1 map F is a singular point x at which the lin-earisation DF (x) has no eigenvalues of modulus one (on the complexunit circle).

It is interesting to notice that we have two different (but related)notions of hyperbolicity depending on whether we deal with flows ormaps. The same occurred with the notions of Lyapunov stability andasymptotic stability. This is a recurrent feature in the theory of dy-namical systems. Many “flow”–theorems have a map companion.

For hyperbolic fixed points of flows, the question of asymptoticstability has a simple translation. Using Theorem 5.2, we give thename stable (dropping the adverb asymptotically) to the hyperbolicfixed points where all eigenvalues of the linearisation have negativereal parts. Other names frequent in the literature are:

node For fixed points whose linearisation eigenvalues have either allpositive real parts (unstable node) or all negative real parts(stable node).

focus For nodes with complex eigenvalues.saddle For hyperbolic fixed points with both stable and unstable lin-

earisation eigenvalues.center For (non hyperbolic) fixed points where all eigenvalues lie on

the imaginary axis.

These ideas can be translated word by word to maps (now usingTheorem 5.4):

stable Hyperbolic fixed points with linearisation eigenvalues of abso-lute value smaller than one.

82 NLD-draft

node For fixed points whose linearisation eigenvalues have absolutevalue larger than one (unstable node) or smaller than one (stablenode).

focus For nodes with complex eigenvalues.saddle For hyperbolic fixed points with both stable and unstable lin-

earisation eigenvalues.center For (non hyperbolic) fixed points where all eigenvalues have

absolute value unity.

Theorem 5.6 (Hartman–Grobman). Let x0 be a hyperbolic singu-lar point of a vector field x = v(x) (respectively map xn+1 = v(xn)),i.e., the linearisation A = Dv(x0) satisfies Definition 5.7 (respectively5.8) above. Then v(x) and the linear field (map) A · (x−x0) are locallytopologically equivalent in a neighbourhood of the singular point.

Proof. Without loss of generality we may assume that x0 = 0,otherwise we apply a translation by the vector x0. We will only sketchthe basic ideas of the proof. Let us proceed in two steps. First, wenotice that out of two flows we can produce corresponding time–onemaps. Then we prove the Theorem for maps. Finally we realise thatit holds also for flows, since the map–proof shows that the time–onemaps of a hyperbolic flow and its linearisation are conjugated.

Let xv(·, 1) : Rn → Rn be the time–one map associated to the vec-tor field v. By the considerations about the dependence on initialconditions discussed in Chapter 2, we know that the derivative of thetime–one map fulfills the variational equation 2.4 (recall the commentin the introduction to time–one maps in Section 4.1.1). This is a linearequation and has the solution z(t) = eDv(0)tz(0). Hence, we can write

xv(x, 1) = Ax+ φ(x)

where φ is small in norm (and also zero at the origin, together with itsderivative) and A = eDv(0). By assumption, A has no eigenvalues onthe unit circle (since Dv(0) has no eigenvalues with zero real part). Letφ, ψ be two bounded, small (with sup norm less than ε > 0), Lipschitzcontinuous maps of Rn (vanishing at the origin together with their firstderivative) and A be hyperbolic (no eigenvalues on the unit circle).Then we will show that the two maps A + φ and A + ψ are “nicely”conjugated, i.e., there is a unique map H = Id +u, u bounded, smalland Lipschitz continuous, such that

(Id +u) (A+ φ) = (A+ ψ) (Id +u)

or

Au(x)− u(Ax+ φ(x)) = φ(x)− ψ(x+ u(x)).

Note that the following part of the proof holds irrespective of A arisingfrom a time–one map. First we remark that since A has no eigenvalues


of modulus 1, we can decompose

Rn = E− + E+ A(E−) = E− A(E+) = E+

where the invariant subspaces E− and E+ correspond to the eigenvaluesof modulus larger than 1 (E+ ) or smaller than 1 (E−), respectively.Then

‖A−‖ ≤ a < 1 ‖(A+)−1 ‖ ≤ a < 1 (5.1)

where A−,+ = A|E−,+ . We can now decompose all equations into their−part and +part (projections onto E− and E+).

Let us prove first that the operator

L(u) = Au− u(A+ φ) : C0b (Rn)→ C0

b (Rn)

is a linear invertible operator with norm of the inverse at most 11−a .

To do this, we will show that for any operator g there exists a uniquev such that Lv = g, and then we will estimate the norm of L−1. Westart by decomposing the equation via E− and E+:

A+v+ − v+ (A+ φ) = g+

A−v− − v− (A+ φ) = g−

Defining the auxiliary operators G−(v−) = A−v− (A + φ)−1 andG+(v+) = (A+)−1v+ (A + φ) we can rewrite the equations aboveafter some manipulation as:

(1−G−)v− = −g− (A+ φ)−1

(1−G+)v+ = (A+)−1g+

The good thing is that ||G±|| ≤ a, which guarantees (Theorem A.14)that the operators (1 − G±)−1 exist, and hence there is a unique so-lution v = (v+, v−) to the above equation. Moreover, ||L−1g||/||g|| =||v||/||g|| ≤ 1

1−a .Knowing what sort of operator L is, we can address the original

issue, namely to show that the operator equation L(u) = φ−ψ(I+u)has a unique solution u. We consider instead the equivalent equation

u = L−1[φ− ψ (I + u)] ≡ F (u).

We will show that F is a contraction and hence the above equation hasa unique solution. Indeed

||F (u)− F (v)|| = ||L−1[ψ (I + u)− ψ (I + v)]|| ≤ ε

1− a||u− v||

since ψ (and φ) have Lipschitz constant smaller than ε. Restricting thedomain of ψ (and φ) we can make ε as small as necessary in order toobtain ε

1−a < 1.Hence, there is a unique solution for L(u) = φ − ψ (I + u). The

map Id +u is invertible because we can also solve

(Id +v)(A+ ψ) = (A+ φ)(Id +v)

in a similar way, which gives the inverse to Id +u.

84 NLD-draft

Now we consider

A+ φ = xv(·, 1) A+ ψ = xw(·, 1).

Then Id +u = H conjugates these two time–one maps. We derive thefinal map by “averaging” it from time 0 till time 1:

h =

∫ 1

0

xw(−t, ·) (Id +u) xv(t, ·) dt.

To finish the proof we check the following lines saying that we reallyhave a conjugacy of the flow with the linear flow w = L = Dv(0).

xL(h(xv(x, s)),−s) = xL

(∫ 1

0

xL(H(xv(xv(x, s), t)),−t) dt,−s)

= xL

(∫ 1

0

xL(H(xv(x, s+ t)),−t) dt,−s)

=

∫ 1

0

xL (H(xv(x, t+ s)),−(t+ s)) dt

=

∫ s

s−1

xL (H(xv(x, u+ 1)),−u− 1) du

=

∫ s

0

xL(·,−u) [xL(·,−1) H xV (·, 1)] xv(x, u) du

+

∫ 0

s−1

xL(·,−u− 1) H xv(x, u+ 1) du

=

∫ s

0

xL(·,−u) H xv(x, u) du+

∫ 0

s−1

xL(·,−u− 1) H xv(x, u+ 1) du

=

∫ s

0

xL(·,−z) H xv(x, z) dz +

∫ 1

s

xL(·,−z) H xv(x, z) dz = h

Next we want to investigate the local structural stability of vectorfields. Since we just proved that we can reduce to the linear case if wehave a hyperbolic singularity we will start with linear fields.

Theorem 5.7. Two linear hyperbolic vector fields are conjugatedif and only if the number of eigenvalues with negative real parts (andhence also those with positive real parts) coincide.

Proof. We sketch the proof. The one–dimensional case was provenin Example 5.4, e.g.,

x = λx y = γy λ, γ < 0

are conjugated via h(x) = sg(x)xγ/λ. By the invariance of subspacesassociated to different eigenvalues (a basic result of Linear Algebra),the result holds for a general matrix with all eigenvalues distinct. It


remains to show that a matrix with an eigenvalue λ > 0 of multiplic-ity k > 1 generates a flow that is conjugated to that of the identitymatrix of dimension k. Such proof was given by Ladis in 1973, us-ing similar ideas to those inspiring Example 5.5, namely considering a(k− 1)–dimensional sphere centered in the origin. All trajectories out-side the origin in both vector fields intersect the corresponding sphereonly once. Hence, a conjugacy between both systems exist and it canbe constructed computing the intersection time, as it was done in Ex-ample 5.5.

This Theorem is the hyperbolic version of a slightly more generalTheorem by Ladis:

Theorem 5.8 (Ladis). Two linear vector fields are conjugated ifand only if the number of eigenvalues with negative real parts (andhence also those with positive real parts) coincide and the blocks withpurely imaginary eigenvalues coincide (up to a positive proportionalityconstant).

Since the imaginary–eigenvalues block defines an invariant subspaceof phase space, if those blocks coincide and the orthogonal complementsatisfies the previous Theorem, then Ladis Theorem follows immedi-ately.

Remark 5.11. We recall that in general, the conjugacy h is notdifferentiable at x = 0.

Corollary 5.1. If p is a hyperbolic singular point. Then v islocally structurally stable at p.

Proof. If two vector fields are close and one has a hyperbolic sin-gular point so has the other (and it is close to the first one). Hence,both vector fields are conjugated to their respective linearisations byHartman-Grobman Theorem. Moreover, both linearisations have thesame index (the index does not change under small perturbations).Hence, they are conjugated by Ladis Theorem.

Note that here we have used the notion of structural stability ina local way, namely we use “locally conjugate” instead of just “conju-gate”.

Remark 5.12. The Theorems of Hartman–Grobman and Ladis helpto give a complete local understanding of the dynamical behaviour nearhyperbolic fixed points, even for a parametric family of vector fieldsor maps. Indeed, for hyperbolic fixed points, the linearisation matrixsuffices to establish if the point is stable, unstable or saddle. Hartman–Grobman theorem suffices to establish the local behaviour of phase–spacetrajectories, since trajectories for linear systems can be completely com-puted. Finally, “nearby” dynamical systems obtained by continuously

86 NLD-draft

varying some system parameter(s), will, by Ladis Theorem, have con-jugate dynamics, as long as all linearisation eigenvalues remain hyper-bolic (assuming of course that the dependency with parameters is suchthat nearby systems have a corresponding fixed point). Hence, knowingthe qualitative dynamics for one member of the family via Hartman–Grobman theorem is enough to understand the qualitative dynamics forall nearby hyperbolic systems.

Moreover, the Straightening-out Theorem gives a qualitative under-standing of the dynamics far away from singular points. Hence, a sig-nificant class of dynamical systems may be qualitatively understood justby adequately using these three theorems.

5.6. Exercises

Exercise 5.4. Let x = A · x and y = B · y be C1–conjugated (i.e.,the conjugating map is differentiable and the inverse also). Show thatthen the matrices A,B have proportional eigenvalues.

Exercise 5.5. Let f : U(p) → R be a C1–function, x = v(x) andv(p) = 0. Assume that f(p) = 0, d

dtf(x(x0, t)) > 0 for x0 ∈ U(p) \ p.

Moreover there is a sequence xn → p such that f(xn) > 0. Thenx(t) ≡ p is unstable.

Exercise 5.6. Let v = −∇f , f ∈ C2(Rn,R). Then p ∈ Rn isa hyperbolic singular point if and only if df(p) = 0 and d2f(p) is anon–degenerate bilinear form.

Exercise 5.7. Use the Poisson integration formula for the circleto show that if an electrostatic potential on a circle is constant then thepotential is constant everywhere at inside the circle.

Exercise 5.8. (a) Compute the fixed points for the system xn+1 =f(xn), where f(x) = µx(1 − x). Study their stability as a function ofµ ∈ [0, 4] (consider yourself satisfied with a few typical (µ-values).(b) Repeat for the period-2 points, i.e., the additional fixed points off f .(c) Try to show that there exists a period three orbit, by studying thefixed points of f f f near µ = 1 + 2

√2. For which values of a the

fixed point is stable?

Exercise 5.9. (a) For the system of the previous Exercise, computefor which values of µ the fixed point outside the origin has negativeJacobian.(b) What is the value of the Jacobian when the system xn+1 = f(f(xn))develops two new fixed points? Compute the stability of these points.

Exercise 5.10. Analyse the stability for the fixed point of the sys-tems x = x2 och x = x3. Note that linear stability analysis is notenough.

5.6. EXERCISES 87

Exercise 5.11. Compute fixed points and stability for(a) x+ sinx = 0.(b) x+ εx− x+ x3 = 0.(c) x = x2 − x, y = x+ y.

CHAPTER 6

Center Manifold Theory and Local Bifurcations

6.1. Introduction

We have investigated several kinds of asymptotic behaviour. Manyof the systems considered so far were structurally stable, i.e., smallperturbations on the vector field or map did not change the systemdrastically.

However, this is not always the case. It is known that many sys-tems present qualitatively different behaviour when certain changesare operated on them. Consider for example the Lorenz equations. For0 < r < 1 there exists only one singular point, while for r > 1 thereare three such points: the “old” stable fixed point becomes unstable,while a couple of stable, symmetry related fixed points shows up. Adrastic transition has occurred for r = 1. One of our goals is to be ableto address such changes.

The simplest way to study transitions in the qualitative dynamicsis to consider parametric families of systems which go from one regimeto another while changing the parameter. At some special parametervalues the qualitative (asymptotic) dynamics changes drastically. Wecall such value a bifurcation point and the general study of suchchanges is called bifurcation theory.

There is an interest from applications in studying bifurcation prob-lems, since the parameters of a differential equation or map usuallyrepresent a given choice of experimental conditions, e.g., constant tem-perature T0 or constant amplitude β of an external electromagneticfield in coupled laser devices, the reproduction rate of a population,etc. To some extent, these constant conditions may be altered fromexperiment to experiment until eventually the asymptotic behaviour ofthe system is deeply modified.

Apart from Lorenz equations already mentioned, we will mentiontwo more examples in this Section, focusing on the effective changes.Consider first the logistic map

x −→ ax(1− x)

on the unit interval. This map has been widely studied along the years,starting perhaps with Volterra’s population models of the 1920’s, up toour days. It is not completely understood but many results are knownin detail.

89

90 NLD-draft

We have seen that for small values of the parameter a, the singularpoint x = 0 is attracting (Exercise 5.8). By checking the locus of thefixed point and the derivative of the map, we can establish its stabilityfor most parameter values using Theorem 5.6, the Hartman–GrobmanTheorem. Increasing a, a second fixed point appears and at the sameparameter value the stability of x = 0 changes (we could say that sta-bility is “transferred” to the second fixed point when it appears, likein a relay-race). Subsequently a periodic orbit of period–2 appears (aninvariant set of the map consisting of exactly two points, that map ontoeach other) branching out of the second fixed point (which still exists).Again, the fixed point becomes unstable, while the period–2 inheritsits stability. Subsequently, a period–4 branches out from the period–2and we run through a sequence of period–doublings, where the stabilityis transferred consequently from one new invariant set to the next thatbranches out: first the period–2, then period–4, period–8, · · · becomeattracting periodic orbits. Analytic calculations become very involvedpretty soon. The values of a where these transitions take place accu-mulate monotonically towards a point near a = 3.57. At that value,we arrive at a regime where the attractor has infinitely many points.Increasing this parameter further one sees again periodic attracting or-bits accumulating on chaotic sets. At a = 1 + 2

√2 the attractor is

a period 3 orbit (Exercise 5.8 again), which subsequently undergoes a“period–doubling” cascade through periods 3 · 2n. When a = 4 thesystem develops complete chaos meaning that it contains (unstable)periodic orbits of all periods and it is sensitive with respect to initialconditions (in this case, the system is conjugated to the doubling map).In this way, the logistic map exhibits all kind of bifurcations, it develops“from order to chaos”.

The second example will be treated in detail in Chapter 8. Considera hyperbolic saddle fixed point p of a map whose stable and unstablemanifolds coincide (see Figure 6.1, left). Intuitively, we may think that

p

Ws(p) = Wu(p)

p Ws(p)

Wu(p)

Figure 6.1. A separatrix splitting.

the stable and unstable manifolds of the fixed point each have a “life ofits own”, i.e., that small changes in the equations (e.g., small changes insome parameters present in the equations) alter these manifolds differ-ently or independently. Therefore, the fact that they coincide is highly

6.2. THE BIFURCATION PROGRAMME FOR FIXED POINTS OF FLOWS 91

unlikely for a random choice (of parameters) and it puts specific con-straints on the problem. However, it does occur in interesting problemsor in some approximation to them.

If we perturb the system slightly, it can be possible to arrange aseparatrix splitting that creates a transversal homoclinic point (seeFigure 6.1, right). The two curves no longer coincide, and eventuallythey may cross transversally for some choice of perturbation. As wewill further discuss in Chapter 8, this situation forces the existence ofhighly complicated dynamics, since one crossing of the manifolds forcesthe existence of infinitely many other crossings.

The Lorenz transition and the period–doubling cascade of the logis-tic map are examples of local bifurcations, namely those where thedynamical changes are confined to a local region of phase space closeto the associated fixed point. Separatrix splittings are an example ofglobal bifurcations, where the changes on phase space are extended(global).

In general, bifurcations are too complicated to be understood infull. However, much insight is available without the need of a highlyelaborated mathematical structure, starting by the study of the changeof stability of fixed points while varying some system parameters.

In this Chapter, we will focus on techniques to simplify the studyof local bifurcations (Centre Manifold Theory, inspired in [3.]), leavingfor the next Chapter the specific computation of the simplest localbifurcations. We will finally address global bifurcations and chaos inChapter 8.

6.2. The Bifurcation Programme for Fixed Points of Flows

The setup for our study is a family of dynamical systems x = f(x, µ)defined on Rn (or on a compact ball of Rn when necessary) and de-pending on only one real parameter µ. We will assume that f is of classCk in both x and µ, for k large enough. This means that whenever weneed to take one or many derivatives of f , such computation will makesense.

Let us recall some results from Chapter 5, especially Remark 5.12.Far away from fixed points, the Straightening–out Theorem assuresthat the dynamics will be simple: Up to a coordinate transformation,flow lines are parallel “straight” lines. It is in the local vicinity of fixedpoints where most of the structure becomes visible. The Theorems ofLadis (Theorem 5.8) and Hartman–Grobman (Theorem 5.6) help usto address the problem. Suppose we have a hyperbolic fixed point forsome value of the parameter: f(x0, µ0) = 0. Hyperbolicity implies thatno eigenvalue of the Jacobian at the fixed point has zero real part, inparticular, no eigenvalue is zero and detDxf(x0, µ0) 6= 0. Hence, by theImplicit Function Theorem (Dini’s Theorem) of Calculus, there exists aunique curve of fixed points x(µ), with x(µ0) = x0, at least for a small

92 NLD-draft

parameter interval around µ0. Since we assumed Ck differentiability,the curve will be smooth and also the Jacobian Dxf(x(µ), µ) will bea smooth matrix function of µ and hence still hyperbolic for a suffi-ciently small interval around µ0 (as long as no real part of an eigenvaluechanges sign). Hartman–Grobman Theorem assures that the local dy-namics along the curve x(µ) is equivalent to the linearised dynamics.Ladis Theorem assures that all linearised dynamics are equivalent, solocally, the dynamics is one and the same (except for a coordinatechange) along the curve, as long as all invoked theorems hold.

The dynamical changes can only arise if the hyperbolicity conditionbreaks down. We still would like to describe the problem in such a waythat the dynamical changes are separated from other modificationsthat can be compensated for with e.g., a diffeomorphism or a change ofcoordinates. It is reasonable to ask whether there is a coordinate choicemaking some sort of splitting of phase space into an unaffected part anda bifurcating part. We will first address the dynamical situation for onedynamical system with a non-hyperbolic fixed point and subsequentlyextend the method to study parametric families of dynamical systems.The situation is described by the following Theorem, which we statewithout proof:

Theorem 6.1 (Sositaisvili). Suppose that a differential equation inRn with C2 right hand side having a singular point at the origin can bewritten in the following way

x = Ax+ g1(x, y, z)

y = B+y + g2(x, y, z)

z = B−z + g3(x, y, z),

where A, B+ and B− are the (Jordan) blocks of the linearisation ma-trix corresponding to eigenvalues having zero, positive and negative realparts respectively and the functions g1, g2 and g3 are zero at the originalong with all their first derivatives. Then, in a neighbourhood of thesingular point 0, the equation under consideration is locally topologi-cally equivalent to the following system:

x = A x+ f(x)v = B+(x) vw = B−(x) w

(the equivalence is in fact Ck) where f(0) = f ′(0) = 0 and B±(0) arethe original blocks B±.

Remark 6.1. The intuition behind this Theorem is as follows. Thevw-part may be seen as a hyperbolic system parametrised by x and hencelocally conjugated to its linear part. The crucial part of the Theorem isthe change of coordinates, separating vw from x, where the substantialdynamical changes take place.

6.3. CENTRE MANIFOLD THEOREMS 93

Instead of proving Sositaisvili’s Theorem, we will get a closer lookat this separation and its consequences, proving other, more detailed,theorems.

6.3. Centre Manifold Theorems

Let us begin the analysis by considering motivating examples.

6.3.1. Dynamics on an Invariant Manifold. Consider the fol-lowing system defined in R2:

x = ax3

y = −yThe coordinate axes I1 = y = 0 and I2 = x = 0 are invariant setsfor the dynamics, since any initial condition on e.g., I1 is of the type(x0, y0) = (x, 0), which plugged into the equations yields a solution inwhich y is identically zero. Hence, orbits with initial condition on I1

do not leave I1. The same holds for I2. Both sets are examples ofinvariant manifolds. They are of course invariant sets and they arealso manifolds (in this example each set is equivalent to R1). The setI2 can be recognised as belonging to the stable invariant set (stablemanifold) of the fixed point at the origin (the nature of I1 depends onthe value of a). More generally, we have the following

Definition 6.1. A smooth function y = h(x), y ∈ Rp, x ∈ Rq is aninvariant manifold of a dynamical system defined in Rp+q if the dy-namics of an initial condition (x0, h(x0)) does not abandon the graph ofthe function along the dynamics, i.e., if φt(x0, h(x0)) = (x(t), h(x(t))).

Proposition 6.1. The system

x = yy = 0z = −z + x2 + y2

has an invariant manifold z = h(x, y) such that h(0, 0) = 0 and∇h(0, 0) = 0.

Proof. The first two equations decouple from the third and havethe solutions x(t) = x0 + y0t, y(t) = y0. We look for a function z(t) =h(x0 + y0t, y0) satisfying the equation

dh

dt= −h+ x2 + y2.

We seek for a solution that does not lie along the stable manifold ofthe fixed point (the z axis) but one that is tangent to the xy-plane atthe origin (the non-hyperbolic local directions near the origin). Sincethe stable manifold near the origin behaves as z(t) = e−t, we imposeto h the additional condition

limt→−∞

h(x0 + y0t, y0)et = 0.

94 NLD-draft

After multiplying the differential equation by et and performing partialintegration we obtain h(x, y) = (x − y)2 + 2y2. The reader can verifythat it is invariant for the dynamics and tangent to the xy-plane at theorigin.

Remark 6.2. The manifold h(x, y) is an example of a centremanifold, an invariant manifold tangent to the non-hyperbolic lin-earised subspace at the fixed point.

Consider the coordinate transformation (x, y, z)→ (x, y, w), wherew = z − h(x, y). The dynamics in the new coordinates reads

x = yy = 0w = −w.

to be compared with the statement in Theorem 6.1 (Sositaisvili). Indeed,the separation provided by Sositaisvili’s Theorem requires the identifica-tion of a manifold, depending in this example on the xy-coordinates as-sociated to a non-hyperbolic linearisation, where all the local dynamicalcomplexity takes place. The other directions of phase space, transversalto this manifold, behave locally in an essentially linear fashion. Theformalisation of this idea is given by the following three theorems.

6.3.2. Centre Manifold Theorems. We will state and prove thetheorems for the case of a fixed point at the origin with empty unstablelinearisation subspace. The general case is a more or less straightfor-ward generalisation.

Theorem 6.2. Let f, g be C2 functions vanishing at the originalong with their derivatives, the n× n matrix A have eigenvalues withzero real part and the m×m matrix B have eigenvalues with negativereal parts. The system

x = Ax+ f(x, y)

y = By + g(x, y) (6.1)

has a local C2 invariant (centre) manifold y = h(x), |x| < δ, withh(0) = 0, Dh(0) = 0. The flow on the centre manifold is governed bythe equation

x = Ax+ f(x, h(x)). (6.2)

Proof. The last statement follows from the first, since if we havean invariant manifold y = h(x), then on the manifold the first equa-tion 6.1 becomes eq.(6.2) while the second equation 6.1 implicitly de-fines the manifold as

dh

dxx =

dh

dx(Ax+ f(x, h(x))) = Bh(x) + g(x, h(x)) (6.3)

The proof proceeds by recasting this second equation as a fixed pointproblem.


First we replace our problem by a locally equivalent problem van-ishing outside a compact region in x. Let φ(x) be a C∞ function withφ(x) = 1 for |x| ≤ 1 and φ(x) = 0 for |x| ≥ 2. Further, for ε > 0 con-sider F (x, y) = f(xφ(x

ε), y) and G(x, y) = g(xφ(x

ε), y). We will prove

the theorem for the system

x = Ax+ F (x, y)

y = By +G(x, y),

for small enough ε. This system coincides with eq.(6.1) in a smallneighbourhood of the origin.

For q > 0 consider the space X of functions h : Rn → Rm withLipschitz constant p > 0 such that h(0) = 0 and |h(x)| ≤ q. X is acomplete metric space, using the sup norm. We will analyse our systemwhen y is some function from X.

Let x(t, x0, h) be the unique solution of

x = Ax+ F (x, h(x)), x(0, x0, h) = x0, (6.4)

for h ∈ X. This solution exists for all t ∈ R because of the bounds onF and h. We can consider now an operator T : X → X defined by

T (h(x0)) =

∫ 0

−∞e−BsG(x(s, x0, h), h(x(s, x0, h))ds

If h is a fixed point of T then y = h(x) is an invariant manifold of thesystem and we are almost finished. We will proceed now by showingthat T is a contraction on X for small p, q and ε. We will estimatebounds for all the ingredients entering the definition of T and finallyverify that it is possible to have the Lipschitz constant of T smallerthan unity.

Since F and G are C2 functions vanishing at the origin along withtheir derivatives, we have that for some continuous function k(ε) withk(0) = 0,

|F (x, y)|+ |G(x, y)| ≤ εk(ε)

|F (x, y)− F (x′, y′)| ≤ k(ε) (|x− x′|+ |y − y′|)|G(x, y)−G(x′, y′)| ≤ k(ε) (|x− x′|+ |y − y′|)

for any x, x′ ∈ Rn and |y|, |y′| < ε. Also, there exist positive constantsβ, C such that for s ≤ 0 and any y, |e−Bsy| ≤ Ceβs|y| and finally forr > 0 and some constant M(r) we have that |eAsx| ≤ M(r)er|s||x|.For q < ε any function h we consider will fit these bounds and we canproceed with our estimates of the ingredients in the integral definingT . We start with

|T (h(x0))| ≤ C

βεk(ε).

96 NLD-draft

From eq.(6.4), the solution x(t, x0, h) reads

x(t, x0, h) = eAt(x0 +

∫ t

0

e−AsF (x(s, x0, h), h) ds

)hence, for t ≤ 0 (this is what we need, considering the definition of T ),

|x(t, x0, h)− x(t, x1, h)| ≤M(r)e−rt|x0 − x1|

+ (1 + p)M(r)k(ε)

∫ 0

t

er(s−t)|x(s, x0, h)− x(s, x1, h)| ds

(we use the bounds on F , h and eAtx described above). Hence,

ert|x(t, x0, h)− x(t, x1, h)| ≤M(r)|x0 − x1|

+ (1 + p)M(r)k(ε)

∫ 0

t

ers|x(s, x0, h)− x(s, x1, h)| ds

Using Gronwall’s inequality (see Appendix A.1.3) with the functionsψ(t) = 1 and φ(t) = ert|x(t, x0, h)− x(t, x1, h)|, we have

ert|x(t, x0, h)− x(t, x1, h)| ≤M(r)|x0 − x1|e(1+p)M(r)k(ε)|t|

and finally, letting γ = r + (1 + p)M(r)k(ε),

|x(t, x0, h)− x(t, x1, h)| ≤M(r)|x0 − x1|e−γt.The next step is to use the above preparation to estimate the size

of |T (h(x0))− T (h(x1))|. Using the definition of T and the bounds forG and |e−Bsy| we obtain

|T (h(x0))− T (h(x1))| ≤ Ck(ε)M(r)(1 + p)

β − γ|x1 − x0|

provided that ε and r are small enough so that β > γ. Note that thereis no restriction here, since for a given small r, ε can be chosen inde-pendently so that k(ε) is sufficiently small (even after multiplicationby (1 + p)M(r)) and hence γ can be made as small as needed. Thisestimate is required to verify that T (h) is also a Lipschitz function withLipschitz constant smaller than ε, which means that T is an operatormapping elements of X to other elements of X.

Now we need another estimate for x in order to address the finalcomputation. Again, for t ≤ 0,

|x(t, x0, h1)− x(t, x0, h0)|

≤M(r)

∫ 0

t

er(s−t)|F (x(s, x0, h1), h1)− F (x(s, x0, h0), h0)| ds

≤M(r)k(ε)

∫ 0

t

er(s−t) (|x(s, x0, h1)− x(s, x0, h0)|+ ||h1 − h0||) ds

Letting now φ(t) = ert|x(t, x0, h1)− x(t, x0, h0)|, we write

φ(t) ≤ M(r)k(ε)

r||h1 − h0||+M(r)k(ε)(1 + p)

∫ 0

t

φ(s) ds


(the factor (1 + p) may not seem necessary, but it renders the estimatesimilar to the previous one) and using Gronwall’s inequality again, weobtain (same γ as in the previous estimate)

|x(t, x0, h1)− x(t, x0, h0)| ≤ M(r)k(ε)

r||h1 − h0||e−γt.

Now we are ready to compute the Lipschitz constant of T :

|T (h1(x0))− T (h0(x0))| =

=

∫ 0

−∞e−Bs(G(x(s, x0, h1), h1)−G(x(s, x0, h0), h0)) ds

≤ Ck(ε)

∫ 0

−∞eβs(M(r)k(ε)

re−γs||h1 − h0||+ ||h1 − h0||

)ds

≤ Ck(ε)

(1

β+M(r)k(ε)

r(β − γ)

)||h1 − h0||.

This constant can be made smaller than unity by taking r and ε suffi-ciently small. Hence, the unique fixed point of T is a Lipschitz centremanifold. Moreover, the operator T also is a contraction restricted toLipschitz differentiable functions, hence the fixed point is C1. To provethat h is also C2, note that both F and G are C2 and hence the trajecto-ries are C2 regarded as functions of their initial condition. Therefore, Tis a contraction operator also when restricted to C2 functions h(x).

The method of proof of the previous Theorem, namely that of re-casting the problem as a fixed point equation is used in the follow-ing two Theorems as well. Although the underlying idea is relativelysimple, the estimates to compute the Lipschitz constant can get veryinvolved.

Theorem 6.3. (a) If the zero solution of eq.(6.2) is stable (asymp-totically stable, unstable) then the zero solution of eq.(6.1) is stable(asymptotically stable, unstable).

(b) Suppose the zero solution of eq.(6.1) is stable. Then for x0, y0

sufficiently small there exists a solution u(t) of eq.(6.2) such that x(t) =u(t) + O(e−γt) and y(t) = h(u(t)) + O(e−γt) for t → ∞, where γ is apositive constant depending only on B.

This Theorem completes the picture given by Sositaisvili’s Theo-rem, since the dynamics outside the Centre manifold converges to themanifold exponentially in time.

We skip the proof of the above Theorem and focus directly on thenext one, where a method of approximating the Centre Manifold ispresented.

Theorem 6.4 (Approximation to the Centre Manifold). Supposethat φ(0) = 0, Dφ(0) = 0 and that (cf. eq.(6.1))

Dφ(x) (Ax+ f(x, φ(x)))−Bφ(x)− g(x, φ(x)) = O(|x|q)

98 NLD-draft

as |x| → 0 with q > 1. Then as |x| → 0 the centre manifold satisfies

|h(x)− φ(x)| = O(|x|q).

Proof. Since we are looking for an approximation near the origin,let θ be a C1 function with compact support (i.e., vanishing outside aball) that coincides with φ for |x| sufficiently small. We build with thisfunction the approximant

Nθ(x) = Dθ(x) (Ax+ F (x, θ(x)))−Bθ(x)−G(x, θ(x)),

where F and G are as in Theorem 6.2. Note that Nθ(x) = O(|x|q) nearthe origin, while for the center manifold we have Nh = 0. Considernow the mapping S : Y → X, defined as S(z) = T (z + θ) − θ. HereT : X → X is the operator of Theorem 6.2 and the function z(x) istaken from a closed subset Y of X defined as

Y = z ∈ X : |z| ≤ K|x|q, x ∈ RnIf we manage to specify the constant K in such a way that S(Y ) ⊂ Y ,then S is a contraction mapping and has a unique fixed point. In such acase, z+ θ = h, is the unique fixed point of T and consequently, withinthe support of θ we have, |h(x)−φ(x)| = |h(x)−θ(x)| ≤ |z(x)| ≤ K|x|qand the Theorem is proved.

Let us begin by rewriting S using the definition of T . For z ∈ Y letx(t, x0) be the solution of x = Ax+F (x, z(x)+θ(x)) with the conditionx(0, x0) = x0. Then,

T (z + θ)(x0) =

∫ 0

−∞e−BsG(x(s, x0), z(x(s, x0)) + θ(x(s, x0)))ds

First, we recast θ(x0) as,

θ(x0) =

∫ 0

−∞

d

ds

(e−Bsθ(x(s, x0))

)ds

=

∫ 0

−∞e−Bs

(d

dsθ(x(s, x0))−Bθ(x(s, x0))

)ds

The expression in brackets can be computed as (we do not write thedependency on s, x0 to lighten the notation):

Bθ(x)− d

dsθ(x) = Bθ(x)− θ′(x) (Ax+ F (x, z(x) + θ(x)))

= −Nθ(x)−G(x, θ(x)) + θ′(x) (F (x, θ(x))− F (x, z(x) + θ(x)))

Now we collect the above expressions into

(Sz)(x0) = T (z + θ)(x0)− θ(x0) =

∫ 0

−∞e−BsQ(x(s, x0), z(x(s, x0)))ds

where

Q(x, z) = G(x, θ + z)−G(x, θ)−Nθ(x) + θ′(x)(F (x, θ)− F (x, z + θ)).


Next step is to estimate the size of different quantities as in Theo-rem 6.2. We use indeed the same Lipschitz constants for F and G. Wehave, for some constant R > 0,

|Q(x, z)| ≤ |Q(x, 0)|+ |Q(x, z)−Q(x, 0)| = |Nθ(x)|+ k(ε)|z(x)|≤ (R +Kk(ε))|x|q.

Following Theorem 6.2 again, the computations starting in eq.(6.4) andGronwalls inequality, we have, for t ≤ 0 and γ = r + 2M(r)k(ε),

|x(t, x0)| ≤M(r)|x0|e−γt.With the two previous estimates and the bound |e−Bsy| ≤ eβs|y|

for s ≤ 0 (also from Theorem 6.2), we can estimate

|(Sz)(x0)| ≤ C

∫ 0

−∞eβs(R +Kk(ε))M(r)q|x0|qe−qγs.

Hence,

|(Sz)(x0)| ≤(C(R +Kk(ε))

M(r)q

β − qγ

)|x0|q = C1|x0|q.

Taking ε and r sufficiently small we can assure β− qγ > 0 and further,since k(ε) goes to zero with ε independently of the other bounds, wecan have C1 ≤ K, and the Theorem is proved.

Remark 6.3. The last three theorems have a fundamental impor-tance to understand the qualitative dynamics near a non-hyperbolicfixed point with stable local hyperbolic subspace. The dynamics of initialconditions sufficiently close to the fixed point “mimic” the dynamics onthe Centre Manifold (“differences” among both dynamics decrease ex-ponentially). On this manifold, the dynamics is described by a reducedequation (in a lower dimensional space), and it can be approximated bya polynomial vector field. Hence, these three theorems accomplish theprogramme of Sositaisvili’s Theorem, providing additionally an approx-imation scheme for the computation of h(x).

Remark 6.4. The Centre Manifold need not be unique. Indeed,Theorem 6.2 starts by replacing the original problem with a truncation.Within the truncation, the fixed point theorem is used and solutions areunique, but the arbitrariness of the truncation allows us to constructdifferent Centre Manifolds for the same situation. However, if h1(x),h2(x) are two Centre Manifolds, because of Theorem 6.4 we have that|h1(x) − h2(x)| = O(|x|q) as x → 0 for any q > 1. If f and g are Ck,k > 2, the Centre Manifold is also Ck, however if f and g are analytic,the Centre Manifold need not be analytic.

Example 6.1. Let us illustrate the ideas behind the centre manifoldfor the case with no unstable directions in the linearisation. In Figure6.2 a true orbit of the system is depicted in full line. The vertical

100 NLD-draft

True orbit

Stable manifold

Projected orbit

Centre manifold

Figure 6.2. Schematic view of a true orbit and its“companion”, the projected orbit on the centre manifold.

lines represent the product of this orbit with the (1-d) stable manifold.The blue dotted curve represents the projection of the true orbit alongthe stable manifold onto the centre manifold. The true orbit and itsprojection approach each other exponentially fast.

6.4. Centre Manifold Theorems for Maps

There exist three parallel theorems for maps with non-hyperbolicfixed points. The theorems are essentially a direct translation of theirdifferential equation versions. We state the theorems without proof.

Consider the system

xn+1 = Axn + f(xn, yn)yn+1 = Byn + g(xn, yn),

(6.5)

where A and B are square matrices, the eigenvalues of A have modulusone and those of B modulus smaller than one, and f and g are C2

functions vanishing at the origin together with their first derivative.Further, for functions φ(x) such that φ(0) = 0, φ′(0) = 0 we define

(Mφ)(x) = φ(Ax+ f(x, φ(x)))−Bφ(x)− g(x, φ(x)). (6.6)

For a Centre Manifold h as in the next Theorem we have Mh = 0.

Theorem 6.5. There exists a C2 function y = h(x) such thath(0) = 0, h′(0) = 0 and for |x| ≤ ε h satisfies

xn+1 = Axn + f(xn, h(xn))h(xn+1) = Bh(xn) + g(xn, h(xn)).

(6.7)

Theorem 6.6. If φ ∈ C1 and (Mφ)(x) = O(|x|q) as x → 0 forsome q > 1 then |h(x)− φ(x)| = O(|x|q) as x→ 0.

6.5. COMPUTATION OF CENTRE MANIFOLD APPROXIMATIONS 101

Theorem 6.7. (a) If the zero solution of un+1 = Aun+f(un, h(un))is stable (asymptotically stable, unstable), then the zero solution ofeq.(6.5) is stable (asymptotically stable, unstable).(b) If the zero solution of eq.(6.5) is stable, then for a solution (xn, yn)of eq.(6.5) with initial condition (x0, y0) sufficiently small, there existsa solution un of un+1 = Aun + f(un, h(un)) such that |xn − un| ≤ Kβn

and |yn − h(xn)| ≤ Kβn for all n, K > 0 and 0 < β < 1.

6.5. Computation of Centre Manifold Approximations

We will produce a chain of systematic approximations to the CentreManifold. To fix ideas, let us assume that f and g can be derivatedarbitrarily many times (a situation frequently assumed in applications)and expanded in power series. Theorems 6.4 and 6.6 suggest a proce-dure.

6.5.1. Centre Manifold Approximation for Flows. For eachinteger k ≥ 1, let us call φk(x) the approximation to h(x) given byTheorem 6.4 taking q = k + 1. Since f and g are at least quadratic,it is clear that φ1(x) = 0 is an approximation to the Centre Manifoldwith errors O(|x|2). It suffices to plug it in eq.(6.3). Consider furtherthe chain of approximations φk+1(x) = φk(x) + ∆k+1(x), with ∆k+1(x)a homogeneous polynomial of degree k + 1 in x. Our goal is to choose∆k+1(x) in such a way that Theorem 6.4 is fulfilled to order q = k+2, sothat the successive approximations φm(x) differ from the correct CentreManifold h(x) in O(|x|m+1). We perform the construction inductively,starting with φ1(x) = 0. Assuming that a good choice of ∆k and φk hasbeen done up to order k, we seek for a φk+1(x) that satisfies eq.(6.3):

d

dx(φk(x) + ∆k+1(x)) (Ax+ f(x, φk(x) + ∆k+1(x)) =

B (φk(x) + ∆k+1(x)) + g(x, φk(x) + ∆k+1(x)) +O(|x|k+2),

which can be recasted as

d

dx(∆k+1(x))Ax−B∆k+1(x) =

Bφk(x) + g(x, φk(x))− dφk(x)

dx(Ax+ f(x, φk(x))) +O(|x|k+2),

Some terms involving ∆k+1(x) (inside f and g) appear to be missingin this last equation when compared with the previous one. Actually,because of the form of f and g, those terms are of order O(|x|k+2) orhigher and need not be written explicitly. Both equations are indeedequivalent. Moreover, the RHS of the last equation has no contribu-tions of degree k or lower, these terms have been taken care of by theinductive procedure, starting at k = 1. Hence the explicitly writtenpart of the RHS is a known homogeneous polynomial of degree k + 1,while the LHS is also a homogeneous polynomial of the same degree

102 NLD-draft

and unknown coefficients. By equating both polynomials, we solve forthe unknown coefficients. It remains to prove that the equation alwayshas unique solution. This is equivalent to stating that the equationddx

(∆k+1(x))Ax− B∆k+1(x) = 0 has the unique solution ∆k+1(x) = 0.When A and B are diagonalisable, the proof is rather straightforward,while it is more involved with technicalities for the general case.

Example 6.2. We compute approximations to the Centre Manifoldfor the system

x = y

y = −y + yx+ x2

which has a non-hyperbolic fixed point at the origin. First we move toa coordinate system where the linearisation at the origin is diagonal,namely: u = x+ y and v = y 1. We rewrite:

u = u(u− v)

v = −v + u(u− v),

looking for approximations v = φk(u). φ1(u) = 0 fulfills Theorem 6.4with errors O(|x|2). Letting φk+1(u) = φk(u) + ak+1u

k+1, we computeeq.(6.3) for this case, namely:(

dφkdu

+ ak+1(k + 1)uk)

(u(u− φk(u)− ak+1uk+1)) =

−(φk(u) + ak+1u

k+1)

+ u(u− φk(u)− ak+1uk+1).

For k = 1 we have 2a2u2(u2−a2u

3) = −a2u2 +u2−a2u

3. The equationis satisfied with errors O(|x|3) by letting a2 = 1 and φ2(u) = u2. Fork = 2 we have (2u+3a3u

2)(u2−u3−a3u4) = −(u2+a3u

3)+u2−u3−a3u4,

giving a3 = −3 and φ3(u) = u2 − 3u3. A subsequent step gives a4 = 11and φ4(u) = u2−3u3 +14u4. The dynamic splits approximately into anexponential decay to the manifold v = φ4(u) +O(|x|5) and a dynamicson the manifold given by u = u2 − u3 + 3u4 − 14u5 +O(|x|6).

6.5.2. Centre Manifold Approximation for Maps. The equa-tion to consider now is eq.(6.6). Again, we inductively construct suc-cessive approximations to h for k ≥ 1 of the form φk+1(x) = φk(x) +∆k+1(x), with φ1(x) = 0 and ∆k+1(x) a homogeneous polynomial ofdegree k + 1 in x satisfying eq.(6.6) up to order k + 1, after suitablerewriting:

φk+1(xn+1) = φk+1(Axn + f(xn, φk+1(xn)))

= (φk + ∆k+1)(Axn + f(xn, φk+1(xn)))

= B(φk(xn) + ∆k+1(xn)) + g(xn, φk+1(xn)) +O(|x|k+2).

1Eigenvectors are defined up to a constant. Hence there are different (althoughequivalent) choices for the subsequent calculations. For example, u = x+y, v = −ywork equally well.

6.6. RELATION OF CENTRE MANIFOLD THEORY WITH BIFURCATIONS103

Let us work now with the last equality. Using the same argument asbefore, f(xn, φk+1(xn)) = f(xn, φk(xn))+O(|x|k+2), and similarly for g.Also, ∆k+1(Axn + f(xn, φk+1(xn))) = ∆k+1(Axn) +O(|x|k+2). Hence,

∆k+1(Axn)−B∆k+1(xn)

=Bφk(xn) + g(xn, φk(xn))− φk(Axn + f(xn, φk(xn)))

+O(|x|k+2).

Induction and Theorem 6.6 assure that the RHS has no contributionsof order |x|k or lower. Both ∆k+1(x) and the LHS of the equation areunknown homogeneous polynomials of degree k+ 1 while the RHS is aknown homogeneous polynomial of the same degree, fully computableafter having solved step k. Equating both polynomials we obtain thecoefficients of ∆k+1(x). Again, it remains to prove that the equationalways has unique solution. This is equivalent to stating that the equa-tion ∆k+1(Axn)−B∆k+1(xn) = 0 has the unique solution ∆k+1(x) = 0.When A and B are diagonalisable, the proof is, again, rather straight-forward, while it is more involved with technicalities for the generalcase.

Example 6.3. The system

xn+1

yn+1

==

xn + xnynyn/2− xn2

==

xn + f(xn, yn)yn/2 + g(xn, yn)

has a non-hyperbolic fixed point at the origin. The linearisation isalready in diagonal form with eigenvalues 1 and 1/2. Letting k = 1, wehave φ1(x) = 0, while eq.(6.6) reads a2x

2/2 = −x2 +O(|x|3), obtaininga2 = −2 and φ2(x) = −2x2. We notice that a3 = 0 since there areno third-order terms in the defining equation, while a4 = −16 and theCentre Manifold reads h(x) = −2x2 − 16x4 + O(|x|5). The dynamicson the manifold is approximated by xn+1 = xn−2x3

n−16x5n +O(|xn|6).

6.6. Relation of Centre Manifold Theory with Bifurcations

The moment has arrived to connect the Centre Manifold Theorywith the bifurcation discussion at the beginning of the Chapter. Thegoal was to be able to study what happens on a parameter dependentfamily of dynamical systems when some linearisation eigenvalues be-come non-hyperbolic. Since this phenomenon affects a few eigenvaluesat a time, it is desirable to study it on a reduced space, covering onlythe non-hyperbolic part of the system without being forced to carrythe hyperbolic part behind.

Sositaisvili’s Theorem as well as the reduction to the Centre Man-ifold hold for just one dynamical system, with fixed vector field. How-ever, the bifurcation problem deals with a parametric family of vectorfields, where the fixed point and its linearisation eigenvalues depend on

104 NLD-draft

the parameter. At some special parameter value, the system ceases tobe hyperbolic and the goal is to understand the whole transition.

Centre Manifold theory is particularly adapted to study such aproblem. The strategy is to extend the dynamical equations with anadditional equation for the parameter, namely µ = 0 or µn+1 = µnfor maps (of dimension p if µ consists of p real-valued parameters). Inthis way, we are treating the parameters as extra coordinates and theadditional equation indicates that for each dynamical system the pa-rameters are actually constant. The enhanced system treats then thewhole family at once, at the cost of increasing the dimensionality.

At the special value µ0, both the original system and the extendedsystem have a non-hyperbolic fixed point, while the Centre Manifold forthe extended system (of dimension d+ p, where x ∈ Rd) now includesthe parameter dependency near the fixed point on the x equation forsmall deviations µ − µ0. The hyperbolic part, collected in the vw (ory) coordinates, has been decoupled from the non-hyperbolic part. Inthe extended description, the non-hyperbolic part has a µ-dependencyintertwined with the x-dependency. On this lower-dimensional system,with d + p equations (where p of them are trivial), we will address inthe next Chapter the study of the bifurcation transition. The extendeddescription does not change the fact that the original system was as-sumed to become non-hyperbolic only for µ = µ0, only that since nowµ behaves essentially as a coordinate, we can study the transition hy-perbolic → non-hyperbolic while varying µ. We address the details ofthis approach in the next Chapter.

6.7. Exercises

Exercise 6.1. Compute a reduction to the Centre Manifold up toorder 5, i.e., error of size O(|x|6), for the system

x = xy − x3 + xy2

y = −y + x2 + x2y,

near the origin. Compute a dynamical equation on the manifold. Is theorigin stable?

Exercise 6.2. Compute a reduction to the Centre Manifold for thesystem

x = 10(y − x)

y = x− y − xz

z = xy − 8

3z

6.7. EXERCISES 105

Exercise 6.3. Compute a reduction to the Centre Manifold for thesystem

xn+1 = −xn − yn2

yn+1 = −1

2yn + xn(yn + xn)

Exercise 6.4. In the system

x = y

y = µx− y − x2

consider µ as a third variable with dynamics µ = 0. Compute thereduction to the Centre Manifold (that now is 2-dimensional) and theapproximated dynamics on the manifold.

Exercise 6.5. In the system

x = µx− yy = x+ µy + y2 + x2

consider µ as a third variable with dynamics µ = 0. Compute thereduction to the Centre Manifold (that now is 2-dimensional) and theapproximated dynamics on the manifold.

CHAPTER 7

Local Bifurcations of Fixed Points

7.1. Local Bifurcations of Vector Fields

We will now proceed with the bifurcation analysis starting fromthe simplest possible situation of a one-parameter family of dynamicalsystems. Some of the ideas presented here have been developed from[13., 14.].

Having only one real parameter, the simplest (and typical) situ-ation is to have a fixed point where all eigenvalues of its linearisa-tion are different. Assume the fixed point is hyperbolic to begin with.Then the eigenvalues of the linearization Dxf(x, λ) are non–zero andby the Implicit Function Theorem there exists a curve x(λ) of hy-perbolic fixed points. Moreover, the eigenvalues vary independentlyof each other, and hence we may assume that only one real part ofan eigenvalue is approaching zero at some value λ∗. Other situations(double eigenvalues, two “unrelated” eigenvalues crossing zero simulta-neously, etc.) require additional constraints and break down in front ofarbitrary small perturbations. We will start the analysis by assumingalso that at a non–hyperbolic condition the dependency on λ is trans-verse, technically that detDλf(x0, λ0) 6= 0. This situation is called acodimension–one bifurcation, i.e., there is a one–parameter familyof fixed points crossing a non—hyperbolic condition transversally.

Remark 7.1. Codimension–one bifurcations are the typical phe-nomenon found in applications, since it is difficult to minutely controlseveral parameters at the same time. However there are plenty of ex-perimental studies where codimension–two (and even three) problemsare encountered.

Since only one eigenvalue and consequently only one eigenvector orinvariant manifold is failing hyperbolicity, after simplifying the prob-lem as much as possible through the Theorems of Hartman–Grobman,Ladis and Sositaisvili, the simplest situation we encounter is wherethe linearisation block A is real and one–dimensional. The next sim-plest is where the eigenvalues of A are complex conjugated and crosstransversally the imaginary axis. Let us study these two cases in detail.Having only one parameter to resort to, other situations than these twoare exceptional and belong in higher–codimension studies.

107

108 NLD-draft

7.1.1. The Saddle–node Family of Bifurcations. The situa-tion we encounter is that after a suitable coordinate transformation, onn−1 coordinates the dynamics is essentially linear, hyperbolic and com-paratively simple, while there is a one–dimensional decoupled problemx = f(x, λ) (or, rather, two–dimensional adding the auxiliary equa-

tion λ = 0) where changes are unavoidable (topological equivalencealong the λ-axis cannot occur). We summarise the non—hyperbolicand transversality conditions (these will be our basic assumptions):

• Fixed point: f(x0, λ0) = 0• Non–hyperbolicity: ∂f

∂x(x0, λ0) = 0

• Transversality: ∂f∂λ

(x0, λ0) 6= 0

The second condition signals the break down of the one–parametercurve of fixed points used in the fully hyperbolic case, since a basiccondition of the Implicit Function Theorem is not fulfilled. However,the third condition allows the use of this Theorem in the “other” di-rection.

Lemma 7.1. Under the basic assumptions, the one–dimensionalsystem x = f(x, λ) has a unique curve of fixed points λ(x) for asufficiently small interval around x0, with λ(x0) = λ0. Moreover,dλdx

(x0) = 0.

Proof. The existence of the curve is assured by the Implicit Func-tion Theorem. Along this curve we have that f(x, λ(x)) = 0. Takingthe x–derivative on both sides, we have ∂f

∂x+ (∂f

∂λ)dλdx

= 0. Hence, using

the basic assumptions again, we have dλdx

(x0) = 0.

Remark 7.2. Recall that the Implicit Function Theorem states alsothat the function λ(x) is derivable, with derivative

dλ

dx(x0) = −

∂f∂x

(x0, λ0)∂f∂λ

(x0, λ0).

The numerator on the right hand side is zero because of the basic as-sumptions.

Computing the second derivative of the fixed point equation at thebifurcation condition we arrive at:

Theorem 7.1 (Saddle-node Bifurcation). Under the basic assump-

tions and provided ∂2f∂x2

(x0, λ0) 6= 0, there exists a quadratic curve offixed points λ(x) for a sufficiently small interval around x0. The fixedpoints are hyperbolic for x 6= x0 and come in pairs with opposite stabil-ity for each fixed λ.

Proof. By Lemma 7.1 we know that the function λ(x) exists andthat its first derivative is zero at the bifurcation point. The second

7.1. LOCAL BIFURCATIONS OF VECTOR FIELDS 109

derivative of the fixed point equation reads

∂2f

∂x2+ 2

∂2f

∂λ ∂x

dλ

dx+∂2f

∂λ2(dλ

dx)2 +

∂f

∂λ

d2λ

dx2= 0.

Hence, at the bifurcation condition we have

∂2f

∂x2(x0, λ0) = −∂f

∂λ(x0, λ0)

d2λ

dx2(x0) 6= 0.

Therefore, the function λ(x) reads:

λ(x) = λ0 + c(x− x0)2 +O[3],

where

2c = −∂2f∂x2

(x0, λ0)∂f∂λ

(x0, λ0)6= 0.

We use the notation O[n] to indicate that a given expression is writtenaccurately up to order n−1 in (x−x0), (λ−λ0), i.e., in all variables ofthe extended problem, while terms of power n and higher are concealedin the O[· · · ]. The goal of such a partition is that for sufficiently smalldeviations, the O[· · · ] terms do not influence the qualitative behaviourof the system. In order to compute the stability of the fixed point(s)along the curve, from the x–derivative of the fixed point equation, weobtain:

∂f

∂x= −∂f

∂λ

dλ

dx.

The first factor on the right hand side is non–zero at the bifurcationpoint because of the basic assumptions and by continuity it remainsnon–zero on a sufficiently small x–interval. The second factor readsdλdx

= 2c(x − x0) + O[2] and changes sign at x = x0. Hence, ∂f∂x

hasdifferent signs at each side of x0.

Another approach to arrive at the previous Theorem is the follow-ing. A perhaps simpler way to reproduce the saddle–node analysis isthe following.

Lemma 7.2. Let f(x, λ) be expansible in Taylor series around (λ0, x0)as

f(x, λ) = a(λ− λ0) + g(x− x0) + b(λ− λ0)(x− x0) + c(x− x0)2

+d(λ− λ0)2 +O[3].

Under the basic assumptions, and if ∂2f∂x2

(x0, λ0) 6= 0, then a suitablechange of coordinates reduces the Saddle-node problem to the form

f(y, µ) = µ+ cy2 +O[3].

where now the critical point lies at (µ0, y0) = (0, 0).

110 NLD-draft

λ0

λ

x

x0

Figure 7.1. A bifurcation diagram for the saddle–nodebifurcation. Full line corresponds to a stable fixed point.Vertical lines show typical trajectories for different valuesof λ.

Proof. Note that a 6= 0, c 6= 0 and g = 0 because of the assump-tions of the Lemma. Let y = (x− x0) + b

2c(λ− λ0) and rewrite:

f(y, λ) = a(λ− λ0) + (d− b2

4c)(λ− λ0)2 + cy2 +O[3]

The expression µ(λ) = a(λ − λ0) + (d − b2

4c)(λ − λ0)2 is monotonic

and crosses zero for (λ− λ0) sufficiently small. By shifting the origin,stretching and rescaling the λ coordinate, we may recast it as µ. Hence,the function f(y, µ) = µ + cy2 + O[3] satisfies the assumptions of theTheorem at the critical point (µ0, y0) = (0, 0).

We will use this technique in the coming subsections.7.1.1.1. The Bifurcation Diagram. A Bifurcation diagram con-

sists of a graph of the fixed point curve(s) along with the stability of thefixed points for each value of λ, depicted in (x, λ) space, at the vicinityof (x0, λ0). It is customary to place the phase space coordinate in thevertical axis and the parameter in the horizontal axis. For the case ofa saddle–node bifurcation there are four possible diagrams, depending

on the signs of ∂f∂λ

(x0, λ0) and ∂2f∂x2

(x0, λ0). See Figure 7.1 for the case∂2f∂x2

< 0 < ∂f∂λ

where the stable fixed point appears for x > x0. Othercombinations of signs produce mirror images of this picture along oneor both axes.

The previous scenario may be altered when additional constraintsare imposed on the system. We will address the two simplest casessince both yield novel results. Imposing further constraints does notalter the qualitative picture around the bifurcation.


7.1.1.2. The Transcritical Bifurcation.

Theorem 7.2. Let f(x, λ) be expansible in Taylor series around(λ0, x0). Assume that the following saddle-node conditions hold, namely

f(x0, λ0) = 0, ∂f∂x

(x0, λ0) = 0 and ∂2f∂x2

(x0, λ0) 6= 0, along with the ad-

ditional assumptions ∂f∂λ

(x0, λ0) = 0, ∂2f∂λ ∂x

(x0, λ0) 6= 0. Then f can berewritten as

f(y, µ) = bµy + cy2 +O[3],

after a suitable coordinate transformation, where now the critical pointoccurs at (µ0, y0) = (0, 0). The bifurcation diagram displays two hyper-bolic fixed points of opposite stability, collapsing into one at the non-hyperbolic condition µ = 0.

Proof. We leave to the reader to work out the coordinate transfor-mation and parameter renaming-rescaling. We compute the bifurcationdiagram noting that the system has a fixed point at y0 = 0, for any µ-value, and another fixed point at y1 = −bµ/c. It is also straightforwardto compute the linearisation eigenvalues ±bµ.

λ0 λ

x

x0

Figure 7.2. A bifurcation diagram for the Transcriticalbifurcation. Full line corresponds to a stable fixed point.Vertical lines show typical trajectories for different valuesof λ.

The bifurcation diagram is shown in Figure 7.2 for the case b > 0,showing explicitly that the origin of coordinates lies at (x0, λ0) (corre-sponding diagrams occur for different choices of signs for b and c). Thisbifurcation is called the Transcritical Bifurcation. It has two fixedpoints that “collapse” at the bifurcation point, exchanging stabilityproperties.

7.1.1.3. The Pitchfork Bifurcation.

Theorem 7.3. Under the assumptions of the Transcritical Theo-

rem, with the modified constraints ∂2f∂x2

(x0, λ0) = 0 and ∂3f∂x3

(x0, λ0) 6= 0,a suitable coordinate transformation recasts the problem as

f(y, µ) = bµy + dy3 +O[4].

112 NLD-draft

λ0λ

x

x0

Figure 7.3. A bifurcation diagram for the Pitchforkbifurcation. Full lines correspond to stable fixed points.Vertical lines show typical trajectories for different valuesof λ.

The bifurcation diagram displays one fixed point at one side of thebifurcation, changing its stability at the bifurcation point. A pair ofhyperbolic fixed point of opposite stability (compared with that of the“original” point) branch out of the non-hyperbolic point.

Proof. We leave again the computation of the new coordinates tothe reader. The process gets slightly more cumbersome, since now wehave to deal with cubic terms as well. Hint: Try with a transformationof the kind x = y + mµ + pµ2 + qy2 and λ = µ + nµ2 and show forsmall (y, µ) the arbitrary parameters m, n, p and q can be adjusted toachieve the goal.

Analysing the resulting equation we note that there is a fixed pointat y0 = 0, occurring for all values of the parameter µ, while a pair offixed points appear at y1,2 = ±

√−bµ/d, provided −bµ/d > 0. We

evaluate the y–derivative g′ = bµ + 3dy2 at the fixed points to com-pute the stability, yielding bµ for y0 and −2bµ for the pair y1,2. Thebifurcation diagram for b > 0 > d is shown in Figure 7.3.

The name Pitchfork arises from the shape of the fixed point curves.Again, there are four corresponding diagrams with changed orientationand/or stability depending on the signs of b and d.

Remark 7.3. Since both Transcritical and Pitchfork require one,respectively two additional constraints on top of the saddle–node con-ditions of Theorem 7.1, they are less likely to appear in experimentalsetups, except for the case of highly symmetric systems. A symmetricsystem presenting e.g., a Pitchfork, upon an arbitrarily small generalperturbation breaking the symmetry may fall into a situation with a


saddle–node bifurcation and an unaffected hyperbolic fixed point. SeeFigure 7.4.

λ

x

λ

x

Figure 7.4. A non–symmetric perturbation may mod-ify the Pitchfork into a saddle–node.

7.1.2. The Hopf Bifurcation. The next level of complicationarises when we still have one parameter but now a single pair of com-plex conjugated linearisation eigenvalues become non–hyperbolic bycrossing the imaginary axis. Sositaisvili’s reduction becomes now twodimensional. The basic assumptions read (shifting the fixed point andthe critical parameter value to the origin as it is now customary):

1. Fixed point: f1(0, 0, λ) = 0, f2(0, 0, λ) = 0.2. Complex eigenvalues:

D

(f1

f2

)(0, 0, λ) =

(α(λ) −β(λ)β(λ) α(λ)

)= A(λ)

where α(λ) = dλ, d 6= 0 and β(λ) = ω + cλ, ω 6= 0.

A formal MacLaurin expansion of a general problem compatible withthe basic assumptions, up to second order reads

x = α(λ)x− β(λ)y + a1x2 + b1xy + c1y

2 +O(3)

y = β(λ)x+ α(λ)y + a2x2 + b2xy + c2y

2 +O(3)

Let us now work out a polynomial change of coordinates, simplifyingthe problem, a similar idea to what was sketched previously with thesaddle–node family of bifurcations. We note on passing that the proce-dure repeated in this Chapter of simplifying the vector field as much aspossible through coordinate transformations is part of a general methodknown as Normal Form Theory.

We take a constructive approach by proposing a change of coordi-nates x = u + h1(u, v), y = v + h2(u, v) where the h’s are sufficientlynice quadratic functions to be determined under the condition that thesecond order terms in the MacLaurin expansion above become as sim-ple as possible (zero if possible). It is easier to work out formally thetransformation in matrix form:(

xy

)=

(1 + ∂h1

∂u∂h1∂v

∂h1∂u

1 + ∂h2∂v

)(uv

)= (1 +Dh)

(uv

)

114 NLD-draft

Hence,(uv

)= (1 +Dh)−1A(λ)

(u+ h1

v + h2

)+

(a1u

2 + b1uv + c1v2 +O(3)

a2u2 + b2uv + c2v

2 +O(3)

)We replace (1 +Dh)−1 by 1 − Dh since this introduces errors onlyof O(3) or higher. Rewriting this expression in order to make themodification explicit, we obtain(uv

)=A(λ)

(uv

)+

A(λ)

(h1

h2

)−Dh A(λ)

(uv

)+

(a1u

2 + b1uv + c1v2 +O(3)

a2u2 + b2uv + c2v

2 +O(3)

)If we demand the second order terms in this equation to be identicallyzero, we obtain a linear system of six equations in the six coefficientsentering in h1 and h2. The right hand side of the system is given bythe six corresponding problem-dependent coefficients ai, bi, ci, i = 1, 2.

The good news is that this system has unique solution for a suffi-ciently small λ–interval around zero. This is because the determinantof the coefficient matrix reads (α2 + β2)2(α2 + 9β2) and is nonzero for|λ| sufficiently small since β(λ = 0) = ω 6= 0.

The final general form, up to third order reads,(uv

)= A(λ)

(uv

)+ (u2 + v2)

(a −bb a

)(uv

)+ o(3).

In the same spirit, we may extend the change of coordinates toy = v+h2(u, v)+h3(u, v) where h3 is an homogeneous cubic polynomial,to be determined under the condition that the third order terms inthe MacLaurin expansion above become as simple as possible. Thecalculation is much lengthier. The general idea is exposed in [4.]. Thebottom line is that third-order terms cannot be completely eliminated,since the resulting equation system for the coefficients of h3 does nothave unique solution. Hence, some cubic terms will always be presentin the final general form. Under the general condition a 6= 0 andsufficiently close to the bifurcation point (0, 0, 0) in (x, y, λ)–space, wehave

Theorem 7.4 (Andronov–Hopf Bifurcation Theorem). Let f(x, y, λ)be a family of vector fields with singular solution (x, y)(t) ≡ (0, 0) forall λ. Let α(λ) + iβ(λ) be a pair of complex conjugated eigenvalues ofthe matrix Df(0, 0, λ) that crosses the imaginary axis at λ = 0 withnon–zero speed, i.e., α(0) = 0, β(0) 6= 0 and α′(0) 6= 0. Further assumethat no other eigenvalue is an integer multiple of iβ. Then a family (in-dexed by λ) of periodic trajectories bifurcates from (x, y, λ) = (0, 0, 0).The periods of these orbits approach 2π

βas λ→ 0.

Proof. The proof can be constructed by working out the bifurca-tion diagram of the general form, sufficiently close to the bifurcation

7.2. LOCAL BIFURCATIONS OF FIXED POINTS OF MAPS 115

point, including the third order terms. It is more illustrative to recastthe problem in polar coordinates, x = r cosφ, y = r sinφ, disregardinghigher–order terms:

r = dλr + ar3

φ = ω + br2,

where d, a and ω are nonzero. Note that the term in b may be dis-regarded as well for r small enough, since φ will be nonzero for anychoice of b if r is small enough.

The structure of a Pitchfork bifurcation can be read on the r–equation. This means that apart from a fixed point at r = 0, a so-lution with constant r =

√−dλ/a bifurcates whenever the argument

of the root is positive (recall that r is nonnegative, so there are noother solutions to r = 0). This solution is a periodic orbit, since ris constant and positive while φ varies monotonically. The sign of ωdetermines the sense of circulation of the orbit, while the relative signsof d and a specifies which of four possible bifurcation diagrams occur.See Figure 7.5 for the case d > 0 > a, w > 0.

λ0 λ

x

y

x

Figure 7.5. A bifurcation diagram for the Hopf bifurca-tion. Full line corresponds to a stable fixed point. Whenthe fixed point changes stability, a stable periodic orbitspawns out.

Remark 7.4. When the bifurcation generates a stable limit cycle,it is called supercritical, otherwise subcritical.

7.2. Local Bifurcations of Fixed Points of Maps

Intuitively, since all flows generate time–one maps, one may expectthat the effects of local bifurcations on flows will transport to mapsone way or the other. Bifurcations of fixed points of maps must exhibit

116 NLD-draft

at least as much complication as that shown by a family of time–onemaps when the underlying flow–family undergoes a bifurcation.

Since there is a map version of the Centre Manifold Theorems, wewill start the analysis following the spirit of the previous sections, i.e.,by considering a general map under non–hyperbolic conditions, simpli-fying its expression as much as possible via changes of coordinates andfinally studying the resulting family of maps. Again, the simplest bifur-cations occur when the hyperbolicity condition of Hartman–Grobmantheorem breaks down as one eigenvalue of the linearisation of the mapat the fixed point crosses 1 or −1, or when a pair of complex conjugatedeigenvalues crosses the unit circle. Many features of the treatment re-semble those of flows. We give here a summarised description, leavingthe details to the reader, when they just mimic the corresponding com-putations for flows.

7.2.1. The Saddle–node Family of Bifurcations of Maps.The setup here is a map xn+1 = F (xn, λ) with the following basicassumptions (using coordinates such that the fixed point and thecritical parameter value occur at the origin):

1. Fixed point at the origin: F (0, 0) = 02. Non–hyperbolicity: ∂F

∂x(0, 0) = 1

3. Transversal crossing: ∂F∂λ

(0, 0) 6= 0

Theorem 7.5. (Saddle-node for maps) Under the basic assump-

tions, and given ∂2F∂x2

(0, 0) 6= 0, a general map F (x, λ) that admits aTaylor expansion in x and λ can be taken to the form

xn+1 = λ+ xn + cxn2 +O([3])

by changes of coordinates. The associated dynamics is that of a pairof hyperbolic fixed points of opposite stability branching out of the non-hyperbolic situation.

Proof. The central part of the proof is the coordinate change,which can be performed paralleling the procedures of the previous sec-tion.

As in the case of flows, the non–hyperbolicity condition leads inthis case also to the breakdown of the Implicit Function Theorem con-ditions to find a unique curve x(λ) of fixed points, while the transversalcrossing condition permits to use the Theorem the other way around,to establish the existence of a curve λ(x).

Theorem 7.6. (Transcritical and Pitchfork for maps) Underthe assumptions of the previous Theorem, modified as:(a) ∂F

∂λ(0, 0) = 0 while ∂2F

∂λ ∂x(0, 0) 6= 0 or

(b) ∂F∂λ

(0, 0) = 0, ∂2F∂x2

(0, 0) = 0 while ∂2F∂λ ∂x

(0, 0) 6= 0 and ∂3F∂x3

(0, 0) 6= 0

7.2. LOCAL BIFURCATIONS OF FIXED POINTS OF MAPS 117

then a general map F (x, λ) that admits a Taylor expansion in x and λcan be taken to the form (respectively)

(a) : xn+1 = (1 + λ)xn + cxn2 +O([3])

(b) : xn+1 = (1 + λ)xn + dxn3 +O([4])

by changes of coordinates.

The bifurcation diagrams mimic completely those already displayedfor flows.

7.2.2. The Flip or Period–Doubling Bifurcation. Probablythe most famous difference between flows and maps is given by thePeriod–doubling, or Flip, bifurcation. Indeed, in maps there are twoways of obtaining a non–hyperbolic dynamical system where only onereal eigenvalue is involved. Apart from the saddle–node type of non–hyperbolicity (in its three flavours) discussed above (paralleling in mostrespects the behaviour in flows), we have to consider the case of aneigenvalue λ = −1.

Assume then that F (x, λ) has a non–hyperbolic fixed point at (0, 0)with eigenvalue −1. By the Implicit Function Theorem, F will have acurve x(λ) of fixed points and no other fixed points sufficiently close tothis curve. Note that in this case the argument goes in a completelydifferent way as compared with the saddle–node situation, since thereis no “breakdown” of the conditions of the Implicit Function Theoremassociated to the non–hyperbolicity condition.

Let us consider, however G(x, λ) = F (F (x, λ), λ), the second iter-ated map. G has also a non–hyperbolic fixed point at (0, 0) but nowwith eigenvalue +1. Suppose now that the conditions for a Pitchforkbifurcation are fulfilled by G. The fixed point of F is also a fixed pointof G (the whole curve indeed), but the additional (unique) pair of fixedpoints (x1, x2) of G branching away from the bifurcation are not fixedpoints of F (F has a unique curve of fixed points x(λ)). Hence, wemust have F (x1) = x2 and F (x2) = x1 (otherwise G would have morethan three fixed points), i.e., a period–2 orbit. More formally:

Theorem 7.7 (Period–Doubling Bifurcation of Maps). Let F (x, λ)be a C3 map in both variables and let G(x, λ) = F (F (x, λ), λ). Underthe assumptions

1. F (0, 0) = 0

2. ∂F∂x

(0, 0) = −1

3. ∂G∂λ

(0, 0) = 0

4. ∂2G∂x2

(0, 0) = 0

5. ∂2G∂x ∂λ

(0, 0) 6= 0

6. ∂3G∂x3

(0, 0) 6= 0

118 NLD-draft

there exists a curve of period–2 points of F for a sufficiently smallinterval around (x, λ) = (0, 0). At both sides of the bifurcation inparameter space, the fixed point is hyperbolic and switches stability.The period–2 points have opposite stability compared to the fixed point.

Proof. Most of the proof is already done. Only the final issue onstability is left. A general C3 map satisfying the first two conditionsreads

F (x, λ) = λ(α+βλ+γλ2)+x(−1+aλ+ bλ2)+x2(c+gλ)+dx3 +O[4].

Moreover, for |λ| sufficiently small we may further simplify the expres-sion to F (x, λ) = λ + x(−1 + aλ) + cx2 + dx3 + O[4] by subsequentrescalings. Computing the iterated map G it is straightforward to ver-ify that conditions 3 and 4 are fulfilled. Conditions 5 and 6 definingthe Pitchfork scenario for G can be recasted in terms of the derivativesof F at (0, 0):

5’. 2 ∂2F∂x ∂λ

+ ∂F∂λ

∂2F∂x26= 0

6’. 12(∂

2F∂x2

)2 + 13∂3F∂x36= 0

By direct computation we can now verify that condition 5’ guaran-tees that the common fixed point of F and G changes stability at thebifurcation (regarded as a fixed point of either F or G).

7.2.3. The Hopf Bifurcation on Maps. Next in the series ofnon–hyperbolic situations is when a pair of complex conjugated eigen-values crosses the unit circle in the complex plane. The correspondingbifurcation for maps has been popularised under the name “Hopf”, de-spite the fact that Hopf Theorem deals with flows. The map theoremwas proved by Naimark in 1959. We deal here with a 2–dimensionalproblem with a fixed point at the origin. The simplest formulation ofthis bifurcation theorem reads,

Theorem 7.8. Let F (x, y, λ) be a map on R2 depending on oneparameter with a smooth family of fixed points (x(λ), y(λ)) with a pairof complex conjugated eigenvalues σ(λ). Without loss of generality letx(0) = 0 = y(0). Assume

H1. |σ(0)| = 1 and σ(0)j 6= 1, for j = 1, 2, 3, 4

H2. ∂|σ|∂λ

(0) = d 6= 0

Then there is a smooth coordinate change mapping F to (in polar co-ordinates):

F (r, θ, λ) = (r(1 + dλ+ ar2), θ + c+ gλ+ br2) + higher order terms.

If in addition

H3. a 6= 0

then an invariant circle for the map (rn+1, θn+1) = F (rn, θnλ) branchesout from the bifurcation point.

7.3. SOME GENERAL COMMENTS ON LOCAL BIFURCATIONS 119

Proof. We will only sketch parts of the proof here. AssumptionH1 with j = 1, 2 assures that the eigenvalue σ(0) is not real, i.e., c 6= 0.Working out the coordinate transformation, one realises that H1 withj = 3, 4 assures that all lower-order terms in r, θ, λ can be eliminatedwhen going from F to F , i.e., that F as above can be achieved. H2.assures also the change of stability of the fixed point at the bifurcation.

Consider next F without higher-order terms. H3 assures thatr =

√−dλ/a is an invariant circle for the dynamics. Since c 6= 0 the

dynamics on the invariant circle is that of a shift map.The final and technical part of the proof is to realize that the trun-

cated dynamics here sketched is equivalent to the dynamics of the fullmap F for sufficiently small λ and r.

Remark 7.5. Depending on the relative signs of d and a the changeof stability of the fixed point and the occurrence of the invariant cir-cle may be either supercritical or subcritical as in the flow case (fourpossible cases).

Remark 7.6. The dynamics on the invariant circle is governedto lowest order by θ 7→ θ + c + (g − bd/a)λ. Maps of this sort arenot structurally stable. Both periodic and non–periodic behaviour willalternate when varying λ. The precise description of this map requiresfull knowledge of F .

Remark 7.7. If H1 with j = 3, 4 is not fulfilled, F will have addi-tional quadratic and cubic terms in r, consequently altering the natureof the dynamics.

If H1 with j = 1, 2 is not fulfilled, we are actually dealing witha pair of eigenvalues +1 or −1, a problem which actually belongs incodimension–2 analysis.

7.3. Some General Comments on Local Bifurcations

7.3.1. Higher Dimensions. The presentation in this Chapter re-lies in the possibility of performing the reduction to the Centre Mani-fold described by Sositaisvili Theorem. This reduction is in general pos-sible, but there exist some exceptional cases, called resonances wherelower-order terms cannot be eliminated (as mentioned in the Hopf The-orem for maps) or a higher degree of smoothness cannot be achieved.Also, recall that the dynamics on the centre manifold does influencethe details of the local behaviour on the hyperbolic subspace.

7.3.2. Higher-codimensional Bifurcations. Codimension–onebifurcations are the starting cases of a “chain” of bifurcation problemssubject to the influence of several independent and coexisting param-eters. Regarding the situation from the point of view of applications,

120 NLD-draft

real–life problems are for practical reasons described by taking into ac-count a small set of independent parameters at a time, so in any case,only low–codimension bifurcation problems are relevant.

Consider e.g., a laser device with injected signal. Such devices areuseful in a variety of experimental situations, particularly in communi-cation. The injected electromagnetic signal will have its own amplitudeand phase, independent of the laser device, thus providing two param-eters that can be controlled by the experimenter to some extent. Inaddition, the laser cavity itself may vary (by having a moving mir-ror on one end or some other sophisticated construction) and hence amismatch between the natural frequency of the cavity and that of thelasing medium can be achieved. We have therefore three parametersthat can be varied somehow freely, and the overall dynamics will occurin a subset of a three-dimensional family of dynamical systems. Thereexists experimental and theoretical work describing codimension–2 bi-furcations as well as the effect of the “surrounding” codimension–3environment.

7.4. Exercises

Exercise 7.1. Verify that the system

r = λr + 2r3 − r5

θ = 1

has a subcritical Hopf bifurcation.

Exercise 7.2. Study the occurrence of fixed point bifurcations atthe origin for different values of a for the system

xn+1 = axn + (6− a)yn + xn3

yn+1 = −xn − 2yn + yn3.

Exercise 7.3. Study the occurrence of fixed point bifurcations atthe origin for different values of a for the system

x = (1 + a2)x+ (2− 6a)y + x3

y = −x− 2y − y4.

CHAPTER 8

Chaotic Systems

8.1. One-dimensional Chaotic Maps

We have seen in Chapter 4 that one-dimensional invertible mapshave a rather simple dynamics, at least one that is understandablereferring only to periodic and quasiperiodic trajectories. They have arather predictable limiting behaviour. We may say that they do notproduce chaotic regimes. We defer a proper definition until later, butwe advance that one feature of chaotic dynamics is that the ω–limit setis more complicated than just a finite set of periodic trajectories andconnecting trajectories.

We saw also with Poincare–Bendixson Theorem 3.9 that 2–dimen-sional flows are also rather predictable and understandable with every-day tools.

To encounter “chaos” it will be necessary to search among problemswith higher complexity. Fortunately there are plenty of such problemsin nature and chaos does leave traces in experiments and natural phe-nomena. What we precisely mean by “chaos” will be specified aftersome acquaintance with its properties.

Let us start our search with one-dimensional non–invertible maps.Although invertibility is an important dynamical feature, this is not asserious as it may look. Many invertible dynamical systems display an“underlying” non–invertible map appearing after some sort of limitingprocedure (e.g., “infinite dissipation”, etc.).

We will start with some model classes.

8.1.1. Expanding Maps of the Circle.

Definition 8.1. A differentiable map f : S1 → S1 is called ex-panding if |f ′(x)| > 1 for all x ∈ S1.

The simplest examples are linear maps, of the form

Em : S1 → S1 Em(x) = mx mod 1,

for m > 1 a positive integer.

Theorem 8.1. For a linear expanding map Em of the circle,

1. # Pern(Em) = mn − 1, wherePern(Em) = Fix(En

m) = x ∈ S1 : Enm(x) = x

2. The closure of the set of periodic points is the whole circle, i.e.,Per(Em) = S1.

121

122 NLD-draft

3. The rational numbers are pre-periodic, i.e., there are numbersk, n ∈ N such that Ek+n

m (x) = Ekm(x). n is the period and k is

the pre-period.4. Em is transitive, i.e., there exists a dense orbit.

Remark 8.1. Notice here a new phenomenon which might be con-sidered as an indicator of chaos, namely the coexistence of periodic(i.e., non-dense) trajectories and dense trajectories.

Proof. As customary, we identify the unit circle S1 with the realunit interval mod 1 through the map t → e2πt. A periodic point ofperiod n satisfies En

m(x) = x. This implies

mnx = x mod 1 or (mn − 1)x = 0 mod 1.

In other words, (mn − 1)x is an integer. The equation has exactlymn − 1 different roots on [0, 1) (all of them rational):

x =k

mn − 1k = 0, · · · ,mn − 2.

This proves the first statement.The second statement says that the set of periodic points is dense

in the circle. Recalling the definition, this means that for any point yon the unit interval and any ε > 0, there exists a periodic point p suchthat |y − p| < ε. Note that the periodic points of period n are equallyspaced on the circle with distance d = 1

mn−1. This implies that when

n grows they become more and more dense. For any given ε, taking nlarge enough we obtain d < ε.

Regarding the third statement, for x = p/q ∈ Q there are only q

different images Ejm

(pq

)= mjp

qmod 1, when varying the value of n.

Hence, repetitions must occur, i.e., for some integers n, k En+km

(pq

)=

Enm

(pq

), what proves the statement. To depict the situation, recall that

the map is expanding and thus not 1-to-1. A periodic point of period nmight have more than one preimage. The additional preimages are notperiodic of period n but will eventually become periodic after a fewiterations of the map. In other words, we look for rational numbersx = p/q such that Ek+n

m (x) = Ekm(x), (this is: Ek

m(x) is n-periodic)while Ek

m(x) 6= x (x is different from Ekm(x)) and En

m(x) 6= x (x itself isnot n-periodic). The statement becomes that for x = p/q ∈ Q, thereexists k, n ∈ N such that (mn − 1)mkp/q ∈ N while (mk − 1)p/q /∈ Nand (mn − 1)p/q /∈ N. Example: Let p = n = 1, m = 3 and q = 9.The points x1 = 1/9 and x2 = 1/3 are not of period n = 1, sinceE3(1/9) = 1/3 and E3(1/3) = 0. x1 is not of period n = 2 either,since E2

3(1/9) = 0. However, E2+j3 (1/9) = E1+j

3 (1/3) = 0 for all j ≥ 1.Hence x1 has preperiod k = 2 and x2 has preperiod k = 1. After kiterates of E3 those points land on the period-1 point x = 0. There

8.1. ONE-DIMENSIONAL CHAOTIC MAPS 123

exists another point of preperiod k = 1, namely y2 = 2/3. The pointsp/9, for p = 1, 4, 7 are preimages of x2 with preperiod k = 2, while thepoints for p = 2, 5, 8 are preimages of y2 also with preperiod k = 2.

Finally, let I be an arbitrary interval on the circle. Then there arek, n ∈ N such that [

k

mn,k + 1

mn

]⊂ I.

But

Enm(I) ⊃ En

m

([k

mn,k + 1

mn

])= S1.

Let J be any other interval in the circle. Then

Enm(I) ∩ J = S1 ∩ J = J 6= ∅.

This means that Em is transitive.

The next theorem shows that the limit behaviour of some trajecto-ries might be even more complicated.

Theorem 8.2. For the expanding linear map E3, there exists apoint X ∈ S1 with ω(X) = C, the Standard Cantor set.

For the definition and properties of the Cantor set, refer to Appen-dix A.2.1. We discuss here two properties which will be of use in theproof of this Theorem. The proof uses a coding method which willbe of great importance later on.

1. The Standard Cantor set C corresponds to all numbers in [0, 1]whose base–3 expansion does not contain the digit 1, i.e.,

x ∈ C ⇐⇒ x =∞∑i=1

ai3i

ai ∈ 0, 2.

2. The action of E3 on a point x ∈ C is as follows:

E3(x) =

(3∞∑i=1

ai3i

)mod 1

=

(a1 +

∞∑i=1

ai+1

3i

)mod 1 =

∞∑i=1

ai+1

3i∈ C

Hence, E3 (C) = C.

Proof. Let us write a list of all finite ”words” with letters 0, 2.This can be done in length–lexicographical ordering:

0, 2, 00, 02, 20, 22, 000, 002, 020, 022, 200, 202, 220, 222, · · ·i.e., first we order the words in groups by length, so that words oflength k + 1 are found after those of length k, for all k ≥ 1. Amongeach length k the order is lexicographic, namely the natural orderingof the strings taken as integer numbers. The reader should check in

124 NLD-draft

Appendix A.2.1 that the words of length k label all intervals requiredin step k of the construction of the Cantor set. Recall also that theseintervals have width 1/3k. The proof finishes by exhibiting the pointX with the stated property. Let X ∈ S1 to be the point whose base–3expansion has the form

X = .0200022022000002020022200202220222 · · ·built by placing the words of the previous list one after the other. Byconstruction, X ∈ C. The map En

3 acting on X ”erases” the first nsymbols in the string of X, hence the whole trajectory Ek

3 (X), k ≥ 0lies in the Standard Cantor set. Moreover, we see that for any k andsufficiently large n, the trajectory of X, visits all intervals used in theconstruction of the Cantor set at step k. This suffices to verify thatthe orbit of X is dense in the Cantor set. Indeed, for any ε > 0 we canfind k such that 1/3k < ε and hence the orbit of X passes closer thanε from any point in C. Hence, ω(X) = C.

Remark 8.2. A general feature of any Em map is that its actionis very simple when x ∈ [0, 1) is written in base m. Indeed,

x =∞∑i=1

aimi⇒ Em(x) =

∞∑i=1

ai+1

mi

where the ai’s take values in 0, . . . ,m − 1. Em eliminates the firstsymbol in the base–m expansion of x. Hence, the limit behaviour ofa point x does not depend on the first initial symbols in base–m. Alltypes of limit behaviour can be found in any small interval (a, b) ⊂ S1.An interval of the form ( k

mn, k+1mn

) is defined by the first n digits in basem: Writing k in base m as k = k1m

n−1 + · · ·+ kn−1m+ kn (where theintegers ki ∈ [0,m− 1]) then(

k

mn,k + 1

mn

)=x ∈ S1 : a1 = k1, · · · , an = kn

.

Remark 8.3. In general the ω–limit set of a point x can be verycomplicated. A deep theorem of Krieger states that the maps Em ”con-tain” all stationary stochastic processes as the limit behaviour.This means that it is impossible to describe all trajectories. Thereforewe should call these maps chaotic.

Now we want to analyse general expanding maps on the circle.Again, for an expanding map f : S1 → S1 we can consider its liftF : R→ R.

Theorem 8.3. Any expanding map g of degree m is topologicallyconjugated to Em.

Proof. Since g is expanding, any lift G has nonzero and contin-uous derivative, hence, this derivative never changes sign and G is a


strictly monotonic function. Let F (x) = mx on R. Then F is a lift ofEm. We also fix a lift G of g with the property that G has its (unique)zero a in [−1/2, 1/2). We consider now G restricted to the interval[a, a+ 1). Then

G(a) = 0 G(a+ 1) = m.

We can partition the image by G of the interval [a, a + 1] in intervalsof length unity, [j, j + 1], 0 ≤ j < m and using the preimages by G,[a, a+ 1] can be partitioned as well in intervals I1

0 , · · · , I1m−1 with

I1j := G−1 ([j, j + 1]) j = 0, · · · ,m− 1.

Inductively, we define similar intervals for Gn:

Inj := (Gn)−1 ([j, j + 1]) j = 0, · · · ,mn − 1.

The same procedure for F yields equally spaced intervals ∆nj . This

is illustrated in Figure 8.1. Letting a′n,j = j/mn we have that ∆nj =

a a+1Inj 0 1∆nj

0 0

mn

jj+1

mn

jj+1

Gn F n

Figure 8.1. The preimage Inj of a unit-length intervalby Gn and the corresponding interval ∆n

j for F n.

[a′n,j, a′n,j+1]. Let an,j be the endpoints of Inj .

Since G is expanding we have |G′(x)| ≥ µ > 1 for all x ∈ R. Thisimplies that

∣∣Inj ∣∣ ≤ µ−n. We define the conjugating map H on theendpoints of the intervals Inj and ∆n

j :

Inj = [an,j, an,j+1] ∆nj = [a′n,j, a

′n,j+1]

H(an,j) = a′n,j.

126 NLD-draft

Because of the properties of lifts, the intervals I and ∆ can be replicatedover the whole real line. We have that

mj

mn= ma′n,j = F (H(an,j)) = a′n−1,j = H(an−1,j) = H(G(an,j)).

Note that the intermediate images a′n−1,j may lie outside [0, 1]. Sincethe length of the intervals tends to 0 and H is monotonic we can con-tinuously extend H to a map H : [a, a + 1) → [0, 1) (a map on thewhole real line, in fact). This map is monotonic and bijective, hence ahomeomorphism. Moreover,

H(a+ 1)−H(a) = F (1)− F (0) = deg f = deg g ∈ Z.

H projects to a map h : S1 → S1. Also by continuity,

f h = h g.

8.1.2. Coding and Symbolic Dynamics. In this Subsection wewill investigate a method to illustrate the dynamics in a combinatorialway. We have been using this method without explicit formalisation.

Is there some simple way to visualise the action of f = Em ? Ex-press x ∈ [0, 1) in base m = deg g:

x = .s1(x)s2(x)s3(x) · · · =∞∑k=1

sk(x)

mk

The symbols sk are integers in 0, 1, · · · ,m− 1. We have:

F (x) = mx =∞∑k=1

sk(x)

mk−1= s1 +

∞∑k=1

sk+1(x)

mk

and consequently

f(x) = mx mod 1 =∞∑k=1

sk+1(x)

mk= .s2(x)s3(x)s4(x) · · ·

With this notation, the action of f consists of dropping the first symbolof the fractional expression of x in base m. For this reason f is calleda shift map: f shifts x one step along the symbol list. We will nowrephrase the result of Theorem 8.3 in the shift-symbol language.

To any x ∈ [a, a+ 1), we associate the infinite sequence of symbols

x −→ x = s1(x)s2(x)s3(x) · · ·

where si(x) is the i–th symbol of the expression of h(x) in base–m =deg g, i.e., using base–m expressions x is another way of regardingh(x), only that the initial dot is missing. In this way, we are implicitlydefining a space Σm = 0, 1, · · · ,m−1∞ whose elements are all infinite


symbol sequences where each symbol is an integer in 0, 1, · · · ,m−1.Then we have a 1–to–1 relation between S1, seen as [a, a+ 1) and Σm:

x : x ∈ S1 = Σm = 0, 1, · · · ,m− 1∞,There is a natural way to define distances on Σm (by recasting backa string as a point in S1 expressed in base–m and using the naturaldistance on the real numbers) by which all original continuity propertiesare also preserved on the symbol space. We may hence consider themap h : S1 → Σm that in all practical respects acts as h. The newcoding is useful to express the result of Theorem 8.3 in terms of h:

g(x) ≡ h(g(x)) = f(h(x)) ≡ σ(x)

where σ : Σm → Σm is the left shift defined by

σ(x1x2x3 · · · ) = x2x3x4 · · ·This provides a simple combinatorial description of the original system,i.e., g is conjugated to a shift map g ≡ σ. The dynamical propertiesof g are thus transferred to a symbol space and to shifts on strings ofsymbols.

Using this coding, we can easily identify the ω–limit set of a pointx: A set

[i1 · · · in] := x ∈ Σm : x1 = i1, · · · , xn = inis called a cylinder of length n (Sometimes cylinders are called wordsor blocks). Now a cylinder has non-empty intersection with the ω–limit set if and only if there is a sequence nk such that

σnk(x) ∈ [i1 · · · in]

(a cylinder is a compact set and therefore it must contain limit point).But

σnk(x) = xnk+1xnk+2 · · ·xnk+in · · ·and we ”see” the cylinder [i1 · · · in] infinitely often in the infinite se-quence x1x2 · · · .

The reader may recognise a cylinder in the labeling of an intervalat step n of the construction of the Standard Cantor set.

8.1.3. More General Maps – The Logistic Map. If we dropthe assumption of expansiveness we immediately turn into much moredifficult situations. There is no final theory on general differentiablemaps of the interval. Some classes of maps have attracted specialattention. The most famous class is the logistic family,

ga(x) = ax(1− x).

The complete analysis of these maps requires involved theories (likeTeichmuller Theory or perhaps Renormalisation Theory) whichare beyond the scope of this book, but a large number of interestingand practically useful properties can be established using elementarymathematics.

128 NLD-draft

Let us illustrate the relevance of the logistic family in applicationsby considering a metaphor for population problems. Given a popu-lation of size x, a basic simplifying assumption is that the offspringproduced by the population as a whole at a given time will be propor-tional to x. The underlying assumption is that all individuals behaveroughly the same, so roughly all tend to mate and generate offspring,and a fraction of them will actually succeed, if they are healthy enoughand lucky enough. We ignore the individual deterministic conditionsand consider an overall, population-wise rate of success. We say thenthat after one generation, the population changes as xn+1 = axn. Wehave a = 1+b−d, where b is the birth-rate and d the death-rate. Givenan initial condition, this equation has the solution xn = x0a

n. If a > 1xn grows exponentially and for a < 1 it decreases exponentially to-wards extinction. Since exponential growth/decrease for an indefiniteperiod of time is materially impossible, a possible improvement of themetaphor would be to impose a stop for the growth when the popula-tion gets too large. Let us assume that there is a maximum acceptablevalue M for the population and express x as a fraction of the maximalpopulation. Hence x ∈ [0, 1] and xn+1 = ga(xn) is a more reasonablepopulation metaphor since it does not allow for population growth be-yond unity, but it still may present an “almost” exponential growth forx1. A fundamental flaw of this metaphor is that populations consistof individuals, i.e., nonnegative integers, hence x should be a rationalnumber with denominator M . However, many interesting properties ofga(x), and even conclusions about the fate of populations, are obtainedassuming x to be an arbitrary real number in the unit interval. Wewill study some properties of the logistic maps for a ∈ [0, 4] in a seriesof exercises.

8.1.4. Exercises.

Exercise 8.1. Location of fixed points for the logistic family.

(a) Compute the fixed points of the system xn+1 = ga(xn) for a ∈[0, 4].

(b) For which values of a does the system have genuine period–2trajectories, i.e., fixed points of xn+2 = ga(ga(xn)) which arenot a repetition of the previous fixed points?

(c) Which is the order of the polynomials involved in a similar ques-tion for period–4?

Exercise 8.2. Asymptotic stability of fixed points.

(a) Compute the eigenvalue of the linearisation at the fixed point(s)of the logistic family for a ∈ [0, 4]. Color–code graphically theregions where the fixed points are stable, non–hyperbolic, or un-stable.


(b) Repeat the previous computation for the genuine period–2 tra-jectories, i.e., for the fixed points of xn+2 = ga(ga(xn)).

(c) Try to answer a similar question for period–4, · · · , 2n. Use acomputer algebra programme to help with the calculations.

(d) Color–code and plot the position of the non-zero fixed point asa function of a on a 2–d plot x vs a. Show the period–2 andhigher trajectories as well on the same plot.

Exercise 8.3. Another interesting set of plots.

(a) For the non-zero fixed point x0(a) of the logistic family, compute(numerically if necessary) the “trajectory” of the linearisationeigenvalue on the complex plane parametrised by a, i.e., use a as“time” and compute the trajectory of the curve a→ Dga(x0(a)),for the a values where the eigenvalue remains inside the unitcircle.

(b) Repeat the previous computation for the genuine period–2 tra-jectories, i.e., for the fixed points of xn+2 = ga(ga(xn)).

(c) Display the above computations on the same plot (using theunion of the corresponding a–ranges).

8.1.5. The Tent Map. For a = 4, the logistic map

g4(x) = 4x(1− x)

is topologically conjugated to the tent map

T (x) =

2x if x ≤ 1/2

2(1− x) if 1/2 ≤ x .

(see Figure 8.2).

0 0.5 1 0 0.5 1

1 1

g4(x) T (x)

Figure 8.2. The logistic map g4 and the tent map T .

The conjugating map h is given by

h(x) =1− cos(πx)

2.

130 NLD-draft

To see this we calculate for x ≤ 1/2 (the other case is similar)

g4(h(x)) = 4h(x)(1− h(x))

= 4

(1− cos(πx)

2

)(1 + cos(πx)

2

)= 1− cos2(πx) = sin2(πx).

On the other hand,

h(T (x)) =1− cos (πT (x))

2)

=1− cos (2πx)

2) = sin2(πx).

All dynamical properties of T (x) have a correspondence in g4(x), exceptfor those properties that depend on derivability at the maximum.

8.1.6. A First Approach to Chaotic Behaviour. A new prop-erty that we have seen in this section and was absent in 2–d flows (ac-cording to the Poincare–Bendixson Theorem) and also absent in circlediffeomorphisms is the coexistence of dense orbits and periodic pointsin the ω–limit set.

Another property that may have revealed itself while performing theexercises on the logistic family is that, when a is allowed to vary, thestable nonzero periodic trajectory of period 2n undergoes the followingfate, for any n: For low values of a it does not exist. At some pointit appears as a fixed point of gna with linearisation eigenvalue unity.Then it is stable for some a–interval and finally it becomes unstableat the point where a stable trajectory of period 2n+1 appears. For agiven value of a, the map displays a collection of unstable trajectoriesof periods 1, 2, 4, · · · , 2n−1 and the stable trajectory of period 2n. Theω–limit set becomes increasingly complicated as a grows and moreover,nearby points in [0, 1] may have ω–limits lying far apart.

These properties are the “fingerprints” of chaos, in a loose sensethey are indicators announcing that the expected dynamics will be veryunpredictable and complicated. To understand when a system will behighly unpredictable is very important in applications, so that we donot create false expectations, nor we arrive to inadequate conclusions.

8.2. Higher-dimensional Maps

In dimension 1 we saw so far that interesting (“chaotic”) dynamicsis produced by local expansion and strong recurrence (the map cannotexpand without “returning” since the phase space is bounded). Thiscould only be achieved by non-invertible maps. In higher dimensionsone can match the local expansion (at least in some direction) withinvertibility. We will now see that already in 2 dimensions one canhave “chaotic” invertible dynamics.

8.2. HIGHER-DIMENSIONAL MAPS 131

One of the main tools to analyse such systems will be the methodof coding inspired as we have seen in the expansion of a number in basem for expanding maps of the circle of degree m. So the question ariseswhat are the properties we want to have for a “good” coding.

1. We want to divide (partition) the phase space X into finitelymany pieces (atoms) (Xi)

n−1i=0 .

2. The pieces Xi intersect only in small regions, essentially theyhave at most portions of their borders in common (this is neededfor the uniqueness of coding on large relevant parts).

3. The intersection⋂k∈Z f

k(Xik) consists of at most one point forany bi-infinite sequence (ik) (this is needed to assure that notwo points have the same coding).

We will see that in some situations we can construct such a coding.In this way we obtain a continuous map h : Σn = 0, · · · , n− 1Z → Xwith the property

h σ = f h,where σ is an appropriate shift map.

8.2.1. Smale’s Horseshoe. We consider the following map of theplane specified in the unit square Q (see Figures 8.3 and 8.4)

Q

f(Q)

A B

C D

A B CD

Figure 8.3. The horseshoe map.

The action of f is better visualised in steps. First compress thehorizontal direction by a factor of µ and expand the vertical directionby a factor of λ. Then bend the intermediate image of Q in the formof a horseshoe, laying it back on Q.

The inverse map bends back, compresses and expands in reverseorder.

In order to succeed with this construction preserving continuity ofthe map (nothing on the square is teared apart), it is needed that λ > 2and µ < 1

2. These numbers are called expansion rate and contraction

rate respectively. These rates need not be uniform (non uniformity

132 NLD-draft

Q

f(Q)

Q

f−1(Q)

A B

C D

C D

A B CD

Figure 8.4. The inverse map.

leads to curvy lines instead of straight lines and to transverse crossingsat angles other than π/2) and their actual value is not important aslong as they assure that Figures 8.3 and 8.4 can be drawn. However,we will see below that µ and λ are very useful to establish interestingproperties of the invariant set of this map.

It is clear from the pictures that there are points of Q that aremapped back on Q by both f and f−1. We may ask the question ifthere are any points that never leave Q upon repeated iteration of fand f−1. The answer is positive. This will be the invariant set

Λ =⋂n∈Z

fn(Q)

called the horseshoe. Let us investigate its structure. The intersection

Q ∩ f(Q) ∩ · · · ∩ fn(Q)

is the union of 2n disjoint vertical rectangles (see Figure 8.5, the twogray-shaded rectangles correspond to n = 1 while the four black stripscorrespond to n = 2).

The intersection

Q ∩ f−1(Q) ∩ · · · ∩ f−n(Q)

is the union of 2n horizontal rectangles, corresponding to a rotatedpicture similar to Figure 8.5. Therefore

f−n(Q) ∩ · · · f−1(Q) ∩Q ∩ f(Q) ∩ · · · ∩ fn(Q)

is the union of 4n small squares (see Figure 8.6).The reader may be reminded of the construction of a Cantor set.

The vertical strips in the forward iteration n have width µn and arecontained within the strips of the previous iteration. Hence,

Λ+ ≡⋂n≥0

fn(Q) = C × [0, 1].


Q

R0 R1

Figure 8.5. Two iterates of f . The gray shaded areais Q ∩ f(Q) while the black shaded area is Q ∩ f(Q) ∩f 2(Q).

Figure 8.6. The first two iterations of f and f−1 put together.

In the same way, the horizontal strips of the inverse iterates havewidth λ−n and

Λ− ≡⋂n≥0

f−n(Q) = [0, 1]× C.

134 NLD-draft

Finally, the horseshoe can be seen as the intersection of the forwardand backward iterates, Λ = Λ+ ∩ Λ−, in other words

Λ =⋂n∈Z

fn(Q) = C × C,

where C is a Cantor set.8.2.1.1. Coding Space on Two Symbols. We can introduce the fol-

lowing coding with the help of two vertical rectangles R0 and R1 (seeFigure 8.5 again), which we take to be left and right halves of Q, sep-arated by a common central vertical line. We note on passing that theleft vertical strip of f(Q)∩Q lies inside R0 while the right one lies insideR1. By way of the horseshoe construction, these strips are horizontallyseparated by some positive distance 0 < η < 1 − 2µ. We could there-fore consider an equivalent coding where the left strip of f(Q)∩Q “is”R0 and the right one “is” R1, since no point of Q outside these stripsbelongs to Λ. Either choice leads to the same dynamical properties.The two horizontal rectangles H0 (lowermost) and H1 associated to theinverse map will be useful as well. We define

x←→ x = · · ·x−n · · ·x−1.x0x1 · · ·xn · · · ∈ Σ2 ⇐⇒ fn(x) ∈ Rxn

for all n ∈ Z. We consider the bi-infinite past and future itinerary ofeach point x ∈ Λ, and we construct x by placing the label (0 or 1)of the rectangular strip where fn(x) lies in position n. Every centralportion x−M · · ·x0 · · ·xM (M positive integer) of x identifies a smallsquare inside Q. The bi-infinite sequence x identifies a point in Λ. The“dot” in the sequence identifies which element corresponds to n = 0.This coding is continuous (having an appropriate metric on Σ2) andbijective, hence a homeomorphism:

h : Λ→ Σ2.

Here we propose the following metric on Σ2

d(x, y) =1

2N, N = max n : xi = yi |i| ≤ n .

N is the last position up to where both strings coincide, starting tocompare the strings at n = 0 and moving away in both directions.The reader should verify that d(x, y) = 0 ⇔ x = y. An even morenatural metric, when comparing with the base–2 coding introduced forthe one–dimensional case, is:

db2(x, y) =∑n∈Z

|xn − yn|2|n|

.

Note that 2d ≥ db2, i.e., 2d can be computed from db2 by putting allsubsequent numerators equal to one after the first nonzero numeratoris reached, while moving away from n = 0. In any case, the metricturns Σ2 into a compact metric space (actually a Cantor set). Two


points in Σn are close if and only if their images under h are in somesmall square, i.e., close in Λ.

How does f look like if we represent Λ with Σ2? Defining the shiftmap σ on sequences of Σ2,

σ(· · ·x−n · · ·x−1.x0x1 · · ·xn · · · ) = · · ·x−n · · ·x−1x0.x1 · · ·xn · · ·we have that

h σ = f h;

i.e., the action of f is just shifting the dot on a sequence one step tothe right.

Corollary 8.1. Horseshoe properties:

1.⋃n∈N Pern(f) = Λ,

2. #Pern(f) = 2n,3. f |Λ is mixing.

Proof. The proof follows from the properties of the symbolic spaceΣ2. In Σ2 a point is periodic if and only if its coding is periodic, i.e.

x = (x0 · · ·xn)∞.

Therefore, any cylinder

Cn−n(x) := y ∈ Σ2 : yk = xk, −n ≤ k ≤ n

contains a periodic point (of period 2n + 1) and the first statementis proved. The second statement follows immediately since there areexactly 2n words on two symbols having length n. These words serveas the building blocks for a period–n point.

To derive the mixing property, we consider two arbitrary cylinders[xn1 · · ·xm1 ] and [yn2 · · · ym2 ]. Let us verify that the image of one of thecylinders intersects the other cylinder for some n. Indeed, if n > m2−n1

then

∅ 6= [yn2 · · · ym2 ] ∩ σn([xn1 · · ·xm1 ]) =

= z ∈ Σ2 : zn2 = yn2 · · · zm2 = ym2 ; zn+n1 = xn1 · · · zn+m1 = xm1.

Remark 8.4. Some of the periodic points in the second item, how-ever, are not “genuinely” period–n, for example, the string 1111 countedat n = 4 is just one of the (two) period–1 orbits. In any case, there aremany genuinely period–n points for all n (for prime n, there are 2n−1

true period–n orbits).

Now we want to derive some useful analytic properties.

Definition 8.2. Let f : X → X be a homeomorphism. For x ∈ Xwe define its stable set (manifold) as

W s(x) := y ∈ X : d(fn(y), fn(x))→ 0 as n→ +∞ .

136 NLD-draft

The unstable set (manifold) W u(x) of x is defined as the stable setfor f−1 at x.

Remark 8.5. These manifolds may be defined in a correspondingway for vector fields, by replacing discrete time n with continuous timet. We have already encountered invariant manifolds in Chapter 6.

Theorem 8.4. The stable sets for the points in the horseshoe Λ areactually smooth manifolds.

This result follows from a more general theorem which is proved inthe same spirit as the Hartman–Grobman Theorem and is beyond thescope of this book. The Theorem is actually more general. It holds forany diffeomorphism and any point having hyperbolic behaviour, i.e.,having exponential expansion and exponential contraction along theorbit in some directions.

Remark 8.6. The stable and unstable manifolds are invariant:

f(W s(x)) = f(z) ∈ X : d(fn(x), fn(z))→ 0 as n→ +∞= y = f(z) ∈ X : d(fn(fx), fn(y))→ 0 as n→ +∞= W s(f(x)).

Moreover any conjugacy transports stable manifolds into stable mani-folds, since

d1(fn(x), fn(y)) = d1(h−1 gn h(x), h−1 gn h(y)→ 0

if and only if

d2(gn(x), gn(y))→ 0

where h : X1 → X2 is a continuous conjugacy.

Theorem 8.5. For all x ∈ Λ we have

W s(x) ∩ Λ = W u(x) ∩ Λ = Λ,

i.e., the stable and unstable manifolds are dense.

Proof. Since h is a homeomorphism we prove the theorem on Σ2,since any homeomorphism transports stable (unstable) sets into stable(unstable) sets. By the definition of stable set W s(x), we have

y ∈ W s(x) ⇐⇒ (∃ n0 ∈ N) : (∀n > n0) xn = yn.

Otherwise, there would be a sequence nk +∞ with xnk 6= ynk andd(σnk(x), σnk(y)) = 1 and the orbits would not converge. Now consideran arbitrary cylinder set of the form

Cmn (y) := z ∈ Σ2 : yi = zi −m ≤ i ≤ m n,m ∈ N.


We need to find in each such cylinder a point of the stable set. In fact,such point will be

w = w0w1 · · ·

yi if −m ≤ i ≤ n

xi if n+ 1 ≤ i

0 otherwise.

Remark 8.7. Interesting facts about the horseshoe.

1. The horseshoe plays a similar role as a hyperbolic fixed point.Nearby any point of the horseshoe, there are trajectories that are“repelled” (away from that point) and those that are “attracted”(approaching the point).

2. The Lebesgue measure1 of Λ is zero because for each n ∈ Nwe can cover the set Λ by 4n squares of area a2n with a =max(µ, λ−1) < 1

2, where µ (λ) is the contraction (expansion)

rate. Hence, a complete covering of Λ has area An = (4a2)n

and we have limn→∞An = 03. With the help of the coding we can conclude (similarly to the

case of expanding maps on the circle) that the horseshoe isstructurally stable: Any “nearby” map can be coded in thesame way and hence it will have the same (conjugate) dynamics.

Definition 8.3. A homeomorphism f : X → X is called expan-sive if there is a δ > 0 such that

d(fn(x), fn(y)) < δ for all n ∈ Z ⇐⇒ x = y.

Theorem 8.6. The horseshoe is expansive.

Proof. If x = y, then x = y and since d is a proper distance, it isimmediate that d(fn(x), fn(y)) = 0 for any n.

Let now x 6= y. To complete the proof we need to show that theirimages will eventually be moved to a distance at least δ > 0. Clearly,x 6= y ⇔ x 6= y and hence there will be a least |n| such that xn 6= yn,this means that fn(x) and fn(y) lie on vertical strips belonging todifferent coding rectangles and therefore at a distance δ ≥ η > 0, whereη is the separation between the first vertical strips defined previously.Hence, x and y lie further away that some positive distance.

8.2.2. How do Horseshoes Appear? Let x be a hyperbolicfixed point of saddle type of a diffeomorphism f . A point p ∈ W s(x)∩W u(x) is called a homoclinic point. In the case of flows such pointshave rather simple properties, since stable and unstable manifolds can-not cross transversely. Either they are disjoint, or one of them is a

1If the reader is unfamiliar with Lebesgue measure, just read “standard” mea-sure of plane areas.

138 NLD-draft

subset of the other. In the latter case, we speak of a homoclinic or-bit. In maps, these manifolds may cross transversely, since there isnothing corresponding to e.g. the Picard–Lindelof Theorem to imposeadditional restrictions.

Let us make some acquaintance with these manifolds in the case of ahorseshoe, as a preparation to answer the title question. Two differentpoints x 6= y of the horseshoe have manifolds that intersect. In fact,

x

yz

W s(x)

W u(x)

W s(y)

W u(y)

Figure 8.7. The point z = W u(x) ∩W s(y).

z = · · ·x−2x−1.y0y1y2 · · · approaches y for positive n and approachesx for negative n, see Figure 8.7. In the same way, we can producepoints belonging to either W s(x) or W u(x) (but not to both) as well aspoints in W u(x)∩W s(y). Indeed, z(u) = · · · · · ·x−2x−1.z0z1z2 · · · , with

zi = |1−xi| for i ≥ 0 lies on the unstable manifold of x while similarlyz(s) = · · · · · · z−2z−1.x0x1x2 · · · with zi = |1 − xi| for i < 0 lies on the

stable manifold of x. Any point z(su) such that for |n| ≥ N0 > 0

the coding symbols satisfy z(su)n = xn while for |k| < N0 we havez(su)k = |1 − xk| will lie in the intersection of both manifolds, and inall cases z(su) 6= x.

Hence, in the general case we may expect that there will be dif-feomorphisms f with homoclinic points p. We have α(p) = ω(p) = x.Note that this cannot happen in a linear system! If the intersectionof W s(x) and W u(x) is transversal we call the point a transversalhomoclinic point. We note that the entire orbit of p consists oftransversal homoclinic points (see Figure 8.8) since ω(fn(p)) = ω(p)for any n (and the same holds for the α–limit). Note that for n > 0,fn(p) is just a few steps ahead of p along W s(x), while for negative


x

p

f(p)

f 2(p)

f−1(p)

f−2(p)

Figure 8.8. A sample of the infinitely many transversalhomoclinic points existing along with p.

n it is a few steps behind. The angle between the stable and unsta-ble manifold may vary when iterating with f but they can never betangent.

x

p

R

f(R)

Figure 8.9. The action of f on a small rectangle Raround the fixed point x.

Now we consider a small rectangle R around x and its image by ahigh positive iterate of f (see Figure 8.9). Since x is hyperbolic, theimage of a sufficiently small rectangle R will be approximately linear,and hence expanded along the unstable direction and compressed alongthe stable direction.

Since a large portion of W u(x) lies within f(R), higher iteratesfk(R) will stretch the image along W u(x) until it eventually reaches pand for even higher iterates the image will return to R along with fn(p).The situation for large n will resemble that of Figure 8.10. The shaded

140 NLD-draft

Figure 8.10. A higher iterate fn(R) returns to Rbended in the form of a horseshoe (shaded area).

area has all qualitative features of the horseshoe construction. Thisidea has been formalised by Smale in the 60’s. The above discussionsketches the proof of the following

Theorem 8.7. If p is a transversal homoclinic point of a planar dif-feomorphism f with hyperbolic fixed point x, then there exists a Cantorset Λ and a positive integer m such that x ∈ Λ, fm(Λ) = Λ and fm

restricted to Λ is conjugated to a shift map on bi–infinite sequences on2 symbols.

Example 8.1. Let us perform a rigorous approximate calculationshowing that, starting at some positive integer M , the Horseshoe hasinfinitely many periodic orbits, with all periods k ≥M .

Consider Figure 8.9, place the origin of coordinates at the fixedpoint, and the x- and y-axes along the stable respective unstable direc-tions. Assume now that R in the figure is contained in the region wherethe Hartman-Grobman linearization holds and pick a point (0, Y ) ∈ R,preimage of the homoclinic crossing p. Hence, after n > 1 iterates ofthe map f , the point (0, Y ) will map on some point (X, 0) ∈ R. Bycontinuity, the map fn on a small neighbourhood U ⊂ R of (0, Y ) willtake the approximate form

x′ = X + ax+ b(y − Y )

y′ = cx+ d(y − Y ),

where we take x and y−Y small enough so that the nonlinear terms canbe safely neglected. Here, d 6= 0 represents the condition of transversalintersection at (X, 0) and invertibility is given by ad − bc 6= 0. Thisrepresentation holds at least for the images (x′, y′) lying on R. Suchpoints can be further mapped using the linearisation of the original map.


After additional m ≥ 0 iterates, the images will lie approximately on

x′′ = λm−x′

y′′ = λm+y′,

where λ−, λ+ are the stable and unstable eigenvalues of the originalmap f . Some images will return to the region U , the overall structurelooking more or less as in Figure 8.10. Combining both maps, theseimages can be approximately computed as:

x′′ = λm− (X + ax+ b(y − Y ))

y′′ = λm+ (cx+ d(y − Y )).

Periodic points of period k = n + m satisfy hence, the approximateequation

x = λm− (X + ax+ b(y − Y ))

y = λm+ (cx+ d(y − Y )),

where (x, y) ∈ U ⊂ R. Letting M = n + m0 we recover the initialstatment if we manage to prove that for all m ≥ m0 (i.e., for sufficientlylarge m) the periodic point equation above has a solution. To do this,setting u = y − Y let us rewrite the equation as(

−λm−Xλ−m+ Y

)=

(λm−a− 1 λm−b

c d− λ−m+

)(xu

).

For every sufficiently large m this linear equation has a unique solution(x, u)m since the determinant of the matrix is nonzero (it is approxi-mately equal to −d since |λm− | and |λ−m+ | become arbitrarily small forsufficiently large m).

Remark 8.8. Summary of horseshoe properties. An invariant setof horseshoe type has

1. Periodic orbits of period p for all p ∈ N.2. The number of distinct genuine periodic orbits of period n grows

(roughly) exponentially for n large.3. Uncountably many dense, non–periodic orbits.

We have seen that a horseshoe has an extremely complicated structureAlso, horseshoes appear as soon as we have a transversal homoclinicpoint. In some sense, we may say that chaotic systems always containsubsystems of horseshoe type. This means that in order to understandgeneral chaotic dynamics it is useful to understand the horseshoe first.We will pursue this understanding with the help of probability theory inthe next chapter.

8.2.3. Exercises.

Exercise 8.4. (a) Show that if p, q are different periodic pointsin Λ, then Ws(p) ∩ Wu(q) 6= ∅ (Hint: Produce a symbolic sequencebelonging to this intersection).

142 NLD-draft

(b) Use the fact that there exist a dense orbit in Λ to prove that Λcannot have stable periodic points. In fact, all periodic points are ofsaddle type, i.e., with both stable and unstable manifolds.

Exercise 8.5. Homoclinic tangencies: Assume that the originis an hyperbolic fixed point of a µ-dependent family of planar maps suchthat Wu(0) has a tangency point with Ws(0) for µ = 0. An approxi-mation of this map with the same setup as in Example 8.1, mapping asmall neighbourhood of (0, Y ) to a small neighbourhood of (X, 0) reads

x′ = X + ax+ b(y − Y )

y′ = µ+ cx+ e(y − Y )2.

Note now that since there is a tangency we have no transversality (i.e.,d = 0 in Example 8.1) and the lowest nonlinear contribution has tobe included in the description. Compute a return map from a suitableneighbourhood of (0, Y ) to itself, find periodic orbits and their stabilityeigenvalues as a function of µ. Compute the values µk where saddle-node bifurcations occur. Show further that limk→∞ µk = 0.

8.2.4. The Solenoid. Another “Cantor set” object that is easyto describe with simple mathematics is the Solenoid. Consider a mapf from the solid torus V into itself (see Figure 8.11)

W s

Figure 8.11. The Solenoid map.

f(t, x, y) =

(2t mod 1,

1

10x+

1

2cos(2πt),

1

10y +

1

2sin(2πt)

),

where x, y are coordinates on the unit disc, x2 + y2 ≤ 1 and t ∈ S1.The set

Γ :=⋂n∈Z

fn(V ) =⋂n∈N

fn(V )


is called the solenoid, or Smale–Williams attractor. It is an attractorsince it attracts all nearby points. Note first that f(V ) ⊂ V , this isclear from the definition. Moreover, for fixed t = t0, f(t0, x, y) is acontraction (see W s depicted in Figure 8.11). Hence, all points in thetorus V approach Γ. This set has locally the structure of a productof an interval and a Cantor set. It has a quite peculiar topologicalstructure:

1. The intersection of Γ with the unit disc (for each fixed t = t0)is a Cantor set.

2. Γ consists of uncountable many “threads” since the Cantorsets for each t continue locally into each other. However, each”thread” can intersect the unit disc at t = t0 only countablymany times.

3. Γ is itself connected, since it is the intersection of connectedsets.

Sets of the kind of the solenoid are called indecomposable continua.They play an important role in topology.

8.2.4.1. Coding. Consider the ”rectangles”

R0 = [0,1

2]× D2 R1 = [

1

2, 1]× D2,

where D2 is the unit disc. Then we get a coding map from the binarystrings to the points of the solenoid:

π : Σ2 → Γ.

We list some properties of the coding map:

1. The coding is not one–to–one in the first coordinate. It is 2–to–1 at the dyadic points and otherwise one–to–one (comparewith the expansion in base 2).

2. The cylinder sets have the form

[i

2n,i+ 1

2n]×Dm

j

where Dmj is a connected component of fm(V ) ∩ D i

2n(see Fig-

ure 8.12).3. The solenoid is for the same reasons as the expanding maps on

the circle or the horseshoe, structurally stable.4. The solenoid contains transversal homoclinic points. The sim-

plest can be found as the intersection of the stable and unstablemanifold of the fixed point. The fixed point is the image of 0∞

under the coding:(0,

1

2

∞∑n=0

10−n, 0

)=

(0,

5

9, 0

).

144 NLD-draft

D i2n

[i

2n ,i+12n

]

Figure 8.12. coding cylinders on the solenoid.

8.3. Strange Attractors and Chaos

Let us now make some considerations about what is meant by ex-pressions such as “chaos” or “chaotic dynamics”. Given a dynamicalsystem, the goal in applications is to understand its asymptotic be-haviour for t → ∞ (or such large time that the system behaviourcannot be distinguished in practice from that in the limit). In theend we are looking for ω–limit sets, attractors and their correspond-ing basin of attraction, i.e., which regions of phase space approachwhat portion of the ω–limit set. For simple systems it may suffice tocompute fixed points and their stability. More complicated systemsare less understood, while different techniques of analysis give differentpartial understanding. Intuitively, we would like to say that a system ischaotic when its attractor is more complicated than a finite set of fixedpoints, periodic orbits and connecting orbits, i.e., something “beyond”what is given by the Poincare–Bendixson Theorem.

For vector fields in continuous time, one of the examples that be-came classical is that of the Lorenz equations. It is a nice exercise toshow that the equations have a trapping region consisting of a fairlylarge ball encompassing the origin. Any initial condition on the sur-face of that ball will remain inside it for positive times. The questionis what is the attractor lying inside the region? For certain parametervalues, the attractor is just a fixed point. For other parameter values,the attractor gets modified, until we come to a situation where we stillhave a trapping region, but there exists neither a stable fixed point nor

8.3. STRANGE ATTRACTORS AND CHAOS 145

a stable periodic orbit inside the region. The attractor, whatever it is,must be more complicated than that.

8.3.1. Chaotic Behaviour. Another related question arising inapplications is that of the routes to chaos. For families of dynamicalsystems depending on parameters, we seek to understand in which waythe system “evolves” from simple dynamics to complicated dynamicswhen varying these parameters (it is a peculiar “evolution” since we areevolving in parameter space, time is always infinite when consideringasymptotic behaviour). The classical example in maps has come to bethe period doubling cascade of the logistic family: When a variesmonotonically from 0 towards 4, the attractor of the logistic map gachanges at fixed (discrete) values of a, going from a fixed point at zero,to a nonzero fixed point, to period–doubled attractors, and more.

Routes to chaos are somehow the opposite of structural stability.While in some parameter intervals we may have structural stability,along the route, the nature of the ω–limit set changes drastically atcertain fixed parameter values. These changes may eventually “finish”in the appearance of a horseshoe. Let us be more precise.

Definition 8.4. A strange attractor is an attractor that con-tains a transversal homoclinic orbit. A system is chaotic if it has astrange attractor.

As a consequence, a strange attractor contains within itself infin-itely many periodic orbits and dense, non-periodic orbits. None ofthese periodic orbits is asymptotically stable, otherwise the set wouldbe decomposable (not transitive) and hence not an attractor.

The intuitive image is that the orbits within the attractor are ofsaddle-type. When an initial condition passes close enough to the stablemanifold of some orbit, the dynamics “mimics” this orbit until we getso close to it that the system “escapes” along the unstable manifold.Since we are never exactly “on” a stable manifold, this process goes onforever, shifting from one periodic orbit to another. In fact, lying on adense orbit, the system eventually passes ε–close to any periodic orbit,for any given ε > 0.

Alternatively, Devaney’s definition of chaos gives a clear picture ofchaotic behaviour:

Definition 8.5. The mapping f : V → V is said to be chaotic onthe set V if:1. f has sensitive dependence on initial conditions,2. f is topologically transitive in V ,3. periodic points are dense in V .

8.3.2. Fingerprints of Chaos. For real systems, we can at bestguess the presence of chaos. Even if the equations we use to describe oursystem are chaotic (they display a strange attractor), to what extent

146 NLD-draft

or precision the equations describe the system may always be a matterof debate.

When we can refer to a well-defined set of equations the quest forchaos goes through finding a trapping region, verifying that there isno stable fixed point or periodic orbit inside the region and finding atransverse homoclinic orbit within the attracting set. In some caseswe can do part of the job, namely we find the trapping region and wefind the transverse homoclinic orbit, but we cannot prove e.g., that thisorbit belongs to the attractor. When nothing more precise can be done,we rely on “indicators” of crucial properties that we have encounteredin other, better known, chaotic systems. However, we must bear inmind that necessary conditions should not be mixed up with sufficientconditions.

1. Sensitivity to initial conditions, meaning that x 6= y ⇒|fn(x) − fn(y)| > 1 for n large enough (or a correspondingstatement for flows). If this can be established for pairs x, ylying arbitrarily close to each other and for a large region (pos-itive measure) of initial conditions in phase space, if not chaoswe have at least one fundamental consequence of it, namely theimpossibility of making long-term dynamical predictions (be-cause our knowledge of initial conditions is never perfect).

2. Sensitivity to parameter variation, meaning that we have afamily of dynamical systems where the ω–limit set changes dras-tically when changing parameters. These changes bear the col-lective name of bifurcations. Not all bifurcation cascades willlead to the formation of e.g., a horseshoe, but again the impos-sibility of making long-term dynamical predictions is present,since similarly to initial conditions, the constant value of a givenparameter cannot be specified beyond the precision of our ex-periment and/or numerical computation.

8.4. Exercises

Exercise 8.6. Let A : T2 → T2 be given by

(x, y) −→ (2x+ y, x+ y) mod 1.

What is the number of periodic points of period n? Are the periodicpoints dense?

Hint: Argue that the number of periodic points of period n is thenumber of preimages of the point (0, 0) under the (non-invertible) map

(An − Id) : T2 → T2.

8.4. EXERCISES 147

Then argue that the number of preimages is for all points the same.From this conclude that the number of preimages equals to the area of[(

2 11 1

)n− Id

]([0, 1]× [0, 1]) .

For the density part, verify that all points with rational coordinates areperiodic.

This map is area preserving, it expands in one direction and con-tracts in the other. It is called the Arnold Cat Map and it is a classicalexample in Dynamical Systems Theory. Search it in the web!

Exercise 8.7. Show that the homoclinic points, i.e., points x ∈ Λwith α(x) = ω(x) = p for some periodic point p, are dense in thehorseshoe. Are the homoclinic points for a fixed point dense?

CHAPTER 9

Ergodic Theory

We have seen that some systems have a very chaotic, i.e., sensitive,dependence of the trajectories with respect to their initial data. In suchcases, it is unreasonable to attempt predict the long-time-behavior ofall trajectories. One way out could be to consider only typical or ran-domly chosen trajectories and investigate their statistical properties.This idea goes back to Boltzmann. He considered a hard-ball-gas andconjectured that if one chooses the initial distribution arbitrarily thenthe system will ”mix-up” during its time evolution and will approachthe uniform disordered distribution (see Figure 9.1)

gas particles removing the wall

t=0

t=T

.

.

.

.

..

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

. .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

..

.

.

..

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Figure 9.1. An expanding gas evolves towards equilibrium.

This could be expressed in the following way. Let φ be an observable(e.g., temperature, pressure, etc.) and x0 the initial configuration (e.g.,the configuration of the gas particles). The measured observable is thenφ(x0). The average over all configurations (or macro-observable) isgiven by

∫Xφ(x) dx where X is the space of all configurations, i.e., the

phase space, and dx is the volume measure. The Boltzmann Ergodic

149

150 NLD-draft

Hypothesis (BEH) claims that the time evolution approximates thestate average.

limN→∞

1

N

N−1∑n=0

φ(fnx0) =

∫X

φ dx.

Here f : X → X is the evolution law. Sometimes this formula is ex-pressed as “time average equals phase space average”. The BoltzmannErgodic hypothesis is until today unproved and remains as one of thebig challenges in dynamical systems.

Before going into some details we will discuss the hypothesis onexamples.

Example 9.1. Let E2 : [0, 1)→ [0, 1) be the doubling map x −→ 2xmod 1. We consider the observable

φ(x) =

−1 if 0 ≤ x < 1

2

+1 otherwise.

Then ∫[0,1)

φ(x) dx =1

2· (−1) +

1

2· (+1) = 0.

If x0 = 0 then

limN→∞

1

N

N−1∑n=0

φ(En2 x0) = lim

N→∞

1

N

N−1∑n=0

φ(x0) = −1 6= 0.

Hence the BEH does not hold for the initial value x0 = 0. However, wewill see that x0 = 0 is in some sense an exception.

Let x = x1x2x3 · · · ∈ [0, 1) be a number with the ”right” frequencyof digits in base 2

limN→∞

1

N# 1 ≤ n ≤ N : xn = 0 = lim

N→∞

1

N# 1 ≤ n ≤ N : xn = 1

=1

2.

By the law of large numbers the ”probability of choosing such anumber at random” is 1 (i.e., 100%). Now for such an x0 we have

limN→∞

1

N

N−1∑n=0

φ(En2 x0) =

= limN→∞

1

N(# 1 ≤ n ≤ N : xn = 0 · (−1)+

+# 1 ≤ n ≤ N : xn = 1 · (+1))

=1

2· (−1) +

1

2· (+1) = 0 =

∫[0,1)

φ(x) dx.

So for a ”typical” point the BEH is valid.

9.1. NAIVE MEASURE THEORY 151

Our main aim is to show that this situation holds generally. Forthis we need some preparation.

9.1. Naive Measure Theory

A measure µ should be a function on subsets of X which assigns toit its “size” (like length, area, volume), i.e.,

µ : B ⊂ 2X → R+ ∪ 0. (9.1)

Here 2X denotes the set of all subsets of X. We ask this function topreserve some fundamental properties,. First, the measure should notchange if we “cut a set into disjoint pieces”:

µ

(⋃i∈N

Ai

)=∑i∈N

µ(Ai), Ai ∈ 2X , Ai ∩ Aj = ∅; i 6= j. (9.2)

Second, we want the total measure of the space to be equal to 1:

µ(X) = 1. (9.3)

Such a measure is called a probability measure, and reflects the factthat the probability of the whole sample space is equal to 1. Finally, Bdenotes the measurable subsets of X. Unfortunately, it is not possibleto extend such a function to all subsets of X (Theorem of Ulam).One has to restrict the measure to the set of “constructible subsets”.We first choose some elementary sets E ∈ E like intervals, rectangles,cubes, cylinder sets or similar. Then we apply a countable number ofthe operations union and complement:

E1, E2 ∈ E =⇒ E1 ∪ E2 ∈ Band

E1 ∈ E =⇒ X \ E1 ∈ B.In other words, we admit countable unions and countable intersectionsin B. In case our elementary sets are connected to the topology of X(like open sets), the set B is called the Borel-σ-algebra of X. It is thebasic idea of measure theory to construct measures on such σ-algebras.

Remark 9.1. In a compact metric space we always have a countablebase of the topology consisting of balls. We can choose them as ourelementary sets. We will from now on assume that the cardinalityof the elementary sets is countable.

Remark 9.2. The Lebesgue measure on [0, 1] has the fundamentalproperty that the measure of any finite interval I = (a, b) with 0 ≤a < b ≤ 1 is m(I) = b − a. It is an example of a probability measure,where E is the collection of all open intervals. Lebesgue measure hasthe additional property that all subsets of a set of zero measure aremeasurable (and have also measure zero).

152 NLD-draft

Example 9.2. The δ-measure on [0, 1] is another example:

δx(A) =

1 if x ∈ A0 otherwise.

9.1.1. Integration and Linear Functionals. From physics wecan get the motivation that integration is the process of evaluatingthe average of an observable (a function). Already the Mean ValueTheorem of integral calculus conveys this intuition. We will “borrow”this motivation to develop our ideas. An average of a function shouldbe

1. 1 for the observable φ ≡ 1,2. positive for positive observables3. linear in the space of all observables and4. continuously dependent on the observables.

If we assume the observables are represented by elements from thespace of continuous functions then we arrive to the idea of integrationas a positive linear continuous functional L of norm 1. As in the naivemeasure theory we can extend the space of potentially integrablefunctions I. For this we allow the following operations

C0 ⊂ Iand

φn ∈ I =⇒ φ(x) = limn→∞

φn(x) ∈ I. (9.4)

We note that the functional does not need to be finite! Therefore, thisis not the usual notion of integrable functions.

There is a 1–to–1 correspondence between measures and integra-tions (this is the message of Riesz representation Theorem). Fora Borel set A we have

χA(x) =

1 if x ∈ A0 otherwise.

∈ I

(this is called the characteristic function of the set A) and we candefine

µ(A) = Lµ (χA) :=

∫X

χa(x) dµ(x).

On the other hand, one can prove that the operations (9.4) appliedto the linear combinations of the characteristic functions χA recoverthe whole set I. Therefore it suffices to define the linear functionaljust on the set of characteristic functions.

Lµ (χA) := µ(A).

This gives the notion of an integral with respect to an arbitrary mea-sure: ∫

X

φ dµ := Lµ(φ).

9.2. INVARIANT MEASURES 153

9.2. Invariant Measures

In the situation of a dynamical system we are interested in asymp-totic statistics. So a measure which is concentrated on a set that iswandering (i.e., its iterates never return: fn(A) ∩ A = ∅) does notresemble the asymptotics. Therefore we restrict our attention to in-variant measures (invariant under the action of f):

µ(f−1(A)) = µ(A) ∀A ∈ B. (9.5)

This is connected to the following integral equation∫f−1(A)

dµ(x) =

∫X

χf−1(A)(x) dµ(x) =

∫X

χA(f(x)) dµ(x)

=

∫X

χA(y) dµ(f−1(y)) =

∫A

dµ(x).

9.2.1. The Empirical Measure. Assume we have fixed an ini-tial value x0. Then for a fixed elementary set E we can consider theempirical frequencies

FreqN(x0, E) :=1

N# 0 ≤ n < N : fn(x0) ∈ E ,

i.e., the visiting frequencies to the set E of x0 and its iterates. In oursituation we can assume that we have a countable set of elementarysets (e.g., rectangles with rational endpoints, cylinder sets, etc.) whichgenerate B. Then we can find by “diagonalisation” a subsequence Nk

such that

∃ limk→∞

1

Nk

FreqNk(x0, E) =: µemp(E)

for all elementary sets E1. This function can be extended to a measure– called an empirical measure – on all B. It has the property that

µemp(f−1(E)) = lim

k→∞

1

Nk

FreqNk(x0, f−1(E))

= limk→∞

1

Nk

FreqNk(f(x0), E)

= limk→∞

1

Nk

FreqNk−1(x0, E)± 1

Nk

= µemp(E)

for all elementary sets E. Hence an empirical measure is invariant.

1FreqN is a number between zero and one so the sequence has a convergentsubsequence, in fact any set E in the countable family of elementary sets has aconvergent subsequence. The idea is to pick a subsequence that is convergent forall sets in the family.

154 NLD-draft

Remark 9.3. The main difficulty is that an empirical measure isnot unique. It may depend on x0 and on the sequence Nk. For thedoubling map E2 we have for all N that

FreqN(0, E) =

1 if 0 ∈ E0 otherwise

= δ0(E).

For any ”typical”point x0 we have for the interval defined by the cylin-der set I = y ∈ [0, 1) : y1 = 0 (an interval of length 2−1, all points inthe unit interval having zero as first element in its binary expansion)

limN→∞

1

NFreqN(x0, I) = lim

N→∞

1

N#0 ≤ n < N : xn = 0 =

1

2.

Consider now a point x1 with x(2n)! = 0, · · ·x(2n+1)!−1 = 0 andx(2n−1)! = 1, · · ·x(2n)!−1 = 1 for all n ∈ N. Then

limk→∞

1

(2k)!− 1Freq(2k)!−1(x1, I) ≥ 1−

(limk→∞

1

(2k)!− 1(2k − 2)!

)= 1

and

limk→∞

1

(2k + 1)!− 1Freq(2k+1)!−1(x1, I) ≤

(limk→∞

1

(2k + 1)!− 1(2k)!

)= 0.

Hence, for this point the empirical measure is not unique (it dependson the subsequence Nk).

9.2.2. Invariant Measures for One–dimensional Maps. Con-sider a map on the interval with the following property: There are ndisjoint intervals (Ii)

ni=1;

⋃Ii = [0, 1] and f is linear on each of these

intervals. Moreover, the image of each of the intervals is the union ofsome of those intervals

f(Ii) = Ij1 ∪ · · · ∪ Ijk(i) .Such maps are called linear Markov maps. We can associate to thismap a transition matrix A = (aij)1≤i,j≤n with

aij =

1 if f(Ii) ⊃ Ij0 otherwise.

The transition matrix for the map E2 is

AE2 =

(1 11 1

).

The map g drawn in Figure 9.2 is also a linear Markov map. Its tran-sition matrix is

Ag =

0 0 1 10 1 1 10 1 1 01 1 1 0

.


I1 I2 I3 I4 x

y

0 0.25 0.50 0.75 1

Figure 9.2. The linear Markov map g.

The transition matrix describes existing orbits. If the entry aij = 1then at each time–step n there is a point with fn(x) ∈ Ii and fn+1(x) ∈Ij. More generally, the matrix Ak = (a

(k)ij ) gives all possible transitions

within k time–steps. Not all transitions have to be equally likely. Wecan assign probabilities pij to each possible transition. Clearly

n∑j=1

pij = 1 (9.6)

since the total probability of transitioning is 1. Also pij = 0 if aij = 0.The new matrix P = pij is called a stochastic matrix. If there is

a positive integer M such that PM has strictly positive entries this ma-trix is called irreducible. Then equation (9.6) and the Perron-FrobeniusTheorem A.13 (see Appendix A.4.5) assure that 1 is a simple eigen-value and the corresponding left and right eigenvectors are positive.The easy part of the result goes by noting that since all rows of Padd up to one, the column vector with all entries equal to unity is aright eigenvector of P with eigenvalue one. Let p = (p1, · · · , pn) be thenormalised left eigenvector, i.e.,

∑pi = 1 and pP = p. We can now

construct an invariant measure on [0, 1). Let

Ii1···ik = Ii1 ∩ f−1Ii2 ∩ · · · ∩ f−(k−1)(Iik)

156 NLD-draft

then we set

µP (Ii1···ik) = pi1 · pi1i2 · · · pik−1ik .

This defines a function on the elementary intervals and can be extendedto B. For this we only have to show that the measure such defined hasthe invariance property on the elementary intervals.

Note first that the intervals can be recasted in the cylinder no-tation of the previous chapter: Ii1···ik ≡ [i1 · · · ik] where the iteratesare to be performed with f−1. Verify first that since [i1 · · · in−1] =⋃nk=1[i1 · · · in−1k] it holds that

µP ([i1 · · · in−1]) =n∑k=1

µP ([i1 · · · in−1k]).

Indeed,

µP ([i1 · · · in−1]) = pi1pi1i2 · · · pin−2in−1

= pi1pi1i2 · · · pin−2in−1

(n∑k=1

pin−1k

)

=n∑k=1

pi1pi1i2 · · · pin−1k

=n∑k=1

µP ([i1 · · · in−1k]).

Now we will check that since p is an eigenvector this measure is invari-ant. Note that

(p1, · · · , pn) · P =

(n∑k=1

pkpk1, · · · ,n∑k=1

pkpkn

).

Hence,

µp(f−1([i1 · · · im])

)= µp

(n⋃k=1

[ki1 · · · im]

)

=n∑k=1

µp ([ki1 · · · im])

=n∑k=1

pkpki1pi1i2 · · · pim−1im

=

(n∑k=1

pkpki1

)pi1i2 · · · pim−1im

= pi1pi1i2 · · · pim−1im

= µp ([i1 · · · im]) .


Finally we can extend this result to all cylinders and therefore to allmeasurable sets in B. Such a measure is called a Markov measure.

In case we can choose

pij =|Ij|

|Ii| |f ′|Ii |

then this probability is the amount of the measure that is spread fromIi to Ij. One can show that the resulting measure has a density ρ(x)with respect to the Lebesgue measure∫

[0,1)

φ(x) dµ(x) =

∫[0,1)

φ(x)ρ(x) dx

and the density is connected to the right eigenvector p+ normalisedso that that <p, p+>= 1. For this we conclude (note that Ii1···im =[i1 · · · im])

µp ([i1 · · · im])

|Ii1···im|=pi1pi1i2 · · · pim−1im

|Ii1···im|

=1

|Ii1···im|pi1

|Ii2||Ii1 ||f ′|Ii1 |

|Ii3||Ii2||f ′|Ii2 |

· · · |Iim||Iim−1||f ′|Iim−1

|

=1

|Ii1···im|pi1

|Iim||Ii1 ||f(m)|Ii1 |

=1

|Ii1···im|pi1|Iim||Ii1|

|Ii1···im| = pi1|Iim||Ii1|

where the left-hand side is bounded away from zero and finite indepen-dently of i1 · · · im and hence a density.

Example 9.3. For the map E2 we can choose

P =

(12

12

12

12

).

The resulting density is p+ = (1, 1) and the measure is Lebesgue mea-sure itself.

Remark 9.4. In the case that

pij = plj = pj for all i, j, l

the probability to be at ”state” j does not depend on where you comefrom. Such a measure is called a Bernoulli measure. The generalformula is

µP (Ii1···ik) = pi1 · pi2 · · · pik .

158 NLD-draft

since

(p1, · · · , pn)

p1 · · · pn· · ·

p1 · · · pn

= (p1(p1 + · · ·+ pn), · · · , pn(p1 + · · ·+ pn))

= (p1, · · · , pn).

The Lebesgue measure for E2 is actually a Bernoulli measure.

9.3. The Birkhoff Ergodic Theorem

We will present here (without proof) the most important theoremon orbit statistics. It can be seen as a mathematical justification ofthe Boltzmann Ergodic Hypothesis. The general form of this theoremand its proof can be found in the Appendix (Theorem A.19). Beforeproceeding we need some preparation.

We are only interested in the basic stones of an invariant measure.If e.g., we would have an invariant set with positive measure then alsothe complement is invariant and we can investigate the set and itscomplement separately. therefore we define

Definition 9.1. An invariant measure µ is called ergodic if it isindecomposable, i.e., if for any invariant Borel set A (this is, a set suchthat A = f(A) = f−1(A))

µ(A)µ(X \ A) = 0.

Remark 9.5. We can give also another equivalent formulation: µis ergodic if and only if any invariant measurable function φ is constantµ-a.e.2, i.e., there is a number a such that φ(x) = a for a set of pointsof measure 1.

We are now ready to state

Theorem 9.1 (Birkhoff’s Ergodic Theorem). Let µ be an ergodicmeasure for f : X → X, X a compact metric space. Then there is aset A with µ(A) = 1 such that for any continuous function φ and forany x ∈ A we have

limN→∞

1

N

N−1∑n=0

φ(fn(x)) =

∫X

φ(x) dµ.

2A property is said to hold µ-a.e. (almost everywhere with respect to themeasure µ) when the set of points where the property holds has measure 1 or,equivalently, when the set of points where it does not hold has measure zero.

9.3. THE BIRKHOFF ERGODIC THEOREM 159

Hence, when the hypotheses of the Theorem are fulfilled, that is,when we have an ergodic measure µ, the BEH holds for almost everypoint of X (with the possible exception of a set X \A of zero measure).

Corollary 9.1. For x ∈ A the empirical measure exists and isindependent of x and the subsequence.

Proof. Let E be an elementary set and χE its characteristic func-tion. Such a function is not continuous in general, but it can be approx-imated by continuous functions differing from χE on a set of measuresmaller than ε, for any ε > 0. So we can produce a sound argumentshowing that the Theorem holds also for χE. Actually, the full ErgodicTheorem stated in the Appendix is proved for a still wider class offunctions. Hence, applying the Theorem we have

limN→∞

1

N

N−1∑n=0

χE(fn(x)) =

∫X

χE(x) dµ = µ(E)

for all x ∈ E.

Remark 9.6. The Birkhoff Ergodic Theorem gives a method toevaluate integrals. This method is the essence that is behind the Monte–Carlo–Method. Suppose we have a map f with associated ergodic mea-sure µ and we want to evaluate the integral

∫Xφ(x) dµ for some suit-

able function φ. We may choose a typical initial condition x0 (i.e.,randomly) and generate a sequence of “random numbers” using themap, namely fn(x0). Then we evaluate the function at these numbersand average these values. The Birkhoff Ergodic Theorem claims thatthis value tends to the integral as n → ∞. Of course, we must assureourselves first that x0 ∈ A.

The following theorem is actually simpler then the Ergodic Theoremand was proved before.

Corollary 9.2 (Poincare Recurrence Theorem). Let A ∈ B and µbe an ergodic measure. Then µ-a.e. point x ∈ A (i.e., the set of pointsfor which the result does not hold is a subset of A of zero measure)returns infinitely often to A.

Proof. If µ(A) = 0 then the assertion is obvious. Let us assumethen that µ(A) > 0. By the Birkhoff Ergodic Theorem we know thatfor a.e. point of A the empirical measure of A is equal to the measureµ(A). Hence a.e. point must return infinitely often to obtain the right“hitting frequency” (if fn(x) would return only finitely many times toA then the limit on the left hand side of the Ergodic Theorem wouldbe zero, while the right hand side is positive).

160 NLD-draft

9.4. Exercise

Exercise 9.1. Show using the properties of measures and this def-inition that invariant sets with respect to an invariant ergodic measurehave either full measure (measure 1) or measure zero.

9.5. Lyapunov Exponents

In this Section we consider a map of the interval f : [0, 1) → [0, 1)with an ergodic measure µ. We choose the function φ(x) = log |f ′(x)|(if f ∈ C1 then φ is continuous). The Birkhoff Ergodic Theorem gives

limN→∞

1

Nlog∣∣∣(fN(x)

)′∣∣∣ = limN→∞

1

Nlog

∣∣∣∣∣N−1∏n=0

f ′(fn(x))

∣∣∣∣∣= lim

N→∞

1

N

N−1∑n=0

φ(fn(x))

=

∫[0,1)

log |f ′(x)| dµ = λµ

for µ-a.e. x. The number λµ is called the Lyapunov exponent of fwith respect to µ. It gives the average expansion along the orbit.

If we consider higher-dimensional maps the derivative Dxf becomesa matrix and the norm is not multiplicative. We have to estimatelimn→∞

1n‖Dxf

nv‖. We note that in general

‖Dxfnv‖ 6=

n∏i=1

‖Df i−1(x)f‖‖v‖.

Therefore we cannot apply the Birkhoff Ergodic Theorem in sucha situation. However, the Oseledec Theorem can handle this morecomplicated situation, ensuring the existence of n numbers λi whichgive the average contraction/expansion rates in different directions.

9.6. Exercises

Exercise 9.2. Let Rα(x) = x + α mod 1 be a rotation with irra-tional α. What are the invariant measures? What changes if α ∈ Q?

Hint: For irrational α show that the hitting frequency

limN→∞

1

N#0 ≤ n < N : Rn

α(x) ∈ (a, b)

does not depend on x (but only on the length of the interval). Arguethat there can be only one invariant functional.

Exercise 9.3. Two (Borel) measures µ, ν are said to be singulariff there is a (Borel) set A such that µ(A) = 1 and ν(A) = 0. Use theBirkhoff Ergodic Theorem to show that two different ergodic measures

9.6. EXERCISES 161

are singular.Hint: Use the fact that the two linear functionals associated to themeasures are different.

Exercise 9.4. Consider

suppµ := x ∈ X : µ (Bε(x)) > 0 for all ε > 0 ,i.e., the smallest closed set of full measure. Show that it is an invariantset for a continuous function f : X → X on the compact metric spaceX, provided µ is invariant.Hint: Show that the symmetric difference

(suppµ \ f(suppµ)) ∪ (f(suppµ) \ suppµ)

has zero measure for any invariant measure. Conclude from continuitythat the support must be invariant.

Exercise 9.5. Let µ be an invariant measure for f : X → X,where f is continuous on the compact metric space X. Show thatµ(X \

⋃x∈X ω(x)

)= 0.

APPENDIX A

Appendix

A.1. Basic definitions

A.1.1. Spaces.

Definition A.1. A topological space is a set having an addi-tional structure such that the concepts of convergence, connectednessand continuity may be defined.

The simplest topological spaces are such that these properties areestablished without using the concept of distance. Indeed, all threeconcepts above can be established using the concept of open set andclosed set only. Topological spaces are characterised by describing howtheir open sets look like.

In science applications the concept of distance is often natural.Hence, metric spaces are used, where the topological properties aredefined and refined using the additional concept of distance betweenpoints. The usual Euclidean space (ordinary space) is an example of ametric space.

The historical order, however, is the opposite: Metric spaces camefirst and gradually their topological underlying structure was unveiled.

Definition A.2. A set or a topological space is called compactwhen it is closed (i.e., its complement is an open set) and when addi-tionally it can be covered by a finite number of open sets (more techni-cally: all open covers contain a finite sub-cover). For Euclidean space,a set is compact if and only if it is closed and bounded.

Definition A.3. A set X is called a metric space if there is afunction called metric

d : X ×X → R+ ∪ 0with properties

• reflexivity:

d(x, y) = 0 ⇐⇒ x = y

• symmetry:d(x, y) = d(y, x)

• triangle inequality

d(x, z) ≤ d(x, y) + d(y, z).

163

164 NLD-draft

Example A.1. We consider some metric spaces:

• Let X be any non–empty set. The metric

d(x, y) =

0 x = y

1 x 6= y

is called the telephone metric.• Let X = Rn. There are many metrics on this space. E.g.,

d((x1, · · · , xn), (y1, · · · yn)) =

√√√√ n∑k=1

(xk − yk)2

or

d((x1, · · · , xn), (y1, · · · yn)) = max1≤k≤n

|xk − yk|

• Let X = C[a, b] the space of continuous functions on the interval[a, b]. A metric can be defined as

d(f, g) = maxx∈[a,b]

|f(x)− g(x)|.

Since continuous functions on an interval are integrable we canalso consider a different metric

d(f, g) =

∫[a,b]

|f(x)− g(x)| dx.

We note that d(f, g) = 0 ⇐⇒ f ≡ g since a non-negative con-tinuous function has zero integral if and only if it is identicallyzero.

Definition A.4. A sequence (an)n is called a Cauchy sequenceif for any ε > 0 there is a number N = N(ε) such that for all n,m > Nwe have d(an, am) < ε. Equivalently, a sequence is Cauchy if

limn→∞

supm>n

d(an, am) = 0.

Remark A.1. Any convergent sequence of a metric space is a Cauchysequence. But if we consider the rational functions there are manyCauchy sequences that do not converge in Q, namely those having aslimit (in a broader sense) functions with irrational points in their im-age. This will mean that the space of rational functions has ”holes”.

Definition A.5. A metric space is called complete if any Cauchysequence converges.

Proposition A.1. The space C[a, b] of continuous functions onthe bounded interval [a, b] with the max-distance is complete.

A.1. BASIC DEFINITIONS 165

Proof. Recall first that continuous functions in closed and boundedintervals are bounded. Consider now a Cauchy sequence fn(x) of func-tions of the space. This sequence induces a Cauchy sequence of realnumbers, for each point x0 ∈ [a, b]. Indeed, if d(fn, fm) < ε then forany x0 ∈ [a, b] d(fn(x0), fm(x0)) < ε. Since a closed and bounded in-terval of the real numbers is a complete metric space, fn(x) convergespointwise to some function f(x). It remains to be shown that suchfunction is continuous. There are many ways to do this.

We show first that fn → f not only pointwise, but also in themax-distance defined above. Indeed, for n,m > N0 sufficiently large,

d(fn, f) = maxx∈[a,b]|fn(x)−f(x)| = maxx∈[a,b] limm→∞

|fn(x)−fm(x)| ≤ ε.

The last step holds since fn is a Cauchy sequence of functions, hence|fn(x)− fm(x)| ≤ ε for all x ∈ [a, b].

Now it is straightforward to show that f(x) is continuous. Consider

|f(x)− f(y)| ≤ |f(x)− fn(x)|+ |fn(x)− fn(y)|+ |fn(y)− f(y)|.For n large enough the first and third terms in the sum above canbe made to be smaller than ε/3, because of the previous argument.Also, since fn is continuous, there exists δ > 0 such that |x − y| ≤δ ⇒ |fn(x) − fn(y)| ≤ ε/3. Putting everything together we find that|x − y| ≤ δ ⇒ |f(x) − f(y)| ≤ ε. This can be done for any positive ε,and hence f is continuous.

Remark A.2. Cauchy sequences are not preserved by a continuous(homeomorphism) or even a differentiable (diffeomorphism) change ofcoordinates. The map

tan:(−π

2,π

2

)→ R x→ tanx

is a diffeomorphism of the open interval(−π

2, π

2

)onto the real line.

The sequence an = π2− 1

nis a (non–convergent) Cauchy sequence in

the interval while bn = tan(π2− 1

n

)is not a Cauchy sequence in R.

Although R is a complete metric space, its diffeomorphic image(−π

2, π

2

)is not.

The next theorem says that a compact metric space is not too large.

Theorem A.1. Any compact metric space has a countable dense setxl. Moreover the countable family (U 1

m(xn))n,m forms a basis of the

topology, i.e., for all x ∈ X and all open sets V 3 x there is n,m ∈ Nsuch that

x ∈ U 1m

(xn) ⊂ V.

Recall that a set is countable when its elements can be put into 1–to–1 correspondence with the natural numbers. Also, a set is dense (inthe metric sense) if for any ε > 0 and for any x ∈ X there is an elementxk of the set such that d(x, xk) < ε. Denseness can be characterised

166 NLD-draft

without distance, e.g., a set D is dense if for all x ∈ X every open setcontaining x (this is called a neighbourhood of x) contains at least oneelement of D.

Definition A.6. A n-dimensional manifold is an abstract mathe-matical space, where each point has a neighbourhood equivalent to anopen set of Rn. Equivalent means here that there exists a 1–to–1 andinvertible map between the neighbourhood and the open set in Rn. Sucha map is called a chart and the set of all charts is called an atlas.Whenever two charts have a common domain, the corresponding mapsshould be compatible (conjugate).

The overall structure of the manifold may be complicated, but inthe vicinity of each point its structure is just like ordinary space. Amanifold may be topological, where the relation with Rn is given bycontinuous maps, differentiable (differentiable maps), smooth (infin-itely differentiable maps) or analytic (analytic maps).

A.1.2. Banach Spaces and Norms. Whenever we can regardthe elements of a metric space X as vectors (i.e., X is a vector space,which means that we can define the sum of elements in X as a newelement in X, compatible with the ordinary sum), there is a natu-ral association connecting the ideas of “distance between points” and“length of vectors”. Indeed, the length ‖x‖ is the distance between xand the zero vector.

These ideas are formalised in the concept of norm and normedspace. Whenever a norm is defined on a vector space, we can definea corresponding distance as d(x1, x2) = ‖x1 − x2‖. Subsequently wecan define open sets and a topology using this distance. However, theinverse procedure is not always possible, there exist topologies (i.e.,specifications of open sets) that are not normable. We refer to standardtextbooks in topology and linear spaces for the proper definitions.

Also n × n matrices can be regarded as vector spaces, under theusual sum of matrices and the zero matrix in the role of “zero vector”of the space. We may want to estimate the size of matrices using somesort of norm.

From Linear Algebra we know that matrices describe linear mapsbetween vector spaces. It is quite natural to define matrix norms asthe largest modification in size they can act on a non zero vector.

Definition A.7 (Matrix Norm). We define the norm of a squarematrix A as:

‖A‖ := sup‖x‖=1

‖A · x‖.

Note that the norm of a matrix is defined via norms of vectors.

Definition A.8 (Banach Space). A Banach Space is a completenormed vector space.

A.1. BASIC DEFINITIONS 167

Hence, it is a vector space where we can sum its elements. Inaddition, we can define a norm and out of it a distance between pointsand finally every Cauchy sequence in the space is convergent. Theusual Euclidean Rn with the usual norm and distance is of course aBanach space.

A.1.3. Gronwall’s Inequality.

Lemma A.1 (Gronwall’s Inequality). If, for t0 ≤ t ≤ t1, φ(t) ≥ 0and ψ(t) ≥ 0 are continuous functions such that the inequality

φ(t) ≤ K + L

∫ t

t0

ψ(s)φ(s)ds

holds on t0 ≤ t ≤ t1, with K and L positive constants, then

φ(t) ≤ K exp

(L

∫ t

t0

ψ(s)ds

)on t0 ≤ t ≤ t1.

Proof. The hypothesis is equivalent to

φ(t)

K + L∫ tt0ψ(s)φ(s)ds

≤ 1

Multiplying by Lψ(t) and integrating,∫ t

t0

Lψ(s)φ(s)ds

K + L∫ st0ψ(τ)φ(τ)dτ

≤ L

∫ t

t0

ψ(s)ds.

Thus

ln

(K + L

∫ t

t0

ψ(s)φ(s)ds

)− lnK ≤ L

∫ t

t0

ψ(s)ds

and finally

φ(t) ≤ K + L

∫ t

t0

ψ(s)φ(s)ds ≤ K exp

(L

∫ t

t0

ψ(s)ds

).

A.1.4. The Induction Principle. The induction principle is amethod of proof used to establish that a given statement is true for allnatural numbers. It proves, so to speak, an infinite family of statementsat once. Let us illustrate it with the following example: Let us provethat for all integers n ≥ 0, (n+ 1)2 = n2 + 2n+ 1.

The method starts by proving (checking) that a first statement,say that for n = 0, holds. In our example, for n = 0 the equality says12 = 1, that is clearly valid.

The next and final step is to prove that if a statement in the infinitesequence of statements is true, then so is the next one. In our example,assuming that (k+ 1)2 = k2 + 2k+ 1 holds, we want to prove that eventhe following statement is true: ((k+ 1) + 1)2 = (k+ 1)2 + 2(k+ 1) + 1.

168 NLD-draft

We realise that ((k+1)+1)2 = (k+2)2 = k2 +4k+4. But k2 +4k+4 =(k2 + 2k + 1) + (2k + 2) + 1 = (k + 1)2 + 2(k + 1) + 1. This completesthe proof.

Starting at n = 0 is not important, the crucial thing is that thereexists some starting point n0 and that the statement we want to provehas to hold for all n ≥ n0.

The induction principle is not supposed to be “proved” itself withinthe standard number framework that we learn at school. It is part ofthe foundational axioms defining the natural numbers, as formalisedby Peano in the late XIX-th century. This is because this concept is asnatural to human intuition as the positive integers are. However, theinduction principle is a fruitful object of study for mathematical logic,and it can be proved in some axiomatic systems other than Peano’s.

A.1.5. Maps.

Definition A.9. A continuous 1–to–1 map of a topological spacewhose inverse is also continuous is called a homeomorphism. If themap and its inverse are differentiable it is called a diffeomorphism.

Remark A.3. Such maps may be considered as (continuous re-spectively differentiable) changes of coordinates. If we apply a home-omorphism or a diffeomorphism to a small ball around the origin inRn we map the (Cartesian) straight coordinate lines to (not necessarilystraight) new curves. Since this done in a 1–to–1 way, we can give toeach point in the image its coordinates defined by the preimage.

A.2. More Topological Concepts

A.2.1. Cantor Sets. Let us consider the following construction.In the interval [0, 1] delete the open subinterval

(13, 2

3

). There remain

two disjoint closed intervals I0 and I2. These intervals correspond tothe real numbers in [0, 1] that have a 0 or a 2 in the first position whenexpressed in base–3:

I0 = x ∈ [0, 1] : x = .0x2x3 · · · in base–3

and

I2 = x ∈ [0, 1] : x = .2x2x3 · · · in base–3.We can now repeat the procedure on each of the new intervals, eliminat-ing their central open subinterval of width 1/32, namely (1/9, 2/9) and(7/9, 8/9). The four remaining closed intervals contain now real num-bers having 0’s or 2’s in the first two positions. Continuing inductivelyby repeating the “rescaled” procedure at each remaining subinterval,i.e., deleting the central third of each subinterval, we have at step nexactly 2n intervals, each one of length

(13

)n. They correspond to real

numbers having 0’s or 2’s in the first n positions and can be labeled by

A.2. MORE TOPOLOGICAL CONCEPTS 169

sequences i1i2 · · · in denoting the first n positions in the base-3 expres-sion, where ik ∈ 0, 2 and

Ii1i2···in ⊂ Ii1i2···in−1 .

In other words, we have

Ii1i2···in = x ∈ [0, 1] : x = .i1i2 · · · inxn+1 · · · in base–3.The standard Cantor set is the set

C =⋂n∈N

⋃i1i2···in

Ii1i2···in .

Thus, it is the set of all real numbers in [0, 1] which do not have a 1 intheir base–3 expansion. A general Cantor set is the homeomorphicimage of the standard Cantor set. For future use, let us call xn thestring with the first n digits of the base–3 expression of a point x ∈ C.

Definition A.10. A subset A of a space X is called totally discon-nected if for any two points x 6= y ∈ A we can find two open sets of X,U 3 x and V 3 y such that A = (V ∩ A) ∪ (U ∩ A) and U ∩ V = ∅.

Definition A.11. A set A is called perfect if it is equal to the setof its limit points. In other words, every point of A is the limit of somesequence in A.

Theorem A.2. Any Cantor set is compact, perfect and is totallydisconnected. Moreover these three properties characterise a Cantorset.

Proof. We prove the Theorem for the standard Cantor set.The Cantor set is compact since it is a closed and bounded subset

of the real line. You may note that it is closed since its complement in[0, 1] is open (it is a countable union of disjoint open intervals).

All endpoints of the closed intervals used to build C belong to C(they consist of a finite string xn followed by an infinite string of 0’sor of 2’s). To see that C is perfect we will show that all points in Care accumulation points of elements of C, namely that in any neigh-bourhood of x ∈ C there exists some other point y ∈ C. Indeed, forany 1 > ε > 0 consider the open interval of radius ε around x ∈ C(any neighbourhood of x contains one such interval). Let n > − log3 ε.Then the points xn00 · · · and xn22 · · · belong to the Cantor set, theyare distinct and they lie closer to x than ε.

Since C is perfect it contains infinitely many points. We choosetwo: x 6= y. Since the points are distinct, they have different base–3expansions, i.e., their expansions coincide up to some N but the digitN + 1 is (without loss of generality) 0 for x and 2 for y. Consider nowthe closed interval IxN ⊂ [0, 1], where xN is the common initial stringof 0’s and 2’s in x and y. Both points belong to this interval. In thenext step of the Cantor construction the open third of this interval is

170 NLD-draft

eliminated, namely all points of the form 0.xN1p (in base–3), where pis any string built from 0, 1, 2. In particular, x∗ = 0, xN1100 · · · /∈ Cand x < x∗ < y. The sets U = [0, x∗) and V = (x∗, 1] satisfy therequirements of the definition (note that regarded as subsets of theunit interval, they are open).

Let us consider some standard and less standard properties of theCantor set.

Exercise A.1. Compute the Lebesgue measure of the eliminatedintervals in the construction of the Standard Cantor set. Is the Cantorset measurable? What is its measure?

Exercise A.2. Try a Cantor-like construction by eliminating fromthe unit interval at each step a central open interval whose length is afraction p < 1/3 of the closed interval in question. In other words, inthe first step we eliminate from the unit interval of length L0 = 1, thecentral portion of length pL0. Two intervals of length L1 = (1 − p)/2are left. In the second step we eliminate the central intervals of lengthpL1, and so on. Is there a limit set? How does it look like? Computethe Lebesgue measure of the eliminated intervals and that of the limitset.

Exercise A.3. Let p = 1/k, where k > 3 is an integer. Try aCantor-like construction by eliminating from the unit interval at eachstep n ≥ 0 a central open interval whose length is pn+1. Is there alimit set? How does it look like? Compute the Lebesgue measure of theeliminated intervals and that of the limit set.

A.2.2. On the Concept of Infinity. This subsection is intendedas an informative addendum to previous discussions. In these days,however, books for a general audience can be successfully comple-mented by “googling” through the internet or inside Wikipedia (atyour own risk, of course), using –for the present purpose– keywordssuch as Axiom of Choice, Cantor, Well-ordering theorem, Zorn, Zer-melo, Baire, Tarski, Hilbert, Constructivism, etc.

Definition A.12. We call two sets equipotent if there exists abijective function mapping one set onto the other (element-wise). Wesay that a set is infinite if it is equipotent with a proper subset of itself.

Exercise A.4. Prove the following statements.

1. Z and R are infinite sets.2. Q is equipotent to Z.3. There is no surjective function from Q (or Z) to R, i.e., R and

Q are not equipotent.


Exercise A.5. Show that the standard Cantor set is equipotent tothe real interval [0, 1].

This definition of infinite set contributed to start one of the mostamazing developments in mathematics around the turn of the XIX–XX-th centuries. In fact, the idea of infinity was present already intraditional analysis: the concept of limit, accumulation point, contin-uous function, or just irrational number, all involve some way or theother the “construction” of e.g., infinite sequences (existing at leastwithin the universe of our imagination), as well as mathematical as-sertions about these sequences. Let us reason about the implicationsof the newer views on infinity. Before Cantor, the usual definition ofinfinity rested on counting:

Definition A.13. A set is finiteN (N for naive) if it has n ele-ments, for some nonnegative integer n, whereas a set is infiniteN if itis not finiteN .

The only set with zero elements is the empty set. A corollary of thisdefinition is that a finite set with n> 0 elements can be put in 1–to–1 correspondence (i.e., bijectively) with the finite set 1, 2, · · · , n ofpositive integers (this is just counting, the naive way). Now we restatethe second half of Definition A.12 as:

Definition A.14. A set A is infiniteC (C for Cantor) if there existsa proper subset B of A and a bijective function f : A 7→ B, whereas aset is finiteC if it is not infiniteC.

Again, the empty set is finiteC since it has no proper subsets. How-ever, it turns out that Cantor’s concept of infinity is different from thenaive one:

Lemma A.2. If a set D is infiniteC then it is infiniteN .

Proof. The empty set and any set without proper nonempty sub-sets are ruled out since they are finiteC . Assume now that D is bothinfiniteC and finiteN . Since it is finiteN , it has n elements for somepositive n. Hence, a proper nonempty subset of D has m < n elementsfor some positive m. Since D is infiniteC then by composition of bijec-tive functions there should exist a bijective function between the finitesets of integers 1, 2, · · · , n and 1, 2, · · · ,m. Hence, we arrive at acontradiction and infiniteC ⇒ infiniteN .

Remark A.4. The question whether infiniteN ⇒ infiniteC or not(and hence whether equivalence between both concepts of infinity holds)seems to be undecidable without an additional assumption. We will seein the next lines that such an assumption has unexpected consequences.

172 NLD-draft

A.2.2.1. Some Naive Comments on the Axiom of Choice. The nextdevelopment spawned by Cantor’s approach is exemplified by the thirdstatement in Exercise A.4, namely that there exist infinite sets whichare not pairwise equipotent. In particular, the power set of a set A(the set consisting of all subsets of A) is not equipotent with A. Thisstatement is easy to prove for finiteN sets without any additional as-sumption, since the power set of a finite set with n elements has exactly2n elements. Cantor’s idea allows to produce a proof that holds also forinfiniteC sets. Here enters Cantor’s famous diagonal argument (this isthe path to solve Exercise A.4.3), proving that the set of real numbersin [0, 1] is not denumerable (or not countable, i.e., not equipotent to therationals). Comparing with finite sets, we say that the power set of Ais “larger” than A and that the set or real numbers is “larger” thanthe set of rationals. Having now infinite sets that are not equipotentwith each other, the question arises whether these “infinities of differentsize” can be somehow organised or classified. To answer this questionalso requires the above mentioned additional assumption. Cantor madethis assumption more or less automatically to start with. The idea wasrecognised by Peano and Bolzano among others and finally formulatedclearly for the first time by Zermelo in 1904. In one of its forms it iscalled the Axiom of Choice:

Axiom A.1. For any collection of nonempty sets, there exists a(choice) function on the collection assigning to each set one of its ele-ments.

The “any” is important here since for certain collections, be themfinite or infinite, it is more or less easy to produce choice functionswithout additional assumptions. It is enough to have an unambiguousrule that always selects an identifiable element.

In fact, since the concept of choice seems to be natural for humanbeings and apparently also for other living entities, and since it is alsoprovable in the mathematical sense for finite sets, it seems natural toassume that it holds for arbitrary collections. However, this requires aleap of faith. If we accept the axiom, this means that we believe thatsuch a choice will always be possible, even in the cases where we cannotproduce an explicit choice function or where it will never be producedbecause there is no available rule. To illustrate these ideas, ignore theAxiom for a moment and consider the following choices:

1. On the infinite collection of 2-element sets of the form n, n+1,for all nonnegative integers n, define the choice function as thatpicking the odd number of each 2-element set.

2. Consider the equivalence relation defined on [0, 1] as: a ∼ b ifa − b ∈ Q. All elements on the unit interval belong to justone equivalence class and the union of all equivalence classescoincides with the unit interval. We want a choice function on


the collection of equivalence classes that picks one element outof each equivalence class.

The first choice can be done without difficulties. We can produce theresult of applying this function, namely the set of odd positive integers(in every other set in the collection they appear as the first element, inthe others as the second). We can even “list” this result in a completeand comprehensive way. The second choice is trickier. We may believeit can be done, but we have no rule to produce an example. We cannotprove that it is an impossible task either. However, accepting theAxiom of Choice we can prove that it can be done. Still, we will neverbe able to produce a complete explicit example of it. We know that

e.g., 0,√

22

and π4

belong to different classes, so one choice function couldstart with these numbers, but we cannot produce the “full list” or ageneral rule to pinpoint any function value at will. We have difficultiesalready in explicitly organising the different equivalence classes, exceptfor a few notorious ones.

There are a number of familiar and less familiar results whose proofinvokes the Axiom of Choice (for some of the examples a weaker alter-native is enough). Among them we have:

1. Given an infinite set E of (distinct) real numbers in [0, 1] thefollowing two statements are equivalent: (a) x is an accumula-tion point of E; (b) There exists a sequence xn∞n=1 of distinctelements of E such that limn→∞ xn = x.

2. Every infinite set has a denumerable subset (also: infiniteN ⇒infiniteC).

3. Every vector space has a basis (Hamel Basis Theorem).4. The generalized Cartesian product of a nonempty family of

nonempty sets is nonempty (the Multiplicative Axiom).5. A non-empty complete metric space is not the countable union

of nowhere-dense sets (i.e., sets whose closure has dense com-plement). This is one form of Baire’s Category Theorem.

6. There exists sets on the unit interval that are not measurable inthe Lebesgue sense. The usual example is the Vitali set, definedusing the second choice function exemplified above.

7. The Induction principle can be extended to well-ordered sets(Transfinite Induction), in particular to Cantor’s hierarchies ofinfinites.

8. All sets can be well-ordered (also called Zermelo’s Theorem).9. There exist ways to partition the unit ball in R3 into a finite

number of pieces and reassemble the pieces via rigid motionsin such a way that a new sphere of any size, or even two unitspheres, are produced (Banach-Tarski Paradox, generalising anidea of Hausdorff).

174 NLD-draft

This list is ordered in a non-rigorous way from “very natural state-ments” to “highly unnatural statements”. The acceptance of the Ax-iom of Choice allows to prove that all these statements are correct.

The first one is particularly interesting. With elementary methodsit can be proven that (b)⇒(a). However, to prove the converse (try it!)it is necessary at some point to pick an arbitrary element of an infiniteand highly unspecified subset of E. The need of accepting the possi-bility of performing an arbitrary and unspecified choice is “central” tothe proof. We cannot proceed without it. The interesting fact aboutthis example is that both statements are encountered in elementarymathematics1.

Less “drastic” versions of the Axiom of Choice such as the Axiomof Countable Choice (allowing Choice within countable collections) orthe Axiom of Dependent Choice (which guarantees the existence ofa countable choice set an where an+1 depends on an) are possiblegeneralisations of finite choice that avoid some of the most puzzlingconsequences of Axiom A.1, such as e.g., points 6. or 9. in the listabove. For a nice discussion, see ch 13 in [16.].

The bottom line to take home is that the moment we start deal-ing with infinity (which in mathematics occurs quite frequently) wehave to analyse carefully in which way the (familiar) results from fi-nite mathematics can be extended to infinite sets. The question ofaccepting/rejecting the Axiom of Choice is usually not a central prob-lem for the needs of applied mathematics, since many interesting anduseful results can be produced without taking decisions governed bythis Axiom. However, mathematicians dealing with foundational is-sues, set theory, etc., do have a lot of trouble with it. Lebesgue calledit a false statement while simultaneously Frechet claimed it was a well-known fact. Eventually, different schools of mathematics appeared,accepting or rejecting the Axiom (or some related statement) and itsconsequences.

A.2.3. The Index of a Vector–field. Let v : R2 → R2 be acontinuous vector field with a finite number of singular points. Let xbe a singular point and C be a curve going exactly once (clockwise)around the the singularity x. We also assume that there is no othersingularity than x inside and on the curve C. Let us consider thevectors v(y); y ∈ C. We are interested in the number of completerevolutions this vector makes while we go once around the contour C.We count this number with its orientation: + if the vector rotatesclockwise and − if it rotates counterclockwise. The reader may notice

1However, some apparently harmless restrictions on the set E are enough toprove equivalence within this restricted environment, e.g., if we know that E hasonly one accumulation point.


that this number does not depend on the starting point y0 ∈ C. Thisnumber is called the index of the vector field at the singularity x.

If we “perturb” the curve a little bit (i.e., we make a small modifi-cation of the path of the curve without self intersections), the vectorsv(y) along the path will change continuously since we changed y con-tinuously. The rotation changes therefore continuously. Hence, theindex depends continuously on C as long as we do not cross anothersingularity. Since the index is an integer, it must remain constant aslong as we do not meet another singularity.

Example A.2. Let v1(x, y) = (x, y), v2(x, y) = (y,−x) and v3(x, y) =(y, x) be three vector fields (see Figure A.1).

v1(x, y) = (x, y) v2(x, y) = (y,−x) v3(x, y) = (y, x)

Figure A.1. Three planar vector fields with a singular-ity at the origin.

These vector fields have the following indices at the origin:

ind(0,0)(v1) = 1 ind(0,0)(v2) = 1 ind(0,0)(v3) = −1.

The vector fields in Figure A.2 have indices

ind(0,0)(v4) = 2 ind(0,0)(v5) = 3 ind(0,0)(v6) = −2.

v4 v5 v6

Figure A.2. More planar vector fields with a singular-ity at the origin.

176 NLD-draft

Similarly, we can define the index indv of a non–self–intersectingclosed curve which does not pass through a singular point as the num-ber of rotations (with orientation) that the vector v(y) does while goingaround the curve. This index is also called the winding number of thevector field along the curve.

Theorem A.3. For a non–self–intersecting closed curve C we have

indv(C) =∑

xi singular inside C

indxiv.

Proof. Let us consider the curve C1 which we obtain by continu-ous deformation from C as in Figure A.3.

C

C1

x1

x2

x3

Figure A.3. Two curves with the same index.

C1 has the same index as C since it does not pass through anysingular point along the deformation. Then the straight segments ofC1 ”cancel each other” since we move in opposite directions and theycan be placed arbitrarily close to each other. By continuity of the vectorfield, whatever modification of v along a segment will be compensatedby its companion segment. The curved arcs around each fixed pointadd up exactly to the sum of the indices.

Calling the components of the vector field v = (f1, f2), the index ofa curve C can be computed as a line integral along the curve:

indv(C) =1

2π

∫C

f1df2 − f2df1

f 21 + f 2

2

,

since it is the total angular variation of the vector field along the curve.This relates to Green’s Formula and contour integrals on the complexplane.

Corollary A.1. If the index of a closed non–self–intersectingcurve is non–zero, the curve encloses at least one singular point.

The “negative” reformulation of the previous Corollary reads:

Corollary A.2. If a closed non–self–intersecting curve does notencompass any singular point, its index is zero.


This last result is intuitive for the case of sufficiently small curves.Then by continuity the vector field is almost constant along the curveand its total variation is zero.

Theorem A.4. The index of a periodic orbit of a planar vectorfield is +1.

Proof. The vector field is tangent to the orbit. When circulatingalong the orbit in the direction of the flow, it makes one completerevolution.

Corollary A.3. A periodic orbit of a planar vector field encom-passes at least one singular point.

A.2.3.1. Flows on the Sphere. We abandon for a moment the Eu-clidean plane and consider another 2-d surface, namely the sphere.The next Theorem might be known to the reader in terms of the Eulercharacteristics of the sphere S2 ∈ R3 (we will proceed along these linesbelow). We will interpret it in terms of vector fields on the sphere (i.e.,at each point x of the sphere there is a vector v(x) ∈ R3 tangent toS2).

Theorem A.5. Any continuous vector field on the sphere withfinitely many singular points has sum of the indices equal to 2.

Proof. We state without proof that the index of a curve or sin-gular point is invariant in front of diffeomorphisms (this permits thecomputation of indices on other 2-d manifolds). We consider a non-singular point x on the sphere. This point has a small neighborhoodU(x) where all the vectors are almost the same as v(x) (see Figure A.4,left). So we can choose the boundary of U as our contour C = ∂U(x).The vector field has its singularities outside U(x) and its boundary.

U(x)

x

transformationof the vector field

Figure A.4. A vector field on the sphere.

178 NLD-draft

Consider now the stereographic projection mapping the comple-ment of U(x) to a plane, placed in such a way that x is the “northpole”:

St(θ, φ) =2 sin θ

1− cos θ(cosφ, sinφ).

In particular, the south pole (π, 0) maps to the origin, (0, 0). The map”flattens out” the complement of U(x) to the interior region definedby the large contour St(C). All (eventual) singularities of the originalvector field on the sphere lie now inside this region. The image ofthe (essentially constant) original vector field along C is depicted onFigure A.4, right. The index of this curve 2, and it corresponds to thesum of indices of the singular points of the original field.

The reader may wonder how the curve C encompassing no singularpoints on the sphere could be mapped to a new curve with index 2.The stereographic projection does not map the whole sphere diffeomor-phically on the plane. One point, or a small neighbourhood, is missing(note that the map St above is not defined for θ = 0). Moreover, theinterior region defined by C on the sphere became the exterior on theplane. The sum of indices on the sphere is then 0 + 2, the first termcoming from the region inside C and the other from the rest of thesphere (computed after mapping it diffeomorphically to a planar re-gion by St). Note that the theorem poses no specific condition on thevector field: The value 2 is a geometric property of the sphere, not ofthe dynamics.

In this way one may define the Index at Infinity for a planarvector field by mapping the plane to a sphere, letting infinity map tothe north pole. We have then ind∞ = 2 −

∑k indk where k runs over

all (finitely many) fixed points of the vector field.A.2.3.2. The Hedgehog Theorem. We provide here the proof of the

Hedgehog Theorem stated in Chapter 3. The Theorem can now beseen as a consequence of Theorem A.5.

Theorem 3.7 (Hedgehog). Any continuous vector field on thesphere has a singular point.

Proof. By Theorem A.5 the sum of the indices of the vector fieldis 2. So it must have a singular point, by Corollary A.1.

The name of the Theorem arises from the following metaphor. Con-sider a spheric hedgehog. The quills can be recasted as vectors in 3-space, having one radial component, pointing outwards from the hedge-hog and two tangent components. A smooth outer surface for a hedge-hog quills corresponds then to a continuous planar vector field on thesphere. The corresponding statement is that a smooth hedgehog hasat least one quill along the outer normal vector of its surface.


A.2.4. The Poincare-Hopf Theorem. Theorem A.5 and 3.7 areof a very general character, they hold for any continuous vector field.Indeed, these dynamical results are linked to the underlying topologicalproperties of the sphere, and hold, so to speak, before dynamics is evenspoken of (or rather, independently of the particular nature of thedynamics in study). Poincare realised that these Theorems could beformulated for more general surfaces and proved the Theorem that isnow called Poincare-Hopf Theorem connecting topology and analysis.See [15.] for a clear and simple approach.

Theorem A.6. (Poincare-Hopf) Let Sg be a closed surface withEuler characteristic χ(g). For any vector field on Sg with finitely manycritical points, the sum IS of the indices of the critical points is equalto the Euler characteristic of Sg, namely IS = χ(Sg) = 2− 2g.

We will prepare us with more topological background before provingthe theorem.

A.2.4.1. The Euler Characteristic of a Surface. For a triangle inthe plane, consider the integer computed by taking the number of faces(one in this case) minus edges (three) plus vertices (three). We haveχ(4) = f − e+ v = 1 and this holds for any planar triangle regardlessof shape. Consider now a polygon in the plane, i.e., a closed, simplyconnected region bounded by a finite number of straight lines.

Definition A.15. A triangulation of a polygon is a finite par-tition of the polygon in non-overlapping triangles, glued to each otheralong the edges in such a way that each edge is shared by two adjacenttriangles except the edges along the border of the polygon which belongto one triangle only (see Figure A.5).

Figure A.5. A triangulation of a square.

Remark A.5. (a) All the original vertices of the polygon are ver-tices of the triangulation, but there may exist other vertices inside thepolygon or along the original sides.

180 NLD-draft

(b) For a closed compact surface (such as e.g., a sphere) each edge ina triangulation is shared by exactly two triangles (the surface has noboundary and hence there are no boundary triangles).

Lemma A.3. For any triangulation of a polygon, the sum f − e+ vof faces, edges and vertices equals one.

Proof. We proceed by induction. Given a triangulation of a poly-gon, let us pick one triangle arbitrarily and build the covering by addingadjacent triangles to it one at a time by gluing them to free edges, insuch a way that at all intermediate steps we have a simply connectedfigure. The first triangle has f − e + v = 1. We will show that if at astep k in the construction we have fk − ek + vk = 1 then the additionof another triangle does not change this fact.

There are exactly two ways to add a triangle to a collection oftriangles, see Figure A.6. In the first case, the addition generates one

Case 1 Case 2

Figure A.6. Adding a triangle to a polygonal collectionof triangles.

new face, one new edge and no new vertices. Hence if fk − ek + vk = 1before the addition, now we have fk + 1 − (ek + 1) + vk = 1. For thesecond case, we add one new face, two new edges and one new vertex.Thus, we obtain fk + 1− (ek + 2) + vk + 1 = 1.

Lemma A.4. A (sub)triangulation of a triangle generated by (a)adding a new vertex along one side or (b) adding a new vertex at theinterior of the triangle, does not alter the sum f − e+ v.

Proof. In case (a) the edge where the vertex lies turns into twoedges. To complete the triangulation, along with the new vertex wehave to add an edge linking it to the opposite vertex of the triangle.The original face turns then into two as well. f − e+v = 2−5 + 4 = 1.In case (b) we add three edges from the new vertex to the original onesand get three faces instead of one. Hence, f−e+v = 3−6+4 = 1.

Definition A.16. The Euler characteristic of a surface is thequantity χ(S) = f − e + v computed over any triangulation of thesurface.


Remark A.6. (a) We have shown that the Euler characteristic ofa planar polygon is equal to one, regardless of the choice of triangula-tion. (b) The Euler characteristic is invariant under deformations ofa triangulation that do not alter the numbers f , e and v.

Lemma A.5. For a sphere S, χ(S) = 2.

Proof. Take an arbitrary triangulation for the sphere S. Pick onetriangle and generate a hole on the sphere by eliminating the face of thistriangle. Deform now the sphere to a polygon P on the plane having theedges of this triangle as boundaries, keeping the rest of the triangulationunchanged (except for the deformation). Since the triangulation of Shas one more face than that of P we have χ(S) = 1 + χ(P ) = 2.

Lemma A.6. For a torus T , χ(T ) = 0.

Proof. Take a triangulation of the torus. Let us “cut” this surfacein order to obtain a rectangle R. We need a longitudinal cut along the“equator” of the torus and a transversal cut as well. See Figure A.7.

Torus

Surface of genus 2

Figure A.7. Examples of “surface surgery” in order tocompute χ(Sg). Magenta lines: equatorial cuts. Bluelines: transversal cuts.

Generate a sub-triangulation such that the two cut lines consist ofedges and vertices of the triangulation. The triangulation ofR coincideswith that of T except that the vertices and edges along the cutting linesare duplicated since the two cutting lines are now the four edges of therectangle. Let et ≥ 1 be the edges along the top side of the rectangleand el ≥ 1 those along the left side. Let vt = et + 1, vl = el be thecorresponding vertices (the vertex at the upper left corner has to becounted only once). Hence, fT − eT + vT = fR − (eR − et − el) + (vR −(et + 1 + el)) = fR − eR + vR − 1 = 0.

Definition A.17. For a closed surface we define the genus g of thesurface as the number of holes. In particular, the genus of the sphereis zero and that of the torus is one.

Exercise A.6. For a closed surface Sg of genus g ≥ 0, χ(Sg) =2(1− g). Hint: This Exercise follows the ideas of the torus proof. By

182 NLD-draft

adequate “surgery”, the surface can be stretched out into a polygon of4g sides (each torus-like part of the surface contributes with 4 sides)without altering the triangulation other than duplicating vertices andedges along the cutting lines (as in the case of the torus). To obtainχ(Sg) from this polygon we have to identify edges pairwise as in theprevious Lemma. Counting the cutting lines and the excess verticesand edges, the result is obtained. See Figure A.7.

Definition A.18. Barycentric Sub-triangulation. For a giventriangulation, consider the sub-triangulation produced by drawing themedians of each side, i.e., the lines going from the center of one edgeto the opposite vertex. A standard result of planar geometry is that themedians cross at one point inside the triangle. The midpoint of eachside and the crossing point become new vertices in the sub-triangulation.Hence, this triangulation modifies (f, e, v) of each triangle in the fol-lowing way, see Figure A.8: (1, 3, 3)→ (6, 12, 7).

Figure A.8. The barycentric sub-triangulation and aflow on top of it. White dots: added vertices.

Proposition A.2. For any closed surface Sg of genus g there existsa vector field F with index IF such that IF = χ(Sg) = 2− 2g.

Proof. Take an arbitrary triangulation of Sg and produce a barycen-tric sub-triangulation on each triangle. Generate a vector field F on itby letting the vertex inside each original face become an stable fixedpoint, while each original vertex becomes an unstable fixed point andthe mid-side added vertices become saddle points. Let all edges becomeheteroclinic orbits connecting the fixed points such that an outgoingheteroclinic connects (a) each original vertex to the adjacent vertices(mid-side and in-face for each original triangle sharing the vertex) and(b) each mid-side vertex connects to the in-face vertex of each of thetwo triangles sharing the original edge along a (half) median, see Fig-ure A.8.

A.3. FIXED POINT THEOREMS 183

The original triangulation has Euler characteristic χ(Sg) = f − e+v = 2−2g. Let us now compute the index of the vector field F definedabove. There are v unstable nodes, f stable nodes and e saddle fixedpoints. Hence, the index becomes IF = f + v − e = χ(Sg).

A.2.4.2. Proof of Poincare-Hopf Theorem. It remains to be proventhat an arbitrary vector field V satisfying the conditions of the Theoremhas the same index as the standard vector field F . The argument usedis a variation of the one presented for Theorem 3.7. Since we have afinite number of singular points there exists a point x0 in the surfacewhere both V and F are non singular. By smooth deformations of theoriginal vector field without altering the index we can even make bothvector fields to coincide at x0. By additional smooth deformationswe can “move” the singular points to the interior of a small regionbounded by a simple closed curve C, see Figure A.9. Let us compute

• x0

· · ·

C

Figure A.9. All fixed points lie in the small regionbounded by C.

the difference of winding numbers, indVv − indFv along a small circlearound x0. Since the winding number is an integer, this difference iszero. Expand now the small circle. As long as this expansion does notencounter any singularity, the index integral will be unchanged. Hence,we can move the circle all the way until it coincides with C. Along Cwe have indVv (C)− indFv (C) = 0 = IV − IF .

A.3. Fixed Point Theorems

A.3.1. The Banach Fixed Point Theorem. The Banach FixedPoint Theorem is one of the basic theorems which helps to prove exis-tence and uniqueness results in a broad generality.

It is of special interest for us since it is also a statement about adynamical system. Moreover, the theorem is useful in a large numberof branches of mathematics and science.

Theorem A.7 (Banach’s Fixed Point Theorem). Let X be a com-plete metric space and f : X → X a contraction, i.e., a map such that

184 NLD-draft

there exists a number 0 ≤ λ < 1 such that

d(f(x), f(y)) ≤ λd(x, y)

for all x, y ∈ X. Then the function f has a unique fixed point x = f(x)on X.

Proof. First we remark that because of its definition, a contrac-tion is always a continuous Lipschitz function.

Let us fix a point x0 ∈ X. We will show first that its trajectoryx0, f(x0), · · · fn(x0), · · · forms a Cauchy sequence. We have, for n ∈N

d(fn(x0), fn+1(x0)) ≤ λd(fn−1(x0), fn(x0)) ≤ λnd(x0, f(x0)).

Hence, for m > n

d(fn(x0), fm(x0)) ≤ d(fn(x0), fn+1(x0)) + d(fn+1(x0), fn+2(x0)) + · · ·+ d(fm−1(x0), fm(x0))

=m−n−1∑k=0

d(fn+k(x0), fn+k+1(x0))

≤m−n−1∑k=0

λn+kd(x0, f(x0))

= λn1

1− λd(x0, f(x0))→ 0, as n→∞.

The last equality holds since d(x0, f(x0)) is finite and λ < 1, hence thesum is essentially a partial sum of the geometric series. This meansthat the trajectory of any point is a Cauchy sequence.

Since X is a complete metric space, this sequence is convergent.Let x ∈ X be its limit. Then

f(x) = f( limn→∞

fn(x0)) = limn→∞

f(fn(x0)) = limn→∞

fn+1(x0) = x

thus proving that x is a fixed point.It remains to prove that the fixed point is unique. Let us assume

that y ∈ X is also a fixed point. We will show that y coincides with x.Indeed, since both points are assumed to be fixed points, we have

d(x, y) = d(f(x), f(y)) ≤ λd(x, y).

Since λ < 1 this implies that d(x, y) = 0 or x = y.

To gain some experience with Banach’s Theorem, let us try it insome exercises.

A.4. LINEAR ALGEBRA AND MATRIX ALGEBRA 185

A.3.2. Exercises.

Exercise A.7. Compute the fixed point of the sequence

(a) xn+1 =√

1 + xn, x0 = 0.(b) xn+1 = 1

2(xn + 3

xn), x0 = 1.

(c) What happens if we change the initial condition?

Exercise A.8. Show that the sequence an+1 = 14

+ an− an2, n ≥ 0converges for all initial conditions a0 ∈ [0, 1]. Compute its limit.

A.3.3. The 1–dimensional Fixed Point Theorem.

Theorem A.8. Let f : [0, 1] → [0, 1] be continuous. Then f has afixed point.

Proof. If 0 or 1 are fixed points, we are done. Otherwise, considerthe function g(x) = f(x) − x. We have g(0) > 0 > g(1). Since g isalso a continuous function, by the Mean Value Theorem and Rolle’sTheorem, there exists a point x0 such that g(x0) = f(x0)−x0 = 0.

A.3.4. Brouwer’s Fixed Point Theorem.

Theorem A.9. Let f : D2 → D2 be a continuous map of the closeddisc into itself. Then f has a fixed point.

Proof. Assume there is no fixed point. Let v0(x) = xf(x), i.e., thevector connecting x with its image f(x). On the boundary of the discwe consider the vector fields v1(x) = xx− where x− is the diametricallyopposite point to x. For 0 ≤ t ≤ 1 we can consider the vector fieldvt(x) = xxt where xt = t · f(x) + (1− t)x− (see Figure A.10)

None of the vectors vt is zero (x 6= f(x) for all x). hence vt is acontinuous deformation of v0 into v1. This implies that

indv0S1 = indv1S1 = 1.

By Corollary A.1 inside S1 the vector field v0 must have a singularpoint, i.e., x = f(x) contradicting our assumption.

A.4. Linear Algebra and Matrix Algebra

Understanding square matrices is a fundamental ingredient of Dy-namical Systems Theory. In many cases, simply looking at the lin-earization of a system near a singular point is enough to understand itsqualitative behaviour. We will deal with square real matrices through-out, although many results hold for complex matrices as well.

A.4.1. Basic Results on Square Matrices and Linear Equa-tions. The following results are part of the Main Theorem in basicLinear Algebra courses. We state it here without proof.

Theorem A.10. Let A be a square real n×n matrix. The followingstatements are equivalent:

186 NLD-draft

f(x)

x

x−

v0 vt v1

Figure A.10. Brouwer’s fixed point theorem on theunit disc.

• A is invertible• detA 6= 0• The equation Ax = b has unique solution for any vector b ∈ Rn

• The equation Ax = 0 has only the zero solution x = 0• The columns of A are linearly independent• The rows of A are linearly independent

A.4.2. Eigenvalues and Eigenvectors. For a square n× n realmatrix, the linear equation (A − λI)x = 0 has (possibly complex-valued) nonzero solutions x ∈ Rn (or Cn) if and only if det(A−λI) = 0(see the previous Theorem). The latter is a real polynomial equationand by the Fundamental Theorem of Algebra, it has exactly n complexroots, when counted with their multiplicity. This polynomial is calledthe characteristic polynomial of the matrix A.

Definition A.19. Each distinct root of the polynomial equationdet(A−λI) = 0 is called an eigenvalue. The multiplicity of an eigen-value regarded as a root of the characteristic polynomial is called alge-braic multiplicity. Any member of the family of associated nonzerovectors xλ for each eigenvalue is called an eigenvector.

A.4.3. Cayley-Hamilton Theorem and Other Results onMatrices. Other basic results from Linear Algebra include the fol-lowing:

• A is invertible if and only if all its eigenvalues are nonzero.• Each eigenvalue has at least one eigenvector. For a given eigen-

value λi, let gi ≥ 1, the geometric multiplicity, denote the


number of associated linearly independent eigenvectors (notethat gi = dim ker(A− λiI)).• Different eigenvalues have linearly independent eigenvectors.• Dimension Theorem: For an n × n matrix A, dim ker(A) +dim ran(A) = n, where ran(A) = y ∈ Rn : ∃x : y = Ax andker(A) = x ∈ Rn : Ax = 0.• adj(A) · A = A · adj(A) = det(A) · I.

Theorem A.11 (Cayley-Hamilton Theorem). Let the characteris-tic polynomial of the square matrix A be pA(λ) = det(A − λI). Then,pA(A) = 0 (the null matrix).

Algebraic Proof. Consider (A − λI) · adj(A − λI) = pA(λ) I.The entries of the matrix adj(A− λI) are polynomials of degree n− 1in λ. Hence, we may rewrite this matrix as adj(A− λI) =

∑n−1i=0 λ

iBi,where all matrices Bi commute with A. Hence,

(A−λI)·adj(A−λI) = −λnBn−1+n−1∑i=1

λi(ABi−Bi−1)+AB0 = pA(λ) I.

This polynomial equality holds if and only if all (matrix) coefficients ofthe polynomials at each side coincide, i.e., −Bn−1 = I, ABi−Bi−1 = ciIand AB0 = c0I, where pA(λ) = c0 + c1λ+ · · ·+ λn. Equality still holdsafter left-multiplying each equality by Ai and summing up. Hence,AnBn−1 +

∑n−1i=1 A

i(Bi−1 − ABi)− AB0 = 0 = pA(A).

Analytic Proof. It is immediate to verify that pA(A) = 0 ifA is a diagonal matrix and also if A is diagonalisable. For a generalmatrix, since diagonalisable matrices are dense in the set of all matrices,consider a sequence Ak of diagonal matrices having A as limit. Since thestatement of the Theorem is closed, pA(A) = limk→∞ pAk(Ak) = 0.

A.4.4. The Jordan Canonical Form Theorem. Let us con-sider now further results related to eigenvalues and eigenvectors as apreparation for the Jordan Normal Form Theorem, which we discussbelow.

Lemma A.7. There exist positive integers ki such that the set ofeigenvalues of the matrix A decompose the vector space V = Rn (or Cn)as a direct sum of invariant subspaces:

V = ⊕i ker((A− λiI)ki).

Proof. For each eigenvalue λi, consider the invariant subspacesker(A − λiI) (spanned by a set of linearly independent eigenvectorsassociated to λi) and ran(A − λiI), of V. If there exists a nonzerovector belonging to both subspaces, then some vector x ∈ V that is notin ker(A− λiI) is mapped by A− λiI onto ker(A− λiI) (onto a linearcombination of the eigenvectors associated to λi). Hence, we have thatker(A − λiI) ⊂ ker((A − λiI)2) (strict inclusion). This argument can

188 NLD-draft

be finitely repeated: there exists a maximal value ki ≥ 1 such thatker((A− λiI)ki) ∩ ran((A− λiI)ki) = 0, while for all integers j > 0,ker((A−λiI)ki) = ker((A−λiI)ki+j). Equality holds also for the rangeof these matrices, which shows that ran((A− λiI)ki) does not containeigenvectors associated to λi.

Hence, the invariant subspaces ker((A−λiI)ki) and ran((A−λiI)ki)decompose V as a direct sum, by the Dimension Theorem. Moreover,they are invariant under the action of A. We can hence further decom-pose ran((A−λiI)ki) using another eigenvalue. It remains to be shownthat when the set of eigenvalues is exhausted, the “residual” range con-tains only the zero vector. Assume that ran(

∏i(A−λiI)ki) has positive

dimension. This subspace is A-invariant and hence it should containan eigenvector to A which is not associated to any of its eigenvalues,but this would contradict the exhaustion.

Remark A.7. (i) The chain of subspace inclusions ker(A− λiI) ⊂· · · ⊂ ker((A − λiI)ki) is strict, i.e., each subspace has strictly largerdimension than the previous one. (ii) Restricted to the invariant sub-space ker((A − λiI)ki), the matrix (A − λiI)ki is the null matrix. (iii)We will show below that mi = dim ker((A−λiI)ki) equals the algebraicmultiplicity of λi.

Definition A.20. A Jordan block Jd of dimension d is a squared× d matrix of the following form:

Jd =

λ 1 0

0 λ. . .

.... . . . . . 1

0 · · · 0 λ

In other words, apart from λ along the diagonal and from 1 along thefirst upper sub-diagonal, all other matrix elements are zero. Jordanblocks of dimension 1 are trivially 1 × 1 (diagonal) matrices having λas the only matrix element.

Theorem A.12 (Jordan Canonical Form). Every square real matrixA can be reduced to a canonical form by a similarity transformation S:

J = S−1AS,

where J consists of Jordan blocks lying along the diagonal, each blockwith one of the eigenvalues of A along its diagonal. Jordan blocksassociated to a given eigenvalue have dimensions that add up to thealgebraic multiplicity of the eigenvalue.

Remark A.8. When the algebraic and geometric multiplicity of aneigenvalue λi coincide, there are gi linearly independent eigenvectors(note also that from mi = gi it follows that ki = 1). The matrix Scan be constructed by placing each eigenvector as a column of S. The


corresponding Jordan blocks are of dimension 1, and hence diagonalmatrices. If this occurs for all eigenvalues, we say that A is diagonal-isable.

Before addressing the full problem let us consider the case of 2× 2matrices to gain insight.

Lemma A.8. Every 2 × 2 real matrix A can be reduced to Jordancanonical form by a similarity transformation S:

J = S−1AS,

where J has the eigenvalue(s) of A in the diagonal, zero in the loweroff–diagonal element and 0 or 1 in the upper one, the latter case if andonly if A has only one eigenvalue and one eigenvector.

Proof. The characteristic polynomial is of degree 2 and hence ithas either one or two distinct roots. In the latter case, we have twolinearly independent eigenvectors and the matrix S can be built byplacing these eigenvectors as columns. J is diagonal, as mentionedin the previous remark. If we have a double eigenvalue, then if itsgeometric multiplicity is two, we still have two linearly independenteigenvectors (constituting a base for the null-space of A − λI) as be-fore and J is diagonal. The remaining situation is when we have oneeigenvalue and only one linearly independent eigenvector x1.

Let x2 6= cx1 be a vector linearly independent of x1. There existconstants c1, c2 such that (A− λI)x2 = c1x1 + c2x2, since (x1, x2) is abasis. Let b = (A− λI)x2, which is a nonzero vector. Hence,

(A− λI)b = (A− λI)(c1x1 + c2x2) = c2(A− λI)x2 = c2b.

Therefore, b is an eigenvector of A, since Ab = (λ+c2)b. But A has onlyone eigenvalue and only one linearly independent eigenvector, whichmeans that c2 = 0 and (by suitably scaling x2 if necessary) b = x1.We have then that Ax2 = x1 + λx2. Let S = (x1, x2) be the invertiblematrix having x1 and x2 as columns. Then, by direct computation weobtain,

AS = A(x1, x2) = (λx1, x1 + λx2) = (x1, x2)

(λ 10 λ

)= SJ.

Remark A.9. Note that (A − λI)2 = 0 since (A − λI)2x = 0 forany vector x. A matrix B such that for some positive integer n > 1satisfies Bn = 0 is called nilpotent. The reader may want to verifythat in the definition of Jordan block, (Jd − λId)d = 0.

Proof of Jordan Theorem. Because of Lemma A.7, we mayrestrict the study to each invariant subspace ker((A−λiI)ki). We con-sider the simplest situation first, i.e., when g = 1 (only one eigenvector).We also drop the index i for clarity.

190 NLD-draft

By Remark A.7, we may reverse the ordering of the subspaces in thechain, i.e., we construct the set K = y, (A−λI)y, · · · , (A−λI)k−1y,where y ∈ ker((A − λI)k)\ ker((A − λI)k−1) (unless k = 1 and Kconsists of just one (eigen)vector). We also notice that the last vectorof the chain K is an eigenvector. We claim that the set is linearlyindependent.

Consider 0 = c0y + · · ·+ ck−1(A− λI)k−1y. Multiplying on the leftby (A− λI)k−1 we find that c0 = 0 (since all vectors other than y aremapped to zero). Multiplying by (A− λI)k−2 we find that also c1 = 0.Continuing in the same way we find that all coefficients are zero.

Finally, we show that since there exists only one eigenvector, thenk = m, i.e., the set K is a basis for ker((A − λI)k). Assume thecontrary, i.e., that along with K there exists an additional linearlyindependent vector b in the basis. Still, (A − λI)kb = 0. Assume,without loss of generality, that z = (A − λI)k−1b 6= 0, then z hasto be the (unique up to a nonzero constant) eigenvector in the setK. Hence, (A − λI)k−1(b − a0y) = 0. Again, since there is only oneeigenvector, (A− λI)k−2(b− a0y) has to be this eigenvector and hence(A−λI)k−2(b−a0y−a1(A−λI)y) = 0. Proceeding in the same way, werealise that (A−λI)(b−a0y−a1(A−λI)y−· · ·−ak−2(A−λI)k−2y) = 0,thus contradicting the assumption that b was linearly independent withthe set K.

Summing up, building with the vectors of K the invertible matrixS, we can now verify by direct computation as in the case n = 2 thatrestricted to the invariant subspace ker((A−λI)k), AS = SJ with J aJordan block. By manually computing det(A − λI) we realise that mequals the algebraic multiplicity of λ.

For the case g > 1, the proof along these lines gets cumbersome,but not impossible. We construct a basis for ker((A − λI)k) in thefollowing way: (i) Pick a vector y1 ∈ ker((A− λI)k)\ ker((A− λI)k−1)and construct a chain K1 ending at an eigenvector. This chain haslinearly independent vectors for the same reason as above. (ii) Considernow the invariant subspace ker((A − λI)k) ∩ (span(K1))⊥ and startall over again, i.e., compute k2 (may be smaller than k), pick a vectory2 ∈ ker((A−λI)k2)\ ker((A−λI)k2−1)∩(span(K1))⊥ (it is importantthat y2 lies in the orthogonal complement of the subspace spanned byK1). In this way we generate a new linearly independent chain K2

ending in another eigenvector. (iii) Repeat this procedure inductivelyand generate new chains. Since m = dim ker((A− λI)k) is finite, theprocess comes to an end when we ran out of eigenvectors.

Summing up now, we will have as many chains as the geometricmultiplicity of λ since each chain ends in a linearly independent eigen-vector. Some chains have length k, others may have smaller length(even length 1) and the sum of all lengths for all chains is m, since the


set of vectors thus constructed is linearly independent and exhaustsker((A−λI)k). This argument is inspired in the following proposition:

Proposition A.3. Every nilpotent operator B has a chain basis forran(B) (i.e., a basis consisting of chains of vectors vj, Bvj, · · · , Bsjvj,where the last vector in each chain is one of the linearly independenteigenvectors of B lying in ran(B)).

A proof of this result can be found in the book by Holst and Uf-narovski, [7.].

Using the chain basis constructed above, the matrix A is splittedup into diagonal blocks, one for each eigenvalue. Each of these blockshas dimension mi and consists in a juxtaposition of gi Jordan blocksof different dimensions, having λ in the diagonal and ones in the upperdiagonal (for the chains of length larger than one). Again, computingby hand det(A−λI) in this basis, we realise that mi equals the algebraicmultiplicity of λi.

A.4.5. The Perron–Frobenius Theorem.

Theorem A.13 (Perron–Frobenius). Let A be a n× n matrix withstrictly positive entries. Then

• A has a positive eigenvalue r.• There is a strictly positive eigenvector associated to the eigen-

value r.• No other eigenvector has strictly positive coordinates.• The eigenvalue r is simple (only one eigenvector and Jordan

block of dimension one).• All other eigenvalues ω satisfy |ω| < r.

Proof. Consider the set T ⊂ Rn of nonnegative rays (of vectors).Since A is strictly positive, A(T) ⊂ T, moreover, ∂T (i.e., nonnegativerays with some zero components) is mapped onto strictly positive rays.By the Brower fixed point theorem, there is a fixed point of A on T,i.e., there exists a positive eigenvector for A with positive eigenvalue.This proves the first two statements. We call the eigenvalue r and thepositive eigenvector v.

The matrix AT has also the positive eigenvalue r with positive eigen-vector u. This means that uTA = ruT . Hence, all positive eigenvectorsw of A have the same eigenvalue r. Indeed, let Aw = µw, for somepositive eigenvector w. Then uTAw = µuTw = ruTw. Since uTw > 0,µ = r.

Assume now that there exists an invariant plane π associated tor containing the eigenvector v. Consider the circle S1 on this planeand let L be the arc of nonnegative rays on π ∩ S1. The endpoints ofL map onto positive rays, hence L and S1 are not pointwise invariantby A (after eventually rescaling along rays). Hence, r has only one

192 NLD-draft

eigenvector (i.e., its geometric multiplicity is one). This proves thethird statement and part of the fourth one.

To complete the proof of the fourth statement, we will show thatthe equation (A − rI)u = v has no solutions (and therefore, the alge-braic multiplicity of r is also one). The matrix A − rI has negativeentries in the diagonal, since from the eigenvalue equation we obtainr = aii+

∑j 6=i aijvj/vi. Let us now solve the equation (A−rI)x = 0 by

Gauss elimination. We know that this equation has a unique strictlypositive solution v up to a constant factor. Pivoting by the first rowamounts to adding positive numbers to rows and columns 2 to n inA− rI. The resulting sub-matrix must still have negative diagonal en-tries, otherwise the eigenvalue equation cannot have a positive solution.Repeating this argument we realise that the echelon form of A−rI hasnegative diagonal elements, except the last one that is zero (only onerow of zeroes because the geometric multiplicity is one) and positiveupper-diagonal elements . Attempting now to solve (A− rI)u = v byGauss elimination, the elimination procedure adds positive terms toeach entry in v. Hence the equation corresponding to the last row hasno solutions.

Let us split the proof of the final statement of the theorem in twoparts. Let µ be a real eigenvalue of A with eigenvector w and uT

be the left eigenvector of A associated to r. It is clear from earlierarguments that eigenvector w must have negative entries. Let |w| bethe (nonnegative) vector obtained from w by taking the absolute valueof each entry. We have |µ||w| = |µw| = |Aw| < A|w|. By multiplicationwith uT we obtain: |µ| uT |w| < uTA|w| = ruT |w|, hence r > |µ| sinceuT |w| > 0.

Concerning complex eigenvalues η, the previous argument showsthat |η| ≤ r. Assume there exists a true complex eigenvalue whereequality holds. Let w be the associated eigenvector and |w| the non-negative vector obtained by taking the absolute value of each entryin w. First, we note that A|w| = r|w|. Indeed, r|w| = |ηw| =|Aw| ≤ A|w|. Hence, z = A|w| − r|w| is nonnegative. However,uT z = uTA|w| − ruT |w| = 0, what proves that z is the null vector.This proves in addition that |w| = v. Moreover, since A|w| = |Aw|,equality holds in the triangular inequality. Therefore, v = |w| = eiθw,which is a contradiction since v and w should be linearly independent.Then strict inequality holds, namely |η| < r.

A.4.6. On the Eigenvalues of 2× 2 matrices. Real 2× 2 ma-trices are particularly simple and many stability calculations for mapsfrom Chapter 5 can be done with paper and pencil. Let the trace (respdeterminant) of such a matrix be T (resp D). The characteristic equa-tion becomes: λ2 − λT + D = 0. We will consider two examples: (a)the condition to obtain one eigenvalue with modulus larger than one

A.5. LINEAR OPERATORS 193

and the other eigenvalue with modulus smaller than one, and also (b)the condition giving both eigenvalues with modulus smaller than one.

A.4.6.1. (a) Unstable (saddle) singular points. Whenever T 2/4 = 0

or T 2/4 ≤ D then the modulus of both eigenvalues is |λi| =√|D|, so

they cannot have different size. Hence, to obtain eigenvalues of dif-ferent modulus it is necessary that T 2/4 > max(D, 0). Moreover, theeigenvalues are then real. For positive T the largest-modulus eigen-value is positive and for negative T it is negative. Hence the proposedcondition reads:

|T |2

+

√T 2

4−D > 1 >

∣∣∣∣∣ |T |2−√T 2

4−D

∣∣∣∣∣ .For positive D the smaller eigenvalue is positive and the conditionabove reads: √

T 2

4−D >

∣∣∣∣1− |T |2

∣∣∣∣ ,which leads to |T | > 1 + D = |1 + D|. For negative D the conditioncan be recasted as:

|T |2

+

√T 2

4−D > 1 >

√T 2

4−D − |T |

2⇒ |T |

2>

∣∣∣∣∣1−√T 2

4−D

∣∣∣∣∣ ,which after squaring a couple of times can be written as T 2 > (1 +D)2, and hence |T | > |1 + D|. Hence, in either case the condition forhaving one eigenvalue with modulus larger than one and the other withmodulus smaller than one reads: |T | > |1+D| and T 2/4 > max(D, 0).

A.4.6.2. (b) Stable singular points. Following the same reasoning

as above, whenever T 2/4 < D both eigenvalues have modulus√D.

Hence, the stability condition reads in this case 1 > D > T 2/4 > 0.Otherwise, when T 2/4 ≥ D the eigenvalues are real and the largest-modulus eigenvalue satisfies:

|λ1| =|T |2

+

√T 2

4−D.

In this case, the stability condition is 1 > T 2/4 ≥ D > |T | − 1 ≥ −1.

A.5. Linear operators

A.5.1. Operator norm. When matrices are regarded as linearmappings between finite-dimensional vector spaces there is a naturalgeneralisation to arbitrary vector spaces through the concept of linearoperators. They are just continuous mappings between vector spaces.Spaces of linear operators are also vector spaces and a norm can bedefined on them following the intuition from matrix norm.

194 NLD-draft

Definition A.21 (Operator Norm). A linear operator A : X → Xhas norm

‖A‖ := sup‖x‖=1

‖A(x)‖.

Again, the norm of the operator is defined through vector norms.

A.5.2. Norm Estimates.

Theorem A.14. Let L, G be linear operators in a Banach space Ewith ‖L‖ ≤ a < 1 and ‖G−1‖ ≤ a < 1 and Id be the identity operatorId(x) = x. Then the operators Id +L and Id +G are isomorphisms with

‖(Id +L)−1‖ ≤ 1

1− a, ‖(Id +G)−1‖ ≤ a

1− a.

Proof. For a fixed element y ∈ E define the map

F (x) := y − L(x).

Then

‖F (x1)− F (x2)‖ = ‖L(x1)− L(x2)‖ ≤ a‖x1 − x2‖.By the Banach Fixed Point Theorem there is a unique solution to thefixed point problem x = F (x). Hence for each y there is a unique x suchthat x = F (x) = y − L(x), or equivalently, the equation (Id +L)x = yhas a unique solution x. Therefore Id +L is a bijection.

We now want to estimate the norm of (Id +L)−1. We have

‖(Id +L)−1‖ = sup‖x+L(x)‖=1

‖x‖.

But we have1 = ‖x+ L(x)‖ ≥ ‖x‖ − a‖x‖.

This implies that ‖(Id +L)−1‖ ≤ 11−a .

For the second statement we remark that

Id +G = G(Id +G−1

).

By the above considerations for L = G−1 we have that Id +G−1 is abijection and also G is invertible since G−1 is assumed to exist. Then

‖(Id +G)−1‖ ≤ ‖(Id +G−1

)−1 ‖ · ‖G−1‖ ≤ a

1− a.

A.6. Algebra

A.6.1. Closed subgroups of the real numbers. We will usethe following fact about closed subgroups of the real numbers.

Definition A.22. A set G is called an additive group if thereexists an operation (·, ·) : G × G → G; (a, b) = a + b ∈ G calledaddition defined with the following properties

1. Associativity: a+ (b+ c) = (a+ b) + c for all a, b, c ∈ G,

A.6. ALGEBRA 195

2. Neutral element: There is an element 0 ∈ G such that

a+ 0 = 0 + a = a for all a ∈ G.

3. Inverse element: For any a ∈ G there is an element −a ∈ Gwith

a+ (−a) = −a+ a = 0.

Remark A.10. The real numbers form an additive group with theusual addition. Moreover, the real numbers are a topological space.Therefore we can talk about open and closed sets.

Theorem A.15. Any proper closed subgroup of R as an additivegroup has the form τ ·Z, where τ ∈ R. Such groups are called lattices.

Proof. We will proceed in several steps.Step 1. Any subgroup G of R without a smallest positive element isdense in R.

Let us fix a number r ∈ R and a number ε > 0. Since there isno smallest positive element we can find a sequence G 3 gn → 0 withgn > 0. Let m ∈ N. Obviously we have mgn ∈ G. We want to showthat for some numbers m,n ∈ N we have mgn ∈ (r − ε, r + ε). Since rand ε are arbitrary, this implies that G is dense in R. The last inclusioncan be rewritten as

gn ∈(r − εm

,r + ε

m

).

If we show that

gn ∩⋃m∈N

(r − εm

,r + ε

m

)6= ∅, (A.1)

then we are sure that some element mgn ∈ G lies closer to r than ε.Now we have that for m > r−ε

2ε

r − εm

<r + ε

m+ 1.

This means that two consecutive intervals overlap and hence⋃m∈N

(r − εm

,r + ε

m

)⊃ (0, 2ε).

Since 0 < gn → 0 for n large enough we have gn < 2ε and eq.(A.1) isproved.

Step 2. Since G is closed and proper it is not dense. Hence, we knowby step 1 that G has a smallest element τ > 0. Clearly τ · Z ⊂ G.Assume that there is a number g ∈ G \ τ · Z. If we set

n = max k ∈ Z : kτ < gthen

g = nτ + r, 0 < r < τ.

But r = g − nτ ∈ G contradicting that τ is the smallest element.

196 NLD-draft

A.7. The Liouville Theorem of Hamiltonian Dynamics

Towards the second half of the 19th. century there were a lot ofdevelopments arising from the original work of Newton in Mechanics,with contributions by Euler, D’Alembert, Lagrange and many others,culminating with the work by Hamilton on the equations of motion.We develop here the tools for proving the Liouville Theorem on fullyintegrable hamiltonian systems.

A.7.1. Example: The Harmonic Oscillator. The most thor-oughly described example in Classical Mechanics regards the motionof a point particle of constant mass m along a straight line, subjectto a force F = −kx, where x is the displacement of the particle alongthe line from an equilibrium position and k is a constant. Newton’sequation reads

md2x

dt2= −kx.

With the introduction of the auxiliary variable p = mdx

dtthe equation

turns into the dynamical system

dx

dt=

p

mdp

dt= −kx

The auxiliary function H(x, p) =1

2

(p2

m+ kx2

), called the Hamilton-

ian has two interesting properties,

• H is constant along trajectories, i.e.,

dH

dt=

∂H

∂x

dx

dt+∂H

∂p

dp

dt

= kxp

m+p

m(−kx) = 0.

• The equations of motion can be rephrased as

dx

dt=

∂H

∂p

dp

dt= −∂H

∂x

Another interesting fact is that there exist coordinate transformations(called canonical transformations) (x, p) → (r, s) such that they pre-serve the above structure. In other words, H(r, s) = H(x(r, s), p(r, s))

A.7. THE LIOUVILLE THEOREM OF HAMILTONIAN DYNAMICS 197

is still a constant along trajectories and the dynamical equations in thenew coordinates read

dr

dt=

∂H

∂sds

dt= −∂H

∂r

Consider the new coordinate J =

∮pdx where the integral is taken

along a trajectory. With the subtitution x =

√2E

ksin θ and integrat-

ing θ ∈ [0, 2π] we obtain J = 2πH

√m

k. Where H = E is the value of

H along the given trajectory. Let W be the companion coordinate of

J . Whatever W is, we will have that∂H

∂W= 0 since we have already

managed to express H =

√k

m

J

2πas a function of J only, which also

turns to be a constant along trajectories. The new equations of motionread

dW

dt=

∂H

∂J=

1

2π

√k

mdJ

dt= − ∂H

∂W= 0

with immediate solution.The idea behind Liouville Theorem is to state the conditions in

which a 2n-dimensional dynamical system of this kind will have imme-diate solution.

A.7.2. Liouville Theorem.

Definition A.23. An autonomous 2n-dimensional dynamical sys-tem is called Hamiltonian if there exists a function H(q1 · · · qn, p1 · · · pn)that is constant along trajectories and such that the dynamical equationscan be rephrased as

dqidt

=∂H

∂pidpidt

= −∂H∂qi

, i = 1, · · · , n.

198 NLD-draft

Definition A.24. Given a Hamiltonian system, a functionf(q1 · · · qn, p1 · · · pn) that is constant along trajectories, i.e, such that

df

dt=

n∑i=1

∂f

∂qi

dqidt

+∂f

∂pi

dpidt

=n∑i=1

∂f

∂qi

∂H

∂pi− ∂f

∂pi

∂H

∂qi= 0

is called a constant of the motion.

Theorem A.16 (Liouville, Arnold). A Hamiltonian system hav-ing n independent constants of the motion f1, · · · , fn in involution (ofwhich we say that the first one, f1 = H) satisfies that

• If the n-dimensional level surface of first integralsMf = (q1 · · · qn, p1 · · · pn) : fk = ck is compact and connectedthen it is diffeomorphic to a torus T n = S1 × · · · × S1.• A coordinate system I1, · · · , In, φ1, · · ·φn can be introduced in

phase-space such that the angles φ1, · · ·φn, 0 ≤ φk ≤ 2π arecoordinates on Mf and I1, · · · , In are adequate functions of(f1, · · · , fn) and also independent constants of the motion.• The dynamical equations in this coordinate system become

dIkdt

= 0,dφkdt

= wk(I1, · · · , In), k = 1, · · · , n

Hence, the original system is solvable by quadratures (integration ofknown functions).

Proof. We start by making the following facts explicit:

• For each possible choice of constants c1, · · · , cn the surfaceMf

(which is compact by assumption) where the motion takes placeis defined.• By independent constants of the motion it is meant that the

normal vectors (given by ∇fk) to the hypersurfaces are linearlyindependent. By the constants of motion being in involution itis meant that the flow on Mf associated to the vector fields

V fk(p · · · , q) =

(∂fk∂p1

, · · · , ∂fk∂pn

,−∂fk∂q1

, · · · ,−∂fk∂qn

),

(k = 1, · · · , n), commute, i.e., that

ϕVfk (t, ϕV

fj(s, x0)) = ϕV

fj(s, ϕV

fk (t, x0)).

Since on the manifoldMf the flow given by the vector fields are definedfor all times, this flow is described by a mapping Φt1,··· ,tn :Mf →Mf

given by Φt1,··· ,tn(x) = ϕVfn

(tn, · · ·ϕVf1 (t1, x)). In other words, the flow

is described by the action of the additive group Rn = t1, · · · , tn onMf . SinceMf is compact while Rn is not, there are elements t1, · · · , tn

A.7. THE LIOUVILLE THEOREM OF HAMILTONIAN DYNAMICS 199

such that Φt1,··· ,tn(x0) = x0. This property is independent of x0 anddefines an additive, discrete, proper subgroup of Rn of the form τ · Znin the same way as it was done for n = 1 in Theorem A.15. Also, inthe same way as the quotient group R/Z is the unit circle S1, we havethat Rn/Zn = Tn = S1× · · ·×S1. Hence, the compact manifoldMf isshown to be diffeomorphic to Tn by the action of Φ. This proves thefirst statement.

By a suitable change of coordinates, the dynamics can be sepa-rated in n independent dynamics on each coordinate φi, i = 1, · · · , nof Tn with constant rhs. With this coordinate change, the constantsf1, · · · , fn transform to new constants I1, · · · , In and the dynamics onthe Torus is separated as

dφidt

= V Ii

with constant rhs depending solely on the Ii’s.By the implicit function theorem, equations fk = ck can be simul-

taneously solved for pi(q1, · · · , qn, c1, · · · , cn) and the relationsfi(qi, pi(q1, · · · , qn, c1, · · · , cn)) = ci hold identically onMf . Moreover,since

∂fi∂qj

+n∑k=1

∂fi∂pk

∂pk∂qj

= 0

we have thatn∑j

∂fm∂pj

∂fi∂qj

+n∑j,k

∂fm∂pj

∂fi∂pk

∂pk∂qj

= 0.

By interchanging i and m we obtain

n∑j,k

(∂fm∂pj

∂fi∂qj− ∂fi∂pj

∂fm∂qj

)+

n∑j,k

∂fm∂pj

∂fi∂pk

(∂pk∂qj− ∂pj∂qk

)= 0.

The first sum vanishes because of the involution condition. Hence, the

second sum implies that

(∂pk∂qj− ∂pj∂qk

)= 0 and hence,∮

Γ

∑j

pjdqj

is an exact differential integrating to zero on any closed loop Γ onMf

contractible to a point. However, since Mf is diffeomorphic to Tn thepreviously invoked cordinate change corresponds to n distinct closedloops γk that are not contractible to a point (one for each unit circlebuilding Tn) and the new constants of motion to the quantities

Ik =

∮γk

∑j

pj(q)dqj.

200 NLD-draft

By construction, the Ik’s are constants depending only on c1, · · · , cn(and hence constants of the motion). Moreover, this relation is invert-ible to express ck(I1, · · · , In), what in particular gives that the Hamil-tonian c1 = f1 = H depends solely on the Ik’s. This proves the secondstatement.

Propagating the coordinate change (φ1, · · · , φn, I1, · · · , In) and us-ing that

n∑i=1

∂φj∂qi

∂Ik∂pi− ∂Ik∂qi

∂φj∂pi

= ∇φj · V Ik = δjk,

the dynamical equations in these coordinates read

dφidt

=∂H

∂Ii;

dIidt

= 0 i = 1, · · · , n.

Remark A.11. The technical formulation of the property that twoconstants of motion f1 and f2 are in involution is that

n∑i=1

∂f1

∂qi

∂f2

∂pi− ∂f2

∂qi

∂f1

∂pi= 0.

Remark A.12. The modern, differential geometric, coordinate-freeversion of this Theorem using Lie algebras and their commutators canbe found in [17.].

A.8. Uniform Distribution

The notion of uniform distribution comes from Monte–Carlo–Me-thods. It deals with pseudo–random sequences.

Definition A.25. A sequence (xn) ∈ [0, 1] is uniformly dis-tributed if for any continuous function f : [0, 1]→ C we have

limN→∞

1

N

N∑n=1

f(xn) =

∫[0,1]

f(x) dx.

Theorem A.17 (H. Weyl). A sequence (xn) is uniformly distributedif and only if

limN→∞

1

N

N∑n=1

e2πihxn = 0

for all 0 6= h ∈ Z.

Proof. By Weierstraß’ theorem, trigonometric polynomials aredense in the set of continuous functions of the interval, i.e., for anycontinuous f and any ε > 0 there are numbers k ∈ N, zj ∈ C, nj ∈ N;1 ≤ j ≤ N such that

maxx∈[0,1]

∣∣∣∣∣k∑j=1

zje2πinjx − f(x)

∣∣∣∣∣ < ε.

A.9. NUMBER THEORY 201

Hence,

limN→∞

1

N

N∑n=1

f(xn)−∫

[0,1]

f(x) dx =

limN→∞

1

N

N∑n=1

f(xn)− limN→∞

1

N

N∑n=1

(k∑j=1

zje2πinjxn

)+

limN→∞

1

N

N∑n=1

(k∑j=1

zje2πinjxn

)−∫

[0,1]

(k∑j=1

zje2πinjxn

)dx+

∫[0,1]

(k∑j=1

zje2πinjxn

)dx−

∫[0,1]

f(x) dx.

Next note that for 0 6= h ∈ Z∫[0,1]

e2πihx dx = 0

and also∣∣∣∣∣ limN→∞

1

N

N∑n=1

f(xn)− limN→∞

1

N

N∑n=1

(k∑j=1

zje2πinjxn

)∣∣∣∣∣ < ε, and∣∣∣∣∣∫

[0,1]

(k∑j=1

zje2πinjxn

)dx−

∫[0,1]

f(x) dx

∣∣∣∣∣ < ε.

Hence, we may write∣∣∣∣∣ limN→∞

1

N

N∑n=1

f(xn)−∫

[0,1]

f(x) dx

∣∣∣∣∣ ≤ 2ε+

∣∣∣∣∣ limN→∞

1

N

N∑n=1

(k∑j=1

zje2πinjxn

)∣∣∣∣∣and∣∣∣∣∣ limN→∞

1

N

N∑n=1

(k∑j=1

zje2πinjxn

)∣∣∣∣∣ ≤ 2ε+

∣∣∣∣∣ limN→∞

1

N

N∑n=1

f(xn)−∫

[0,1]

f(x)

∣∣∣∣∣ ,thus proving the statement.

A.9. Number Theory

A standard theorem in Number Theory is the following

Theorem A.18 (Kronecker). Let α ∈ R \ Q, then the sequencenα mod 1 is dense in the interval [0, 1].

Proof. We are not going to prove this theorem directly. We willprove a stronger fact. Namely, the sequence nα mod 1 is uniformly

202 NLD-draft

distributed. Since∣∣∣∣∣ limN→∞

1

N

N∑n=1

e2πihnα

∣∣∣∣∣ =

∣∣∣∣∣ limN→∞

1

N

N∑n=1

e(2πihα)n

∣∣∣∣∣=

∣∣∣∣ limN→∞

1

N

1− e2πihNα

1− e2πihα

∣∣∣∣≤ lim

N→∞

1

N

2

|1− e2πihα|= 0

because 1 6= e2πihα for 0 6= h ∈ Z (α 6∈ Q). Now Weyl’s theorem A.17implies the uniform distribution.

Since any uniformly distributed sequence is dense, the Theorem isproved.

Remark A.13. If a sequence is not dense, it misses an intervalI. We can find a continuous function which is identically zero outsidethe interval but positive inside. Then the sum in Definition A.25 isidentically zero while the integral is positive. Therefore, the sequenceis not uniformly distributed.

A.10. The general Birkhoff Ergodic Theorem

In this section we are going to prove Birkhoff’s Ergodic Theorem infull generality. We present a combinatoric–functional theoretic proof.We believe that this kind of proof gives a recipe for other related the-orems.

Theorem A.19 (General Birkhoff Ergodic Theorem). Let T : X →X be a measure preserving map of a measurable space X with probabilitymeasure µ. Let φ ∈ L1(µ). Then the following limit exists µ-a.e.:

limN→∞

1

N

N−1∑n=0

φ(T nx) ≡ φ(x)

Before proving this theorem we will show how Theorem 9.1 followsfrom this theorem. First we observe that∫

X

(1

N

N−1∑n=0

φ(T nx)

)dµ(x) =

1

N

N−1∑n=0

∫X

φ(T nx) dµ(x)

=

∫X

φ(x) dµ(x)

since µ is invariant. Therefore∫X

φ(x) dµ(x) =

∫X

φ(x) dµ(x). (A.2)

A.10. THE GENERAL BIRKHOFF ERGODIC THEOREM 203

The space of continuous functions is separable, i.e., there is a densesequence φn of continuous functions. Theorem A.19 states that foreach n ∈ N we can choose a set An of measure µ(An) = 1 such that

φn(x) exists for all x ∈ An. Let A = ∩n∈NAn. Then µ(A) = 1 and for

all x ∈ A the limit φn(x) exists for all n ∈ N. Since the the averagingoperator is continuous, i.e.,

‖φ− ψ‖ < ε =⇒

∥∥∥∥∥ 1

N

N−1∑n=0

φ(T nx)− 1

N

N−1∑n=0

ψ(T nx)

∥∥∥∥∥ < ε

we have that φ(x) exists for all x ∈ A and all continuous functions φ.Moreover, for x ∈ A

φ(x) = limN→∞

1

N

N−1∑n=0

φ(T nx)

= limN→∞

1

N

(φ(x)− φ(TNx) +

N−1∑n=0

φ(T n+1x)

)

= limN→∞

1

N

N−1∑n=0

φ(T n(Tx))

= φ(Tx)

showing that φ is an invariant function. By ergodicity, it must beconstant µ–a.e. By equation (A.2) this constant must be equal to theintegral of φ. Hence, Theorem 9.1 is proved. Now we return to theproof of the present Theorem.

Proof of the general Birkhoff Ergodic Theorem. In or-der to make the ideas more transparent we are going to make two sim-plifying assumptions. First, we assume that T : X → X is invertible,i.e., T−1 exists. This is essential for one argument of the proof. Thenon–invertible case holds as well, but it needs more involved functionalanalysis which we do not want to present here.

Second, we are going to prove the general Birkhoff Ergodic Theo-rem in the case that φ ∈ L2(µ). The L1–case can be obtained by ap-proximating an L1 function by bounded functions (truncations). Theadvantage of L2 is that we can use Hilbert space techniques.

The idea of the proof is the following. First we provide a classof functions were the theorem holds (more or less) trivially. Then weprove that this set of functions is dense. Finally we use Theorem A.20(to be stated and proven next) to prove that the averaging operator iscontinuous and hence the theorem holds for all functions in L2.

204 NLD-draft

Let I := φ ∈ L2 : φ(x) = φ(Tx) µ–a.e. be the space of invariantL2–functions. Then for φ1 ∈ I and µ–a.e. x ∈ X we have∥∥∥∥∥ 1

N

N−1∑n=0

φ1(T nx)− φ1(x)

∥∥∥∥∥ =

∥∥∥∥ 1

NN · φ1(x)− φ1(x)

∥∥∥∥ = 0.

Let C be the space of functions θ which have a representation

θ(x) = h(Tx)− h(x) + c

where h ∈ L2∩L∞ is a bounded square–integrable function and c ∈ R.Such functions are called Co-boundaries. Note that C ⊂ L2 sinceµ(X) = 1. For φ2 ∈ C we have∥∥∥∥∥ 1

N

N−1∑n=0

φ2(T nx)− c

∥∥∥∥∥ =

∥∥∥∥∥ 1

N

N−1∑n=0

(h(T n+1x)− h(T nx) + c

)− c

∥∥∥∥∥=

∥∥∥∥ 1

N

(h(TNx)− h(x) +Nc

)− c∥∥∥∥

≤ 2 supx∈X |h(x)|N

Moreover, using the invariance of µ we verify that∫X

φ2 dµ =

∫X

(h(Tx)− h(x) + c) dµ

=

∫X

h(Tx) dµ−∫X

h(x) dµ+

∫X

c dµ

=

∫X

h(x) dµ−∫X

h(x) dµ+ c = c.

Hence, the following result holds for φ2 ∈ C:

limN→∞

1

N

N−1∑n=0

φ2(T nx) = c =

∫X

φ2 dµ.

Therefore, if φ ∈ D := I + C (i.e., φ = φ1 + φ2 is the sum of anI–function and a C–function) then the Birkhoff average exists µ-a.e.,since the averaging operator is linear and we can “split” its action overboth terms in the sum.

We are now going to show that the set of functions D is dense in L2.For this we prove first that the orthogonal complement of C consists ofinvariant functions, i.e., C⊥ ⊂ I. Let ψ ∈ C⊥, i.e.,

< ψ, φ >:=

∫X

ψ(x) · φ(x) dµ = 0


for all φ ∈ C. This means that for all h ∈ L2 ∩ L∞ we have

0 =

∫X

ψ(x) · φ(x) dµ

=

∫X

ψ(x) (h(Tx)− h(x) + c) dµ

=

∫X

ψ(x) · h(Tx) dµ−∫X

ψ(x) · h(x) dµ+ c

∫X

ψ(x) dµ

=

∫X

ψ(x) · h(Tx) dµ−∫X

ψ(Tx) · h(Tx) dµ+ c

∫X

ψ(x) dµ

=

∫X

(ψ(x)− ψ(Tx)) g(x) dµ+ c

∫X

ψ(x) dµ

where g(x) = h(Tx) ∈ L2∩L∞. Note that g runs through all functionsin L2 ∩ L∞ since h does it and T is invertible. Setting c = 0 we havethat

< ψ(x)− ψ(Tx), g(x) >= 0

for all g ∈ L2 ∩ L∞ and since the bounded functions are dense in L2

(i.e., L2 ∩ L∞ = L2 where the closure is taken in the L2 norm) we havethat ψ(x)− ψ(Tx) = 0 as an L2–function, i.e., ψ ∈ C⊥ ⇒ ψ ∈ I.

It is now left to prove that the averaging operator is continuous.Since this operator is linear, we have to check this only at 0.

Let Ac(φ) := x ∈ X : supN∈N sup1≤n≤N1n

∑n−1k=0 |φ(T kx)| > c, for

φ ∈ L1. By the Maximal Ergodic Theorem A.20 we have that

‖φ‖L1 =

∫X

|φ| dµ ≥∫Ac

|φ| dµ ≥ cµ(Ac(φ))

or

µx ∈ X : supN∈N

sup1≤n≤N

1

n

n−1∑k=0

|φ(T kx)| > c ≤ 1

c‖φ‖L1 .

Now Fix c ∈ R+, ε > 0 and let φm ∈ D and ‖φ−φm‖L1 → 0. We denote

φm(x) = limn→∞1n

∑n−1k=0 φm(T kx) for µ–a.e. x ∈ X. Let mc ∈ N be

such that ‖φ − φm‖L1 < cε2

for all m > mc. Then by the previous

206 NLD-draft

inequality we get for m > mc

µ

x ∈ X : lim sup

n→∞

1

n

n−1∑k=0

φ(T kx)− lim infn→∞

1

n

n−1∑k=0

φ(T kx) > c

≤

x ∈ X :

∣∣∣∣∣lim supn→∞

1

n

n−1∑k=0

φ(T kx)− φm(x)

∣∣∣∣∣+

∣∣∣∣∣lim infn→∞

1

n

n−1∑k=0

φ(T kx)− φm(x)

∣∣∣∣∣ > c

≤

x ∈ X : 2

∣∣∣∣∣lim supn→∞

1

n

n−1∑k=0

φ(T kx)− φm(x)

∣∣∣∣∣ > c

≤ µ

x ∈ X : sup

N∈Nsup

1≤n≤N

1

n

n−1∑k=0

|φ(T kx)− φm(T kx)| > c

2

≤ 2

c‖φ− φm‖L1 < ε

Since c, ε > 0 were arbitrary we conclude that

µ

x ∈ X : lim sup

n→∞

1

n

n−1∑k=0

φ(T kx)− lim infn→∞

1

n

n−1∑k=0

φ(T kx) > 0

= 0

and the theorem is proven.

A.10.1. The Maximal Ergodic Theorem.

Theorem A.20. Let c ∈ R and µ be an invariant measure (invari-ant for the dynamics given by T ).Let Ac(φ) := x ∈ X : supN∈N sup1≤n≤N

1n

∑n−1k=0 φ(T kx) > c. Then

for any φ ∈ L1(µ) we have∫Ac(φ)

(φ(x)− c) dµ ≥ 0.

Proof of Theorem A.20 after K. Petersen[9.]. Let us de-note

φN(x) := sup1≤n≤N

1

n

n−1∑k=0

φ(T kx)

and assume that φ ∈ L1 ∩ L∞. The general case is a question of astandard approximation procedure. For c ∈ R fixed we set, for N ∈ N

EN :=x ∈ X : φN(x) > c

.

Then (φ(x)−c)χEN (x) ≥ φ(x)−c, since both sides are equal if x ∈ EN ,while for x 6∈ EN we have that φ(x)− c ≤ 0 = (φ(x)− c)χEN (x).

Let us split the sum∑m−1

k=0

[(φ(T kx)− c)χEN (T kx)

]into several

parts for m− 1 > N (actually, m will be much larger than N):


• The first part consists of zeroes as long as T ix 6∈ EN .• Then there is a first k1 where T k1x ∈ EN . By the definition ofEN we have to wait at most l1 ≤ N steps untill1∑k=0

[(φ(T k1+kx)− c)χEN (T k1+kx)

]= a1 > 0.

This gives a second string of length l1. Set r1 = k1 + l1.• There might be again some n > r1 where T nx 6∈ EN . Let us

denote with k2 > r1 the first step where T k2x ∈ EN and thenrepeat the previous splitting.• Assume we have done this procedure n times, i.e., we have

defined the strings from ki till ri; i ≤ n with ri − ki = li ≤ N ,such thatli∑k=0

[(φ(T ki+kx)− c)χEN (T ki+kx)

]= ai > 0 and

ki−1∑k=ri−1+1


]= 0.

Then we can continue by induction until we reach m−1. Say westopped at rlast. The rest of the sum, i.e., the sum from rlast+1till m − 1 can have at most q < N terms in χEN (otherwiserlast was not the last one) and the worst scenario is that thoseq terms are negative.

We can estimate:

m−1∑k=0


]=

k1−1∑k=0


]+

r1∑k=k1


]+

k2−1∑k=r1+1


]+

r2∑k=k2


]+· · ·

+

m−q−1∑k=1+rlast


]+

m−1∑k=m−q


]= 0 + a1 + 0 + a2 + · · ·+ 0 +

m−1∑k=m−q


]≥ N (−‖φ‖L∞ − c) .

By invariance of T we have that∫EN

m−1∑k=0

[(φ(T kx)− c)

]dµ = m

∫EN

(φ− c) dµ

208 NLD-draft

Now we integrate both sides of the estimation:

m

∫EN

(φ− c) dµ = m

∫X

(φ− c)χEN dµ ≥ N (−‖φ‖L∞ − c) .

Hence, dividing by m,∫EN

(φ− c) dµ ≥ N

m(−‖φ‖L∞ − c) .

Letting m→∞ we get ∫EN

(φ− c) dµ ≥ 0.

EN is not exactly the set Ac(φ) but letting N → ∞ completes theproof.

APPENDIX B

Notes and Answers to the Exercises

These notes are intended to be read in articulo mortis. Rather,you should try the exercises on your own, regardless of the opinions,approaches or suggestions written here. If you do not succeed at thefirst try, insist. When everything else is lost, come to these notes. Theymay help some of the readers. Answers to a few exercises are added atthe end.

Notes

1.1 This is explicit from the text.1.2 Having the flow explicitly, we can explicitly compute F by set-

ting t = 1. Plot typical orbits in both systems.2.1 Use that if v = x, then v = x.2.2 (b) Rewrite t = βτ and find suitable β such that the original

equation is independent of ω0.2.3 Standard exercise in the theory of matrices. Compute exp(tA)

and put x(t) = exp(tA)x0. Note that A2 is non-diagonalisable.2.4 After changing variables to z = x/t the equation is separable.

2.5 Change variables to u = exp(x) and set u(t) = a(t)C+a(t)

. The

equation for a is separable.2.6 Derive the equation with respect to x and eliminate C.2.7 If v(x) = 0 then x is constant. This is a possible solution of the

ODE. Other solutions x(t) do not cross this solution, so theynever come to an x such that v(x) = 0. Use then continuity.

2.8 Use that the basis vectors of the space are linearly independentand they all participate in the solution.

2.9 Prove continuity by the definition. Check the proof of Picard’sTheorem for a hint.

2.10 Don’t solve the equations. Just perform the Picard iterates.2.11 Same as above, after recasting the problem as a first-order equa-

tion system.3.1 Move around x and t until they land in different sides of the

equality, then integrate.3.2 Apply the transformation formally. You will find that certain

a and b highly simplify the remaining calculations.3.3 Same as above.3.4 The solution is Ck in µ.

209

210 NLD-draft

3.5 (a): For all times. (b) Yes, see Index theory in the Appendix.3.6 The condition on the vector field allows to consider an open set

as in the definition of attracting set lying inside the trappingregion.

3.7 Note here that the ω-limit of any point is compact but the setO (the union of all ω-limits) consists of three isolated points.

3.8 Use polar coordinates. Calculations are lengthy.3.9 There is a fixed point at the origin and in addition a pair of

fixed points appear for r > 1, their location depending on r.3.10 We want a closed connected set encircling some ball containing

the origin, which we know is a fixed point. Let us try with anellipsoid. If the vector field points inwards on the surface of theellipsoid, it will be a trapping region. The ellipsoid looks likeH(z, y, z) = Ax2 +By2 +Cz2 = D for some positive A,B,C,D.We want the vector field to point inwards, which means that∇H · v < 0 on the surface of the ellipsoid. Putting C = Bfacilitates the computations. Verify that for 0 < r < 1, any suchellipsoid as proposed (based on the origin) will be a trappingregion by choosing e.g., B = C = 1, A = 1/σ.

3.11 Here we want a trapping region that is good for a large regionin r, e.g., 0 < r < 35 in order to encompass all types of inter-esting behaviour of the system (this is to be understood afterChapter 7). The approach with an ellipsoid with B = C stillholds, but now we need a much larger region, D will have tobe comparatively large. Computations are largely simplified ifthe ellipsoid is not centered at the origin, but rather at somepoint (0, 0, z0), with z0 = r + Aσ/B. On the surface of theellipsoid H(z, y, z) = Ax2 + By2 + B(z − z0)2 = D, one has to

verify that the function ∇H·v2B

= −y2 − AσBx2 − b(z − z0

2)2 + b

z204

is negative. Using the usual parameters of the Lorenz system,namely σ = 10 and b = 8/3, you may verify that there are afew critical points on the surface of the ellipsoid and that onall of them the vector field points inwards for D

B> 16

15z2

0 . Thecritical points (following this approach) are: (y = 0 = x, z =

z0 ±√

DB

), (y = 0, z = z02

( b−2σb−σ ), Ax2 = D − B(z − z0)2) and

(x = 0, z = z02

( b−2b−1

), y2 = DB− (z − z0)2).

4.1 Write the first few iterations and try to guess the general rule.x is multiplied by powers of a and b is summed up times powersof a. Finally, xk = akX + b(1 + a+ · · ·+ ak−1), k > 0.

4.2 Change variables to y = lnx and try as in the previous Exercise.4.3 Rewrite the dynamics for z and choose γ such that it becomes

linear.4.4 Population growth requires rn > 1 and hence r > 1. Population

decrease correspondingly demands r < 1.

NOTES 211

4.5 For point 4 it is necessary to check when f has Lipschitz con-stant smaller than unity. The derivative of f may help.

4.6 Rewrite the problem in polar coordinates. Verify that the dy-namics for the polar angle is decoupled and take as Poincaresection the line φ = φ0.

4.7 Try with polar coordinates again for the phase variables y andp = y′.

4.8 The x-coordinate is decoupled from the y-coordinate. Take ar-bitrary open sets U, V and points (x0, y0), (x0

′, y0′) in their

interiors. Consider k large enough so that the square regionSU = [x0, x0 + 1/k] × [y0, y0 + 1/k] ⊂ U and similarly for thecorresponding square SV but now with side-length 2/k. Let αbe irrational. The forward iterates of the first square will even-tually stretch along the whole y-coordinate of the torus (beforetaking mod 1). Show that it is enough to iterate j times, withj > k. Take then j0 > k such that x0

′ < F j0(x0) < x0′ + 1/k

(this is possible because the projection of the orbit on the firstcoordinate is dense). Show now that F j0(SU) ∩ SV 6= ∅. Fur-ther, the map cannot be mixing, since this property fails for thex-coordinate, as discussed in Example 3.3. If α is rational, thenthe x-coordinate is periodic (of period, say, p), and the p-powerof the map is a rational shift in y, hence periodic. See [12.].

4.9 A similar Exercise is worked out in detail in Section 23.3 of [6.,pp. 378, 4th. ed.].

4.10 This is a special case of a more general statement discussed byAkin and Carlson in [1.].

4.11 Writing each number in decimal notation, f shifts the decimalpoint to the right. The preimages of 0 are all points with finitedecimal expression.

4.12 Use the hint. In binary expression each number in [0, 1) iswritten as x = 0.a1a2 · · · where the ak = 0 or 1. The mapshifts the dot to the right.

4.13 Use the hint and rewrite the map as a map for θ. Writing eachnumber in [0, 1) in base 3, the map shifts the dot to the right,cf the previous Exercise.

5.1 A quadratic, positive definite function will do.5.2 First rewrite the equations as a dynamical system. The usual

trick of trying a quadratic, positive definite function of the co-ordinates does not work. For (a) try f(x, y) = αx4 +βx2 +γy2.The second problem is more tricky, you have to add to theprevious trial a term βxy.

5.3 Apply Lyapunov’s second theorem, i.e., study the eigenvaluesof the linearisation.

5.4 We have h invertible and differentiable such that h(x(t, h−1(y0) =y(t, y0) or h(etAh−1(y0) = etBy0. Differentiate with respect

212 NLD-draft

to y0 at y0 = 0: (dh/dx)|x=0etA(dh−1/dy)|y=0 = etB. The h-

derivatives are matrices which are inverse of each other. Hence,the matrices A, B are similar (they have the same Jordan nor-mal form and eigenvalues).

5.5 This result is related to a theorem by Chetaev. The Chetaevfunction is analogous to the Lyapunov function for unstablefixed points. The function f is increasing along trajectoriesbecause of the positive derivative. An arbitrary neighbourhoodof p will contain a point xn of the sequence. But f(x(t, xn)) > 0and the trajectory cannot approach zero.

5.6 We have v(p) = 0 = −∇f(p) ⇔ df = (p) =∑

i(∂f/∂xi)dxi =0. Also D(v(p)) = ∂2f/∂xi∂xj is a symmetric matrix andd2f(p) is a bilinear form. Non-degenerate means no zero eigen-values and hence the fixed point is hyperbolic.

5.7 This is a classical exercise in electrodynamics. Check a book onClassical Electromagnetic Theory. The electrostatic potentialinside a circle of radius R is given by

V (r, θ) =1

2π

∫ 2π

0

V (R, φ)R2 − r2

R2 + r2 − 2Rr cos (φ− θ)dφ.

It turns out that the integral equals unity for V (R, φ) = const.5.8 This is now a classical exercise in Dynamical Systems. Solutions

for the first part can be found somewhere in the Internet. Theperiod-3 is more involved. Solve it numerically using the sug-gestion and plotting the polynomial that should give the roots.For analytic computations, see [2.].

5.9 Same as above.5.10 The systems can be solved in closed form by direct integration.5.11 Apply Lyapunov’s Definitions and Theorems.6.1 Straightforward application of the Theorems in this Chapter.6.2 Same as above. The dynamical equation is now one–dimensional.6.3 Disregard the fact that one eigenvalue of the linearisation at the

origin is larger than unity and focus on the shape and dynamicsof the Centre Manifold.

6.4 Recast µ as a third coordinate with dynamics µ = 0. Thissystem has a two-dimensional centre manifold. Compute themanifold up to second order and write the dynamics on themanifold. Verify that in a small µ-interval around zero, thesystem has two fixed points of opposite stability, that collideexactly at µ = 0.

6.5 Same ideas as the previous one, now in maps. There is only onefixed point along with an invariant set of the form x2 + y2 = C,constant and φ = 1 (the polar angle).

7.1 Your first Hopf calculation? Check that it fits in the previoustheory.

NOTES 213

7.2 First compute the eigenvalues of the linearisation, then checkfor what value(s) of a the eigenvalues are +1, −1 or complexwith modulus one. All three cases actually occur.

7.3 Now we need pure imaginary eigenvalues (a = −1) or havingone of the eigenvalues equal to zero, which occurs for a = −3, 0.

8.1 The logistic map revisited. Computations are straightforwardand some of them have been done in previous exercises. Use acomputer algebra programme.

8.2 Same as above. Color-coding is not essential but might givea beautiful picture to hang on your office. This is the fa-mous logistic bifurcation diagram that most books display. Seehttp://en.wikipedia.org/wiki/Logistic map.

8.3 More colorful computations that should be straightforward us-ing a computer algebra programme and/or a numerical compu-tations programme.

8.4 All symbolic sequences correspond to trajectories. Build a se-quence with “half” the sequence of Ws(p) and half of Wu(q).Assume there exists a stable orbit. Since there exists a denseorbit, any neighbourhood of the assumed stable orbit will havepoints (of the dense orbit) wandering away from the neighbour-hood, which is a contradiction.

8.5 Try to adapt the technique used in Example 8.1. The idea is tofind the fixed points/periodic points of the return map and usethe linearisation to analyse their stability properties.

8.6 Let S be the unit square. Periodic points satisfy An(x, y) =(x, y) mod 1 or (An − Id)(x, y) = (0, 0) mod 1. The numberof such points as well as the number of preimages of a pointin the unit square is equal to the number of times (An − Id)Scovers the unit square, which in turn equals the area of theimage. This is given by det (An − Id) = (λn1 − 1)(λn2 − 1) whereλi are the eigenvalues of A. There are many webpages withexamples and analyses of this problem.

8.7 See Appendix A.2.1 first. You can “construct” homoclinic sym-bolic sequences matching together portions of Ws(p) and Wu(p).

9.1 Check the definition inspiring the exercise. If the product oftwo nonnegative numbers is zero, then one of them is zero.

9.2 Express the characteristic function of an interval with its Fourierseries: χ(a,b)(x) =

∑k cke

2πikx. Then,

χ(a,b)(Rnα(x)) = χ(a,b)(x + nα mod 1) =

∑k ck

(e2πiknα

)e2πikx.

Hence,

limN→∞

1

N#0 ≤ n < N : Rn

α(x) ∈ (a, b) =

limN→∞

1

N

N−1∑n=0

∑k

ck(e2πiknα

)e2πikx = c0 =

∫[0,1)

χ(a,b)(x) dx,

214 NLD-draft

since characteristic functions take the values 1 or 0. The secondequality follows making the geometric sum on n and taking thelimit (only k = 0 survives the N -limit). The last equality isjust the definition of c0 in Fourier series. We can extend thisresult from characteristic functions to continuous functions aswell, hence:

limN→∞

1

N

N−1∑n=0

φ(Rnα(x)) =

∫[0,1)

φ(x) dx,

so the only invariant measure is the Lebesgue measure (the onlyfunctional that fits in the RHS of the last equation). If α =p/q ∈ Q then there are many invariant ergodic measures: Takeany periodic orbit A = x1, . . . , xq and define a measure withvalues 1/q if xi ∈ A and zero otherwise. Recall the previousExercise.

9.3 We have two different ergodic measures. Hence, there must besome continuous function such that∫

X

φ(x) dµ 6=∫X

φ(x) dν.

We use the Ergodic Theorem again:∫X

φ(x) dµ = limN→∞

1

N

N−1∑n=0

φ(fn(x)), x ∈ Aµ

∫X

φ(x) dν = limN→∞

1

N

N−1∑n=0

φ(fn(x)), x ∈ Aν

Aµ is a set of unit µ-measure where the averages make sense.Since the measures are different, Aµ∩Aν = ∅ and the measuresare singular.

9.4 Suppose µ is invariant. Then µ(f−1(suppµ)) = µ(suppµ) = 1and also µ(f−1(suppµ)∩µ(suppµ)) = 1. Suppose suppµ is notinvariant. Then there exists a point x ∈ suppµ\f−1(suppµ) orx ∈ f−1(suppµ)\ suppµ. Since suppµ is closed, there must bea whole ball around x such that

B(x) ⊂ suppµ\f−1(suppµ) ∪ f−1(suppµ)\ suppµ.

This ball should have zero measure, otherwise it is not truethat µ(f−1(suppµ) ∩ µ(suppµ)) = 1. But in such a case eitherx /∈ suppµ or f(x) /∈ suppµ which is a contradiction. Hencesuppµ is invariant.

9.5 Assume the measure of this set is a > 0. Then there existsa compact set K ⊂ X \

⋃x∈X ω(x) such that µ(K) > a/2.

However, by Poincare Recurrence Theorem, almost any point

ANSWERS TO SELECTED EXERCISES 215

x ∈ K returns infinitely often to K and hence, since K is com-pact, ω(x) ∩K 6= ∅, which is a contradiction.

A.1 The complement of the Cantor set consists of a disjoint unionof open intervals. Its measure is then the sum of the individualmeasures. At step n ≥ 0 we eliminate 2n subintervals of width1/3n+1 so the total measure of the eliminated intervals is m =(1/3)

∑n≥0(2/3)n = 1. Hence, the Cantor set has zero measure.

A.2 The measure of the remaining set is still zero. After a moreinvolved calculation we recover m = 1 for the deleted intervals,for any 0 < p < 1/3.

A.3 We still eliminate 2n intervals of length 1/kn+1. The total areaof the subtracted part is m = 1/(k − 2), positive but smallerthan unity for k > 3.

A.4 For R notice that the function arctanx maps the real line ontoa subset of itself. Z can be mapped to N through the functionf(0) = 0 and f(k 6= 0) = 2|k|+(1−sg(k))/2. Positive numbersmap to even numbers and negative map to odd. The secondproof requires writing the rationals in the form of an infinitematrix. The last statement is the famous diagonal proof ofCantor. Assume the set of reals in [0, 1] to be equipotent to N.Express the numbers in decimal form and write each elementin the set in a list that by assumption is exhaustive. Considerthe number having as k-th decimal figure xkk +1 where xkk is thek-th decimal figure of the k-th element in the list. If xkk = 9 wetake xkk − 1 instead. The number so constructed is not in thelist, hence the list was not complete.

A.5 Write the elements in C in base-3 and map each 2 in the ex-pansion to a 1. The image of the map consists of all possiblestrings of 0’s and 1’s, i.e., all real numbers in [0, 1], now writtenin binary form.

A.6 If the hint is not enough, see [15.].A.7 To compute a fixed point it is enough to solve xn+1 = xn. To

verify that it is unique, we need to show that the map is a con-traction for a suitable interval and then use Banach’s Theorem.

A.8 Same as the previous Exercise.

Answers to Selected Exercises

The solution to some selected exercises is given here. One maybetter profit of this Section if read together with the Notes in theprevious Section.

1.2 F (x) = xe and xn = x0en, n ≥ 0.

2.2 (a) x = p; p = −w20x; x(t) = x0 cosw0t+ p0 sinw0t. (b) Rescale

as x = y/w0 and t = T/w0. (c) x = p; p = −βp − w20x. (d)

w20 = k/m.

216 NLD-draft

2.3 (1) x(t) =7

2

(11

)e−t − 1

2

(1−1

)e−3t.

(2) x(t) =

(3 + 4t

4

)et.

(3) x(t) = 4

(21

)et − 5

(10

).

2.5 u = ex and u(t) =et

2/2∫ t0es2/2ds

.

2.6 (a) y′ = yx/(x2 − y), (b) 2xyy′ = 2x3 + y2.2.10 (a) x0(t) = 0, x1(t) = t2/2, x2(t) = t2/2− t5/20.

3.1 (a) x(t) =x0

1 + t0(1 + t)e−(t−t0), (b) x(t) =

x0

1 + x0 ln t2−1t20−1

.

3.2 Use x = z − 2t and obtain x(t) = 1− 2t+ Cet.3.3 x = −t and ω = −1.3.4 x(t) = −1/t+ µ(1− 1/t2) + µ2(8/(3t2) + t/3− 2/t− 1/t3).3.7 Three fixed points at y = 0, x = −1, 0, 1. Cartesian coordinate

axes through the fixed points are also part of the invariant sets.A large circle enclosing all fixed points serves as a trappingregion. The ω-limit set is not connected.

3.8 Polar coordinates. A fixed point at the origin and a periodicorbit at r =

√a, which is the ω-limit of all points outside the

origin.3.9 The origin is a fixed point, always present. For r > 1 a pair of

fixed points appear at x = y = ±√b(r − 1), z = r − 1.

3.10 For r < 1, the ellipsoid x2/σ + (y2 + z2)/r = D2, D ≥ 0 is atrapping region.

4.1 For 0 6= a 6= 1, xn = anx0 + bn−1∑k=0

ak; for a = 0, xn = b (n ≥ 1);

while for a = 1, xn = x0 + nb (n ≥ 0).

4.2 xn = xαn

0 βαn−1α−1 .

4.3 For γ = b/(a− 1), xn =anx0

1 + γx0(an − 1).

4.4 (1) xn = x0rn; (2) Exponential growth for r > 1; (3) Expo-

nential decrease for r < 1; (4) Valid for small-sized populationsthat do not influence the environement.

4.5 (1) When population densities increase towards their maximum(= 1), birth-rate decreases; (2) The model admits real-valuedsolutions, however, “true” populations can only take rationalvalues with denominator K; (3) A fixed point at x = 0 andanother at x = 1− 1/r, whenever r ≥ 1; (4) For r < 1.

4.6 Return map: P (r) =re2π√

1 + r2(e4π − 1), with a fixed point at

r = 0.

ANSWERS TO SELECTED EXERCISES 217

4.11 Writing each x in decimal form, f(x) shifts one step to the rightin the decimal expression of x: f(0.a1a2a3 · · · ) = (0.a2a3 · · · ).

5.1 The function H(x, y) = x2 + 2y2 is a Lyapunov function.5.2 (a) E.g., H(x, p) = p2 + ω2x2(1 + x2/2).

(b) E.g., H(x, p) = p2 + ω2x2(1 + x2/2) + 2x(p+ x).5.3 −2 < a < −1.5.8 (a) For fixed points see Ex 4.5. The origin is stable for µ < 1

and the other fixed point is stable for 1 < µ < 3.5.10 In the first case the origin is “stable from one side” and unstable

from the other. In the second case the origin is unstable.5.11 (a) The origin is center, x = π is saddle. (b) Origin is saddle,

points in (±1, 0) are stable for ε > 0, unstable for ε < 0. (c)Origin saddle, fixed point in (1,−1), unstable.

6.2 In the coordinates diagonalising the linear part, h(u) =

(−3u3/968

3u2/8

)and u = u3 +O(4).

6.3 h4(x) =2

3x2 − 4

3x3 − 8

9x4 and xn+1 = −xn −

4

9x4n +O(5).

7.2 Fixed point at the origin, Hopf bifurcation for a = 5/3, perioddoubling for a = 5/2 and saddle-node for a = 9/4.

7.3 Fixed point at the origin, Hopf bifurcation for a = −1, a bifur-cation of the saddle-node family for a = −3.

8.1 See Ex. 4.5 and 5.8. Same holds for 8.2 and 8.3.8.4 (a) If (a0, · · · , am), (b0, · · · , bn) are the periodic strings of p, q,

then z = · · · (b0, · · · , bn).(a0, · · · , am) · · · ∈ Ws(p) ∩Wu(q).8.5 Following ideas in Example 8.1, the return map is

x = λm− (X + ax+ b(y − Y ))y − Y = λm+ (µ+ cx+ e(y − Y )2)

.

A.1 The eliminated intervals add up to measure one. The Cantorset is measurable, with measure zero.

A.2 Same as above.A.3 The eliminated intervals have measure AE = 1/(k− 2) (k > 3),

hence the complement has positive measure.A.7 (a) x = (1 +

√5)/2, (b) x =

√3.

A.8 x = 1/2.

Bibliography

[1.] Akin, E. and Carlson J. D. (2011). Conceptions of Topological TransitivityarXiv:1108.4710v1.

[2.] Bechhoefer, J., (1996). The Birth of Period 3 Revisited. Mathematics Maga-zine, 69(2), 115-118.

[3.] Carr, J., (1981). Applications of Centre Manifold Theory. Springer. New York,Heidelberg.

[4.] Guckenheimer, J., Holmes, P. J., 1986. Nonlinear Oscillators, Dynamical Sys-tems and Bifurcations of Vector Fields. Springer, New York, 1st printing 1983.

[5.] Hale, J. K., 1969. Ordinary Differential Equations. Wiley, New York.[6.] Hardy, G. H. and Wright, E. M., 1938. An Introduction to the Theory of

Numbers. Fifth ed. 1979. Oxford Science Publications.[7.] Holst, A. and Ufnarovski, V. (2012) Matrix Theory. KFS, Lund.[8.] Katok, A. and Hasselblatt, B, (1996). Introduction to the Modern Theory of

Dynamical Systems. Cambridge University Press.[9.] Keane, M. and Petersen, K., (2006). Easy and Nearly Simultaneous Proofs

of the Ergodic Theorem and Maximal Ergodic Theorem. IMS Lecture Notes–Monograph Series. Dynamics & Stochastics, 48(6), 248-251.

[10.] Lorenz, N., 1963. Deterministic non-periodic flow. J. Atmospheric Science 20,130–141.

[11.] Metropolis, N., Stein, M. L., Stein, P. R., 1973. On finite limit sets for trans-formations on the unit interval. J. Combinatorial Theory 15A, 25.See also http://en.wikipedia.org/wiki/Logistic map.

[12.] Robinson, C., 2008. What is a chaotic attractor. Qualitative Theory of Dy-namical Systems, 7, 227236.

[13.] Solari, H., Natiello, M., Mindlin, B., 1996. Nonlinear Dynamics: A Two-wayTrip from Physics to Math. Institute of Physics, Bristol.

[14.] Spanne, S., 1995. Olinjara Dynamiska System. KFS, Lund.[15.] Ueno, K., Shiga K. and Morita, S., (1995). A Mathematical Gift I, Mathe-

matical World, v. 19. American Mathematical Society, Rhode-Island.[16.] Wagon, Stan, 1985. The Banach-Tarski paradox. Cambridge University Press.[17.] Arnold, V. I., (1989). Mathematical Methods of Classical Mechanics. 2nd e

dition, Springer, New York (1st ed 1978). See pages 203– and 271–285.

219

nld-draftctr.maths.lu.se/matematiklth/courses/fman15/media/... · the treatment of discrete{time...

Documents