
Nonlinear Mechanics

A. W. Stetz

January 8, 2012

Contents

1 Lagrangian Dynamics 5

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Generalized Coordinates and the Lagrangian . . . . . . . . . 6

1.3 Virtual Work and Generalized Force . . . . . . . . . . . . . . 8

1.4 Conservative Forces and the Lagrangian . . . . . . . . . . . . 10

1.4.1 The Central Force Problem in a Plane . . . . . . . . . 11

1.5 The Hamiltonian Formulation . . . . . . . . . . . . . . . . . . 13

1.5.1 The Spherical Pendulum . . . . . . . . . . . . . . . . . 15

2 Canonical Transformations 17

2.1 Contact Transformations . . . . . . . . . . . . . . . . . . . . . 17

2.1.1 The Harmonic Oscillator: Cracking a Peanut with a Sledgehammer . . . . . . . . . . . . . . . . . . . . . . 20

2.2 The Second Generating Function . . . . . . . . . . . . . . . . 21

2.3 Hamilton’s Principle Function . . . . . . . . . . . . . . . . . . 22

2.3.1 The Harmonic Oscillator: Again . . . . . . . . . . . . 24

2.4 Hamilton’s Characteristic Function . . . . . . . . . . . . . . . 25

2.4.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.5 Action-Angle Variables . . . . . . . . . . . . . . . . . . . . . . 27

2.5.1 The harmonic oscillator (for the last time) . . . . . . . 29

3 Abstract Transformation Theory 33

3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.1.1 Poisson Brackets . . . . . . . . . . . . . . . . . . . . . 35

3.2 Geometry in n Dimensions: The Hairy Ball . . . . . . . . . . 38

3.2.1 Example: Uncoupled Oscillators . . . . . . . . . . . . 41

3.2.2 Example: A Particle in a Box . . . . . . . . . . . . . . 43


4 Canonical Perturbation Theory 45

4.1 One-Dimensional Systems . . . . . . . . . . . . . . . . . . . . 45

4.1.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.1.2 The simple pendulum . . . . . . . . . . . . . . . . . . 49

4.2 Many Degrees of Freedom . . . . . . . . . . . . . . . . . . . . 51

5 Introduction to Chaos 55

5.1 The total failure of perturbation theory . . . . . . . . . . . . 56

5.2 Fixed points and linearization . . . . . . . . . . . . . . . . . . 58

5.3 The Hénon oscillator . . . . . . . . . . . . . . . . . . . . . . . 62

5.4 Discrete Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.5 Linearized Maps . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.6 Lyapunov Exponents . . . . . . . . . . . . . . . . . . . . . . . 72

5.7 The Poincaré-Birkhoff Theorem . . . . . . . . . . . . . . . . . 74

5.8 All in a tangle . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.9 The KAM theorem and its consequences . . . . . . . . . . . . 80

5.9.1 Two Conditions . . . . . . . . . . . . . . . . . . . . . . 81

5.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Chapter 1

Lagrangian Dynamics

1.1 Introduction

The possibility that deterministic mechanical systems could exhibit the behavior we now call chaos was first realized by the French mathematician Henri Poincaré sometime toward the end of the nineteenth century. His discovery emerged from analytic or classical mechanics, which is still part of the foundation of physics. To put it a bit facetiously, classical mechanics deals with those problems that can be “solved,” in the sense that it is possible to derive equations of motion that describe the positions of the various parts of a system as functions of time using standard analytic functions. Nonlinear dynamics treats problems that cannot be so solved, and it is only in these problems that chaos can appear. The simple pendulum makes a good example. The differential equation of motion is

$$\ddot{\theta} + \omega^2 \sin\theta = 0 \tag{1.1}$$

The sine is a nonlinear function of $\theta$. If we linearize by setting $\sin\theta \approx \theta$, the solutions are elementary functions, $\sin\omega t$ and $\cos\omega t$. If we keep the sine, the solutions can only be expressed in terms of elliptic integrals. This is not a chaotic system, because there is only one degree of freedom, but if we hang one pendulum from the end of another, the equations describing the motion as a function of time are hopeless to find (even with elliptic integrals) and the resulting motion can be chaotic.$^1$

1 I should emphasize the distinction between the differential equations of motion, which are usually simple (though nonlinear), and the equations that describe the positions of the elements of the system as functions of time, which are usually non-existent.
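The difference between the linearized and the full pendulum can be made concrete with a short numerical sketch. It integrates (1.1) with a fourth-order Runge-Kutta step and compares the resulting period with the elliptic-integral value $T = 4K(k)/\omega$, $k = \sin(\theta_0/2)$, evaluated through the arithmetic-geometric mean. The function names, the release-from-rest initial condition, and the step size are illustrative assumptions, not part of the original notes.

```python
import math

def agm(a, b, tol=1e-14):
    """Arithmetic-geometric mean, used to evaluate the complete elliptic integral K."""
    while abs(a - b) > tol:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def exact_period(theta0, omega):
    """Period for release from rest at theta0 (0 < theta0 < pi): T = 4 K(k) / omega."""
    return 2.0 * math.pi / (omega * agm(1.0, math.cos(0.5 * theta0)))

def numerical_period(theta0, omega, dt=1e-4):
    """Integrate (1.1) with RK4 from rest at theta0 until theta first reaches 0,
    which is one quarter of a period."""
    def rhs(theta, v):
        return v, -omega ** 2 * math.sin(theta)

    theta, v, t = theta0, 0.0, 0.0
    while True:
        k1 = rhs(theta, v)
        k2 = rhs(theta + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1])
        k3 = rhs(theta + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1])
        k4 = rhs(theta + dt * k3[0], v + dt * k3[1])
        theta_new = theta + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v_new = v + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if theta_new <= 0.0:
            # Linear interpolation for the time at which theta crosses zero.
            frac = theta / (theta - theta_new)
            return 4.0 * (t + frac * dt)
        theta, v, t = theta_new, v_new, t + dt

if __name__ == "__main__":
    omega, theta0 = 1.0, 2.0
    print("linearized period:", 2.0 * math.pi / omega)
    print("elliptic period:  ", exact_period(theta0, omega))
    print("numerical period: ", numerical_period(theta0, omega))
```

For small amplitudes the two periods agree; for large amplitudes the full pendulum is noticeably slower, which is the nonlinearity the linearized equation discards.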


In order to arrive at Poincaré’s moment of discovery, we will have to review the development of classical mechanics through the nineteenth century. This material is found in many standard texts, but I will cover it here in some detail. This is partly to ensure uniform notation throughout these lectures and partly to focus on those things that lead directly to chaos in nonlinear systems. We will begin by formulating mechanics in terms of generalized coordinates and Lagrange’s equations of motion. We then study Legendre transformations and use them to derive Hamilton’s equations of motion. These equations are particularly suited to conservative systems in which the Hamiltonian is constant in time, and it is such systems that will be our primary concern. It turns out that Legendre transformations can be used to transform Hamiltonians in a myriad of ways. One particularly elegant form uses action-angle variables to transform a certain class of problems into a set of uncoupled harmonic oscillators. Systems that can be so transformed are said to be integrable, which is to say that they can be “solved,” at least in principle. What happens, Poincaré asked, to a system that is almost but not quite integrable? The answer entails perturbation theory and leads to the disastrous problem of small divisors. This is the path that led originally to the discovery of chaos, and it is the one we will pursue here.

1.2 Generalized Coordinates and the Lagrangian

Vector equations, like $\mathbf{F} = m\mathbf{a}$, seem to imply a coordinate system. Beginning students learn to use cartesian coordinates and then learn that this is not always the best choice. If the system has cylindrical symmetry, for example, it is best to use cylindrical coordinates: it makes the problem easier. By “symmetry” we mean that the number of degrees of freedom of the system is less than the dimensionality of the space in which it is embedded. The familiar example of the block sliding down the inclined plane will make this clear. Let’s say that it’s a two-dimensional problem with an $x$-$y$ coordinate system. The block is constrained to move in a straight line, however, so that its position can be completely specified by one variable, i.e. it has one degree of freedom. The clever student chooses the $x$ axis so that it lies along the path of the block. This reduces the problem to one dimension, since $y = 0$ and the $x$ coordinate is given by one simple equation. In the pendulum example from the previous section, it was most convenient to use a polar coordinate system centered at the pivot. Since $r$ is constant, the motion can be described completely in terms of $\theta$.


These coordinate systems conceal a subtle point: the pendulum moves in a circular arc and the block moves in a straight line because they are acted on by forces of constraint. In most cases we are not interested in these forces. Our choice of coordinates simply makes them disappear from the problem. Most problems don’t have obvious symmetries, however. Consider a bead sliding along a wire following some complicated snaky path in 3-d space. There’s only one degree of freedom, since the particle’s position is determined entirely by its distance measured along the wire from some reference point. The forces are so complicated, however, that it is out of the question to solve the problem by using $\mathbf{F} = m\mathbf{a}$ in any straightforward way. This is the problem that Lagrangian mechanics is designed to handle. The basic (and quite profound) idea is that even though there may be no coordinate system (in the usual sense) that will reduce the dimensionality of the problem, there is usually a system of coordinates that will do this. Such coordinates are called generalized coordinates.

To be more specific, suppose that a system consists of $N$ point masses with positions specified by ordinary three-dimensional cartesian vectors, $\mathbf{r}_i$, $i = 1 \cdots N$, subject to some constraints. The easiest constraints to deal with are those that can be expressed as a set of $l$ equations of the form

$$f_j(\mathbf{r}_1, \mathbf{r}_2, \ldots, t) = 0, \tag{1.2}$$

where $j = 1 \cdots l$. Such constraints are said to be holonomic. If in addition the equations of constraint do not involve time explicitly, they are said to be scleronomous; otherwise they are called rheonomous. These constraints can be used to reduce the $3N$ cartesian components to a set of $3N - l$ variables $q_1, q_2, \ldots, q_{3N-l}$. The relationship between the two is given by a set of $N$ equations of the form

$$\mathbf{r}_i = \mathbf{r}_i(q_1, q_2, \ldots, q_{3N-l}, t). \tag{1.3}$$

The $q$’s used in this way are the generalized coordinates. In the example of the bead on a curved wire, the equations would reduce to $\mathbf{r} = \mathbf{r}(q)$, where $q$ is a distance measured along the wire. This simply specifies the shape of the wire.

It should be noted that the $q$’s need not all have the same units. Also note that we can use the same notation even if there are no constraints. For example, the position of an unconstrained particle could be written $\mathbf{r} = \mathbf{r}(q_1, q_2, q_3)$, and the $q$’s might represent cartesian, spherical, or cylindrical coordinates. In order to simplify the notation, we will often pack the $q$’s into an array and use vector notation,

$$\mathbf{q} = \begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \end{pmatrix} \tag{1.4}$$

This is not meant to imply that $\mathbf{q}$ is a vector in the usual sense. For one thing, it does not necessarily possess “a magnitude and a direction” as good vectors are supposed to have. By the same token, we cannot use the notion of orthogonal unit vectors.

Along with the notion of generalized coordinates comes that of generalized velocities.

$$\dot{q}_k \equiv \frac{dq_k}{dt} \tag{1.5}$$

Since $q_k$ depends only on $t$, this is a total derivative, but when we differentiate $\mathbf{r}_i$, we must remember that it depends on time both explicitly and implicitly through the $q$’s.

$$\dot{\mathbf{r}}_i = \sum_k \frac{\partial \mathbf{r}_i}{\partial q_k}\,\dot{q}_k + \frac{\partial \mathbf{r}_i}{\partial t} \tag{1.6}$$

(In this chapter I will consistently use the index $i$ to sum over the $N$ point masses and $k$ to sum over the $3N - l$ degrees of freedom.) Differentiating both sides with respect to $\dot{q}_k$ yields

$$\frac{\partial \dot{\mathbf{r}}_i}{\partial \dot{q}_k} = \frac{\partial \mathbf{r}_i}{\partial q_k} \tag{1.7}$$

which will be useful in the following derivations.

1.3 Virtual Work and Generalized Force

There are several routes for deriving Lagrange’s equations of motion. The most elegant and general makes use of the principle of least action and the calculus of variations. I will use a much more pedestrian approach based on Newton’s second law of motion. First note that $\mathbf{F} = m\mathbf{a}$ can be written in the rather arcane form

$$\frac{d}{dt}\left(\frac{\partial T}{\partial v_i}\right) = F_i \tag{1.8}$$

where $F_i$ is the $i$-th component of the total force acting on a particle with kinetic energy $T$. The point of writing this in terms of energy rather than acceleration is that we can separate out the forces of constraint, which are always perpendicular to the direction of motion and hence do no work. The trick is to write this in terms of generalized coordinates and velocities. This is rather technical, but the underlying idea is simple, and the result looks much like (1.8).

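The $q_k$’s are all independent, so we can vary one by a small amount $\delta q_k$ while holding all others constant.

$$\delta \mathbf{r}_i = \sum_k \frac{\partial \mathbf{r}_i}{\partial q_k}\,\delta q_k \tag{1.9}$$

This is sometimes called a virtual displacement. The corresponding virtual work is

$$\delta W_k = \sum_i \mathbf{F}_i \cdot \left(\frac{\partial \mathbf{r}_i}{\partial q_k}\,\delta q_k\right) \tag{1.10}$$

We define a generalized force

$$\Im_k = \sum_i \mathbf{F}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_k} \tag{1.11}$$

The forces of constraint can be excluded from the sum for the reason explained above. We are left with

$$\Im_k = \frac{\delta W_k}{\delta q_k} \tag{1.12}$$

The kinetic energy is calculated using ordinary velocities.

$$T = \frac{1}{2}\sum_i m_i\,\dot{\mathbf{r}}_i \cdot \dot{\mathbf{r}}_i \tag{1.13}$$

$$\frac{\partial T}{\partial q_k} = \sum_i m_i\,\dot{\mathbf{r}}_i \cdot \frac{\partial \dot{\mathbf{r}}_i}{\partial q_k} = \sum_i \mathbf{p}_i \cdot \frac{\partial \dot{\mathbf{r}}_i}{\partial q_k} \tag{1.14}$$

$$\frac{\partial T}{\partial \dot{q}_k} = \sum_i m_i\,\dot{\mathbf{r}}_i \cdot \frac{\partial \dot{\mathbf{r}}_i}{\partial \dot{q}_k} = \sum_i \mathbf{p}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_k} \tag{1.15}$$

Equation (1.7) was used to obtain the last term. A straightforward calculation now leads to

$$\Im_k = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_k}\right) - \frac{\partial T}{\partial q_k} \tag{1.16}$$

which is the generalized form of (1.8).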
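The “straightforward calculation” can be sketched as follows. The sketch assumes Newton’s second law in the form $\dot{\mathbf{p}}_i = \mathbf{F}_i$ for each particle and that the time derivative and $\partial/\partial q_k$ may be interchanged when acting on $\mathbf{r}_i$; these intermediate steps are a reconstruction, not text from the original notes. Differentiate the last form of (1.15) with respect to time, then use (1.11) and (1.14):

```latex
\begin{align*}
\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_k}\right)
  &= \sum_i \dot{\mathbf{p}}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_k}
   + \sum_i \mathbf{p}_i \cdot \frac{d}{dt}\left(\frac{\partial \mathbf{r}_i}{\partial q_k}\right) \\
  &= \sum_i \mathbf{F}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_k}
   + \sum_i \mathbf{p}_i \cdot \frac{\partial \dot{\mathbf{r}}_i}{\partial q_k} \\
  &= \Im_k + \frac{\partial T}{\partial q_k}
\end{align*}
```

The forces of constraint drop out of the first sum because they do no work in a virtual displacement, so only the applied forces contribute to $\Im_k$.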


1.4 Conservative Forces and the Lagrangian

So far we have made no assumptions about the nature of the forces included in $\Im_k$ except that they are not forces of constraint. Equation (1.16) is therefore quite general, although seldom used in this form. In these notes we are primarily concerned with conservative forces, i.e. forces that can be derived from a potential.

$$\mathbf{F}_i = -\nabla_i V(\mathbf{r}_1 \cdots \mathbf{r}_N) \tag{1.17}$$

Notice that $V$ doesn’t depend on velocity. (Electromagnetic forces are velocity dependent, of course, but they can easily be accommodated into the Lagrangian framework. I will return to this issue later on.) Now calculate the work done by changing some of the $q$’s.

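$$\begin{aligned}
W = \int \sum_i \mathbf{F}_i \cdot d\mathbf{r}_i
  &= -\sum_i \int \nabla_i V \cdot d\mathbf{r}_i \\
  &= -\sum_i \int \nabla_i V \cdot \sum_k \frac{\partial \mathbf{r}_i}{\partial q_k}\,dq_k \\
  &= -\sum_k \int \left(\sum_i \nabla_i V \cdot \frac{\partial \mathbf{r}_i}{\partial q_k}\right) dq_k \\
  &= -\sum_k \int \frac{\partial V}{\partial q_k}\,dq_k
\end{aligned} \tag{1.18}$$

The integral is a multidimensional definite integral over the various $q$’s that have changed. Summing over (1.12) then gives

$$\delta W = \sum_k \delta W_k = \sum_k \Im_k\,\delta q_k \tag{1.19}$$

$$W = \sum_k \int \Im_k\,dq_k \tag{1.20}$$

Comparison with (1.18) yields

$$\Im_k = -\frac{\partial V}{\partial q_k} \tag{1.21}$$

Finally define the Lagrangian

$$L = T - V \tag{1.22}$$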

Equation (1.16) becomes

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right) - \frac{\partial L}{\partial q_k} = 0. \tag{1.23}$$

Equation (1.23) represents a set of $3N - l$ second-order ordinary differential equations called Lagrange’s equations of motion. I can summarize this long development by giving you a “cookbook” procedure for using (1.23) to solve mechanics problems: First select a convenient set of generalized coordinates. Then calculate $T$ and $V$ in the usual way using the $\mathbf{r}_i$’s. Use equation (1.3) to eliminate the $\mathbf{r}_i$’s in favor of the $q_k$’s. Finally substitute $L$ into (1.23) and solve the resulting equations.
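As a quick illustration of the cookbook, here is the simple pendulum of Section 1.1 worked through step by step. The symbols $R$ (length of the pendulum) and $g$ (gravitational acceleration), and the choice of measuring height from the lowest point, are assumptions introduced for this sketch.

```latex
\begin{align*}
&\text{Generalized coordinate:} && q = \theta, \qquad
  \mathbf{r} = (R\sin\theta,\; -R\cos\theta) \\
&\text{Energies:} && T = \tfrac{1}{2} m\,\dot{\mathbf{r}}\cdot\dot{\mathbf{r}}
  = \tfrac{1}{2} m R^2 \dot{\theta}^2, \qquad
  V = mgR\,(1 - \cos\theta) \\
&\text{Lagrangian:} && L = \tfrac{1}{2} m R^2 \dot{\theta}^2 - mgR\,(1 - \cos\theta) \\
&\text{Derivatives:} && \frac{\partial L}{\partial \dot{\theta}} = m R^2 \dot{\theta}, \qquad
  \frac{\partial L}{\partial \theta} = -mgR\sin\theta \\
&\text{Equation (1.23):} && m R^2 \ddot{\theta} + mgR\sin\theta = 0
  \;\;\Longrightarrow\;\; \ddot{\theta} + \frac{g}{R}\sin\theta = 0
\end{align*}
```

This reproduces (1.1) with $\omega^2 = g/R$. The tension in the string, a force of constraint, never appears.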

Classical mechanics texts are full of examples in which this program is carried to a successful conclusion. In fact, most of these problems are contrived and of little interest except to illustrate the method. The vast majority of systems lead to differential equations that cannot be solved in closed form. The modern emphasis is to understand the solutions qualitatively and then obtain numerical solutions using the computer. The Hamiltonian formalism described in the next section is better suited to both these ends.

1.4.1 The Central Force Problem in a Plane

Consider the central force problem as an example of this technique.

$$V = V(r) \qquad \mathbf{F} = -\nabla V \tag{1.24}$$

$$L = T - V = \frac{1}{2} m \left(\dot{r}^2 + r^2 \dot{\phi}^2\right) - V(r) \tag{1.25}$$

Let’s choose our generalized coordinates to be $q_1 = r$ and $q_2 = \phi$. Equation (1.23) becomes

$$m\ddot{r} - m r \dot{\phi}^2 + \frac{dV}{dr} = 0 \tag{1.26}$$

$$\frac{d}{dt}\left(m r^2 \dot{\phi}\right) = 0 \tag{1.27}$$

This last equation tells us that there is a quantity $m r^2 \dot{\phi}$ that does not change with time. Such a quantity is said to be conserved. In this case we have rediscovered the conservation of angular momentum.

$$m r^2 \dot{\phi} \equiv l_z = \text{constant} \tag{1.28}$$

This reduces the problem to one dimension.

$$m\ddot{r} = \frac{l_z^2}{m r^3} - \frac{dV}{dr} \tag{1.29}$$

Since there are no constraints, the generalized forces are identical with the ordinary forces

$$\Im_\phi = -\frac{dV}{d\phi} = 0 \qquad \Im_r = -\frac{dV}{dr} \tag{1.30}$$

Equation (1.29) has an elegant closed-form solution in the special case of gravitational attraction.

$$V = -\frac{GmM}{r} \equiv -\frac{k}{r} \tag{1.31}$$

$$m\ddot{r} = \frac{l_z^2}{m r^3} - \frac{k}{r^2} \tag{1.32}$$

This apparently nonlinear equation yields to a simple trick: let $u = 1/r$ and use (1.28) to regard $u$ as a function of $\phi$ rather than $t$.

$$\frac{d^2 u}{d\phi^2} + u = \frac{m k}{l_z^2} \tag{1.33}$$

If the motion is circular, $u$ is constant. Otherwise it oscillates around the value $mk/l_z^2$ with simple harmonic motion.$^2$ The period of oscillation is identical with the period of rotation, so the corresponding orbit is an ellipse.
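The claim that $u = 1/r$ executes simple harmonic motion in $\phi$ can be checked numerically. The following sketch integrates the radial equation for $V = -k/r$ together with $\dot{\phi} = l_z/(m r^2)$ from (1.28), and compares $1/r$ with the harmonic solution $u(\phi) = mk/l_z^2 + A\cos\phi$. The parameter values, the function names, and the choice to start at a turning point ($\dot{r} = 0$, $\phi = 0$) are illustrative assumptions.

```python
import math

def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def kepler_orbit_error(m=1.0, k=1.0, lz=1.1, r0=1.0, dt=1e-3, n_steps=20000):
    """Integrate the radial equation and phi' = lz / (m r^2); return the largest
    deviation of 1/r from the harmonic solution, and the final angle phi."""
    def rhs(y):
        r, vr, phi = y
        return [vr,
                lz ** 2 / (m ** 2 * r ** 3) - k / (m * r ** 2),
                lz / (m * r ** 2)]

    u_center = m * k / lz ** 2        # value about which u oscillates
    amplitude = 1.0 / r0 - u_center   # fixed by the turning-point start
    y = [r0, 0.0, 0.0]
    max_err = 0.0
    for _ in range(n_steps):
        y = rk4_step(rhs, y, dt)
        u_exact = u_center + amplitude * math.cos(y[2])
        max_err = max(max_err, abs(1.0 / y[0] - u_exact))
    return max_err, y[2]

if __name__ == "__main__":
    err, phi_final = kepler_orbit_error()
    print("largest deviation of 1/r from the harmonic solution:", err)
    print("angle swept:", phi_final)
```

Because $u$ returns to its starting value every time $\phi$ advances by $2\pi$, the orbit closes on itself after one revolution.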

This problem was easy to solve because we were able to discover a non-trivial quantity that was constant, in this case the angular momentum. The constant enabled us to reduce the number of independent variables from two to one. Such a conserved quantity is called an integral of the motion or a constant of the motion. Obviously, the more such quantities one can find, the easier the problem. This raises two practical problems. First, how can we tell, perhaps from looking at the physics of a problem, how many independent conserved quantities there are? Second, how are we to find them?

In the central force problem, both of these questions answered themselves. We know that angular momentum is conserved. This fact manifests itself in the Lagrangian in that $L$ depends on $\dot{\phi}$ but not on $\phi$. Such a coordinate is said to be cyclic or ignorable. Let $q$ be such a coordinate. Then

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = 0 \tag{1.34}$$

The quantity in brackets has a special significance. It is called the canonically conjugate momentum.$^3$

$$\frac{\partial L}{\partial \dot{q}_k} \equiv p_k \tag{1.35}$$

To summarize, if $q$ is cyclic, $p$ is conserved.

Suppose we had tried to do the central force problem in cartesian coordinates. Both $x$ and $y$ would appear in the Lagrangian, and neither $p_x$ nor $p_y$ would be constant. If we insisted on this, central forces would remain an intractable problem in two dimensions. We need to choose our generalized coordinates so that there are as many cyclic variables as possible. The two questions reemerge: how many are we entitled to and how do we find the corresponding $p$’s and $q$’s?

2 This illustrates a general principle in physics: When correctly viewed, everything is a harmonic oscillator.

3 This notation is universally used, hence the old aphorism that mechanics is a matter of minding your p’s and q’s.

A partial answer to the first is given by a well-known result called Noether’s theorem: For every transformation that leaves the Lagrangian invariant there is a constant of the motion.$^4$ This theorem (which underlies all of modern particle physics) says that there is a fundamental connection between symmetries and invariance principles on one hand and conservation laws on the other. Momentum is conserved because the laws of physics are invariant under translation. Angular momentum is conserved because the laws of physics are invariant under rotation. Despite its fundamental significance, Noether’s theorem is not much help in practical calculations. Granted it gives a procedure for finding the conserved quantity after the corresponding symmetry transformation has been found, but how is one to find the transformation? The physicist must rely on his traditional tools: inspiration, the Ouija Board, and simply pounding one’s head against a wall. The fact remains that there are simple systems, e.g. the Hénon-Heiles problem to be discussed later, that have fascinated physicists for decades and for which the existence of these transformations is still controversial.

I will have much more to say about the second question. As you will see, there is a more or less “cookbook” procedure for finding the right set of variables and some fundamental results about the sorts of problems for which these procedures are possible.

1.5 The Hamiltonian Formulation

I will explain the Hamiltonian assuming that there is only one degree of freedom. It’s easy to generalize once the basic ideas are clear. Lagrangians are functions of $q$ and $\dot{q}$. We define a new function of $q$ and $p$ (with $p$ given by (1.35)).

$$H(p, q) = p\,\dot{q} - L(q, \dot{q}) \tag{1.36}$$

The new function is called the Hamiltonian, and the transformation $L \to H$ is called a Legendre transformation. The equation is much more subtle than it looks. In fact, it’s worth several pages of explanation.

4 See Finch and Hand for a simple proof and further discussion.

It’s clear from elementary mechanics that $q$, $\dot{q}$, and $p$ can’t all be independent variables, since $p = m\dot{q}$. You might say that there are two ways of formulating Newton’s second law: a $(q, \dot{q})$ formulation, $F = m\ddot{q}$, and a $(q, p)$ formulation, $F = \dot{p}$. The connection between $\dot{q}$ and its canonically conjugate momentum is usually more complicated than this, but there is still a $(q, \dot{q})$ formulation, the Lagrangian, and a $(q, p)$ formulation, the Hamiltonian. The Legendre transformation is a procedure for transforming the one formulation into the other. The key point is that it is invertible.$^5$ To see what this means, let’s first assume that $q$, $\dot{q}$ and $p$ are all independent.

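$$H(q, \dot{q}, p) = p\,\dot{q} - L(q, \dot{q}) \tag{1.37}$$

$$dH = \left(p - \frac{\partial L}{\partial \dot{q}}\right) d\dot{q} + \dot{q}\,dp - \frac{\partial L}{\partial q}\,dq \tag{1.38}$$

What is the condition that $H$ not depend on $\dot{q}$?

$$p(q, \dot{q}) = \frac{\partial L(q, \dot{q})}{\partial \dot{q}} \tag{1.39}$$

OK. This is the definition of $p$ anyhow, so we’re on the right track.

$$dH = \dot{q}\,dp - \frac{\partial L}{\partial q}\,dq$$

$$dH = \frac{\partial H}{\partial p}\,dp + \frac{\partial H}{\partial q}\,dq$$

Comparing these two equations term by term gives

$$\dot{q}(q, p) = \frac{\partial H}{\partial p} \tag{1.40}$$

$$-\frac{\partial L}{\partial q} = \frac{\partial H}{\partial q} \tag{1.41}$$

Combining (1.23), (1.39), and (1.41) gives the fourth major result.

$$\dot{p}(q, p) = -\frac{\partial H}{\partial q} \tag{1.42}$$

5 The following argument is taken from Finch & Hand.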

Now here’s what I mean that Legendre transformations are invertible: First follow the steps from $L \to H$. We start with $L = L(q, \dot{q})$. Equation (1.39) gives $p = p(q, \dot{q})$. Invert this to find $\dot{q} = \dot{q}(q, p)$. The Hamiltonian is now

$$H(q, p) = \dot{q}(q, p)\,p - L[q, \dot{q}(q, p)]. \tag{1.43}$$

Now suppose that we start from $H = H(p, q)$. Use (1.40) to find $\dot{q} = \dot{q}(q, p)$. Invert to find $p = p(q, \dot{q})$. Finally

$$L(q, \dot{q}) = \dot{q}\,p(q, \dot{q}) - H[q, p(q, \dot{q})] \tag{1.44}$$

In both cases we were able to complete the transformation without knowing ahead of time the functional relationship among $q$, $\dot{q}$, and $p$. To summarize: Equations (1.37), (1.39), and (1.41) enable us to transform between the $(q, \dot{q})$ (Lagrangian) prescription and the $(q, p)$ (Hamiltonian) prescription, while (1.40) and (1.42) are Hamilton’s equations of motion.
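A one-line Lagrangian makes the round trip concrete. The example below takes a particle of mass $m$ in an arbitrary potential $V(q)$; this specific Lagrangian is chosen here for illustration and is not taken from the notes.

```latex
\begin{align*}
L \to H:\quad
  & L = \tfrac{1}{2} m \dot{q}^2 - V(q), \qquad
    p = \frac{\partial L}{\partial \dot{q}} = m\dot{q}
    \;\Longrightarrow\; \dot{q}(q, p) = \frac{p}{m}, \\
  & H(q, p) = \frac{p}{m}\,p - \left[\frac{p^2}{2m} - V(q)\right]
            = \frac{p^2}{2m} + V(q). \\[4pt]
H \to L:\quad
  & \dot{q} = \frac{\partial H}{\partial p} = \frac{p}{m}
    \;\Longrightarrow\; p(q, \dot{q}) = m\dot{q}, \\
  & L(q, \dot{q}) = \dot{q}\,(m\dot{q}) - \left[\tfrac{1}{2} m \dot{q}^2 + V(q)\right]
                  = \tfrac{1}{2} m \dot{q}^2 - V(q).
\end{align*}
```

The first pair of lines uses (1.39) and (1.43), the second pair uses (1.40) and (1.44), and the original Lagrangian is recovered.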

1.5.1 The Spherical Pendulum

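A mass $m$ hangs from a string of length $R$. The string makes an angle $\theta$ with the vertical and can rotate about the vertical with an angle $\phi$.

$$T = \frac{1}{2} m R^2 \left(\dot{\theta}^2 + \sin^2\theta\,\dot{\phi}^2\right) \tag{1.45}$$

$$V = mgR\,(1 - \cos\theta) \tag{1.46}$$

The $mgR$ constant doesn’t appear in the equations of motion, so we can forget about it. The Lagrangian is $L = T - V$ as usual.

$$p_\theta = \frac{\partial L}{\partial \dot{\theta}} = m R^2 \dot{\theta} \tag{1.47}$$

$$p_\phi = \frac{\partial L}{\partial \dot{\phi}} = m R^2 \sin^2\theta\,\dot{\phi} \equiv l_\phi \tag{1.48}$$

The angle $\phi$ is cyclic, so $p_\phi = l_\phi$ is constant. At this point we are still in the $(q, \dot{q})$ prescription. Invert (1.47) and (1.48) to obtain $\dot{\theta}$ and $\dot{\phi}$ as functions of $p_\theta$ and $l_\phi$.

$$\dot{\theta} = \frac{p_\theta}{m R^2} \tag{1.49}$$

$$\dot{\phi} = \frac{l_\phi}{m R^2 \sin^2\theta} \tag{1.50}$$

$$H = \frac{p_\theta^2}{2 m R^2} + \frac{l_\phi^2}{2 m R^2 \sin^2\theta} - mgR\cos\theta \tag{1.51}$$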

The equations of motion follow from this.

$$\dot{\theta} = \frac{\partial H}{\partial p_\theta} = \frac{p_\theta}{m R^2} \tag{1.52}$$

$$\dot{p}_\theta = -\frac{\partial H}{\partial \theta} = \frac{l_\phi^2 \cos\theta}{m R^2 \sin^3\theta} - mgR\sin\theta \tag{1.53}$$

$$\dot{\phi} = \frac{\partial H}{\partial p_\phi} = \frac{l_\phi}{m R^2 \sin^2\theta} \tag{1.54}$$

$$\dot{p}_\phi = 0 \tag{1.55}$$

Suppose we were to try to find an analytic solution to this system of equations. First note that there are two constants of motion, the angular momentum $l_\phi$, and the total energy $E = H$.

1. Invert (1.51) to obtain $p_\theta = p_\theta(\theta, E, l_\phi)$.

2. Substitute $p_\theta$ into (1.49) and integrate

$$\int \frac{m R^2}{p_\theta}\,d\theta = t \equiv N(\theta)$$

The integral is hopeless anyhow, so we label its output $N(\theta)$ (short for an exceedingly nasty function).

3. Invert the nasty function to find $\theta$ as a function of $t$.

4. Take the sine of this even nastier function and substitute it into (1.54) to find $\dot{\phi}$.

5. Integrate and invert to find $\phi$ as a function of $t$.

This makes sense in principle, but is wildly impossible in practice. Now suppose we could change the problem so that both $\theta$ and $\phi$ were cyclic so that the two constants of motion were $p_\theta$ and $p_\phi$ (rather than $E$ and $p_\phi$). Then

$$\dot{\theta} = \frac{\partial H}{\partial p_\theta} = \omega_\theta \qquad \theta = \omega_\theta t + \theta_0$$

$$\dot{\phi} = \frac{\partial H}{\partial p_\phi} = \omega_\phi \qquad \phi = \omega_\phi t + \phi_0$$

Here $\omega_\theta$ and $\omega_\phi$ are two constant “frequencies” that we could easily extract from the Hamiltonian. This apparently small change makes the problem trivial! In both cases there are two constants of motion: it makes all the difference which two constants. This is the basis of the idea we will be pursuing in the next chapter.
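Although the analytic route is impractical, Hamilton’s equations (1.52) through (1.54) are first order and therefore easy to integrate numerically. The sketch below does this with a fourth-order Runge-Kutta step and monitors how well the energy (1.51) is conserved. The parameter values, initial conditions, and function names are illustrative assumptions.

```python
import math

def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def rhs(state, m, R, g, l_phi):
    """Right-hand sides of (1.52), (1.53), (1.54); l_phi is constant by (1.55)."""
    theta, p_theta, _phi = state
    s = math.sin(theta)
    dtheta = p_theta / (m * R ** 2)
    dp_theta = l_phi ** 2 * math.cos(theta) / (m * R ** 2 * s ** 3) - m * g * R * s
    dphi = l_phi / (m * R ** 2 * s ** 2)
    return [dtheta, dp_theta, dphi]

def hamiltonian(state, m, R, g, l_phi):
    """Energy from (1.51)."""
    theta, p_theta, _phi = state
    return (p_theta ** 2 / (2 * m * R ** 2)
            + l_phi ** 2 / (2 * m * R ** 2 * math.sin(theta) ** 2)
            - m * g * R * math.cos(theta))

def integrate(theta0=1.0, p_theta0=0.0, l_phi=0.5,
              m=1.0, R=1.0, g=9.81, dt=5e-4, n_steps=10000):
    """Return the final state and the largest energy drift seen along the way."""
    state = [theta0, p_theta0, 0.0]
    e0 = hamiltonian(state, m, R, g, l_phi)
    drift = 0.0
    f = lambda y: rhs(y, m, R, g, l_phi)
    for _ in range(n_steps):
        state = rk4_step(f, state, dt)
        drift = max(drift, abs(hamiltonian(state, m, R, g, l_phi) - e0))
    return state, drift

if __name__ == "__main__":
    final_state, drift = integrate()
    print("final (theta, p_theta, phi):", final_state)
    print("largest energy drift:", drift)
```

A nonzero $l_\phi$ keeps $\theta$ away from zero, where $1/\sin^2\theta$ diverges; with $l_\phi = 0$ the same code would need a different treatment near the vertical.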

Chapter 2

Canonical Transformations

We saw at the end of the last chapter that a problem in which all the generalized coordinates are cyclic is trivial to solve. We also saw that there is great flexibility allowed in the choice of coordinates for any particular problem. It turns out that there is an important class of problems for which it is possible to choose the coordinates so that they are in fact all cyclic. The choice is usually far from obvious, but there is a formal procedure for finding the “magic” variables. One formulates the problem in terms of the natural $p$’s and $q$’s and then transforms to a new set of variables, usually called $Q_k$ and $P_k$, that have the right properties.

2.1 Contact Transformations

The most general transformation is called a contact transformation.

$$Q_k = Q_k(q, p, t) \qquad P_k = P_k(q, p, t) \tag{2.1}$$

(In this formula and what follows, the symbols $p$ and $q$ when used as arguments stand for the complete set, $q_1, q_2, q_3, \cdots$, etc.) There is a certain privileged class of transformations called canonical transformations that preserve the structure of Hamilton’s equations of motion for all dynamical systems. This means that there is a new Hamiltonian function called $K(Q, P)$ for which the new equations of motion are

$$\dot{Q}_k = \frac{\partial K}{\partial P_k} \qquad \dot{P}_k = -\frac{\partial K}{\partial Q_k} \tag{2.2}$$

In a footnote in Classical Mechanics, Goldstein suggested that $K$ be called the Kamiltonian. The idea has caught on with several authors, and I will use it without further apology. The trick is to find it.

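Theorem: Let $F$ be any function of $q_k$ and $Q_k$ and possibly $p_k$ and $P_k$, as well as time. Then the new Lagrangian defined by

$$\bar{L} = L - \frac{dF}{dt} \tag{2.3}$$

is equivalent to $L$ in the sense that it yields the same equations of motion.

Proof:

$$\dot{F} = \sum_k \frac{\partial F}{\partial q_k}\,\dot{q}_k + \sum_k \frac{\partial F}{\partial Q_k}\,\dot{Q}_k + \frac{\partial F}{\partial t} \tag{2.4}$$

$$\frac{d}{dt}\left(\frac{\partial \dot{F}}{\partial \dot{q}_k}\right) = \frac{d}{dt}\left(\frac{\partial F}{\partial q_k}\right) = \frac{\partial \dot{F}}{\partial q_k}$$

$$\frac{d}{dt}\left(\frac{\partial \dot{F}}{\partial \dot{Q}_k}\right) = \frac{d}{dt}\left(\frac{\partial F}{\partial Q_k}\right) = \frac{\partial \dot{F}}{\partial Q_k}$$

These last two can be rewritten

$$\frac{d}{dt}\left(\frac{\partial \dot{F}}{\partial \dot{q}_k}\right) - \frac{\partial \dot{F}}{\partial q_k} = 0$$

$$\frac{d}{dt}\left(\frac{\partial \dot{F}}{\partial \dot{Q}_k}\right) - \frac{\partial \dot{F}}{\partial Q_k} = 0$$

So $\dot{F}$ satisfies Lagrange’s equation whether we regard it as a function of $q_k$ or $Q_k$. Obviously, if $L$ satisfies Lagrange’s equation, then so does $L - \dot{F}$. (The conclusion is unchanged if $F$ contains $p_k$ and/or $P_k$.) The function $F$ is called the generating function of the transformation.

$K$ is obtained by a Legendre transformation just as $H$ was.

$$K(Q, P) = \sum_k P_k \dot{Q}_k - \bar{L}(Q, \dot{Q}, t) \tag{2.5}$$

This has the same form as (1.36), so the derivation of the equations of motion, (1.39) through (1.42), is unchanged as well.

$$P_k = \frac{\partial \bar{L}}{\partial \dot{Q}_k} \qquad \dot{Q}_k = \frac{\partial K}{\partial P_k} \qquad \dot{P}_k = -\frac{\partial K}{\partial Q_k} \tag{2.6}$$

These simple results provide the framework for canonical transformations. In order to use them we will need to know two more things: (1) how to find $F$, and given $F$, (2) how to find the transformation $(q, p) \to (Q, P)$. We deal with (2) now and postpone (1) to later sections.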

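Consider the variables $q$, $Q$, $p$, and $P$. Any two of these constitute a complete set, so there are four kinds of generating functions usually called $F_1(q, Q, t)$, $F_2(q, P, t)$, $F_3(p, Q, t)$, and $F_4(p, P, t)$. All four are discussed in Goldstein. $F_1$ provides a good introduction. Most of our work will make use of $F_2$.

Starting with $F_1(q, Q)$, (2.3) becomes

$$\bar{L}(Q, \dot{Q}, t) = L(q, \dot{q}, t) - \frac{d}{dt} F_1(q, Q, t) \tag{2.7}$$

Since

$$\frac{\partial \bar{L}}{\partial \dot{q}_k} = \frac{\partial L}{\partial \dot{Q}_k} = 0,$$

we get with the help of (2.4)

$$\frac{\partial \bar{L}}{\partial \dot{q}_k} = \frac{\partial L}{\partial \dot{q}_k} - \frac{\partial F_1}{\partial q_k} = p_k - \frac{\partial F_1}{\partial q_k} = 0$$

$$\frac{\partial \bar{L}}{\partial \dot{Q}_k} = P_k = \frac{\partial L}{\partial \dot{Q}_k} - \frac{\partial F_1}{\partial Q_k} = -\frac{\partial F_1}{\partial Q_k}$$

This yields the two transformation equations

$$P_k = -\frac{\partial F_1}{\partial Q_k} \tag{2.8}$$

$$p_k = \frac{\partial F_1}{\partial q_k} \tag{2.9}$$

A straightforward set of substitutions gives our final formula for the Kamiltonian.

$$\begin{aligned}
K &= -L + \sum_k \left[-\frac{\partial F}{\partial Q_k}\,\dot{Q}_k + \frac{\partial F}{\partial q_k}\,\dot{q}_k + \frac{\partial F}{\partial Q_k}\,\dot{Q}_k\right] + \frac{\partial F}{\partial t} \\
  &= -L + \sum_k p_k \dot{q}_k + \frac{\partial F}{\partial t}
\end{aligned}$$

To be more explicit

$$K(Q, P) = H(q(Q, P), p(Q, P), t) + \frac{\partial}{\partial t} F_1(q(Q, P), Q, t) \tag{2.10}$$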

Summary:


1. Here is the typical problem: We are given the Hamiltonian $H = H(q, p)$ for some conservative system. $H = E$ is constant, but the $q$’s and $p$’s change with time in a complicated way. Our goal is to find the functions $q = q(t)$ and $p = p(t)$ using the technique of canonical transformations.

2. We need to know the generating function $F = F_1(q, Q)$. This is the hard part, and I’m postponing it as long as possible.

3. Substitute $F$ into (2.8) and (2.9). This gives a set of coupled algebraic equations for $q$, $Q$, $p$, and $P$. They must be combined in such a way as to give $q_k = q_k(Q, P)$ and $p_k = p_k(Q, P)$.

4. Use (2.10) to find $K$. If we had the right generating function to start with, $Q$ will be cyclic, i.e. $K = K(P)$. The equations of motion are obtained from (2.6): $\dot{P}_k = 0$ and $\dot{Q}_k = \omega_k$. The $\omega$’s are a set of constants, as are the $P$’s, so $Q_k(t) = \omega_k t + \alpha_k$. The $\alpha$’s are constants obtained from the initial conditions.

5. Finally $q_k(t) = q_k(Q(t), P)$ and $p_k(t) = p_k(Q(t), P)$.

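2.1.1 The Harmonic Oscillator: Cracking a Peanut with a Sledgehammer

$$H = \frac{p^2}{2m} + \frac{k q^2}{2} = \frac{1}{2m}\left(p^2 + m^2 \omega^2 q^2\right) \tag{2.11}$$

It’s useful to try a new technique on an old problem. As it turns out, the generating function is

$$F = \frac{m \omega q^2}{2} \cot Q \tag{2.12}$$

The transformation is found from (2.8) and (2.9).

$$p = \frac{\partial F}{\partial q} = m \omega q \cot Q$$

$$P = -\frac{\partial F}{\partial Q} = \frac{m \omega q^2}{2 \sin^2 Q}$$

Solve for $p$ and $q$ in terms of $P$ and $Q$ and then substitute into (2.10) to find $K$.

$$q = \sqrt{\frac{2P}{m\omega}}\,\sin Q \qquad p = \sqrt{2 P m \omega}\,\cos Q$$

$$K = \omega P \qquad P = E/\omega$$

We have achieved our goal. $Q$ is cyclic, and the equations of motion are trivial.

$$\dot{Q} = \frac{\partial K}{\partial P} = \omega \qquad Q = \omega t + Q_0 \tag{2.13}$$

$$q = \sqrt{\frac{2E}{m\omega^2}}\,\sin(\omega t + Q_0) \qquad p = \sqrt{2mE}\,\cos(\omega t + Q_0) \tag{2.14}$$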
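The algebra of this example is easy to verify numerically. The sketch below evaluates the transformation $q(Q, P)$, $p(Q, P)$, confirms that the Hamiltonian (2.11) reduces to $\omega P$, checks the two relations obtained from (2.8) and (2.9), and estimates the Poisson bracket $\{q, p\}_{Q,P}$ by central differences. The Poisson bracket test for canonicity is introduced later in the notes (Section 3.1.1) and is used here only as a consistency check; the function names and sample values are illustrative assumptions.

```python
import math

def to_old(Q, P, m, omega):
    """Old variables (q, p) in terms of the new ones (Q, P)."""
    q = math.sqrt(2.0 * P / (m * omega)) * math.sin(Q)
    p = math.sqrt(2.0 * P * m * omega) * math.cos(Q)
    return q, p

def hamiltonian(q, p, m, omega):
    """Harmonic oscillator Hamiltonian (2.11)."""
    return (p ** 2 + (m * omega * q) ** 2) / (2.0 * m)

def poisson_bracket_qp(Q, P, m, omega, h=1e-6):
    """Estimate dq/dQ * dp/dP - dq/dP * dp/dQ with central differences."""
    q_Qp, p_Qp = to_old(Q + h, P, m, omega)
    q_Qm, p_Qm = to_old(Q - h, P, m, omega)
    q_Pp, p_Pp = to_old(Q, P + h, m, omega)
    q_Pm, p_Pm = to_old(Q, P - h, m, omega)
    dq_dQ = (q_Qp - q_Qm) / (2 * h)
    dp_dQ = (p_Qp - p_Qm) / (2 * h)
    dq_dP = (q_Pp - q_Pm) / (2 * h)
    dp_dP = (p_Pp - p_Pm) / (2 * h)
    return dq_dQ * dp_dP - dq_dP * dp_dQ

if __name__ == "__main__":
    m, omega, Q, P = 2.0, 3.0, 0.7, 1.5
    q, p = to_old(Q, P, m, omega)
    print("H(q, p)  =", hamiltonian(q, p, m, omega))
    print("omega * P =", omega * P)
    print("{q, p}   =", poisson_bracket_qp(Q, P, m, omega))
```

A bracket equal to one at every sampled point is what one expects of a canonical transformation; the check is numerical evidence, not a proof.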

2.2 The Second Generating Function

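There’s an old recipe for tiger stew that begins, “First catch the tiger.” In our quest for the tiger, we now turn our attention to the second generating function, $F_2 = F_2(q, P, t)$. $F_2$ is obtained from $F_1$ by means of a Legendre transformation.$^1$

$$F_2(q, P) = F_1(q, Q) + \sum_k Q_k P_k \tag{2.15}$$

We are looking for transformation equations analogous to (2.8) and (2.9). Since $L = \bar{L} + \dot{F}_1$,

$$\begin{aligned}
\sum_k p_k \dot{q}_k - H &= \sum_k P_k \dot{Q}_k - K + \frac{d}{dt}\left(F_2 - \sum_k Q_k P_k\right) \\
  &= -\sum_k Q_k \dot{P}_k - K + \dot{F}_2
\end{aligned}$$

Substitute

$$\dot{F}_2 = \sum_k \left[\frac{\partial F_2}{\partial q_k}\,\dot{q}_k + \frac{\partial F_2}{\partial P_k}\,\dot{P}_k\right] + \frac{\partial F_2}{\partial t}$$

$$-H = -K + \sum_k \left[\left(\frac{\partial F_2}{\partial q_k} - p_k\right)\dot{q}_k + \left(\frac{\partial F_2}{\partial P_k} - Q_k\right)\dot{P}_k\right] + \frac{\partial F_2}{\partial t}$$

We are working on the assumption that $q$ and $P$ are independent variables, so the coefficients of $\dot{q}_k$ and $\dot{P}_k$ must vanish separately. We enforce this by requiring that

$$\frac{\partial F_2}{\partial q_k} = p_k \tag{2.16}$$

$$\frac{\partial F_2}{\partial P_k} = Q_k \tag{2.17}$$

$$K(q, P) = H(q(Q, P), P) + \frac{\partial}{\partial t} F_2(q(Q, P), P) \tag{2.18}$$

1 When in doubt, do a Legendre transformation.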


2.3 Hamilton’s Principle Function

The $F_1$ style generating functions were used to transform to a new set of variables $(q, p) \to (Q, P)$ such that all the $Q$’s were cyclic. As a consequence, the $P$’s were constants of the motion, and the $Q$’s were linear functions of time. The generating function itself was hard to find, however. The $F_2$ generating function goes one step further; it can transform to a set of variables in which both the $Q$’s and $P$’s are constant and simple functions of the initial values of the phase space variables. In essence, our transformation is

$$(q(t), p(t)) \leftrightarrow (q_0, p_0)$$

This is a time-dependent transformation, of course. The fact that we can find such transformations shows that the time evolution of a system is itself a canonical transformation.

We look for an $F_2$ so that $K$ in (2.18) is identically zero! Then from (2.6), $\dot{Q}_k = 0$ and $\dot{P}_k = 0$. The appropriate generating function will be a solution to

$$H(q, p, t) + \frac{\partial F_2}{\partial t} = 0 \tag{2.19}$$

We eliminate $p_k$ using (2.16)

$$H\left(q_1, \ldots, q_n;\ \frac{\partial F_2}{\partial q_1}, \ldots, \frac{\partial F_2}{\partial q_n};\ t\right) + \frac{\partial F_2}{\partial t} = 0. \tag{2.20}$$

The solution to this equation is usually called $S$, Hamilton’s principle function. The equation itself is the Hamilton-Jacobi equation.$^2$

There are two serious issues here: does it have a solution, and if it does, can we find it? We will take a less serious approach: if we can find a solution, then it most surely exists. Furthermore, if we can find it, it will have the form

$$S = \sum_k W_k(q_k) - \alpha t \tag{2.21}$$

Partial differential equations that have solutions of the form (2.21) are said to be separable.$^3$ Most of the familiar textbook problems in classical mechanics and atomic physics can be separated in this form. The question of separability does depend on the system of generalized coordinates used. For example, the Kepler problem is separable in spherical coordinates but not in cartesian coordinates. It would be nice to know whether a particular Hamiltonian could be separated with some system of coordinates, but no completely general criterion is known.$^4$ As a rule of thumb, Hamiltonians with explicit time dependence are not separable.

2 See Goldstein, Classical Mechanics, Chapter 10.

3 Or to be meticulous, completely separable.

If our Hamiltonian is separable, then when (2.21) is substituted into (2.20), the result will look like

$$ f_1\left(q_1, \frac{dW_1}{dq_1}\right) + f_2\left(q_2, \frac{dW_2}{dq_2}\right) + \cdots = \alpha \tag{2.22} $$

Each function fk is a function only of qk and dWk/dqk. Since all the q’s are independent, each function must be separately constant. This gives us a system of n independent, first-order, ordinary differential equations for the Wk’s.

$$ f_k\left(q_k, \frac{dW_k}{dq_k}\right) = \alpha_k. \tag{2.23} $$

The W’s so obtained are then substituted into (2.21). The resulting function for S is

$$ F_2 \equiv S = S(q_1, \dots, q_n;\ \alpha_1, \dots, \alpha_n;\ \alpha, t) $$

The final constant α is redundant for two reasons: first, $\sum_k \alpha_k = \alpha$, and second, the transformation equations (2.16) and (2.17) involve derivatives with respect to qk and Pk. When S is so differentiated, the −αt piece will disappear. In order to make this apparent, we will write S as follows:

$$ F_2 \equiv S = S(q_1, \dots, q_n;\ \alpha_1, \dots, \alpha_n;\ t) \tag{2.24} $$

Since the F2 generating functions have the form F2(q, P), we are entitled to think of the α’s as “momenta,” i.e. αk in (??) corresponds to Pk in (2.17). In a way this makes sense. Our goal was to transform the time-dependent q’s and p’s into a new set of constant Q’s and P’s, and the α’s are most certainly constant. On the other hand, they are not the initial momenta p0 that evolve into p(t). The relationship between α and p0 will be determined later.

If we have done our job correctly, the Q’s given by (2.17) are also constant. They are traditionally called β, so

$$ Q_k = \beta_k = \frac{\partial S(q, \alpha, t)}{\partial \alpha_k} \tag{2.25} $$

Again, the β’s are constant, but they are not equal to q0.

⁴ There is a very technical result, the so-called Staeckel conditions, which gives necessary and sufficient conditions for separability in orthogonal coordinate systems.

We can turn this into a cookbook algorithm.


1. Substitute (2.21) into (2.20) and separate variables.

2. Integrate the resulting first-order ODEs. The result will be n independent functions Wk = Wk(q, α). Put the Wk’s back into (2.21) to construct S = S(q, α, t).

3. Find the constant β coordinates using

$$ \beta_k = \frac{\partial S}{\partial \alpha_k} \tag{2.26} $$

4. Invert these equations to find qk = qk(β, α, t).

5. Find the momenta with

$$ p_k = \frac{\partial S}{\partial q_k} \tag{2.27} $$

2.3.1 The Harmonic Oscillator: Again

The harmonic oscillator provides an easy example of this procedure.

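$$ H = \frac{1}{2m}\left(p^2 + m^2\omega^2 q^2\right) $$

$$ \frac{1}{2m}\left[\left(\frac{\partial S}{\partial q}\right)^2 + m^2\omega^2 q^2\right] + \frac{\partial S}{\partial t} = 0 $$

$$ \frac{1}{2m}\left[\left(\frac{\partial W}{\partial q}\right)^2 + m^2\omega^2 q^2\right] = \alpha $$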

Since there is only one q, the entire quantity on the left of the equal sign is a constant.

$$ W(q, \alpha) = \sqrt{2m\alpha} \int dq\, \sqrt{1 - \frac{m\omega^2 q^2}{2\alpha}} $$

The new transformed constant “momentum” is P = α.

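$$ \beta = \frac{\partial S(q, \alpha, t)}{\partial \alpha} = \frac{\partial W(q, \alpha)}{\partial \alpha} - t = \frac{1}{\omega}\sin^{-1}\left[q\sqrt{\frac{m\omega^2}{2\alpha}}\right] - t \tag{2.28} $$

Invert this equation to find q as a function of t and β.

$$ q = \sqrt{\frac{2\alpha}{m\omega^2}}\, \sin(\omega t + \beta\omega) $$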

Evidently, β has something to do with initial conditions: ωβ = ϕ0, the initial phase angle.

$$ p = \frac{\partial S}{\partial q} = \sqrt{2m\alpha - m^2\omega^2 q^2} = \sqrt{2m\alpha}\, \cos(\omega t + \phi_0) $$

The maximum value of p is $\sqrt{2mE}$ (recall that α = E), so that makes sense too.
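The result of the five-step procedure can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the parameter values and the function names (`q_of_t`, `p_of_t`, `beta_from_qt`, `hamiltonian`) are assumptions chosen for illustration. It confirms that β from (2.28) stays constant along the trajectory, that H = α, and that dq/dt = p/m.

```python
import math

# Illustrative parameter values (assumptions, not taken from the text).
m, omega = 1.3, 2.0
alpha = 0.7              # alpha = E, the new constant "momentum"
phi0 = 0.4               # initial phase, phi0 = omega * beta
beta = phi0 / omega


def q_of_t(t):
    """q(t) obtained by inverting (2.28)."""
    return math.sqrt(2 * alpha / (m * omega**2)) * math.sin(omega * t + phi0)


def p_of_t(t):
    """p = dS/dq evaluated on the trajectory."""
    return math.sqrt(2 * m * alpha) * math.cos(omega * t + phi0)


def beta_from_qt(q, t):
    """Right-hand side of (2.28); valid on the branch where p > 0."""
    return math.asin(q * math.sqrt(m * omega**2 / (2 * alpha))) / omega - t


def hamiltonian(q, p):
    """Harmonic-oscillator Hamiltonian."""
    return (p**2 + m**2 * omega**2 * q**2) / (2 * m)


if __name__ == "__main__":
    # Stay on the branch -pi/2 < omega*t + phi0 < pi/2, where asin inverts sin.
    for t in (0.0, 0.1, 0.3, 0.5):
        q, p = q_of_t(t), p_of_t(t)
        print(t, beta_from_qt(q, t), hamiltonian(q, p))
```

The restriction to one branch of the inverse sine mirrors the sign ambiguity of the square root in W; outside that branch the other sign of p must be used.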

2.4 Hamilton’s Characteristic Function

There is another way to use the F2 generating function to turn a difficult problem into an easy one. In the previous section we chose F2 = S = W − αt, so that K = 0. It is also possible to take F2 = W(q) so that

$$ K = H\left(q_k, \frac{\partial W}{\partial q_k}\right) = E = \alpha_1 \tag{2.29} $$

The W obtained in this way is called Hamilton’s characteristic function.

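$$ W = \sum_k W_k(q_k, \alpha_1, \dots, \alpha_n) = W(q_1, \dots, q_n, E, \alpha_2, \dots, \alpha_n) = W(q_1, \dots, q_n, \alpha_1, \dots, \alpha_n) \tag{2.30} $$

It generates a contact transformation with properties quite different from that generated by S. The equations of motion are

$$ \dot{P}_k = -\frac{\partial K}{\partial Q_k} = 0 \tag{2.31} $$

$$ \dot{Q}_k = \frac{\partial K}{\partial P_k} = \frac{\partial K}{\partial \alpha_k} = \delta_{k1} \tag{2.32} $$

The new feature is that $\dot{Q}_1 = 1$, so Q1 = t − t0. In general

$$ Q_k = \frac{\partial W}{\partial \alpha_k} = \beta_k \tag{2.33} $$

but now β1 = t − t0. The momenta are

$$ p_k = \frac{\partial W}{\partial q_k} \tag{2.34} $$

as before.

The algorithm now works like this: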


1. Substitute (2.30) into (2.29) and separate variables.

2. Integrate the resulting first-order ODEs. The result will be n independent functions Wk = Wk(q, α). Put the Wk’s back into (2.30) to construct W = W(q, α).

3. Find the constant β coordinates using

$$ \beta_k = \frac{\partial W}{\partial \alpha_k} \tag{2.35} $$

Remember that β1 = t − t0.

4. Invert these equations to find qk = qk(β, α, t).

5. Find the momenta with

$$ p_k = \frac{\partial W}{\partial q_k} \tag{2.36} $$

2.4.1 Examples

Problems with one degree of freedom are virtually identical whether they are formulated in terms of the characteristic function or the principal function. Take, for example, the harmonic oscillator from the previous section. Equation (2.28) becomes

$$ \beta = \frac{\partial W(q, \alpha)}{\partial \alpha} = \frac{1}{\omega}\sin^{-1}\left[q\sqrt{\frac{m\omega^2}{2\alpha}}\right] = t - t_0 $$

$$ q = \sqrt{\frac{2\alpha}{m\omega^2}}\, \sin[\omega(t - t_0)] \tag{2.37} $$

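The following problem raises some new issues.

Consider a particle in a stable orbit in a central potential. The motion will lie in a plane so we can do the problem in two dimensions.

$$ H = \frac{1}{2m}\left(p_r^2 + \frac{p_\psi^2}{r^2}\right) + V(r) \tag{2.38} $$

$p_\psi = mr^2\dot{\psi}$ is the angular momentum. It is conserved since ψ is cyclic.

$$ \frac{1}{2m}\left[\left(\frac{\partial W}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial W}{\partial \psi}\right)^2\right] + V(r) = \alpha_1 \tag{2.39} $$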

Writing W = Wr(r) + Wψ(ψ) and multiplying through by 2mr² gives

$$ \left[r^2\left(\frac{dW_r}{dr}\right)^2 + 2mr^2 V(r) - 2m\alpha_1 r^2\right] + \left(\frac{dW_\psi}{d\psi}\right)^2 = 0 \tag{2.40} $$

At this point we notice ∂W/∂ψ = pψ, which we know is constant. Why not call it something like αψ? Then Wψ = αψ ψ. This is worth stating as a general principle: if q is cyclic, Wq = αq q, where αq is one of the n constant α’s appearing in (2.30).

$$ W = \int dr\, \sqrt{2m(\alpha_1 - V) - \alpha_\psi^2/r^2} + \alpha_\psi \psi \tag{2.41} $$

We can find r as a function of time by inverting the equation for β1, just as we did in (2.37), but more to the point

$$ \beta_\psi = \frac{\partial W}{\partial \alpha_\psi} = -\int \frac{\alpha_\psi\, dr}{r^2\sqrt{2m(\alpha_1 - V) - \alpha_\psi^2/r^2}} + \psi \tag{2.42} $$

Make the usual substitution, u = 1/r.

$$ \psi - \beta_\psi = -\int \frac{du}{\sqrt{2m(\alpha_1 - V(r))/\alpha_\psi^2 - u^2}} \tag{2.43} $$

This is a new kind of equation of motion, which gives ψ = ψ(r) or r = r(ψ) (assuming we can do the integral), i.e. there is no explicit time dependence. Such equations are called orbit equations. Often it will be more useful to have the equations in this form, when we are concerned with the geometric properties of the trajectories.
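To see how (2.43) produces a concrete orbit, here is a worked case that is not in the original text: the attractive inverse-square potential V(r) = −k/r, with k > 0 an assumed force constant. The symbols u0, e, and ψ0 are introduced only for this illustration; ψ0 absorbs βψ and the integration constant.

```latex
% Assumed example: V(r) = -k/r, i.e. V = -k u with u = 1/r.
\begin{align*}
\psi - \beta_\psi
  &= -\int \frac{du}{\sqrt{\dfrac{2m\alpha_1}{\alpha_\psi^2}
       + \dfrac{2mk}{\alpha_\psi^2}\,u - u^2}}
   = -\int \frac{du}{\sqrt{e^2 u_0^2 - (u - u_0)^2}}, \\
u_0 &\equiv \frac{mk}{\alpha_\psi^2}, \qquad
e \equiv \sqrt{1 + \frac{2\alpha_1 \alpha_\psi^2}{m k^2}}, \\
\psi - \psi_0 &= \cos^{-1}\!\left(\frac{u - u_0}{e\,u_0}\right)
\quad\Longrightarrow\quad
\frac{1}{r} = \frac{mk}{\alpha_\psi^2}\,\bigl[1 + e\cos(\psi - \psi_0)\bigr].
\end{align*}
```

For α1 < 0 this gives e < 1, a closed ellipse, which is consistent with the stable orbit assumed at the start of the example.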

2.5 Action-Angle Variables

We are pursuing a route to chaos that begins with periodic or quasi-periodic systems. A particularly elegant approach to these systems makes use of a variant of Hamilton’s characteristic function. In this technique, the integration constants αk appearing directly in the solution of the Hamilton-Jacobi equation are not themselves chosen to be the new momenta. Instead, we define a set of constants Ik, which form a set of n independent functions of the α’s known as action variables. The coordinates conjugate to the I’s are angles that increase linearly with time. You are familiar with a system that behaves just like this, the harmonic oscillator!

$$ q = \sqrt{\frac{2E}{k}}\, \sin\psi \qquad p = \sqrt{2mE}\, \cos\psi $$

where ψ = ωt + ψ0. In the language of action-angle variables I = E/ω, so

$$ q = \sqrt{\frac{2I}{m\omega}}\, \sin\psi \qquad p = \sqrt{2mI\omega}\, \cos\psi $$

I is the “momentum” conjugate to the “coordinate” ψ.

Action-angle variables are only appropriate to periodic motion, and there are other restrictions we will learn as we go along, but within these limitations, all systems can be transformed into a set of uncoupled harmonic oscillators.⁵ To see what “periodic motion” implies, have a look at the simple pendulum.

$$ H = \frac{p_\theta^2}{2ml^2} - mgl\cos\theta = E = \alpha \tag{2.44} $$

$$ p_\theta = \pm\sqrt{2ml^2(E + mgl\cos\theta)} \tag{2.45} $$

There are two kinds of motion possible. If E is small, the pendulum will reverse at the points where pθ = 0. The motion is called libration, i.e. bounded and periodic. If E is large enough, however, the pendulum will swing around a complete circle. Such motion is called rotation (obviously). There is a critical value of E = mgl for which, in principle, the pendulum could stand straight up motionless at θ = π. An orbit in pθ - θ phase space corresponding to this energy forms the dividing line between the two kinds of motion. Such a trajectory is called a separatrix.

For either type of periodic motion, we can introduce a new variable I designed to replace α as the new constant momentum.

$$ I(\alpha) = \frac{1}{2\pi}\oint p(q, \alpha)\, dq \tag{2.46} $$

This is a definite integral taken over a complete period of libration or rotation.⁶ I will prove (1) the angle ψ conjugate to I is cyclic, and (2) ∆ψ = 2π corresponds to one complete cycle of the periodic motion.

1. Since I = I(α) and H = α, it follows that H is a function of I only, H = H(I).

$$ \dot{I} = -\frac{\partial H}{\partial \psi} = 0 \qquad \dot{\psi} = \frac{\partial H}{\partial I} = \omega(I) \tag{2.47} $$

2. We are using an F2 type generating function, which is a function of the old coordinate and new momentum. Hamilton’s characteristic function can be written as

$$ W = W(q, I). \tag{2.48} $$

The transformation equations are

$$ \psi = \frac{\partial W}{\partial I} \qquad p = \frac{\partial W}{\partial q} \tag{2.49} $$

Note that

$$ \frac{\partial \psi}{\partial q} = \frac{\partial}{\partial I}\left(\frac{\partial W}{\partial q}\right) $$

so

$$ \oint d\psi = \oint \frac{\partial \psi}{\partial q}\, dq = \frac{\partial}{\partial I}\oint \frac{\partial W}{\partial q}\, dq = \frac{\partial}{\partial I}\oint p\, dq = \frac{\partial}{\partial I}(2\pi I) = 2\pi. $$

⁵ When correctly viewed, everything is a harmonic oscillator.

⁶ Textbooks are about equally divided on whether to call action I or J and whether or not to include the factor 1/2π.

2.5.1 The harmonic oscillator (for the last time)

$$ H = \frac{1}{2m}\left(p^2 + m^2\omega^2 q^2\right) $$

$$ p = \pm\sqrt{2mE - m^2\omega^2 q^2} $$

$$ I = \frac{1}{2\pi}\oint \sqrt{2mE - m^2\omega^2 q^2}\, dq $$

The integral is tricky in this form because p changes sign at the turning points. We won’t have to worry about this if we make the substitution

$$ q = \sqrt{\frac{2E}{m\omega^2}}\, \sin\psi \tag{2.50} $$

This substitution not only makes the integral easy and takes care of the sign change, it also makes clear the meaning of an integral over a complete cycle, i.e. ψ goes from 0 to 2π.

$$ I = \frac{E}{\pi\omega}\oint \cos^2\psi\, d\psi = E/\omega $$
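The loop integral can also be evaluated numerically as a check on I = E/ω. The sketch below is an illustration rather than part of the original notes; the function name `action_harmonic` and the sample values are assumptions. It applies the midpoint rule in ψ after the substitution (2.50), so the sign change of p at the turning points is handled automatically.

```python
import math


def action_harmonic(E, m, omega, n=2000):
    """I = (1/2 pi) * loop integral of p dq, using q = A sin(psi), psi in [0, 2 pi]."""
    A = math.sqrt(2 * E / (m * omega**2))       # amplitude from (2.50)
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        psi = (i + 0.5) * h                      # midpoint of each sub-interval
        p = math.sqrt(2 * m * E) * math.cos(psi) # signed momentum
        dq_dpsi = A * math.cos(psi)
        total += p * dq_dpsi * h
    return total / (2 * math.pi)


if __name__ == "__main__":
    E, m, omega = 1.5, 2.0, 3.0                  # sample values (assumed)
    print(action_harmonic(E, m, omega), E / omega)
```

Integrating in ψ rather than in q is the design choice here: in q the integrand has square-root behaviour at the turning points and the two branches of p must be treated separately.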

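From this point of view the introduction of ψ at (2.50) seems nothing more than a mathematical trick. We would have stumbled on it eventually, however, as the following argument shows. The Hamilton-Jacobi equation is

$$ \frac{1}{2m}\left[\left(\frac{dW}{dq}\right)^2 + m^2\omega^2 q^2\right] = E $$

$$ W = \int \sqrt{2mI\omega - m^2\omega^2 q^2}\, dq $$

$$ \frac{\partial W}{\partial I} = m\omega\int \frac{dq}{\sqrt{2mI\omega - m^2\omega^2 q^2}} = \sin^{-1}\left(q\sqrt{\frac{m\omega}{2I}}\right) + \psi_0 = \psi $$

$$ q = \sqrt{\frac{2I}{m\omega}}\, \sin(\psi - \psi_0) = \sqrt{\frac{2E}{m\omega^2}}\, \sin(\psi - \psi_0) $$

In the last equation ψ0 appears as an integration constant. Evidently, ψ is the angle variable conjugate to I.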

In summary, to use action-angle variables for problems with one degree of freedom:

1. Find p as a function of E = α and q.

2. Calculate I(E) using (2.46).

3. Solve the Hamilton-Jacobi equation to find W =W (q, I).

4. Find ψ = ψ(q, I) using (2.49).

5. Invert this equation to get q = q(I, ψ).

6. Use (2.47) to get ω(I).

7. Calculate p = p(I, q) from (2.49).

One attractive feature of this scheme is that you can find the frequency without using the characteristic function and without finding the equations of motion. The phase space plot is particularly important. Use polar coordinates (what else) for (I, ψ). Every trajectory, whatever the system, is a circle!
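Steps 1, 2, and 6 of the summary can be sketched numerically for the simple pendulum of (2.44) and (2.45). This is a minimal illustration under stated assumptions, not the author’s method: the function names, the default values m = l = 1 and g = 9.81, the substitution θ = θ0 sin s, and the finite-difference step are all choices made here. The frequency follows from ω = ∂H/∂I = 1/(dI/dE) without ever solving the equations of motion.

```python
import math


def pendulum_action(E, m=1.0, l=1.0, g=9.81, n=4000):
    """Action I(E) from (2.46) for libration, valid for -mgl < E < mgl."""
    theta0 = math.acos(-E / (m * g * l))          # turning point, p_theta = 0
    h = math.pi / n
    total = 0.0
    for i in range(n):
        s = -math.pi / 2 + (i + 0.5) * h          # theta = theta0 * sin(s)
        theta = theta0 * math.sin(s)
        radicand = 2 * m * l**2 * (E + m * g * l * math.cos(theta))
        p = math.sqrt(max(radicand, 0.0))         # |p_theta| from (2.45)
        total += p * theta0 * math.cos(s) * h     # p dtheta on the outbound leg
    return 2 * total / (2 * math.pi)              # the return leg contributes equally


def pendulum_frequency(E, m=1.0, l=1.0, g=9.81):
    """omega = dH/dI = 1 / (dI/dE), estimated with a central difference in E."""
    dE = 1e-4 * m * g * l
    dI = pendulum_action(E + dE, m, l, g) - pendulum_action(E - dE, m, l, g)
    return 2 * dE / dI


if __name__ == "__main__":
    for theta0 in (0.1, 1.0, 2.0):                # amplitudes in radians
        E = -9.81 * math.cos(theta0)
        print(theta0, pendulum_frequency(E), math.sqrt(9.81))
```

For small amplitudes the result approaches the familiar value, the square root of g/l, and it decreases as the amplitude grows toward the separatrix.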

Our derivation was based on the following assumptions: (1) The system had one degree of freedom. (2) Energy was conserved and the Hamiltonian had no explicit time dependence. (3) The motion was periodic. Every such system is, at heart, a harmonic oscillator. Phase space trajectories are circles. The frequency can be found with a few deft moves. From a philosophical point of view (and we will be getting deeper and deeper into philosophy as these lectures proceed), problems in this category are “as good as solved”; nothing more needs to be said about them. The same is definitely not true with more than one degree of freedom. I will take a paragraph to generalize before going on to some more abstract developments.

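We must assume that the system is separable, so

$$ W(q_1, \dots, q_n, \alpha_1, \dots, \alpha_n) = \sum_k W_k(q_k, \alpha_1, \dots, \alpha_n) \tag{2.51} $$

$$ p_k = \frac{\partial}{\partial q_k} W_k(q_k, \alpha_1, \dots, \alpha_n) \tag{2.52} $$

$$ I_k = \frac{1}{2\pi}\oint p_k(q_k, \alpha_1, \dots, \alpha_n)\, dq_k \tag{2.53} $$

Next find all the α’s as functions of the I’s and substitute into W.

$$ W = W(q_1, \dots, q_n;\ I_1, \dots, I_n) $$

Finally

$$ \psi_k = \frac{\partial W}{\partial I_k} \qquad \dot{I}_k = 0 \qquad \dot{\psi}_k = \frac{\partial H}{\partial I_k} = \omega_k \tag{2.54} $$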

Chapter 3

Abstract Transformation Theory

So, one-dimensional problems are simple. Given the restrictions listed in the previous section, their phase space trajectories are circles. How does this generalize to problems with two or more degrees of freedom? A brief answer is that, given a number of conditions that we must discuss carefully, the phase space trajectories of a system with n degrees of freedom move on the surface of an n-dimensional torus imbedded in 2n-dimensional space. The final answer is a donut! In order to prove this remarkable assertion and understand the conditions that must be satisfied, we must slog through a lot of technical material about transformations in general.

3.1 Notation

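Our first job is to devise some compact notation for dealing with higher dimensional spaces. I will show you the notation in one dimension. It will then be easy to generalize. Recall Hamilton’s equations of motion.

$$ \dot{p} = -\frac{\partial H}{\partial q} \qquad \dot{q} = \frac{\partial H}{\partial p} $$

We will turn this into a vector equation.

$$ \boldsymbol{\eta} = \begin{pmatrix} q \\ p \end{pmatrix} \qquad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad \nabla = \begin{pmatrix} \partial/\partial q \\ \partial/\partial p \end{pmatrix} \tag{3.1} $$

The equations of motion in vector form are

$$ \dot{\boldsymbol{\eta}} = J \cdot \nabla H \tag{3.2} $$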


J is not a vector of course. Sometimes an array used in this way is called a dyadic. At any rate this is just shorthand for matrix multiplication, i.e.

$$ \begin{pmatrix} \dot{q} \\ \dot{p} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} \partial H/\partial q \\ \partial H/\partial p \end{pmatrix} $$

The structure of J is important. Notice that it does two things: it exchanges p and q and it changes one sign. This is called a symplectic transformation. I want to explore the connection between canonical transformations and symplectic transformations.

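I’ll start with the generic canonical transformation, (q, p) → (Q, P). How do the velocities transform? Define

$$ M = \begin{pmatrix} \partial Q/\partial q & \partial Q/\partial p \\ \partial P/\partial q & \partial P/\partial p \end{pmatrix} \tag{3.3} $$

$$ \begin{pmatrix} \dot{Q} \\ \dot{P} \end{pmatrix} = \begin{pmatrix} \partial Q/\partial q & \partial Q/\partial p \\ \partial P/\partial q & \partial P/\partial p \end{pmatrix} \begin{pmatrix} \dot{q} \\ \dot{p} \end{pmatrix} \tag{3.4} $$

Using the notation

$$ \boldsymbol{\zeta} = \begin{pmatrix} Q \\ P \end{pmatrix} \tag{3.5} $$

this can be written

$$ \dot{\boldsymbol{\zeta}} = M \cdot \dot{\boldsymbol{\eta}} = M \cdot J \cdot \nabla H \tag{3.6} $$

The gradient operator differentiates H with respect to q and p. These derivatives transform, e.g.

$$ \frac{\partial H}{\partial q} = \frac{\partial H}{\partial Q}\frac{\partial Q}{\partial q} + \frac{\partial H}{\partial P}\frac{\partial P}{\partial q} $$

consequently

$$ \nabla_{(q,p)} H = M^T \cdot \nabla_{(Q,P)} H \tag{3.7} $$

The T stands for transpose, of course:

$$ \dot{\boldsymbol{\zeta}} = M \cdot J \cdot M^T \cdot \nabla_{(Q,P)} H \tag{3.8} $$

but

$$ \dot{\boldsymbol{\zeta}} = J \cdot \nabla_{(Q,P)} H \tag{3.9} $$

Combining (3.8) and (3.9):

$$ J = M \cdot J \cdot M^T \tag{3.10} $$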
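Condition (3.10) is easy to test numerically for a given transformation. The sketch below is illustrative only: it takes the harmonic-oscillator map (q, p) → (ψ, I) as the canonical example, builds M of (3.3) by central differences, and compares M·J·Mᵀ with J. The helper names, the sample point, and the counterexample `rescale` are assumptions introduced here.

```python
import math

m, omega = 1.5, 2.0                      # illustrative values (assumed)
J = [[0.0, 1.0], [-1.0, 0.0]]


def to_action_angle(q, p):
    """(q, p) -> (Q, P) = (psi, I) for the harmonic oscillator."""
    psi = math.atan2(m * omega * q, p)
    action = (p**2 + (m * omega * q) ** 2) / (2 * m * omega)
    return psi, action


def rescale(q, p):
    """A non-canonical map for comparison: Q = 2q, P = p."""
    return 2 * q, p


def jacobian(f, q, p, h=1e-6):
    """M of (3.3) by central differences: rows (Q, P), columns (q, p)."""
    d_dq = [(a - b) / (2 * h) for a, b in zip(f(q + h, p), f(q - h, p))]
    d_dp = [(a - b) / (2 * h) for a, b in zip(f(q, p + h), f(q, p - h))]
    return [[d_dq[0], d_dp[0]], [d_dq[1], d_dp[1]]]


def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def symplectic_product(M):
    """Left-hand side of the test: M . J . M^T."""
    return matmul(matmul(M, J), transpose(M))


if __name__ == "__main__":
    print(symplectic_product(jacobian(to_action_angle, 0.3, 0.8)))  # close to J
    print(symplectic_product(jacobian(rescale, 0.3, 0.8)))          # close to 2J
```

For one degree of freedom the product reduces to det(M) times J, so the test is equivalent to asking whether the transformation preserves phase-space area.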

Those of you who have studied special relativity should find (??) congenial. Remember the definition of a Lorentz transformation: any 4 × 4 matrix Λ that satisfies

$$ g = \Lambda \cdot g \cdot \Lambda^T \tag{3.11} $$

is a Lorentz transformation.¹ The matrix

$$ g = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \tag{3.12} $$

is called the metric or metric tensor. Forgive me for exaggerating slightly: everything there is to know about special relativity flows out of (3.11). We say that Lorentz transformations “preserve the metric,” i.e. leave the metric invariant. The geometry of space and time is encapsulated in (3.12). By the same token, canonical transformations preserve the metric J. The geometry of phase space is encapsulated in the definition of J. Since J is symplectic, canonical transformations are symplectic transformations; they preserve the symplectic metric.

Equation (3.10) is the starting point for the modern approach to mechanics that uses the tools of Lie group theory. I will only mention in passing some points of contact with group theory. Both Goldstein’s and Schenk’s texts have much more on the subject.

3.1.1 Poisson Brackets

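Equation (3.10) is really shorthand for four equations, e.g.

$$ \frac{\partial Q}{\partial q}\frac{\partial P}{\partial p} - \frac{\partial P}{\partial q}\frac{\partial Q}{\partial p} = 1 \tag{3.13} $$

This combination of derivatives is called a Poisson bracket. The usual notation is

$$ \frac{\partial X}{\partial q}\frac{\partial Y}{\partial p} - \frac{\partial X}{\partial p}\frac{\partial Y}{\partial q} \equiv [X, Y]_{q,p} \tag{3.14} $$

With this notation, (3.13) becomes

$$ [Q, P]_{q,p} = 1 \tag{3.15} $$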

¹ It is not a good idea to use matrix notation in relativity because of the ambiguity inherent in covariant and contravariant indices. Normally one would write (3.11) using tensor notation.

This together with the trivially true

$$ [q, p]_{q,p} = 1 \tag{3.16} $$

are called the fundamental Poisson brackets. We conclude that canonical transformations leave the fundamental Poisson brackets invariant. It turns out that all Poisson brackets have the same value when evaluated with respect to any canonical set of variables. This assertion requires some proof, however. I will start by generalizing to n dimensions.

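$$ \boldsymbol{\eta} = \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \\ p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} \qquad J = \begin{pmatrix} 0 & \Im_n \\ -\Im_n & 0 \end{pmatrix} \qquad \nabla = \begin{pmatrix} \partial/\partial q_1 \\ \partial/\partial q_2 \\ \vdots \\ \partial/\partial q_n \\ \partial/\partial p_1 \\ \partial/\partial p_2 \\ \vdots \\ \partial/\partial p_n \end{pmatrix} \tag{3.17} $$

The symbol ℑn is the n × n unit matrix, so J is block anti-diagonal. Equation (3.14) becomes

$$ [X, Y]_\eta \equiv \sum_k \left(\frac{\partial X}{\partial q_k}\frac{\partial Y}{\partial p_k} - \frac{\partial X}{\partial p_k}\frac{\partial Y}{\partial q_k}\right), \tag{3.18} $$

or in matrix notation

$$ [X, Y]_\eta = (\nabla_\eta X)^T \cdot J \cdot \nabla_\eta Y. \tag{3.19} $$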

The following should look familiar:

$$ [q_i, q_k] = [p_i, p_k] = 0 \qquad [q_i, p_k] = \delta_{ik} $$

These are, of course, the commutation relations for position and momentum operators in quantum mechanics. The resemblance is not accidental. The operator formulation of quantum mechanics grew out of the Poisson bracket formulation of classical mechanics. This development is reviewed in all the standard texts. In matrix notation

$$ [\boldsymbol{\eta}, \boldsymbol{\eta}]_\eta = [\boldsymbol{\zeta}, \boldsymbol{\zeta}]_\eta = J \tag{3.20} $$

The Poisson bracket of two vectors is itself an n × n matrix, i.e.

$$ [X, Y]_{ij} \equiv [X_i, Y_j] \tag{3.21} $$

The proof of the above assertion is straightforward.

$$ \nabla_\eta Y = M^T \cdot \nabla_\zeta Y $$

$$ (\nabla_\eta X)^T = (M^T \cdot \nabla_\zeta X)^T = (\nabla_\zeta X)^T \cdot M $$

$$ [X, Y]_\eta = (\nabla_\zeta X)^T \cdot M \cdot J \cdot M^T \cdot \nabla_\zeta Y = (\nabla_\zeta X)^T \cdot J \cdot \nabla_\zeta Y = [X, Y]_\zeta $$

The last step makes use of (3.10). The invariance of the Poisson brackets is a non-trivial consequence of the symplectic nature of canonical transformations. From now on we will not bother with the subscripts on the Poisson brackets.

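Here is another similarity with quantum mechanics. Let f be any function of canonical variables.

$$ \dot{f} = \sum_k \left(\frac{\partial f}{\partial q_k}\dot{q}_k + \frac{\partial f}{\partial p_k}\dot{p}_k\right) + \frac{\partial f}{\partial t} = \sum_k \left(\frac{\partial f}{\partial q_k}\frac{\partial H}{\partial p_k} - \frac{\partial f}{\partial p_k}\frac{\partial H}{\partial q_k}\right) + \frac{\partial f}{\partial t} $$

$$ \frac{df}{dt} = [f, H] + \frac{\partial f}{\partial t} \tag{3.22} $$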

This looks like Heisenberg’s equation of motion. For our purposes it means that if f doesn’t depend on time explicitly and if [f, H] = 0, then f is a constant of the motion. We can use (3.22) to test if our favorite function is in fact constant, and we can also use it to construct new constants as the following argument shows.

Let f, g, and h be arbitrary functions of canonical variables. The following Jacobi identity is just a matter of algebra.

$$ [f, [g, h]] + [g, [h, f]] + [h, [f, g]] = 0 \tag{3.23} $$

Now suppose h = H, the Hamiltonian, and f and g are constants of the motion. Then

$$ [H, [f, g]] = 0 $$

Consequence: If f and g are constants of the motion, then so is [f, g]. This should make us uneasy. Take any two constants. Well, maybe they commute, but if not, then we have three constants. Commute the new constant with f and g and get two more constants, etc. How many constants are we entitled to, anyway? This is a deep question, which has something to do with the notion of involution. I’ll get to that later.
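The bracket (3.18) and the consequence just stated can be explored numerically. The following Python sketch is an illustration under stated assumptions, not material from the notes: it uses central-difference gradients, a Kepler-type Hamiltonian with unit mass and unit force constant, and the angular momentum components Lx, Ly, Lz as the constants of motion. All function names and the sample phase-space point are choices made here.

```python
import math


def gradient(f, eta, h=1e-5):
    """Central-difference gradient of f(eta), eta = (q1..qn, p1..pn)."""
    grad = []
    for i in range(len(eta)):
        up = list(eta)
        dn = list(eta)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def poisson_bracket(X, Y, eta):
    """[X, Y] of (3.18) evaluated at the phase-space point eta."""
    n = len(eta) // 2
    gX, gY = gradient(X, eta), gradient(Y, eta)
    return sum(gX[k] * gY[n + k] - gX[n + k] * gY[k] for k in range(n))


# eta = (x, y, z, px, py, pz); unit mass and V = -1/r are assumptions.
def H(e):
    r = math.sqrt(e[0] ** 2 + e[1] ** 2 + e[2] ** 2)
    return 0.5 * (e[3] ** 2 + e[4] ** 2 + e[5] ** 2) - 1.0 / r


def Lx(e):
    return e[1] * e[5] - e[2] * e[4]


def Ly(e):
    return e[2] * e[3] - e[0] * e[5]


def Lz(e):
    return e[0] * e[4] - e[1] * e[3]


def Lx_Ly_bracket(e):
    """The 'new' constant [Lx, Ly] as a function on phase space."""
    return poisson_bracket(Lx, Ly, e)


if __name__ == "__main__":
    point = [1.0, 0.5, -0.3, 0.2, -0.7, 0.4]
    print(poisson_bracket(Lx, H, point))             # Lx is conserved
    print(poisson_bracket(Lx, Ly, point), Lz(point)) # the bracket reproduces Lz
    print(poisson_bracket(Lx_Ly_bracket, H, point))  # and it is conserved too
```

In this example the new constant produced by the bracket is Lz, so the process closes after one step instead of generating constants without end.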


3.2 Geometry in n Dimensions: The Hairy Ball

Have another look at equation (3.2). Let’s call $\dot{\boldsymbol{\eta}}$ a velocity field. By this I mean that it associates a complete set of $\dot{q}$’s and $\dot{p}$’s with each point in phase space. In what direction does $\dot{\boldsymbol{\eta}}$ point? This is an easy question in one dimension; $\dot{\boldsymbol{\eta}}$ evaluated at the point P points in the direction tangent to the trajectory through P. Since trajectories can’t cross in phase space, there is only one trajectory through P, and the direction is unambiguous. If we use action-angle variables, the trajectory is a circle, and $\dot{\boldsymbol{\eta}}$ is what we would call in Ph211 a tangent velocity. The same is true, no doubt, for n > 1, but how do these circles fit together? How does one visualize this in higher dimensions?

The answer, as I have mentioned before, is that the trajectories all lie on the surface of an n-dimensional torus imbedded in 2n-dimensional space. Your ordinary breakfast donut is a two-dimensional torus imbedded in three-dimensional space.² This is easy to visualize, so let’s limit the discussion to two degrees of freedom for the time being. The step from one degree of freedom to two involves some profound new ideas. The step from two to higher dimension is mostly a matter of mathematical generalization.

Since we are dealing with conservative systems, the trajectories are limited by the conservation of energy, i.e. H(q1, q2; p1, p2) = E is an equation of constraint. The trajectories move on a manifold with three independent variables. Now the gradient of a function has a well defined geometrical significance: at the point P, ∇f points in a direction perpendicular to the surface or contour of constant f through P. In this case ∇H is a four-component vector perpendicular to the surface of constant energy. Unfortunately, $\dot{\boldsymbol{\eta}}$ points in the direction of J · ∇H. What direction is that? Well,

$$ (\nabla H)^T \cdot J \cdot \nabla H = [H, H] = 0 $$

Consequently, J · ∇H points in a direction perpendicular to ∇H, which is perpendicular to the plane of constant H, i.e. $\dot{\boldsymbol{\eta}}$ lies somewhere on the three-dimensional surface of constant H.

We could have guessed that ahead of time, of course, but we can take the argument further. H is probably not the only constant of motion. Suppose there are others; call them F, G, etc. For each of these constants we can construct a vector field using (3.2).

$$ \dot{\boldsymbol{\eta}}_F = J \cdot \nabla F \qquad \dot{\boldsymbol{\eta}}_G = J \cdot \nabla G \qquad \cdots \text{etc.} \cdots $$

² The word “dimension” gets used in two different ways. When we talk about physical systems, Lagrangians, Hamiltonians, etc., the dimension is equal to the number of degrees of freedom. Here I am using dimension to mean the number of independent variables required to describe the system.

How many such fields can we construct that are independent of one another? To put it another way, how many independent constants of motion are there? That’s a good question: what do you mean by “independent”? The answer comes from differential geometry. I’m afraid I can only give a hand-waving introduction to it. There are two related requirements:

1. Suppose F and G are independent constants of motion. Take any trajectory from the manifold of constant F and another from constant G. There is no continuous canonical transformation that maps the one trajectory into another.

2. For each point P in space there must be one unique trajectory that lies in the plane of constant F and simultaneously in the plane of constant G.

Think about this last requirement in the case where there are two degrees of freedom and two independent constants of motion. The trajectories must lie on a two-dimensional surface. If we use action-angle variables, the trajectories are circles. This sounds like a globe of the earth. Trajectories with constant ϕ are called longitudes, lines of constant θ are latitudes. But wait! We have a serious problem at the poles. The north and south poles have all possible longitudes. Requirement 2 is violated. Could you rearrange the lines so that this problem doesn’t occur? It turns out that this is not possible. This deep result is known in mathematical circles as the Poincaré-Hopf theorem. In the sort of less exalted company we keep, it’s the Hairy Ball Theorem. The idea is this: try to comb the hair on a hairy ball so that there is no bald spot. It can’t be done. So long as you really use a comb, i.e. so long as the trajectories don’t cross, you will always be left with one hair standing straight up! This is not a proof, of course, but it is a vivid way of visualizing the content of the theorem. It is easy to see, however, that what is impossible on a sphere is trivially easy on the surface of a donut. It can be done in an infinite variety of ways. The simplest is to choose your “longitudes” so they go around the donut the long way. Latitudes go around the short way. This also satisfies requirement 1. You can’t deform a latitude into a longitude without cutting through the donut.


OK. Suppose you have two constants of motion F and G. How can you tell if they are independent? The answer is surprisingly simple: [F, G] = 0.

Proof: Take a point P on the surface of the donut. We should be able to set up a local coordinate system with its origin at P to describe the trajectories on the surface. We need two unit vectors, ξF and ξG, such that every trajectory in the ξF - ξG plane has constant F and G. Choose

$$ \boldsymbol{\xi}_F = \epsilon J \cdot \nabla F $$

This is guaranteed to lie in the surface of constant F; however, G should remain constant along ξF. This means that

$$ 0 = (\boldsymbol{\xi}_F)^T \cdot \nabla G = \epsilon (J \cdot \nabla F)^T \cdot \nabla G = -\epsilon (\nabla F)^T \cdot J \cdot \nabla G = -\epsilon [F, G] $$

This proves the assertion. It’s worth reflecting on the fact that this construction would be impossible on the surface of a sphere. The sphere, unlike the donut, has only one independent constant, its radius. This theorem also relieves our anxiety about extra constants. If F and G are independent, we don’t get a “free” constant K = [F, G], because K = 0.

Summary and generalization:

1. A system with n degrees of freedom has at most n independent constants of motion. Otherwise we could use the additional constants to eliminate one or more of these degrees. For example, we could use the Hamilton-Jacobi procedure to make all the momenta constant. The Hamiltonian would then only be a function of the n coordinates, but these would not be independent because of the additional constraints.

2. Let’s say there are k constants, Fi, i = 1, . . . , k. If they are independent we must have [Fi, Fj] = 0.

3. In the best case there are exactly n independent constants. Such constants are said to be in involution. Such a system is said to be integrable.

4. All trajectories of integrable systems are confined to the surfaces of n-dimensional tori imbedded in 2n-dimensional space.

5. If k < n there are no general statements we can make about the behavior of the trajectories. We will be very much concerned in the next chapter with systems that are “almost” integrable.

6. There are no general criteria known for deciding whether or not a system is integrable; however, if the Hamiltonian is separable, the system is integrable.

3.2.1 Example: Uncoupled Oscillators

The Hamiltonian for two uncoupled harmonic oscillators (with m = 1) is

$$ H = \frac{1}{2}\left(p_1^2 + p_2^2 + \omega_1^2 q_1^2 + \omega_2^2 q_2^2\right) $$

This is an important problem because every linear oscillating system can be put in this form by a suitable choice of coordinates.³ There are two constants of motion

$$ E_1 = \frac{1}{2}\left(p_1^2 + \omega_1^2 q_1^2\right) \qquad E_2 = \frac{1}{2}\left(p_2^2 + \omega_2^2 q_2^2\right) $$

In terms of action-angle variables, the constants are I1 and I2.

$$ H = I_1\omega_1 + I_2\omega_2 = E_1 + E_2 = E $$

Every integrable system can be put in this form, although in general the ω’s will be functions of the I’s. Here they are just parameters from the Hamiltonian.
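As a quick check, offered here as an illustration rather than taken from the text, the two constants can be verified to be in involution using the bracket (3.18) with n = 2. Since E1 depends only on (q1, p1) and E2 only on (q2, p2), every product in the sum contains a vanishing factor.

```latex
\begin{align*}
[E_1, E_2] &= \sum_{k=1}^{2}\left(
    \frac{\partial E_1}{\partial q_k}\frac{\partial E_2}{\partial p_k}
  - \frac{\partial E_1}{\partial p_k}\frac{\partial E_2}{\partial q_k}\right) = 0, \\
[E_1, H] &= [E_1, E_1] + [E_1, E_2] = 0 .
\end{align*}
```

The second line confirms, through (3.22), that E1 is a constant of the motion; the same argument applies to E2.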

This is a simple problem, but the phase space is four-dimensional. Let’s think about all possible ways we might visualize it. In the q1 - p1 (or q2 - p2) plane the trajectories are ellipses with

$$ q_k(\text{max}) = \sqrt{2E_k}/\omega_k \qquad p_k(\text{max}) = \sqrt{2E_k}, $$

where k = 1, 2. The area enclosed by each ellipse is significant, because

$$ \text{area} = \int_S dq\, dp = \oint p\, dq = 2\pi I \tag{3.24} $$

The first integral is a surface integral over the area of the ellipse. The second is a line integral around the ellipse. This identity is a variant of Stokes’s theorem. It’s useful to rescale the variables so that they both have the same units and the trajectory is a circle. A natural choice would be

$$ q'_k = q_k\sqrt{\omega_k} = \sqrt{2I_k}\, \sin\psi_k \qquad p'_k = p_k/\sqrt{\omega_k} = \sqrt{2I_k}\, \cos\psi_k $$

³ This comes under the heading of “theory of small oscillations.” Most mechanics texts devote a chapter to it.


The trajectories are now circles with radius $\sqrt{2I_k}$. The area enclosed is 2πIk, as required by (3.24).

The motion in the q1 - q2 plane is more complicated. It depends on the ratio ω1/ω2, called the winding number. If this is a rational number, say N1/N2, then after N1 cycles of q1 and N2 cycles of q2, the trajectory will come back to its starting point. This is called a Lissajous figure. If the winding number is irrational, the trajectory will be confined to a limited area but will never return to its starting point. It will eventually “color in” all available space. In the next chapter we will be concerned with systems that are “almost” integrable. For such systems the winding number is all-important. Systems with irrational winding numbers tend to be stable under perturbation. Those with rational winding numbers disintegrate at the slightest push!

The centerpiece of this chapter is the torus. The trajectories spiral around the donut. If the winding number is rational they “wear a path” around the donut. If it’s irrational they cover the donut evenly. A useful way of visualizing this was invented by Poincaré. Imagine a flat plane cutting through the donut in such a way that every point on the plane has the angle ψ1 = 0. Place a dot on the plane at the point where each trajectory passes through it. If the winding number is a rational fraction, there will be a finite number of points. Each time a trajectory passes through ψ1 = 0 it will pass through one of the dots. If the winding number is irrational the crossings will mark out a continuous circle. The Poincaré section, as it is called (some books call it the surface of section), is a useful diagnostic tool. Suppose you have a system of equations that is not integrable (so far as you know) but is amenable to computer calculation. Take various Poincaré sections. If they are circles then the system is at least approximately integrable and can be described with action-angle variables. As we will see, there are often regions of phase space, “islands” as it were, where motion is simply periodic and other regions that are wildly chaotic.
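The difference between rational and irrational winding numbers shows up immediately in a computed Poincaré section. The sketch below is a minimal illustration, not the author’s construction: for two uncoupled oscillators started at ψ1 = ψ2 = 0, it records ψ2/2π (mod 1) each time ψ1 returns to 0 and counts the distinct points. The function name, the number of crossings, and the rounding tolerance are assumptions.

```python
import math


def section_points(omega1, omega2, crossings=200, digits=9):
    """Distinct values of psi2 / (2 pi) mod 1 on the section psi1 = 0 (mod 2 pi)."""
    points = set()
    for n in range(crossings):
        # psi1 = 2 pi n at time t = 2 pi n / omega1, so psi2 = 2 pi n omega2 / omega1.
        frac = (n * omega2 / omega1) % 1.0
        # Round so that floating-point noise does not create spurious points.
        points.add(round(frac, digits) % 1.0)
    return points


if __name__ == "__main__":
    print(len(section_points(2, 3)))                  # rational: a few dots
    print(len(section_points(3, 4)))
    print(len(section_points(1.0, math.sqrt(2.0))))   # irrational: never repeats
```

With ω1/ω2 = N1/N2 in lowest terms the section at ψ1 = 0 contains N1 dots, while for an irrational ratio every crossing lands on a new point and the dots gradually fill in the circle.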

Pictures of this motion appear in all the standard texts. I have yet to see a clear explanation of the coordinates involved, however. What does it mean really to say that the donut is a 2-d surface in a 4-d space? Your breakfast donut, after all, is imbedded in 3-d space. If we take a Poincaré section through the donut at the plane ψ2 = 0 and plot q′1 versus p′1, we will get either a circle of dots or a continuous circle with a radius equal to $\sqrt{2I_1}$, or we can take a slice through ψ1 = 0 and get a circle with radius $\sqrt{2I_2}$. Put it this way, any point on the torus has four (polar) coordinates, $(\sqrt{2I_1}, \psi_1, \sqrt{2I_2}, \psi_2)$, but in 3-d space, only three of them are independent. When the torus is in 4-d space, all four of them are independent. If we really lived in 4-d space, we would label the axes of the donut plot $(q'_1, p'_1, q'_2, p'_2)$. This is impossible for us to imagine. The donut is easy; just remember that there is no equation of constraint among the four variables.⁴

3.2.2 Example: A Particle in a Box

Consider a particle in a two-dimensional box with elastic walls.

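$$ 0 \le x \le a \qquad 0 \le y \le b $$

$$ H = \frac{1}{2m}\left(p_x^2 + p_y^2\right) = \frac{\pi^2}{2m}\left(\frac{I_1^2}{a^2} + \frac{I_2^2}{b^2}\right) $$

$$ I_1 = \frac{1}{2\pi}\oint p_x\, dx = \frac{a}{\pi}|p_x| \qquad I_2 = \frac{b}{\pi}|p_y| $$

$$ \omega_1 = \frac{\partial H}{\partial I_1} = \frac{\pi^2}{ma^2} I_1 \qquad \omega_2 = \frac{\pi^2}{mb^2} I_2 $$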

There are several interesting points about this apparently trivial problem. The Hamiltonian looks linear, but in fact it contains an invisible nonlinear potential that reverses the particle’s momentum when it hits the wall. One symptom of this is that the frequencies depend on I. This looks odd, but it’s just the action-angle way of saying that the particle makes a round trip (in the x direction) in a time T = 2am/px. The loop integral in this context is an integral over one “round trip” of the particle.

$$ \oint p_x\, dx = \int_0^a |p_x|\, dx + \int_a^0 (-|p_x|)\, dx = 2a|p_x| $$

My real point in showing this example is to call your attention to the angle variable. I will work through the calculation for the x variable. This same thing holds for y of course.

$$ \frac{1}{2m}\left(\frac{dW_x}{dx}\right)^2 = E_1 $$

$$ W_x = \int (\pm)\sqrt{2mE_1}\, dx = \int (\pm)\frac{\pi I_1}{a}\, dx $$

$$ \psi_1 = \frac{\partial W_x}{\partial I_1} = \pm\frac{\pi}{a}\int dx = \pm\frac{\pi}{a}x + \psi_{10} $$

The term ψ10 is an integration constant. There is no reason why it must be the same for both legs of the journey. We are free to choose it as follows:

0 → x → a: ψ1 = πx/a

0 ← x ← a: ψ1 = 2π − πx/a

While the particle is bouncing violently between the walls, the angle variables are increasing smoothly with time, ψ1 = ω1t and ψ2 = ω2t. Even this strange problem is equivalent to a donut!⁵

⁴ Of course, I1 and I2 are constant for any given set of initial conditions. It is this sense in which the torus is a 2-d surface.

⁵ When correctly viewed, everything is a harmonic oscillator, in this case two harmonic oscillators.
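The claim that ψ1 advances uniformly while x bounces back and forth can be checked directly. The following Python sketch is an illustration under stated assumptions: the particle starts at x = 0 moving to the right, and the function names and sample values (a = 2, m = 1.5, px = 3) are chosen here, not taken from the text.

```python
import math


def box_state(t, a, m, px0):
    """Position and direction (+1 or -1) at time t; starts at x = 0 with px0 > 0."""
    v = px0 / m
    period = 2 * a / v                 # round-trip time T = 2 a m / px
    tau = t % period
    if tau < a / v:
        return v * tau, +1             # outbound leg, 0 -> a
    return 2 * a - v * tau, -1         # return leg, a -> 0


def angle_variable(x, direction, a):
    """psi1 with the choice of integration constant made on each leg."""
    if direction > 0:
        return math.pi * x / a
    return 2 * math.pi - math.pi * x / a


def omega1(a, m, px0):
    """omega1 = dH/dI1 = pi^2 I1 / (m a^2), with I1 = a |px| / pi."""
    action = a * abs(px0) / math.pi
    return math.pi**2 * action / (m * a**2)


if __name__ == "__main__":
    a, m, px0 = 2.0, 1.5, 3.0
    w = omega1(a, m, px0)
    for t in (0.3, 1.7, 5.2):
        x, direction = box_state(t, a, m, px0)
        print(t, x, angle_variable(x, direction, a), (w * t) % (2 * math.pi))
```

The sawtooth in x and the piecewise definition of ψ1 combine into a single straight line in time, which is the whole point of the angle variable.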

Chapter 4

Canonical Perturbation Theory

So far we have assumed that our systems had exact analytic solutions. One way of stating this is that we can find a canonical transformation to action-angle variables such that the new Hamiltonian is a function of the action variables only, H = H(I). Such problems are the exception rather than the rule. For our purposes they are also uninteresting. All periodic integrable systems are equivalent to a set of uncoupled harmonic oscillators. Once you get over the thrill of this discovery, the oscillators are boring! The existence of chaos depends on the system not being equivalent to a set of oscillators. In order to deal with systems that are non-trivial in this sense, we need some way of doing perturbation theory.¹

4.1 One-Dimensional Systems

I will present the theory first for systems with one degree of freedom. Thiswill simplify the notation, however the interesting complications only appearin higher dimensions. Here is the basic situation: A bounded conservativesystem with one degree of freedom is described by a constant HamiltonianH(q, p) = E. We need to obtain the equations of motion in the form q = q(t)

1I will follow the treatment in Chaos and Integrability in Nonlinear Dynamics, MichaelTabor, Wiley-Interscience, 1989. Another good reference is Classical Mechanics by R. A.Metzner and L.C. Shepley, Prentice Hall, 1991. The subject is also discussed in ClassicalMechanics, Goldstein, Poole and Safko, third edition, Addison-Wesley, 2002. Goldsteindiscusses time-dependent and time-independent perturbation theory. We are doing thetime-independent variety.

45

46 CHAPTER 4. CANONICAL PERTURBATION THEORY

and p = p(t), but this is impossible due to the non-linear nature of theproblem. We are able to split up the Hamiltonian

H = H0 + ϵH1

in such a way that H0 is amenable to exact solution, and H1 is in somesense small. We indicate the smallness by multiplying it by ϵ. This is abookkeeping device; it will be set to one after the approximations have beenderived.

The first step is to find the canonical transformation that makes $H_0$ cyclic, i.e. $q = q(I, \psi)$, $p = p(I, \psi)$, and $H_0 = H_0(I)$, where $I$ and $\dot\psi = \omega_0$ are both constant. Unfortunately, this transformation does not render the complete Hamiltonian cyclic, so we write

\[ H(I, \psi) = H_0(I) + \epsilon H_1(I, \psi). \tag{4.1} \]

H is still constant, and consequently I now depends on ψ. $H_0$ is not an explicit function of ψ, but it does depend on ψ implicitly through I.

Despite this inconvenience, I and ψ are still a perfectly good set of canonical variables, so that the equations of motion

\[ \dot I = -\frac{\partial}{\partial\psi}H(I, \psi) \qquad \dot\psi = \frac{\partial}{\partial I}H(I, \psi) \tag{4.2} \]

are valid without approximation, even though we are unable to solve them in this form. The so-called time-dependent perturbation theory proceeds from here by expanding the solutions of (4.2) as power series in ϵ. Our approach is to find a second canonical transformation, i.e. $(q, p) \to (I, \psi) \to (J, \varphi)$ such that $H(I, \psi) \to K(J)$. This last step must be done as a series of approximations, of course, otherwise the problem would be exactly solvable.

In order to make the transformation $(I, \psi) \to (J, \varphi)$ we will use a generating function of the $F_2$ genus, i.e. $F = F(\psi, J)$. We need to expand

\[ F = F_0(\psi, J) + \epsilon F_1(\psi, J) + \cdots \tag{4.3} \]

where $F_0 = J\psi$. This is the identity transformation, as can be seen as follows:

\[ I = \frac{\partial}{\partial\psi}F_0 = J \qquad \varphi = \frac{\partial}{\partial J}F_0 = \psi \]

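In terms of (??) the transformation equations are

\[ I = \frac{\partial F}{\partial\psi} = J + \epsilon\frac{\partial F_1}{\partial\psi}(\psi, J) + \cdots \tag{4.4} \]

\[ \varphi = \frac{\partial F}{\partial J} = \psi + \epsilon\frac{\partial F_1}{\partial J}(\psi, J) + \cdots \tag{4.5} \]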

Before going on there are some technical points about ψ and J that need to be discussed. When ϵ = 0, ψ is the exact angle variable for the system. This means that we can find p and q as functions of ψ such that p and q return to their original values when ∆ψ = 2π. We can in principle invert this transformation to find ψ as a function of p and q.

\[ \psi = \psi(q, p) \tag{4.6} \]

When p and q run through a complete cycle, ψ advances by 2π. When ϵ ≠ 0 the orbit will be different from the unperturbed case, but the functional relationship doesn’t change, so when p and q run through a complete cycle, we must still have ∆ψ = 2π. Of course, the exact angle variable φ will also advance by 2π. In summary

\[ \Delta\psi = \Delta\varphi = 2\pi \tag{4.7} \]

for one complete cycle.

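The following integrals are all equal because canonical transformations preserve phase space volume.

\[ J = \frac{1}{2\pi}\oint p\,dq = \frac{1}{2\pi}\oint J\,d\varphi = \frac{1}{2\pi}\oint I\,d\psi \tag{4.8} \]

Now integrate (4.4) around one orbit:

\[ \frac{1}{2\pi}\oint I\,d\psi = \frac{1}{2\pi}\oint J\,d\psi + \frac{\epsilon}{2\pi}\oint \frac{\partial F_1}{\partial\psi}\,d\psi + \cdots ; \]

that is

\[ J = J + \frac{\epsilon}{2\pi}\oint \frac{\partial F_1}{\partial\psi}\,d\psi + \cdots . \]

We have just seen that ∆ψ = 2π around one cycle. Consequently

\[ \oint \frac{\partial F_1}{\partial\psi}\,d\psi = 0 \tag{4.9} \]

which implies that the derivative of $F_1$ is purely oscillatory with a fundamental period of 2π in ψ. (The same is true of the higher order terms as well.)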

The Hamiltonian is transformed using (??) with the new variables.

\[ K(\varphi, J) = H(\psi(\varphi, J), I(\varphi, J)) + \frac{\partial}{\partial t}F_2(\psi(\varphi, J), J, t) \tag{4.10} \]

As explained above, we seek a transformation that makes φ cyclic so that K = K(J). The appropriate generating function does not depend on time, so (4.10) becomes

\[ K(J) = H(\psi(\varphi, J), I(\varphi, J)) \tag{4.11} \]

The approximation procedure consists in expanding the left and right sides of this equation in powers of ϵ and then equating terms of zeroth and first order. This procedure could be carried out to higher order. I’m interested in first order corrections only.

The so-called Kamiltonian is expanded as follows:

K(J) = K0(J) + ϵK1(J) + · · ·

At first sight, this agenda looks hopeless. We need to know the exact value of J to make use of any of these terms, even the zeroth order approximation. The exquisite point is that we can use (4.8) to calculate J exactly without knowing the complete transformation.

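The zeroth order Hamiltonian is expanded with the help of (4.4).

\[ H_0(I) = H_0\!\left(\frac{\partial F}{\partial\psi}\right) = H_0\!\left(J + \epsilon\frac{\partial F_1}{\partial\psi} + \cdots\right) = H_0(J) + \epsilon\,\frac{\partial F_1}{\partial\psi}\,\frac{\partial H_0}{\partial J}\bigg|_{\epsilon=0} + \cdots \]

\[ \frac{\partial H_0(J)}{\partial J}\bigg|_{\epsilon=0} = \frac{\partial H_0(I)}{\partial I} = \omega_0 \tag{4.12} \]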

The first order term is already multiplied by ϵ.

\[ \epsilon H_1(\psi, I) = \epsilon H_1(\varphi, J) + \cdots \]

Substituting all this into (??) gives

\[ K_0(J) = H_0(J) \tag{4.13} \]

\[ K_1(J) = \frac{\partial F_1}{\partial\psi}\,\omega_0 + H_1(\varphi, J) \tag{4.14} \]

The notation $H_0(J)$ means that you take your formula for $H_0(I)$ and replace the symbol I with the symbol J without making any change in the functional form of $H_0$.

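Integrate (4.14) around one cycle and use (4.9); (4.14) becomes

\[ K_1(J) = \overline{H}_1(J) \equiv \frac{1}{2\pi}\int_0^{2\pi} H_1(\psi, J)\,d\psi, \tag{4.15} \]

and

\[ \frac{\partial}{\partial\psi}F_1(\psi, J) = \frac{1}{\omega_0(J)}\left[\overline{H}_1 - H_1(\psi, J)\right] \equiv -\frac{\widetilde{H}_1(\psi, J)}{\omega_0} \tag{4.16} \]

$\widetilde{H}_1$ is the periodic part of $H_1$. We are left with a differential equation that is easy to integrate.

\[ F_1(\psi, J) = -\frac{1}{\omega_0(J)}\int d\psi\,\widetilde{H}_1(\psi, J) \tag{4.17} \]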

4.1.1 Summary

I will summarize all these technical details in the form of an algorithm for doing first order perturbation theory. Remember that the object is to find equations of motion in the form q = q(t) and p = p(t). We do this in three steps: (1) Find $q = q(I, \psi)$ and $p = p(I, \psi)$. (2) Find $I = I(J, \varphi)$ and $\psi = \psi(J, \varphi)$. (3) $J$ and $\dot\varphi$ are constant, so $\varphi = \dot\varphi\,t + \varphi_0$.

1. Identify the $H_0$ part of the Hamiltonian. Find the transformation equations $q = q(I, \psi)$ and $p = p(I, \psi)$ using the Hamilton-Jacobi equation as described in the previous section. Use (??) to get $\omega_0$.

2. Equation (4.8) can be used to find J in terms of the total energy E. The integral presents no difficulties in principle, especially if the Hamiltonian is separable. In fact, textbooks never bother to do this. It seems sufficient to display the results in terms of J, the assumption being that we could find J = J(E) if we really had to.

3. The first order correction to the energy is obtained from the integral in (4.15). Get the first order correction to the frequency by differentiating it with respect to J.

4. The generating function $F_1$ is calculated from (4.17). It is then substituted into (4.4) and (4.5). These give implicit equations for $\psi = \psi(J, \varphi)$ and $I = I(J, \varphi)$. Unfortunately, it is usually impossible to invert them to obtain these formulas explicitly.

4.1.2 The simple pendulum

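The pendulum makes a nice example.

\[ H = \frac{l^2}{2mR^2} + mgR(1 - \cos\theta) \]

The angular momentum $l = mR^2\dot\theta$ is canonically conjugate to the angle θ. Expanding the cosine and writing $\omega_0^2 = g/R$,

\[ H = \frac{1}{2mR^2}\left[\,l^2 + m^2R^4\omega_0^2\,\theta^2\left(1 - \frac{\theta^2}{12} + \cdots\right)\right] \]

The first two terms reduce to the familiar harmonic oscillator. This is the zeroth order problem.

\[ H_0 = E_0 = \frac{l^2}{2mR^2} + \frac{mgR\,\theta^2}{2} \]

\[ l^2 = \left(\frac{dW}{d\theta}\right)^2 = 2mR^2E_0 - m^2R^4\omega_0^2\,\theta^2 \]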

Make the natural substitution

\[ \sin^2\psi = \frac{mR^2\omega_0^2}{2E_0}\,\theta^2 \tag{4.18} \]

\[ l = \left(\frac{dW}{d\theta}\right) = \sqrt{2mR^2E_0}\,\cos\psi \tag{4.19} \]

We can look on this as a convenient change of variable, but ψ is also the angle variable. This can be seen as follows:

\[ I = \frac{1}{2\pi}\oint l\,d\theta = \frac{\sqrt{2mR^2E_0}}{2\pi}\oint\left[1 - \frac{mR^2\omega_0^2}{2E_0}\,\theta^2\right]^{1/2} d\theta \]

Use (4.19) to get the familiar result, $I = E_0/\omega_0$. The generating function is obtained from the indefinite integral

\[ W = \int\left(\frac{dW}{d\theta}\right)d\theta = \int\left[2mR^2\omega_0 I - m^2R^4\omega_0^2\,\theta^2\right]^{1/2} d\theta \tag{4.20} \]

According to the basic transformation formula we should have

\[ \psi = \frac{\partial W}{\partial I} \]

One can show by differentiating (4.20) and using (4.19) to complete the integration, that this is indeed so.

Equations (4.18) and (4.19) can be rearranged to give

\[ l = \sqrt{2mR^2 I\omega_0}\,\cos\psi \tag{4.21} \]

\[ \theta = \sqrt{\frac{2I}{mR^2\omega_0}}\,\sin\psi \tag{4.22} \]

The goal of the action-angle program is to express the original coordinates and momenta in terms of the action-angle variables. This has now been completed to zeroth order.

The first order correction is

\[ H_1(I, \psi) = -\frac{mR^2\omega_0^2\,\theta^4}{24} = -\frac{I^2}{6mR^2}\sin^4\psi. \]

We are now in a position to recast our Hamiltonian a la (4.1).

\[ H(I, \psi) = I\omega_0 + \epsilon\left(-\frac{I^2}{6mR^2}\sin^4\psi\right) \]

We have also obtained $\omega_0 = \sqrt{g/R}$ “for free.” The ϵ is there for bookkeeping purposes only. We have no further need for it.

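\[ K_0(J) = H_0(J) = J\omega_0 \]

\[ K_1(J) = \overline{H}_1(J) = \frac{1}{2\pi}\int_0^{2\pi} H_1\,d\psi = -\frac{J^2}{16mR^2} \]

\[ \widetilde{H}_1 = H_1 - \overline{H}_1 = \frac{J^2}{48mR^2}\,(3 - 8\sin^4\psi) \]

\[ F_1(J, \psi) = -\frac{1}{\omega_0}\int d\psi\,\widetilde{H}_1 = \frac{J^2}{192\,mR^2\omega_0}\,(\sin 4\psi - 8\sin 2\psi) \]

\[ \omega = \frac{\partial}{\partial J}\left(K_0 + K_1\right) = \omega_0 - \frac{J}{8mR^2} \]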
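The pendulum results can be sanity-checked numerically. The sketch below is not part of the original treatment: it assumes illustrative values for m, R, g and the amplitude, estimates J from the harmonic (zeroth order) relation $J \approx \tfrac{1}{2}mR^2\omega_0\theta_0^2$, averages $H_1$ over ψ, and compares $\partial(K_0 + K_1)/\partial J = \omega_0 - J/(8mR^2)$ with the exact pendulum frequency computed from the arithmetic-geometric mean.

```python
import math

def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of a and b (fixed iteration count, ample for doubles)."""
    for _ in range(iterations):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def average_over_angle(func, n=64):
    """Average of func(psi) over one period, sampled at n equally spaced angles."""
    return sum(func(2.0 * math.pi * k / n) for k in range(n)) / n

# Illustrative parameters (assumptions, not values from the text).
m, R, g = 1.0, 1.0, 9.81
omega0 = math.sqrt(g / R)
theta0 = 0.3                                   # amplitude in radians
J = 0.5 * m * R**2 * omega0 * theta0**2        # harmonic estimate of the action

def H1(psi):
    """First order term H_1(J, psi) = -(J^2 / 6mR^2) sin^4(psi)."""
    return -(J**2 / (6.0 * m * R**2)) * math.sin(psi) ** 4

K1_numeric = average_over_angle(H1)            # numerical average of H_1 over psi
K1_formula = -J**2 / (16.0 * m * R**2)         # closed form for the same average

# First order frequency: d/dJ (J*omega0 - J^2/(16 m R^2)).
omega_first_order = omega0 - J / (8.0 * m * R**2)
# Exact pendulum frequency: omega0 * AGM(1, cos(theta0/2)).
omega_exact = omega0 * agm(1.0, math.cos(0.5 * theta0))
```

In terms of the amplitude the first order shift is $\omega \approx \omega_0(1 - \theta_0^2/16)$, the standard small-amplitude correction, and the two frequencies agree to order $\theta_0^4$.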

4.2 Many Degrees of Freedom

For systems of two or more degrees of freedom, canonical perturbation theory is formulated in exactly the same way as before – but now profound difficulties arise, even to first order in ϵ. The problem centers around equation (4.16), repeated here for reference:

\[ \omega_0(J)\,\frac{\partial F_1(\psi, J)}{\partial\psi} = -\widetilde{H}_1(\psi, J) \]

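We were able to solve this with a simple integration (4.17). This is not possible for more than one degree of freedom, so we must resort to Fourier series. Before doing this, however, we will need to generalize our notation. Let’s use the vectors

\[ \mathbf{J} = (J_1, \cdots, J_n) \qquad \boldsymbol{\omega}_0 = (\omega_{01}, \cdots, \omega_{0n}) \qquad \nabla_\psi = \left(\frac{\partial}{\partial\psi_1}, \cdots, \frac{\partial}{\partial\psi_n}\right) \]

where n is the number of degrees of freedom. In this notation (4.16) becomes

\[ \boldsymbol{\omega}_0(\mathbf{J})\cdot\nabla_\psi F_1(\boldsymbol{\psi}, \mathbf{J}) = -\widetilde{H}_1(\mathbf{J}, \boldsymbol{\psi}) \tag{4.23} \]

where

\[ \overline{H}_1(\mathbf{J}) = \frac{1}{(2\pi)^n}\int_0^{2\pi} d\psi_1 \cdots \int_0^{2\pi} d\psi_n\, H_1(\mathbf{J}, \boldsymbol{\psi}) \tag{4.24} \]

and $\widetilde{H}_1 = H_1 - \overline{H}_1$. Since both sides of (4.16) are periodic, we can solve it with Fourier series.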

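\[ \widetilde{H}_1(\mathbf{J}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} A_{\mathbf{k}}(\mathbf{J})\,e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{4.25} \]

\[ F_1(\mathbf{J}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} B_{\mathbf{k}}(\mathbf{J})\,e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{4.26} \]

where $\mathbf{k}$ is a vector of integers

\[ \mathbf{k} = (k_1, \cdots, k_n) \]

It seems as if we could proceed as follows: $H_1$ is known at this point, so we can find $A_{\mathbf{k}}$. Substituting these definitions into (4.16) we get

\[ B_{\mathbf{k}} = \frac{iA_{\mathbf{k}}}{\boldsymbol{\omega}_0\cdot\mathbf{k}} \tag{4.27} \]

Now here’s the infamous problem. Suppose, for example, there were only two degrees of freedom. In this case the denominator of (??) would be

\[ \boldsymbol{\omega}_0\cdot\mathbf{k} = \omega_{01}k_1 + \omega_{02}k_2 \tag{4.28} \]

You can see that if the winding number $\omega_{01}/\omega_{02}$ is a rational number, then for some $\mathbf{k}$, $B_{\mathbf{k}}$ will be infinite. It seems that the slightest perturbation will blow this system into outer space! Even if the winding number is not rational, there will always be values of $\mathbf{k}$ that will make $\boldsymbol{\omega}_0\cdot\mathbf{k}$ arbitrarily small.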
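The small-denominator problem is easy to see by brute force. The following sketch is illustrative only: the frequency pairs (3, 2) and (golden mean, 1) and the search ranges are assumptions chosen for the demonstration. It searches integer vectors k for the smallest value of $|\omega_{01}k_1 + \omega_{02}k_2|$.

```python
import math

def smallest_denominators(w1, w2, kmax):
    """Sorted list of (|w1*k1 + w2*k2|, k1, k2) for 1 <= k1 <= kmax, 0 < |k2| <= kmax.

    Restricting k1 > 0 loses nothing, since k and -k give the same magnitude.
    """
    found = []
    for k1 in range(1, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if k2 == 0:
                continue
            found.append((abs(w1 * k1 + w2 * k2), k1, k2))
    found.sort()
    return found

golden = 0.5 * (1.0 + math.sqrt(5.0))   # an irrational winding number (illustrative)

# Rational winding number 3/2: the denominator vanishes exactly at k = (2, -3).
rational_best = smallest_denominators(3.0, 2.0, 10)[0]

# Irrational winding number: never zero, but it shrinks as larger k are allowed.
irrational_best_10 = smallest_denominators(golden, 1.0, 10)[0]
irrational_best_100 = smallest_denominators(golden, 1.0, 100)[0]
```

For the golden mean the best k values turn out to be consecutive Fibonacci numbers, and the denominator falls off only like 1/|k|, which is as slowly as any irrational number allows.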

This problem was discovered in the early twentieth century, and all the effort of the most eminent mathematicians of the day failed to solve it. One opinion held that the slightest perturbation would cause the system to become “ergodic,” that is to say, the trajectories would fill up all of phase space. Numerical calculations later showed that this was often not the case. Trajectories will often “lock in” to stable patterns. This has been the subject of much contemporary research. When and why do trajectories lock in, and what happens when they do not? The question of what trajectories remain stable under small perturbations is at least partly answered by the so-called KAM (Kolmogorov, Arnold, Moser) theorem. In the general case there is, if not a complete theory, at least a well-developed taxonomy. We will turn to these matters in the next chapter.

Chapter 5

Introduction to Chaos

The canonical perturbation theory of the previous chapter is a lot of work, and in two or more degrees of freedom it summons up the ogre of small denominators. Many people have tried to solve this problem by pounding their heads on it. This turns out not to be a fruitful approach. I will illustrate the limitations of perturbation theory by considering the van der Pol oscillator. This is a simple nonlinear, one-dimensional, second-order differential equation closely resembling a damped harmonic oscillator. It has stable solutions which can easily be found numerically, yet it has no known analytic solutions, and perturbation theory, on general principles, just can’t work!1 We then go on to discuss linear stability theory. With these simple techniques you can analyze most nonlinear systems (the van der Pol oscillator is an exception) and get a qualitative picture of the phase space dynamics. In one degree of freedom (two-dimensional phase space) it will become immediately apparent where perturbation theory is possible, and you can get a qualitative idea of the motion of the system where it is not.

Higher dimensional spaces are not so easy to analyze, in part because they are hard to visualize and in part because they are often not integrable. It is this non-integrability that leads to chaos. Here we resort to the Poincare section and the notion of discrete maps. The Poincare-Birkhoff and KAM theorems can then tell us something about the onset and structure of chaos.

1 It should be remembered that all the major developments in elementary particle theory over the last few decades, starting with the standard model in the 1970’s, are based on the notion of spontaneous symmetry breaking. Spontaneous symmetry breaking, almost by definition, cannot be described with perturbation theory. When perturbation theory fails we always expect new physics. The same is true (to a lesser extent) in classical mechanics as well.


5.1 The total failure of perturbation theory

To get some feeling for how perturbation theory might be useless, look at the following “toy” example.

\[ \ddot x = -x + \epsilon\,(x^2 + \dot x^2 - 1)\sin(\sqrt{2}\,t) \tag{5.1} \]

This looks like a harmonic oscillator with a resonant frequency ω = 1 and a “small” driving term with a frequency $\omega = \sqrt{2}$. Obvious solutions are x(t) = sin t and x(t) = cos t, which hold for all values of ϵ. If we set ϵ = 0 then the solutions more generally are $x(t) = x_0\sin(t + t_0)$. This solution plotted on a phase space plot of $\dot x(t)$ versus $x(t)$ will be a circle with radius $r = x_0$. What would you expect for finite ϵ? There presumably are other solutions, but don’t waste your time looking for them! You should convince yourself, however, that there are no solutions of the form

\[ x(t) = \sin t + \sum_{n=1}^{\infty}\epsilon^n f_n(t) \tag{5.2} \]

Also convince yourself that the trouble comes from the non-linear terms. The point is that, because of the non-linearity, it is not possible to start with unperturbed solutions and get new solutions by adding to them.
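The claim that x(t) = sin t solves (5.1) for every value of ϵ can be verified in one line; the worked equation below is only a check of that statement and assumes nothing beyond (5.1) itself.

```latex
\begin{align*}
x(t) &= \sin t, \qquad \dot x = \cos t, \qquad \ddot x = -\sin t = -x, \\
x^2 + \dot x^2 - 1 &= \sin^2 t + \cos^2 t - 1 = 0, \\
\ddot x + x - \epsilon\,(x^2 + \dot x^2 - 1)\sin(\sqrt{2}\,t)
  &= 0 - \epsilon \cdot 0 \cdot \sin(\sqrt{2}\,t) = 0 .
\end{align*}
```

The same steps go through for x(t) = cos t. The cancellation depends on the orbit lying exactly on the unit circle in phase space, so it does not carry over to the ϵ = 0 solutions with $x_0 \neq 1$.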

A more interesting and oft-studied example is the van der Pol equation. It was first introduced by van der Pol in 1926 in a study of the nonlinear vacuum tube circuits of early radios.

\[ \ddot x + \epsilon\,(x^2 - 1)\,\dot x + x = 0 \tag{5.3} \]

Again the ϵ = 0 solutions are $x(t) = x_0\sin(t + t_0)$. In phase space this is a circle of radius $x_0$. If we make ϵ ever so slightly larger than zero, however, something remarkable happens, as shown in the first of the plots in Fig. 5.1. Yes, the orbit eventually becomes a circle, but regardless of the initial conditions, the radius r ≈ 2. The same sort of behavior is shown in Fig. 5.1 for larger values of ϵ. The shape of the final orbit is determined entirely by ϵ and is completely unaffected by the initial conditions. A curve of this sort is called a limit cycle. It’s easy to see in a vague way why the limit cycle exists. The term proportional to ϵ in (5.3) looks like an oscillator damping term, but its sign depends on whether $x^2$ is greater or less than 1. If it is greater, the oscillation is damped; if it is smaller the oscillation is “undamped.” Indeed, if ϵ is made negative, the orbits either collapse to zero or diverge to infinity depending on the initial conditions. For obvious reasons the solutions with positive ϵ are said to be stable and those with negative ϵ are said to be unstable.

[Figure 5.1 panels: phase plots of dx/dt versus x for Epsilon=0.1, Epsilon=0.5, Epsilon=1.5, and Epsilon=3.]

Figure 5.1: The van der Pol plot for four values of ϵ and two starting values (indicated by asterisks)
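The limit cycle is easy to reproduce numerically. The following is a minimal sketch, not the program used for Fig. 5.1: it integrates (5.3) with a hand-written fourth-order Runge-Kutta step (the step size, run time, and starting points are assumptions) and records the largest |x| reached late in the run.

```python
def van_der_pol_amplitude(eps, x0, v0, t_end=200.0, dt=0.01):
    """Integrate x'' + eps*(x^2 - 1)*x' + x = 0 with RK4; return max |x| over the last 10% of the run."""
    def deriv(x, v):
        # First-order form: x' = v, v' = -eps*(x^2 - 1)*v - x
        return v, -eps * (x * x - 1.0) * v - x

    x, v = x0, v0
    steps = int(t_end / dt)
    peak = 0.0
    for n in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if n >= 0.9 * steps:
            peak = max(peak, abs(x))
    return peak

# Two very different starting points, inside and outside the limit cycle.
amplitude_from_inside = van_der_pol_amplitude(0.1, 0.1, 0.0)
amplitude_from_outside = van_der_pol_amplitude(0.1, 4.0, 0.0)
```

Both runs settle onto an oscillation of amplitude close to 2, independent of where they started, which is the behavior described above for small positive ϵ.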

This simple model makes an important point. Conventional perturbation theory starts with unperturbed, i.e. ϵ = 0 solutions, and then looks for series solutions in powers of ϵ. This is obviously hopeless here since even a smidgeon of ϵ is enough to completely alter the nature of the orbits. It would be better to start with some simple function that approximated the limit cycle and then expand in powers of some parameter that characterized the deviation of the actual orbit from the simple function. Alas, I don’t know how to do this. The trouble is that the limit cycle is so weird, at least for large ϵ, that it’s hard to come up with a “lowest-order” solution. For many systems however, this is a practical approach. The trick is to look for the fixed points.

5.2 Fixed points and linearization

Equations of motion can always be cast in the form

\[ \dot\xi = f(\xi, t). \tag{5.4} \]

With n degrees of freedom ξ and f are 2n-dimensional vectors. For example, Hamilton’s equations with one degree of freedom are

\[ \dot q = \frac{\partial H}{\partial p} \qquad \dot p = -\frac{\partial H}{\partial q} \qquad \xi = \begin{bmatrix} p \\ q \end{bmatrix} \tag{5.5} \]

To keep the notation simple and general (and to save typing) I will keep the notation in the form (5.5) for the time being and not type out the p’s and q’s. I will also restrict the discussion to autonomous systems, i.e. those in which the Hamiltonian does not depend explicitly on time.2

A fixed point (also called a stationary point, equilibrium point, or critical point) is simply the point $\xi_f$ where all the time derivatives vanish, $f(\xi_f) = \dot\xi_f = 0$. It’s the place where nothing happens. Detailed information about the motion of a system close to a fixed point can be obtained by linearizing the equations of motion. This is done as follows: First the origin is moved to the fixed point by writing

2 The notation in this section is taken from Classical Dynamics by J. V. Jose and E. J. Saletan.

\[ \zeta(t) = \xi(t; \xi_0) - \xi_f \tag{5.6} \]

Second, (5.4) is written for ζ rather than for ξ.

\[ \dot\zeta = f(\zeta + \xi_f) \equiv g(\zeta) \tag{5.7} \]

Third, g is expanded in a Taylor series about ζ = 0.

\[ \dot\zeta_j = \frac{\partial g_j}{\partial\zeta_k}\bigg|_{\xi_f}\zeta_k + O(\zeta^2) \equiv A_{jk}\zeta_k + O(\zeta^2) \tag{5.8} \]

I am using the Einstein summation convention in which one sums over repeated indices. Dropping the $\zeta^2$ terms gives the matrix equation

\[ \dot z = A\cdot z \tag{5.9} \]

A is a constant matrix called (among other things) the stability matrix. It’s easy to solve (5.9) using the matrix exponential.

\[ z(t) = e^{At}z_0 \tag{5.10} \]

where

\[ e^{At} \equiv \sum_{n=0}^{\infty} A^n t^n/n! \tag{5.11} \]

For our purposes it will be enough to take the case of one degree of freedom, in which case A is a 2 × 2 real, constant matrix. If A is diagonal

\[ e^{At} = \begin{bmatrix} e^{\lambda_1 t} & 0 \\ 0 & e^{\lambda_2 t} \end{bmatrix} \tag{5.12} \]

where $\lambda_1$ and $\lambda_2$ are eigenvalues, which might be real or complex. If they are complex they come in complex-conjugate pairs, $\lambda_1^* = \lambda_2$.

Various cases can be identified. If both eigenvalues are real and positive, all trajectories flow away from the fixed point, which is then called unstable. If they are both negative all trajectories flow toward it and the fixed point is said to be stable. If the eigenvalues have opposite signs then the trajectories are repelled from one axis and attracted to the other. This is called a hyperbolic fixed point or a saddle point.


Figure 5.2: Unstable fixed point for real λ2 > λ1 > 0.

Figure 5.3: Unstable fixed point for real λ1 = λ2 > 0.


Figure 5.4: Hyperbolic fixed point for real λ1 < 0, λ2 > 0.

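It is possible that A cannot be diagonalized. In that case it can at least be brought to triangular form, i.e.

\[ A = \begin{bmatrix} \lambda & 0 \\ \mu & \lambda \end{bmatrix} \tag{5.13} \]

then

\[ z(t) = e^{\lambda t}\begin{bmatrix} 1 & 0 \\ \mu t & 1 \end{bmatrix} z_0 \tag{5.14} \]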

Complex eigenvalues require a bit more discussion. Let λ = α + iβ be an eigenvalue with eigenvector z = u + iv, so that A · z = λz, where α and β are real numbers, and u and v are real vectors orthogonal to one another. Separating real and imaginary parts

\[ A\cdot u = \alpha u - \beta v \qquad A\cdot v = \beta u + \alpha v \tag{5.15} \]

Evidently $A\cdot z^* = \lambda^* z^*$, so $z^*$ is an eigenvector with eigenvalue $\lambda^*$. Eigenvectors belonging to different eigenvalues are independent. We can construct the independent real vectors u and v as follows

\[ u = \frac{z + z^*}{2} \qquad v = \frac{z - z^*}{2i} \tag{5.16} \]


Figure 5.5: Unstable fixed point for nondiagonalizable A matrix. All of the integral curves are tangent to $z_2$ at the fixed point.

Substituting these definitions into (5.10) gives

\[ e^{At}u = e^{\alpha t}(u\cos\beta t - v\sin\beta t), \tag{5.17} \]

\[ e^{At}v = e^{\alpha t}(u\sin\beta t + v\cos\beta t). \tag{5.18} \]

There are two important cases: α > 0, in which case the fixed point is unstable and the orbits are spirals, and α = 0, when the phase portrait consists of circles. In this case the fixed point is called a center or an elliptic point.

It should be remembered that (5.9) is a linearized equation. It holds in some small region around the fixed point, and of course, as is so often the case, the theory gives us no way to tell how small that region might be. The damped oscillator makes a good example of the method, and there are many other examples in the textbooks. On the other hand, the theory fails completely for the van der Pol oscillator in the previous section.
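The case analysis of this section can be collected into a few lines of code. This is a sketch under stated assumptions: the function name, the labels it returns, and the tolerance are all hypothetical, and the eigenvalues are taken from the characteristic polynomial of a 2 × 2 stability matrix A. The damped oscillator mentioned above is used as the test case, written as $\ddot x + 2\gamma\dot x + x = 0$ with z = (x, ẋ).

```python
import cmath

def classify_fixed_point(a11, a12, a21, a22, tol=1e-12):
    """Classify the fixed point of z' = A z for a real 2x2 matrix A by its eigenvalues."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4.0 * det)   # complex square root handles both cases
    lam1 = (trace + disc) / 2.0
    lam2 = (trace - disc) / 2.0
    if abs(lam1.imag) > tol:                       # complex-conjugate pair alpha +/- i beta
        if abs(lam1.real) <= tol:
            kind = "center"
        elif lam1.real > 0.0:
            kind = "unstable spiral"
        else:
            kind = "stable spiral"
    elif lam1.real * lam2.real < 0.0:              # real eigenvalues of opposite sign
        kind = "saddle"
    elif lam1.real > 0.0 and lam2.real > 0.0:
        kind = "unstable node"
    elif lam1.real < 0.0 and lam2.real < 0.0:
        kind = "stable node"
    else:
        kind = "degenerate"                        # at least one zero eigenvalue
    return kind, lam1, lam2

gamma = 0.1                                        # illustrative damping constant
damped = classify_fixed_point(0.0, 1.0, -1.0, -2.0 * gamma)
undamped = classify_fixed_point(0.0, 1.0, -1.0, 0.0)
saddle = classify_fixed_point(0.0, 1.0, 1.0, 0.0)
```

The stable spiral (α < 0) returned for the damped oscillator is the mirror image of the unstable spiral discussed above.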

5.3 The Henon oscillator

Figure 5.6: Unstable fixed point for complex λ with ℜ(λ) > 0.

Figure 5.7: Stable fixed point for λ pure imaginary.

Although the theory from the previous section is perfectly general in the sense that it can be applied to systems with any number of degrees of freedom, it is almost impossible to visualize in four or more dimensions, and the number of cases that must be considered increases rapidly. The best tool for visualizing higher dimensional spaces is the Poincare section. This was described briefly in Chapter 3, and we will make more use of it shortly. Before doing so it will be useful to have a good example of motion with two degrees of freedom. A fascinating and oft-studied case is the Henon-Heiles Hamiltonian. The Hamiltonian was originally used to model the motion of stars in the galaxy.3 Written in terms of dimensionless variables the Hamiltonian is

\[ H = \frac{1}{2}\left(\dot x^2 + \dot y^2 + k_1 x^2 + k_2 y^2\right) + \lambda\left(x^2 y - \frac{y^3}{3}\right) \tag{5.19} \]

This is the Hamiltonian of two uncoupled harmonic oscillators with a perturbation proportional to λ. The oscillators have frequencies $\omega_1 = \sqrt{k_1}$ and $\omega_2 = \sqrt{k_2}$. The phase space is the four-dimensional space spanned by $x$, $\dot x$, $y$, and $\dot y$. We can think of the unperturbed orbit as lying on two tori. In this case their cross sections are circular with radii determined by the initial conditions. If the winding number is w = r/s, x will complete r cycles while y completes s. Let us make a Poincare section through the y torus at x = 0. Each time the orbit passes through from x < 0 to x > 0 we mark a point at $y$ and $\dot y$ on the x = 0 plane. An example is shown in Figure (5.8) for w = 7/2. Because the winding number is rational there are seven discrete dots on the Poincare section. The case of an irrational winding number is shown in Figure (5.9). The x vs. y plot is completely filled in, and the Poincare plot is a continuous loop. Continuous loops like this on the Poincare plot are a sign that the system is circulating around an invariant torus and hence is integrable.

When we turn on the perturbation by making λ ≠ 0 something remarkable happens.4 Figures (5.10) through (5.13) show a progression from the orderly motion of the uncoupled oscillators, Figure (5.9), through the loop in (5.10) suggesting motion around a single distorted torus. As the interaction strength is increased this one torus breaks up into five separate tori. In the next plot, Figure (5.12), the points are beginning to disperse in a random way with some structure remaining. Because of the poor resolution of the plots one cannot see the fine details that remain. Finally in the last plot the points are arranged in a completely random pattern. This is paradigmatic. As the strength of the perturbation increases orderly motion disintegrates into chaos. One of the goals of chaos theory is to explain and predict this phenomenon. This will require some new formalism.

3 See Goldstein’s Classical Mechanics for a review of the physics.

4 I am following standard practice by varying the perturbation strength by changing the total energy with λ = 1.

Figure 5.8: Harmonic oscillator coordinates for w = 7/2. (Panels: x vs y; Poincare section through x = 0, dy/dt versus y.)

Figure 5.9: Harmonic oscillator coordinates for an irrational winding number. (Poincare section, dy/dt versus y.)

Figure 5.10: Henon-Heiles Hamiltonian. Orbit circulates a distorted torus. (Poincare section through x = 0, dy/dt versus y.)

Figure 5.11: The orbit breaks up into smaller tori. (Poincare section, dy/dt versus y.)

Figure 5.12: Chaos begins to set in. (Poincare section, dy/dt versus y.)

Figure 5.13: Complete Chaos. (Poincare section, dy/dt versus y.)

5.4 Discrete Maps

Suppose we were to number the points on the Poincare plot in the order they appeared as the orbit repeatedly cut through the x = 0 plane. This would give us a series of coordinates $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. Think of this in terms of a mapping operator T that maps the n’th point into the (n + 1)’th point.

\[ T(x_n, y_n) \equiv (x_{n+1}, y_{n+1}) \tag{5.20} \]

In principle we could derive the exact mathematical form for this operator. (I doubt that anyone has actually done this.) Certainly we could write a computer program to do the mapping, and certainly we could derive a linearized version of T that would be OK for small displacements. For my purposes it will be enough to consider the general properties such operators must have. The first of these (from which all others flow) is that they must be area preserving.

Canonical transformations preserve the volume of phase space. This is called Liouville’s theorem; it’s proved in most mechanics texts. For a one-degree of freedom system, this is just preservation of area in the (p, q)-phase plane. Thus for some area A, enclosed by a closed curve C, we can use Stokes’ theorem to write

\[ \oint_C p\,dq = \oint_{C'} p\,dq \tag{5.21} \]

where C′ is the shape of the curve after it has been changed by some canonical transformation, including the passage of time, which is itself a canonical transformation. Another way to say the same thing is that if the (q, p) point in phase space is transformed to (q′, p′), then the Jacobian

\[ \left|\frac{\partial(q', p')}{\partial(q, p)}\right| = 1 \tag{5.22} \]

These results can be extended to higher dimensions in a completely straightforward manner.

So far, so good. There is a corollary to Liouville’s theorem that is not so easy to prove. The transformations of the form (5.20) on the Poincare section also preserve area in the sense of (5.22).5 Discrete maps are area preserving.

Figure 5.14: The standard map with ϵ = 0. (Axes: J versus phi; panel title: Standard Map, ε = 0.00.)

Let’s take time out for an example. The following transformation is called the standard map, presumably because it appears in so many different contexts. Thanks to the $J_{n+1}$ (rather than $J_n$) in the first of equations (5.23) it is trivially area preserving.

\[ \phi_{n+1} = (\phi_n + J_{n+1}) \bmod 2\pi \tag{5.23} \]

\[ J_{n+1} = \epsilon\sin\phi_n + J_n \]

This is a one-dimensional map written in terms of action-angle variables J and ϕ. Not only ϕ, but also J is periodic with period 2π. We can imagine all the orbits wrapped around a cylinder. In the case ϵ = 0, the $(\phi_n, J_n)$’s lie along parallel circles as shown in Figure 5.14. When ϵ is increased to 0.050 a new feature appears, a loop in the center of the plot. This is unusual in the sense that it can be contracted to a point; it is topologically distinct from all the ϵ = 0 circles. As ϵ is increased an assortment of smaller loops appear together with a smattering of completely random points. Because of limitations on plot resolution, computer time, and my patience you cannot see the really significant thing about this plot: this pattern of islands of loopy order interspersed with random dots persists at ever smaller and smaller scales. They have a property called self similarity. In this sense they are similar to fractal patterns. Finally, as ϵ is increased further, all appearance of order disappears and the dots become completely random. This is the state of complete chaos.

5 Tabor, Appendix 4.1

Figure 5.15: The standard map with ϵ = 0.050. (Axes: J versus phi.)
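The standard map (5.23) takes only a few lines to iterate. The sketch below is illustrative rather than the program behind Figures 5.14 through 5.16: the function names, step size, and sample points are assumptions, and J is left unwrapped. It also checks area preservation by estimating the Jacobian determinant of one step with central differences.

```python
import math

TWO_PI = 2.0 * math.pi

def step_unwrapped(phi, J, eps):
    """One step of (5.23) without the mod, so that it can be differentiated smoothly."""
    J_next = J + eps * math.sin(phi)
    phi_next = phi + J_next
    return phi_next, J_next

def standard_map(phi, J, eps):
    """One step of the standard map (5.23); phi is wrapped into [0, 2*pi)."""
    phi_next, J_next = step_unwrapped(phi, J, eps)
    return phi_next % TWO_PI, J_next

def jacobian_determinant(phi, J, eps, h=1e-6):
    """Central-difference estimate of det d(phi', J')/d(phi, J)."""
    p_hi, j_hi = step_unwrapped(phi + h, J, eps)
    p_lo, j_lo = step_unwrapped(phi - h, J, eps)
    dphi_dphi, dJ_dphi = (p_hi - p_lo) / (2.0 * h), (j_hi - j_lo) / (2.0 * h)
    p_hi, j_hi = step_unwrapped(phi, J + h, eps)
    p_lo, j_lo = step_unwrapped(phi, J - h, eps)
    dphi_dJ, dJ_dJ = (p_hi - p_lo) / (2.0 * h), (j_hi - j_lo) / (2.0 * h)
    return dphi_dphi * dJ_dJ - dphi_dJ * dJ_dphi

def orbit(phi0, J0, eps, n):
    """List of n+1 points (phi, J) starting from (phi0, J0)."""
    points = [(phi0, J0)]
    for _ in range(n):
        points.append(standard_map(points[-1][0], points[-1][1], eps))
    return points

det_estimate = jacobian_determinant(1.3, 0.4, 0.75)
free_orbit = orbit(0.2, 0.5, 0.0, 100)     # eps = 0: J never changes
```

Analytically the determinant is $(1 + \epsilon\cos\phi)\cdot 1 - 1\cdot\epsilon\cos\phi = 1$ for any ϵ, which is what the numerical estimate reproduces.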

5.5 Linearized Maps

Like the continuous transformations we studied in Section 5.2, discrete maps have fixed points about which one can analyze the local topology. Consider a generic mapping of the form6

\[ \begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = T\begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{5.24} \]

A fixed point of the mapping would be a point where $x_{i+1} = x_i$ and $y_{i+1} = y_i$. I will argue later on that in a plot like Figure 5.16 there are an infinite number of fixed points, but to keep the algebra simple here I will assume that the fixed point is at the origin (0, 0). Linearizing T about this point gives

\[ \begin{bmatrix} \delta x_{i+1} \\ \delta y_{i+1} \end{bmatrix} = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix}\begin{bmatrix} \delta x_i \\ \delta y_i \end{bmatrix} \tag{5.25} \]

where of course

\[ T_{ij} = \frac{\partial T_i}{\partial x_j}\bigg|_{x_i, x_j = 0} \tag{5.26} \]

6 I am using Tabor’s notation from section 4.3.4.

Figure 5.16: The standard map with ϵ = 0.750. (Axes: J versus phi.)

The eigenvalues $\lambda_i$ of the $T_{ij}$ matrix must satisfy

\[ \lambda^2 - \lambda\,(\mathrm{trace}(T)) + \det(T) = 0 \tag{5.27} \]

The all-important point here is that because of the area-preserving property of T, det(T) = 1. This greatly restricts the allowed types of fixed points. There are only three cases to consider.

If |trace(T)| < 2, $\lambda_1$ and $\lambda_2$ are a complex conjugate pair lying on the unit circle, that is,

\[ \lambda_1 = e^{+i\alpha}, \qquad \lambda_2 = e^{-i\alpha} \tag{5.28} \]

This is simply a rotation in the vicinity of the fixed point (0, 0). This corresponds to a stable or elliptic point. Thus in the immediate neighborhood of (0, 0) we expect to find invariant curves like Figure 5.7.

If |trace(T)| > 2, $\lambda_1$ and $\lambda_2$ are real numbers satisfying

\[ \lambda_1 = 1/\lambda_2 \tag{5.29} \]

There are two subcases to consider here depending on whether λ is positive or negative. If it is positive we have a regular hyperbolic fixed point in which successive iterates stay on the same branch of the hyperbola as in Figure 5.17 (a). If λ < 0 we have a hyperbolic-with-reflection fixed point in which successive iterates jump backwards and forwards between opposite branches of the hyperbola. (See Figure 5.17 (b).)
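The three cases can be applied directly to the standard map (5.23). The sketch below rests on a derivation that is not in the text and should be read as an assumption: differentiating (5.23) in the order (ϕ, J) gives the tangent matrix with rows $(1 + \epsilon\cos\phi,\ 1)$ and $(\epsilon\cos\phi,\ 1)$, and (ϕ, J) = (0, 0) and (π, 0) are fixed points. The function names and labels are hypothetical.

```python
import math

def classify_map_fixed_point(t11, t12, t21, t22, tol=1e-9):
    """Classify a fixed point of an area-preserving 2x2 linearized map by its trace."""
    det = t11 * t22 - t12 * t21
    if abs(det - 1.0) > tol:
        raise ValueError("matrix is not area preserving: det = %g" % det)
    trace = t11 + t22
    if abs(trace) < 2.0:
        return "elliptic"                     # eigenvalues exp(+/- i alpha)
    if trace > 2.0:
        return "hyperbolic"                   # real, positive, lambda_1 = 1/lambda_2
    if trace < -2.0:
        return "hyperbolic with reflection"   # real, negative, lambda_1 = 1/lambda_2
    return "marginal"                         # |trace| == 2, borderline case

def standard_map_tangent(phi, eps):
    """Tangent matrix of the standard map at angle phi (derived here, not in the text)."""
    c = eps * math.cos(phi)
    return 1.0 + c, 1.0, c, 1.0

at_origin = classify_map_fixed_point(*standard_map_tangent(0.0, 0.75))
at_pi = classify_map_fixed_point(*standard_map_tangent(math.pi, 0.75))
at_pi_strong = classify_map_fixed_point(*standard_map_tangent(math.pi, 5.0))
```

Under these assumptions the trace is $2 + \epsilon\cos\phi$, so the point at ϕ = π is elliptic for 0 < ϵ < 4, consistent with the loop that appears in the center of the standard map plots, while the point at ϕ = 0 is hyperbolic for any ϵ > 0.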


Figure 5.17: (a) Hyperbolic fixed point. (b) Hyperbolic-with-reflection fixed point.

5.6 Lyapunov Exponents

Loosely speaking, systems are chaotic because adjacent trajectories diverge exponentially from one another. If this were literally true we could parameterize this divergence with the function $e^{\lambda x}$, where λ is some constant and x is the independent variable, which might be continuous or discrete depending on the application. This is the basic idea behind Lyapunov exponents, a formalism with many alternate definitions (and spellings).

Let’s apply this idea first to a one-dimensional iterative map of the form

\[ x_{i+1} = f(x_i) \tag{5.30} \]

We can characterize the divergence of two trajectories separated by ϵ upon the n-th iteration as

\[ \lim_{\epsilon\to 0}\frac{|f(x_n + \epsilon) - f(x_n)|}{\epsilon} = \left|\frac{df(x_n)}{dx_n}\right| \tag{5.31} \]

A small but finite deviation at the n-th iteration, say $\delta x_n$, should grow to

\[ \delta x_{n+1} \approx \left|\frac{df(x_n)}{dx_n}\right|\delta x_n \tag{5.32} \]

Continuing this reasoning

\[ \frac{\delta x_{n+1}}{\delta x_0} = \left|\frac{df(x_n)}{dx_n}\,\frac{df(x_{n-1})}{dx_{n-1}}\times\cdots\times\frac{df(x_0)}{dx_0}\right| = \prod_{i=0}^{n}|f'(x_i)| = e^{\lambda n} \tag{5.33} \]

The last equality is just a hypothesis. λ will certainly depend on the point n where we stop iterating. We should write instead

\[ \lambda(n) = \frac{1}{n}\ln\prod_{i=0}^{n}|f'(x_i)| \tag{5.34} \]

with the understanding that the definition only makes sense if there is some range of n over which λ(n) is more or less constant. λ defined in this way is a Lyapunov exponent.
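Definition (5.34) is easy to try out. The text does not name a specific map, so the sketch below assumes the logistic map f(x) = r x (1 − x) purely as an illustration; the function names, starting point, and iteration count are likewise assumptions. The logarithm of the product is accumulated as a sum of logarithms to avoid overflow, and n factors are used rather than n + 1, which makes no difference for large n.

```python
import math

def lyapunov_exponent(f, f_prime, x0, n):
    """Estimate lambda(n) = (1/n) * sum of ln|f'(x_i)| along the orbit starting at x0."""
    total = 0.0
    x = x0
    for _ in range(n):
        # Guard against log(0) if the orbit lands exactly on a critical point.
        total += math.log(max(abs(f_prime(x)), 1e-300))
        x = f(x)
    return total / n

def logistic(r):
    """Return the logistic map f(x) = r*x*(1-x) and its derivative (illustrative choice)."""
    return (lambda x: r * x * (1.0 - x)), (lambda x: r * (1.0 - 2.0 * x))

f_chaotic, df_chaotic = logistic(4.0)
f_regular, df_regular = logistic(2.5)

lambda_chaotic = lyapunov_exponent(f_chaotic, df_chaotic, 0.2, 100000)
lambda_regular = lyapunov_exponent(f_regular, df_regular, 0.2, 100000)
```

For r = 4 the estimate comes out close to ln 2, a positive exponent signalling exponential divergence of neighboring orbits. For r = 2.5 the orbit settles onto the fixed point x = 0.6, where |f′| = 0.5, and the exponent is negative.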

In the case of multidimensional mappings

$$x_{i+1} = F(x_i) \tag{5.35}$$

where x and F are n-dimensional vectors, there will be a set of n characteristic exponents corresponding to the n eigenvalues of the linearized map (??). Introducing the eigenvalues $\lambda_i(N)$, $i = 1, \ldots, n$, of the matrix

$$(LM)_N = \left(T(x_N)\,T(x_{N-1}) \cdots T(x_1)\right)^{1/N} \tag{5.36}$$

where $T(x_i)$ is the linearization of F at the point $x_i$, the exponents are defined as

$$\sigma_i(N) = \ln |\lambda_i(N)| \tag{5.37}$$

Since the T's have unit determinant for area-preserving maps, it is clear that the sum of the exponents must be zero.
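As a rough check of (5.36) and (5.37), the sketch below multiplies the linearized maps along an orbit and takes the logarithm of the eigenvalue moduli. It assumes the standard map in the form $J' = J + \epsilon \sin\phi$, $\phi' = \phi + J'$; equation (5.23) may use a different sign or scaling convention, so treat the map, the starting point, and the function names as assumptions of the sketch. Rather than taking a matrix N-th root, it uses the fact that the eigenvalue moduli of the root are the N-th roots of the moduli of the product.

```python
import numpy as np

def standard_map(phi, J, eps):
    """One step of the assumed standard map."""
    J_new = J + eps * np.sin(phi)
    return phi + J_new, J_new

def jacobian(phi, eps):
    """Linearization T(x) of the assumed map: rows are d(phi', J') / d(phi, J)."""
    c = eps * np.cos(phi)
    return np.array([[1.0 + c, 1.0], [c, 1.0]])   # determinant is exactly 1

def characteristic_exponents(phi, J, eps, N):
    """sigma_i(N) = ln|mu_i| / N, with mu_i the eigenvalues of T(x_N)...T(x_1)."""
    M = np.eye(2)
    for _ in range(N):
        M = jacobian(phi, eps) @ M                # newest factor goes on the left
        phi, J = standard_map(phi, J, eps)
    mu = np.linalg.eigvals(M)
    return np.log(np.abs(mu)) / N

sigma = characteristic_exponents(phi=1.0, J=0.5, eps=0.5, N=50)
print("exponents:", sigma, " sum:", sigma.sum())
```

The product is kept short (N = 50) on purpose: for a strongly chaotic orbit the entries of the product grow exponentially and the small eigenvalue is lost to round-off, so a production calculation would renormalize along the way.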

For the final example, suppose the equation of motion is

$$\dot{x} = f(x) \tag{5.38}$$

Let $s(t) = x(t) - x_0(t)$ be the difference between two nearby trajectories. If this does indeed diverge exponentially with time, then $\dot{s} = \lambda s$. Then we can argue that

$$\dot{s} = \dot{x} - \dot{x}_0 = f(x) - f(x_0) = \lambda s = \lambda (x - x_0) \tag{5.39}$$

$$\lambda = \frac{f(x) - f(x_0)}{x - x_0} \approx \left.\frac{df}{dx}\right|_{x_0} \tag{5.40}$$


5.7 The Poincare-Birkhoff Theorem

The phase-space trajectories of integrable systems move on smooth tori. The appearance of the Poincare section depends on whether the winding number is rational or irrational. If it is rational the section shows discrete points. If irrational, the points are ‘ergodic’ and form a continuous loop. Under the influence of nonlinear perturbations the tori become distorted, then break up into smaller tori, and finally disintegrate into chaos. It turns out that the way this happens depends on whether the winding number is rational or irrational. If it is irrational the tori are preserved, distorted but preserved, under small perturbations. This is a gross oversimplification of the KAM theorem, which I will discuss in Section 5.9. If the winding number is rational, the tori break up in a way that is governed by the so-called Poincare-Birkhoff theorem, the subject of this section. This may seem like a swindle, since every irrational number can be approximated to arbitrary accuracy by a rational number. But, as it turns out, some numbers are more irrational than others!

I will prove the PB theorem for the standard map, equation (5.23), but it is true under quite general assumptions. I will use the symbol $T_\epsilon$ for (5.23), i.e.

$$T_\epsilon(\phi_n, J_n) = (\phi_{n+1}, J_{n+1}).$$

Now imagine the points in Figure 5.14 ($\epsilon = 0$) plotted in polar coordinates (for positive J) with $\phi$ the angular and J the radial coordinate. The points now lie on concentric circles of constant J. Choose $J \equiv J_r = 2\pi j/k$, with k and j integers, i.e. $J_r$ has a rational winding number. If we iterate $T_0$ k times, J remains unchanged and $\phi$ is incremented by an integer multiple of $2\pi$, which is to say, $\phi$ is not changed at all. Symbolically

$$T_0^k(\phi, J_r) = (\phi, J_r)$$

Now take a $J_+$ slightly larger than $J_r$. $T_0^k$ will increment $\phi$ by slightly more than an integer multiple of $2\pi$, so $\phi$ will increase. In the same way if $J_- < J_r$, $T_0^k$ will cause $\phi$ to decrease. We can imagine the values of $\phi$ lying on three circles $J_+$, $J_r$, and $J_-$ as shown in Figure 5.18(a).
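The twist described here is easy to check numerically. The sketch below assumes that the unperturbed map simply advances $\phi$ by J at each step and leaves J alone; if (5.23) is normalized differently the numbers change but not the signs. Under that assumption k steps on the circle $J_r = 2\pi j/k$ advance $\phi$ by exactly $2\pi j$. The function names and the choice j = 2, k = 5 are illustrative only.

```python
import math

TWO_PI = 2.0 * math.pi

def unperturbed_map(phi, J):
    """One step of the assumed unperturbed map T_0: phi advances by J, J is unchanged."""
    return phi + J, J

def excess_rotation(J, j, k, phi0=0.3):
    """Net change in phi after k steps, minus the j whole turns made on the circle J_r."""
    phi = phi0
    for _ in range(k):
        phi, J = unperturbed_map(phi, J)
    return (phi - phi0) - TWO_PI * j

j, k = 2, 5
J_r = TWO_PI * j / k
delta = 0.01
excess = {
    "J-": excess_rotation(J_r - delta, j, k),
    "Jr": excess_rotation(J_r, j, k),
    "J+": excess_rotation(J_r + delta, j, k),
}
for name, value in excess.items():
    print(f"{name}: net change in phi after k steps = {value:+.4f}")
```

The circle $J_r$ comes back to where it started, the outer circle is carried forward by k times delta, and the inner circle is carried backward by the same amount.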

Now turn on a small perturbation $\epsilon > 0$. $T_\epsilon^k$ will map some $\phi$'s to larger values and some to smaller, but there will be some locus of points, called C in Figure 5.19, whose $\phi$ is not changed at all. In other words, the curve C is mapped purely radially.

$$T_\epsilon^k(J_r, \phi) = (J_c, \phi)$$


Figure 5.18: (a) Three orbits of the unperturbed standard map $T_0^k$. (b) The $\phi$ coordinate is left invariant on C by the perturbed map $T_\epsilon^k$.

Curve C is mapped into a new curve called D in Figure 5.18(b).

$$T_\epsilon^k(J_c, \phi) = (J_d, \phi)$$

The curves C and D must have the same area (remember these are area-preserving transformations) so they must cross one another an even number of times. This situation is shown in Figure 5.19. The crossings represent points that are invariant under $T_\epsilon^k$; they are fixed points.

This is our first result. A torus with rational winding number j/k is invariant under $T_0^k$, i.e. every point on the torus is a fixed point of $T_0^k$. When $\epsilon$ is even slightly larger than zero, only a discrete (even) number of fixed points of $T_\epsilon^k$ survive. You can ascertain the type of each fixed point by seeing how other points in its immediate vicinity are mapped. Compare this flow, as it is called, with the arrows in Figures 5.4 and 5.17. You should be able to convince yourself that the fixed points along the curve C are alternately hyperbolic and elliptic. Figure 5.20 should help you visualize this. Since there are an even number of fixed points, half of them will be elliptic and half hyperbolic. How many are there? Suppose $(\phi_0, J_0)$ is a fixed point of $T_\epsilon^k$. We can create more fixed points by multiplying by $T_\epsilon$, as the following simple argument shows.

$$T_\epsilon^k[T_\epsilon(\phi_0, J_0)] = T_\epsilon T_\epsilon^k(\phi_0, J_0) = T_\epsilon(\phi_0, J_0)$$

Figure 5.19: The curves C and D. Crossings, like a and b, are fixed points.

Figure 5.20: A closer look at the fixed points a and b.

Starting with $(\phi_0, J_0)$ we can create k − 1 additional fixed points by multiplying repeatedly with $T_\epsilon$. To put it another way, every fixed point of $T_\epsilon^k$ is a member of a family of k fixed points obtained by multiplying by various powers of $T_\epsilon$. Because each mapping is a continuous function of $\phi$ and J, all the members of an elliptic family are elliptic and all the members of a hyperbolic family are hyperbolic. I claim that all the members of a family are distinct. Proof: Let $(\phi_s, J_s)$ be the fixed point obtained by $T_\epsilon^s(\phi_0, J_0) = (\phi_s, J_s)$ with s < k. Then of course all such points are fixed points of $T_\epsilon^k$. The claim is that there is no m < k such that $T_\epsilon^m(\phi_s, J_s) = (\phi_s, J_s)$. Multiply both sides of this equation with $T_\epsilon^{-s}$. The result is $T_\epsilon^m(\phi_0, J_0) = (\phi_0, J_0)$. It is just this equation with m replaced by k that defines $(\phi_0, J_0)$. Hence m = k. Finally note that none of these newly created fixed points can lie along the original curve C. If one did, there would be instances in which two hyperbolic or two elliptic points appeared side by side. This we know to be impossible. Consequently each torus breaks up into k fixed points for every fixed point on the curve C. This is the Poincare-Birkhoff theorem.
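The family argument can be illustrated for the simplest nontrivial case, k = 2 with winding number 1/2. The sketch assumes the standard map in the form $J' = J + \epsilon \sin\phi$, $\phi' = \phi + J'$, which may differ from (5.23) by a sign or scaling. For that assumed form, $(\phi, J) = (0, \pi)$ is a fixed point of $T_\epsilon^2$ for any $\epsilon$, and a second family can be located by solving $4\phi + \epsilon \sin\phi = 2\pi$ by bisection. Both facts follow from a short calculation with the assumed map and are not taken from the notes; all names are illustrative. The trace of the linearized $T_\epsilon^2$ then tells which family is elliptic and which is hyperbolic.

```python
import math
import numpy as np

TWO_PI = 2.0 * math.pi

def T(phi, J, eps):
    """One step of the assumed standard map."""
    J_new = J + eps * math.sin(phi)
    return phi + J_new, J_new

def jacobian(phi, eps):
    """Linearization of the assumed map at angle phi."""
    c = eps * math.cos(phi)
    return np.array([[1.0 + c, 1.0], [c, 1.0]])

def angle_diff(a, b):
    """Distance between two angles on the circle."""
    return abs((a - b + math.pi) % TWO_PI - math.pi)

def follow(phi, J, eps, k):
    """Return the k orbit points, the end point T^k(phi, J), and trace of the linearized T^k."""
    points, M = [], np.eye(2)
    for _ in range(k):
        points.append((phi % TWO_PI, J))
        M = jacobian(phi, eps) @ M
        phi, J = T(phi, J, eps)
    return points, (phi, J), float(np.trace(M))

def second_family_seed(eps):
    """Bisection for 4*phi + eps*sin(phi) = 2*pi on [0, pi] (monotonic for eps < 4)."""
    lo, hi = 0.0, math.pi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 4.0 * mid + eps * math.sin(mid) < TWO_PI:
            lo = mid
        else:
            hi = mid
    phi0 = 0.5 * (lo + hi)
    return phi0, TWO_PI - 2.0 * phi0 - eps * math.sin(phi0)

eps, k = 0.3, 2
seeds = {"first family": (0.0, math.pi), "second family": second_family_seed(eps)}
results = {}
for name, (phi0, J0) in seeds.items():
    points, end, trace = follow(phi0, J0, eps, k)
    results[name] = (points, end, trace)
    kind = "elliptic" if abs(trace) < 2.0 else "hyperbolic"
    print(f"{name}: points = {points}, trace = {trace:.4f} ({kind})")
```

Each family has k = 2 distinct members, both members of a family share the same trace and therefore the same type, and the two families are of opposite type, as the theorem requires.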

5.8 All in a tangle

Have another look at the hyperbolic fixed points in Figures 5.4 and 5.17. There are always two loci of points leading directly toward the fixed point and two loci leading away from it. These are called the stable and unstable manifolds respectively. Following the notation of Hand and Finch I will call them H+ and H−. Call the fixed point $p_f$. Any point along H+ will be mapped asymptotically back to $p_f$ under repeated applications of $T_\epsilon$, and any point on H− will be mapped asymptotically back to $p_f$ under repeated applications of $T_\epsilon^{-1}$. Can these manifolds cross one another? I claim the following.

• H+ and H− cannot intersect themselves, but they can and do intersect one another.

• Stable manifolds of different fixed points cannot intersect one another.

• Unstable manifolds of different fixed points cannot intersect one another.

• Stable manifolds can intersect with unstable ones. The stable and unstable manifolds of a single fixed point intersect in what are called homoclinic points and those of two different fixed points, in heteroclinic points.

• Neither H+ nor H− can cross the tori surrounding elliptic fixed points.

• There are, depending on the size of $\epsilon$, narrow bands surrounding tori with irrational winding number that are not broken up into isolated fixed points. This is the content of the KAM theorem to be discussed in the next section. Neither H+ nor H− can cross these bands.

The proofs of these assertions are easy and are given in Hand and Finch. Referring to Figure 5.21(a), $x_0$ is a heteroclinic point that lies on the unstable manifold H− of $p_{f1}$ and the stable manifold H+ of $p_{f2}$. Since both manifolds are invariant under $T_\epsilon$, the $T_\epsilon^k x_0$ are a set of discrete points that lie on both manifolds, so the two manifolds must intersect again. For instance, because $x_1 = T_\epsilon x_0$ is on both manifolds, H− must loop around to meet H+. Similarly the $x_k = T_\epsilon^k x_0$ must lie on both manifolds, so H− must loop around over and over again as illustrated in Figure 5.21(b). The inverse map also leaves H+ and H− invariant and hence the $x_{-k} = T_\epsilon^{-k} x_0$ are intersections that force H+ to loop around to meet H−. As k increases and $x_k$ approaches one of the fixed points, the spacing between the intersections gets smaller, so the loops they create get narrower. But because $T_\epsilon$ is area-preserving, the loop areas are the same, so the loops get longer, which leads to many intersections among them, as shown in Figure 5.21(c) and (d).

Try explaining all this to an intimate friend on a date. The more you explain the more you will see that this mechanism produces a tangle of fathomless complexity.[7] Nonetheless the mess is contained, at least for small $\epsilon$. Since stable manifolds cannot cross, the stable manifold emanating from $p_{f1}$ acts as a barrier to the stable manifold emanating from $p_{f2}$. The same is true of the unstable manifolds. The tangle also cannot cross the stable tori surrounding the elliptic fixed points, nor can it cross the KAM tori. As a consequence we expect to see islands of chaos developing between stable ellipses. This is clear in Figures 5.15 and 5.16. As $\epsilon$ increases, the KAM tori also break down and chaos engulfs the entire plot.

[7] Don’t try to explain this for higher-dimensional spaces. That way lies madness.


Figure 5.21: A heteroclinic intersection. (a) Two hyperbolic fixed points $p_{f1}$ and $p_{f2}$, and an intersection $x_0$ of the unstable manifold of $p_{f1}$ with the stable manifold of $p_{f2}$. (b) Adding the forward maps $T^k x_0$ of the intersection. (c) Adding the backward maps $T^{-k} x_0$ of the intersection. (d) Adding another intersection $x'$ and some of its backward maps. U: unstable manifold; S: stable manifold.


5.9 The KAM theorem and its consequences

For a system with n independent degrees of freedom to be integrable, it is a necessary and sufficient condition that n independent constants of the motion exist. In this case the system can be transformed into a set of action-angle variables

$$\boldsymbol{\omega}_0 \equiv (\omega_{01}, \omega_{02}, \cdots, \omega_{0n}) \qquad \mathbf{I}_0 \equiv (I_{01}, I_{02}, \cdots, I_{0n}) \tag{5.41}$$

In this notation

$$\boldsymbol{\omega}_0 = \frac{\partial H_0}{\partial \mathbf{I}_0} \tag{5.42}$$

Now suppose the system is perturbed slightly

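$$H(\boldsymbol{\omega}_0, \mathbf{I}_0, \epsilon) = H_0(\mathbf{I}_0) + \epsilon H_1(\boldsymbol{\omega}_0, \mathbf{I}_0) \tag{5.43}$$

where $\mathbf{I}_0$ and $\boldsymbol{\omega}_0$ are the AA variables of $H_0$. According to our perturbation formalism from Chapter 4, there are two series that must converge, (4.25) and (4.26), repeated here for convenience.

$$H_1(\mathbf{I}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} A_{\mathbf{k}}(\mathbf{I})\, e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{5.44}$$

$$F_1(\mathbf{I}, \boldsymbol{\psi}) = \sum_{\mathbf{k}} B_{\mathbf{k}}(\mathbf{I})\, e^{i\mathbf{k}\cdot\boldsymbol{\psi}} \tag{5.45}$$

where $\mathbf{k}$ is a vector of integers[8]

$$\mathbf{k} = (k_1, \cdots, k_n)$$

and

$$B_{\mathbf{k}} = \frac{i A_{\mathbf{k}}}{\boldsymbol{\omega}_0 \cdot \mathbf{k}} \tag{5.46}$$

The rate of decrease of the $|B_{\mathbf{k}}|$ depends both on the $|A_{\mathbf{k}}|$ and the denominators $|\boldsymbol{\omega}_0 \cdot \mathbf{k}|$, so even if the $|A_{\mathbf{k}}|$ decrease fast enough for (5.44) to converge, (5.45) will not converge if the $|\boldsymbol{\omega}_0 \cdot \mathbf{k}|$ decrease too rapidly.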

The situation seems hopeless. If any of the $\omega_0$'s yields a rational winding number, the series will blow up immediately, and if one is working with finite precision (on a computer, for example) every number is a rational number. And yet we have seen from our computer models that some stable periodic trajectories persist even under the influence of small perturbations. The circumstances under which this happens are spelled out in a remarkable theorem first outlined by Kolmogorov and later proved independently by Arnold and Moser. The theorem is extremely difficult and sophisticated, although Tabor has a nice explanation of the basic ideas, and an understandable outline of the proof is given in Classical Dynamics by Jose and Saletan. I will explain the theorem as carefully as I can and let it go at that.

[8] The sum over $\mathbf{k}$ means the sum over all possible combinations of the n integers $k_1, \cdots, k_n$.

5.9.1 Two Conditions

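The KAM theorem claims that in regions of phase space where certain conditions hold, the perturbation series converges to all orders in $\epsilon$. The first condition involves the Hessian matrix.

$$\det\left|\frac{\partial \omega_{0\alpha}}{\partial I_{0\beta}}\right| \equiv \det\left|\frac{\partial^2 H_0}{\partial I_{0\alpha}\,\partial I_{0\beta}}\right| \neq 0. \tag{5.47}$$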

The content of this statement is as follows: We assume that each torus has a unique frequency associated with it. Thus if we knew all the $\omega_0$'s we could calculate all the $I_0$'s and vice versa. Equation (5.47) ensures that this is true. A simple (albeit artificial) example is provided by Jose and Saletan.

Consider the one degree of freedom Hamiltonian

$$H = aI^3/3 \tag{5.48}$$

in which a is a constant and I takes on values in the interval −1 < I < +1. The above condition requires that

$$\frac{d^2H}{dI^2} = 2aI \neq 0 \tag{5.49}$$

Why is this significant? Note that $\omega(I) = dH/dI = aI^2$. Inverting this gives $I(\omega) = \pm\sqrt{\omega/a}$. The ± is a sign that the inversion is not unique. There are two regions separated by I = 0. In the region 0 < I < 1, $I = \sqrt{\omega/a}$. In the region −1 < I < 0, $I = -\sqrt{\omega/a}$. Thus there are two “good” regions separated by a barrier.

There is a second condition restricting the frequencies. Of course we are only considering frequencies with irrational winding numbers. Even if the frequencies are incommensurate, $|\boldsymbol{\omega}_0 \cdot \mathbf{k}|$ could be arbitrarily small. The KAM theorem requires that it be bounded from below by the so-called “weak diophantine condition”

$$|\boldsymbol{\omega}_0 \cdot \mathbf{k}| \geq \gamma\, |\mathbf{k}|^{-\kappa} \quad \text{for all nonzero integer } \mathbf{k} \tag{5.50}$$

where $|\mathbf{k}| = \sqrt{\mathbf{k} \cdot \mathbf{k}}$, and $\gamma$ and $\kappa > n$ are positive constants.
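To get a feel for (5.50), the sketch below scans all nonzero integer vectors with components up to a cutoff and reports the smallest value of $|\boldsymbol{\omega}\cdot\mathbf{k}|\,|\mathbf{k}|^{\kappa}$, which is the largest $\gamma$ the condition could tolerate on that finite set. The two frequency vectors (one built from the golden mean, one commensurate), the value $\kappa = 2.5$, and the cutoff are all illustrative assumptions. A finite scan can suggest, but of course cannot prove, that the bound holds for all $\mathbf{k}$.

```python
import itertools
import math

def diophantine_margin(omega, kappa, k_max):
    """Smallest |omega . k| * |k|**kappa over nonzero integer k with |k_i| <= k_max."""
    best, best_k = math.inf, None
    for k in itertools.product(range(-k_max, k_max + 1), repeat=len(omega)):
        if not any(k):
            continue                               # skip k = 0
        norm = math.sqrt(sum(ki * ki for ki in k))
        value = abs(sum(w * ki for w, ki in zip(omega, k))) * norm ** kappa
        if value < best:
            best, best_k = value, k
    return best, best_k

golden = (1.0 + math.sqrt(5.0)) / 2.0
kappa = 2.5                                        # must exceed n = 2
margin_golden, k_golden = diophantine_margin((1.0, golden), kappa, k_max=30)
margin_rational, k_rational = diophantine_margin((1.0, 1.5), kappa, k_max=30)

print(f"golden-mean frequencies: margin = {margin_golden:.3f} at k = {k_golden}")
print(f"commensurate frequencies: margin = {margin_rational:.3f} at k = {k_rational}")
```

For the commensurate pair the margin is exactly zero, because some $\mathbf{k}$ makes the denominator in (5.46) vanish. For the golden-mean pair the margin stays well away from zero over the whole scan.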

What is the significance of this strange inequality? The best way to understand it, I think, is to face up to the paradox I mentioned earlier that the series can only converge for irrational winding numbers, and yet it seems that every irrational number is “arbitrarily close” to a rational number. It is this last statement that needs to be examined more carefully. This requires a brief excursion into number theory. Consider the unit interval [0, 1]. The rationals have measure zero in the interval. That means roughly that they don’t take up any space. This can be proved as follows. First put the rationals in a one-to-one correspondence with the integers. Construct a small open interval of length $\epsilon < 1$ about the first rational, and one of length $\epsilon^2$ about the second and so forth. The sum of all these little intervals (this is a geometric series) is $\sigma = \epsilon/(1 - \epsilon)$, which can be made arbitrarily small by choosing $\epsilon$ small enough. Thus the space occupied by the rationals is less than any positive number. This requires taking the limit $\epsilon \to 0$.

The paradoxical thing is that it is possible to remove a finite interval around each rational without deleting all of [0, 1]. This can be seen as follows. Write each rational in [0, 1] in lowest terms as p/q, and about each one construct an interval of length $1/q^3$. For each q there are at most q − 1 rationals. Thus for a given q no more than $(q - 1)/q^3$ is covered by the intervals, and the total length Q that is covered is less (because of overlaps) than the sum of these intervals over all q.

$$Q < \sum_{q=2}^{\infty} \frac{q - 1}{q^3} < \sum_{q=2}^{\infty} \frac{1}{q^2} \tag{5.51}$$

This sum is related to the Riemann zeta function. At any rate Q < 0.645. We can make this number as small as we like by replacing $1/q^3$ with $\Gamma/q^3$ where $\Gamma < 1$. Even if we leave $\Gamma = 1$, the fraction of [0, 1] covered by the finite intervals is less than 1.
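The numbers quoted here are easy to check with partial sums. The sketch below (function name and cutoff are arbitrary choices) sums $(q-1)/q^3$ directly and compares it with the simpler bound obtained from $(q-1)/q^3 < 1/q^2$, whose sum from q = 2 is $\pi^2/6 - 1 \approx 0.645$.

```python
import math

def covered_length_bounds(q_max, gamma=1.0):
    """Partial sums of gamma*(q-1)/q**3 and of the cruder bound gamma/q**2, for q = 2..q_max."""
    direct = sum(gamma * (q - 1) / q**3 for q in range(2, q_max + 1))
    crude = sum(gamma / q**2 for q in range(2, q_max + 1))
    return direct, crude

direct, crude = covered_length_bounds(100_000)
print(f"sum of (q-1)/q^3 : {direct:.4f}")
print(f"sum of 1/q^2     : {crude:.4f}")
print(f"pi^2/6 - 1       : {math.pi**2 / 6 - 1:.4f}")

# Shrinking the intervals by a factor gamma shrinks both bounds in proportion.
direct_small, crude_small = covered_length_bounds(100_000, gamma=0.1)
print(f"with gamma = 0.1 : {direct_small:.4f} < {crude_small:.4f}")
```

Either bound is comfortably below 1, so a set of positive length in [0, 1] is left uncovered.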

Now we can divide the irrationals into two sets, those covered by the intervals around the rationals and those outside the intervals. Those uncovered satisfy the condition

$$\left|\omega - \frac{p}{q}\right| \geq \frac{\Gamma}{q^3}. \tag{5.52}$$

Several comments are in order regarding this inequality.

• Equation (5.52) makes irrationality a quantitative concept.[9] Those irrationals that satisfy (5.52) are “more irrational” than those that don’t, and the extent of their irrationality can be quantified by that value of $\Gamma$ for which they just do or do not satisfy the inequality.

• Equation (5.50) is just an n-dimensional version of (5.52). The constants $\gamma$ and $\kappa$ characterize the degree of irrationality of $\boldsymbol{\omega}_0$ in the same way that $\Gamma$ and the exponent 3 characterize the irrationality of $\omega$ in (5.52).

• The uncovered irrationals occupy isolated “islands” between the covered intervals. We expect that as the perturbation parameter $\epsilon$ is increased, those tori with less irrational winding numbers will be destroyed first, but islands of stability will remain between them. Eventually as the perturbation is increased, all will be swept away in chaos.

• The KAM theorem gives us no clue how to calculate the appropriate values of $\gamma$ and $\kappa$ or the values of $\epsilon$ for which chaos will set in. Some estimates placed the critical value of $\epsilon$ at something around $10^{-50}$! If this were true, of course, the theorem would be quite pointless. Numerical tests with specific models have found critical values of $\epsilon$ as large as $\epsilon_c \approx 1$. I will close with a quote from Jose and Saletan: “To our knowledge a rigorous formal estimate of a realistic critical value for $\epsilon$ remains an open question.”

[9] This can also be quantified in terms of continued fraction expansions.

5.10 Conclusion

This is the end of our story about chaos. Remember that we have only dealt with bounded, conservative systems with time-independent Hamiltonians. (Classical mechanics is a big subject.) Systems with one degree of freedom are trivial (in principle) to solve using the method of quadratures. Systems with n degrees of freedom are trivial (again in principle) if they have n constants of motion. Such a system can be reduced by using action-angle variables to an ensemble of uncoupled oscillators. These systems are said to be integrable and they do not display chaos. The trouble comes when we introduce some non-integrability as a perturbation. Perturbation theory is straightforward with one degree of freedom, but with two or more degrees of freedom comes the notorious problem of small denominators.[10] Perturbation theory fails immediately for all periodic trajectories with rational winding number. According to the Poincare-Birkhoff theorem, these trajectories on the Poincare section break up into complicated whorls and tangles surrounded by regions of stability corresponding to irrational winding numbers. According to the KAM theorem these regions break down as the perturbation grows, with those of “more irrational” winding number outlasting those of less. At last “Universal darkness covers all,” and the trajectories, though deterministic, show no order or pattern.

[10] There are other ways of doing perturbation theory in addition to the one described here. They all suffer the same problem.
