
LECTURES ON

STATIONARY STOCHASTIC PROCESSES

A COURSE FOR PHD STUDENTS IN

MATHEMATICAL STATISTICS AND OTHER FIELDS

GEORG LINDGREN

October 2006

Faculty of Engineering, Centre for Mathematical Sciences, Mathematical Statistics

CENTRUM SCIENTIARUM MATHEMATICARUM


Contents

Foreword

1 Some probability and process background
  1.1 Probability measures on sample spaces
  1.2 Events, probabilities and random variables
      1.2.1 Events and families of events
      1.2.2 Probabilities
      1.2.3 Random variables and random sequences
      1.2.4 Conditional expectation
  1.3 Stochastic processes
      1.3.1 Stochastic processes and finite-dimensional distributions
      1.3.2 The distribution of a random sequence
      1.3.3 The continuous parameter case
  1.4 Stationary processes and fields
      1.4.1 Stationary processes
      1.4.2 Random fields
  1.5 Gaussian processes
      1.5.1 Multivariate normal distributions and Gaussian processes
      1.5.2 Linear prediction and reconstruction
      1.5.3 Some useful inequalities
  1.6 Some historical landmarks
      1.6.1 Brownian motion and the Wiener process
      1.6.2 Rice and electronic noise
      1.6.3 Gaussian random wave models
      1.6.4 Detection theory and statistical inference
  Exercises

2 Stochastic analysis
  2.1 Quadratic mean properties
  2.2 Sample function continuity
      2.2.1 Countable and uncountable events
      2.2.2 Conditions for sample function continuity
      2.2.3 Probability measures on C[0, 1]
  2.3 Derivatives, tangents, and other characteristics
      2.3.1 Differentiability
      2.3.2 Jump discontinuities and Hölder conditions
  2.4 Quadratic mean properties a second time
      2.4.1 Quadratic mean continuity
      2.4.2 Quadratic mean differentiability
      2.4.3 Higher order derivatives and their correlations
  2.5 Summary of smoothness conditions
  2.6 Stochastic integration
  2.7 An ergodic result
  Exercises

3 Crossings
  3.1 Level crossings and Rice's formula
      3.1.1 Level crossings
      3.1.2 Rice's formula for absolutely continuous processes
      3.1.3 Alternative proof of Rice's formula
      3.1.4 Rice's formula for differentiable Gaussian processes
  3.2 Prediction from a random crossing time
      3.2.1 Prediction from upcrossings
      3.2.2 The Slepian model
      3.2.3 Excursions and related distributions
  Exercises

4 Spectral- and other representations
  4.1 Complex processes and their covariance functions
      4.1.1 Stationary processes
      4.1.2 Non-negative definite functions
      4.1.3 Strict and weak stationarity
  4.2 Bochner's theorem and the spectral distribution
      4.2.1 The spectral distribution
      4.2.2 Properties of the spectral distribution
      4.2.3 Spectrum for stationary sequences
  4.3 Spectral representation of a stationary process
      4.3.1 The spectral process
      4.3.2 The spectral theorem
      4.3.3 More on the spectral representation
      4.3.4 Spectral representation of stationary sequences
  4.4 Linear filters
      4.4.1 Projection and the linear prediction problem
      4.4.2 Linear filters and the spectral representation
      4.4.3 Linear filters and differential equations
      4.4.4 White noise in linear systems
      4.4.5 The Hilbert transform and the envelope
      4.4.6 The sampling theorem
  4.5 Karhunen-Loève expansion
      4.5.1 Principal components
      4.5.2 Expansion of a stationary process along eigenfunctions
      4.5.3 The Karhunen-Loève theorem
  Exercises

5 Ergodic theory and mixing
  5.1 The basic Ergodic theorem in L2
  5.2 Stationarity and transformations
      5.2.1 Pseudo randomness and transformation of sample space
      5.2.2 Strict stationarity and measure preserving transformations
  5.3 The Ergodic theorem, transformation view
      5.3.1 Invariant sets and invariant random variables
      5.3.2 Ergodicity
      5.3.3 The Birkhoff Ergodic theorem
  5.4 The Ergodic theorem, process view
  5.5 Ergodic Gaussian sequences and processes
  5.6 Mixing and asymptotic independence
      5.6.1 Singularity and regularity
      5.6.2 Asymptotic independence, regularity and singularity
      5.6.3 Uniform, strong, and weak mixing
  Exercises

6 Vector processes and random fields
  6.1 Cross-spectrum and spectral representation
      6.1.1 Spectral distribution
      6.1.2 Spectral representation of x(t)
  6.2 Some random field theory
      6.2.1 Homogeneous fields
      6.2.2 Isotropic fields
      6.2.3 Randomly moving surfaces
      6.2.4 Stochastic water waves
  Exercises

A The axioms of probability
  A.1 The axioms
  A.2 Extension of a probability from field to σ-field
  A.3 Kolmogorov's extension to R∞
  Exercises

B Stochastic convergence
  B.1 Criteria for convergence almost surely
  B.2 Criteria for convergence in quadratic mean
  B.3 Criteria for convergence in probability
  Exercises

C Hilbert space and random variables
  C.1 Hilbert space and scalar products
  C.2 Projections in Hilbert space
  C.3 Stochastic processes and Hilbert spaces

D Spectral simulation of random processes
  D.1 The Fast Fourier Transform, FFT
  D.2 Random phase and amplitude
  D.3 Aliasing
  D.4 Simulation scheme
  D.5 Difficulties and details
  D.6 Simulation of the envelope
  D.7 Summary

Literature

Index


Foreword

The book Stationary and Related Stochastic Processes [9] appeared in 1967. Written by Harald Cramér and M.R. Leadbetter, it drastically changed the life of PhD students in Mathematical statistics with an interest in stochastic processes and their applications, as well as that of students in many other fields of science and engineering. By that book, they got access to tools and results for stationary stochastic processes that until then had been available only in rather advanced mathematical textbooks, or through specialized statistical journals. The impact of the book can be judged from the fact that still in 1999, after more than thirty years, it is a standard reference to stationary processes in PhD theses and research articles.

Unfortunately, the book only appeared in a first edition and it has long since been out of print. Even if many of the more specialized results in the book now have been superseded by more general results, and simpler proofs have been found for some of the statements, the general attitude in the book makes it enjoyable reading both for the student and for the teacher. It will remain a definite source of reference for many standard results on sample function and crossing properties of continuous time processes, in particular in the Gaussian case.

These lecture notes are the result of a series of PhD courses on Stationary stochastic processes which have been held at the Department of Mathematical Statistics, Lund University, during a sequence of years, all based on and inspired by the book by Cramér and Leadbetter. The aim of the notes is to provide a reasonably condensed presentation of sample function properties, limit theorems, and representation theorems for stationary processes, in the spirit of [9]. It must be said, however, that they represent only a selection of the material, and the reader who has found interest in the present course should take the time to read the original.

Even if the Cramér and Leadbetter book is the basic source of inspiration, other texts have influenced these notes. The most important of these is the now reprinted book on Probability [5] by Leo Breiman. The Ergodic chapter is a mixture of the two approaches. The Karhunen-Loève expansion follows the book by Wong [35]. Finally, the classical memoirs by S.O. Rice [27] have also been a source of inspiration.

Some knowledge of the mathematical foundations of probability helps while reading the text; I have included most of it in Appendices on the probability axioms, together with the existence and basic convergence properties, as well as some Hilbert space concepts. There is also an Appendix on how to simulate stationary stochastic processes by spectral methods and the FFT algorithm.

I am grateful to the PhD students Rikard Berthilsson, Jonas Brunskog, Halfdan Grage, Peter Gustafsson, Pär Johannesson, Finn Lindgren, Karl-Ola Lundberg, Dan Mattsson, Tord Rikte, Jesper Rydén, Martin Sköld, Martin Svensson, and Magnus Wiktorsson in the 1998/99 course for many detailed comments on the text and pertinent questions during the lectures, which hopefully helped clarify some of the obscurities. They also helped to remove many of the misprints.

Lund in May, 1999

Georg Lindgren

Printing September 2002 and April 2004

These printings of the Lecture notes differ considerably from the printing of May 1999. Many misprints have been corrected and there are also many additions, mostly of previously left out details, but there are also some new aspects. I am grateful for comments by Bengt Ringnér, which helped to remove unclear statements and errors, to the PhD students at the Lund 2000/01 course, Anastasia Baxevanni, Torgny Lindström, Ulla Machado, Anders Malmberg, Sebastian Rasmus, and Mikael Signahl, and to Lars Holst and PhD student Henrik Hult at KTH, who used the material and made several suggestions.

Some new references have been added, in particular some standard textbooks on ergodic theory (K. Petersen: Ergodic Theory), real analysis (H.L. Royden: Real Analysis), and the new probability book, Weighing the Odds by D. Williams [25, 29, 37].

In the 2004 printing several more changes were made, as the result of numerous valuable comments by Oskar Hagberg and Linda Werner.

Lund in September, 2002 and April 2004

Georg Lindgren

Printing October 2006

In this printing, several changes and additions have been made. I have included some more elementary facts, together with historical and general examples in Chapter 1, and expanded the section on Linear filters in Chapter 4 in order not to rely too much on specific Lund courses.

To provide some background to the theory I have, in Chapter 1, highlighted four remarkable research achievements that have helped to shape the theory of stationary processes in general. The examples are specialized and based on real demands, namely Albert Einstein's derivation of the physics and mathematics behind the Brownian motion from 1905, Steve Rice's broad 1944-45 introduction to stochastic Gaussian noise, so important in communication theory, the equally important introduction of stochastic thinking in naval engineering by St Denis and Pierson from 1953, and finally the link between stochastic process theory and statistical inference, by Ulf Grenander from 1950, which forms the basis for present day signal processing.

The old Chapter 3, on prediction, has been transformed into a chapter on crossing-related problems, including the form and use of the Slepian model process. What remains of prediction has been moved to the chapter on Ergodic theory. The section on random fields has been slightly expanded. The notation λ for frequency has been changed to the standard ω.

Besides the additions, several errors and misprints have been corrected after comments from the PhD students in the 2004 course, Klas Bogsjö, Johan Lindström, and Sofia Åberg. Timo Koski, who used the material in Linköping, has also given me several suggestions and pointed at unclear points and misprints.

Last, but not least, the Cramér & Leadbetter book has finally reappeared, reprinted by Dover Publications, 2004.

Lund in October, 2006

Georg Lindgren


Chapter 1

Some probability and process background

This introductory chapter gives a brief summary of the probability theory needed for the study of stochastic processes with discrete or continuous time. It concentrates on the finite-dimensional distribution functions, which uniquely define probabilities for a sufficiently rich family of events, namely events that can be identified through process values at discrete sets of times. In particular, they allow us to find conditions for sample function continuity, at least if we restrict ourselves to a study of the process at a dense discrete time set.

1.1 Probability measures on sample spaces

Stochastic processes are often called random functions; these two notions put emphasis on two different aspects of the theory, namely

• stochastic processes as families of infinitely many random variables on the same sample space, usually equipped with a fixed probability measure,

• stochastic processes as a means to assign probabilities to sets of functions, for example some specified sets of continuous functions, or sets of piecewise constant functions with unit jumps.

These two aspects of stochastic processes can be illustrated as in Figure 1.1, corresponding to an experiment where the outcomes are continuous functions.

The figure illustrates the three levels of abstraction and observability for a random experiment. To be concrete, think of an experiment controlling the steering of a ship. The general sample space Ω is an abstract set that contains all the possible outcomes of the experiment that can conceivably happen – and it may contain more. A probability measure P is defined on Ω that assigns probabilities to all interesting subsets – we need only one single probability measure to describe our whole world.


[Figure: three boxes – the abstract sample space Ω ("this space contains everything possible") with outcome ω and probability P; the function sample space C ("this space contains the possible function outcomes") with processes x(t, ω), y(t, ω), z(t, ω), t ∈ R, and distributions Px, Py, Pz; and the finite-dimensional co-ordinate space Rn ("this space contains the finite-dimensional observations") with sampled values at t = t1, . . . , tn and distributions P(n)x, P(n)y, P(n)z.]

Figure 1.1: Overview of the three types of worlds in which our processes live.

During the experiment one can record the time evolution of a number of things, such as rudder angle, which we call {x(t), t ∈ R}, ship heading angle, called {y(t), t ∈ R}, and roll angle {z(t), t ∈ R}. Each observed function is an observation of a continuous random process. In the figure, the randomness is indicated by the dependence on the experiment outcome ω. The distributions of the different processes are Px, Py, Pz – we need one probability measure for each of the phenomena we have chosen to observe.¹

In practice, the continuous functions are sampled in discrete time steps, t = t1, . . . , tn, resulting in a finite-dimensional observation vector, (x1, . . . , xn), with an n-dimensional distribution, P(n)x, etc. This is illustrated in the third box in the figure.

Since we do not always want to specify a finite value for n, the natural mathematical model for the practical situation is to replace the middle box, the sample space C of continuous functions, by the sample space R∞ of infinite sequences of real numbers (x0, x1, . . .). This is close, as we shall see later, really very close, to the finite-dimensional space Rn, and mathematically not much more complicated.

Warning: Taking the set C of continuous functions as a sample space and assigning probabilities Px, etc., on it, is not as innocent as it may sound from the description above. Chapter 2 deals with conditions that guarantee that a stochastic process is continuous, i.e. has continuous sample functions. In fact, these conditions are all on the finite-dimensional distributions.

Summary: The abstract sample space Ω contains everything that can conceivably happen and is therefore very complex and detailed. Each outcome ω ∈ Ω is unique, and we need only one comprehensive probability measure P to describe every outcome of every experiment we can do. An experiment is a way to "observe the world".

¹ The symbol ω is here used to represent the elementary experimental outcome, a practice that is standard in probability theory. In most of this book, ω will stand for (angular) frequency; no confusion should arise from this.


The function (sequence) sample space C (R∞) is simple. It can be used as sample space for a specified experiment for which the result is a function or sequence of numbers. We have to define a unique probability measure for each experiment.

1.2 Events, probabilities and random variables

1.2.1 Events and families of events

A probability measure P assigns probabilities to certain events, i.e. subsets, in the sample space Ω, in such a way that Kolmogorov's probability axioms are satisfied.² Thus, if a subset A has a probability, then also its complement A∗ has a probability, and P(A∗) = 1 − P(A), and further, if A is disjoint with B, i.e. A ∩ B = ∅, and B has probability P(B), then also A ∪ B has a probability, and P(A ∪ B) = P(A) + P(B). These requirements lead to the conclusion that probabilities have to be defined at least on a certain minimal family of subsets of Ω.

Definition 1:1 A family F0 of subsets of an arbitrary space Ω is called a field if it contains the whole set Ω and is closed³ under the set operations complement, A∗, union, A ∪ B, and intersection, A ∩ B. It then also contains all unions of finitely many sets A1, . . . , An in F0. A field F of subsets is called a σ-field if it contains all countable unions and intersections of its sets. The terms algebra and σ-algebra are also used, instead of field and σ-field.

To every collection A of subsets of Ω there is always a unique smallest field F0 that contains all the sets in A. Similarly, there exists a (unique) smallest σ-field F that contains all A-sets. That σ-field F is said to be generated by A, and it is denoted F = σ(A).

Example 1:1 (Fields and σ-fields in R) The simplest useful field F0 of subsets of the real line R consists of all finite half-open intervals, a < x ≤ b, together with unions of a finite number of such intervals. In order for F0 to be a field, it is required that it also contains the complements of such unions. To anticipate the introduction of probabilities and random variables, we can remark here that F0 is a natural family of sets, since distribution functions can be used to assign probabilities to intervals.

The smallest σ-field that contains all half-open intervals is called the Borel field in R. It is denoted B, and its sets are called Borel sets.

² See Appendix A.
³ That is, if a set A is in the family F0, then also the complement A∗ belongs to F0, etc.


Example 1:2 (Fields and σ-fields in Rn) Intervals in R correspond to n-dimensional rectangles in Rn, and the smallest interesting field in Rn consists of unions of finitely many half-open rectangles, with sides (ai, bi], i = 1, . . . , n, and the complements of such unions. As in R, the σ-field generated by F0 is called the Borel field in Rn. It is denoted by Bn and its sets are the Borel sets in Rn.

One could note here that it is possible to start with more general "rectangles", where the "sides" are real Borel sets instead of intervals, i.e. sets B1 × B2 × . . . × Bn, where the Bj are one-dimensional Borel sets. However, even if these generalized rectangles form a richer class than the simple rectangles, the smallest σ-field that contains all such generalized rectangles is exactly equal to Bn.

1.2.2 Probabilities

Probabilities are defined for events, i.e. subsets of a sample space Ω. By a probability measure is meant any function P defined for every event in a field F0, such that

0 ≤ P(A) ≤ 1, P(∅) = 0, P(Ω) = 1,

and such that, first of all, for any finite number of disjoint events A1, . . . , An in F0 one has

P(A1 ∪ . . . ∪ An) = P(A1) + . . . + P(An). (1.1)

That is, probabilities are finitely additive. In order to deal with limiting events and with infinity, they are also required to be countably additive, i.e. equation (1.1) holds for infinitely many disjoint events, i.e. it holds with n = ∞,

P(∪∞k=1 Ak) = Σ∞k=1 P(Ak)

for all disjoint events Ak ∈ F0 such that ∪∞k=1 Ak ∈ F0.

As remarked, it is easy to assign probabilities to intervals, and unions of intervals, simply by taking

P((a, b]) = F(b) − F(a),

for some distribution function F. By additivity and the properties of fields, one then also assigns probabilities to the field F0 of finite unions of intervals.

A natural question to ask is whether this also produces probabilities for the events in the σ-field F generated by F0. In fact, it does, and in a unique way:


Extension of probability measures: Every probability measure P, defined and countably additive on a field F0, can be extended to be defined for every event in the σ-field F generated by F0. This can be done in one way only. This means that a probability measure on the real Borel sets is uniquely determined by its values on the half-open intervals, i.e. it depends only on the values of the function

F(x) = P((−∞, x]), F(b) − F(a) = P((a, b]).

Probabilities on the Borel sets in Rn are similarly uniquely determined by the n-dimensional distribution function

F(x1, . . . , xn) = P((−∞, x1] × . . . × (−∞, xn]). (1.2)

For example, for a bivariate variable (x1, x2),

P((a1, b1] × (a2, b2]) = F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2).

The probability measure P is defined on the measurable space (Ω, F), and sets in the σ-field F are called the measurable sets. For a proof of the existence of a unique extension, see Appendix A.

Remark 1:1 The completion of a probability measure P is obtained as follows. Suppose P is defined on (Ω, F), i.e. it assigns a probability P(A) to every A in F. Now, if there is an event B with P(B) = 0, then it seems natural to assign probability 0 to any smaller set B′ ⊂ B. Unfortunately, subsets of measurable sets are not necessarily measurable, so one cannot immediately conclude that P(B′) = 0. However, no other choice is possible, and it is also easy to extend F to a σ-field that contains all subsets of F-sets B with P(B) = 0. The extended probability measure is called a complete probability measure.

1.2.3 Random variables and random sequences

1.2.3.1 A random variable and its distribution

A random variable is just a real-valued function x(ω), ω ∈ Ω, on a probability space (Ω, F, P), such that it is possible to talk about its distribution, i.e. the probability

P(x ≤ a) = P({ω; x(ω) ≤ a})

is defined for all real a. This means that the set (event)

Aa = x−1((−∞, a]) = {ω; x(ω) ≤ a}

is a member of the family F, for all a ∈ R. This is equivalent to the seemingly more general statement that

x−1(B) ∈ F for all Borel sets B ∈ B, (1.3)


and of course it holds that

P (x−1(B)) = Prob(x ∈ B).

The requirement (1.3) is the formal definition of a random variable: a random variable is a Borel measurable function.

If x is a random variable on (Ω, F, P), then we write Px for the probability measure on (R, B) that is defined by

Px(B) = P (x−1(B)).

It is possible to define several random variables x1, x2, . . . , xn on the same probability space (Ω, F, P), and there is no difficulty in letting n = ∞. In that case we call the sequence {xn}∞n=1 a stochastic process with discrete time, or a random sequence.

1.2.3.2 The σ -field generated by random variables

When x is a random variable on (Ω, F, P), the ω-set {ω ∈ Ω; x(ω) ≤ a} belongs to F and hence it has a probability Prob(x ≤ a). Furthermore, all sets of the type x−1(B), where B is a Borel set, belong to F. In fact, the family of such Ω-sets is a σ-field, and it is denoted F(x) or σ(x). It is obvious that F(x) ⊂ F and we already know that P assigns a probability to these sets. If x were the only random variable of interest to us, we could have worked on the probability space (Ω, F(x), P). The reason for using a general, usually larger, σ-field F is that it allows us perfect freedom to include any further random variable without changing either the σ-field or the probability measure.

Another characterization of F(x) is that it is the smallest σ-field on Ω that makes the function x measurable, i.e. a random variable. The σ-field F(x) is called the σ-field generated by the random variable x.

When there are several random variables x1, . . . , xn, n ≤ ∞, there will be a smallest σ-field, denoted F(x1, . . . , xn), that contains all the sub-σ-fields F(xj). It is the smallest σ-field that makes all the xj random variables.

Remark 1:2 When we have the σ-field generated by a random variable x we have got our first opportunity to really construct a probability measure, in the sense that we can define the values of P(A) for certain events A ∈ F. If F is a distribution function⁴ on R and x is a Borel measurable function, i.e. a random variable, then

P(Aa) = P({ω ∈ Ω; x(ω) ≤ a}) = F(a)

defines probabilities on the sub-class of events Aa, and that can be extended to a probability measure on the σ-field F(x).

⁴ i.e. F is non-decreasing, right-continuous, with 0 ≤ F(x) ≤ 1, limx→−∞ F(x) = 0, and limx→∞ F(x) = 1.


1.2.4 Conditional expectation

Here we will give an elementary definition of the important concept of conditional expectation. A more general definition will be introduced in Section 5.6.1, but here the simple definition is sufficient.

If x, y are two random variables, where y may be multivariate, with joint density f(x, y) and marginal y-density f(y) = ∫u f(u, y) du, the conditional expectation of x given y = v is a random variable defined as a function of y, for ω such that y(ω) = v and f(v) > 0, as

ϕ(v) = E(x | y = v) = ∫u u f(u, v)/f(v) du. (1.4)

For outcomes such that f(y) = 0, ϕ(y) can be defined arbitrarily. We write E(x | y) = ϕ(y).

It satisfies

E(x) = E(ϕ(y)) = E(E(x | y)) = ∫y ϕ(y)f(y) dy, (1.5)

V(x) = E(V(x | y)) + V(E(x | y)), (1.6)

where V(x | y) = ∫x (x − ϕ(y))² f(x | y) dx. The reader should show this, and the following important theorem:

Theorem 1:1 The best predictor of x given y in least squares sense is given by ϕ(y), i.e.

E((x − ϕ(y))²) ≤ E((x − ψ(y))²)

for every function ψ(y).
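As a quick numerical illustration of Theorem 1:1 (a sketch added here, not part of the original notes; the bivariate model, the competing predictor ψ, and the sample size are arbitrary choices), the following compares the mean square error of the conditional mean ϕ(y) = E(x | y) with that of another function ψ(y), for a model where ϕ is known explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative model assumption: y ~ N(0,1) and x = y^2 + e with e ~ N(0,1)
# independent of y, so that phi(y) = E(x | y) = y^2.
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)

phi = y**2           # the conditional mean, optimal in least squares sense
psi = 1.0 + 0.5 * y  # an arbitrary competing predictor

print("MSE of phi(y):", np.mean((x - phi)**2))  # close to V(e) = 1
print("MSE of psi(y):", np.mean((x - psi)**2))  # larger, as Theorem 1:1 states
```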

1.3 Stochastic processes

1.3.1 Stochastic processes and finite-dimensional distributions

We are now ready to define stochastic processes in general. Remember that we have already defined infinite sequences of random variables, y = {xn}∞n=1, defined on the same probability space (Ω, F, P). Here, each xj is a real-valued function on Ω, i.e. xj(ω) ∈ R.

There is no further difficulty in considering more than countably many random variables at the same time, and letting t denote a general parameter taking values in a parameter space T. Thus we can consider a family of functions, {x(t, ω) ∈ R}t∈T, where each x(t) = x(t, ·) is a random variable, i.e. a measurable function from Ω to R. Hence it has a distribution with a distribution function on R, which we denote F(·; t), i.e.

F (a; t) = Prob(x(t) ≤ a).


Taking several variables, at times t1, . . . , tn, one gets an n-variate random variable

(x(t1), . . . , x(tn))

with an n-variate distribution in Rn,

F(a1, . . . , an; t1, . . . , tn) = Prob(x(t1) ≤ a1, . . . , x(tn) ≤ an).

We write Ftn for the n-dimensional distribution function of any vector (x(t1), . . . , x(tn)).

We summarize the terminology in a formal, but simple, definition.

Definition 1:2 Let T be a parameter set. A stochastic process {x(t)}t∈T indexed by the parameter t ∈ T is a family of random variables x(t) defined on one and the same probability space (Ω, F, P). In other words, a stochastic process is a function

T × Ω ∋ (t, ω) ↦ x(t, ω) ∈ R,

such that for fixed t = t0, x(t0, ·) is a random variable, i.e. a Borel measurable function, Ω ∋ ω ↦ x(t0, ω) ∈ R, and for fixed ω = ω0, x(·, ω0) is a function T ∋ t ↦ x(t, ω0) ∈ R.

The family {Ftn}∞n=1 of finite-dimensional distributions is the family of distribution functions

F(a1, . . . , an; t1, . . . , tn) = Prob(x(t1) ≤ a1, . . . , x(tn) ≤ an); n = 1, 2, . . . ; tj ∈ T.

The finite-dimensional distributions in {Ftn}∞n=1 of a stochastic process satisfy some trivial conditions to make sure they are consistent with each other, of the type

F(a1, a2; t1, t2) = F(a2, a1; t2, t1),
F(a1, ∞; t1, t2) = F(a1; t1).

By this definition we have the following concepts at our disposal in the three scenes from Section 1.1:

sample space               events            probability
abstract space: Ω          σ-field F         P
continuous functions: C    ???               ???
real sequences: R∞         ???               ???
real vectors: Rn           Borel sets: Bn    Ptn from finite-dimensional distribution functions Ftn
real line: R               Borel sets: B     P from a distribution function F

In the table, the ??? indicate what we yet have to define – or even show the existence of – to reach beyond elementary probability theory, and into the world of stochastic processes.


1.3.2 The distribution of a random sequence

Our aim now is to find the events in R∞ (= all real sequences) and see how one can define a probability measure for these events. When this is done, it is legitimate to talk about the distribution of an infinite random sequence. It was this step, from probabilities for one-dimensional or finite-dimensional real sets and events, to probabilities in R∞ and probabilistic statements about infinite sequences, that was made axiomatic in Kolmogorov's celebrated Grundbegriffe der Wahrscheinlichkeitsrechnung from 1933, [20].

Generalized rectangles, intervals, and the field I

The basic requirement on the events in R∞ is that they should not be simpler than the events in the finite-dimensional spaces Rn, which means that if an event Bn ∈ Bn is expressed by means of a finite set of random variables x1, . . . , xn, then it should be an event also in the space R∞. Now, it can be written

{y = (x1, x2, . . .) ∈ R∞; (x1, x2, . . . , xn) ∈ Bn} = Bn × R × R × . . . = Bn × R∞.

A set of this form is called a generalized rectangle in R∞. Hence, we have to require that the σ-field of events in R∞ contains at least all generalized rectangles. The natural event field is exactly the smallest σ-field which contains all such sets; cf. Example 1:2. This σ-field is denoted B∞ and is called the Borel field. Symbolically, we can write

B∞ = σ (∪∞n=1 (Bn × R∞)) .

A particularly simple form of rectangles are the intervals, which are sets of the form

I = (a1, b1] × (a2, b2] × . . . × (an, bn] × R∞,

where each (aj, bj] is a half-open interval. Thus, the sequence x = (x1, x2, . . .) belongs to the interval I if

a1 < x1 ≤ b1, a2 < x2 ≤ b2, . . . , an < xn ≤ bn. (1.7)

Sets which are unions of a finite number of intervals will be important later; they form a field, which we denote I. The σ-field generated by I is exactly B∞, i.e.

σ(I) = B∞.

Probabilities on R∞

The next step is to assign probabilities to the events in B∞, and this can be done in either of two ways, from the abstract side or from the observable, finite-dimensional side:


from a random sequence: if the probability space (Ω, F, P) is given a priori, and

y = {xn}∞n=1

is a random sequence, i.e. a function from Ω to R∞, then a probability measure Py is defined on (R∞, B∞) by

Py(B) = P(y−1(B)), for B ∈ B∞.

Thus, each random sequence y produces a probability measure Py on (R∞, B∞).

from a family of finite-dimensional distributions: if a consistent family of finite-dimensional distributions

F = {Ftn}∞n=1

is given a priori, then one can define probabilities PF for all half-open n-dimensional intervals in R∞, by, for n = 1, 2, . . ., taking (cf. (1.2))

PF((a1, b1] × . . . × (an, bn] × R∞) = Pn((a1, b1] × . . . × (an, bn]).

Here the probability measure Pn on (Rn, Bn) is uniquely defined by the distribution functions in F. Now, it remains to show that this will give us a countably additive probability measure on the field I of finite unions of intervals. By the extension property of probability measures on fields, one can then conclude that there is a unique probability measure P on (R∞, B∞) that has F as finite-dimensional distributions. The proof of the countable additivity is a significant part of Kolmogorov's existence theorem for stochastic processes; see Appendix A.

By this, we have defined events in R∞ and know how to define probability measures on (R∞, B∞). What to remember here is in particular that every probability measure on (R∞, B∞) is uniquely determined by its finite-dimensional distributions.

1.3.3 The continuous parameter case

Now we shall investigate stochastic processes with a continuous, one-dimensional time parameter, t in a real interval T. By definition, it is a family of random variables {x(t)}t∈T, defined on the same probability space (Ω, F, P), i.e. it is a function of time t ∈ T and outcome ω ∈ Ω, measurable as a function of ω for fixed t.

Even if this definition is simple and innocent – obviously such processes exist – the practical application needs some care. The sample space Ω is an abstract space and a mathematical construction, and the link to reality is provided by the random variables. In an experiment, one can observe the values of one or more random variables, x1, x2, etc., and also find their distribution, by some statistical procedure. There is no serious difficulty in allowing the outcome to be any real number, and in defining probability distributions on R.

When the result of an experiment is a function with continuous parameter, the situation is more complicated. In principle, all functions of t ∈ T are potential outcomes, and the sample space of all functions on T is simply too big to allow any sensible probabilistic structure. There are too many possible realizations that ask for probability.

Here practice comes to our assistance. In an experiment one can only observe the values of x(t) at a finite number of times, t1, t2, . . . , tn; with n = ∞ we allow an unlimited series of observations. The construction of processes with continuous time is built on exactly this fact: the observable events are those which can be defined by countably many x(tj), j ∈ N, and the probability measure shall assign probabilities to only such events.

Write RT for the set of all real-valued functions of t ∈ T. By an interval in RT is meant any set of functions x(t) which are characterized by finitely many inequalities of the same type as (1.7),

a1 < x(t1) ≤ b1, a2 < x(t2) ≤ b2, . . . , an < x(tn) ≤ bn,

the only difference being that now t1, . . . , tn are any n time points in T. The Borel field in RT is the smallest σ-field that contains all intervals,

BT = σ(I).

1.3.3.1 Sets with countable basis

One may wonder how far the Borel sets in RT are from the intervals. The intervals were characterized by some restriction on function values at a finite number of times. A set C ⊆ RT which is characterized by function values at a countable set of times, T′ = (t1, t2, . . .), is said to have a countable basis. More precisely, C ⊆ RT has a countable basis T′ if there is a Borel set B ⊂ R∞ (with B ∈ B∞), such that

x ∈ C if and only if (x(t1), x(t2), . . .) ∈ B.

The Borel sets in RT are exactly those sets which have a countable basis, i.e.

BT = {C ⊂ RT; C has a countable basis}.

We show this, as an example of a typical σ-field argument.

First, it is clear that if B is a Borel set in R∞ , then

C = {x ∈ RT; (x(t1), x(t2), . . .) ∈ B}

is a Borel set in RT, since BT contains all intervals with base in T′, and hence all sets in the σ-field generated by those intervals. This shows that

{C ⊂ RT ;C has a countable basis} ⊆ BT .


To show the other inclusion we show that the family of sets with countable basis is a σ-field which contains the intervals, and then it must be at least as large as the smallest σ-field that contains all intervals, namely BT. First, we note that taking complements still gives a set with countable basis. Then, take a sequence C1, C2, . . . of sets, all with countable basis, and let T1, T2, . . ., with Tj = {t(j)1, t(j)2, . . .}, be the corresponding countable sets of time points, so that

Cj = {x ∈ RT; (x(t(j)1), x(t(j)2), . . .) ∈ Bj}, with Bj ∈ B∞.

Then T′ = ∪j Tj is a countable set, T′ = (t′1, t′2, . . .), and ∪∞j=1 Cj is characterized by its values on T′.

Example 1:3 Here are some examples of function sets with and without countable basis, when T = [0, 1]:

• {x ∈ RT ; limn→∞ x(1/n) exists} ∈ BT ,

• {x ∈ RT ; limt→0 x(t) exists} /∈ BT ,

• {x ∈ RT ;x is a continuous function} /∈ BT ,

• {x ∈ RT; x(t) ≤ 2 for all rational t} ∈ BT.

1.3.3.2 Approximation by finite-dimensional events

The events in the σ-field B∞ in R∞ can be approximated in probability by finite-dimensional sets. If (R∞, B∞, P) is a probability space, and B ∈ B∞, then for every ε > 0, there is a finite n and an event Bn ∈ Bn such that

P (BΔBn) ≤ ε,

where Bn = {x ∈ R∞; (x1, . . . , xn) ∈ Bn} and A Δ B = (A − B) ∪ (B − A).

Similarly, events in BT in RT can be approximated arbitrarily closely by events defined by the values of x(t) for a finite number of t-values: P(B Δ Bn) ≤ ε, with

Bn = {x ∈ RT; (x(t1), . . . , x(tn)) ∈ Bn}.

Remember that every probability measure on (R∞, B∞) is uniquely determined by its finite-dimensional distributions, which implies that also every probability measure P on (RT, BT) is determined by the finite-dimensional distributions, {Ftn}∞n=1. In particular, the probability

P(limn→∞ x(t0 + 1/n) exists and is equal to x(t0))

is determined by the finite-dimensional distributions. Unfortunately, x(t0 + 1/n) → x(t0) as n → ∞ is almost, but not quite, the same as x(t) → x(t0) as t → t0. To deal with sample function continuity we need a more refined construction of the probability measure from the finite-dimensional distributions.


1.4 Stationary processes and fields

This section summarizes some elementary notation for stationary processes. More details, properties, and proofs of the most important facts will be given in Chapter 4.

To avoid too cumbersome notation, we from now on allow ourselves to talk about "the process x(t)" when we should have used the notation {x(t), t ∈ R} for the process. If we mean "the random variable x(t)" we will say so explicitly.

1.4.1 Stationary processes

A stochastic process x(t) is strictly stationary if all n-dimensional distributions of

x(t1 + τ), . . . , x(tn + τ)

are independent of τ. It is called weakly stationary (the term second order stationary is also used) if its mean is constant, E(x(t)) = m, and if its covariance function

r(t) = Cov(x(s + t), x(s)),

is a function only of the time lag t. Every continuous covariance function has a representation as a Fourier integral,

r(t) = ∫∞−∞ exp(iωt) dF(ω), (1.8)

where the function F(ω) is called the spectral distribution function. It is characterized by the properties:

• symmetry: dF (−ω) = dF (ω),

• monotonicity: ω ≤ ω′ implies F (ω) ≤ F (ω′),

• boundedness: F (+∞) − F (−∞) = r(0) < ∞ .

As indicated by the way we write the three properties, F(ω) is defined only up to an additive constant, and we usually take F(−∞) = 0. The spectral distribution function is then equal to a cumulative distribution function multiplied by a positive constant, equal to the variance of the process.

If F(ω) is absolutely continuous with F(ω) = ∫ω−∞ f(s) ds, then the spectrum is said to be (absolutely) continuous, and f(ω) is the spectral density function; see Section 5.6.2.1 for more discussion of absolute continuity.

The spectral moments are defined as

ωk = ∫∞−∞ |ω|^k dF(ω).

Note that the odd spectral moments are defined as absolute moments. Since F is symmetric around 0, the signed odd moments are always 0.


[Figure: simulation, spectrum, and covariance panels for the narrow, moderate, and low frequency spectra.]

Figure 1.2: Processes with narrow band spectrum, moderate width JONSWAP wave spectrum, and low frequency white noise spectrum.

Spectral moments may be finite or infinite. As we shall see in the next chapter, the finiteness of the spectral moments is coupled to the smoothness properties of the process x(t). For example, the process is differentiable (in quadratic mean), see Section 2.1, if ω2 = −r″(0) < ∞, and similarly for higher order derivatives.
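The spectral moments and the covariance function are both determined by the spectral distribution through (1.8), so they can be computed numerically from a given spectral density. The sketch below is an illustration added here, not part of the notes; the Gaussian-shaped density is an arbitrary choice for which r(t) = exp(−t²/2) and ω2 = 1, and it checks that ω2 agrees with −r″(0):

```python
import numpy as np

# Illustrative assumption: spectral density f(w) = exp(-w^2/2)/sqrt(2*pi),
# so that r(t) = exp(-t^2/2), omega_2 = 1, and -r''(0) = 1.
w = np.linspace(-10, 10, 20001)
dw = w[1] - w[0]
f = np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)

def r(t):
    # r(t) = integral of exp(i*w*t) f(w) dw; real because f is symmetric.
    return np.sum(np.cos(w * t) * f) * dw

omega_0 = np.sum(f) * dw          # = r(0), the variance
omega_2 = np.sum(w**2 * f) * dw   # second spectral moment

h = 1e-3
minus_r_bis = -(r(h) - 2 * r(0) + r(-h)) / h**2   # numerical -r''(0)

print(f"r(0) = {omega_0:.4f}, omega_2 = {omega_2:.4f}, -r''(0) = {minus_r_bis:.4f}")
```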

As we shall see in later sections, ω is in a natural way interpreted as an angular frequency, not to be confused with the elementary event ω in basic probability theory.

Example 1:4 Here is a first example of the visual characteristics of spectrum, covariance function, and sample function. Figure 1.2 illustrates one very narrow spectrum, one realistic water wave spectrum, and one "low frequency white noise" spectrum. Figure 1.3 shows the output of a linear oscillator driven by white noise, with different relative damping; see Section 4.4.3.3 and Example 4:10.
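A sample function with a prescribed spectrum, like those in Figure 1.2, can be generated by summing cosines with random phases, in the spirit of the spectral simulation methods described in Appendix D. The following sketch is only an illustration; the narrow band spectral density and all parameter values are assumptions made here, not taken from the notes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Assumed one-sided spectral density: a narrow band around omega = 1 rad/s.
omega = np.linspace(0.01, 4, 400)
d_omega = omega[1] - omega[0]
f = np.exp(-0.5 * ((omega - 1.0) / 0.1) ** 2)

# Random phase representation (cf. Appendix D):
# x(t) = sum_k sqrt(2 f(omega_k) d_omega) cos(omega_k t + phi_k), phi_k uniform.
phi = rng.uniform(0, 2 * np.pi, size=omega.size)
amp = np.sqrt(2 * f * d_omega)

t = np.linspace(0, 100, 2000)
x = (amp[:, None] * np.cos(np.outer(omega, t) + phi[:, None])).sum(axis=0)

plt.plot(t, x)
plt.xlabel("t")
plt.title("Random phase simulation of a narrow band process")
plt.show()
```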

1.4.2 Random fields

[Figure: spectrum, covariance, and simulation panels for a harmonic random oscillator with relative damping ζ = 0.01, ζ = 0.1, and ζ = 0.5.]

Figure 1.3: Harmonic oscillator with different relative damping ζ.

A random field is a stochastic process x(t) with a multi-dimensional parameter t = (t1, . . . , tp) ∈ T, which can be discrete or continuous. For example, if

t = (t1, t2) is two-dimensional we can think of (t1, t2, x(t)), (t1, t2) ∈ R2, as a random surface. The mean value and covariance functions are defined in the natural way, m(t) = E(x(t)) and r(t, u) = C(x(t), x(u)).

A random field is called homogeneous if it has constant mean value m(t) = m and the covariance function r(t, u) depends only on the vector t − u between the two observation points, i.e. assuming m = 0,

r(t) = r(u + t,u) = E(x(u + t) · x(u)).

The covariance of the process values at two parameter points depends on the distance as well as on the direction of the vector between the two points.

If the covariance between x(u) and x(v) depends only on the distance τ = ‖u − v‖ between the observation points and not on the direction, the field is called isotropic. This requirement poses severe restrictions on the covariance function, as we shall see in Chapter 6, where random fields are treated in more detail.

1.5 Gaussian processes

1.5.1 Multivariate normal distributions and Gaussian processes

Definition 1:3 A vector ξ = (ξ1, . . . , ξp)′ of p random variables is said to have a p-variate Gaussian (normal) distribution if every linear combination of its components, a′ · ξ = Σk akξk, has a normal distribution. The variables ξ1, . . . , ξp are then said to be "jointly normal".

With mean vector m = E(ξ) and covariance matrix

Σ = Cov(ξ; ξ) = E((ξ − m) · (ξ − m)′),

the variance of a′ · ξ is

V(a′ · ξ) = a′Σa.

If the determinant of Σ is positive, the distribution of ξ is non-singular and has a density

fξ(x) = 1 / ((2π)p/2 √(det Σ)) · exp(−(1/2)(x − m)′ Σ−1 (x − m)).

If the determinant is zero, the distribution of ξ is concentrated on a linear subspace of Rp and there exists at least one linear relationship between the components, i.e. there is at least one a for which a′ · ξ is a constant.

Definition 1:4 A stochastic process {x(t), t ∈ R} is a Gaussian process if every linear combination

S = Σk ak x(tk),

for real ak and tk ∈ R, has a Gaussian distribution.

It is an easy consequence of the definition that the derivative of a Gaussian process is also Gaussian (when it exists), since it is the limit of the Gaussian variable zh = (x(t + h) − x(t))/h as h → 0. For a stationary Gaussian process {x(t), t ∈ R} the mean of zh is 0 and its variance is V(zh) = 2(r(0) − r(h))/h². As we shall prove in Section 2.4.2, this converges to ω2 = ∫ ω² dF(ω) ≤ ∞. The derivative exists only if this limit is finite.

Also the integral of a Gaussian process is a Gaussian variable; conditions for the existence will be given in Section 2.6.

1.5.1.1 Conditional normal distributions

The multivariate normal distribution has the very useful property that, conditioned on observations of a subset of the variables, the unobserved variables are also normal. Further, the conditional mean is linear in the observations while variances and covariances are independent of the observations.

Let ξ = (ξ1, . . . , ξn)′ and η = (η1, . . . , ηm)′ be two jointly Gaussian vectors with mean values

E(ξ) = mξ, E(η) = mη,

and with covariance matrix (with Σξη = Σ′ηξ )

Σ = Cov((ξ, η); (ξ, η)) = ( Σξξ  Σξη
                            Σηξ  Σηη ).

If the determinant of the covariance matrix Σ is positive, then the distribution of (ξ, η) has a non-singular density

fξη(x, y) = 1 / ((2π)(m+n)/2 √(det Σ)) · exp(−(1/2)(x − mξ, y − mη) Σ−1 (x − mξ, y − mη)′).

The density of η is

fη(y) = 1 / ((2π)m/2 √(det Σηη)) · exp(−(1/2)(y − mη) Σ−1ηη (y − mη)′),

and the conditional density fξ|η(x | y), defined as

fξ|η(x | y) = fηξ(y, x) / fη(y),

is also Gaussian, with conditional mean

E(ξ | η = y) = ξ̂(y) = E(ξ) + Cov(ξ, η) Σ−1ηη (y − E(η))′ (1.9)
             = mξ + Σξη Σ−1ηη (y − mη)′. (1.10)

The conditional covariance is

Σξξ|η = E((ξ − ξ̂(η)) · (ξ − ξ̂(η))′) = Σξξ − Σξη Σ−1ηη Σηξ. (1.11)

In two dimensions the formulas read

mx|y = mx + σx ρxy (y − my)/σy,
σ²x|y = σ²x (1 − ρ²xy),

with ρxy = Cov(x, y)/√(V(x)V(y)); thus the squared correlation ρ²xy gives the relative reduction of the variability (uncertainty) in the random variable x gained by observation of y.

Observe the mnemotechnical friendliness of these formulas. For example, the covariance matrix Σξξ|η has dimension n × n and the configuration on the right hand side of (1.11) is the only way to combine the matrices involved that matches their dimensions – of course, you have to remember the general structure.
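For completeness, here is a small numerical sketch of (1.9) and (1.11) (an illustration added here; the function name and the two-dimensional example values are choices made for this sketch, not notation from the notes):

```python
import numpy as np

def conditional_normal(m_xi, m_eta, S_xixi, S_xieta, S_etaeta, y):
    """Conditional mean (1.9) and covariance (1.11) of xi given eta = y."""
    # Solve Sigma_etaeta * z = (y - m_eta) rather than forming the inverse.
    z = np.linalg.solve(S_etaeta, y - m_eta)
    cond_mean = m_xi + S_xieta @ z
    cond_cov = S_xixi - S_xieta @ np.linalg.solve(S_etaeta, S_xieta.T)
    return cond_mean, cond_cov

# Two-dimensional check against the formulas above, with correlation rho.
sx, sy, rho = 2.0, 1.0, 0.6
m, C = conditional_normal(
    m_xi=np.array([0.0]), m_eta=np.array([0.0]),
    S_xixi=np.array([[sx**2]]),
    S_xieta=np.array([[rho * sx * sy]]),
    S_etaeta=np.array([[sy**2]]),
    y=np.array([1.5]),
)
print(m)  # m_x + sx*rho*(y - m_y)/sy = 1.8
print(C)  # sx^2 (1 - rho^2) = 2.56
```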


1.5.2 Linear prediction and reconstruction

Prediction and reconstruction are two of the most important applications of stationary process theory. Even though these problems are not main topics in this work, we present one of the basic concepts here; in Section 5.6 we will deal with the more philosophical sides of the prediction problem.

Suppose we have observed the outcomes of a set of random variables, η = (η1, . . . , ηm), and that we want to give a statement ξ̂ about the outcome of some other variable ξ, either to be observed sometime in the future, or perhaps a missing observation in a time series. These two cases constitute the framework of prediction and reconstruction, respectively. Also suppose that we want to make the statement in the best possible way in the mean square sense, i.e. we want E((ξ̂ − ξ)²) to be as small as possible.

Now we know from Theorem 1:1 that the best solution in mean square sense is given by the conditional expectation, ξ̂ = E(ξ | η) = ϕ(η). On the other hand, if the variables are jointly Gaussian, then we know from Section 1.5, formula (1.9), that the conditional expectation of ξ given η is linear in η, so that for Gaussian variables the optimal solution is

ξ = E(ξ | η) = mξ + ΣξηΣ−1ηη(η − mη), (1.12)

and expression that depends only on the the mean values and the second ordermoments, i.e. variances and covariances.

We now look at the general case, without assuming normality, and restrict ourselves to solutions that are linear functions of the observed variables. It is clear that the solution that is optimal in the mean square sense only depends on the mean values and variances/covariances of the variables. It therefore has the same form for all variables with the same first and second order moments. Thus, (1.12) gives the best linear predictor in mean square sense.
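A hedged simulation check of this claim (the covariance values below are chosen arbitrarily by me): the empirical mean square error of the linear predictor (1.12) should agree with the residual variance given by (1.11).

```python
import numpy as np

rng = np.random.default_rng(1)
S = np.array([[2.0, 0.8, 0.3],      # joint covariance of (xi, eta1, eta2), zero means
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
z = rng.multivariate_normal(np.zeros(3), S, size=200_000)
xi, eta = z[:, 0], z[:, 1:]

coef = S[0, 1:] @ np.linalg.inv(S[1:, 1:])     # Sigma_{xi,eta} Sigma_{eta,eta}^{-1}
xi_hat = eta @ coef                            # best linear predictor (1.12)
print(np.mean((xi - xi_hat) ** 2))             # empirical mean square error
print(S[0, 0] - coef @ S[1:, 0])               # theoretical residual variance, cf. (1.11)
```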

1.5.3 Some useful inequalities

We shall in the next chapter need the following inequality for the normal density and distribution functions, $\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, $\Phi(x) = \int_{-\infty}^x \varphi(y)\,dy$:

$$\varphi(x)\Big(\frac{1}{x} - \frac{1}{x^3}\Big) \le 1 - \Phi(x) \le \varphi(x)\,\frac{1}{x}, \qquad (1.13)$$

for $x > 0$. The following asymptotic expansion is useful as $x \to \infty$,

$$1 - \Phi(x) \sim \varphi(x)\Big(\frac{1}{x} - \frac{1}{x^3} + \frac{1\cdot 3}{x^5} - \frac{1\cdot 3\cdot 5}{x^7} + \dots + (-1)^k \frac{1\cdot 3\cdots(2k-1)}{x^{2k+1}}\Big).$$

Here the right hand side overestimates $1 - \Phi(x)$ for $x > 0$ if $k$ is even and underestimates it if $k$ is odd. More precisely, the difference between the left and right hand side is of the same order as $\varphi(x)/x^{2k+3}$ as $x \to \infty$. (The sign $\sim$ means that the ratio between the left and the right hand side goes to 1.)
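A quick numerical check of the bounds in (1.13), using scipy's normal distribution routines (a throwaway sketch, not part of the text):

```python
import numpy as np
from scipy.stats import norm

for x in (1.0, 2.0, 4.0):
    tail = norm.sf(x)                          # 1 - Phi(x)
    lower = norm.pdf(x) * (1 / x - 1 / x**3)   # lower bound in (1.13)
    upper = norm.pdf(x) / x                    # upper bound in (1.13)
    print(x, lower <= tail <= upper, (lower, tail, upper))
```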


1.6 Some historical landmarks

This section contains a personal selection of research achievements that have shaped the theory of stationary processes in general. The examples are chosen not only because they are important in their respective fields, but also because they illustrate the necessity of exchange of ideas between probability and statistics theory and applications.

1.6.1 Brownian motion and the Wiener process

There is no other Gaussian process with as wide applicability as the Wiener process. Even if it is a non-stationary process, it appears repeatedly in the theory of stationary processes, and we spend this section describing some of its properties and applications.

Definition 1:5 The Wiener process $\{w(t);\, t \ge 0\}$ is a Gaussian process with $w(0) = 0$ such that $E(w(t)) = 0$, and the variance of the increment $w(t+h) - w(t)$ over any interval $[t, t+h]$, $h > 0$, is proportional to the interval length,

$$V(w(t+h) - w(t)) = h\sigma^2.$$

A Wiener process $\{w(t),\, t \in \mathbb{R}\}$ over the whole real line is a combination of two independent Wiener processes $w_1$ and $w_2$, so that $w(t) = w_1(t)$ for $t \ge 0$ and $w(t) = w_2(-t)$ for $t < 0$.

It is an easy consequence of the definition that the increment $w(t) - w(s)$ is uncorrelated with $w(s)$ for $s < t$,

$$V(w(t)) = V(w(s)) + V(w(t) - w(s)) + 2\,\mathrm{Cov}(w(s), w(t) - w(s)),$$

and that therefore $\mathrm{Cov}(w(s), w(t)) = \sigma^2 \min(s, t)$. Since the increments over disjoint intervals are normal, by the definition of a normal process, they are also independent.

A characteristic feature of the Wiener process is that its future changes are statistically independent of its actual and previous values. It is intuitively clear that a process with this property cannot be differentiable. The increment over a small time interval from $t$ to $t+h$ is of the order $\sqrt{h}$, which is small enough to make the process continuous, but it is too large to give a differentiable process.5

5 Chapter 2 gives conditions for continuity and differentiability of sample functions.

The sample functions are in fact objects that have fractal dimension, and the process is self similar in the sense that when magnified with proper scales it retains its statistical geometrical properties. More precisely, for each $a > 0$, the process $\sqrt{a}\,w(t/a)$ has the same distributions as the original process $w(t)$.

The Wiener process is commonly used to model phenomena where the local changes are virtually independent. Symbolically, one usually writes $dw(t)$ for the infinitesimal independent increments, or simply $w'(t)$ for its "derivative". The Brownian motion is a good example of how one can use the Wiener process to get models with more or less physical realism.

The Brownian motion, first described in 1828 by the Scottish biologist Robert Brown, is an erratic movement of small particles immersed in a fluid, for example pollen particles in water as in Brown's original experiment. Albert Einstein presented in 1905 a quantitative model for the Brownian movement in his paper On the movements of small particles in a stationary liquid demanded by the molecular-kinetic theory of heat, reprinted in [13], based on the assumption that the movements are caused by independent impacts on the particle by the molecules of the surrounding fluid medium.

In Einstein's model the changes in location due to collisions over separate time intervals are supposed to be independent. This requires, however, that the particles have no mass, which is physically wrong, but the model is still sufficiently accurate for microscopic purposes. According to Einstein, the change in location in any of the three directions $(x, y, z)$ over a time interval of length $t$ is random and normal with mean zero, which is not surprising, since it is the result of a very large number of independent collisions. What made Einstein's contribution conclusive was that he derived an expression for the variance in terms of other physical parameters, namely

$$V(x(t)) = V(y(t)) = V(z(t)) = t\,\frac{4RT}{Nf} = t\,\sigma^2, \qquad (1.14)$$

where $T$ is the absolute temperature, $R$ is the gas constant, $N$ is Avogadro's number, i.e. the number of molecules per mole of an ideal gas, and the friction coefficient $f$ depends on the shape and size of the particle and on the viscosity of the fluid. Each coordinate is here an independent Wiener process.

Observations of the Brownian movement and estimation of its variance make it possible to calculate any of the factors in $\sigma^2$, for example $N$, from the other ones. The French physicist J.B. Perrin estimated Avogadro's number in this way in a series of experiments 1908–1911, by observing suspended rubber particles, and found an estimate correct within about 10%.

In a more realistic model, one takes also particle mass and velocity into account. If $v(t)$ denotes the velocity at time $t$, the fluid offers a resistance from the friction force, which is equal to $f v(t)$, with the friction coefficient as in (1.14). Further, the particle offers a resistance to changes in velocity proportional to its mass $m$. Finally, one needs to model the independent collisions by the fluid molecules, and here the Wiener process can be used, more precisely its increments $dw(t)$. This gives the Langevin equation for the particle velocity,

$$dv(t) + \alpha v(t)\,dt = \frac{1}{m}\,dw(t), \qquad (1.15)$$

where $\alpha = f/m$, and $w(t)$ is a standardized Wiener process. It is usually written

$$\frac{dv(t)}{dt} + \alpha\, v(t) = \frac{1}{m}\,w'(t). \qquad (1.16)$$

We will meet the Langevin equation in Example 4:3 on the Ornstein-Uhlenbeck process in Section 4.3.3.6.
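A hedged Euler-scheme sketch of the Langevin equation (1.15) (the parameter values are my own); for a standardized Wiener input the velocity settles, after an initial transient, into a stationary state with variance $1/(2\alpha m^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, m, dt, n = 2.0, 1.0, 1e-3, 200_000
v = np.empty(n)
v[0] = 0.0
dw = np.sqrt(dt) * rng.standard_normal(n - 1)     # increments of a standardized Wiener process
for k in range(n - 1):
    # Euler step for dv = -alpha*v*dt + (1/m)*dw
    v[k + 1] = v[k] - alpha * v[k] * dt + dw[k] / m
print(np.var(v[n // 2:]))     # close to 1 / (2 * alpha * m**2) = 0.25
```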

1.6.2 Rice and electronic noise

The two papers Mathematical analysis of random noise, by S.O. Rice, appeared in Bell System Technical Journal, 1944–1945, [27]. They represent a landmark in the history of stochastic processes in that they bring together and exhibit the wide applicability of the spectral formulation of a stationary process as a sum, or asymptotically an integral, of harmonic cosine functions with random amplitudes and phases. Correlation functions and their Fourier transforms had been studied at least since the early 1900s, and Rice's work brought together these results in a systematic way. But it also contained many new results, in particular pertaining to crossing-related properties, on the statistical properties of stationary processes "obtained by passing random noise through physical devices".

Rice uses the spectral representation of a stationary Gaussian process as a sum over discrete positive frequencies $\omega_n > 0$,

$$x(t) = \sum_n a_n \cos \omega_n t + b_n \sin \omega_n t = \sum_n c_n \cos(\omega_n t + \phi_n), \qquad (1.17)$$

where the amplitudes $a_n, b_n$ are normal and independent with mean 0 and $E(a_n^2) = E(b_n^2) = \sigma_n^2$, and $\phi_n$ uniformly distributed over $(0, 2\pi)$, independent of the amplitudes. As we shall see in Chapter 2 such a process has covariance

$$r(t) = \sum_n \sigma_n^2 \cos \omega_n t.$$

The spectral distribution function is a discrete distribution with point mass $\sigma_n^2/2$ at the symmetrically located frequencies $\pm\omega_n$. The absolutely continuous, integral form of the spectral representation is presented as a limiting case in Rice's work. At about the same time, Cramér gave a probabilistic formulation of the continuous spectral representation, in a mathematically impeccable way; [6, 7].
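A hedged simulation of the discrete representation (1.17), with frequencies and variances chosen arbitrarily by me; the empirical covariance at a fixed lag should approach $r(\tau) = \sum_n \sigma_n^2 \cos \omega_n \tau$.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = np.array([0.5, 1.3, 2.1])        # discrete positive frequencies
sigma2 = np.array([1.0, 0.5, 0.25])      # E(a_n^2) = E(b_n^2) = sigma_n^2
t = np.linspace(0, 50, 2001)
n_rep = 2000

# x(t) = sum_n a_n cos(omega_n t) + b_n sin(omega_n t)
a = rng.standard_normal((n_rep, 3)) * np.sqrt(sigma2)
b = rng.standard_normal((n_rep, 3)) * np.sqrt(sigma2)
x = a @ np.cos(np.outer(omega, t)) + b @ np.sin(np.outer(omega, t))

tau = 1.0
lag = int(round(tau / (t[1] - t[0])))
print(np.mean(x[:, :-lag] * x[:, lag:]))        # empirical covariance at lag tau
print(np.sum(sigma2 * np.cos(omega * tau)))     # r(tau) = sum sigma_n^2 cos(omega_n tau)
```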

Besides the previously known "Rice's formula" for the expected number of level crossings, Rice's 1945 paper also analyzed crossing and excursion distributions and investigated the joint occurrence of crossings of a fixed level at two distinct points, necessary for calculation of the variance of the number of crossings.

The flexibility and generality of Rice's methods and examples made correlation and spectral theory fundamental ingredients in communication theory and signal processing for decades to come. An example by Rice himself is the ingenious explanation of the intriguing click noise in analogue FM-radio, [28].


1.6.3 Gaussian random wave models

Steve Rice's analysis of time dependent stationary processes had, as mentioned, great influence on signal processing in the information sciences. Less well known in the statistical world is the effect his work had in oceanography and naval architecture.

It is well worth citing in extenso (references deleted) the first two paragraphs of Manley St. Denis and Willard J. Pierson's paper On the motion of ships in confused seas, which came out in 1954, [32].

History

Three years ago the first co-author of the present work collaborated with Weinblum in the writing of a paper entitled "On the motion of ships at sea". In that paper Lord Rayleigh was quoted saying: "The basic law of the seaway is the apparent lack of any law". Having made this quotation, however, the authors then proceed to consider the seaway as being composed of "a regular train of waves defined by simple equations". This artificial substitution of pattern for chaos was dictated by the necessity of reducing the utterly confused reality to a simple form amenable to mathematical treatment.

Yet at the same time and in other fields the challenging study of confusion was being actively pursued. Thus in 1945 Rice was writing on the mathematical analysis of random noise and in 1949 Tukey and Hamming were writing on the properties of stationary time series and their power spectra in connection with colored noise. In the same year Wiener published his now famous book on time series. These works were written as contributions to the theory of communication. Nevertheless the fundamental mathematical discipline expounded therein can readily be extended to other fields of scientific endeavor. Thus in 1952 the second co-author, inspired by a contribution of Tukey, was able to apply the foregoing theories to the study of actual ocean waves. As the result of analyses of actual wave records, he succeeded in giving not only a logical explanation as to why waves are irregular, but a statement as well of the laws underlying the behavior of a seaway. There is indeed a basic law of the seaway. Contrary to the obvious inference from the quotation of Lord Rayleigh, the seaway can be described mathematically and precisely, albeit in a statistical way.

If Rice's work had been in the vein of generally accepted ideas in communication theory, the St Denis and Pierson paper represented a complete revolution in common naval practice. Nevertheless, its treatment of irregular water waves as, what now is called, a random field was almost immediately accepted, and set a standard for much of naval architecture.


One possible reason for this can be that the authors succeeded in formulating and analyzing, in a rational way, the motions of a ship that moved with constant speed through the field. The random sea could directly be used as input to a linear (later also non-linear) filter representing the ship.

St. Denis and Pierson extended the one-dimensional description of a time dependent process $\{x(t),\, t \in \mathbb{R}\}$, useful for example to model the waves measured at a single point, to a random field $x(t, (s_1, s_2))$ with time and location parameter $(s_1, s_2)$. They generalized the sum (1.17) to be a sum of a packet of directed waves, with $\boldsymbol{\omega} = (\omega, \kappa_1, \kappa_2)$,

$$\sum_{\boldsymbol{\omega}} A_{\boldsymbol{\omega}} \cos(\omega t - \kappa_1 s_1 - \kappa_2 s_2 + \phi_{\boldsymbol{\omega}}), \qquad (1.18)$$

with random amplitude and phase.

For fixed $t$ each element in (1.18) is a cosine function in the plane, which is zero along lines $\omega t - \kappa_1 s_1 - \kappa_2 s_2 + \phi_{\boldsymbol{\omega}} = \pi/2 + k\pi$, $k$ integer. The parameters $\kappa_1$ and $\kappa_2$ are called the wave numbers. For fixed $(s_1, s_2)$ it is a cosine wave with (angular) frequency $\omega$.

Water waves are special cases of homogeneous random fields, for which there is a special relation between time and space frequencies (wave numbers). For a one-dimensional time dependent Gaussian wave $x(t, s)$, where $s$ is distance along an axis, the elementary waves have the form

$$A_\omega \cos(\omega t - \kappa s + \phi_\omega).$$

By physical considerations one can derive an explicit relation, called the dispersion relation, between wave number $\kappa$ and frequency $\omega$. If $h$ is the water depth, then

$$\omega^2 = \kappa g \tanh(h\kappa),$$

which for infinite depth reduces to $\omega^2 = \kappa g$. Here $g$ is the constant of gravity.

In the case of a two-dimensional time dependent Gaussian wave $x(t, s_1, s_2)$, the elementary waves with frequency $\omega$ and direction $\theta$ become

$$A_\omega \cos(\omega t - \kappa(s_1 \cos\theta + s_2 \sin\theta) + \phi_\omega),$$

where $\kappa$ is given by the dispersion relation.

The spectral distribution is often written in polar form, with spectral density

$$f(\omega, \theta) = f(\omega)\, g(\omega, \theta),$$

where the spreading function $g(\omega, \theta)$ has $\int_0^{2\pi} g(\omega, \theta)\, d\theta = 1$.
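The dispersion relation is easily inverted numerically. The sketch below (illustration only; the frequency and depth are arbitrary choices of mine) finds the wave number $\kappa$ for a given angular frequency with scipy's root finder and compares it with the deep-water limit $\kappa = \omega^2/g$.

```python
import numpy as np
from scipy.optimize import brentq

g = 9.81   # constant of gravity [m/s^2]

def wave_number(omega, depth):
    """Solve the dispersion relation omega^2 = kappa*g*tanh(depth*kappa) for kappa > 0."""
    return brentq(lambda kappa: kappa * g * np.tanh(depth * kappa) - omega**2, 1e-10, 1e4)

omega = 2 * np.pi / 8.0                      # an 8-second wave
print(wave_number(omega, depth=10.0))        # finite-depth wave number
print(omega**2 / g)                          # deep-water limit
```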

In their paper St. Denis and Pierson also laid out the theory for how the wave spectrum should be transformed to a response spectrum for the motion of a ship, and they also described how the spectrum is changed to an encountered spectrum when a ship sails with constant speed through the waves.


1.6.4 Detection theory and statistical inference

The first three landmarks illustrated the relation between stochastic model building and physical knowledge, in particular how the concepts of statistical independence and dependence between signal and functions relate to the physical world. About the same time as Rice and St Denis & Pierson advanced physically based stochastic modeling, the statistical inference methodology was placed firmly into a theoretical mathematical framework, as documented by the classical book by Harald Cramér, Mathematical methods of Statistics, 1945, [8].

A few years later, the connection between the theoretical basis for statistical inference and important engineering questions related to signal detection was elegantly exploited by Ulf Grenander in his PhD thesis from Stockholm, Stochastic processes and statistical inference, [15]. The classical problem in signal processing of deciding whether a deterministic signal of known shape $s(t)$ is present in an environment of Gaussian dependent (colored, as opposed to white) random noise $x(t)$ can be treated as an infinite dimensional decision problem, testing an infinite dimensional statistical hypothesis; see also the classical book on detection theory [33].

Suppose one observes a Gaussian stochastic process $x(t)$, $a \le t \le b$, with known correlation structure, but with unknown mean value function $m(t)$. If no signal is present, the mean value is 0, but with signal, the mean is equal to the known function $s(t)$. In statistical terms one has to test the following two hypotheses against each other:

$$H_0 : m(t) = 0,$$
$$H_1 : m(t) = s(t).$$

Grenander introduced a series of independent Gaussian observables, $y_k = \int h_k(t)\,x(t)\,dt$, by choosing the filter functions $h_k$ as solutions to the integral equation

$$\int r(s, t)\, h_k(t)\,dt = c_k h_k(s),$$

with $c_1 \ge c_2 \ge \dots$, $c_k \to 0$ as $k \to \infty$. Under $H_1$ the observables will have mean $a_k = \int h_k(t)\,m(t)\,dt$ and variance $c_k$, while under $H_0$ they will have mean 0, and the same variance. So instead of a continuous problem, we have got a denumerable problem, in which one can make a likelihood-ratio test of the two alternatives. We will return to this problem in Section 4.5, Example 4:12.
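As a hedged, discretized illustration of Grenander's construction (not the treatment in the text, which works with the integral equation itself; the covariance kernel and signal below are my own choices): on a fine grid the integral equation becomes the eigenvalue problem of the covariance matrix, and the likelihood ratio of the observables $y_k$ is easy to compute.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)
R = np.exp(-np.abs(t[:, None] - t[None, :]))     # assumed covariance r(s, t) = exp(-|s - t|)
s = np.sin(2 * np.pi * t)                        # assumed known signal shape

c, V = np.linalg.eigh(R)                         # variances c_k and orthonormal "filters" h_k
x = s + np.linalg.cholesky(R) @ rng.standard_normal(len(t))   # one sample path under H1
y = V.T @ x                                      # observables y_k (independent, variance c_k)
a = V.T @ s                                      # their means under H1 (zero under H0)
log_LR = np.sum(a * y / c) - 0.5 * np.sum(a**2 / c)
print(log_LR)                                    # large positive values support H1
```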


Exercises

1:1. Consider the sample space $\Omega = [0, 1]$ with uniform probability $P$, i.e. $P([a, b]) = b - a$, $0 \le a \le b \le 1$. Construct a stochastic process $y = (x_1, x_2, \dots)$ on $\Omega$ such that the components are independent zero-one variables, with $P(x_k = 0) = P(x_k = 1)$. What is the distribution of $\sum_{k=1}^\infty x_k/2^k$?

1:2. Show that B(R) = the class of Borel sets in R is generated by

a) the open intervals,

b) the closed intervals.

1:3. A set $A \subset \{1, 2, \dots\}$ is said to have asymptotic density $\theta$ if

$$\lim_{n\to\infty} n^{-1} |A \cap \{1, 2, \dots, n\}| = \theta.$$

(Note, $|B|$ denotes the number of elements in $B$.) Let $\mathcal{A}$ be the family of sets for which the asymptotic density exists. Is $\mathcal{A}$ a field? A $\sigma$-field?

1:4. Let $x_1, x_2, \dots$ be random variables with values in a countable set $F$, and suppose there are real constants $a_k$ such that

$$\sum_{k=1}^\infty P(x_k \ne a_k) < \infty, \qquad \sum_{k=1}^\infty a_k < \infty.$$

Prove that the sum $x = \sum_{k=1}^\infty x_k$ has a discrete distribution, i.e. there exists a countable set $D$ such that $P(x \in D) = 1$. (Hint: Use the Borel-Cantelli lemma, which says that if $\sum_k P(A_k) < \infty$ then, with probability one, only a finite number of the events $A_k$ occur.)

Show by example that it is possible for independent random variables $x_k$ to have a sum $\sum_{k=1}^\infty x_k$ with a continuous distribution, although all $x_k$ are discrete variables with a common value space – obviously they cannot be identically distributed.

1:5. Take $\mathbb{R}^n$ and motivate that the family $\mathcal{F}_0$ whose elements are unions of finitely many rectangles $(a_i, b_j]$ (with possibly infinite end points) is a field.

Let $T$ be an interval and convince yourself that the finite dimensional rectangles in $\mathbb{R}^T$, and unions of finitely many such rectangles, form a field.

1:6. Take $T = [0, 1]$, and consider the set of functions which are continuous on the rational numbers, i.e.

$$C_Q = \{x \in \mathbb{R}^T;\, x(q) \to x(q_0) \text{ for all rational numbers } q_0\},$$

where the limit is taken as $q$ tends to $q_0$ through the rational numbers. Show that $C_Q \in \mathcal{B}^T$.


1:7. Prove Theorem 1:1.

1:8. Prove that the increments of a Wiener process, as defined in Definition 1:5, are independent and normal.


Chapter 2

Stochastic analysis

This chapter is the stochastic equivalent of real analysis and integration. As in its deterministic counterpart, limiting concepts and conditions for the existence of limits are fundamental. We repeat the basic stochastic limit definitions; a summary of basic concepts and results on stochastic analysis is given in Appendix B.

Definition 2:1 Let $\{x_n\}_{n=1}^\infty$ be a random sequence, with the random variables $x_1(\omega), x_2(\omega), \dots$ defined on the same probability space as a random variable $x = x(\omega)$. Then, the convergence $x_n \to x$ as $n \to \infty$ can be defined in three ways:

• almost surely, with probability one ($x_n \overset{a.s.}{\to} x$): $P(\{\omega;\, x_n \to x\}) = 1$;

• in quadratic mean ($x_n \overset{q.m.}{\to} x$): $E(|x_n - x|^2) \to 0$;

• in probability ($x_n \overset{P}{\to} x$): for every $\varepsilon > 0$, $P(|x_n - x| > \varepsilon) \to 0$.

In Appendix B we give several conditions, necessary and sufficient, as well as only sufficient, for convergence of a random sequence $x_n$. The most useful of these involve only conditions on the bivariate distributions of $x_m$ and $x_n$. We shall in this chapter examine such conditions for sample function continuity, differentiability, and integrability. We shall also give conditions which guarantee that only simple discontinuities occur. In particular, we shall formulate conditions in terms of bivariate distributions, which are easily checked for most standard processes, such as the normal and the Poisson process.

2.1 Quadratic mean properties

We first recall some concepts and properties that may be well known from previous courses in stochastic processes. We return to proofs and more details in Section 2.4.


A stochastic process $\{x(t),\, t \in \mathbb{R}\}$ is said to be continuous in quadratic mean (or $L^2$-continuous) at time $t$ if

$$x(t+h) \overset{q.m.}{\to} x(t)$$

as $h \to 0$, i.e. if $E((x(t+h) - x(t))^2) \to 0$. It is called differentiable in quadratic mean with derivative $y(t)$ if

$$\frac{x(t+h) - x(t)}{h} \overset{q.m.}{\to} y(t)$$

as $h \to 0$. Of course, the process $\{y(t),\, t \in \mathbb{R}\}$ is called the (quadratic mean) derivative of $\{x(t),\, t \in \mathbb{R}\}$ and is denoted $x'(t)$. A stationary process $x(t)$ is continuous in quadratic mean if its covariance function $r(t)$ is continuous at $t = 0$. It is differentiable if $r(t)$ is twice differentiable, and then the derivative has covariance function

$$r_{x'}(t) = -r''(t).$$

Second and higher order derivatives are defined recursively: $\{x(t),\, t \in \mathbb{R}\}$ is twice differentiable in quadratic mean if and only if its (quadratic mean) derivative is (quadratic mean) differentiable, i.e. if $r^{iv}(t)$ exists, etc. The covariance function of $\{x''(t),\, t \in \mathbb{R}\}$ is $r_{x''}(t) = r^{iv}(t)$.

Expressed in terms of the spectral distribution function $F(\omega)$, the process is differentiable in quadratic mean if and only if the second spectral moment is finite, i.e.

$$\omega_2 = \int_{-\infty}^{\infty} \omega^2\, dF(\omega) < \infty;$$

for a proof, see Lemma 2.3, page 47. Since $\omega_2 = -r''(0) = V(x'(t))$, the finiteness of $\omega_2$ is necessary and sufficient for the existence of a quadratic mean derivative. Analogous relations hold for higher derivatives of order $k$ and the spectral moments $\omega_{2k} = \int \omega^{2k}\, dF(\omega)$. We will give some more details on quadratic mean properties in Section 2.4.
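A hedged numerical illustration of the second spectral moment criterion (the two spectral densities are my own choices): a truncated computation shows $\omega_2$ diverging for the Ornstein-Uhlenbeck type spectrum $f(\omega) = 1/(\pi(1+\omega^2))$ and converging for a Gaussian-shaped spectrum.

```python
import numpy as np
from scipy.integrate import quad

f_ou = lambda w: 1.0 / (np.pi * (1.0 + w**2))               # OU-type spectral density
f_gauss = lambda w: np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)  # Gaussian-shaped spectral density

for L in (10, 100, 1000):
    m2_ou = quad(lambda w: w**2 * f_ou(w), -L, L)[0]
    m2_gauss = quad(lambda w: w**2 * f_gauss(w), -L, L)[0]
    print(L, m2_ou, m2_gauss)
# The truncated moment grows without bound in the first case (omega_2 = infinity, no q.m.
# derivative) and stabilises near 1 in the second (omega_2 < infinity, q.m. differentiable).
```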

2.2 Sample function continuity

2.2.1 Countable and uncountable events

The first problem that we encounter with sample function continuity is that the sample event of interest, namely the set of continuous functions,

$$C = \{x \in \mathbb{R}^T;\, x(\cdot) \text{ is a continuous function}\},$$

does not have a countable basis, and is not a Borel set, i.e. $C \notin \mathcal{B}^T$. If $\{x(t);\, t \in T\}$ is a stochastic process on a probability space $(\Omega, \mathcal{F}, P)$, then the probability $P(C)$ need not be defined – it depends on the structure of $(\Omega, \mathcal{F})$ and on how complicated $x$ is in itself, as a function on $\Omega$. In particular, even if $P(C)$ is defined, it is not uniquely determined by the finite-dimensional distributions. To see this, take a process on a sufficiently rich sample space, i.e. one that contains enough sample points $\omega$, and suppose we have defined a stochastic process $\{x(t);\, t \in \mathbb{R}\}$ which has, with probability one, continuous sample paths. Then $x$ has a certain family of finite-dimensional distributions. Now, take a random time $\tau$, independent of $x$, and with continuous distribution, for example an exponential distribution.1 Then define a new process $\{y(t);\, t \in \mathbb{R}\}$, such that

$$y(t) = x(t), \text{ for } t \ne \tau, \qquad y(\tau) = x(\tau) + 1.$$

Then $y$ has the same finite-dimensional distributions as $x$, but its sample functions are always discontinuous at $\tau$.

2.2.1.1 Equivalence

In the constructed example, the two processes $x$ and $y$ differ only at a single point $\tau$, and as we constructed $\tau$ to be random with continuous distribution, we have

$$P(x(t) = y(t)) = 1, \text{ for all } t. \qquad (2.1)$$

Two processes $x$ and $y$ which satisfy (2.1) are called equivalent. The sample paths of two equivalent processes always coincide, with probability one, when observed at a fixed, pre-determined time point. (In the example above, the time $\tau$ where they differed was random.)

2.2.1.2 Separability

The annoying fact that a stochastic process can fail to fulfill some natural regularity condition, such as continuity, even if it by all natural standards should be regular, can be partly neutralized by the concept of separability, introduced by Doob. It uses the approximation by sets with countable basis mentioned in Section 1.3.3. Loosely speaking, a process $\{x(t),\, t \in \mathbb{R}\}$ is separable in an interval $I$ if there exists a countable set of $t$-values $T = \{t_k\} \subset I$ such that the process, with probability one, does not behave more irregularly on $I$ than it does already on $T$. An important consequence is that for all $t$ in the interior of $I$, there are sequences $\tau_1 < \tau_2 < \dots$, $\tau_n \uparrow t$, and $\tau_1' > \tau_2' > \dots$, $\tau_n' \downarrow t$, such that, with probability one,

$$\liminf_{n\to\infty} x(\tau_n) = \liminf_{\tau \uparrow t} x(\tau) \le \limsup_{\tau \uparrow t} x(\tau) = \limsup_{n\to\infty} x(\tau_n),$$

with a similar set of relations for the sequence $\tau_n'$. Hence, if the process is continuous on any discrete set of points then it is continuous. Every process has an equivalent separable version; see [11].

1This is where it is necessary that Ω is rich enough so we can define an independent τ .


2.2.2 Conditions for sample function continuity

The finite-dimensional distributions of any two equivalent processes $x(t)$ and $y(t)$ are always the same – show that as an exercise. We shall now see under what conditions, on the finite-dimensional distribution functions, we can assume a stochastic process to have continuous sample paths. Conditions will be given both in terms of the bivariate distributions directly, and in terms of probabilistic bounds on the process increments. As we have seen, one has to be satisfied if, among all equivalent processes, one can find one which has continuous sample paths.

Theorem 2:1 Let $\{x(t);\, 0 \le t \le 1\}$ be a given stochastic process. If there exist two non-decreasing functions, $g(h)$ and $q(h)$, $0 \le h \le 1$, such that

$$\sum_{n=1}^\infty g(2^{-n}) < \infty, \qquad \sum_{n=1}^\infty 2^n q(2^{-n}) < \infty,$$

and, for all $t < t+h$ in $[0, 1]$,

$$P(|x(t+h) - x(t)| \ge g(h)) \le q(h), \qquad (2.2)$$

then there exists an equivalent stochastic process $y(t)$ whose sample paths are, with probability one, continuous on $[0, 1]$.

Proof: Start with the process $x(t)$ with given finite-dimensional distributions. Such a process exists, and what is questioned is whether it has continuous sample functions if its bivariate distributions satisfy the conditions in the theorem. We shall now explicitly construct a process $y(t)$, equivalent to $x(t)$, and with continuous sample paths. Then $y(t)$ will automatically have the same finite-dimensional distributions as $x(t)$. The process $y(t)$ shall be constructed as the limit of a sequence of piecewise linear functions $x_n(t)$, which have the correct distribution at the dyadic time points of order $n$,

$$t_n^{(k)} = k/2^n, \quad k = 0, 1, \dots, 2^n;\ n = 1, 2, \dots.$$

Define the process $x_n$ equal to $x$ at the dyadic points,

$$x_n(t) = x(t), \text{ for } t = t_n^{(k)},\ k = 0, 1, \dots, 2^n,$$

and let it be linear between these points; see Figure 2.1.

[Figure 2.1: Successive approximations with piecewise linear functions.]

Then we can estimate the maximal distance between two successive approximations. As is obvious from the figure, the maximal difference between two successive approximations for $t$ between $t_n^{(k)}$ and $t_n^{(k+1)}$ occurs in the middle of the interval, and hence

$$|x_{n+1}(t) - x_n(t)| \le \Big| x(t_{n+1}^{(2k+1)}) - \tfrac{1}{2}\big(x(t_n^{(k)}) + x(t_n^{(k+1)})\big) \Big|$$
$$\le \tfrac{1}{2}\big| x(t_{n+1}^{(2k+1)}) - x(t_{n+1}^{(2k)}) \big| + \tfrac{1}{2}\big| x(t_{n+1}^{(2k+2)}) - x(t_{n+1}^{(2k+1)}) \big| = \tfrac{1}{2}A + \tfrac{1}{2}B, \text{ say.}$$

The tail distribution of the maximal difference between two successive approximations,

$$M_n^{(k)} = \max_{t_n^{(k)} \le t \le t_n^{(k+1)}} |x_{n+1}(t) - x_n(t)| \le \tfrac{1}{2}A + \tfrac{1}{2}B,$$

can therefore be estimated by

$$P(M_n^{(k)} \ge c) \le P(A \ge c) + P(B \ge c),$$

since if $M_n^{(k)} \ge c$, then either $A \ge c$ or $B \ge c$, or both.

Now take $c = g(2^{-n-1})$ and use the bound (2.2), to get

$$P(M_n^{(k)} \ge g(2^{-n-1})) \le 2q(2^{-n-1}),$$


for each $k = 0, 1, \dots, 2^n - 1$. By Boole's inequality2 we get, since there are $2^n$ intervals,

$$P\Big(\max_{0\le t\le 1} |x_{n+1}(t) - x_n(t)| \ge g(2^{-n-1})\Big) = P\Big(\bigcup_{k=0}^{2^n-1} \{M_n^{(k)} \ge g(2^{-n-1})\}\Big) \le 2^{n+1} q(2^{-n-1}).$$

Now $\sum_n 2^{n+1} q(2^{-n-1}) < \infty$ by assumption, and then the Borel-Cantelli lemma (see Exercises in Appendix B) gives that, with probability one, only finitely many of the events

$$\max_{0\le t\le 1} |x_{n+1}(t) - x_n(t)| \ge g(2^{-n})$$

occur. This means that there is a set $\Omega_0$ with $P(\Omega_0) = 1$, such that for every outcome $\omega \in \Omega_0$, from some integer $N$ (depending on the outcome, $N = N(\omega)$) and onwards ($n \ge N$),

$$\max_{0\le t\le 1} |x_{n+1}(t) - x_n(t)| < g(2^{-n}).$$

First of all, this shows that there exists a limiting function $y(t)$ for all $\omega \in \Omega_0$; the condition (B.4) for almost sure convergence, given in Appendix B, says that $\lim_{n\to\infty} x_n(t)$ exists with probability one.

It also shows that the convergence is uniform: for $\omega \in \Omega_0$ and $n \ge N$, $m > 0$,

$$|x_{n+m}(t) - x_n(t)| \le |x_{n+1}(t) - x_n(t)| + |x_{n+2}(t) - x_{n+1}(t)| + \dots + |x_{n+m}(t) - x_{n+m-1}(t)|$$
$$\le \sum_{j=0}^{m-1} g(2^{-n-j}) \le \sum_{j=0}^{\infty} g(2^{-n-j}).$$

Letting $m \to \infty$, so that $x_{n+m}(t) \to y(t)$, and observing that the inequalities hold for all $t \in [0, 1]$, we get that

$$\max_{0\le t\le 1} |y(t) - x_n(t)| \le \sum_{j=0}^{\infty} g(2^{-n-j}) = \sum_{j=n}^{\infty} g(2^{-j}).$$

Since this bound tends to 0 as $n \to \infty$, we have the uniform convergence, and since all $x_n$ are continuous functions, we also have that $y$ is continuous for all $\omega \in \Omega_0$. For $\omega \notin \Omega_0$, define $y(t) \equiv 0$, making $y$ a continuous function for all $\omega \in \Omega$.

2 $P(\cup_k A_k) \le \sum_k P(A_k)$.


It remains to prove that $x$ and $y$ are equivalent, i.e. $P(x(t) = y(t)) = 1$, for all $t \in [0, 1]$. For that sake, take any $t \in [0, 1]$ and find a sequence of dyadic numbers $t_n^{(k_n)} \to t$ such that

$$t_n^{(k_n)} \le t < t_n^{(k_n)} + 2^{-n}.$$

Since both $g(h)$ and $q(h)$ are non-decreasing, we have from (2.2),

$$P\big(|x(t_n^{(k_n)}) - x(t)| \ge g(2^{-n})\big) \le P\big(|x(t_n^{(k_n)}) - x(t)| \ge g(t - t_n^{(k_n)})\big) \le q(t - t_n^{(k_n)}) \le q(2^{-n}).$$

Adding over $n$ gives

$$\sum_{n=1}^\infty P\big(|x(t_n^{(k_n)}) - x(t)| \ge g(2^{-n})\big) \le \sum_{n=1}^\infty q(2^{-n}) < \infty,$$

and it follows from the Borel-Cantelli lemma that it can happen only finitely many times that $|x(t_n^{(k_n)}) - x(t)| \ge g(2^{-n})$. Since $g(2^{-n}) \to 0$ as $n \to \infty$, we have proved that $x(t_n^{(k_n)}) \to x(t)$ with probability one. Further, since $y(t)$ is continuous, $y(t_n^{(k_n)}) \to y(t)$. But $x(t_n^{(k_n)}) = y(t_n^{(k_n)})$, and therefore the two limits are equal, with probability one, as was to be proved. □

The theorem says that for each process $x(t)$ that satisfies the conditions there exists at least one other equivalent process $y(t)$ with continuous sample paths, and with exactly the same finite-dimensional distributions. Of course it seems unnecessary to start with $x(t)$ and immediately change to an equivalent continuous process $y(t)$. In the future we assume that we only have the continuous version, whenever the sufficient conditions for sample function continuity are satisfied.

2.2.2.1 Special conditions for continuity

Theorem 2:1 is simple to use, since it depends only on the distribution of the increments of the process, and involves only bivariate distributions. For special processes, conditions that put bounds on the moments of the increments are even simpler to use. One such is the following.

Corollary 2.1 If there exist constants $C$, and $r > p > 0$, such that for all small enough $h > 0$,

$$E\big(|x(t+h) - x(t)|^p\big) \le \frac{C|h|}{|\log |h||^{1+r}}, \qquad (2.3)$$

then the condition in Theorem 2:1 is satisfied and the process has, with probability one, continuous sample paths.


Note that many processes satisfy a stronger inequality than (2.3), namely

$$E\big(|x(t+h) - x(t)|^p\big) \le C|h|^{1+c} \qquad (2.4)$$

for some constants $C$, and $c > 0$, $p > 0$. Then (2.3) is automatically satisfied with any $r > p$, and the process has, with probability one, continuous sample paths.

Proof: Markov's inequality, a generalization of Chebyshev's inequality, states that for all random variables $U$, $P(|U| \ge \lambda) \le E(|U|^p)/\lambda^p$. Apply the theorem with $g(h) = |\log |h||^{-b}$, $1 < b < r/p$. One gets

$$P(|x(t+h) - x(t)| > g(h)) \le \frac{C|h|}{|\log |h||^{1+r-bp}}.$$

Since $b > 1$, one has $\sum g(2^{-n}) = \sum \frac{1}{(n\log 2)^b} < \infty$, and, with

$$q(h) = C|h|/|\log |h||^{1+r-bp},$$

and $1 + r - bp > 1$,

$$\sum 2^n q(2^{-n}) = \sum \frac{C}{(n\log 2)^{1+r-bp}} < \infty,$$

which proves the assertion. □

Example 2:1 We show that the Wiener process $W(t)$ has, with probability one, continuous sample paths. In the standard Wiener process, the increment $W(t+h) - W(t)$, $h > 0$, is Gaussian with mean 0 and variance $h$. Thus,

$$E(|W(t+h) - W(t)|^p) = C|h|^{p/2},$$

with $C = E(|U|^p) < \infty$, for a standard normal variable $U$, giving the moment bound

$$E(|W(t+h) - W(t)|^4) = C|h|^2 < \frac{|h|}{|\log |h||^6},$$

for small $h$. We see that condition (2.3) in the corollary is satisfied with $r = 5 > 4 = p$. Condition (2.4) is satisfied with $p = 3$, $c = 1/2$.

2.2.2.2 Continuity of stationary processes

Stationary processes have constant mean and a covariance function

$$r(t) = \mathrm{Cov}(x(s+t), x(s)),$$

which is a function only of the time lag $t$. Since the increments have variance

$$E((x(t+h) - x(t))^2) = 2(r(0) - r(h)), \qquad (2.5)$$


it is clear that continuity conditions can be formulated in terms of the covariance function. Equivalent conditions can be formulated by means of the spectral distribution function $F(\omega)$, which is such that

$$r(t) = \int_{-\infty}^{\infty} e^{i\omega t}\, dF(\omega).$$

A first immediate consequence of (2.5) is that $x(t+h) \overset{q.m.}{\to} x(t)$ as $h \to 0$ if and only if the covariance function $r(t)$ is continuous at $t = 0$. For sample function continuity, a sufficient condition in terms of the covariance function follows directly from Corollary 2.1.

Theorem 2:2 If $r(t)$ is the covariance function of a stationary stochastic process $x(t)$, such that, as $t \to 0$,

$$r(t) = r(0) - O\Big(\frac{|t|}{|\log |t||^q}\Big), \qquad (2.6)$$

for some $q > 3$, then $x(t)$ has3 continuous sample functions.4

2.2.2.3 Continuity of stationary Gaussian processes

For stationary Gaussian processes the conditions for sample function continuity can be considerably weakened, to require slightly more than just continuity of the covariance function. We state the sufficient condition both in terms of the covariance function and in terms of the spectrum, and to this end we formulate an analytic lemma, the proof of which can be found in [9, Sect. 9.3].

Lemma 2.1 a) If, for some $a > 0$,

$$\int_0^\infty (\log(1+\omega))^a\, dF(\omega) < \infty, \qquad (2.7)$$

then

$$r(t) = r(0) - O\big(|\log |t||^{-b}\big), \text{ as } t \to 0, \qquad (2.8)$$

for any $b \le a$.

b) If (2.8) holds for some $b > 0$, then (2.7) is satisfied for any $a < b$.

Theorem 2:3 A stationary Gaussian process $x(t)$ has, with probability one, continuous sample paths if, for some $a > 3$, any of the following conditions is satisfied:

$$r(t) = r(0) - O\big(|\log |t||^{-a}\big), \text{ as } t \to 0, \qquad (2.9)$$

$$\int_0^\infty (\log(1+\omega))^a\, dF(\omega) < \infty. \qquad (2.10)$$

3 Or rather "is equivalent to a process that has . . . ".
4 The notation $f(x) = g(x) + O(h(x))$ as $x \to 0$ means that $|(f(x) - g(x))/h(x)|$ is bounded by some finite constant $C$ as $x \to 0$.


Proof: In a stationary Gaussian process, $x(t+h) - x(t)$ has a normal distribution with mean zero and variance $\sigma_h^2 = 2(r(0) - r(h))$, where by assumption

$$\sigma_h \le \frac{C}{|\log |h||^{a/2}}$$

for some constant $C > 0$. Writing $\Phi(x) = \int_{-\infty}^x \varphi(y)\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-y^2/2}\,dy$ for the standard normal distribution, we have

$$P(|x(t+h) - x(t)| > g(h)) = 2\Big\{1 - \Phi\Big(\frac{g(h)}{\sigma_h}\Big)\Big\}.$$

Now, take $g(h) = |\log |h|/\log 2|^{-b}$, where $b$ is chosen so that $1 < b < (a-1)/2$, which is possible since $a > 3$ by assumption. From the bound (1.13) of the normal distribution tail, $1 - \Phi(x) \le \varphi(x)/x$, we then get

$$P(|x(t+h) - x(t)| > g(h)) \le 2\Big\{1 - \Phi\Big(\frac{g(h)|\log |h||^{a/2}}{C}\Big)\Big\} \le \frac{2C}{g(h)|\log |h||^{a/2}}\,\varphi\Big(\frac{g(h)|\log |h||^{a/2}}{C}\Big) = q(h), \text{ say.}$$

The reader should complete the proof and show that

$$\sum g(2^{-n}) < \infty \quad \text{and} \quad \sum 2^n q(2^{-n}) < \infty,$$

and then apply Theorem 2:1 to see that (2.9) is sufficient.

Lemma 2.1 shows that also (2.10) is sufficient for sample function continuity for a Gaussian process. □

Example 2:2 Any stationary process with

$$r(t) = r(0) - C|t|^\alpha + o(|t|^\alpha),$$

some $C > 0$, as $t \to 0$,5 has continuous sample paths if $1 < \alpha \le 2$. If it furthermore is a Gaussian process it is continuous if $0 < \alpha \le 2$.

Remark 2:1 The sufficient conditions for sample function continuity given in the theorems are satisfied for almost all covariance functions that are encountered in applied probability. But for Gaussian processes, even the weak condition (2.9) for $a > 3$ can be relaxed to require only that $a > 1$, which is very close to being necessary; see [9, Sect. 9.5].

Gaussian stationary processes which are not continuous necessarily behave very badly, and it can be shown that the sample functions are, with probability one, unbounded in any interval. This was shown by Belyaev [4] but is also a consequence of a theorem by Dobrushin [10]; see also [9, Ch. 9.5].

5 At this stage you should convince yourself that $\alpha > 2$ is impossible.


2.2.3 Probability measures on C[0, 1]

We can now complete the table at the end of Section 1.3.1 and define probabilities on $C[0, 1]$, the space of continuous functions on the interval $[0, 1]$. A stochastic process $\{x(t)\}_{0\le t\le 1}$ on a probability space $(\Omega, \mathcal{F})$ has its realizations in $\mathbb{R}^{[0,1]}$. If the finite-dimensional distributions of $x$ satisfy any of the sufficient conditions for sample function continuity, either $x(t)$ or an equivalent process $y(t)$ has, with probability one, continuous sample functions, and hence has its realizations in $C[0, 1] \subset \mathbb{R}^{[0,1]}$.

In the same way as the finite-dimensional distributions define a probability measure on $(\mathbb{R}^{[0,1]}, \mathcal{B}^{[0,1]})$, assigning probabilities to all Borel sets, we can now define a probability measure on $C[0, 1]$. The question is, what is the $\sigma$-field of events which get probability? In fact, we can take the simplest choice,

$$\mathcal{B} = \mathcal{B}^{[0,1]} \cap C[0, 1] = \big\{B \cap C[0, 1];\, B \in \mathcal{B}^{[0,1]}\big\},$$

i.e. take those parts of the Borel sets which intersect $C[0, 1]$.

Theorem 2:4 If $\{F_{t_n}\}_{n=1}^\infty$ is a family of finite-dimensional distributions that satisfy any of the sufficient conditions for sample function continuity, then there exists a probability measure on $(C[0, 1], \mathcal{B})$ such that the co-ordinate process $\{x(t)\}_{0\le t\le 1}$ has the given finite-dimensional distributions.

2.2.3.1 Open sets in C[0, 1]

The family $\mathcal{B}$ can be described alternatively in terms of open sets in $C[0, 1]$. Take a continuous function $x(t) \in C[0, 1]$. By an $\varepsilon$-surrounding of $x$ we mean the set of functions which are in an $\varepsilon$-band around $x$,

$$\Big\{y \in C[0, 1];\, \max_{0\le t\le 1} |y(t) - x(t)| < \varepsilon\Big\}.$$

A set of functions $A \subseteq C[0, 1]$ is called open if for every $x \in A$ there is an $\varepsilon$-surrounding of $x$ that is completely in $A$. The open sets in $C[0, 1]$ generate a $\sigma$-field, the smallest $\sigma$-field that contains all open sets, and that $\sigma$-field is exactly $\mathcal{B}$.

2.3 Derivatives, tangents, and other characteristics

2.3.1 Differentiability

2.3.1.1 General conditions

When is a continuous stochastic process differentiable in the sense that its sample functions are continuously differentiable? The answer can be given as conditions similar to those for sample function continuity, but now with bounds on the second order differences. By pasting together piecewise linear approximations by means of smooth arcs, one can prove the following theorem; see [9, Sect. 4.3].

Theorem 2:5 a) Suppose the stochastic process $\{x(t);\, 0 \le t \le 1\}$ satisfies the conditions for sample function continuity in Theorem 2:1. If, furthermore, for all $t-h$ and $t+h$ in $[0, 1]$,

$$P(|x(t+h) - 2x(t) + x(t-h)| \ge g_1(h)) \le q_1(h), \qquad (2.11)$$

where $g_1$ and $q_1$ are two non-decreasing functions, such that

$$\sum_{n=1}^\infty 2^n g_1(2^{-n}) < \infty \quad \text{and} \quad \sum_{n=1}^\infty 2^n q_1(2^{-n}) < \infty,$$

then there exists an equivalent process $\{y(t);\, 0 \le t \le 1\}$ with continuously differentiable sample paths.

b) The sufficient condition in (a) is satisfied if

$$E(|x(t+h) - 2x(t) + x(t-h)|^p) \le \frac{K|h|^{1+p}}{|\log |h||^{1+r}}, \qquad (2.12)$$

for some constants $p < r$ and $K$.

c) Many processes satisfy a stronger inequality than (2.12), namely

$$E\big(|x(t+h) - 2x(t) + x(t-h)|^p\big) \le C|h|^{1+p+c}, \qquad (2.13)$$

for some constants $C$, and $c > 0$, $p > 0$. Then (2.12) is satisfied, and the process has, with probability one, continuously differentiable sample paths.

In Section 2.1 we mentioned a condition for quadratic mean (q.m.) differentiability of a stationary process. One may ask: What is the relation between the q.m.-derivative and the sample function derivative? They are both limits of the differential quotient $(x(t+h) - x(t))/h$ as $h \to 0$. Now, it is easy to prove that if the limit exists both in quadratic mean and as a sample function limit with probability one, then the two limits are equal (also with probability one), and hence the two derivative processes are equivalent, and have the same finite-dimensional distributions. It follows that the covariance function of the derivative $\{x'(t),\, t \in \mathbb{R}\}$ is $r_{x'}(t) = -r_x''(t)$.


2.3.1.2 Differentiable Gaussian processes

In order that a stationary Gaussian process has continuously differentiable sample functions it is necessary that its covariance function has a smooth higher order Taylor expansion; cf. condition (2.9).

As we shall see in the following theorem, demanding just slightly more than a finite second spectral moment guarantees that a Gaussian process has continuously differentiable sample paths. For a proof, the reader is referred to [9, Sect. 9.3].

Theorem 2:6 a) A stationary Gaussian process is continuously differentiable6 if, for some $a > 3$, its covariance function has the expansion

$$r(t) = r(0) - \frac{\omega_2 t^2}{2} + O\Big\{\frac{t^2}{|\log |t||^a}\Big\}, \qquad (2.14)$$

where $\omega_2 = -r''(0) < \infty$.

b) Condition (2.14) can be replaced by the condition that, for some $a > 3$,

$$\int_0^\infty \omega^2 (\log(1+\omega))^a\, dF(\omega) < \infty. \qquad (2.15)$$

As for sample function continuity, it can be shown that for continuously differentiable sample functions it suffices that the constant $a$ is greater than 1.

Example 2:3 Condition (2.14) is easy to use for Gaussian processes. Most covariance functions used in practice have an expansion

$$r(t) = r(0) - \frac{\omega_2 t^2}{2} + O(|t|^a),$$

where $a$ is an integer, either 3 or 4. Then the process is continuously differentiable. Processes with covariance function admitting an expansion $r(t) = r(0) - C|t|^\alpha + o(|t|^\alpha)$ with $\alpha < 2$ are not differentiable; they are not even differentiable in quadratic mean. An example is the Ornstein-Uhlenbeck process with $r(t) = r(0)e^{-C|t|}$.

As a final example of covariance conditions, we encourage the reader to prove a sufficient condition in terms of $-r''(t)$, the covariance function of the quadratic mean derivative.

Theorem 2:7 A stationary process $x(t)$ is continuously differentiable if any of the following two conditions holds:

a) $-r''(t) = -r''(0) - C|t|^\alpha + o(|t|^\alpha)$ with $1 < \alpha \le 2$,

b) it is Gaussian and $-r''(t) = -r''(0) - C|t|^\alpha + o(|t|^\alpha)$ with $0 < \alpha \le 2$.

6 As usual, this means that there exists an equivalent process that, with probability one, has continuously differentiable sample functions.

2.3.2 Jump discontinuities and Hölder conditions

What type of discontinuities are possible for stochastic processes which do not have continuous sample functions? For example, how do we know that the Poisson process has sample functions that increase only with jumps of size one? We would like to have a condition that guarantees that only simple discontinuities are possible, and such a condition exists, with a restriction on the increments over two adjacent intervals. Similarly, for a process which is not continuously differentiable, how far from differentiable are the sample functions?

2.3.2.1 Jump discontinuities

The proof of the following theorem is indicated in [9, Sec. 4.4].

Theorem 2:8 If there are positive constants $C$, $p$, $r$, such that for all $s, t$ with $0 \le t < s < t+h \le 1$,

$$E\{|x(t+h) - x(s)|^p \cdot |x(s) - x(t)|^p\} \le C|h|^{1+r}, \qquad (2.16)$$

then the process $\{x(t);\, 0 \le t \le 1\}$ has, with probability one,7 sample functions with at most jump discontinuities, i.e.

$$\lim_{t \downarrow t_0} x(t) \quad \text{and} \quad \lim_{t \uparrow t_0} x(t)$$

exist for every $t_0 \in [0, 1]$.

Example 2:4 The Poisson process with intensity $\lambda$ has independent increments and hence

$$E\big\{|x(t+h) - x(s)|^2 \cdot |x(s) - x(t)|^2\big\} = E(|x(t+h) - x(s)|^2)\cdot E(|x(s) - x(t)|^2)$$
$$= \big(\lambda(t+h-s) + (\lambda(t+h-s))^2\big)\cdot\big(\lambda(s-t) + (\lambda(s-t))^2\big) \le C\lambda^2 h^2.$$

The conditions of the theorem are obviously satisfied.

7As usual, this means that there exists an equivalent process with this property.


2.3.2.2 Hölder continuity and the continuity modulus

How large are the increments in a non-differentiable process? In fact, moment bounds on the increments give precise estimates for the distribution of the continuity modulus $\omega_x(h)$ of the sample functions. This is defined as

$$\omega_x(h) = \sup_{|s-t|\le h} |x(s) - x(t)|, \qquad (2.17)$$

where the supremum is taken over $0 \le s, t \le 1$. (For a continuous process the supremum in (2.17) can be taken over the countable rationals, so it is a well defined random variable.)

Functions for which there exist constants $A$, $a$ such that for all $t, t+h \in [0, 1]$,

$$|x(t+h) - x(t)| \le A|h|^a,$$

are said to be Hölder continuous of order $a$, and to satisfy a Lipschitz condition of order $a$.

In this section we shall present stochastic estimates of the continuity modulus for a stochastic process, and also give a sufficient condition for a stochastic process to be Hölder continuous of order $a < r/p \le 1$.

Theorem 2:9 If there are constants $C$, $p \ge r > 0$ such that

$$E\{|x(t+h) - x(t)|^p\} \le C|h|^{1+r}, \qquad (2.18)$$

then there exists a random variable $A$ with $P(A < \infty) = 1$, such that the continuity modulus satisfies the inequality

$$\omega_x(h) \le A|h|^a, \text{ for all } h > 0,$$

for all $0 < a < r/p$.

Proof: First examine the increments over the dyadic numbers

$$t_n^{(k)} = k/2^n, \quad k = 0, 1, \dots, 2^n;\ n = 1, 2, \dots.$$

Take an $a < r/p$ and write $\delta = 2^{-a}$. Then, by Markov's inequality,

$$P(|x(t_n^{(k+1)}) - x(t_n^{(k)})| > \delta^n) \le \frac{E(|x(t_n^{(k+1)}) - x(t_n^{(k)})|^p)}{\delta^{np}} \le \frac{C(2^{-n})^{1+r}}{2^{-anp}} = \frac{C}{2^{n(1+r-ap)}}.$$

From Boole's inequality, summing over $n$, we obtain in succession,

$$P\big(\max_{0\le k\le 2^n-1} |x(t_n^{(k+1)}) - x(t_n^{(k)})| > \delta^n\big) \le \frac{C\,2^n}{2^{n(1+r-ap)}} = \frac{C}{2^{n(r-ap)}},$$

$$\sum_{n=0}^\infty P\big(\max_{0\le k\le 2^n-1} |x(t_n^{(k+1)}) - x(t_n^{(k)})| > \delta^n\big) \le \sum_{n=0}^\infty \frac{C}{2^{n(r-ap)}} < \infty,$$


since $r - ap > 0$. The Borel-Cantelli lemma gives that only finitely many events

$$A_n = \big\{\max_{0\le k\le 2^n-1} |x(t_n^{(k+1)}) - x(t_n^{(k)})| > \delta^n\big\}$$

occur, which means that there exists a random index $\nu$ such that for all $n > \nu$,

$$|x(t_n^{(k+1)}) - x(t_n^{(k)})| \le \delta^n \text{ for all } k = 0, 1, \dots, 2^n - 1. \qquad (2.19)$$

Next we estimate the increment from a dyadic point $t_n^{(k)}$ to an arbitrary point $t$. To that end, take any $t \in [t_n^{(k)}, t_n^{(k+1)})$, and consider its dyadic expansion ($\alpha_m = 0$ or 1),

$$t = t_n^{(k)} + \sum_{m=1}^\infty \frac{\alpha_m}{2^{n+m}}.$$

Summing all the inequalities (2.19), we obtain that the increment from $t_n^{(k)}$ to $t$ is bounded (for $n > \nu$),

$$|x(t) - x(t_n^{(k)})| \le \sum_{m=1}^\infty \delta^{n+m} = \frac{\delta^{n+1}}{1 - \delta}. \qquad (2.20)$$

The final estimate relates $t+h$ to the dyadic points. Let $\nu < \infty$ be the random index just found to exist. Then, suppose $h < 2^{-\nu}$ and find $n, k$ such that $2^{-n} \le h < 2^{-n+1}$ and $k/2^n \le t < (k+1)/2^n$. We see that $n > \nu$ and

$$t_n^{(k)} \le t < t_n^{(k+1)} < t + h \le t_n^{(k+\ell)},$$

where $\ell$ is either 2 or 3. As above, we obtain

$$|x(t+h) - x(t_n^{(k+1)})| \le \frac{\delta^{n+1}}{1 - \delta} + \delta^n. \qquad (2.21)$$

Summing the three estimates (2.19)–(2.21), we see that

$$|x(t+h) - x(t)| \le \delta^n + \frac{\delta^{n+1}}{1-\delta} + \frac{\delta^{n+1}}{1-\delta} + \delta^n = \frac{2}{1-\delta}(2^{-n})^a \le \frac{2}{1-\delta}\, h^a,$$

for $2^{-n} \le h < 2^{-\nu}$. For $h \ge 2^{-\nu}$ it is always true that

$$|x(t+h) - x(t)| \le M \le \frac{M}{2^{-\nu}}\, h^a$$

for some random $M$. If we take $A = \max(M 2^\nu,\, 2/(1-\delta))$, we complete the proof by combining the last two inequalities, to obtain $|x(t+h) - x(t)| \le A h^a$. □

Example 2:5 For the Wiener process,

$$E(|x(t+h) - x(t)|^p) = C_p|h|^{p/2} = C_p|h|^{1+(p/2-1)},$$

so the Wiener process is Hölder continuous of order $a < (p/2-1)/p = 1/2 - 1/p$ for every $p > 2$. This means that it is Hölder continuous of all orders $a < 1/2$.
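A simulation sketch (the grid and the lags are my own choices) of the Hölder behaviour in Example 2:5: the largest increment over lag $h$ of a simulated Wiener path scales roughly like $h^{1/2}$, up to logarithmic factors.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2**20
dt = 1.0 / n
w = np.cumsum(np.sqrt(dt) * rng.standard_normal(n))   # Wiener path on [0, 1]

for lag in (2**6, 2**10, 2**14):
    h = lag * dt
    max_incr = np.max(np.abs(w[lag:] - w[:-lag]))      # crude proxy for the continuity modulus
    print(h, max_incr, max_incr / np.sqrt(h))
# The ratio in the last column stays of moderate size while h varies over two decades.
```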


In the next section we shall investigate the existence of tangents of a predetermined level. Then we shall need a small lemma on the size of the continuity modulus of a continuous stochastic process.

Lemma 2.2 Let $x(t)$ be a stochastic process with continuous sample functions in $0 \le t \le 1$, and let $\omega_x(h)$ be its (random) continuity modulus, defined by (2.17). Then, to every $\varepsilon > 0$ there is a (deterministic) function $\omega_\varepsilon(h)$ such that $\omega_\varepsilon(h) \downarrow 0$ as $h \downarrow 0$, and

$$P(\omega_x(h) < \omega_\varepsilon(h) \text{ for } 0 < h \le 1) > 1 - \varepsilon.$$

Proof: The sample continuity of $x(t)$ says that the continuity modulus tends to 0 as $h \to 0$,

$$\lim_{h\to 0} P(\omega_x(h) < c) = 1$$

for every fixed $c > 0$. Take a sequence $c_1 > c_2 > \dots > c_n \downarrow 0$. For a given $\varepsilon > 0$ we can find a decreasing sequence $h_n \downarrow 0$ such that

$$P(\omega_x(h_n) < c_n) > 1 - \varepsilon/2^{n+1}.$$

Since $\omega_x(h)$ is non-increasing as $h$ decreases, then also

$$P(\omega_x(h) < c_n \text{ for } 0 < h \le h_n) > 1 - \varepsilon/2^{n+1},$$

for $n = 1, 2, \dots$. Summing the exceptions, we get that

$$P(\omega_x(h) < c_n \text{ for } 0 < h \le h_n \text{ and } n = 1, 2, \dots) > 1 - \sum_{n=1}^\infty \varepsilon/2^{n+1} = 1 - \varepsilon/2. \qquad (2.22)$$

Now we can define the deterministic function $\omega_\varepsilon(h)$ from the sequences $c_n$ and $h_n$. Take

$$\omega_\varepsilon(h) = \begin{cases} c_0 & \text{for } h_1 < h \le 1, \\ c_n & \text{for } h_{n+1} < h \le h_n,\ n = 1, 2, \dots. \end{cases}$$

If we take $c_0$ large enough to make

$$P(\omega_x(h) < c_0 \text{ for } h_1 < h \le 1) > 1 - \varepsilon/2,$$

and combine with (2.22), we get the desired estimate. □

2.3.2.3 Tangencies

We start with a theorem due to E.V. Bulinskaya on the non-existence of tangents of a pre-specified level.


Theorem 2:10 Suppose the density $f_t(x)$ of $x(t)$ is bounded for $0 \le t \le 1$,

$$f_t(x) \le c_0 < \infty,$$

and that $x(t)$ has, with probability one, continuously differentiable sample paths. Then,

a) for any level $u$, the probability is zero that there exists a $t \in [0, 1]$ such that simultaneously $x(t) = u$ and $x'(t) = 0$, i.e. there exist no points where $x(t)$ has a tangent at the level $u$ in $[0, 1]$,

b) there are only finitely many $t \in [0, 1]$ for which $x(t) = u$.

Proof: a) By assumption, $x(t)$ has continuously differentiable sample paths. We identify the location of those $t$-values for which $x'(t) = 0$ and $x(t)$ is close to $u$. For that sake, take an integer $n$ and a constant $h > 0$, let $H_\tau$ be the event

$$H_\tau = \{x'(\tau) = 0\} \cap \{|x(\tau) - u| \le h\},$$

and define, for $k = 1, 2, \dots, n$,

$$A_h = \{H_t \text{ occurs for at least one } t \in [0, 1]\},$$
$$A_h(k, n) = \{H_\tau \text{ occurs for at least one } \tau \in [\tfrac{k-1}{n}, \tfrac{k}{n}]\},$$
$$A_h = \cup_{k=1}^n A_h(k, n).$$

Now take a sample function that satisfies the conditions for $A_h(k, n)$ and let $\omega_{x'}$ be the continuity modulus of its derivative. For such a sample function,

$$x(k/n) = x(\tau) + (k/n - \tau)\, x'(\tau + \theta(k/n - \tau)),$$

for some $\theta$, $0 \le \theta \le 1$, and hence, on $A_h(k, n)$,

$$|x(k/n) - u| \le h + n^{-1}\omega_{x'}(n^{-1}). \qquad (2.23)$$

We now use Lemma 2.2 to bound $\omega_{x'}$. If $\omega(t) \downarrow 0$ as $t \downarrow 0$, let $B_\omega$ denote the sample functions for which $\omega_{x'}(t) \le \omega(t)$ for all $t$ in $[0, 1]$. By the lemma, given $\varepsilon > 0$, there exists at least one function $\omega_\varepsilon(t) \downarrow 0$ such that $P(B_{\omega_\varepsilon}) > 1 - \varepsilon/2$. For outcomes satisfying (2.23) we use the bound $\omega_\varepsilon$, and obtain

$$P(A_h) \le \sum_{k=1}^n P(A_h(k, n) \cap B_{\omega_\varepsilon}) + (1 - P(B_{\omega_\varepsilon}))$$
$$\le \sum_{k=1}^n P(|x(k/n) - u| \le h + n^{-1}\omega_\varepsilon(n^{-1})) + \varepsilon/2$$
$$\le 2n c_0 (h + n^{-1}\omega_\varepsilon(n^{-1})) + \varepsilon/2,$$

where $c_0$ is the bounding constant for the density $f_t(x)$.


Since $\omega_\varepsilon(t) \to 0$ as $t \to 0$, we can select first an $n_0$ and then an $h_0$ to make $P(A_{h_0}) \le \varepsilon$. But if there exists a time point $t$ for which $x'(t) = 0$ and $x(t) = u$ simultaneously, then certainly $A_h$ has occurred for any $h > 0$, and the event of interest has probability less than $\varepsilon$, which was arbitrary. The probability of simultaneous occurrence is therefore 0, as stated.

b) To prove that there are only a finite number of points $t \in [0, 1]$ with $x(t) = u$, assume, on the contrary, that there is an infinite sequence of points $t_i \in [0, 1]$ with $x(t_i) = u$. There is then at least one limit point $t_0 \in [0, 1]$ of $\{t_i\}$ for which, by continuity, $x(t_0) = u$. Since the derivative of $x(t)$ is assumed continuous, we must also have $x'(t_0) = 0$, and we have found a point where simultaneously $x(t_0) = u$, $x'(t_0) = 0$. By (a), that event has probability 0. □

2.4 Quadratic mean properties a second time

Continuous or differentiable sample paths are what we expect to encounter in practice when we observe a stochastic process. To prove that a mathematical model for a random phenomenon has continuous or differentiable sample paths is a quite different matter. Much simpler is to base the stochastic analysis on correlation properties, which could be checked against data, at least in principle. Such second order properties are studied in quite some detail in elementary courses in stochastic processes, and we give here only some refinements and extra comments in addition to those in Section 2.1. We assume throughout in this section, as in most of the chapter, that the process $x(t)$ has mean zero.

We first recall the definition of convergence in quadratic mean of a sequence of random variables {x_n} with E(x_n²) < ∞ to a random variable x:

$$
x_n \overset{\text{q.m.}}{\to} x \quad\text{if and only if}\quad E((x_n - x)^2) \to 0,
$$

as n → ∞; see Appendix B. We shall use the Loève criterion (B.8) for quadratic mean convergence: the sequence x_n converges in quadratic mean if and only if

$$
E(x_m x_n) \text{ has a finite limit } c, \qquad (2.24)
$$

when m and n tend to infinity independently of each other.

2.4.1 Quadratic mean continuity

A stochastic process x(t) is continuous in quadratic mean (or L²-continuous) at t if

$$
x(t + h) \overset{\text{q.m.}}{\to} x(t)
$$

when h → 0, i.e. if E(|x(t + h) − x(t)|²) → 0. We formulate the condition for quadratic mean continuity in terms of the covariance function

$$
r(s, t) = \mathrm{Cov}(x(s), x(t)) = E(x(s)\cdot x(t)),
$$

for a, not necessarily stationary, process.

Theorem 2:11 A stochastic process x(t) with mean zero is continuous in quadratic mean at t_0 if and only if the covariance function r(s, t) is continuous at the diagonal point s = t = t_0.

Proof: If r(s, t) is continuous at s = t = t_0, then

$$
E(|x(t_0 + h) - x(t_0)|^2) = E(|x(t_0 + h)|^2) + E(|x(t_0)|^2) - 2E(x(t_0 + h)\cdot x(t_0))
= r(t_0 + h, t_0 + h) - 2r(t_0 + h, t_0) + r(t_0, t_0) \to 0
$$

as h → 0, which shows the "if" part.

For the "only if" part, expand

$$
r(t_0 + h, t_0 + k) - r(t_0, t_0)
= E\bigl((x(t_0 + h) - x(t_0))\cdot(x(t_0 + k) - x(t_0))\bigr)
+ E\bigl((x(t_0 + h) - x(t_0))\cdot x(t_0)\bigr)
+ E\bigl(x(t_0)\cdot(x(t_0 + k) - x(t_0))\bigr) = e_1 + e_2 + e_3, \text{ say.}
$$

Here

$$
e_1 \le \sqrt{E(|x(t_0 + h) - x(t_0)|^2)\cdot E(|x(t_0 + k) - x(t_0)|^2)} \to 0,
$$
$$
e_2 \le \sqrt{E(|x(t_0 + h) - x(t_0)|^2)\cdot E(|x(t_0)|^2)} \to 0,
$$
$$
e_3 \le \sqrt{E(|x(t_0)|^2)\cdot E(|x(t_0 + k) - x(t_0)|^2)} \to 0,
$$

so r(t_0 + h, t_0 + k) → r(t_0, t_0) as h, k → 0. □

2.4.2 Quadratic mean differentiability

A stochastic process x(t) is called differentiable in quadratic mean at t if there exists a random variable, naturally denoted x′(t), such that

$$
\frac{x(t + h) - x(t)}{h} \overset{\text{q.m.}}{\to} x'(t),
$$

as h → 0, i.e. if E(((x(t + h) − x(t))/h − x′(t))²) → 0. If a process is differentiable both in quadratic mean and in the sample function sense, with probability one, then the two derivatives are equal with probability one. We shall now actually prove the condition for quadratic mean differentiability of a stationary process, which was stated in Section 2.1.

Theorem 2:12 A stationary process x(t) is quadratic mean differentiable if and only if its covariance function r(t) is twice continuously differentiable in a neighborhood of t = 0. The derivative process x′(t) has covariance function

$$
r_{x'}(t) = \mathrm{Cov}(x'(s + t), x'(s)) = -r''(t).
$$

Proof: For the "if" part we use the Loève criterion, and show that, if h, k → 0 independently of each other, then

$$
E\Bigl(\frac{x(t+h)-x(t)}{h}\cdot\frac{x(t+k)-x(t)}{k}\Bigr)
= \frac{1}{hk}\bigl(r(h-k) - r(h) - r(-k) + r(0)\bigr) \qquad (2.25)
$$

has a finite limit c. Define

$$
f(h, k) = r(h) - r(h-k),\qquad
f'_1(h, k) = \frac{\partial}{\partial h} f(h, k) = r'(h) - r'(h-k),\qquad
f''_{12}(h, k) = \frac{\partial^2}{\partial h\,\partial k} f(h, k) = r''(h-k).
$$

By applying the mean value theorem we see that there exist θ_1, θ_2 ∈ (0, 1) such that (2.25) is equal to

$$
-\frac{f(h,k) - f(0,k)}{hk} = -\frac{f'_1(\theta_1 h, k)}{k}
= -\frac{f'_1(\theta_1 h, 0) + k f''_{12}(\theta_1 h, \theta_2 k)}{k}
= -f''_{12}(\theta_1 h, \theta_2 k) = -r''(\theta_1 h - \theta_2 k). \qquad (2.26)
$$

Since r′′(t) by assumption is continuous, this tends to −r′′(0) as h, k → 0, which is the required limit in the Loève criterion.

To prove the "only if" part we need a fact about Fourier integrals and the spectral representation r(t) = ∫_{−∞}^{∞} e^{iωt} dF(ω) = ∫_{−∞}^{∞} cos ωt dF(ω).

Lemma 2.3 a) lim_{t→0} 2(r(0) − r(t))/t² = ω2 = ∫_{−∞}^{∞} ω² dF(ω) ≤ ∞.

b) If ω2 < ∞ then r′′(0) = −ω2 and r′′(t) exists for all t.

c) If r′′(0) exists, finite, then ω2 < ∞ and then, by (b), r′′(t) exists for all t.

Proof of lemma: If ω2 < ∞, (a) follows from

$$
\frac{2(r(0) - r(t))}{t^2} = \int_{-\infty}^{\infty} \omega^2\,\frac{1 - \cos\omega t}{\omega^2 t^2/2}\,dF(\omega)
$$

by dominated convergence, since 0 ≤ (1 − cos ωt)/(ω²t²/2) ≤ 1. If ω2 = ∞, the result follows from Fatou's lemma, since lim_{t→0} (1 − cos ωt)/(ω²t²/2) = 1.

To prove (b), suppose ω2 < ∞. Then it is possible to differentiate twice under the integral sign in r(t) = ∫_{−∞}^{∞} cos ωt dF(ω) to obtain that r′′(t) exists and

$$
-r''(t) = \int_{-\infty}^{\infty} \omega^2\cos(\omega t)\,dF(\omega),
$$

and in particular −r′′(0) = ω2.

For (c), suppose r(t) has a finite second derivative at the origin; in particular r′(t) then exists in a neighborhood of the origin. By the same argument as for (2.26), with h = k = t, we see that

$$
\frac{2(r(0) - r(t))}{t^2} = -r''((\theta_1 - \theta_2)t) \to -r''(0)
$$

as t → 0. Then part (a) shows that ω2 < ∞, and the lemma is proved. □

Proof of the "only if" part of Theorem 2:12: If x(t) is quadratic mean differentiable, then (x(t + h) − x(t))/h → x′(t) in quadratic mean with E(|x′(t)|²) finite, and (see further Appendix B.2)

$$
E(|x'(t)|^2) = \lim_{h\to 0} E\Bigl\{\Bigl|\frac{x(t+h)-x(t)}{h}\Bigr|^2\Bigr\}
= \lim_{h\to 0} \frac{2(r(0) - r(h))}{h^2} = \omega_2 < \infty.
$$

Part (b) of the lemma shows that r′′(t) exists.

To see that the covariance function of x′(t) is equal to −r′′(t), just take the limit of

$$
E\Bigl(\frac{x(s+t+h)-x(s+t)}{h}\cdot\frac{x(s+k)-x(s)}{k}\Bigr)
= \frac{1}{hk}\bigl(r(t+h-k) - r(t+h) - r(t-k) + r(t)\bigr)
= -f''_{12}(t+\theta_1 h,\,\theta_2 k) = -r''(t+\theta_1 h-\theta_2 k) \to -r''(t),
$$

for some θ_1, θ_2, as h, k → 0. □

2.4.3 Higher order derivatives and their correlations

To obtain higher order derivatives one just has to check the covariance function r_{x′}(t) = −r′′_x(t) against the conditions in the existence theorem, and so on.

The derivative of a (mean square) differentiable stationary process is also stationary and has covariance function r_{x′}(t) = −r′′_x(t), from Theorem 2:12. One can easily derive the cross-covariance relations between the derivative and the process (strictly, we need (B.7) from Appendix B).

Theorem 2:13 The cross-covariance between derivatives x^{(j)}(s) and x^{(k)}(t) of a stationary process {x(t), t ∈ R} is

$$
\mathrm{Cov}(x^{(j)}(s), x^{(k)}(t)) = (-1)^{k}\,r^{(j+k)}_x(s - t). \qquad (2.27)
$$

In particular,

$$
\mathrm{Cov}(x(s), x'(t)) = -r'_x(s - t). \qquad (2.28)
$$

In Chapter 3 we will need the covariances between the process and its first two derivatives. The covariance matrix of (x(t), x′(t), x′′(t)) is

$$
\begin{pmatrix} \omega_0 & 0 & -\omega_2 \\ 0 & \omega_2 & 0 \\ -\omega_2 & 0 & \omega_4 \end{pmatrix},
$$

where ω_{2k} = (−1)^k r^{(2k)}_x(0) = ∫ ω^{2k} dF(ω) are spectral moments. Thus, the slope at a specified point is uncorrelated both with the process value at that point and with the curvature, while process value and curvature have negative correlation. We have, for example, V(x′′(0) | x(0), x′(0)) = ω4 − ω2²/ω0.

2.5 Summary of smoothness conditions

The following table summarizes some crude and simple sufficient conditions for different types of quadratic mean and sample function smoothness.

  Condition on r(t) as t → 0              Further condition      Property
  r(t) = r(0) − o(1)                      —                      q.m. continuous
  r(t) = r(0) − C|t|^α + o(|t|^α)         1 < α ≤ 2              a.s. continuous
  r(t) = r(0) − C|t|^α + o(|t|^α)         0 < α ≤ 2, Gaussian    a.s. continuous
  r(t) = r(0) − ω2 t²/2 + o(t²)           —                      q.m. differentiable
  −r′′(t) = ω2 − C|t|^α + o(|t|^α)        1 < α ≤ 2              a.s. differentiable
  −r′′(t) = ω2 − C|t|^α + o(|t|^α)        0 < α ≤ 2, Gaussian    a.s. differentiable
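The finiteness of ω2 = lim_{t→0} 2(r(0) − r(t))/t², which governs the q.m. differentiability row above, is easy to probe numerically. The following Python sketch (not part of the original text; the two covariance functions are illustrative assumptions) evaluates the ratio for the smooth covariance exp(−t²/2), where it settles at ω2 = 1, and for exp(−|t|), where it diverges, so the corresponding process is q.m. continuous but not q.m. differentiable.

```python
import numpy as np

# Two covariance functions of stationary processes (both are valid covariances):
r_smooth = lambda t: np.exp(-t ** 2 / 2)    # twice differentiable at 0 -> omega_2 finite
r_corner = lambda t: np.exp(-np.abs(t))     # corner at 0 -> omega_2 infinite

for name, r in [("exp(-t^2/2)", r_smooth), ("exp(-|t|)", r_corner)]:
    print(name)
    for t in (1e-1, 1e-2, 1e-3):
        ratio = 2 * (r(0.0) - r(t)) / t ** 2   # should approach omega_2 (Lemma 2.3a)
        print(f"  t = {t:7.0e}   2(r(0)-r(t))/t^2 = {ratio:12.4f}")
```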

2.6 Stochastic integration

In this section we shall define the two simplest types of stochastic integrals, of the form

$$
J_1 = \int_a^b g(t)\,x(t)\,dt, \qquad J_2 = \int_a^b g(t)\,dx(t),
$$

where g(t) is a deterministic function and x(t) a stochastic process with mean 0. The integrals are defined as quadratic mean limits of approximating Riemann or Riemann–Stieltjes sums, and depending on the type of convergence we require, the process x(t) has to satisfy suitable regularity conditions.

The two types of integrals are sufficient for our needs in these notes. A third type of stochastic integral, needed for stochastic differential equations, is of the form

$$
J_3 = \int_a^b g(t, x(t))\,dx(t),
$$

in which g is itself random and depends on the integrator x(t). Such integrals will not be dealt with here.

The integrals are defined as limits in quadratic mean of the approximating sums

$$
J_1 = \lim_{n\to\infty}\sum_{k=1}^{n} g(t_k)\,x(t_k)\,(t_k - t_{k-1}), \qquad
J_2 = \lim_{n\to\infty}\sum_{k=1}^{n} g(t_k)\,(x(t_k) - x(t_{k-1})),
$$

where a = t_0 < t_1 < … < t_n = b and max_k |t_k − t_{k−1}| → 0 as n → ∞, provided the limits exist and are independent of the subdivision {t_k}. To simplify the writing we have suppressed the double index in the sequences of t_k-values; there is one sequence for each n.

The limits exist as quadratic mean limits if the corresponding integrals of the covariance function r(s, t) = Cov(x(s), x(t)) = E(x(s)x(t)) are finite, as formulated in the following theorem, in which we assume E(x(t)) = 0. Since we shall mainly use the integrals with complex functions g(t) (in fact g(t) = e^{iωt}), we formulate the theorem for complex random functions.

Theorem 2:14 a) If r(s, t) is continuous on [a, b] × [a, b], and g(t) is such that the Riemann integral

$$
Q_1 = \iint_{[a,b]\times[a,b]} g(s)\overline{g(t)}\,r(s,t)\,ds\,dt < \infty,
$$

then J_1 = ∫_a^b g(t)x(t) dt exists as a quadratic mean limit, with E(J_1) = 0 and E(|J_1|²) = Q_1.

b) If r(s, t) has bounded variation⁸ on [a, b] × [a, b] and g(t) is such that the Riemann–Stieltjes integral

$$
Q_2 = \iint_{[a,b]\times[a,b]} g(s)\overline{g(t)}\,d_{s,t}r(s,t) < \infty,
$$

then J_2 = ∫_a^b g(t) dx(t) exists as a quadratic mean limit, with E(J_2) = 0 and E(|J_2|²) = Q_2.

(The bars denote complex conjugation; for real-valued g they can be ignored.)

Proof: The simple proof uses the Loève criterion (B.8) for quadratic mean convergence: take two sequences of partitions of [a, b], with points s_0, s_1, …, s_m and t_0, t_1, …, t_n, respectively, and consider

$$
E(S_m\overline{S_n}) = \sum_{k=1}^{m}\sum_{j=1}^{n} g(s_k)\overline{g(t_j)}\,r(s_k, t_j)\,(s_k - s_{k-1})(t_j - t_{j-1}).
$$

If Q_1 exists, then this converges to Q_1 as the partitions become infinitely fine. This proves (a). The reader should complete the proof for (b). □

⁸That f(t) is of bounded variation in [a, b] means that sup_P Σ_k |f(t_k) − f(t_{k−1})| is bounded, with the sup taken over all possible partitions P of [a, b].

Example 2:6 Take x(t) = w(t), the Wiener process. Since r_w(s, t) = σ² min(s, t), we see that ∫_a^b g(t)w(t) dt exists for all integrable g(t).

Example 2:7 If g(t) is a continuously differentiable function and x(t) = w(t), the Wiener process, then

$$
\int_a^b g(t)\,dw(t) = g(b)w(b) - g(a)w(a) - \int_a^b g'(t)\,w(t)\,dt.
$$

To prove this, consider

$$
S_2 = \sum_{k=1}^{m} g(t_k)\,(w(t_k) - w(t_{k-1}))
    = g(t_m)w(t_m) - g(t_1)w(t_0) - \sum_{k=2}^{m}(g(t_k) - g(t_{k-1}))\,w(t_{k-1}).
$$

Since g(t) is continuously differentiable, there is a ρ_k such that g(t_k) − g(t_{k−1}) = (t_k − t_{k−1})(g′(t_k) + ρ_k), and ρ_k → 0 uniformly in k = 1, …, m as m → ∞, max_k |t_k − t_{k−1}| → 0. Thus

$$
S_2 = g(t_m)w(t_m) - g(t_1)w(t_0) - \sum_{k=2}^{m} g'(t_k)\,w(t_{k-1})\,(t_k - t_{k-1})
      + \sum_{k=2}^{m} \rho_k\,w(t_{k-1})\,(t_k - t_{k-1})
\to g(b)w(b) - g(a)w(a) - \int_a^b g'(t)\,w(t)\,dt.
$$
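As a numerical illustration (not from the original text; the integrand g(t) = cos t, the interval and the grid size are arbitrary assumptions), the following Python sketch simulates one Wiener path on a fine grid and evaluates ∫_a^b g(t) dw(t) both as the Riemann–Stieltjes sum and via the integration-by-parts identity of Example 2:7; the two values agree path by path up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(0)

a, b, n = 0.0, 1.0, 100_000
t = np.linspace(a, b, n + 1)
dt = t[1] - t[0]

# One sample path of the standard Wiener process on [a, b], with w(a) = 0
dw = rng.normal(scale=np.sqrt(dt), size=n)
w = np.concatenate(([0.0], np.cumsum(dw)))

g  = np.cos                    # deterministic, continuously differentiable integrand
gp = lambda s: -np.sin(s)      # its derivative g'(t)

# J2 as a Riemann-Stieltjes sum over the Wiener increments
J2_sum = np.sum(g(t[1:]) * dw)

# The same integral via integration by parts; the ordinary integral of g'(t)w(t)
# is evaluated with the trapezoidal rule
f = gp(t) * w
J2_parts = g(b) * w[-1] - g(a) * w[0] - np.sum(0.5 * (f[1:] + f[:-1]) * dt)

print(J2_sum, J2_parts)        # agree up to discretization error on each path
```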

The proofs of the following two theorems are left to the reader.

Theorem 2:15 If x(s) and y(t) are stochastic processes with cross-covariance

$$
r_{x,y}(s, t) = \mathrm{Cov}(x(s), y(t)),
$$

and if the conditions of Theorem 2:14 are satisfied, then

$$
E\Bigl(\int_a^b g(s)x(s)\,ds \cdot \int_c^d h(t)y(t)\,dt\Bigr)
= \int_a^b\!\!\int_c^d g(s)h(t)\,r_{x,y}(s,t)\,ds\,dt, \qquad (2.29)
$$

$$
E\Bigl(\int_a^b g(s)\,dx(s) \cdot \int_c^d h(t)\,dy(t)\Bigr)
= \int_a^b\!\!\int_c^d g(s)h(t)\,d_{s,t}r_{x,y}(s,t). \qquad (2.30)
$$

Theorem 2:16 For the Wiener process, with r_{x,x}(s, t) = min(s, t), one has

$$
d_{s,t}r_{x,x}(s, t) = ds \ \text{ for } s = t, \ \text{ and } 0 \text{ otherwise},
$$

which gives, for a < c < b < d,

$$
E\Bigl(\int_a^b g(s)\,dx(s) \cdot \int_c^d h(t)\,dx(t)\Bigr) = \int_c^b g(t)h(t)\,dt.
$$

Remark 2:2 A natural question is: are quadratic mean integrals and ordinary integrals equal? If a stochastic process has a continuous covariance function and, with probability one, continuous sample paths, and if g(t) is, for example, continuous, then ∫_a^b g(t)x(t) dt exists both as a regular Riemann integral and as a quadratic mean integral. Both integrals are random variables and they are limits of the same approximating Riemann sums, the only difference being the mode of convergence – with probability one, and in quadratic mean, respectively. But then the limits are equivalent, i.e. equal with probability one.

2.7 An ergodic result

An ergodic theorem deals with convergence of time averages, i.e. sample function averages, to ensemble averages, i.e. statistical expectations:

$$
\frac{1}{T}\int_0^T f(x(t))\,dt \to E(f(x(0))) \quad\text{as } T\to\infty,
$$

for a function of a stationary stochastic process x(t). Such theorems will be the theme of the entire Chapter 5, but we show already here a simple result of this kind, based only on covariance properties. The process x(t) need not even be stationary, but we assume E(x(t)) = 0 and that the covariance function r(s, t) = Cov(x(s), x(t)) exists.

Theorem 2:17 a) If r(s, t) is continuous for all s, t and

$$
\frac{1}{T^2}\int_0^T\!\!\int_0^T r(s,t)\,ds\,dt \to 0, \quad\text{as } T\to\infty, \qquad (2.31)
$$

then

$$
\frac{1}{T}\int_0^T x(t)\,dt \overset{\text{q.m.}}{\to} 0.
$$

b) If there exist constants K, α, β such that 0 ≤ 2α < β < 1, and

$$
|r(s,t)| \le K\,\frac{s^\alpha + t^\alpha}{1 + |s-t|^\beta}, \quad\text{for } s, t \ge 0, \qquad (2.32)
$$

then

$$
\frac{1}{T}\int_0^T x(t)\,dt \overset{\text{a.s.}}{\to} 0.
$$

Proof: a) This is immediate from Theorem 2:15, since

$$
\sigma_T^2 = E\Bigl(\Bigl(\frac{1}{T}\int_0^T x(t)\,dt\Bigr)^2\Bigr)
= \frac{1}{T^2}\int_0^T\!\!\int_0^T r(s,t)\,ds\,dt \to 0,
$$

by assumption.

b) Before we prove the almost sure convergence, note that the condition 2α < β < 1 puts a limit on how fast E(x(t)²) = r(t, t) is allowed to increase as t → ∞, and it limits the amount of dependence between x(s) and x(t) for large |s − t|. If the dependence is too strong, it may very well happen that (1/T)∫_0^T x(t) dt converges, but not to 0; the limit may then be a random quantity different from 0.

We show here only part of the theorem, and refer the reader to [9, p. 95] for the completion. What we show is that there exists a subsequence of times T_n → ∞ such that

$$
\frac{1}{T_n}\int_0^{T_n} x(t)\,dt \overset{\text{a.s.}}{\to} 0.
$$

First estimate σ²_T as in the proof of (a):

$$
\sigma_T^2 = E\Bigl(\Bigl(\frac{1}{T}\int_0^T x(t)\,dt\Bigr)^2\Bigr)
= \frac{1}{T^2}\int_0^T\!\!\int_0^T r(s,t)\,ds\,dt
\le \frac{K}{T^2}\int_0^T\!\!\int_0^T \frac{s^\alpha + t^\alpha}{1 + |s-t|^\beta}\,ds\,dt
$$
$$
= \frac{K}{T^{\beta-\alpha}}\cdot\frac{1}{T^{2-\beta}}\int_0^T\!\!\int_0^T \frac{(s/T)^\alpha + (t/T)^\alpha}{1 + |s-t|^\beta}\,ds\,dt
\le \frac{K}{T^{\beta-\alpha}}\cdot\frac{4}{T^{1-\beta}}\int_0^T \frac{du}{1 + u^\beta},
$$

where in the last step we bounded (s/T)^α + (t/T)^α by 2 and integrated out one of the variables. Here (4/T^{1−β}) ∫_0^T (1 + u^β)^{−1} du tends to a constant as T → ∞, which implies that σ²_T ≤ K′/T^{β−α} for some constant K′.

Take a constant γ such that γ(β − α) > 1, which is possible by the properties of α and β, and put T_n = n^γ, so that

$$
\sum_{n=1}^{\infty}\sigma_{T_n}^2 \le \sum_{n=1}^{\infty}\frac{K'}{n^{\gamma(\beta-\alpha)}} < \infty. \qquad (2.33)
$$

That the sum (2.33) is finite implies, by the Borel–Cantelli lemma and the Chebyshev inequality, see (B.3) in Appendix B, that

$$
\frac{1}{T_n}\int_0^{T_n} x(t)\,dt \overset{\text{a.s.}}{\to} 0,
$$

and so we have shown the convergence along a special sequence of times. To complete the proof, one has to show that

$$
\sup_{T_n\le T\le T_{n+1}}\Bigl|\frac{1}{T}\int_0^T x(t)\,dt - \frac{1}{T_n}\int_0^{T_n} x(t)\,dt\Bigr| \overset{\text{a.s.}}{\to} 0,
$$

as n → ∞; see [9, p. 95]. □

For stationary processes, the theorem yields the following ergodic theorem about the observed average.

Theorem 2:18 a) If x(t) is stationary and (1/T)∫_0^T r(t) dt → 0 as T → ∞, then

$$
\frac{1}{T}\int_0^T x(t)\,dt \overset{\text{q.m.}}{\to} 0.
$$

b) If, moreover, there are constants K > 0 and β > 0 such that |r(t)| ≤ K/|t|^β as t → ∞, then

$$
\frac{1}{T}\int_0^T x(t)\,dt \overset{\text{a.s.}}{\to} 0.
$$
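As a minimal simulation sketch (not from the text), the following Python snippet uses a discrete-time AR(1) sequence as a stand-in for a stationary process whose covariance decays geometrically, so the discrete-time analogues of the conditions in Theorem 2:18 hold; the time average visibly shrinks towards the ensemble mean 0 as T grows.

```python
import numpy as np

rng = np.random.default_rng(1)

phi = 0.9                     # AR(1) coefficient; r(k) = phi^|k| / (1 - phi^2), mean 0
n = 1_000_000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = rng.normal(scale=1 / np.sqrt(1 - phi ** 2))   # start in stationarity
for k in range(1, n):
    x[k] = phi * x[k - 1] + e[k]

for T in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(f"T = {T:8d}   time average = {x[:T].mean():+.5f}")
```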

Exercises

2:1. Prove the following useful inequality, valid for any non-negative, integer-valued random variable N,

$$
E(N) - \tfrac{1}{2}E(N(N-1)) \le P(N > 0) \le E(N).
$$

Generalize it to the following inequalities, where

$$
\alpha_i = E(N(N-1)\cdots(N-i+1))
$$

is the i-th factorial moment:

$$
\frac{1}{k!}\sum_{i=0}^{2n-1}(-1)^i\frac{1}{i!}\,\alpha_{k+i} \le P(N = k) \le \frac{1}{k!}\sum_{i=0}^{2n}(-1)^i\frac{1}{i!}\,\alpha_{k+i}.
$$

2:2. Let {x(t), t ∈ R} and {y(t), t ∈ R} be equivalent processes which both have, with probability one, continuous sample paths. Prove that

$$
P(x(t) = y(t) \text{ for all } t \in \mathbb{R}) = 1.
$$

2:3. Find the values of the constants a and b that make a Gaussian process twice continuously differentiable if its covariance function is

$$
r(t) = e^{-|t|}(1 + a|t| + bt^2).
$$

2:4. Complete the proof of Theorem 2:3 and show that, in the notation of that proof,

$$
\sum_n g(2^{-n}) < \infty, \quad\text{and}\quad \sum_n 2^n q(2^{-n}) < \infty.
$$

2:5. Show that the sample paths of the Wiener process have infinite variation, a.s., by showing the stronger statement that if

$$
Y_n = \sum_{k=0}^{2^n-1}\Bigl|W\Bigl(\frac{k+1}{2^n}\Bigr) - W\Bigl(\frac{k}{2^n}\Bigr)\Bigr|
$$

then Σ_{n=1}^{∞} P(Y_n < n) < ∞.

2:6. Show that a non-stationary process is continuous in quadratic mean at t = t_0 only if its mean value function m(t) = E(x(t)) is continuous at t_0 and its covariance function r(s, t) = Cov(x(s), x(t)) is continuous at s = t = t_0.

2:7. Convince yourself of the "trivial" fact that if a sequence of normal variables {x_n, n ∈ Z} is such that E(x_n) and V(x_n) have finite limits, then the sequence converges in distribution to a normal variable.

2:8. Give an example of a stationary process that violates the sufficient conditions in Theorem 2:10 and whose sample functions can have a tangent at the level u = 1.

2:9. Assume that sufficient conditions on r(s, t) = E(x(s)x(t)) are satisfied so that the integral

$$
\int_0^T g(t)\,x(t)\,dt
$$

exists for all T, both as a quadratic mean integral and as a sample function integral. Show that, if

$$
\int_0^\infty |g(t)|\sqrt{r(t,t)}\,dt < \infty,
$$

then the generalized integral ∫_0^∞ g(t)x(t) dt exists as a limit as T → ∞, both in quadratic mean and with probability one.

2:10. Let (x_n, y_n) have a bivariate Gaussian distribution with means 0, variances 1, and correlation coefficient ρ_n.

a) Show that P(x_n < 0 < y_n) = (1/2π) arccos ρ_n.

b) Calculate the conditional density functions for

(x_n + y_n) | x_n < 0 < y_n, and (y_n − x_n) | x_n < 0 < y_n.

c) Let z_n and u_n be distributed with the density functions derived in (b), and assume that ρ_n → 1 as n → ∞. Take c_n = 1/√(2(1 − ρ_n)), and show that the density functions of c_n z_n and c_n u_n converge to density functions f_1 and f_2, respectively.
Hint: f_2(u) = u exp(−u²/2), u > 0, is the Rayleigh density.

2:11. Let {x(t), t ∈ R} be a stationary Gaussian process with mean 0, and with a covariance function that satisfies

$$
-r''(t) = -r''(0) + o(|t|^a), \quad t\to 0,
$$

for some a > 0. Define x_n = x(0), y_n = x(1/n), ρ_n = r(1/n), and use the previous exercise to derive the asymptotic distribution of

$$
\frac{x(1/n) - x(0)}{1/n}\ \Bigl|\ x(0) < 0 < x(1/n)
$$

as n → ∞. What conclusion do you draw about the derivative at a point with an upcrossing of the zero level? (Answer: it has a Rayleigh distribution, not a half-normal distribution.)

2:12. Find an example of two dependent normal random variables U and Vsuch that C(U, V ) = 0; obviously you cannot let (U, V ) have a bivariatenormal distribution.

2:13. Prove that Theorem 2:18 follows from Theorem 2:17.

Chapter 3

Crossings

3.1 Level crossings and Rice’s formula

In applications one encounters level crossings, and in particular the distribution of the number of solutions to x(t) = u for a specified level u. This is a difficult question in general and very few explicit results can be derived. There exists, however, a very famous formula, found by Marc Kac and Steve O. Rice [27], for the expected number of upcrossings of a level u. We shall here state and prove that formula. In Section 3.2 we shall use different forms of Rice's formula to investigate the conditional behavior of a stationary process when it is observed in the neighborhood of a crossing of a predetermined level. Quantities like the height and time extension of the excursion above the level will be analyzed by means of an explicit stochastic model, the Slepian model, first used by David Slepian in 1962, [31], for a Gaussian process after zero crossings.

3.1.1 Level crossings

In practice, level crossing counting is often used as a means to describe the variability and extremal behavior of a continuous stochastic process. For example, the maximum of the process in an interval is equal to the lowest level above which there exists no genuine level crossing, provided, of course, that the process starts below that level. Since it is often easier to find the statistical properties of the number of level crossings than to find the maximum distribution, crossing methods are of practical importance.

For sample functions of a continuous process {x(t), t ∈ R} we say that x(t) has an upcrossing of the level u at t_0 if, for some ε > 0, x(t) ≤ u for all t ∈ (t_0 − ε, t_0] and x(t) ≥ u for all t ∈ [t_0, t_0 + ε). For any interval I = [a, b], write N⁺_I(x, u) for the number of upcrossings by x(t) in I,

N⁺_I = N⁺_I(x, u) = the number of u-upcrossings by x(t), t ∈ I.

For continuous processes which take the value u only at a finite number of points, there must be intervals to the left and to the right of any upcrossing point such that

x(t) is strictly less than u immediately to the left and strictly greater than u immediately to the right of the upcrossing point. Also define

N_I = N_I(x, u) = the number of t ∈ I such that x(t) = u.

By the intensity of upcrossings we mean any function μ⁺_t(u) such that

$$
\int_{t\in I}\mu^+_t(u)\,dt = E(N^+_I(x, u)).
$$

Similarly, we define the intensity of crossings as a function μ_t(u) with

$$
\int_{t\in I}\mu_t(u)\,dt = E(N_I(x, u)).
$$

For a stationary process, μ⁺_t(u) = μ⁺(u) and μ_t(u) = μ(u) are independent of t. In general, the intensity is the mean number of events per time unit, calculated at time t.

In reliability applications of stochastic processes one may want to calculate the distribution of the maximum of a continuous process x(t) in an interval I = [0, T]. The following approximation is then often useful, and also sufficiently accurate for short intervals:

$$
P\bigl(\max_{0\le t\le T} x(t) > u\bigr)
= P(\{x(0)\le u\}\cap\{N^+_I(x,u)\ge 1\}) + P(x(0) > u)
\le P(N^+_I(x,u)\ge 1) + P(x(0) > u)
\le E(N^+_I(x,u)) + P(x(0) > u) = T\cdot\mu^+(u) + P(x(0) > u).
$$
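For a mean-zero stationary Gaussian process the bound is fully explicit once the spectral moments are known. The sketch below (a hedged illustration only: the values of ω0, ω2, u and T are assumptions, and μ⁺(u) is taken from the Gaussian form of Rice's formula derived later in Section 3.1.4) evaluates T·μ⁺(u) + P(x(0) > u).

```python
from math import sqrt, pi, exp, erfc

# Illustrative (assumed) spectral moments for a mean-zero stationary Gaussian process
w0, w2 = 1.0, 4.0      # omega_0 = V(x(t)),  omega_2 = V(x'(t))
u, T = 3.0, 10.0       # level and interval length

mu_plus = sqrt(w2 / w0) / (2 * pi) * exp(-u ** 2 / (2 * w0))  # Rice, Gaussian case
p_above = 0.5 * erfc(u / sqrt(2 * w0))                        # P(x(0) > u)

print(f"P(max over [0,{T}] exceeds {u}) <= {T * mu_plus + p_above:.4f}")
```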

The upcrossing intensity μ⁺(u) was found by Rice for Gaussian processes, results which were later given strict proofs through counting methods developed by Kac. The classical reference is [27]. We give first a general formulation and then specialize to Gaussian processes; see [22, 30].

3.1.2 Rice’s formula for absolutely continuous processes

We present the simplest version and proof of Rice's formula, valid for processes {x(t), t ∈ R} with absolutely continuous sample paths¹ and absolutely continuous distribution with density f_{x(t)}(u) = f_{x(0)}(u), independent of t. For such a process, the derivative x′(t) exists almost everywhere, and the conditional expectations

E(x′(0)⁺ | x(0) = u) and E(|x′(0)| | x(0) = u)

exist (with x⁺ = max(0, x)).

¹A function x(t), t ∈ [a, b], is absolutely continuous if it is equal to the integral x(t) = ∫_a^t y(s) ds of an integrable function y(s). This is equivalent to the requirement that for every ε > 0 there is a δ > 0 such that for every collection (a_1, b_1), (a_2, b_2), …, (a_n, b_n) of disjoint intervals in [a, b] with Σ_1^n (b_k − a_k) < δ one has Σ_1^n |x(b_k) − x(a_k)| < ε. An absolutely continuous function is always continuous and its derivative exists almost everywhere, with x′(t) = y(t) almost everywhere.

Theorem 3:1 (Rice's formula) For any stationary process {x(t), t ∈ R} with density f_{x(0)}(u), the crossing and upcrossing intensities are given by

$$
\mu(u) = E(N_{[0,1]}(x,u)) = \int_{-\infty}^{\infty}|z|\,f_{x(0),x'(0)}(u,z)\,dz
= f_{x(0)}(u)\,E(|x'(0)|\mid x(0)=u), \qquad (3.1)
$$

$$
\mu^+(u) = E(N^+_{[0,1]}(x,u)) = \int_0^{\infty} z\,f_{x(0),x'(0)}(u,z)\,dz
= f_{x(0)}(u)\,E(x'(0)^+\mid x(0)=u). \qquad (3.2)
$$

These expressions hold for almost any u, whenever the involved densities exist.

Before we state the short proof we shall review some facts about functions of bounded variation, proved by Banach. To formulate the proof, write, for any continuous function f(t), t ∈ [0, 1], and interval I = [a, b] ⊂ [0, 1],

N_I(f, u) = the number of t ∈ I such that f(t) = u.

Further, define the total variation of f(t), t ∈ I, as sup Σ |f(t_{k+1}) − f(t_k)|, where the supremum is taken over all subdivisions a ≤ t_0 < t_1 < … < t_n ≤ b.

Lemma 3.1 (Banach) For any continuous function f(t), t ∈ I, the total variation is equal to

$$
\int_{-\infty}^{\infty} N_I(f, u)\,du.
$$

Further, if f(t) is absolutely continuous with derivative f′(t), then

$$
\int_{-\infty}^{\infty} N_I(f, u)\,du = \int_I |f'(t)|\,dt.
$$

Similarly, if A ⊆ R is any Borel measurable set, and 1_A is its indicator function, then

$$
\int_{-\infty}^{\infty} 1_A(u)\,N_I(f, u)\,du = \int_I 1_A(f(t))\,|f'(t)|\,dt. \qquad (3.3)
$$

Proof of Rice’s formula We prove (3.1) by applying Banach’s theorem onthe stationary process {x(t), t ∈ R} with absolutely continuous, and hencea.s. differentiable, sample paths. If x(t) has a.s. absolutely continuous samplefunctions, then (3.3) holds for almost every realization, i.e.∫ ∞

−∞1A(u)NI(x, u) du =

∫I1A(x(t)) |x′(t)| dt

Taking expectations and using Fubini's theorem to change the order of integration and expectation, we get

$$
|I|\int_{u\in A}\mu(u)\,du = \int_{-\infty}^{\infty} 1_A(u)\,E(N_I(x,u))\,du
= E\Bigl(\int_I 1_A(x(t))\,|x'(t)|\,dt\Bigr)
= |I|\,E\bigl(1_A(x(0))\,|x'(0)|\bigr)
= |I|\int_{u\in A} f_{x(0)}(u)\,E(|x'(0)|\mid x(0)=u)\,du;
$$

here we also used that {x(t), t ∈ R} is stationary. Since A is an arbitrary measurable set, we get the desired result,

$$
\mu(u) = f_{x(0)}(u)\,E(|x'(0)|\mid x(0)=u)
$$

for almost all u. The proof of (3.2) is similar. □

3.1.3 Alternative proof of Rice’s formula

The elegant and general proof of Rice's formula just given does not give an intuitive argument for the presence of the factor z in the integral (3.2). The following more pedestrian proof is closer to an explanation.

Theorem 3:2 For a stationary process {x(t), t ∈ R} with almost surely continuous sample paths, suppose x(0) and ζ_n = 2^n(x(1/2^n) − x(0)) have a joint density g_n(u, z) which is continuous in u for all z and all sufficiently large n. Also suppose g_n(u, z) → p(u, z) uniformly in u for fixed z as n → ∞, and that g_n(u, z) ≤ h(z) with ∫_0^∞ z h(z) dz < ∞.² Then

$$
\mu^+(u) = E(N^+_{[0,1]}(x,u)) = \int_0^\infty z\,p(u,z)\,dz. \qquad (3.4)
$$

²It is of course tempting to think of p(u, z) as the joint density of (x(0), x′(0)), but no argument for this is involved in the proof.

Proof: We first devise a counting technique for the upcrossings by dividing the interval [0, 1] into dyadic subintervals [(k − 1)/2^n, k/2^n], k = 1, …, 2^n, and checking the values at the endpoints. Let N_n denote the number of points k/2^n such that x((k − 1)/2^n) < u < x(k/2^n). Since x(t) has continuous sample paths (a.s.), there is at least one u-upcrossing in every interval such that x((k − 1)/2^n) < u < x(k/2^n), and hence

N_n ≤ N⁺_{[0,1]}(x, u).

Furthermore, since x(t) has a continuous distribution, we may assume that x(k/2^n) ≠ u for all n and k = 1, …, 2^n. When n increases to n + 1 the

number of subintervals doubles, and each interval contributing one upcrossing to N_n will contribute at least one upcrossing to N_{n+1} in at least one of the two new subintervals. Hence N_n is increasing and it is easy to see that, regardless of whether N⁺_{[0,1]}(x, u) = ∞ or N⁺_{[0,1]}(x, u) < ∞, N_n ↑ N⁺_{[0,1]}(x, u) as n → ∞. Monotone convergence implies that lim_{n→∞} E(N_n) = E(N⁺_{[0,1]}(x, u)).

Now, define

J_n(u) = 2^n P(x(0) < u < x(1/2^n)),

so, by stationarity,

E(N⁺_{[0,1]}(x, u)) = lim_{n→∞} E(N_n) = lim_{n→∞} J_n(u).

By writing the event {x(0) < u < x(1/2^n)} as

{x(0) < u < x(0) + ζ_n/2^n} = {x(0) < u} ∩ {ζ_n > 2^n(u − x(0))},

we have

$$
J_n(u) = 2^n P\bigl(x(0) < u,\ \zeta_n > 2^n(u - x(0))\bigr)
= 2^n\int_{x=-\infty}^{u}\int_{y=2^n(u-x)}^{\infty} g_n(x,y)\,dy\,dx.
$$

By a change of variables, x = u − zv/2^n, y = z (so that v = 2^n(u − x)/y), this is equal to

$$
\int_{z=0}^{\infty} z\int_{v=0}^{1} g_n(u - zv/2^n,\,z)\,dv\,dz,
$$

where g_n(u − zv/2^n, z) tends pointwise to p(u, z) as n → ∞, by the assumptions of the theorem. Since g_n is dominated by the integrable h(z), it follows that the double integral tends to ∫_0^∞ z p(u, z) dz as n → ∞. □

Remark 3:1 The proof of Rice's formula as illustrated in Figure 3.1 shows the relation to the Kac and Slepian horizontal window conditioning: one counts the number of times the process passes through a small horizontal window; we will dwell upon this concept in Section 3.2.1, page 65.

Remark 3:2 Rice's formula can be extended to non-stationary processes, in which case the crossing intensity is time dependent. Then the density function f_{x(t),x′(t)}(u, z) in the integral depends on t:

$$
E(N_{[a,b]}(x,u)) = \int_a^b\int_{-\infty}^{\infty}|z|\,f_{x(t),x'(t)}(u,z)\,dz\,dt.
$$

The integral

$$
\mu_t(u) = \int_{-\infty}^{\infty}|z|\,f_{x(t),x'(t)}(u,z)\,dz
$$

is the local crossing intensity at time t.

[Figure 3.1: Upcrossing occurs between t and t + Δt when x(t) is between u − x′(t)Δt and u. The probability is the integral of the joint density f_{x(t),x′(t)}(u, z) over the dashed area.]

3.1.4 Rice’s formula for differentiable Gaussian processes

For a Gaussian stationary process, Rice's formula becomes particularly simple. We know from elementary courses that for a stationary differentiable Gaussian process {x(t), t ∈ R}, the process value x(t) and the derivative x′(t) at the same time point are independent³ and Gaussian, with means E(x(t)) = m and E(x′(t)) = 0, respectively, and variances given by the spectral moments, V(x(t)) = r(0) = ω0, V(x′(t)) = −r′′(0) = ω2. Then

$$
f_{x(0),x'(0)}(u,z) = \frac{1}{2\pi\sqrt{\omega_0\omega_2}}\,e^{-(u-m)^2/2\omega_0}\,e^{-z^2/2\omega_2}. \qquad (3.5)
$$

Simple integration of (3.1) and (3.2) gives that, for Gaussian stationary processes,

$$
\mu(u) = E(N_{[0,1]}(x,u)) = \frac{1}{\pi}\sqrt{\frac{\omega_2}{\omega_0}}\,e^{-(u-m)^2/2\omega_0},
$$
$$
\mu^+(u) = E(N^+_{[0,1]}(x,u)) = \frac{1}{2\pi}\sqrt{\frac{\omega_2}{\omega_0}}\,e^{-(u-m)^2/2\omega_0},
$$

which are the original forms of Rice's formula. These formulas hold regardless of whether ω2 is finite or not, so ω2 = ∞ if and only if the expected number of crossings in any interval is infinite. This does not mean, however, that there necessarily are infinitely many crossings, but if there are crossings, then there may be infinitely many in the neighborhood.

³If you have not seen this before, prove it by showing that x(t) and x′(t) = lim_{h→0}(x(t + h) − x(t))/h are uncorrelated.
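The Gaussian form of Rice's formula is easy to check by simulation. The sketch below (an illustration, not from the text) builds a stationary Gaussian process as a finite sum of cosines with independent Gaussian amplitudes over an assumed toy spectrum — only an approximation of a process with a continuous spectrum — and compares the empirical rate of u-upcrossings in one long realization with μ⁺(u) above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed discrete toy spectrum: variance sig2[j] attached to angular frequency omega[j]
omega = np.linspace(0.2, 3.0, 60)
S = np.exp(-(omega - 1.5) ** 2)
sig2 = S * (omega[1] - omega[0])

w0 = sig2.sum()                 # omega_0 = V(x(t))
w2 = (omega ** 2 * sig2).sum()  # omega_2 = V(x'(t))

u = 1.0
mu_plus = np.sqrt(w2 / w0) / (2 * np.pi) * np.exp(-u ** 2 / (2 * w0))  # Rice, m = 0

# One long realization: sum of cosines with independent N(0,1) amplitudes
T, dt = 10_000.0, 0.05
t = np.arange(0.0, T, dt)
x = np.zeros(t.size)
for oj, s2 in zip(omega, sig2):
    a, b = rng.normal(size=2)
    x += np.sqrt(s2) * (a * np.cos(oj * t) + b * np.sin(oj * t))

ups = np.count_nonzero((x[:-1] < u) & (x[1:] >= u))   # u-upcrossings on the grid
print(f"Rice mu+({u}) = {mu_plus:.4f},  empirical = {ups / T:.4f}")
```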

Remark 3:3 The expected number of mean-level upcrossings per time unit in a stationary Gaussian process is

$$
\mu^+(m) = \frac{1}{2\pi}\sqrt{\omega_2/\omega_0}
= \frac{1}{2\pi}\sqrt{\frac{\int\omega^2 f(\omega)\,d\omega}{\int f(\omega)\,d\omega}},
$$

and it is called the (root) mean square frequency of the process. Its inverse is equal to the long run average time distance between successive mean level upcrossings, 1/μ⁺(m) = 2π√(ω0/ω2), also called the mean period.

A local extreme, minimum or maximum, of a differentiable stochastic process {x(t), t ∈ R} corresponds to, respectively, an upcrossing or a downcrossing of the zero level by the derivative process {x′(t), t ∈ R}. Rice's formula applied to x′(t) therefore gives the expected number of local extremes. For a Gaussian process the formulas involve the fourth spectral moment ω4 = V(x′′(t)) = ∫ ω⁴ f(ω) dω. The general and Gaussian expressions are, respectively,

$$
\mu_{\min} = \int_0^{\infty} z\,f_{x',x''}(0,z)\,dz = \frac{1}{2\pi}\sqrt{\frac{\omega_4}{\omega_2}},
\qquad
\mu_{\max} = \int_{-\infty}^{0}|z|\,f_{x',x''}(0,z)\,dz = \frac{1}{2\pi}\sqrt{\frac{\omega_4}{\omega_2}}.
$$

If we combine this with Remark 3:3 we get the average number of local maxima per mean level upcrossing,

$$
1/\alpha = \frac{\tfrac{1}{2\pi}\sqrt{\omega_4/\omega_2}}{\tfrac{1}{2\pi}\sqrt{\omega_2/\omega_0}}
= \sqrt{\frac{\omega_0\omega_4}{\omega_2^2}}.
$$

The parameter α is always bounded by 0 < α < 1, and it is used as an irregularity measure: an α near 1 indicates a very regular process with approximately one local maximum and one local minimum between mean level upcrossings. If α is near zero one can expect many local extremes between the upcrossings.

Seen in relation to the spectrum, the parameter α can be interpreted as a measure of spectral width. A spectrum with α near 1 is narrow banded, i.e. the spectral density is concentrated in a small frequency band around a dominating center frequency. A narrow banded process has very regular sample functions, with slowly varying random amplitude; see Section 4.4.5, which deals with the envelope process.
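As a small numerical companion (the spectral density below is an assumption for illustration, not one from the text), the spectral moments, the width parameter α and the mean period can be computed by direct numerical integration of the spectrum.

```python
import numpy as np

# Assumed one-sided spectral density f(w), evaluated on a fine grid
w = np.linspace(0.0, 10.0, 100_001)
f = np.exp(-(w - 1.5) ** 2)
dw = w[1] - w[0]

w0 = np.sum(f) * dw              # omega_0
w2 = np.sum(w ** 2 * f) * dw     # omega_2
w4 = np.sum(w ** 4 * f) * dw     # omega_4

alpha = w2 / np.sqrt(w0 * w4)    # spectral width / irregularity parameter
print(f"alpha = {alpha:.3f}, mean period = {2 * np.pi * np.sqrt(w0 / w2):.3f}, "
      f"local maxima per mean-level upcrossing = {1 / alpha:.3f}")
```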

3.2 Prediction from a random crossing time

In the prediction theory briefly presented in Section 1.5.2 our concern was to predict, as accurately as possible, the unknown future value x(t_0 + τ) of a process {x(t), t ∈ R}, from what we know by observations available at time t_0. An implicit assumption has been that there is no stochastic dependence between the choice of t_0 and x(t_0 + τ).

For example, given that we have a complete record of all old values, the best predictor, in the sense of smallest mean square error, is the conditional expectation x̂(t_0 + τ) = E(x(t_0 + τ) | x(s), s ≤ t_0). If the process is Gaussian the predictor is linear in the observed values; for example, when we know only the value of x(t_0), the optimal solution that has smallest mean square error, taken over all outcomes of x(t_0), is

$$
\widehat{x}(t_0 + \tau) = E(x(t_0 + \tau)\mid x(t_0)) = \frac{r(\tau)}{r(0)}\,x(t_0). \qquad (3.6)
$$
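For a concrete covariance function the fixed-time predictor (3.6) is a one-liner; the sketch below is only an illustration of the formula, with r(t) = exp(−t²/2) an assumed example.

```python
import numpy as np

r = lambda t: np.exp(-t ** 2 / 2)   # assumed covariance function, r(0) = 1

def predict_fixed_time(x_t0, tau):
    """Conditional-mean predictor (3.6) of x(t0 + tau) from the single value x(t0)."""
    return r(tau) / r(0.0) * x_t0

for tau in (0.1, 0.5, 2.0):
    print(tau, predict_fixed_time(1.0, tau))   # shrinks toward the mean 0 as tau grows
```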

3.2.1 Prediction from upcrossings

There are situations where the time point from which we want to predict the future process is not a deterministic time point but a random time, determined by the process itself. An example is an alert predictor of the water level in a flood protection system: when the water level reaches a certain warning level special actions are taken, such as special surveillance, more detailed prediction, etc. Another example is a surveillance system in a medical intensive care unit.

We assume throughout this section that the process {x(t), t ∈ R} has continuous sample functions. Prediction from an upcrossing time point t_0 of a specified level u shall be based on the conditional distributions, given that an upcrossing of the level u has occurred at t_0. So, we need to find those conditional distributions,⁴ in particular

$$
\widehat{x}_u(t_0 + \tau) = E(x(t_0 + \tau)\mid x(t_0) = u,\ \text{upcrossing}). \qquad (3.7)
$$

3.2.1.1 A word about conditioning

A conditional expectation was defined in an elementary way in Section 1.2.4 as φ(v) = E(x | y = v) = ∫_u u f_{x,y}(u, v)/f_y(v) du. The main property of the conditional expectation is relation (1.5), E(x) = ∫_y φ(y)f(y) dy, and its more refined version,

$$
E(x\mid y\in A) = \frac{\int_{y\in A}\varphi(y)f(y)\,dy}{\int_{y\in A}f(y)\,dy},
$$

for every Borel set A. For example, when the density f(y) is continuous, with A = [u − ε, u],

$$
\lim_{\varepsilon\to 0} E(x\mid u-\varepsilon\le y\le u) = E(x\mid y=u) = \varphi(u). \qquad (3.8)
$$

The meaning of this is that E(x | y = u) is the expected value of x for those outcomes where the value of y is close to u; for more details, see [5, Ch. 4].

⁴In Markov process theory one has introduced the strong Markov property to handle conditioning from a time point that depends on the process.

If {x(t), t ∈ R} is a stationary process, take y = x(t_0) and x = x(t_0 + τ). Then

$$
P(x(t_0+\tau)\le v\mid x(t_0)=u) = \lim_{\varepsilon\to 0} P(x(t_0+\tau)\le v\mid u-\varepsilon\le x(t_0)\le u), \qquad (3.9)
$$
$$
E(x(t_0+\tau)\mid x(t_0)=u) = \lim_{\varepsilon\to 0} E(x(t_0+\tau)\mid u-\varepsilon\le x(t_0)\le u), \qquad (3.10)
$$

calculated at time t_0, and (3.10) gives the best predictor of the future value x(t_0 + τ) in the sense that it minimizes the squared error taken as an average over all the possible outcomes of x(t_0). By "average" we then mean expected value as well as an empirical average over many realizations observed at the fixed predetermined time t_0, chosen independently of the process.

3.2.1.2 Empirical limits and horizontal window conditioning

We now consider prediction from the times of upcrossings of a fixed level u. This differs from the previous type of conditioning in that the last observed value of the process is known, while the time points are variable. The interpretation of "average future value" is then not yet clear and has to be made precise. Obviously, what we should aim at is a prediction method that works well on the average, in the long run, over all the u-level upcrossings we observe in the process. Call these upcrossing time points t_k > 0. To this end, define the following distribution.

Definition 3:1 For a stationary process {x(t), t ∈ R}, the (long run, ergodic) conditional distribution of x(t_0 + ·) after a u-upcrossing at t_0 is defined as

$$
P^u(A) = \text{"}P(x(t_0+\cdot)\in A\mid x(t_0)=u,\ \text{upcrossing})\text{"}
= \lim_{T\to\infty}\frac{\#\{t_k;\ 0<t_k<T \text{ and } x(t_k+\cdot)\in A\}}{\#\{t_k;\ 0<t_k<T\}}. \qquad (3.11)
$$

Thus, P^u(A) counts all those u-upcrossings t_k for which the process, taken with t_k as new origin, satisfies the condition given by A.

The definition makes sense only if the limit exists, but, as we shall prove in Chapter 5, the limit exists for every stationary process {x(t), t ∈ R}, though it may be random. If the process is ergodic the limit is non-random and it defines a proper distribution on C for a (non-stationary) stochastic process.

The empirical long run distribution is related to Kac and Slepian's horizontal window conditioning, [19].

[Figure 3.2: Seven excursions above the level u = 1.5, and the same excursions translated to the origin. Excursions above u = 1.5 contribute to the distribution P^{1.5}(·).]

For a stationary process {x(t), t ∈ R}, the (horizontal window) conditional distribution of x(t_0 + ·) after a u-upcrossing at t_0 is

$$
P^{hw}(A) = \text{"}P(x(t_0+\cdot)\in A\mid x(t_0)=u,\ \text{upcrossing in h.w. sense})\text{"} \qquad (3.12)
$$
$$
= \lim_{\varepsilon\to 0} P\bigl(x(t_0+\cdot)\in A\mid x(s)=u,\ \text{upcrossing, some } s\in[t_0-\varepsilon, t_0]\bigr).
$$

It is easy to show that the two distributions just defined are identical, P^u(A) = P^{hw}(A), for every (ergodic) stationary process such that there are only a finite number of u-upcrossings in any finite interval; see [9].

In point process terminology one can view the upcrossings as the points of a stationary point process and the shape of the process around an upcrossing as a mark attached to the point. The conditional distribution of the shape is then treated as a Palm distribution in the marked point process; see [9].

The term horizontal window condition is natural, since the process has to pass a horizontal window at level u somewhere near t_0. In analogy, the condition in (3.9) is called a vertical window condition, since the process has to pass a small vertical window exactly at time t_0.

The distribution P^u can be found via its finite-dimensional distribution functions. Take s = (s_1, …, s_n), v = (v_1, …, v_n), write

x(s) ≤ v for x(s_j) ≤ v_j, j = 1, …, n,

and define

$$
N_{[0,T]}(x, u; \mathbf{s}, \mathbf{v}) = \#\{t_k;\ 0\le t_k\le T,\ \text{and } x(t_k + \mathbf{s})\le\mathbf{v}\},
$$

the number of u-upcrossings in [0, T] which are such that the process, at each of the times s_j after the upcrossing, is less than v_j.

Thus N_{[0,T]}(x, u; s, v)/N_{[0,T]}(x, u), v ∈ R^n, is the empirical distribution function for the process at times s_j after u-upcrossings.

Theorem 3:3 If {x(t), t ∈ R} is ergodic, then, with A = {y ∈ C; y(s) ≤ v},

$$
P^u(A) = \frac{E\bigl(N_{[0,1]}(x,u;\mathbf{s},\mathbf{v})\bigr)}{E\bigl(N_{[0,1]}(x,u)\bigr)} \qquad (3.13)
$$
$$
= \frac{\int_0^{\infty} z\,f_{x(0),x'(0)}(u,z)\,P(x(\mathbf{s})\le\mathbf{v}\mid x(0)=u,\,x'(0)=z)\,dz}{\int_0^{\infty} z\,f_{x(0),x'(0)}(u,z)\,dz}. \qquad (3.14)
$$

Proof: We need a result from Chapter 5, namely that for an ergodic process, with probability one,

$$
\frac{N_{[0,T]}(x,u)}{T}\to E\bigl(N_{[0,1]}(x,u)\bigr), \qquad
\frac{N_{[0,T]}(x,u;\mathbf{s},\mathbf{v})}{T}\to E\bigl(N_{[0,1]}(x,u;\mathbf{s},\mathbf{v})\bigr),
$$

as T → ∞. This gives (3.13).

The proof of (3.14) is analogous to that of Theorem 3:2 and can be found in [22, Ch. 10], under the (unnecessarily strict) condition that {x(t), t ∈ R} has continuously differentiable sample paths. □

Noting that

$$
p_u(z) = \frac{z\,f_{x(0),x'(0)}(u,z)}{\int_0^{\infty}\zeta\,f_{x(0),x'(0)}(u,\zeta)\,d\zeta}, \quad z\ge 0, \qquad (3.15)
$$

is a density function in z, we can introduce a random variable ζ with this density. Then (3.14) can be formulated as

$$
P^u(A) = E_\zeta\bigl(P(x(\mathbf{s})\le\mathbf{v}\mid x(0)=u,\,x'(0)=\zeta)\bigr). \qquad (3.16)
$$

The interpretation is that the distribution of the process values at times s after u-upcrossings is a mixture, over the random slope ζ at the upcrossing point, of the ordinary conditional distributions of x(s) given that x(0) = u and x′(0) = ζ, where ζ has density p_u(z).

3.2.2 The Slepian model

Theorem 3:3 presents the conditional distribution of a stationary process in the neighborhood of upcrossings of the fixed level u. The definition in the form of the integrals in (3.14) is not very informative as it stands. However, for Gaussian processes one can construct an explicit and simple process that has exactly this distribution, and which lends itself to easy simulation, numerical calculation, and asymptotic expansion.

Definition 3:2 Let {x(t), t ∈ R} be a stationary and ergodic stochastic process and fix a level u such that x(t) has a finite number of u-upcrossings in any finite interval. A Slepian model process for {x(t), t ∈ R} after u-upcrossings is any stochastic process {ξ_u(t), t ∈ R} that has its distributions given by P^u in (3.14). In particular, its finite-dimensional distributions are given by

$$
P(\xi_u(\mathbf{s})\le\mathbf{v}) = \int_0^{\infty} p_u(z)\,P\bigl(x(\mathbf{s})\le\mathbf{v}\mid x(0)=u,\,x'(0)=z\bigr)\,dz.
$$

The Slepian model is a stochastic model for individual excursions after a level upcrossing, and its distribution is equal to the distribution of the sample functions illustrated in the lower diagram in Figure 3.2. More complex Slepian models can be formulated for other crossing problems, for example the process behavior after a local maximum or minimum, since these are defined as downcrossings or upcrossings of the zero level by the derivative process {x′(t), t ∈ R}.

Every Slepian model has two elements, which depend on the type of crossing problem: the long run distribution of the gradient at the instants of the crossings, and the conditional distribution of the process given its value and the value of the gradient.

Typical problems that can be analyzed by a Slepian process are:

• Prediction after crossing: What is the best predictor of the process a time τ after one has observed an upcrossing of the level u?

• Excursion shape: How high above the level u will an excursion extend, and how long does it take before the process returns below the level?

• Crest shape: What is the shape of the process near its local maxima?

We can immediately solve the first problem: the best predictor x̂_u(t_0 + τ) after u-upcrossings is the expectation of the Slepian model,

$$
\widehat{x}_u(t_0 + \tau) = E(\xi_u(\tau)), \qquad (3.17)
$$

in the sense that the average of (x(t_k + τ) − a)², when t_k runs over all u-upcrossings, takes its minimum value when a = E(ξ_u(τ)).

3.2.2.1 An explicit Slepian model for crossings in a Gaussian process

The conditional distribution of {x(t), t ∈ R} after a u-upcrossing is particularly simple in the Gaussian case, and the Slepian model can be expressed in a very explicit way.

We have to find the density p_u(z) and the conditional distributions of x(s) in (3.16). For Gaussian processes, formula (3.5) holds, and with mean m = 0 it says

$$
f_{x(0),x'(0)}(u,z) = \frac{1}{2\pi\sqrt{\omega_0\omega_2}}\,e^{-u^2/2\omega_0}\,e^{-z^2/2\omega_2}.
$$

Cancelling the factor (1/√(2πω0)) e^{−u²/2ω0} in (3.15) we get

$$
p_u(z) = \frac{z}{\omega_2}\,e^{-z^2/(2\omega_2)}, \quad z\ge 0, \qquad (3.18)
$$

i.e. the slope at u-upcrossings has a Rayleigh distribution with parameter ω2, regardless of the level u.⁵

Next, we need the conditional distribution of x(s) given x(0) = u, x′(0) = z, and then average over ζ = z with density p_u(z). Since a conditional distribution in a multivariate normal distribution is still a normal distribution, we only need to find the conditional mean E(x(s) | x(0) = u, x′(0) = z) and the conditional covariances Cov(x(s_1), x(s_2) | x(0) = u, x′(0) = z); these were given in Section 1.5.1, equations (1.9, 1.11).

Take ξ = (x(s_1), x(s_2)), η = (x(0), x′(0)), and calculate the joint covariance matrix of (ξ, η) from the covariance function r(t) of {x(t), t ∈ R}. By Theorem 2:13 it is

$$
\Sigma = \begin{pmatrix}
r(0) & r(s_2-s_1) & r(s_1) & -r'(s_1)\\
r(s_1-s_2) & r(0) & r(s_2) & -r'(s_2)\\
r(s_1) & r(s_2) & r(0) & 0\\
-r'(s_1) & -r'(s_2) & 0 & -r''(0)
\end{pmatrix}
= \begin{pmatrix}\Sigma_{\xi\xi} & \Sigma_{\xi\eta}\\ \Sigma_{\eta\xi} & \Sigma_{\eta\eta}\end{pmatrix}.
$$

Use the notation ω0 = r(0), ω2 = −r′′(0) and remember that m_ξ = m_η = 0. Then we get the conditional expectation and covariance matrix, given η = y = (u, z), as

$$
E(\xi\mid\eta=y) = \Sigma_{\xi\eta}\Sigma_{\eta\eta}^{-1}y'
= \begin{pmatrix} u\,r(s_1)/\omega_0 - z\,r'(s_1)/\omega_2\\ u\,r(s_2)/\omega_0 - z\,r'(s_2)/\omega_2\end{pmatrix},
$$
$$
\Sigma_{\xi\xi|\eta} = \Sigma_{\xi\xi} - \Sigma_{\xi\eta}\Sigma_{\eta\eta}^{-1}\Sigma_{\eta\xi}
= \begin{pmatrix} r_\kappa(s_1,s_1) & r_\kappa(s_1,s_2)\\ r_\kappa(s_2,s_1) & r_\kappa(s_2,s_2)\end{pmatrix},
$$

say. Here

$$
r_\kappa(s_1,s_2) = r(s_2-s_1) - r(s_1)r(s_2)/\omega_0 - r'(s_1)r'(s_2)/\omega_2 \qquad (3.19)
$$

is the covariance function of a non-stationary process.

Note the structure of r_κ(s_1, s_2): the first term is the unconditional covariance function of the process, and the two other terms are the changes in covariance obtained from knowledge of the uncorrelated variables x(0) and x′(0). When s_1 or s_2 tends to infinity, the reduction terms go to 0 and the influence of the conditioning vanishes.

We have now found the explicit structure of the Slepian model in the Gaussian case; we formulate it as a theorem.

5This is the solution to Exercise 2:11.

Theorem 3:4 a) The Slepian model for a Gaussian process {x(t), t ∈ R} after u-upcrossings has the form

$$
\xi_u(t) = \frac{u\,r(t)}{\omega_0} - \frac{\zeta\,r'(t)}{\omega_2} + \kappa(t), \qquad (3.20)
$$

where ζ has the Rayleigh density p_ζ(z) = (z/ω2) e^{−z²/(2ω2)}, z ≥ 0, and {κ(t), t ∈ R} is a non-stationary Gaussian process, independent of ζ, with mean zero and covariance function r_κ(s_1, s_2) given by (3.19).

b) In particular, the best prediction of x(t_k + τ), taken over all u-upcrossings t_k, is obtained by taking E(ζ) = √(πω2/2) and κ(τ) = 0 in (3.20), which gives

$$
\widehat{x}_u(t_0 + \tau) = \frac{u\,r(\tau)}{\omega_0} - \frac{E(\zeta)\,r'(\tau)}{\omega_2}
= \frac{u}{\omega_0}\,r(\tau) - \sqrt{\frac{\pi}{2\omega_2}}\,r'(\tau). \qquad (3.21)
$$
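The following sketch (assuming the covariance r(t) = exp(−t²/2), so that ω0 = ω2 = 1; not part of the original text) evaluates the upcrossing predictor (3.21) and, for comparison, the fixed-time predictor (3.6); their difference is exactly the slope term discussed next.

```python
import numpy as np

# Assumed covariance function r(t) = exp(-t^2/2): omega_0 = 1, omega_2 = -r''(0) = 1
r  = lambda t: np.exp(-t ** 2 / 2)
rp = lambda t: -t * np.exp(-t ** 2 / 2)        # r'(t)
w0, w2 = 1.0, 1.0

def predict_after_upcrossing(u, tau):
    """Predictor (3.21): E(zeta) = sqrt(pi*omega_2/2) and kappa(tau) = 0 in (3.20)."""
    return u * r(tau) / w0 - np.sqrt(np.pi / (2 * w2)) * rp(tau)

def predict_fixed_time(u, tau):
    """Predictor (3.6), which ignores the biased (positive) slope at the upcrossing."""
    return u * r(tau) / w0

for tau in (0.25, 0.5, 1.0):
    print(tau, predict_after_upcrossing(1.0, tau), predict_fixed_time(1.0, tau))
```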

We have now found the correct way of taking the apparently positive slope at a u-upcrossing into account when predicting the near future. Note that the simple formula (3.6),

$$
\widehat{x}(t_0 + \tau) = E(x(t_0 + \tau)\mid x(t_0)=u) = \frac{u}{\omega_0}\,r(\tau),
$$

in the Gaussian case lacks the slope term. Of course, the slope at an upcrossing is always positive, but it is perhaps intuitively clear that the observed slopes at the upcrossings are "more positive" than that. The slope (= derivative) of a stationary Gaussian process is normal with mean zero, but we do not expect slopes at fixed-level upcrossings to have a half-normal distribution with mode (= most likely value) at 0. The r′-term in (3.21) tells us how to take this "sample bias" into account.

The prediction of the slope is the expectation of the Rayleigh variable ζ. If the slope at the u-upcrossing is observed and used in the prediction, then the difference between the two approaches disappears; see [23].

Example 3:1 To illustrate the efficiency of the Slepian model, we shall analyse the shape of an excursion above a very high level u in a Gaussian process, by expanding the Slepian model ξ_u(t) in a Taylor series as u → ∞. It will turn out that the length and height of the excursion are both of the order u^{−1}, so we normalize the scales of ξ_u(t) by that factor. Using

$$
r(t/u) = \omega_0 - \frac{\omega_2 t^2}{2u^2}(1+o(1)), \qquad r'(t/u) = -\omega_2\frac{t}{u}(1+o(1)),
$$

as t/u → 0, and that κ(t/u) = o(t/u), and omitting all o-terms, we get

$$
u\{\xi_u(t/u) - u\} = u\Bigl\{u\Bigl(\frac{r(t/u)}{\omega_0}-1\Bigr) - \zeta\,\frac{r'(t/u)}{\omega_2} + \kappa(t/u)\Bigr\}
\approx \zeta t - \frac{\omega_2 t^2}{2\omega_0}.
$$

Thus, the excursion above a high level u takes the form of a parabola with height ζ²ω0/(2uω2) and length 2ω0ζ/(uω2). It is easy to check that the normalized height of the excursion above u has an exponential distribution.

Remark 3:4 One should be aware that a Slepian model as described here represents the "marginal distribution" of the individual excursions above the chosen level u. Of course, one would like to use it also to analyse the dependence that may exist between successive excursions in the original process x(t), and this is in fact possible. For example, suppose we want to find how often it happens that two successive excursions both exceed a critical limit T_0 in length. Then, writing τ_1 = inf{τ > 0; ξ_u(τ) = u, upcrossing} for the first u-upcrossing in ξ_u(t) strictly on the positive side, one can calculate

$$
P\bigl(\xi_u(s) > u \text{ for } 0 < s < T_0, \text{ and } \xi_u(\tau_1 + s) > u \text{ for } 0 < s < T_0\bigr).
$$

3.2.2.2 A Slepian model around local maxima in a Gaussian process

A local maximum for x(t) is a zero-downcrossing point for the derivative x′(t), and the second derivative x′′(t) at these local maxima has a negative Rayleigh distribution. A Slepian model for the derivative after local maxima therefore has the same structure as the level crossing model, with r_x(t) replaced by r_{x′}(t) = −r′′_x(t). The time difference between a maximum and the following minimum can therefore be calculated as previously.

If we want the distribution of the height difference between the maximum and the following minimum, we need a more elaborate Slepian model, since now also the height of the maximum is random, not only the curvature. The reader is encouraged to prove the following theorem, copying Theorem 3:3 with analogous notation, now with t′_k denoting the times of local maxima.

Theorem 3:5 If {x(t), t ∈ R} is twice differentiable and ergodic, the long run empirical distribution of x(t′_k + s) around local maxima is equal to

$$
P^{\max}_1(A) = \frac{\int_{-\infty}^{0}|z|\,f_{x'(0),x''(0)}(0,z)\,P(x(\mathbf{s})\le\mathbf{v}\mid 0,z)\,dz}
{\int_{-\infty}^{0}|z|\,f_{x'(0),x''(0)}(0,z)\,dz},
$$

where A = {y ∈ C; y(s) ≤ v} and

P(x(s) ≤ v | 0, z) = P(x(s) ≤ v | x′(0) = 0, x′′(0) = z).

An explicit representation of the model process for a Gaussian process is

$$
\xi^{\max}_1(t) = \zeta^{\max}_1\,\frac{r''(t)}{\omega_4} + \Delta_1(t), \qquad (3.22)
$$

where ζ^{max}_1 has a negative Rayleigh distribution with density

$$
p^{\max}_1(z) = \frac{|z|}{\omega_4}\,e^{-z^2/(2\omega_4)}, \quad z < 0, \qquad (3.23)
$$

and the non-stationary Gaussian process Δ_1(t) is independent of ζ^{max}_1, has mean 0, and has covariance function

$$
r_{\Delta_1}(s_1,s_2) = r(s_1-s_2) - \frac{r'(s_1)r'(s_2)}{\omega_2} - \frac{r''(s_1)r''(s_2)}{\omega_4}. \qquad (3.24)
$$

Since Δ_1(0) is normal with mean 0 and variance r_{Δ_1}(0, 0) = ω0 − ω2²/ω4, we see in particular that the distribution of the height of a local maximum is the same as the distribution of

$$
\xi^{\max}_1(0) = -\zeta^{\max}_1\,\frac{\omega_2}{\omega_4} + \Delta_1(0)
= \sqrt{\omega_0}\,\bigl\{\sqrt{1-\varepsilon^2}\cdot\text{Rayleigh} + \varepsilon\cdot\text{Normal}\bigr\}, \qquad (3.25)
$$

with standard Rayleigh and normal variables, illustrating the relevance of the spectral width parameter α = √(1 − ε²) = √(ω2²/(ω0ω4)).

Theorem 3:5 is the basis for numerical calculation of wave-characteristic distributions, such as the height and time difference between local maxima and minima, as implemented in the routines of the Matlab package WAFO, [34]. The model (3.22) contains an explicit function r′′(t)/ω4 with a simple random factor, representing the Rayleigh distributed curvature at the maximum, plus a continuous parameter Gaussian process. The numerical procedures work by successively replacing the continuous parameter process by explicit functions multiplied by random factors. We now illustrate the first steps in this procedure.

The model (3.22) contains the random curvature, and it is the simplest form of the Slepian model after a maximum. There is nothing that prevents us from also including the random height of the local maximum in the model. We have seen in (3.25) how the height and the curvature depend on each other, so we can build an alternative Slepian model after maxima that explicitly includes both the height of the maximum and the curvature.

To formulate the extended model we define three functions, A(t), B(t), C(t), by

$$
E(x(t)\mid x(0)=u,\,x'(0)=y,\,x''(0)=z) = u\,A(t) + y\,B(t) + z\,C(t)
$$
$$
= u\,\frac{\omega_4 r(t) + \omega_2 r''(t)}{\omega_0\omega_4 - \omega_2^2}
\;-\; y\,\frac{r'(t)}{\omega_2}
\;+\; z\,\frac{\omega_2 r(t) + \omega_0 r''(t)}{\omega_0\omega_4 - \omega_2^2}.
$$

The conditional covariances between x(s_1) and x(s_2) are found from the same theorem, and the explicit expression is given in Theorem 3:6 below.

As we have seen in Section 2.4.3, the derivative x′(0) is uncorrelated with both x(0) and x′′(0), but x(0) and x′′(0) are correlated. To formulate the effect of observing a local maximum we will first introduce the crest height,

x(0), and then find the conditional properties of x′′(0) given x(0) and x′(0). We use Theorem 2:13 and define the function

$$
b(t) = \frac{\mathrm{Cov}(x(t), x''(0)\mid x(0), x'(0))}{\sqrt{V(x''(0)\mid x(0), x'(0))}}
= \frac{r''(t) + (\omega_2/\omega_0)\,r(t)}{\sqrt{\omega_4 - \omega_2^2/\omega_0}}.
$$

Theorem 3:6 If {x(t), t ∈ R} is twice differentiable and ergodic, the long run empirical distribution of x(t′_k + s) around local maxima is equal to

$$
P^{\max}_2(A) = \frac{\int_{u=-\infty}^{\infty}\int_{z=-\infty}^{0}|z|\,f_{x(0),x'(0),x''(0)}(u,0,z)\,P(x(\mathbf{s})\le\mathbf{v}\mid u,0,z)\,dz\,du}
{\int_{u=-\infty}^{\infty}\int_{z=-\infty}^{0}|z|\,f_{x(0),x'(0),x''(0)}(u,0,z)\,dz\,du},
$$

where A = {y ∈ C; y(s) ≤ v} and

P(x(s) ≤ v | u, 0, z) = P(x(s) ≤ v | x(0) = u, x′(0) = 0, x′′(0) = z).

An explicit representation of the model process for a Gaussian process is

$$
\xi^{\max}_2(t) = \eta^{\max}_2\,A(t) + \zeta^{\max}_2\,C(t) + \Delta_2(t), \qquad (3.26)
$$

where the random pair (η^{max}_2, ζ^{max}_2) has the two-dimensional density (with normalizing constant c)

$$
p^{\max}_2(u,z) = c\,|z|\exp\Bigl\{-\frac{\omega_0 z^2 + 2\omega_2 uz + \omega_4 u^2}{2(\omega_0\omega_4 - \omega_2^2)}\Bigr\},
\quad -\infty < u < \infty,\ z < 0.
$$

The process Δ_2(t) is non-stationary Gaussian with mean zero, independent of (η^{max}_2, ζ^{max}_2), and has covariance function

$$
r_{\Delta_2}(s,t) = r(s-t) - \frac{r(s)r(t)}{\omega_0} - \frac{r'(s)r'(t)}{\omega_2} - b(s)b(t).
$$

3.2.3 Excursions and related distributions

What is the shape, height, and extension of an arbitrary excursion above a fixed level u for a stationary process? An example of this type of problem occurs when x(t) is an electrical potential which may not become too high, and should always stay below a certain level. The integral ∫_{t_k}^{t_k+T_k} (x(t) − u) dt between an excursion and the critical level represents the amount of extra electrical charge that is transmitted between the upcrossing at t_k and the next downcrossing at t_k + T_k.

One of the advantages of a Slepian model is that it lends itself to efficient numerical calculations of important quantities related to crossings and maxima. The structure of the model is such that the random variables that represent slope, height, and curvature in the crossings and crest models are easy to handle


numerically. The only problems are the residual processes, which require infinite-dimensional probabilities to be calculated. To overcome this in a numerical algorithm one can use a successive conditioning technique that first introduces the value of the normal residual at a single point, say κ(s1), and includes that as a separate term in the model. The residual process will be correspondingly reduced and the procedure repeated.

For numerical calculations of interesting crossing probabilities one can truncate the conditioning procedure when sufficient accuracy is attained. This approximation technique is called regression approximation in crossing theory.
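One step of the successive conditioning can be written out explicitly by ordinary Gaussian conditioning. The sketch below (my illustration, with a hypothetical residual covariance) takes the covariance function of a zero-mean Gaussian residual, conditions on the value at a single point s1, and returns the explicit regression function together with the reduced covariance of the new, smaller residual.

import numpy as np

def condition_on_point(rD, s1):
    """Return (regression function, reduced covariance) after conditioning at s1."""
    v1 = rD(s1, s1)
    def regression(s):               # E(residual(s) | residual(s1) = x) = regression(s) * x
        return rD(s, s1) / v1
    def rD_new(s, t):                # covariance of the reduced residual
        return rD(s, t) - rD(s, s1) * rD(t, s1) / v1
    return regression, rD_new

# hypothetical residual covariance (a Gaussian process conditioned on its value at 0)
rD = lambda s, t: np.exp(-0.5 * (s - t)**2) - np.exp(-0.5 * (s**2 + t**2))
reg, rD1 = condition_on_point(rD, s1=1.0)
print(rD1(1.0, 1.0))                 # conditional variance at s1 is reduced to 0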

3.2.3.1 Length of excursions

The Slepian process ξu(t) has a u-upcrossing at t = 0 and we denote by T the time of its first downcrossing of the same level, so T is the length of an excursion. Since T > t if and only if ξu(s) stays above the level u in the entire interval 0 < s < t, we can express the probability P(T > t) by means of the indicator function

Iz(κ, t) = { 1, if u r(s)/ω0 − z r′(s)/ω2 + κ(s) > u for all s ∈ (0, t),
             0, otherwise.

The result is

P(T > t) = ∫_{z=0}^{∞} pu(z) · E(Iz(κ, t)) dz,   (3.27)

where pu(z) is the Rayleigh density for the derivative at the u-upcrossing, and the expectation

E(Iz(κ, t)) = P( inf_{0<s<t} { u r(s)/ω0 − z r′(s)/ω2 + κ(s) } > u )

is an infinite-dimensional normal probability. That probability has to be calculated numerically by special software. By means of routines in the Matlab package WAFO, [34], which is available on the department's homepage, one can calculate the distribution with very high accuracy. Figure 3.3 shows the excursion length densities for a realistic water wave process with a common Jonswap North Sea spectrum.
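A crude but self-contained alternative to such special software is plain Monte Carlo: simulate the residual κ on a grid from its covariance r(s − t) − r(s)r(t)/ω0 − r′(s)r′(t)/ω2, draw the Rayleigh slope, and count how often the Slepian path stays above u. The Python sketch below does this for a hypothetical covariance r(t) = exp(−t²/2) (so ω0 = ω2 = 1) and level u = 0.5; it is an illustration of (3.27), not the regression approximation used by WAFO.

import numpy as np

rng = np.random.default_rng(0)
r  = lambda t: np.exp(-t**2 / 2)
r1 = lambda t: -t * np.exp(-t**2 / 2)
w0, w2, u = 1.0, 1.0, 0.5

s = np.linspace(0.0, 6.0, 121)
S, T = np.meshgrid(s, s, indexing="ij")
K = r(S - T) - r(S) * r(T) / w0 - r1(S) * r1(T) / w2          # covariance of kappa
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(s)))            # jitter for stability

n = 4000
z = np.sqrt(w2) * np.sqrt(-2 * np.log(rng.random(n)))         # Rayleigh slopes at upcrossing
kappa = rng.standard_normal((n, len(s))) @ L.T                # residual paths on the grid
xi = u * r(s) / w0 - np.outer(z, r1(s)) / w2 + kappa          # Slepian paths
still_above = np.cumprod(xi[:, 1:] > u, axis=1).astype(bool)  # stayed above u on (0, t]
P = still_above.mean(axis=0)                                  # estimate of P(T > t)
print(np.column_stack((s[1:], P))[:10])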

The probability density function for the excursion time T is of course minus the derivative of P(T > t). It can also be expressed by means of Durbin's formula,

fT(t) = f_{ξu(t)}(u) E( I{ξu(s) > u, 0 < s < t} · (−ξ′u(t)−) | ξu(t) = u ),

where ξ′u(t)− = min(0, ξ′u(t)) is the negative part of the derivative. The expectation can be calculated by means of algorithms from WAFO, by means of the regression technique with successive conditioning on the residual process.


[Figure: excursion time densities plotted against period [s]; levels, from left to right: 2, 1, 0, −1.]

Figure 3.3: Probability densities for excursions above u = −1, 0, 1, 2 for a process with North Sea wave spectrum Jonswap.

3.2.3.2 Wave shape

The distribution of wave characteristics such as drop in height and time difference between a local maximum and the next local minimum can be derived from the Slepian models in Theorem 3:5 or Theorem 3:6.

First consider the model (3.22),

ξ^max_1(t) = ζ^max_1 r″(t)/ω4 + Δ1(t),

which completely describes the stochastic properties of the shape around the maximum. The simplest, zero order, approximation is to delete the residual process Δ1(t) completely, only keeping the curvature dependent term ζ^max_1 r″(t)/ω4. Replacing ζ^max_1 by its average −√(ω4 π/2), we can, for example, get the average shape, as

ξ^max(t) = −√ω0 α r″(t)/ω2.

The zero order approximation is usually too crude to be of any use. A better approximation is obtained from the model (3.26), which also includes the (random) height at the maximum point,

ξ^max_2(t) = η^max_2 A(t) + ζ^max_2 C(t) + Δ2(t),


and define the random variable T as the time of the first local minimum of ξ^max_2(t), t > 0. The height drop is then H = ξ^max_2(0) − ξ^max_2(T) and we ask for the joint distribution of T and H.

Using the fact that A(0) = 1, C(0) = 0 and (ξ^max_2)′(T) = 0, we get the following relations that need to be satisfied,

2 (T )′ = 0, we get thefollowing relations that need to be satisfied,

ηmax2 A′(T ) + ζmax

2 C ′(T ) + Δ′2(T ) = 0,

ηmax2 + Δ2(0) − (ηmax

2 A(T ) + ζmax2 C(T ) + Δ2(T )) = H.

We now describe the regression approximation of order 1, which is obtained by deleting all of the residual process terms. The relations will then be

η^max_2 A′(T^r) + ζ^max_2 C′(T^r) = 0,   (3.28)
η^max_2 − (η^max_2 A(T^r) + ζ^max_2 C(T^r)) = H^r,   (3.29)

where we write T^r, H^r for the approximate time and height variables.

To write the solution in a form that can be generalized to more complicated problems, define

G(t) = ( 1 − A(t)   C(t)
          A′(t)     C′(t) )

and write the equations (3.28) and (3.29) as (T for matrix transpose)

G(T^r) (η^max_2  ζ^max_2)^T = (H^r  0)^T.

If det G(T^r) ≠ 0 we get from (η^max_2  ζ^max_2)^T = G(T^r)^{−1} (H^r  0)^T that the variables with known distribution (η^max_2 and ζ^max_2) are simple functions of the variables with unknown distribution,

η^max_2 = H^r p(T^r) q(T^r),    ζ^max_2 = H^r q(T^r),

where

p(t) = −C′(t)/A′(t),    q(t) = −A′(t) / ( (1 − A(t))C′(t) − A′(t)C(t) ).

We want the density at the point T^r = t, H^r = h; let ξ(t, h), ζ(t, h) be the corresponding solution and define the indicator function I(t, h) to be 1 if the approximating process ξ(t, h)A(s) + ζ(t, h)C(s) is strictly decreasing for 0 < s < t.

The Jacobian for the transformation is J(t, h) = h p′(t) q(t)², and therefore the density of T^r, H^r is

f_{T^r,H^r}(t, h) = f_{ξ^max,ζ^max}(h p(t) q(t), h q(t)) · |J(t, h)| I(t, h)
   = const · I(t, h) h² |q(t)³ p′(t)|
     × exp{ − (h² q(t)² (Tm/π)^4 / (2ε²)) ( ((π/Tm)² p(t) + 1)² + ε²/(1 − ε²) ) }.


[Figure: observed cycles and density level curves at 0.0108, 0.0271, 0.0541, 0.108, 0.217, 0.433, 0.541, 0.812.]

Figure 3.4: Probability density for T, H for a process with North Sea wave spectrum Jonswap, together with 343 observed cycles.

This form of the T, H distribution is common in the technical literature, where Tm = π√(ω2/ω4) is called the mean half wave period. Note that the dependence on the spectrum is only through the spectral width parameter ε = √(1 − ω2²/(ω0ω4)) = √(1 − α²).

This first order approximation of the T, H-density is not very accurate, but it illustrates the basic principle of the regression approximation. The WAFO toolbox, [34], contains algorithms for very accurate higher order approximations. Figure 3.4 shows the result for a process with a common North Sea Jonswap spectrum.


Exercises

3:1. Prove that κ(0) = κ′(0) = 0 in the Slepian model after upcrossing.

3:2. Formulate conditions on the covariance function rx(t) that guarantee that the residual process κ(t) has differentiable sample paths.

3:3. Complete the proof of Theorem 3:5.


Chapter 4

Spectral- and other representations

This chapter deals with the spectral representation of weakly stationary processes – stationary in the sense that the mean is constant and the covariance Cov(x(s), x(t)) only depends on the time difference t − s. For real-valued Gaussian processes, the mean and covariance function determine all finite-dimensional distributions, and hence the entire process distribution. However, the spectral representation requires complex-valued processes, and then one needs to specify also the correlation structure between the real and the imaginary part of the process. We therefore start with a summary of the basic properties of complex-valued processes, in general, and in the Gaussian case. We remind the reader of the classical memoirs by S.O. Rice, [27], which can be recommended to anyone with the slightest historical interest. That work also contains many old references.

4.1 Complex processes and their covariance functions

4.1.1 Stationary processes

A complex-valued process

x(t) = y(t) + iz(t)

is strictly stationary if all 2n-dimensional distributions of

y(t1 + τ), z(t1 + τ), . . . , y(tn + τ), z(tn + τ)

are independent of τ. It is called weakly stationary or second order stationary if E(x(t)) = m is constant, and

E(x(s) · x̄(t)) = r(s − t) + |m|²



only depends on the time difference s − t . The covariance function

r(τ) = E( (x(s + τ) − m)(x̄(s) − m̄) )

is Hermitian, i.e.

r(−τ) = r̄(τ).

For real-valued processes, the covariance function r(τ) determines all covariances between x(t1), . . . , x(tn),

Σ(t1, . . . , tn) = ( r(0)        r(t1 − t2)   . . .  r(t1 − tn)
                     ...          ...          . . .  ...
                     r(tn − t1)  r(tn − t2)   . . .  r(0) )        (4.1)

                 = ( V(x(t1))          C(x(t1), x(t2))   . . .  C(x(t1), x(tn))
                     ...                ...               . . .  ...
                     C(x(tn), x(t1))   C(x(tn), x(t2))   . . .  V(x(tn)) ).

4.1.2 Non-negative definite functions

It is a unique characteristic property of a covariance function that it is non-negative definite in the following sense: Let t1, . . . , tn be any finite set of time points, and take arbitrary complex numbers a1, . . . , an. Then, for simplicity assuming E(x(t)) = 0,

Σ_{j,k}^{n} aj āk r(tj − tk) = E( Σ_{j,k}^{n} aj x(tj) · āk x̄(tk) )   (4.2)
   = E| Σ_{j=1}^{n} aj x(tj) |² ≥ 0.   (4.3)
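Non-negative definiteness is easy to probe numerically: for a candidate covariance function r, the matrix (r(tj − tk)) must have only non-negative eigenvalues for every choice of time points. The short Python sketch below illustrates this; both test functions are hypothetical examples chosen by me, not taken from the text.

import numpy as np

def min_eigenvalue(r, tpts):
    T = np.subtract.outer(tpts, tpts)
    return np.linalg.eigvalsh(r(T)).min()

t = np.sort(np.random.default_rng(2).uniform(0, 10, 30))
print(min_eigenvalue(lambda t: np.exp(-np.abs(t)), t))            # >= 0: a genuine covariance
print(min_eigenvalue(lambda t: (np.abs(t) <= 1).astype(float), t)) # typically < 0: not one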

Theorem 4:1 Every non-negative definite, possibly complex, function r(τ) is the covariance function for a strictly stationary Gaussian process. Thus, the class of covariance functions is equal to the class of non-negative definite functions.

Proof: We have to show that if r(τ) is non-negative definite then there are finite-dimensional distributions for a process x(t) with E(x(t)) = 0, such that

r(τ) = E(x(s + τ) x̄(s)).

By Kolmogorov’s existence theorem, see Appendix A, we only have to show that for every selection of time points t1, . . . , tn, there is an n-dimensional


distribution with mean 0 and covariances given by Σ(t1, . . . , tn) as defined by (4.1), and such that the obtained family of distributions forms a consistent family, i.e. for example FX,Y(x, y) = FY,X(y, x) and FX,Y(x, ∞) = FX(x).

If r(t) is a real function and u = (u1, . . . , un) a real vector, consider the non-negative quadratic form

Q(u) = Σ_{j,k} uj uk r(tj − tk).

Then we recognize exp(−Q(u)/2) as the characteristic function for an n-variate normal distribution with covariance matrix Σ(t1, . . . , tn), and we have found a distribution with the specified properties.

If r(t) is complex, with r(−t) = r̄(t), there are real functions p(t) = p(−t), q(t) = −q(−t) such that

r(t) = p(t) + iq(t).

Take aj = uj − ivj and consider the non-negative quadratic form in real variables (u, v) = (u1, . . . , un, v1, . . . , vn),

Q(u, v) = Σ_{j,k} aj āk r(tj − tk)
   = Σ_{j,k} (uj − ivj)(uk + ivk)(p(tj − tk) + iq(tj − tk))
   = Σ_{j,k} { p(tj − tk)(uj uk + vj vk) − q(tj − tk)(uj vk − uk vj) };

note that the imaginary part vanishes since Q is assumed to be non-negative, and hence real. Similarly as in the real case,

exp(−Q(u, v)/2) = E( exp( i Σ_j (uj yj + vj zj) ) )

is the characteristic function of a 2n-dimensional normal variable

(y1, . . . , yn, z1, . . . , zn)

with the specified properties:

E(yj yk) = E(zj zk) = p(tj − tk),
E(yj zk) = −E(yk zj) = −q(tj − tk).

With xj = (yj + izj)/√2, we have E(xj) = 0 and

E(xj x̄k) = p(tj − tk) + iq(tj − tk) = r(tj − tk),

as required. Since the Gaussian distribution of the y- and z-variables is determined by the covariances, and these depend only on the time difference, the process is strictly stationary. □


4.1.3 Strict and weak stationarity

Since the first two moments determine a real normal distribution, it is clear that each weakly (covariance) stationary normal process is strictly stationary. For complex processes matters are not that easy, and one has to impose two stationarity conditions in order to guarantee strict stationarity.

Theorem 4:2 A complex normal process x(t) = y(t) + iz(t) with mean zero is strictly stationary if and only if the two functions

r(s, t) = E(x(s) x̄(t)),
q(s, t) = E(x(s) x(t)),

only depend on t − s.

Proof: To prove the ”if” part, express r(s, t) and q(s, t) in terms of y and z,

r(s, t) = E(y(s)y(t) + z(s)z(t)) + iE(z(s)y(t) − y(s)z(t)),
q(s, t) = E(y(s)y(t) − z(s)z(t)) + iE(z(s)y(t) + y(s)z(t)).

Since these only depend on t − s, the same is true for the sums and differences of their real and imaginary parts, i.e. for E(y(s)y(t)), E(z(s)z(t)), E(z(s)y(t)), E(y(s)z(t)). Therefore, the 2n-dimensional distribution of

y(t1), . . . , y(tn), z(t1), . . . , z(tn)

only depends on time differences, and x(t) is strictly stationary. The converse is trivial. □

Example 4:1 If x(t) is a real and stationary normal process, and μ is a constant, then

x∗(t) = e^{iμt} x(t)

is a weakly, but not strictly, stationary complex normal process,

E(x∗(s) x̄∗(t)) = e^{iμ(s−t)} E(x(s)x(t)),
E(x∗(s) x∗(t)) = e^{iμ(s+t)} E(x(s)x(t)).

4.2 Bochner’s theorem and the spectral distribution

4.2.1 The spectral distribution

We have seen that covariance functions for stationary processes are characterized by the property of being non-negative definite. From elementary courses we also know that covariance functions are Fourier transforms of their spectral distributions. We shall now formulate and prove this statement.


Theorem 4:3 (Bochner’s theorem) A continuous function r(t) is non-negative definite, and hence a covariance function, if and only if there exists a non-decreasing, right continuous, and bounded real function F(ω) such that

r(t) = ∫_{−∞}^{∞} e^{iωt} dF(ω).

The function F(ω) is the spectral distribution function of the process, and it has all the properties of a statistical distribution function except that F(+∞) − F(−∞) = r(0) need not be equal to one. The function F(ω) is defined only up to an additive constant, and one usually takes F(−∞) = 0.

Proof: The ”if” part is clear, since if r(t) = ∫ exp(iωt) dF(ω), then

Σ_{j,k} zj z̄k r(tj − tk) = Σ_{j,k} zj z̄k ∫ e^{iωtj} · e^{−iωtk} dF(ω)
   = ∫ Σ_{j,k} zj e^{iωtj} · z̄k e^{−iωtk} dF(ω)
   = ∫ | Σ_j zj e^{iωtj} |² dF(ω) ≥ 0.

For the ”only if” part we shall use some properties of characteristic functions, which are proved elsewhere in the probability course. We shall show that, given r(t), there exists a proper distribution function F∞(ω) = F(ω)/F(∞) such that

F∞(∞) − F∞(−∞) = 1,    ∫ e^{iωt} dF∞(ω) = r(t)/r(0).

To this end, take a real A > 0, and define

g(ω, A) = (1/(2πA)) ∫_0^A ∫_0^A r(t − u) e^{−iω(t−u)} dt du
   = (1/(2πA)) lim Σ_{j,k} r(tj − tk) e^{−iωtj} e^{iωtk} Δtj Δtk
   = (1/(2πA)) lim Σ_{j,k} r(tj − tk) aj āk ≥ 0,   with aj = Δtj e^{−iωtj},

since r(t) is non-negative definite. (Here, of course, the tj define a subdivision of the interval [0, A].) Going to the limit, g will give the density of the desired spectral distribution.


Before we proceed, we express g(ω, A) as

g(ω, A) = (1/(2πA)) ∫_0^A ∫_0^A r(t − u) e^{−iω(t−u)} dt du
   = (1/(2π)) ∫_{−A}^{A} (1 − |t|/A) r(t) e^{−iωt} dt = (1/(2π)) ∫_{−∞}^{∞} μ(t/A) r(t) e^{−iωt} dt,

where

μ(t) = { 1 − |t| for |t| ≤ 1,
         0 otherwise.

The proof will now proceed in three steps:

Step 1) Prove that g(ω, A) ≥ 0 is integrable, and

∫_ω g(ω, A) dω = r(0),   (4.4)

so g(·, A)/r(0) is a regular statistical density function.

Step 2) Show that

(1 − |t|/A) r(t)/r(0) = ∫_{−∞}^{∞} (g(ω, A)/r(0)) e^{itω} dω,   (4.5)

so the function (1 − |t|/A) r(t)/r(0) for |t| ≤ A is the characteristic function for the density g(ω, A)/r(0).

Step 3) Take limits as A → ∞,

lim_{A→∞} (1 − |t|/A) r(t) = r(t).   (4.6)

Since the limit of a convergent sequence of characteristic functions is also a characteristic function, provided it is continuous, we have shown that there exists a statistical distribution such that r(t)/r(0) is its characteristic function.

We now show steps (1) and (2). Multiply g(ω, A) by μ(ω/2M), integrate, and change the order of integration (since μ(ω/2M)μ(t/A)r(t)e^{−iωt} is bounded and has support in [−2M, 2M] × [−A, A], Fubini’s theorem permits this):

∫_{−∞}^{∞} μ(ω/2M) g(ω, A) dω = (1/(2π)) ∫_{−∞}^{∞} μ(ω/2M) ∫_{−∞}^{∞} μ(t/A) r(t) e^{−iωt} dt dω
   = (1/(2π)) ∫_{−∞}^{∞} μ(t/A) r(t) ∫_{−∞}^{∞} μ(ω/2M) e^{−iωt} dω dt.   (4.7)


Here,

∫_{−∞}^{∞} μ(ω/2M) e^{−iωt} dω = ∫_{−2M}^{2M} (1 − |ω|/(2M)) e^{−iωt} dω
   = ∫_{−2M}^{2M} (1 − |ω|/(2M)) cos ωt dω = 2M ( sin Mt / (Mt) )²,

so (4.7) is equal to

(M/π) ∫_{−∞}^{∞} μ(t/A) r(t) ( sin Mt / (Mt) )² dt = (1/π) ∫_{−∞}^{∞} μ(s/(MA)) r(s/M) ( sin s / s )² ds
   ≤ (1/π) r(0) ∫_{−∞}^{∞} ( sin s / s )² ds = r(0).

Now, μ(ω/2M) g(ω, A) ↑ g(ω, A) as M ↑ ∞, so

∫_{−∞}^{∞} g(ω, A) dω = lim_{M→∞} ∫_{−∞}^{∞} μ(ω/2M) g(ω, A) dω ≤ r(0).

We have now shown that g(ω, A) and μ(t/A)r(t) are both absolutely integrable over the whole real line. Since they form a Fourier transform pair, i.e.

g(ω, A) = (1/(2π)) ∫_{−∞}^{∞} μ(t/A) r(t) e^{−iωt} dt,

we can use the Fourier inversion theorem, which states that

μ(t/A) r(t) = ∫_{−∞}^{∞} g(ω, A) e^{iωt} dω,

which is step (2) in the proof.

By taking t = 0 we also get step (1), and fA(ω) = g(ω, A)/r(0) is a probability density function for some distribution with characteristic function

φA(t) = ∫_{−∞}^{∞} (g(ω, A)/r(0)) e^{iωt} dω = μ(t/A) r(t)/r(0).

For step (3) we need one of the basic lemmas in probability theory, the convergence properties of characteristic functions: if FA(x) is a family of distribution functions with characteristic functions φA(t), and φA(t) converges to a continuous function φ(t), as A → ∞, then there exists a distribution function F(x) with characteristic function φ(t) and FA(x) → F(x), for all x where F(x) is continuous.

Here the characteristic functions φA(t) = μ(t/A) r(t)/r(0) converge to φ(t) = r(t)/r(0), and since we have assumed r(t) to be continuous, we know from the


basic lemma that FA(x) = ∫_{−∞}^{x} fA(ω) dω converges to a distribution function F∞(x) as A → ∞, with characteristic function φ(t):

r(t)/r(0) = ∫_{−∞}^{∞} e^{iωt} dF∞(ω).

We get the desired spectral representation with F(ω) = r(0) F∞(ω). □

4.2.2 Properties of the spectral distribution

4.2.2.1 The inversion theorem

The covariance function r(t) and the spectral density f(ω) form a Fourier transform pair. In general, the spectral distribution is uniquely determined by the covariance function but the precise relationship is somewhat complicated if the spectrum is not absolutely continuous. To formulate a general inversion theorem we need to identify those ω for which the spectral distribution function is not continuous.¹ Write ΔFω = F(ω) − F(ω − 0) ≥ 0 for the jump (possibly 0) at ω, and define F̃(ω) as the average between the left and right limits of F(ω),

F̃(ω) = (F(ω) + F(ω − 0))/2 = F(ω) − ΔFω/2.   (4.8)

For a proof of the following theorem we refer to [12].

Theorem 4:4 a) If ω1 < ω2, then we have

F̃(ω2) − F̃(ω1) = (1/(2π)) lim_{T→∞} ∫_{−T}^{T} ( (e^{−iω2t} − e^{−iω1t}) / (−it) ) r(t) dt.   (4.9)

b) If the covariance function r(t), t ∈ R, is absolutely integrable, i.e.

∫_{−∞}^{∞} |r(t)| dt < ∞,

then the spectrum is absolutely continuous and the Fourier inversion formula holds,

f(ω) = (1/(2π)) ∫_{−∞}^{∞} e^{−iωt} r(t) dt.   (4.10)

Remark 4:1 The inversion formula (4.9) defines the spectral distribution for all continuous covariance functions. One can also use (4.10) to calculate the spectral density in case r(t) is absolutely integrable, but if it is not, one may use (4.9) and take f(ω) = lim_{h→0}(F(ω + h) − F(ω))/h. This is always possible, but one has to be careful in case f(ω) is not continuous. Even when the limit exists it need not be equal to f(ω) at every point, as the following example shows. The limit, which always exists, is called the Cauchy principal value.

1Note that F (ω) can have only a denumerable number of discontinuity points.


Example 4:2 We use (4.9) to find the spectral density of low frequency white noise, with covariance function r(t) = sin t / t. We get

(F(ω + h) − F(ω − h)) / (2h) = (1/(2π)) (1/(2h)) lim_{T→∞} ∫_{−T}^{T} ( (e^{−i(ω+h)t} − e^{−i(ω−h)t}) / (−it) ) (sin t / t) dt
   = (1/(2π)) ∫_{−∞}^{∞} e^{−iωt} (sin ht / (ht)) (sin t / t) dt
   = { 1/2, for |ω| < 1 − h,
       (1/4)(1 + (1 − |ω|)/h), for 1 − h < |ω| < 1 + h,
       0, for |ω| > 1 + h.

The limit as h → 0 is 1/2, 1/4, and 0, respectively, which gives as spectral density

f(ω) = { 1/2, for |ω| < 1,
         1/4, for |ω| = 1,
         0, for |ω| > 1.

Note that the Fourier inversion formula (4.10) gives 1/4 for ω = 1 as the Cauchy principal value,

lim_{T→∞} (1/(2π)) ∫_{−T}^{T} e^{−iωt} r(t) dt = lim_{T→∞} (1/(2π)) ∫_{−T}^{T} (sin 2t / (2t)) dt = 1/4.
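The same values can be checked numerically. The Python sketch below evaluates the symmetric (principal value) integral in (4.10) for r(t) = sin t / t by the trapezoidal rule over a large but finite interval; it should come close to 1/2 inside (−1, 1), 1/4 at |ω| = 1, and 0 outside. The truncation point T is an arbitrary choice of mine.

import numpy as np

T, n = 400.0, 400001
t = np.linspace(-T, T, n)
rt = np.sinc(t / np.pi)                      # numpy sinc(x) = sin(pi x)/(pi x), so this is sin t / t
for w in (0.0, 0.5, 1.0, 1.5):
    f = np.trapz(np.exp(-1j * w * t) * rt, t).real / (2 * np.pi)
    print(w, round(f, 3))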

4.2.2.2 The one-sided real form

A real stationary process has a symmetric covariance function and a symmetric spectrum. In practical applications one often uses only the positive side of the spectrum. The one-sided spectral distribution will be denoted by G(ω), with

G(ω) = { 0 for ω < 0,
         F(ω) − F(−ω − 0) = 2F(ω) − r(0) for ω ≥ 0.   (4.11)

Then G(0−) = 0 and G(∞) = r(0). If F(ω) is discontinuous at ω = 0 then G(ω) will have a jump F(0) − F(−0) = G(0+) at ω = 0. For discontinuity points ω > 0 the jump of G(ω) will be twice that of F(ω).

The covariance function can be expressed as

r(t) = ∫_{0−}^{∞} cos ωt dG(ω) = ∫_{−∞}^{∞} cos ωt dF(ω),

and the inversion formula (4.9) gives, for continuity points,

G(ω) = F(ω) − F(−ω) = (2/π) ∫_0^{∞} (sin ωt / t) r(t) dt.

Note that ∫_{−∞}^{∞} sin ωt dF(ω) = 0, since F is symmetric.


4.2.3 Spectrum for stationary sequences

If a stationary process {x(t), t ∈ R}, with spectral distribution F(ω) and covariance function r(t) = ∫_{−∞}^{∞} e^{iωt} dF(ω), is observed only at integer time points t ∈ Z, one obtains a stationary sequence {xn, n ∈ Z} for which the covariance function is the same as that of x(t) restricted to integer n = t. In the spectral formula the factor e^{iωt} = e^{i(ω+2kπ)t} for all integer t and k, and the spectrum may be restricted to the interval (−π, π]:

r(t) = ∫_{−∞}^{∞} e^{itω} dF(ω) = Σ_{k=−∞}^{∞} ∫_{2kπ−π+0}^{2kπ+π} e^{iωt} dF(ω)
   = ∫_{−π+0}^{π} e^{iωt} Σ_{k=−∞}^{∞} dF(ω + 2kπ).

This means that all frequencies ω + 2kπ for k ≠ 0 are lumped together with the frequency ω and cannot be individually distinguished. This is the aliasing or folding effect of sampling a continuous time process.
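The folding can be made concrete numerically: the density of the sampled sequence on (−π, π] is obtained by summing f(ω + 2kπ) over k. The Python sketch below does this for a convenient example density of Ornstein–Uhlenbeck type, f(ω) = (1/π)/(1 + ω²), which is my choice for illustration only.

import numpy as np

def folded_density(f, w, kmax=200):
    return sum(f(w + 2 * np.pi * k) for k in range(-kmax, kmax + 1))

f = lambda w: (1.0 / np.pi) / (1.0 + w**2)
w = np.linspace(-np.pi, np.pi, 5)
print(folded_density(f, w))                       # density of the sampled sequence
# sanity check: total mass is (approximately) preserved, int over (-pi, pi] ~ r(0) = 1
wgrid = np.linspace(-np.pi, np.pi, 2001)
print(np.trapz(folded_density(f, wgrid), wgrid))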

For a stationary sequence {xn, n ∈ Z}, the covariance function r(t) is defined only for t ∈ Z. Instead of Bochner’s theorem we have the following theorem, in the literature called Herglotz’ lemma.

Theorem 4:5 (Herglotz’ lemma) A function r(t), t ∈ Z, defined on the integers, is non-negative definite, and hence a covariance function for a stationary sequence, if and only if there exists a non-decreasing, right-continuous, and bounded real function F(ω) on (−π, π], such that

r(t) = ∫_{−π+0}^{π} e^{iωt} dF(ω).   (4.12)

Note that the spectrum is defined over the half-open interval to keep the right-continuity of F(ω). It is possible to move half the spectral mass at π to −π without changing the representation (4.12).

The inversion theorem states that if Σ_{t=−∞}^{∞} |r(t)| < ∞ then the spectrum is absolutely continuous with spectral density given by

f(ω) = (1/(2π)) Σ_{t=−∞}^{∞} e^{−iωt} r(t),

while in general, for −π < ω1 < ω2 ≤ π,

F̃(ω2) − F̃(ω1) = (1/(2π)) r(0)(ω2 − ω1) + lim_{T→∞} (1/(2π)) Σ_{t=−T, t≠0}^{T} r(t) (e^{−iω2t} − e^{−iω1t})/(−it),

where as before F̃(ω) is defined as the average of the left and right hand limits of F(ω).
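As a quick check of the absolutely continuous case, the sketch below evaluates the truncated sum (1/2π) Σ e^{−iωt} r(t) for the covariance r(t) = ρ^{|t|} (an AR(1)-type sequence chosen by me as an example) and compares it with the known closed form of that density.

import numpy as np

rho = 0.7
def f_seq(w, tmax=200):
    t = np.arange(-tmax, tmax + 1)
    return np.real(np.sum(rho**np.abs(t) * np.exp(-1j * w * t))) / (2 * np.pi)

w = np.linspace(-np.pi, np.pi, 5)
exact = (1 - rho**2) / (2 * np.pi * (1 - 2 * rho * np.cos(w) + rho**2))
print(np.allclose([f_seq(x) for x in w], exact, atol=1e-10))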


4.3 Spectral representation of a stationary process

4.3.1 The spectral process

In elementary courses one could have encountered processes of the form

x(t) = Σ_k Ak cos(ωkt + φk),   (4.13)

where ωk > 0 are fixed frequencies, while Ak are random amplitudes, and φk random phases, uniformly distributed in (0, 2π) and independent of the Ak. The uniformly distributed phases make the process stationary, and its spectrum is discrete, concentrated at {ωk}. The covariance function and one-sided spectral distribution function are, respectively,

r(t) = Σ_k E(A²k/2) cos ωkt,
G(ω) = Σ_{k; ωk ≤ ω} E(A²k/2),   ω > 0.

The process (4.13) can also be defined as the real part of a complex process

x(t) = Re Σ_k Ak e^{iφk} e^{iωkt},

and it is in fact a special example of the general spectral representation of a stationary process, which takes the form of an integral

x(t) = ∫_{−∞}^{∞} e^{iωt} dZ(ω),

where {Z(ω); ω ∈ R} is a complex spectral process with E(Z(ω)) = 0 and orthogonal increments, i.e.

E( (Z(ω4) − Z(ω3)) · (Z̄(ω2) − Z̄(ω1)) ) = 0,

for ω1 < ω2 < ω3 < ω4. The variance of its increments is equal to the increments of the spectral distribution, i.e. for ω1 < ω2,

E(|Z(ω2) − Z(ω1)|²) = F(ω2) − F(ω1).

One can summarize the relations between Z(ω) and F(ω) as

E(dZ(ω) · dZ̄(μ)) = { dF(ω) if ω = μ,
                      0 if ω ≠ μ.   (4.14)

It follows that Z(ω) is continuous in quadratic mean if and only if the spectral distribution function F is continuous. If F has a jump at a point ω0,

F(ω0+) − F(ω0−) = σ0²,


then lim_{ε→0}(Z(ω0 + ε) − Z(ω0 − ε)) exists and has variance σ0².

Now, let us start with a spectral process {Z(ω);ω ∈ R}, a complex processwith E(Z(ω)) = 0 and with orthogonal increments, and define the functionF (ω) by

F (ω) =

{E(|Z(ω) − Z(0)|2) for ω ≥ 0,

−E(|Z(ω) − Z(0)|2) for ω < 0.

Since only the increments of Z(ω) are used in the theory, we can fix its valueat any point, and we take Z(0) = 0. Following the definition of a stochasticintegral in Section 2.6, we can define a stochastic process

x(t) =∫

eiωt dZ(ω) = lim∑

eiωkt(Z(ωk+1) − Z(ωk)),

where the limit is in quadratic mean. It is then easy to prove that E(x(t)) = 0and that its covariance function is given by the Fourier-Stieltjes transform ofF (ω): use Theorem 2:14, and (4.14), to get

E

(∫eiωs dZ(ω) ·

∫eiμt dZ(μ)

)=∫ ∫

ei(ωs−μt)E(dZ(ω) · dZ(μ)

)=∫

eiω(s−t) dF (ω).

4.3.2 The spectral theorem

We shall now prove one of the central results in the theory, namely that everyL2 -continuous weakly stationary process {x(t), t ∈ R} has a spectral repre-sentation, x(t) =

∫∞−∞ eiωt dZ(ω), where Z(ω) ∈ H(x), i.e. Z(ω) is an element

in the Hilbert space which is spanned by limits of linear combinations of x(t)-values. In fact, one can define Z(ω) explicitly for a continuity point ω of F ,

Z(ω) = limT→∞

12π

∫ T

−T

e−iωt − 1−it

x(t) dt, (4.15)

and prove that it has all the required properties. This is the technique usedin Yaglom’s classical book, [38]; see Exercise 5. We shall present a functionalanalytic proof, as in [9], and find a relation between H(x) = S(x(t); t ∈ R) andH(F ) = L2(F ) = the set of all functions g(ω) with

∫ |g(ω)|2 dF (ω) < ∞ . Westart by the definition of an isometry.

Definition 4:1 A linear one-to-one mapping f between two Hilbert spacesX and Y is called an isometry if it conserves the inner product (u, v)X =(f(u), f(v))Y . In particular ‖u − v‖X = ‖f(u)− f(v)‖Y , so distances are alsopreserved.


Theorem 4:6 If {x(t), t ∈ R} is a zero mean continuous stationary processwith spectral distribution F (ω) there exists a complex-valued spectral process{Z(ω), ω ∈ R} with orthogonal increments, such that

E(|Z(ω2) − Z(ω1)|2

)= F (ω2) − F (ω1),

for ω1 < ω2 and

x(t) =∫

eiωt dZ(ω).

Proof: We shall build an isometry between the Hilbert space of random vari-ables H(x) = S(x(t); t ∈ R) and the function Hilbert space H(F ) = L2(F ),with scalar products defined as

(u, v)H(x) = E(uv),

(g, h)H(F ) =∫

g(ω)h(ω) dF (ω).

First consider the norms in the two spaces,

‖y‖2H(x) = E(|y|2),

‖g‖2H(F ) =

∫|g(ω)|2 dF (ω),

and note that

‖x(t)‖2H(x) = E(|x(t)|2) = r(0),

‖ei·t‖2H(F ) =

∫|eiωt|2 dF (ω) = r(0).

This just means that x(t) has the same length as an element of H(x) as hasei·t as an element of H(F ), i.e. ‖x(t)‖H(x) = ‖ei·t‖H(F ) . Furthermore, scalarproducts are preserved,

(x(s), x(t))H(x) = E(x(s)x(t)) =∫

eiωseiωt dF (ω) = (ei·s, ei·t)H(F ).

This is the start of our isometry: x(t) and ei·t are the corresponding el-ements of the two spaces. Instead of looking for random variables Z(ω0) inH(x) we shall look for functions gω0(·) in H(F ) with the same properties.

Step 1: Extend the correspondence to finite linear combinations of x(t) andeiωt by letting

y = α1x(t1) + . . . + αnx(tn) (4.16)

g(ω) = α1eiωt1 + . . . + αneiωtn (4.17)


be corresponding elements. Check by yourself that scalar product is preserved,i.e.

(y1, y2)H(x) = (g1, g2)H(F ).

Step 2: Distances are preserved, i.e. ‖y1−y2‖H(x) = ‖g1 − g2‖H(F ) , so y1 = y2

if and only if g1 = g2 where equality means equal with probability one, andalmost everywhere, respectively.

Step 3: Convergence in the two spaces means the same. If y1, y2, . . . convergestowards y in H(x), and g1, g2, . . . are the corresponding elements in H(F ),then

‖yn − ym‖H(x) → 0 implies ‖gn − gm‖H(F ) → 0,

and since H(F ) is complete there exists a limit element g ∈ H(F ) such that‖y‖H(x) = ‖g‖H(F ) . The reverse implication also holds. Thus we have extendedthe correspondence between the two spaces to all limits of sums of x(t)-variablesand eiωt -functions.2

Step 4: The correspondence can then be extended to all of H(F ) and H(x).The set H(x) consists by definition of limits of linear combinations of x(tk),and every function in L2(F ) can be approximated by a polynomial in eiωtk fordifferent tk -s. This is the famous Stone-Weierstrass theorem. We have thenfound the isometry between H(x) and H(F ): if u and v are elements in H(x)and f(u) and f(v) the corresponding elements in H(F ), then ‖u − v‖H(x) =‖f(u) − f(v)‖H(F ) .

Step 5: The following function gω0 in H(F ) corresponds to Z(ω0),

gω0(ω) =

{1 for ω ≤ ω0,

0 for ω > ω0.

Obviously, ‖gω0‖2H(F ) =

∫ |gω0(ω)|2 dF (ω) =∫ ω0

−∞ dF (ω), and, with ω1 < ω2 ,

‖gω2 − gω1‖2H(F ) = F (ω2) − F (ω1).

Step 6: Let Z(ω) be the elements in H(x) that correspond to gω(·) in H(F ). Itis easy to see that Z(ω) is a process with orthogonal increments and incrementalvariance given by F (ω):

E((Z(ω4) − Z(ω3)) · (Z(ω2) − Z(ω1))

)=∫

(gω4(ω) − gω3(ω))(gω2(ω) − gω1(ω)) dF (ω) = 0,

E(|Z(ω2) − Z(ω1)|2) = F (ω2) − F (ω1),2Remember that in H(F ) all eiωt are functions of ω , and that we have one function for

every t . Similarly, in H(x) , x(t) = x(t, ω) is a function of ω .


for ω1 < ω2 < ω3 < ω4 .

Step 7: It remains to prove that Z(ω) is the spectral process to x(t), i.e. that

x(t) =∫

eiωt dZ(ω)

= lim∑

eiωkt (Z(ωk+1) − Z(ωk)) = lim S(n)(t),

for an increasingly dense subdivision {ωk} with ωk < ωk+1 . But we have thatx(t) ∈ H(x) and ei·t ∈ H(F ) are corresponding elements. Further,

eiωt = lim∑

eiωkt (gωk+1(ω) − gωk

(ω)) = lim g(n)t (ω),

where the difference of g -functions within parentheses is equal to 1 for ωk <

ω ≤ ωk+1 and 0 otherwise. The limits are in H(F ), i.e. g(·) = ei·t = lim g(n)t (·).

Since S(n)(t) corresponds to g(n)t (·) and limits are preserved under the isometry,

we have that x(t) = lim S(n)(t), as was to be shown. �

Corollary 4.1 Every y ∈ H(x) can be written

y =∫

g(ω) dZ(ω),

for some function g(ω) ∈ H(F ).

Proof: Every y ∈ H(x) is the limit of a sequence of linear combinations,

y = limn

∑k

α(n)k x(t(n)

k ) = limn

∫ω

∑k

α(n)k eiωt

(n)k dZ(ω),

and g(n)(·) =∑

α(n)k ei·t(n)

k converges in H(F ) to some function g(·), and then∫g(n)(ω) dZ(ω) →

∫g(ω) dZ(ω)

in H(x) which was to be shown. �

4.3.3 More on the spectral representation

4.3.3.1 Discrete spectrum

If the spectral distribution function F (ω) is piecewise constant with jumps ofheight ΔFk at ωk , then Z(ω) is also piecewise constant with jumps of randomsize ΔZk at ωk , and E(|ΔZk|2) = ΔFk, so x(t) =

∑ΔZk eiωkt . Note that the

covariance function then has the corresponding form, r(t) =∑

ΔFk eiωkt .In the general spectral representation, the complex Z(ω) defines a random

amplitude and phase for the different components eiωt . This fact is perhaps


difficult to appreciate in the integral form, but is easily understood for processeswith discrete spectrum. Take the polar form, ΔZk = |ΔZk|ei arg ΔZk = ρke

iθk .Then,

x(t) =∑

ρkei(ωkt+θk) =

∑ρk cos(ωkt + θk) + i

∑ρk sin(ωkt + θk).

For a real process, the imaginary part vanishes, and we have the form, wellknown from elementary courses – see also later in this section –

x(t) =∑

ρk cos(ωkt + φk). (4.18)

If the phases φk are independent and uniformly distributed between 0 and 2π ,then x(t) is also strictly stationary.

For discrete spectrum we also have the following important ways of recover-ing the discrete components of F (ω) and of Z(ω); the proof of the propertiesare part of the Fourier theory.

Theorem 4:7 If F (ω) is a step function, with jumps of size ΔFk at ωk , then

limT→∞

1T

∫ T

0r(t)e−iωkt dt = ΔFk, (4.19)

limT→∞

1T

∫ T

0r(t)2 dt =

∑k

(ΔFk)2, (4.20)

limT→∞

1T

∫ T

0x(t)e−iωkt dt = ΔZk. (4.21)

4.3.3.2 Continuous spectrum

If the spectrum is absolutely continuous, with F (ω) =∫ ω−∞ f(x) dx, then one

can normalize the increments of Z(ω) by dividing by√

f(ω), at least forf(ω) > 0, and use the spectral representation in the form

x(t) =∫ ∞

−∞eiωt√

f(ω) dZ(ω), (4.22)

with Z(ω) =∫{x≤ω;f(x)>0}

dZ(x)√f(x)

, and E

(∣∣∣dZ(ω)∣∣∣2) = dF (ω)

f(ω) = dω . Even if

Z(ω) is not a true spectral process – it may for example have infinite incrementalvariance – it is useful as a model for white noise. We will meet this “constantspectral density” formulation several times in later sections.

4.3.3.3 One-sided spectral representation of a real process

For a real processes x(t), the complex spectral representation has to produce areal integral. This of course requires Z(ω) to have certain symmetry properties,


which we shall now investigate. Write ΔZ0 for a possible Z -jump at ω = 0.Then

x(t) =∫ ∞

−∞eiωt dZ(ω)

= ΔZ0 +∫ ∞

0+eiωt dZ(ω) +

∫ ∞

0+e−iωt dZ(−ω)

= ΔZ0 +∫ ∞

0+cos ωt · (dZ(ω) + dZ(−ω))

+ i

∫ ∞

0+sin ωt · (dZ(ω) − dZ(−ω)).

For this to be real for all t it is necessary that ΔZ0 is real, and also thatdZ(ω)+dZ(−ω) is real, and dZ(ω)−dZ(−ω) is purely imaginary, which impliesdZ(−ω) = dZ(ω), i.e. arg Z(−ω) = − arg Z(ω) and |Z(−ω)| = |Z(ω)| . (Theseproperties also imply that x(t) is real.)

Now, introduce two real processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ <∞}, with mean zero, and with u(0−) = v(0−) = 0, du(0) = ΔZ0 , v(0+) = 0,and such that, for ω > 0,

du(ω) = dZ(ω) + dZ(−ω) = 2Re dZ(ω)

dv(ω) = i(dZ(ω) − dZ(−ω)) = −2Im dZ(ω).

The real spectral representation of x(t) will then take the form

x(t) =∫ ∞

0cos ωt du(ω) +

∫ ∞

0sin ωt dv(ω)

=∫ ∞

0+cos ωt du(ω) +

∫ ∞

0sin ωt dv(ω) + du(0). (4.23)

It is easily checked that with the one-sided spectral distribution function G(ω),defined by (4.11),

E(du(ω) · dv(μ)) = 0, for all ω and μ, (4.24)

E(du(ω)2) ={

2dF (ω) = dG(ω), ω > 0,dF (0) = dG(0), ω = 0,

(4.25)

E(dv(ω)2) ={

2dF (ω) = dG(ω), ω > 0,dF (0) = dG(0), ω = 0.

(4.26)

In almost all applications, when a spectral density for a time process x(t) ispresented, it is the one-sided density g(ω) = 2f(ω) = dG(ω)/dω that is given.


4.3.3.4 Why negative frequencies?

One may ask why at all use negative frequencies ω < 0 in the spectral rep-resentation of a real process. Since the two complex functions dZ(ω) eiωt anddZ(−ω) e−iωt = dZ(ω) eiωt , which build up the spectral representation, circlethe origin in the counter clockwise and clockwise directions their contributionto the total x(t)-process is real, and there seems to be no point in using thecomplex formulation.

One reason for the complex approach is, besides from some mathematicalconvenience, that negative frequencies are necessary when we want to buildmodels for simultaneous time and space processes, for example a random waterwave which moves with time t along a straight line with coordinate s . Asdescribed in Section 1.6.3 on page 22, a random wave model can be built fromelementary harmonics Aω cos(ωt−κs+φω) where ω is the frequency in radiansper time unit and κ is the wave number in radians per length unit. If ω andκ have the same sign the elementary wave moves to the right with increasing tand if they have opposite sign it moves to the left. In stochastic wave modelsfor infinite water depth the dispersion relation states that κ = ω2/g > 0, withboth positive and negative ω possible. The (average) “energy” attached to theelementary wave Aω cos(ωt−κs+φω) is A2

ω/2 or, in the random case E(A2ω)/2.

If one observes the wave only at a single point s = s0 it is not possibleto determine in which direction the wave is moving, and one can divide theelementary energy in an arbitrary way between ω and −ω . When we deal withthe spectral density for the time process we have chosen to divide it equallybetween positive and negative frequencies.

If we have more than one observation point, perhaps a whole space intervalof observations, we can determine wave direction and see how the energy shouldbe divided between positive and negative ω . This can be done by splitting theprocess in two independent components, one {x+(t), t ∈ R} with only positivefrequencies, moving to the right, and one {−(t), t ∈ R} with only negativefrequencies, moving to the left. The spectra on the positive and negative sideneed not be equal.

For a wave model with one time parameter and a two-dimensional space pa-rameter (s1, s2) the wave direction is taken care of by a two-dimensional wavenumber and the spectrum defined by one part that defines the energy distribu-tion over frequencies, and one directional spreading part that determines theenergy for different wave directions.

4.3.3.5 Gaussian processes

For Gaussian processes {x(t), t ∈ R}, the spectral process is complex Gaussian,with independent real and imaginary parts. Since Z(ω) is an element in thespace H(x) of limits of linear combinations of x-variables, this is immediatefrom the characterization of Gaussian processes as those processes for whichall linear combinations have a Gaussian distribution. Also the real spectral


processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ < ∞} are Gaussian, and sincethey have uncorrelated increments, they are Gaussian processes with indepen-dent increments.

The sample paths of u(ω) and v(ω) can be continuous, or they could containjump discontinuities, which then are normal random variables. In the contin-uous case, when there is a spectral density f(ω), they are almost like Wienerprocesses, and they can be transformed into Wiener processes by normalizingthe incremental variance. In analogy with Z(ω) in (4.22), define w1(ω) andw2(ω) by

w1(ω) =∫{x≤ω;f(x)>0}

du(x)√2f(x)

, w2(ω) =∫{x≤ω;f(x)>0}

dv(x)√2f(x)

, (4.27)

to get, Theorem 2:16,

x(t) =∫ ∞

0

√2f(ω) cos ωt dw1(ω) +

∫ ∞

0

√2f(ω) sin ωt dw2(ω). (4.28)

Note that if f(ω) > 0 then E(dw1(ω)²) = E(dw2(ω)²) = dω.

The representation (4.28) is particularly useful for simulation of stationary Gaussian processes, as described in detail in Appendix D. Then the continuous spectrum is discretized to frequencies ωk = kΔ, k ∈ N, and the integrals (4.28) replaced by sums. Since the increments in the Wiener processes are independent normal variables, the approximative expressions become

x(t) = Σ Uk √(2ΔF(ωk)) cos ωkt + Σ Vk √(2ΔF(ωk)) sin ωkt,   (4.29)
     = Σ Ak √(2ΔF(ωk)) cos(ωkt + φk),   (4.30)

where Uk and Vk are independent standard normal variables, and

Ak = √(U²k + V²k),    φk = − arg(Uk + iVk).
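A minimal Python sketch of the simulation idea in (4.29) is given below; the two-sided example density f(ω) = (1/π)/(1 + ω²), the frequency grid and its truncation are my own choices, and ΔF(ωk) is approximated by f(ωk)Δ on the positive frequency axis.

import numpy as np

rng = np.random.default_rng(3)

def simulate(t, wk, dF):
    U = rng.standard_normal(len(wk))
    V = rng.standard_normal(len(wk))
    c = np.sqrt(2 * dF)
    return (np.cos(np.outer(t, wk)) * (U * c)).sum(axis=1) + \
           (np.sin(np.outer(t, wk)) * (V * c)).sum(axis=1)

w = np.linspace(0.01, 4.0, 400)            # discretized positive frequencies wk
dw = w[1] - w[0]
f = (1.0 / np.pi) / (1.0 + w**2)           # two-sided example density
dF = f * dw                                # Delta F(wk) on the positive side

t = np.linspace(0, 50, 1001)
x = simulate(t, w, dF)
print(x.var(), 2 * dF.sum())               # both should be comparable to r(0) over the kept band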

Historically, the representation (4.29) was used explicitly already by LordRayleigh in connection with heat radiation and by Einstein (1910) and others tointroduce Gaussian randomness. The form (4.30) appears to have come later,at least according to S.O. Rice, who cites work written by W.R. Bennett in the1930’s.

4.3.3.6 Gaussian white noise

The differentials dw1(ω) and dw2(ω) in (4.28) are examples of Gaussian whitenoise. White noise in general is a common notion in stochastic process the-ory when one needs a process in continuous time where all process values arevirtually independent, regardless of how close they are in time. Complete in-dependence would require rx(t) = 0 for all t except t = 0, i.e. the covariance


function is not continuous and Bochner’s theorem, Theorem 4:3, is of no useto find a corresponding spectrum. Fourier’s inversion formula (4.10) hints thatthe spectrum should be independent of ω but f(ω) > 0 is not a spectral den-sity. On the other hand, the δ -distribution, δ(ω), also called the Dirac deltafunction, forms a Fourier transform pair together with the constant functionf(ω) = 1/2π . It is in fact possible to formulate a theory for “distributionvalued” stationary processes and covariance functions, but that theory is littleused in practical work and we do not go into any details on this; for a briefintroduction, see [38, Appendix I].

Instead we will use the two Wiener processes defined by (4.27) to illustratethe common way to go around the problem with constant spectral density. Weused them as spectral processes in (4.28) without any difficulty; we only notedthat E(dw1(ω)2) = E(dw2(ω)2) = dω .

In the theory of stochastic differential equations, one often uses the notationw′(t) or dw(t) with the understanding that it is shorthand for a stochasticintegral of the form

∫ tt0

g(u) dw(t), for∫

g(t)2 dt < ∞ . We will illustrate thiswith the previously mentioned Langevin equation, (1.15), and deal with thesemore in detail in Section 4.4.4.

Example 4:3 (The Ornstein-Uhlenbeck process) The Ornstein-Uhlenbeck process is a Gaussian stationary process with covariance function r(t) = σ² e^{−α|t|}, and spectral density

f(ω) = (σ²/π) · α/(α² + ω²).

We saw in Chapter 2 that a Gaussian process with this covariance function and spectrum is continuous but not differentiable.

The process can be realized as a stochastic integral

x(t) = √(2ασ²) ∫_{−∞}^{t} e^{−α(t−τ)} dw(τ).   (4.31)

As we will see in Section 4.4.4 the Ornstein-Uhlenbeck process is the solution of the linear stochastic differential equation

α x(t) + x′(t) = √(2α) σ w′(t)   (4.32)

with Gaussian white noise w′(t). We met this equation in Section 1.6.1 under the name Langevin’s equation.

For large α (α → ∞), the covariance function falls off very rapidly around t = 0 and the correlation between x(s) and x(t) becomes negligible when s ≠ t. In the integral (4.31) each x(t) depends asymptotically only on the increment dw(t), and the values are hence approximately independent. With increasing α, the spectral density becomes increasingly flatter at the same time as f(ω) → 0. In order to keep the variance of the process constant, not going to 0 or ∞, we


can let σ2 → ∞ in such a way that σ2/α → C > 0. Therefore, the Ornstein-Uhlenbeck process with large α and σ2/α = C can be used as an approximationto Gaussian white noise.
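On a time grid tk = kΔ the Ornstein-Uhlenbeck process of Example 4:3 is an AR(1) sequence, which gives a very simple way to simulate it and to check the covariance function numerically; the parameter values in the Python sketch below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(4)
alpha, sigma, dt, n = 2.0, 1.0, 0.01, 400000
a = np.exp(-alpha * dt)                              # exact one-step correlation
x = np.empty(n)
x[0] = sigma * rng.standard_normal()
for k in range(n - 1):
    x[k + 1] = a * x[k] + sigma * np.sqrt(1 - a**2) * rng.standard_normal()

lag = 50                                             # time lag 50*dt = 0.5
emp = np.mean(x[:-lag] * x[lag:])
print(emp, sigma**2 * np.exp(-alpha * lag * dt))     # both close to exp(-1) ~ 0.368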

As a stationary process the Ornstein-Uhlenbeck has a spectral representa-tion of the type (4.28), and one may ask what connection there is between thetwo integral representations.

To see the analogy, take w1(ω) and w2(ω) from (4.28) and define wC(ω)for −∞ < ω < ∞ , as

wC(ω) =

{w1(ω) + iw2(ω), ω > 0,w1(−ω) − iw2(−ω), ω < 0,

to get

x(t) =∫ ∞

−∞eiωt√

f(ω) dwC(ω). (4.33)

We then do some formal calculation with white noise: w′(t) is the formalderivative of the Wiener process, and it is a stationary process with constantspectral density equal to 1/2π over the whole real line, i.e. by (4.33),

w′(t) =1√2π

∫ ∞

−∞eiωt dwC(ω),

for some complex Wiener process wC(ω). Inserting this in (4.31), we obtain

x(t) =√

2ασ2

∫ t

−∞e−α(t−τ) w′(τ) dτ

=

√2ασ2

√2π

∫ t

τ=−∞e−α(t−τ)

{∫ ∞

ω=−∞eiωτ dwC(ω)

}dτ

=

√2ασ2

√2π

∫ ∞

ω=−∞

{∫ t

τ=−∞e−(α+iω)(t−τ) dτ

}eiωt dwC(ω)

=

√2ασ2

√2π

∫ ∞

ω=−∞

1α + iω

eiωt dwC(ω)

=

√2ασ2

√2π

∫ ∞

ω=−∞

1√α2 + ω2

ei(− arg(α+iω)+ωt) dwC(ω)

=∫ ∞

ω=−∞ei(ωt+γ(ω))

√f(ω) dwC(ω),

with γ(−ω) = −γ(ω). The same Wiener process wC(ω) which works in thespectral representation of the white noise in (4.31) can be used as spectralprocess in (4.33) after correction of the phase.


4.3.4 Spectral representation of stationary sequences

A stationary sequence {x(t), t ∈ Z} can be thought of as a stationary process which is observed only at integer times t. The spectral representation can then be restricted to ω-values only in (−π, π], as for the spectral distribution. In the formula

x(t) = ∫_{−π+}^{π} e^{iωt} dZ(ω),   (4.34)

there is now an explicit expression for the spectral process,

Z(ω) = (1/(2π)) { ω x(0) − Σ_{k≠0} (e^{−iωk}/(ik)) x(k) }.

4.4 Linear filters

4.4.1 Projection and the linear prediction problem

One of the most useful instruments in the theory of stochastic processes isthe linear prediction device, by which we ”predict” or approximate a randomvariable x by a linear combination of a set of observed random variables, or bya limit of such linear combinations. The general formulation in the theory ofstochastic processes is the linear filtering problem in which one seeks a linearfilter h(u) such that the linearly filtered process

y(t) =∫ ∞

u=−∞h(u)x(t − u) du

approximates some interesting random quantity y(t), dependent on x(s), s ∈ R .If the impulse response function h(u) is zero for u < 0 we talk about linearprediction, otherwise we call it linear reconstruction. The impulse responsemay contain δ -functions δτk

, which act as time delays; for example y(t) =∫δτ0(u)x(t − u)u = x(t − τ0).

The projection theorem in Hilbert spaces states that if M is a closed linearsubspace of a Hilbert space H , and x is a point in H not in M , then there isa unique element y in M closest to x , and then z = x − y is orthogonal toM ; see Appendix C.

Formulated in statistical terms, if x is a random variable and y1, . . . , yn

is a finite set of random variables, then there is a unique linear combinationx = c1y1 + . . . + cnyn that is closest to x in the ‖ · ‖-norm, i.e. such that∥∥∥x −

∑cjyj

∥∥∥2= E(|x −

∑cjyj|2)

is minimal. This linear combination is characterized by the requirement that theresidual x −∑ cjyj is orthogonal, i.e. uncorrelated with all the yj -variables.This is the least squares solution to the common linear regression problem.


Expressed in terms of covariances, the coefficients in the optimal predictor x̂ = c1y1 + . . . + cnyn satisfy the linear equation system

Cov(x, yj) = c1Cov(y1, yj) + . . . + cnCov(yn, yj), j = 1, . . . , n, (4.35)

which follows from Cov(x − Σ_k ck yk, yj) = 0.

Note that the projection theorem says that the random variable ŷ = x̂ is unique, in the sense that if ỹ is another random variable that minimizes the prediction error, i.e. E(|x − ỹ|²) = E(|x − x̂|²), then E(|x̂ − ỹ|²) = 0 and P(x̂ = ỹ) = 1. This does not mean that the coefficients in the linear combination Σ cj yj are unique; if the variables y1, . . . , yn are linearly dependent then many combinations produce the same best predictor.
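The normal equations (4.35) are routinely solved numerically; the sketch below does so with a least-squares (pseudo-inverse) solver, which returns one valid coefficient vector even when the y-variables are linearly dependent, in line with the uniqueness remark above. The AR(1)-type covariance used is a hypothetical example of mine.

import numpy as np

def best_linear_predictor(Sigma_yy, Sigma_yx):
    c, *_ = np.linalg.lstsq(Sigma_yy, Sigma_yx, rcond=None)
    return c

# predict x(t+1) from x(t), x(t-1) when r(k) = 0.8^|k|
r = lambda k: 0.8 ** abs(k)
Sigma_yy = np.array([[r(0), r(1)], [r(1), r(0)]])
Sigma_yx = np.array([r(1), r(2)])
print(best_linear_predictor(Sigma_yy, Sigma_yx))    # approximately [0.8, 0.0]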

Example 4:4 For the MA(1)-process, x(t) = e(t) + b1e(t − 1), we found inExample C:1 that

e(t) =

{∑∞k=0(−b1)kx(t − k), if |b1| < 1,

limn→∞∑n

k=0(1 − kn)x(t − k), for b1 = −1.

Thus, x(t + 1) = e(t + 1) + b1e(t) has been written as the sum of one variablee(t + 1) ⊥ H(x, t) and one variable b1e(t) ∈ H(x, t). The projection theoremimplies that the best linear prediction of x(t + 1) based on x(s), s ≤ t , is

xt(t + 1) = b1e(t) =

{b1∑∞

k=0(−b1)kx(t − k), if |b1| < 1,

limn→∞−∑nk=0(1 − k

n)x(t − k), for b1 = −1.

Note that H(e, t) = H(x, t).

Example 4:5 We can extend the previous example to have H(x, t) ⊂ H(e, t)with strict inclusion. Take a series of variables e∗(t) and a random variable U

with E(U) = 0 and V (U) < ∞ , everything uncorrelated, and set

e(t) = U + e∗(t).

Then x(t) = e(t) − e(t − 1) = e∗(t) − e∗(t − 1), and H(e, t) = H(U) ⊕H(e∗, t)with H(U) and H(e∗, t) orthogonal, and H(e, t) ⊇ H(e∗, t) = H(x, t).

4.4.2 Linear filters and the spectral representation

4.4.2.1 Frequency response

In the previous section we formulated the prediction solution as a linear fil-ter with an impulse response functions. Here we take slightly more abstractapproach and use a frequency formulation. A linear time-invariant filter is a


transformation S that takes a stationary process x(t) =∫

eiωt dZ(ω) into anew stationary process y(t) so that,

y(t) =∫ ∞

−∞g(ω)eiωt dZ(ω), (4.36)

where g(ω) is the transfer function (also called frequency function). It has tosatisfy

∫ |g(ω)|2 dF (ω) < ∞ . That the filter is linear and time-invariant meansthat, for any (complex) constants a1, a2 and time delay τ ,

S(a1x1 + a2x2) = a1S(x1) + a2S(x2),S(x(· + τ)) = S(x)(· + τ).

As an alternative to the impulse function approach in Section 4.4.1 we maytake (4.36) as the definition of a linear time-invariant filter.

The process y(t) is also stationary, and it has covariance function given by

E(y(s + t)y(s)

)=∫ ∞

−∞|g(ω)|2 eiωt dF (ω).

In particular, if x(t) has spectral density fx(ω) then the spectral density ofy(t) is

fy(ω) = |g(ω)|2 fx(ω). (4.37)

Many of the interesting processes we have studied in previous sections, wereobtained as linear combinations of x(t)-variables, or, more commonly, as limitsof linear combinations. To formulate the spectral forms of these operations, weneed the following property, cf. Step 3, in the proof of Theorem 4:6.

Lemma 4.1 If gn → g in H(F ), i.e.∫ |gn(ω) − g(ω)|2 dF (ω) → 0, then∫

gn(ω) eiωt dZ(ω) →∫

g(ω) eiωt dZ(ω)

in H(x).

Proof: Use the isometry,∥∥∥∥∫ gn(ω) eiωt dZ(ω) −∫

g(ω) eiωt dZ(ω)∥∥∥∥2

H(x)

=∫ ∣∣gn(ω)eiωt − g(ω)eiωt

∣∣2 dF (ω) = ‖gn − g‖2H(F ).


Example 4:6 For example, the linear operation ”derivation” of a stationaryprocess is the limit of

x(t + h) − x(t)h

=∫ ∞

−∞

eihω − 1h

eiωt dZ(ω).

If x(t) satisfies the condition∫

ω2 dF (ω) < ∞ for quadratic mean differentia-bility, (eiωh − 1)/h → iω in H(F ) as h → 0, and hence

x′(t) =∫

iω eiωt dZ(ω) =∫

ω ei(ωt+π/2) dZ(ω).

The frequency function for derivation is therefore g(ω) = iω , and the spectraldensity of the derivative is fx′(ω) = ω2 fx(ω).

In general, writing y(t) =∫ |g(ω)| ei(ωt+arg g(ω)) dZ(ω), we see how the filter

amplifies the amplitude of dZ(ω) by a factor |g(ω)| and adds arg g(ω) to thephase. For the derivative, the phase increases by π/2, while the amplitudeincreases by a frequency dependent factor ω .

4.4.2.2 A practical rule

The spectral formulation of linear filters gives us an easy-to-use tool to findcovariance and cross-covariance functions between stationary processes. If theprocess {x(t), t ∈ R} is stationary with spectral distribution function Fx(ω),and {u(t), t ∈ R} and {v(t), t ∈ R} are generated from {x(t), t ∈ R} by linearfilters,

x(t) =∫

eiωt dZ(ω),

u(t) =∫

g(ω)eiωt dZ(ω),

v(t) =∫

h(ω)eiωt dZ(ω),

then, by (4.14),

Cov(x(s), u(t)) =∫

ω

∫μ

eiωse−iμt g(μ) E(dZ(ω) · dZ(μ)

)=∫

ei(s−t)ω g(ω) dFx(ω),

and similarly,

Cov(u(s), v(t)) =∫

g(ω)h(ω) ei(s−t)ω dFx(ω). (4.38)


4.4.2.3 Impulse response and frequency response

Suppose a linear filter is defined by its impulse response function h(t), as in Section 4.4.1,

    y(t) = ∫_{-∞}^{∞} h(u) x(t−u) du = ∫_{-∞}^{∞} h(t−u) x(u) du.

Inserting the spectral representation of x(t) and changing the order of integration, we obtain a filter in frequency response form,

    y(t) = ∫_{ω=-∞}^{∞} { ∫_{u=-∞}^{∞} e^{iωu} h(t−u) du } dZ(ω) = ∫_{-∞}^{∞} g(ω) e^{iωt} dZ(ω),

with

    g(ω) = ∫_{u=-∞}^{∞} e^{-iωu} h(u) du,    (4.39)

if ∫ |h(u)| du < ∞.

Conversely, if h(u) is absolutely integrable, ∫ |h(u)| du < ∞, then g(ω), defined by (4.39), is bounded and hence ∫ |g(ω)|² dF(ω) < ∞. Therefore

    y(t) = ∫ g(ω) e^{iωt} dZ(ω)

defines a linear filter with frequency function g(ω) as in (4.36). Inserting the expression for g(ω) and changing the order of integration we get the impulse response form,

    y(t) = ∫ e^{iωt} { ∫ e^{-iωu} h(u) du } dZ(ω) = ∫ h(u) { ∫ e^{iω(t−u)} dZ(ω) } du = ∫ h(u) x(t−u) du.

The impulse response and the frequency response function form a Fourier transform pair, and

    h(u) = (1/2π) ∫_{ω=-∞}^{∞} e^{iωu} g(ω) dω.    (4.40)

If h(u) = 0 for u < 0 the filter is called causal or physically realizable, indicating that then y(t) = ∫_{u=0}^{∞} h(u) x(t−u) du depends only on x(s) for s ≤ t, i.e. the output from the filter at time t depends on the past and not on the future.
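
The Fourier pair (4.39)–(4.40) can be checked numerically. The sketch below is an added illustration, with the causal impulse response h(u) = e^{-u}, u > 0, chosen only as an example; its known transform is 1/(1 + iω), and the Riemann-sum approximation of (4.39) reproduces it.

    import numpy as np

    du = 0.001
    u = np.arange(0, 40, du)
    h = np.exp(-u)                      # example of a causal impulse response

    def g_hat(w):
        # Riemann-sum approximation of g(w) = integral of exp(-i*w*u)*h(u) du, cf. (4.39)
        return np.sum(np.exp(-1j * w * u) * h) * du

    for w in [0.0, 0.5, 2.0]:
        print(w, g_hat(w), 1 / (1 + 1j * w))   # the two columns agree closely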

4.4.2.4 Linear processes

A stationary sequence xt or a stationary process x(t) is called linear if it is the output of a linear time-invariant filter acting on a sequence of orthogonal random variables, i.e.

    xt = Σ_{k=-∞}^{∞} h_{t−k} yk,    (4.41)
    x(t) = ∫_{u=-∞}^{∞} h(t−u) dY(u),    (4.42)

where the yk are uncorrelated with mean 0 and E(|yk|²) = 1, and {Y(t), t ∈ R} is a stationary process with orthogonal increments, so that E(dY(u) · dY(v)) is equal to 0 for u ≠ v and equal to du for u = v. The term infinite moving average is also used for processes of this type.

Theorem 4:8 a) A stationary sequence {xt; t ∈ Z} is an infinite moving average

    xt = Σ_{k=-∞}^{∞} h_{t−k} yk,

with orthonormal yk and Σ_k |hk|² < ∞, if and only if its spectrum is absolutely continuous, F(ω) = ∫_{-π}^{ω} f(x) dx.

b) A stationary process {x(t), t ∈ R} is an infinite continuous moving average x(t) = ∫_{u=-∞}^{∞} h(t−u) dY(u), with an orthogonal increment process Y(u) and ∫_u |h(u)|² du < ∞, if and only if its spectrum is absolutely continuous, F(ω) = ∫_{-∞}^{ω} f(x) dx.

Proof: We show part a); part b) is quite similar. For the "only if" part, use that yk = ∫_{-π}^{π} e^{iωk} dZ(ω), where E(|dZ(ω)|²) = dω/2π. Then

    xt = Σ_k h_{t−k} ∫_{-π}^{π} e^{iωk} dZ(ω) = ∫_{-π}^{π} e^{iωt} { Σ_k h_{t−k} e^{-iω(t−k)} } dZ(ω) = ∫_{-π}^{π} e^{iωt} g(ω) dZ(ω),

with g(ω) = Σ_k hk e^{-iωk}. Thus, the spectral distribution of xt has

    dF(ω) = E(|g(ω) dZ(ω)|²) = |g(ω)|² dω/2π,

with spectral density f(ω) = (1/2π) |g(ω)|².

For the "if" part, F(ω) = ∫_{-∞}^{ω} f(x) dx, write f(ω) = (1/2π) |g(ω)|², and expand |g(ω)| in a Fourier series,

    |g(ω)| = Σ_k ck e^{iωk}.

From the normalized spectral representation (4.22),

    xt = ∫_{-π}^{π} e^{iωt} √f(ω) dZ(ω), with E(|dZ(ω)|²) = dω,

we then get

    xt = ∫_{-π}^{π} e^{iωt} { (1/√(2π)) Σ_k ck e^{iωk} } dZ(ω)
       = Σ_k (ck/√(2π)) ∫_{-π}^{π} e^{iω(t+k)} dZ(ω) = Σ_k ck e_{t+k} = Σ_k h_{t−k} ek,

with ek = (1/√(2π)) ∫_{-π}^{π} e^{iωk} dZ(ω) and hk = c_{−k}. Since Z(ω) has constant incremental variance, the ek-variables are uncorrelated and normalized as required.   □
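
Part a) of Theorem 4:8 is easy to see at work in a simulation. The sketch below is an added illustration (the finite weight sequence hk is an arbitrary choice): it builds a finite moving average of orthonormal noise and compares an averaged periodogram with the predicted density |g(ω)|²/(2π), g(ω) = Σ_k hk e^{-iωk}.

    import numpy as np

    rng = np.random.default_rng(1)
    h = np.array([1.0, 0.6, 0.3, -0.2])          # finite filter weights (example)
    n, reps = 1024, 200
    omega = 2 * np.pi * np.arange(n) / n          # frequencies in [0, 2*pi)

    spec = np.zeros(n)
    for _ in range(reps):
        e = rng.standard_normal(n + len(h) - 1)   # orthonormal noise y_k
        x = np.convolve(e, h, mode='valid')       # moving average x_t = sum_k h_k y_{t-k}
        spec += np.abs(np.fft.fft(x))**2 / (2 * np.pi * n)
    spec /= reps

    g = np.sum(h[None, :] * np.exp(-1j * omega[:, None] * np.arange(len(h))), axis=1)
    target = np.abs(g)**2 / (2 * np.pi)
    print(np.max(np.abs(spec - target)))   # small; the periodogram fluctuates around the target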

4.4.3 Linear filters and differential equations

Linear filters expressed in terms of differential equations are common in the engineering sciences. The linear oscillator, also called the harmonic oscillator, is the basic element in mechanical systems which exhibit resonant periodic movements. Its counterpart in electronic systems is the resonance circuit. We shall describe both of these as examples of a general technique, common in the theory of ordinary differential equations.

To illustrate the general ideas we start with the exponential smoothing filter, also called the RC-filter, with a term borrowed from electrical engineering.

4.4.3.1 The RC-filter and exponential smoothing

Consider the electrical circuit in Figure 4.1 with potential difference x(t) on the left hand side and potential difference y(t) on the right hand side. The circuit consists of a resistance R and a capacitance C. Regarding x(t) as the driving process and y(t) as the resulting process, we will see that this device acts as a smoother that reduces rapid high frequency variations in x(t). The relation between the input x(t) and the output y(t) is

    RC y′(t) + y(t) = x(t),    (4.43)

and the equation has the (deterministic) solution

    y(t) = (1/RC) ∫_{-∞}^{t} e^{-(t−u)/(RC)} x(u) du.

Thus, the impulse response of the RC-filter is

    h(u) = (1/RC) e^{-u/(RC)}, for u > 0,

[Figure 4.1: Input x(t) and output y(t) in an exponential smoother (RC-filter); the circuit contains a resistance R and a capacitance C.]

with frequency response

    g(ω) = ∫_{0}^{∞} e^{-iωu} (1/RC) e^{-u/(RC)} du = 1/(1 + iωRC).

Applying the relation (4.37) we get the spectral density relation between input and output,

    fy(ω) = fx(ω) / ((ωRC)² + 1).    (4.44)

The attenuation of high frequencies in the spectrum explains the use of the RC-filter as a smoother.

As a preview of the general results for covariance function relations we also make the following elementary observation about the covariance functions, where we use the cross-covariances from Theorem 2:13:

    rx(τ) = Cov(RC y′(t) + y(t), RC y′(t+τ) + y(t+τ))
          = (RC)² ry′(τ) + RC ry,y′(t, t+τ) + RC ry′,y(t, t+τ) + ry(τ)
          = (RC)² ry′(τ) + RC r′y(τ) + RC r′y(−τ) + ry(τ)
          = (RC)² ry′(τ) + ry(τ).

Using the spectral density ω² fy(ω) for {y′(t), t ∈ R}, according to Example 4:6, we find

    rx(t) = (RC)² ∫ e^{iωt} ω² fy(ω) dω + ∫ e^{iωt} fy(ω) dω = ∫ e^{iωt} {(ωRC)² + 1} fy(ω) dω,

and get the spectral density for {x(t), t ∈ R},

    fx(ω) = {(ωRC)² + 1} fy(ω),    (4.45)

in accordance with (4.44).

As a final observation we note that the impulse response function satisfies the differential equation

    RC h′(u) + h(u) = 0,    (4.46)

for u > 0, with the initial condition h(0) = 1/(RC).
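
A quick way to see (4.44) in practice is to integrate (4.43) numerically. The sketch below is an added illustration (step size, RC value and the white-noise approximation are arbitrary choices): with white-noise input of formal density 1/(2π), (4.44) predicts a stationary output variance ∫ fy(ω) dω = 1/(2RC), which the simulation reproduces.

    import numpy as np

    rng = np.random.default_rng(2)
    RC, dt, n = 1.0, 0.01, 200_000
    x = rng.standard_normal(n) / np.sqrt(dt)      # crude white-noise input, sigma^2 = 1
    y = np.zeros(n)
    for k in range(n - 1):
        # forward Euler step of RC*y' + y = x
        y[k + 1] = y[k] + dt * (x[k] - y[k]) / RC

    # (4.44): Var(y) = integral of (1/(2*pi)) / ((w*RC)^2 + 1) dw = 1/(2*RC)
    print(y.var(), 1 / (2 * RC))                  # both close to 0.5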


4.4.3.2 Linear stochastic differential equations

Suppose we have a stationary process {x(t), t ∈ R}, sufficiently differentiable, and assume that the process {y(t), t ∈ R} is a solution to an ordinary linear differential equation with constant coefficients,

    Σ_{k=0}^{p} a_{p−k} y^{(k)}(t) = x(t),    (4.47)

or, seemingly more generally,

    Σ_{k=0}^{p} a_{p−k} y^{(k)}(t) = Σ_{j=0}^{q} b_{q−j} x^{(j)}(t).    (4.48)

By "solution" we mean either that (almost all) sample functions satisfy the equations or that there exists a process {y(t), t ∈ R} such that the two sides are equivalent. Note that (4.48) is only marginally more general than (4.47), since both right hand sides are stationary processes without any further assumption.

What can then be said about the solution to these equations: when does it exist and when is it a stationary process; and in that case, what is its spectrum and covariance function?

For the linear differential equation (4.47),

    a0 y^{(p)}(t) + a1 y^{(p−1)}(t) + . . . + a_{p−1} y′(t) + ap y(t) = x(t),    (4.49)

we define the generating function,

    A(r) = a0 + a1 r + . . . + ap r^p,

and the corresponding characteristic equation

    r^p A(r^{−1}) = a0 r^p + a1 r^{p−1} + . . . + a_{p−1} r + ap = 0.    (4.50)

The existence of a stationary process solution depends on the solutions to the characteristic equation. The differential equation (4.49) is called stable if the roots of the characteristic equation all have negative real part.

One can work with (4.49) as a special case of a multivariate first order differential equation. Dividing both sides by a0, the form is

    y′ = A y + x,    (4.51)

with y(t) = (y(t), y′(t), . . . , y^{(p−1)}(t))′, x(t) = (0, 0, . . . , x(t))′, and

        ( 0      1      0     . . .   0  )
    A = ( 0      0      1     . . .   0  )
        ( .      .      .     . . .   1  )
        ( −ap  −a_{p−1}  −a_{p−2} . . . −a1 )

This is the formulation which is common in linear and non-linear systems theory; cf. for example [14, Ch. 8], to which we refer for part of the following theorem.
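
A minimal sketch of the companion-matrix reformulation (4.51), added here as an illustration (the coefficients a0, ..., ap are an arbitrary example): stability of (4.49) can be checked numerically by verifying that all eigenvalues of A, equivalently all roots of the characteristic equation (4.50), have negative real part.

    import numpy as np

    a = [1.0, 3.0, 3.0, 1.0]         # a0*y''' + a1*y'' + a2*y' + a3*y = x(t)  (example)
    p = len(a) - 1
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)        # super-diagonal of ones
    A[-1, :] = [-a[p - k] / a[0] for k in range(p)]   # last row: -ap/a0, ..., -a1/a0

    print(np.linalg.eigvals(A))       # all real parts negative  =>  (4.49) is stable
    print(np.roots(a))                # approximately the same values: roots of (4.50)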


Theorem 4:9 a) If the differential equation (4.49) is stable, and the right hand side {x(t), t ∈ R} is a stationary process, then there exists a stationary process {y(t), t ∈ R} that solves the equation. The solution can be written as the output of a linear filter

    y(t) = ∫_{-∞}^{t} h(t−u) x(u) du,    (4.52)

where the function h(u) solves the equation

    a0 h^{(p)}(t) + a1 h^{(p−1)}(t) + . . . + a_{p−1} h′(t) + ap h(t) = 0,    (4.53)

with initial conditions h(0) = h′(0) = . . . = h^{(p−2)}(0) = 0, h^{(p−1)}(0) = 1/a0 (compare the RC-filter, where h(0) = 1/(RC)). Further, ∫_{-∞}^{∞} |h(u)| du < ∞.

b) If {x(t), t ∈ R} is a p times differentiable stationary process with spectral density fx(ω), then also Σ_{j=0}^{p} aj x^{(j)}(t) is a stationary process, and it has the spectral density

    | Σ_{j=0}^{p} aj (iω)^j |² fx(ω).    (4.54)

c) If {x(t), t ∈ R} and {y(t), t ∈ R} are two stationary processes that solve the differential equation (4.48), then their spectral densities obey the relation

    | Σ ak (iω)^k |² fy(ω) = | Σ bj (iω)^j |² fx(ω).    (4.55)

Proof: a) If {x(t), t ∈ R} has q times differentiable sample paths (with probability one), we use a standard result in ordinary differential equations to get a solution for almost every sample path; see [14, Ch. 8].

If we work only with second order properties, one can take (4.52) as the definition of a process {y(t), t ∈ R} and then show that it is p times differentiable (in quadratic mean) and that the two sides of (4.49) are equivalent.

Parts b) and c) are easy consequences of the spectral process property (4.14). Just write the differential forms in the right hand sides by means of the spectral representation and perform the integration.   □

4.4.3.3 The linear oscillator

The linear random oscillator is the basic ingredient in many engineering applications of stationary processes. We will examine two formulations, from mechanical and electrical engineering, respectively.

Example 4:7 First consider a spring-and-damper system as in Figure 4.2, with mass m, stiffness k, and damping coefficient c. When the mass is subject to a regularly or irregularly varying force x(t), it moves more or less periodically, and we denote the displacement from the equilibrium by y(t); see Figure 4.2.

[Figure 4.2: A simple damped oscillator: mass m, damping c, stiffness k, driving force x(t), displacement y(t).]

The relation between the force x(t) and the resulting displacement is described by the following differential equation,

    m y″(t) + c y′(t) + k y(t) = x(t).    (4.56)

Here ω0 = √(k/m) is called the response frequency or eigenfrequency, and ζ = c/(2√(mk)) the relative damping. Expressed in terms of the damping and eigenfrequency the fundamental equation is

    y″(t) + 2ζω0 y′(t) + ω0² y(t) = m^{−1} x(t).

This equation can be solved just like an ordinary differential equation with a continuous x(t) and, from Theorem 4:9, it has the solution

    y(t) = ∫_{-∞}^{t} h(t−u) x(u) du,

expressed with the impulse response function

    h(u) = m^{−1} ω̄0^{−1} e^{−αu} sin(ω̄0 u), u ≥ 0,

with the constants

    α = ζω0,
    ω̄0 = ω0 (1 − ζ²)^{1/2}.

To find the frequency function g(ω) for the linear oscillator we consider each term on the left hand side in (4.56). Since differentiation has frequency function iω, and hence repeated differentiation has frequency function −ω², we see that g(ω) satisfies the equation

    {−mω² + icω + k} · g(ω) = 1,

and hence

    g(ω) = 1/(−mω² + icω + k).    (4.57)

Since

    |g(ω)|² = 1/((k − mω²)² + c²ω²),

the spectral density of the output signal y(t) is

    fy(ω) = fx(ω)/((k − mω²)² + c²ω²) = (fx(ω)/m²)/((ω0² − ω²)² + 4α²ω²).    (4.58)
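
The frequency response (4.57) is easy to explore numerically. The short sketch below is an added illustration (the values of m, c and k are arbitrary); it locates the resonance peak of |g(ω)|², which for small damping lies close to the eigenfrequency ω0.

    import numpy as np

    m, c, k = 1.0, 0.2, 4.0                       # mass, damping, stiffness (example values)
    w0, zeta = np.sqrt(k / m), c / (2 * np.sqrt(m * k))

    w = np.linspace(0, 5, 10_001)
    g2 = 1.0 / ((k - m * w**2)**2 + (c * w)**2)   # |g(w)|^2 from (4.57)

    print("eigenfrequency w0 =", w0, ", relative damping zeta =", zeta)
    print("peak of |g|^2 at w =", w[np.argmax(g2)])   # close to w0 for small damping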

Example 4:8 A resonance circuit with one inductance, one resistance, and one capacitance in series is an electronic counterpart to the harmonic mechanical oscillator; see Figure 4.3.

[Figure 4.3: A resonance circuit with inductance L, resistance R and capacitance C; input potential x(t), output potential y(t).]

If the input potential between A1 and A2 is x(t), the current I(t) through the circuit obeys the equation

    L I′(t) + R I(t) + (1/C) ∫_{-∞}^{t} I(s) ds = x(t).

The output potential between B1 and B2, which is y(t) = R I(t), therefore follows an equation of the same form as (4.56) for the linear mechanical oscillator,

    L y″(t) + R y′(t) + (1/C) y(t) = R x′(t),    (4.59)

but this time with x′(t) as driving force. The frequency function for the filter between x(t) and y(t) is (cf. (4.57))

    g(ω) = iω / (−(L/R)ω² + iω + 1/(RC)).

The response frequency ω0 = 1/√(LC) is here called the resonance frequency. The relative damping ζ corresponds to the relative bandwidth 1/Q = 2ζ, where

    1/Q = Δω/ω0 = R √(C/L),

and Δω = ω2 − ω1 is such that |g(ω1)| = |g(ω2)| = |g(ω0)|/√2.


Example 4:9 As a final example we consider

    y″(t) + 2y′(t) + y(t) = x(t),

with the stationary process {x(t), t ∈ R} as input. The impulse response function for the filter in (4.52) is the solution to h″(u) + 2h′(u) + h(u) = 0, and is of the form

    h(u) = e^{−u}(C1 + C2 u),

where the boundary conditions give C1 = 0, C2 = 1. The solution

    y(t) = ∫_{-∞}^{t} (t−u) e^{−(t−u)} x(u) du

has the spectral density

    fy(ω) = fx(ω)/|1 + 2(iω) + (iω)²|² = fx(ω)/(1 + ω²)².

4.4.4 White noise in linear systems

4.4.4.1 White noise in a linear differential equation

The Wiener process was constructed as a mathematical model for the Brownian motion of particles suspended in a viscous fluid, in which the erratic particle movements are the results of bombardment by the fluid molecules. The Wiener process model requires that the fluid has zero viscosity and infinite mass.

A more realistic model gives room also for the viscosity and the particle mass. If x(t) denotes the force acting on the particle and y(t) is the velocity, we get the Ornstein-Uhlenbeck differential equation (4.32) from Example 4:3,

    a0 y′(t) + a1 y(t) = x(t),    (4.60)

where a1 depends on the viscosity and a0 is the particle mass. If the force x(t) is caused by collisions from independent molecules it is reasonable that x(t) for different t be independent. Adding the assumption that they are Gaussian leads us to take x(t) = σ w′(t) as the "derivative of a Wiener process", i.e. Gaussian white noise,

    a0 y′(t) + a1 y(t) = σ w′(t).

This equation can be solved as an ordinary differential equation by, with α = a1/a0,

    y(t) = (σ/a0) ∫_{-∞}^{t} e^{−α(t−u)} w′(u) du = (σ/a0) ∫_{-∞}^{t} e^{−α(t−u)} dw(u).    (4.61)

Here the last integral is well defined, Example 2:7 on page 51, even if the differential equation we started out from is not.
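
A hedged sketch, added here as an illustration (the parameter values are arbitrary), of how (4.61) can be simulated: the stationary Ornstein-Uhlenbeck process obeys an exact one-step AR(1) recursion, and the simulated variance is compared with the stationary value σ²/(2 a0 a1) obtained by integrating the squared kernel in (4.61).

    import numpy as np

    rng = np.random.default_rng(3)
    a0, a1, sigma = 1.0, 0.5, 1.0
    alpha = a1 / a0
    dt, n = 0.01, 500_000

    var_stat = sigma**2 / (2 * a0 * a1)           # stationary variance of (4.61)
    rho = np.exp(-alpha * dt)
    noise = rng.standard_normal(n) * np.sqrt(var_stat * (1 - rho**2))
    y = np.zeros(n)
    for k in range(n - 1):
        # exact one-step recursion for the Ornstein-Uhlenbeck process
        y[k + 1] = rho * y[k] + noise[k]

    print(y.var(), var_stat)                      # both close to 1.0 for these values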


By carrying out the integration it is easy to see that the process y(t) defined by (4.61) satisfies

    a1 ∫_{u=t0}^{t} y(u) du = σ (w(t) − w(t0)) − a0 (y(t) − y(t0)),

which means that, instead of equation (4.60), we could have used the integral equation

    a0 (y(t) − y(t0)) + a1 ∫_{u=t0}^{t} y(u) du = σ (w(t) − w(t0)),    (4.62)

to describe the increments of y(t).

The general differential equation

    a0 y^{(p)}(t) + a1 y^{(p−1)}(t) + . . . + a_{p−1} y′(t) + ap y(t) = σ w′(t)

can be solved in a similar way, and expressed as a stochastic integral,

    y(t) = σ ∫_{-∞}^{t} h(t−u) dw(u),    (4.63)

where the impulse response function h(u) is the solution to

    a0 h^{(p)}(t) + a1 h^{(p−1)}(t) + . . . + a_{p−1} h′(t) + ap h(t) = 0,

as in Theorem 4:9. The formal differential equation can be replaced by the well defined differential-integral equation

    a0 (y^{(p−1)}(t) − y^{(p−1)}(t0)) + a1 (y^{(p−2)}(t) − y^{(p−2)}(t0)) + . . . + a_{p−1} (y(t) − y(t0)) + ap ∫_{t0}^{t} y(u) du = σ (w(t) − w(t0)).

Stochastic differential equations involving Gaussian white noise are often written as

    a0 dy(t) + a1 y(t) dt = σ dw(t),

or more generally as

    dy(t) = a(t) y(t) dt + σ(t) dw(t),

with variable deterministic coefficients. In the most general form, with random coefficients,

    dy(t) = a(y(t), t) y(t) dt + σ(y(t), t) dw(t),

a completely new theory is needed, namely stochastic calculus; the reader is referred to [39] for a good introduction.


4.4.4.2 White noise and constant spectral density

A linear systems equation defines the relation between an input signal x(t) and a response process y(t). The linear system acts as a frequency dependent amplifier and phase modifier on the input. Of special importance is the case when the input is white noise. This idealized type of process is strictly defined only in the context of linear systems. The characteristic feature of white noise (it may be denoted n(t) or, if it is Gaussian, w′(t)) is that all frequencies are represented in equal amounts, i.e. it has constant spectral density

    fn(ω) = σ²/(2π), −∞ < ω < ∞.

Strictly, this is not a proper spectral density of a stationary process, since it has infinite integral, but used as input in a linear system with an impulse response function that satisfies ∫ |h(u)|² du < ∞, it produces a stationary output process.

Theorem 4:10 a) The stationary process {x(t), t ∈ R}, defined as a stochastic integral from a standard Wiener process {w(t), t ∈ R} by

    x(t) = ∫_{-∞}^{∞} h(t−u) dw(u),

has covariance function

    rx(t) = ∫_{-∞}^{∞} h(t−u) h(−u) du

and spectral density

    fx(ω) = |g(ω)|²/(2π),

where g(ω) = ∫_{-∞}^{∞} e^{−iωu} h(u) du is the frequency response function corresponding to the impulse response h(u).

b) If x(t) = ∫_{-∞}^{t} h(t−u) dw(u) is a solution to a stable stochastic differential equation

    a0 y^{(p)}(t) + a1 y^{(p−1)}(t) + . . . + a_{p−1} y′(t) + ap y(t) = σ w′(t),    (4.64)

then its spectral density is

    fx(ω) = (σ²/2π) · 1/| Σ_{k=0}^{p} ak (iω)^{p−k} |².

Proof: a) The covariance function is a direct consequence of Theorem 2:16. Now, the right hand side of the integral expression for rx(t) is the convolution of h(u) with h(−v), and their Fourier transforms are g(ω) and ḡ(ω), respectively. Since convolution corresponds to multiplication of the Fourier transforms, the spectral density of rx(t) is, as stated,

    fx(ω) = g(ω) ḡ(ω)/(2π) = |g(ω)|²/(2π).

b) The relation between the impulse response and the frequency response function g(ω) is a property of the systems equation (4.64) and does not depend on any stochastic property. One can therefore use the established relation

    g(ω) = 1/Σ_{k=0}^{p} ak (iω)^{p−k}

to get the result.   □

Part (b) of the theorem finally confirms our claim that the Gaussian white noise σ w′(t) can be treated as if it has constant spectral density σ²/(2π).

A stationary process with spectral density of the form C/|P(iω)|², where P is a complex polynomial, can be generated as the output from a stable linear system with white noise input; this is a very convenient way to produce a stationary process with suitable spectral properties.

Example 4:10 The linear oscillator

    y″(t) + 2ζω0 y′(t) + ω0² y(t) = n(t),

where the white noise input n(t) has constant spectral density fn(ω) = σ²/(2π), has spectral density (cf. (4.58))

    fy(ω) = (σ²/2π) · 1/((ω0² − ω²)² + 4α²ω²).

The covariance function is found, for example by residue calculus, from ry(t) = ∫ e^{iωt} fy(ω) dω. With α = ζω0 and ω̄0 = ω0 √(1 − ζ²) one gets the covariance function

    ry(t) = (σ²/(4αω0²)) e^{−α|t|} ( cos ω̄0 t + (α/ω̄0) sin ω̄0 |t| ).    (4.65)
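
The residue calculation behind (4.65) can be checked numerically. The sketch below is an added illustration (σ, ζ and ω0 are arbitrary values): it integrates e^{iωt} fy(ω) over a wide frequency grid and compares the result with the closed-form covariance.

    import numpy as np

    sigma, zeta, w0 = 1.0, 0.3, 2.0
    alpha, wd = zeta * w0, w0 * np.sqrt(1 - zeta**2)

    w = np.linspace(-100, 100, 400_001)
    dw = w[1] - w[0]
    f_y = sigma**2 / (2 * np.pi) / ((w0**2 - w**2)**2 + 4 * alpha**2 * w**2)

    def r_closed(t):
        # covariance function (4.65)
        return sigma**2 / (4 * alpha * w0**2) * np.exp(-alpha * abs(t)) * (
            np.cos(wd * t) + alpha / wd * np.sin(wd * abs(t)))

    for t in [0.0, 0.5, 2.0]:
        r_num = np.sum(np.cos(w * t) * f_y) * dw   # the sine part vanishes by symmetry
        print(t, r_num, r_closed(t))               # the two columns agree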

4.4.5 The Hilbert transform and the envelope

4.4.5.1 The Hilbert transform

By the spectral representation we have expressed a stationary process x(t) in terms of random cosine and sine functions with positive and negative frequencies. The complex form of the spectral representation,

    x(t) = ∫_{-∞}^{∞} e^{iωt} dZ(ω),


yielded a real process by the requirement dZ(−ω) = dZ̄(ω). In fact, x(t) is then expressed as the sum of two complex processes, of which one is the complex conjugate of the other.

If we take only the half spectral representation,

    x∗(t) = 2 ∫_{0+}^{∞} e^{iωt} dZ(ω) + ΔZ(0),

where ΔZ(0) is the jump of Z(ω) at the origin, we obtain a particularly useful linear transform of x(t). One can obtain x∗(t) as the limit, as h ↓ 0 through continuity points of F(ω), of the linear operation with frequency function

    gh(ω) = 0 for ω < −h,  1 for |ω| ≤ h,  2 for ω > h.

The process x̃(t), defined by

    x∗(t) = x(t) + i x̃(t),

is the result of a linear filter on x(t) with frequency function

    g(ω) = i for ω < 0,  0 for ω = 0,  −i for ω > 0.

It is called the Hilbert transform of x(t).³

When x(t) is real, with dZ(−ω) = dZ̄(ω), and with the real spectral representation (4.23),

    x(t) = ∫_{0+}^{∞} cos ωt du(ω) + ∫_{0}^{∞} sin ωt dv(ω) + du(0),

it follows that also x̃(t) = i(x(t) − x∗(t)) is real, and that it is given by

    x̃(t) = ∫_{0}^{∞} sin ωt du(ω) − ∫_{0+}^{∞} cos ωt dv(ω).    (4.66)

Thus, x∗(t) is a complex process with x(t) as real part and x̃(t) as imaginary part. All involved processes can be generated by the same real spectral processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ < ∞}.

Theorem 4:11 Let {x(t), t ∈ R} be stationary and real, with mean 0, covariance function r(t) and spectral distribution function F(ω), with a possible jump ΔF(0) at ω = 0. Denote the Hilbert transform of x(t) by x̃(t). Then, with G(ω) denoting the one-sided spectrum,

a) {x̃(t), t ∈ R} is stationary and real, with mean 0, and covariance function

    r̃(t) = r(t) − ΔF(0) = ∫_{-∞}^{∞} e^{iωt} dF(ω) − ΔF(0) = ∫_{0+}^{∞} cos ωt dG(ω).

b) The process x̃(t) has the same spectrum F(ω) as x(t), except that any jump at ω = 0 has been removed.

c) The cross-covariance function between x(t) and x̃(t) is

    r̃∗(t) = E(x(s) · x̃(s + t)) = ∫_{0}^{∞} sin ωt dG(ω).

In particular, x(t) and x̃(t) are uncorrelated, taken at the same time instant.

Proof: Parts (a) and (c) follow from (4.23), (4.66), and the correlation properties (4.24)–(4.26) of the real spectral processes. Then part (b) is immediate.   □

³ Matlab, Signal processing toolbox, contains a routine for making Hilbert transforms. Try it!

4.4.5.2 The envelope

Assume that F(ω) is continuous at ω = 0, so there is no "constant" random component in x(t), and consider the joint behavior of x(t) and its Hilbert transform x̃(t). Also assume x(t), and hence x̃(t), to be Gaussian processes, with common covariance function r(t), and consider the complex process

    x∗(t) = x(t) + i x̃(t) = 2 ∫_{0+}^{∞} e^{iωt} dZ(ω).

The envelope R(t) of x(t) is the absolute value of x∗(t),

    R(t) = √(x(t)² + x̃(t)²).

In particular, |x(t)| ≤ R(t), with equality when x̃(t) = 0.

Since, for Gaussian processes, x(t) and x̃(t) are independent with the same Gaussian distribution, the envelope has a Rayleigh distribution with density

    fR(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0.

The envelope, as defined here, always exists, and it always has the stated statistical properties. The physical meaning of an envelope is however not clear from the mathematical definition, and we must turn to special types of processes before we can identify any particularly interesting properties of R(t).
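
The footnote above mentions Matlab's Hilbert-transform routine; SciPy offers the analogous scipy.signal.hilbert, which returns the analytic signal x∗(t) = x(t) + i x̃(t). The sketch below is an added illustration (the narrow-band test signal built from smoothed noise is an arbitrary choice); it computes the envelope R(t) = |x∗(t)| and checks the inequality |x(t)| ≤ R(t).

    import numpy as np
    from scipy.signal import hilbert

    rng = np.random.default_rng(4)
    n, dt = 4096, 0.05
    t = np.arange(n) * dt

    # narrow-band test signal: slowly varying random amplitudes around one frequency
    a = np.convolve(rng.standard_normal(n), np.ones(200) / 200, mode='same')
    b = np.convolve(rng.standard_normal(n), np.ones(200) / 200, mode='same')
    x = a * np.cos(2 * t) + b * np.sin(2 * t)

    xa = hilbert(x)                  # analytic signal x*(t) = x(t) + i*xtilde(t)
    envelope = np.abs(xa)            # R(t) = sqrt(x^2 + xtilde^2)
    print(np.all(envelope >= np.abs(x) - 1e-9))   # True: |x(t)| <= R(t)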


4.4.5.3 The envelope of a narrow band process

For processes with spectrum concentrated to a narrow frequency band, the sample functions have a characteristic "fading" look, as a wave with one dominating frequency. Then the envelope represents the slowly varying random amplitude.

We consider a stationary process x(t) with spectral density f(ω) concentrated around some frequency ω0,

    f(ω) = (1/2) f0(ω − ω0) + (1/2) f0(ω + ω0),

for some function f0(ω) such that f0(ω) = f0(−ω), and f0(ω) = 0 for |ω| > d, some d < ω0. Express x(t) in spectral form, using the normalized form (4.22), to obtain

    x(t) = ∫_{-∞}^{∞} e^{iωt} √f(ω) dWC(ω)
         = ∫_{ω0−d}^{ω0+d} √(f0(ω − ω0)/2) e^{iωt} dWC(ω) + ∫_{−ω0−d}^{−ω0+d} √(f0(ω + ω0)/2) e^{iωt} dWC(ω)
         = I1(t) + I2(t), say.

By a change of variables in I1(t) and I2(t) this gives

    I1(t) = e^{iω0t} ∫_{−d}^{d} √(f0(ω)/2) e^{iωt} dWC(ω + ω0) = e^{iω0t} Y(t),
    I2(t) = e^{−iω0t} ∫_{−d}^{d} √(f0(ω)/2) e^{iωt} dWC(ω − ω0)
          = e^{−iω0t} ∫_{−d}^{d} √(f0(ω)/2) e^{−iωt} dWC(−ω − ω0) = Ī1(t),

and, in combination,

    x(t) = 2 Re( Y(t) e^{iω0t} ).

Here, Y(t) is a complex process

    Y(t) = Y1(t) + i Y2(t) = ∫_{−d}^{d} √(f0(ω)/2) e^{iωt} dWC(ω + ω0)

with only low frequencies. With R(t) = 2|Y(t)| and Θ(t) = arg Y(t), we obtain

    x(t) = R(t) Re e^{i(ω0t + Θ(t))} = R(t) cos(ω0t + Θ(t)).

The envelope R(t) has here a real physical meaning as the slowly varying amplitude of the narrow band process.

[Figure 4.4: Gaussian processes and their Hilbert transforms and envelopes. Left: Pierson-Moskowitz waves. Right: process with triangular spectrum over (0.8, 1.2).]

4.4.6 The sampling theorem

The spectral form expresses a stationary process as an integral, in quadratic mean, of elementary cosine functions with random amplitude and phase. If all these amplitudes and phases are known, the process can be reconstructed. For band limited processes, this is particularly simple. A process is band limited to frequency ω0 if

    F(−ω0+) − F(−∞) = F(∞) − F(ω0−) = 0,

i.e. its spectrum is restricted to the interval [−ω0, ω0]. We require that there is no spectral mass at the points ±ω0.

Theorem 4:12 If the stationary process {x(t), t ∈ R} is band limited to ω0, then it is perfectly specified by its values at discrete time points spaced t0 = π/ω0 apart. More specifically, with probability one,

    x(t) = Σ_{k=−∞}^{∞} x(α + k t0) · (sin ω0(t − α − k t0)) / (ω0(t − α − k t0)),    (4.67)

where α is an arbitrary constant.

Proof: The spectral representation says that

    x(t) = ∫_{−ω0+}^{ω0−} e^{iωt} dZ(ω).

For a fixed t, the function gt(ω) = e^{iωt} · I_{[−ω0,ω0]} is square integrable over (−ω0, ω0), and from the theory of Fourier series it can be expanded as

    e^{iωt} = lim_{N→∞} Σ_{k=−N}^{N} e^{iωk t0} · (sin ω0(t − k t0)) / (ω0(t − k t0)),


with convergence in H(F), i.e.

    ∫_{−ω0}^{ω0} | e^{iωt} − Σ_{k=−N}^{N} e^{iωk t0} · (sin ω0(t − k t0))/(ω0(t − k t0)) |² dF(ω) → 0.

The convergence is also uniform for |ω| < ω0 = π/t0. For ω = ±ω0 the sum converges to

    (e^{iω0t} + e^{−iω0t})/2 = cos ω0t.

Therefore, if dF(±ω0) = 0, then

    Σ_{k=−N}^{N} e^{iωk t0} · (sin ω0(t − k t0))/(ω0(t − k t0)) → e^{iωt},    (4.68)

in H(F) as N → ∞.

Inserting this expansion in the spectral representation of {x(t), t ∈ R} we obtain

    E( | x(t) − Σ_{k=−N}^{N} x(k t0) · (sin ω0(t − k t0))/(ω0(t − k t0)) |² )    (4.69)
    = E( | ∫_{−ω0}^{ω0} ( e^{iωt} − Σ_{k=−N}^{N} e^{iωk t0} · (sin ω0(t − k t0))/(ω0(t − k t0)) ) dZ(ω) |² )
    ≤ ∫_{−ω0}^{ω0} | e^{iωt} − Σ_{k=−N}^{N} e^{iωk t0} · (sin ω0(t − k t0))/(ω0(t − k t0)) |² dF(ω).

By (4.68), this tends to 0 as N → ∞, i.e.

    x(t) = lim_{N→∞} Σ_{k=−N}^{N} x(k t0) · (sin ω0(t − k t0))/(ω0(t − k t0)),

in H(x), which is the statement of the theorem for α = 0. For arbitrary α, apply the result just proved to y(t) = x(t + α).   □

Remark 4:2 If there is spectral mass F+ = ΔF(ω0), F− = ΔF(−ω0), at the endpoints ±ω0, then (4.69) would tend to

    sin² ω0t · (F+ + F−),

and the sampling representation fails.


An example of this is the simple random cosine process, x(t) = cos(ω0t + φ), which has covariance function r(t) = (1/2) cos ω0t, and spectrum concentrated at ±ω0. Then

    x(α + k t0) = (−1)^k x(α),

which means that for every t, the sum

    Σ_k x(α + k t0) · (sin ω0(t − α − k t0))/(ω0(t − α − k t0))

is proportional to x(α). On the other hand, x(α + t0/2) is uncorrelated with x(α) and cannot be represented by the sampling theorem.
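
A numerical illustration of the sampling formula (4.67), added here as a sketch: the band-limited "signal" is built directly from a few fixed cosines with frequencies strictly inside (−ω0, ω0), and all constants are arbitrary choices. The infinite sum is truncated, which limits the accuracy.

    import numpy as np

    w0 = np.pi                     # band limit; sampling spacing t0 = pi/w0 = 1
    t0 = np.pi / w0

    freqs, amps, phases = [0.4, 1.3, 2.6], [1.0, 0.7, 0.4], [0.3, 1.1, 2.0]
    def x(t):
        # a band-limited deterministic test signal (all frequencies < w0)
        return sum(a * np.cos(w * np.asarray(t, float) + p)
                   for w, a, p in zip(freqs, amps, phases))

    k = np.arange(-2000, 2001)     # truncation of the infinite sum in (4.67), alpha = 0
    samples = x(k * t0)

    def reconstruct(t):
        u = w0 * (t - k * t0)
        return np.sum(samples * np.sinc(u / np.pi))   # np.sinc(z) = sin(pi z)/(pi z)

    for t in [0.37, 5.81, -12.2]:
        print(x(t), reconstruct(t))   # agree up to the truncation error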

4.5 Karhunen-Loeve expansion

4.5.1 Principal components

In a multivariate distribution of a random vector the components may be more or less statistically dependent. If there is strong dependence between the components, it may suffice to specify a few (random) values in order to specify almost the entire outcome of the full random vector. The formal tool to generate such a common behavior is the concept of principal components, which is a way to approximately reduce the dimensionality of a variation space.

Let x = (x1, . . . , xn)′ be a vector of n random variables with mean zero and covariance matrix Σ, which is symmetric and non-negative definite by construction. The covariance matrix has n eigenvalues ωk with corresponding orthonormal eigenvectors pk, decreasingly ordered as ω1 ≥ ω2 ≥ . . . ≥ ωn, such that

    Σ pk = ωk pk.

The transformation

    zk = (1/√ωk) p′k x

gives us n new standardized random variables,

    V(zk) = p′k Σ pk / ωk = p′k pk ωk / ωk = 1.

Furthermore, they are uncorrelated: for j ≠ k,

    Cov(zj, zk) = E(zj zk) = (1/√(ωj ωk)) E(p′j x x′ pk) = (1/√(ωj ωk)) p′j Σ pk = 0,

since the eigenvectors pj and pk are orthogonal.

The random variables yk = √ωk zk, k = 1, . . . , n, are called the principal components of the vector x. In matrix language, with P = (p1, . . . , pn) as the matrix with the eigenvectors as columns,

    y = P′x, with inverse x = P y,

is a vector of uncorrelated variables with decreasing variances ωk. Since the matrix P is orthogonal, P^{−1} = P′, the original x-variables can be expressed as linear combinations of the uncorrelated variables yk and zk,

    xk = Σ_{j=1}^{n} yj pjk = Σ_{j=1}^{n} √ωj pjk zj,    (4.70)

where pjk denotes the k-th element of the eigenvector pj.

In practice, when one wants to simulate a large vector of many correlated Gaussian variables, one can use (4.70) to generate successively the most important modes of variation and truncate the sum when, for example, it describes 99% of the variation of the x-variables.
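
A small sketch of this simulation recipe, added as an illustration (the exponentially decaying covariance matrix is an arbitrary example): the eigendecomposition is computed, the leading modes covering 99% of the total variance are kept, and one vector is simulated from the truncated version of (4.70).

    import numpy as np

    rng = np.random.default_rng(5)
    n = 50
    S = np.array([[0.9**abs(i - j) for j in range(n)] for i in range(n)])  # example covariance

    vals, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]      # reorder: omega_1 >= omega_2 >= ...

    m = np.searchsorted(np.cumsum(vals) / vals.sum(), 0.99) + 1   # modes for 99% of variance
    z = rng.standard_normal(m)
    x = vecs[:, :m] @ (np.sqrt(vals[:m]) * z)     # truncated version of (4.70)
    print(m, "of", n, "modes carry 99% of the variance")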

4.5.2 Expansion of a stationary process along eigenfunctions

We shall now generalize the finite-dimensional formulation in the previous section to continuous parameter stochastic processes. By the spectral representation, x(t) = ∫ e^{iωt} dZ(ω), every stationary process can be expressed by means of uncountably many orthogonal variables dZ(ω). For processes with discrete spectrum with jumps at ±ωk, Z(ω) has countably many jumps ΔZ(ωk) and x(t) = Σ e^{iωkt} ΔZ(ωk).

In fact, every quadratic mean continuous process, stationary or not, can be expressed on a finite interval [a, b] as a sum of deterministic functions with random orthogonal coefficients,

    x(t) = lim_{n→∞} Σ_{k=0}^{n} ck(t) zk.

The convergence is uniform for a ≤ t ≤ b, in the sense that

    E( | x(t) − Σ_{k=0}^{n} ck(t) zk |² ) → 0,

uniformly in t, as n → ∞.

The functions ck(t) depend on the choice of observation interval [a, b], and the random variables zk are elements in the Hilbert space spanned by x(t), t ∈ [a, b],

    zk ∈ H(x(t); t ∈ [a, b]).

Let us first investigate what properties such an expansion should have, if it exists. Write H(x) instead of H(x(t); t ∈ [a, b]). Suppose there exist zk with

    ‖zk‖²_{H(x)} = 1, (zj, zk) = E(zj zk) = 0, j ≠ k,


and assume that the family {zk} is complete in H(x), i.e. the zk form a basis for H(x). In particular this means that, for every U ∈ H(x),

    U ⊥ zk for all k ⇒ U = 0.

Now take any y ∈ H(x), and define ck = (y, zk). Then, by the orthogonality,

    E( | y − Σ_{k=0}^{n} ck zk |² ) = . . . = E(|y|²) − Σ_{k=0}^{n} |ck|²,

so Σ_{k=0}^{n} |ck|² ≤ ‖y‖²_{H(x)} for all n, and hence Σ_{k=0}^{∞} |ck|² ≤ ‖y‖²_{H(x)}. This means that Σ_{k=0}^{∞} ck zk exists as a limit in quadratic mean, and also that

    y − Σ_{k=0}^{∞} ck zk ⊥ zn

for all n. But since {zk} is a complete family, y = Σ_{k=0}^{∞} ck zk = Σ_{k=0}^{∞} (y, zk) zk.

Now replace y by a fixed time observation of x(t). Then, naturally, the ck will depend on the time t and are functions ck(t), so that x(t) = Σ ck(t) zk. For the covariance function of x(t) = Σ_k ck(t) zk we have, by the orthogonality of the zk,

    r(s, t) = E(x(s) x(t)) = Σ_{j,k} cj(s) ck(t) E(zj zk) = Σ_k ck(s) ck(t).

Thus, we shall investigate the existence and properties of the following pair of expansions,

    x(t) = Σ_k ck(t) zk,    (4.71)
    r(s, t) = Σ_k ck(s) ck(t).    (4.72)

Not only can the zk be taken as uncorrelated, but the functions ck(t) can also be chosen as orthogonal, i.e.

    ∫_a^b cj(t) ck(t) dt = 0, j ≠ k,
    ∫_a^b |ck(t)|² dt = ωk ≥ 0.

As a final check on the consequences of an expansion (4.72) and the orthogonality of the functions ck(t), we observe the following arguments⁴:

    ∫_a^b r(s, t) cj(t) dt = ∫_a^b { Σ_{k=0}^{∞} ck(s) ck(t) } cj(t) dt = Σ_{k=0}^{∞} ck(s) ∫_a^b cj(t) ck(t) dt = Σ_{k=0}^{∞} ck(s) ωk δ_{jk} = ωj cj(s).

Thus, the functions cj(t) are eigenfunctions, with eigenvalues ωj, of the covariance operator defined by r(s, t),

    c(·) ↦ ∫_a^b r(·, t) c(t) dt.

Call the normalized eigenfunctions φj(t) = (1/√ωj) cj(t) if ωj > 0, making {φk(t)} a family of orthonormal eigenfunctions.

4.5.3 The Karhunen-Loeve theorem

Theorem 4:13 Let {x(t); a ≤ t ≤ b} be continuous in quadratic mean with mean zero and covariance function r(s, t) = E(x(s) x(t)). Then there exist orthonormal eigenfunctions φk(t), k = 0, 1, . . . , N ≤ ∞, for a ≤ t ≤ b, with eigenvalues ωk ≥ 0, to the equation

    ∫_a^b r(s, t) φ(t) dt = ω φ(s),

such that the random variables

    zk = (1/√ωk) ∫_a^b φk(t) x(t) dt

are uncorrelated and represent x(t) as

    x(t) = Σ_{k=0}^{∞} √ωk φk(t) zk.    (4.73)

The sum is a limit in quadratic mean, and ‖x(t) − Σ_{k=0}^{n} √ωk φk(t) zk‖²_{H(x)} → 0 uniformly for t ∈ [a, b].

The variables zk are sometimes called observables, and they can be used, for example, to make statistical inference about the distribution of the process x(t). Note that if x(t) is a normal process, then the zk are uncorrelated normal variables, hence independent, which makes inference simple.

Before we sketch the proof of the theorem, we present an explicit construction of the Wiener process as an example of the Karhunen-Loeve theorem.

⁴ The termwise integration is allowed, since sup_{s,t∈[a,b]} |Σ_{k=0}^{∞} ck(s) ck(t)| ≤ sup_t r(t, t) < ∞. Prove this as an exercise.


Example 4:11 The standard Wiener process w(t), observed over [0, T], has covariance function r(s, t) = min(s, t), and the eigenfunctions can be found explicitly: from

    ∫_0^T min(s, t) φ(t) dt = ω φ(s),

it follows, by differentiating twice,

    ∫_0^s t φ(t) dt + ∫_s^T s φ(t) dt = ω φ(s),    (4.74)
    s φ(s) − s φ(s) + ∫_s^T φ(t) dt = ω φ′(s),    (4.75)
    −φ(s) = ω φ″(s).    (4.76)

The initial conditions, φ(0) = 0, φ′(T) = 0, obtained from (4.74) and (4.75), imply the solution φ(t) = A sin(t/√ω), with cos(T/√ω) = 0. Thus, the positive eigenvalues ωk satisfy

    T/√ωk = π/2 + kπ, k = 0, 1, 2, . . . .

The normalized eigenfunctions are

    φk(t) = √(2/T) sin( (k + 1/2) π t / T ),

with eigenvalues

    ωk = T² / (π² (k + 1/2)²).

With

    zk = (1/√ωk) ∫_0^T φk(t) w(t) dt = (π(k + 1/2)/T) ∫_0^T √(2/T) · sin( t π(k + 1/2)/T ) · w(t) dt,

we have that the Wiener process can be defined as the infinite (uniformly convergent in quadratic mean) sum

    w(t) = √(2/T) Σ_{k=0}^{∞} ( sin( π t (k + 1/2)/T ) / ( π(k + 1/2)/T ) ) zk,

with independent standard normal variables zk.

The reader should simulate a Wiener process w(t), find the variables zk for k = 0, 1, . . . , n < ∞ by numerical integration, and reproduce w(t) as a truncated sum.
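
The suggested simulation can also be run the other way around: generate independent standard normal zk directly and build Wiener paths from the truncated sum. The sketch below is an added illustration (T, the truncation level and the number of paths are arbitrary); it checks that the simulated variance at t = T is close to the exact value T.

    import numpy as np

    rng = np.random.default_rng(6)
    T, n_terms, n_grid = 1.0, 200, 501
    t = np.linspace(0, T, n_grid)

    k = np.arange(n_terms)
    lam = (T / (np.pi * (k + 0.5)))**2                 # eigenvalues omega_k
    phi = np.sqrt(2 / T) * np.sin(np.outer(t, (k + 0.5) * np.pi / T))   # eigenfunctions

    def kl_path():
        z = rng.standard_normal(n_terms)
        return phi @ (np.sqrt(lam) * z)                # truncated sum (4.73)

    paths = np.array([kl_path() for _ in range(2000)])
    print(paths[:, -1].var(), "should be close to", T)  # Var(w(T)) = T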


Proof: (of Theorem 4:13) We only indicate the steps in the proof, following the outline in [35]. One has to show the mathematical facts about existence and properties of eigenvalues and eigenfunctions, the convergence of the series (4.73), and finally the stochastic properties of the variables zk. This is done in a series of steps.

(i) If ∫_a^b r(s, t) φ(t) dt = ω φ(s), then ω is real and non-negative. This follows from

    0 ≤ ∫_a^b ∫_a^b r(s, t) φ(s) φ(t) ds dt = ω ∫_a^b |φ(s)|² ds.

(ii) There is at least one non-zero eigenvalue. The largest eigenvalue is

    ω0 = max_{φ; ‖φ‖=1} ∫_a^b ∫_a^b r(s, t) φ(s) φ(t) ds dt,

where the maximum is taken over ‖φ‖² = ∫_a^b |φ(t)|² dt = 1. As stated in [35], "this is not easily proved". The corresponding eigenfunction is denoted by φ0(t), and it is continuous.

(iii) The function r1(s, t) = r(s, t) − ω0 φ0(s) φ0(t) is a continuous covariance function, namely that of the process

    x1(t) = x(t) − φ0(t) ∫_a^b φ0(s) x(s) ds,

and it holds that

    ∫_a^b r1(s, t) φ0(t) dt = 0.    (4.77)

Repeating step (ii) with r1(s, t) instead of r(s, t) we get a new eigenvalue ω1 ≥ 0 with eigenfunction φ1(t). Since ∫_a^b r1(s, t) φ1(t) dt = ω1 φ1(s), we have

    ∫_a^b φ1(s) φ0(s) ds = (1/ω1) ∫_a^b φ0(s) { ∫_a^b r1(s, t) φ1(t) dt } ds
                         = (1/ω1) ∫_a^b φ1(t) { ∫_a^b r1(s, t) φ0(s) ds } dt = 0,

according to (4.77), since r1(s, t) is real. Thus φ0 and φ1 are orthogonal.

It also follows that φ1 is an eigenfunction of r(s, t):

    ∫_a^b r(s, t) φ1(t) dt = ∫_a^b r1(s, t) φ1(t) dt + ω0 φ0(s) ∫_a^b φ0(t) φ1(t) dt = ω1 φ1(s) + 0.

(iv) Repeat (ii) and (iii) as long as there is anything remaining of


    rn(s, t) = r(s, t) − Σ_{k=0}^{n} ωk φk(s) φk(t).

Then either there is a finite n such that rn ≡ 0, or there is an infinite decreasing sequence of positive eigenvalues ωk ↓ 0 with Σ_k ωk < ∞. Show as an exercise that Σ_k ωk ≤ ∫_a^b r(s, s) ds.

(v) If there is an infinite number of eigenvalues, then

    sup_{a≤s,t≤b} | r(s, t) − Σ_{k=0}^{n} ωk φk(s) φk(t) | → 0

as n → ∞, i.e.

    r(s, t) = Σ_{k=0}^{∞} ωk φk(s) φk(t),

with uniform convergence. (This is Mercer's theorem from 1909.)

(vi) For the representation (4.73) we have

    E| x(t) − Σ_{k=0}^{n} √ωk φk(t) zk |² = E(|x(t)|²) − Σ_{k=0}^{n} ωk |φk(t)|² = r(t, t) − Σ_{k=0}^{n} ωk |φk(t)|² → 0,

uniformly in a ≤ t ≤ b, according to (v), as n → ∞.

(vii) The properties of the zk follow from the orthogonality of the eigenfunctions.   □

Example 4:12 As a classical engineering problem about signal detection, we shall illustrate the use of the Karhunen-Loeve expansion by showing how one can test hypotheses about the mean value function of a Gaussian process x(t) with known covariance function r(s, t) = Cov(x(s), x(t)) but unknown mean value function m(t) = E(x(t)). Following Grenander [15], suppose we have observed x(t) for a ≤ t ≤ b, and that we have two alternative hypotheses about m(t),

    H0 : m(t) = m0(t),
    H1 : m(t) = m1(t),

and we want to ascertain which one is the more likely to be true.

We calculate the independent N(0, 1)-variables zk,

    x(t) = m(t) + Σ_{k=0}^{∞} √ωk φk(t) zk = m(t) + x̃(t),


where zk = (1/√ωk) ∫_a^b φk(t) x̃(t) dt. The zk are not observable, since they require x̃(t) = x(t) − m(t), and m(t) is unknown, but we can introduce observable independent Gaussian variables yk by the same procedure applied to x(t),

    yk = ∫_a^b φk(t) x(t) dt = ∫_a^b φk(t) m(t) dt + zk √ωk.

Writing

    ak = ∫_a^b φk(t) m(t) dt,
    aik = ∫_a^b φk(t) mi(t) dt, i = 0, 1,

we have that yk ∈ N(ak, √ωk), and the hypotheses are transformed into hypotheses about the ak:

    H0 : ak = a0k, k = 0, 1, . . .
    H1 : ak = a1k, k = 0, 1, . . .

Testing hypotheses about an infinite number of independent Gaussian variables is no more difficult than for finitely many. The likelihood ratio (LR) test can be used in any case. Let the parameters be T0 = (a00, a01, . . . , a0n, . . .) and T1 = (a10, a11, . . . , a1n, . . .), respectively. The LR-test based on y0, . . . , yn rejects H0 if the likelihood ratio pn(y) = f1(y)/f0(y) > c, where the constant c determines the significance level. In this case

    pn(y) = f_{y0,...,yn}(y0, . . . , yn; T1) / f_{y0,...,yn}(y0, . . . , yn; T0)
          = Π_{k=0}^{n} (1/√(2πωk)) e^{−(yk−a1k)²/(2ωk)} / Π_{k=0}^{n} (1/√(2πωk)) e^{−(yk−a0k)²/(2ωk)}
          = exp{ − Σ_{k=0}^{n} yk (a0k − a1k)/ωk + (1/2) Σ_{k=0}^{n} (a0k² − a1k²)/ωk }
          = exp{ − Σ_{k=0}^{n} Uk },

say. The LR-test thus rejects H0 if Σ_{k=0}^{n} Uk < cα.

We now let n → ∞ and examine the limit of the test quantity. If

    Σ_{k=0}^{∞} a0k²/ωk < ∞ and Σ_{k=0}^{∞} a1k²/ωk < ∞,

then the sums converge; in particular, Σ_{k=0}^{∞} Uk converges in quadratic mean to a normal random variable.

Since

    E(Uk) = +(1/2) (a1k − a0k)²/ωk if H0 is true,
    E(Uk) = −(1/2) (a1k − a0k)²/ωk if H1 is true,


and

    V(Uk) = (a1k − a0k)²/ωk

under both H0 and H1, we have that Σ_{k=0}^{∞} Uk is normal with mean

    E( Σ_{k=0}^{∞} Uk ) = m0 = Σ_{k=0}^{∞} (1/2)(a1k − a0k)²/ωk if H0 is true,
    E( Σ_{k=0}^{∞} Uk ) = m1 = −m0 if H1 is true,

and with variance

    V( Σ_{k=0}^{∞} Uk ) = m0 − m1 = 2 m0.

Thus, H0 is rejected if Σ_{k=0}^{∞} Uk < m0 − ωα √(2 m0), where ωα is the upper normal α-quantile.

If Σ_{k=0}^{∞} (a0k − a1k)²/ωk² < ∞, the test can be expressed in a simple way by the observation that

    Σ_{k=0}^{∞} Uk = ∫_a^b f(t) ( x(t) − (m0(t) + m1(t))/2 ) dt,

where

    f(t) = Σ_{k=0}^{∞} ((a0k − a1k)/ωk) φk(t).


Exercises

4:1. Let x(t) be a stationary Gaussian process with E(x(t)) = 0, covariance function rx(t) and spectral density fx(ω). Calculate the covariance function for the process

    y(t) = x²(t) − rx(0),

and show that it has the spectral density

    fy(ω) = 2 ∫_{-∞}^{∞} fx(μ) fx(ω − μ) dμ.

4:2. Derive the spectral density for u(t) = 2 x(t) x′(t) if x(t) is a differentiable stationary Gaussian process with spectral density fx(ω).

4:3. Let et, t = 0, ±1, ±2, . . . , be independent N(0, 1)-variables and define the stationary processes

    xt = θ x_{t−1} + et = Σ_{n=−∞}^{t} θ^{t−n} en,
    yt = et + ψ e_{t−1},

with |θ| < 1. Find the expressions for the spectral processes Zx(ω) and Zy(ω) in terms of the spectral process Ze(ω), and derive the cross spectrum between xt and yt. (Perhaps you should read Chapter 6 first.)

4:4. Let un and vn be two sequences of independent, identically distributed variables with zero mean, and let the stationary sequences xn and yn be defined by

    yn = a1 + b1 x_{n−1} + un,
    xn = a2 − b2 yn + vn.

Express the spectral processes dZx and dZy as functions of un and vn, and derive the spectral densities for xn and yn and their cross spectrum.

4:5. Use the limit

    lim_{T→∞} ∫_{−T}^{T} (sin ωt / t) dt = π for ω > 0, 0 for ω = 0, −π for ω < 0,

for the following alternative derivation of the spectral representation of a stationary process x(t) with spectral distribution function F(ω).

a) First show that the following integral and limit exist in quadratic mean:

    Z(ω) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−iωt} − 1)/(−it)) x(t) dt.


b) Then show that the process Z(ω), −∞ < ω < ∞, has orthogonal increments and that

    E|Z(ω2) − Z(ω1)|² = F(ω2) − F(ω1)

for ω1 < ω2.

c) Finally, show that the integral

    ∫_{-∞}^{∞} e^{iωt} dZ(ω) = lim Σ e^{iωk t} (Z(ω_{k+1}) − Z(ωk))

exists, and that E|x(t) − ∫ e^{iωt} dZ(ω)|² = 0.

4:6. Complete the proof of Lemma 4.1 on page 102.

4:7. Consider the covariance function

    ry(t) = (σ²/(4αω0²)) e^{−α|t|} ( cos ω̄0 t + (α/ω̄0) sin ω̄0 |t| )

of the linear oscillator in Example 4:10 on page 115.

The covariance function contains some |t|; show that the covariance function fulfils a condition for sample function differentiability, but not for twice differentiability.

Find the relation between the relative damping ζ and the spectral width parameter α = ω2/√(ω0 ω4).

4:8. Prove that sup_{s,t∈[a,b]} |Σ_{k=0}^{∞} ck(s) ck(t)| ≤ sup_t r(t, t) < ∞ in the expansions (4.71) and (4.72).

4:9. Prove that, in step (iv) of the proof of Theorem 4:13,

    Σ_k ωk = Σ_k ∫_a^b |ck(t)|² dt ≤ max_{a≤s≤b} r(s, s) · (b − a).


Chapter 5

Ergodic theory and mixing

The concept of ergodicity is one of the most fundamental in probability, since it links the mathematical theory of probability to what can be observed in a deterministic mechanical world. It also plays an important role in statistical physics, and in fact the term ergodic was coined by Ludwig Boltzmann in 1887 in his study of the time development of mechanical particle systems. The term itself stems from the Greek ergos = "work" and hodos = "path", possibly meaning that ergodic theory is about how the energy in a system evolves with time. Ergodicity in itself is not a probabilistic concept and it can be studied within a purely deterministic framework, but it is only in terms of probabilities that a sensible interpretation can be given to the basic ergodic results. The main result in this chapter is the Birkhoff ergodic theorem, which in 1931 settled the question of convergence of dynamical systems. For an account of the parallel development in statistical physics and probability theory, see the interesting historical work by von Plato [26]. The account in this chapter is based on [5] and [9]. More results on ergodic behavior of random and non-random sequences can be found in [12], and for general stochastic aspects of dynamical systems, see [21].

5.1 The basic Ergodic theorem in L²

We met our first ergodic theorem in Section 2.7, Theorem 2:17, stating covariance conditions under which the time average T^{−1} ∫_0^T x(t) dt tends, in quadratic mean and with probability one, to the expectation of the stationary process {x(t), t ∈ R}. In the special case when x(t) is stationary with covariance function r(t) = Cov(x(s), x(s+t)), the quadratic mean convergence becomes particularly simple. If E(x(t)) = 0, then

    (1/T) ∫_0^T r(t) dt → 0  implies  (1/T) ∫_0^T x(t) dt → 0 in quadratic mean,

as T → ∞. This was proven by elementary calculation in Theorem 2:17(a). Note that it follows from the statements in Theorem 4:7 in Section 4.3.3 that the relation is satisfied if and only if the spectral distribution function is continuous at the origin.

By means of the spectral representation x(t) = ∫ e^{iωt} dZ(ω), we can formulate a more precise theorem and see what happens if this sufficient condition does not hold. In Section 4.3.2 we stated an explicit expression (4.15) for the spectral process Z(ω). In fact, if ω1 and ω2 are continuity points of the spectral distribution function F(ω), then we have the parallel expressions

    F(ω2) − F(ω1) = (1/2π) lim_{T→∞} ∫_{−T}^{T} ((e^{−iω2t} − e^{−iω1t})/(−it)) r(t) dt,
    Z(ω2) − Z(ω1) = (1/2π) lim_{T→∞} ∫_{−T}^{T} ((e^{−iω2t} − e^{−iω1t})/(−it)) x(t) dt,

the latter convergence being in quadratic mean.

We repeat the statement from Theorem 4:7, that when the spectral distribution F is a step function, then

    lim_{T→∞} (1/T) ∫_0^T r(t) e^{−iωkt} dt = ΔFk,
    lim_{T→∞} (1/T) ∫_0^T |r(t)|² dt = Σ_k (ΔFk)²,
    lim_{T→∞} (1/T) ∫_0^T x(t) e^{−iωkt} dt = ΔZk.

5.2 Stationarity and transformations

5.2.1 Pseudo randomness and transformation of sample space

For strictly stationary processes one can obtain limit theorems of quite different character from those valid for processes which satisfy a covariance stationarity condition. These theorems also require much deeper conditions than simple covariance conditions. Remember, however, that a Gaussian (weakly) stationary process is also strictly stationary, so the general ergodicity properties of Gaussian processes can be inferred already from the covariance function. We start by giving all results for stationary sequences {xn; n ∈ Z}.

For a strictly stationary sequence the location of the origin is unessential for the stochastic properties, i.e. P((x1, x2, . . .) ∈ B) = P((x_{k+1}, x_{k+2}, . . .) ∈ B) for every Borel set B ∈ B∞; see Section 1.3.2. This also means that we can assume the sequence to be double ended, and to have started in the remote past.¹ From now on in this chapter, by a stationary process we mean a process that is strictly stationary.

How do we, or nature, construct stationary sequences? Obviously, first we need something (call it "a game") that can go on forever, and second, we need a game where the rules remain the same forever.

Example 5:1 (The irrational modulo game) A simple game that can go on forever, and that has almost all interesting properties of a stationary process, is the adding of an irrational number. Take a random x0 with uniform distribution between 0 and 1, and let θ be an irrational number. Define

    x_{k+1} = xk + θ mod 1,

i.e. x_{k+1} is the fractional part of xk + θ. It is easy to see that xk is a stationary sequence. We shall soon see why this game is more interesting with an irrational θ than with a rational one. If computers could handle irrational numbers, this type of pseudorandom number generator would be even more useful in Monte Carlo simulations than it is.
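
A two-line sketch of the game, added as an illustration (θ = √2 is chosen as the irrational number; as the text remarks, floating-point arithmetic makes θ effectively rational, so this is only an approximation of the ideal game):

    import numpy as np

    rng = np.random.default_rng(7)
    theta = np.sqrt(2.0)             # an "irrational" rotation number (in floating point)
    x = rng.uniform()                # x0 uniform on (0, 1)
    seq = []
    for _ in range(10):
        seq.append(x)
        x = (x + theta) % 1.0        # x_{k+1} = x_k + theta  (mod 1)
    print(seq)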

Example 5:2 One can define other stationary sequences by applying any time invariant rule to a stationary sequence xn, e.g. with 0 ≤ x0 ≤ 1, yn = 1 if x_{n+1} > xn² and yn = 0 otherwise. A more complicated rule is

    yn = xn + max_{k>0} e^{−k} x_{n+k}.

The well-known quadratic transformation,

    x_{k+1} = c xk (1 − xk),

is an example of an interesting transformation giving rise to a stochastic sequence when x0 is random; in Exercise 5:5 you are asked to find its stationary distribution.

5.2.2 Strict stationarity and measure preserving transformations

Deterministic explicit rules, like x_{k+1} = xk + θ mod 1, are just examples of transformations of a sample space which can produce stationary sequences for certain probability distributions. We need to define a general concept of a measure preserving transformation, which makes the resulting process stationary.

¹ More precisely, for every strictly stationary sequence {xn, n ∈ N} there exists a strictly stationary sequence {x̃n; n ∈ Z} such that x̃n, n ≥ 0, have the same finite-dimensional distributions as xn, n ≥ 0; prove this in Exercise 5:4.


Definition 5:1 a) Consider a probability space (Ω, F, P). A measurable transformation² T on (Ω, F, P) is called measure preserving if

    P(T^{−1}A) = P(A) for all A ∈ F.    (5.1)

b) Given a measurable space (Ω, F) and a measurable transformation T on (Ω, F), a probability measure P is called invariant if (5.1) holds.

In statistical physics or dynamical systems theory, a probability space is regarded as a model for the "universe", with the outcomes ω representing all its different "states", e.g. the location and velocity of all its particles. The measure P defines the probabilities for the universe to be in such and such a state that certain events occur. A transformation ω ↦ Tω is just a law of nature that changes the state of the universe; that a transformation is measure preserving means that the events occur with the same probability before and after the transformation.

5.2.2.1 From a measure preserving transformation to a stationary sequence

Every measure preserving transformation can generate a stationary sequence. To see how, take a random variable x(ω) on (Ω, F, P), i.e. a "measurement" on the state of the universe, for example its temperature. Then x(Tω) is the same measurement taken after the transformation. Define the random sequence

    x1(ω) = x(ω), x2(ω) = x(Tω), . . . , xn(ω) = x(T^{n−1}ω), . . .

Since T is measure preserving, this sequence is strictly stationary: with B ∈ B∞ and

    A = {ω; (x1(ω), x2(ω), . . .) ∈ B} = {ω; (x(ω), x(Tω), . . .) ∈ B},

we have P(A) = P((x1, x2, . . .) ∈ B). Further,

    T^{−1}A = {ω; Tω ∈ A} = {ω; (x(Tω), x(T²ω), . . .) ∈ B} = {ω; (x2(ω), x3(ω), . . .) ∈ B},

and thus P(T^{−1}A) = P((x2, x3, . . .) ∈ B). Since T is measure preserving, P(A) = P(T^{−1}A), and hence (x1, x2, . . .) and (x2, x3, . . .) have the same distribution, that is, xn is a stationary sequence.

² A measurable transformation T on a measurable space (Ω, F) is a function defined on Ω such that the inverse images under T of all sets in F are again in F; that is, T^{−1}A = {ω; Tω ∈ A} ∈ F for all A ∈ F.


5.2.2.2 From a stationary process to a measure preserving transfor-mation

We have just seen how to construct a stationary sequence from a measurepreserving transformation. Conversely, every stationary sequence generates ameasure preserving transformation on R∞ , the space of realizations for thestationary sequence.

Take the probability space (R∞,B∞, P ) with outcomes

ω = (x1, x2, . . .),

and define the shift transformation T by

Tω = (x2, x3, . . .).

As an example, take the set A = {ω; x1 < x2} to be the outcomes for which the first coordinate is smaller than the second one. Then T^{-1}A = {ω; Tω ∈ A} is the set {(x1, x2, . . .); (x2, x3, . . .) ∈ A} = {(x1, x2, . . .); x2 < x3}, that is, the second coordinate is smaller than the third. The transformation just shifts the event criterion one step to the right.

Take the coordinate process x(ω) = ω = (x1, x2, . . .), for which the nth variable in the sequence is equal to the nth coordinate in the outcome, and assume that P is such that x is a stationary sequence. Then the shift transformation T is measure preserving – the shifted sequence has the same distribution as the original one.

Remark 5:1 The property that a transformation is measure preserving of course depends on the probability measure P and not only on the transformation itself. In probability theory, where the probability measure is often given a priori, it is natural to put the requirement on the transformation.

In the mathematical study of dynamical systems, one often starts with a transformation T and seeks a measure under which T is measure preserving. Such a measure is called invariant. Thus, invariant measures in dynamical systems theory correspond to strictly stationary processes in probability theory.

5.3 The Ergodic theorem, transformation view

The classical ergodic theory for dynamical systems deals with what happens in the long run when one observes some characteristic of the system, i.e. when one takes the observation x(ω). The state of the system changes by the transformation T, and our interest is with the time average of the sequence of measurements x(ω), x(Tω), x(T²ω), . . . , taken for a fixed initial outcome ω,

(1/n) ∑_{k=1}^n x(T^{k−1}ω)


as n → ∞. Think of Ω as all the possible states our universe can be in, and, to be concrete, think of x(ω) as the temperature at one specific location. The universe changes states from day to day, and if ω is the state of the universe today, and Tω its state tomorrow, then (1/n) ∑_{k=1}^n x(T^{k−1}ω) is the average temperature observed over an n-day period.

In this section we shall take the transformation view on ergodic theory, and prove the ergodic theorem in that vein. In the next section we shall do exactly the same thing, but take a probabilistic, strictly stationary process view, and prove the ergodic theorem in terms of random variables and expectations.

5.3.1 Invariant sets and invariant random variables

To motivate the introduction of invariant sets and invariant random variables, we consider the possible limits of the average of a sequence of random variables,

Sn/n = (1/n) ∑_{k=1}^n xk.

If Sn/n converges, as n → ∞, what are the possible limits? Say, if Sn/n → y, a random variable, then obviously xn/n = Sn/n − S_{n−1}/n → y − y = 0, and³

y = lim (x2 + x3 + . . . + x_{n+1})/n + lim (x1 − x_{n+1})/n
  = lim (x2 + x3 + . . . + x_{n+1})/n − lim x_{n+1}/n.

We see that the limit of Sn/n is the same for the sequence x1, x2, . . . as it would be for the shifted sequence x2, x3, . . . .

Definition 5:2 Consider a probability space (Ω,F, P) and a measure preserving transformation T of Ω onto itself.

(a) A random variable x on (Ω,F, P) is called invariant under T if x(ω) = x(Tω), for almost every ω ∈ Ω.

(b) A set A ∈ F is called invariant under T if T−1A = A.

Example 5:3 The limit of Sn/n is invariant (when it exists) under the shift transformation (x1, x2, . . .) ↦ (x2, x3, . . .) of R∞. The random variables y = lim sup xn and lim sup Sn/n are always invariant.

³ Show as an exercise, if you have not done so already, that if xn is a stationary sequence with E(|xn|) < ∞, then P(xn/n → 0, as n → ∞) = 1; Exercise 5:6.


Example 5:4 Let xn be a Markov chain with transition matrix

P =
    [ 1/2  1/2   0    0  ]
    [ 1/3  2/3   0    0  ]
    [  0    0   2/3  1/3 ]
    [  0    0   1/2  1/2 ],

and starting distribution p(0) = (2p/5, 3p/5, 3q/5, 2q/5); the chain is then stationary for every choice of p, q ≥ 0 with p + q = 1. The sets A1 = {ω = (x1, x2, . . .); xk ∈ {1, 2} for all k} and A2 = {ω = (x1, x2, . . .); xk ∈ {3, 4} for all k} are both invariant under the shift transformation.

The proof of the following simple theorem is left to the reader.

Theorem 5:1 (a) The family of invariant sets,

J = {invariant sets A ∈ F },

is a σ -field.

(b) A random variable y is invariant if and only if it is measurable with respect to the family J of invariant sets.

5.3.2 Ergodicity

The fundamental property of ergodic systems ω ↦ Tω with a stationary (or invariant) distribution P, is that the T^k ω, with increasing k, visit every corner of the state space, exactly with the correct frequency as required by P. Another way of saying this is that the "histogram", counting the number of visits to any neighborhood of states, converges to a limiting "density", namely the density for the invariant distribution P over the state space. If we make a measurement x(ω) on the system, then the expected value E(x) is the "ensemble average" with respect to the measure P,

E(x) = ∫_{ω∈Ω} x(ω) dP(ω),

and – here is the ergodicity – this is exactly the limit of the ”time average”

(1/n) ∑_{k=1}^n x(T^{k−1}ω).

In the Markov Example 5:4, if p and q are not 0, there is no possibility for the process to visit every state the correct number of times, since either it starts in the invariant set A1 and then it always takes the values 1, 2, or it starts in A2 and then it stays there forever and takes only the values 3, 4. This is the key to the definition of ergodicity.
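A small simulation sketch of this failure of ergodicity, using the transition matrix and starting distribution of Example 5:4; the choice p = q = 1/2, the coding of the states as 0–3, and the sample sizes are illustrative assumptions:

import numpy as np

# Transition matrix and stationary starting distribution from Example 5:4,
# here with p = q = 1/2.
P = np.array([[1/2, 1/2, 0, 0],
              [1/3, 2/3, 0, 0],
              [0, 0, 2/3, 1/3],
              [0, 0, 1/2, 1/2]])
p, q = 0.5, 0.5
p0 = np.array([2*p/5, 3*p/5, 3*q/5, 2*q/5])

rng = np.random.default_rng(0)

def time_average_in_A1(n):
    # one realization of the chain; fraction of time spent in A1 = {1, 2}
    state = rng.choice(4, p=p0)
    count = 0
    for _ in range(n):
        count += state in (0, 1)          # states coded 0..3 for 1..4
        state = rng.choice(4, p=P[state])
    return count / n

# The time average of the indicator of {x_k in {1, 2}} is 0 or 1, depending
# on which invariant set the chain starts in -- it does not converge to the
# ensemble probability p = 0.5.
print([time_average_in_A1(10_000) for _ in range(5)])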


Definition 5:3 A measure preserving transformation T on (Ω,F, P) is called ergodic if every invariant set A ∈ J has either P(A) = 0 or P(A) = 1, that is, all invariant sets are trivial. The term "metrically transitive" is sometimes used instead of "ergodic".

Here is a nice example of a transformation that can be ergodic.

Example 5:5 The modulo game in Example 5:1, considered as a transformation on the unit interval, ([0, 1], B, ℓ), (ℓ is the Lebesgue measure, i.e. the uniform distribution),

Tx = x + θ mod 1,

is ergodic when θ is irrational, and non-ergodic for rational θ.

For θ = m/n one has T^n x = x, and every set of the form A ∪ {A + 1/n} ∪ {A + 2/n} ∪ . . . ∪ {A + (n − 1)/n} is invariant but can have probability in (0, 1). To see what happens when θ is irrational we need some further results for ergodic transformations.

Theorem 5:2 Let T be a measure preserving transformation of (Ω,F, P). Then T is ergodic if and only if every invariant random variable x is a.s. a constant. It is sufficient that every bounded invariant random variable is a.s. constant.

Proof: First assume that every bounded invariant variable is constant, and take an invariant set A. Its indicator function χA(ω), equal to 1 if ω ∈ A and 0 otherwise, is then an invariant random variable, and by assumption it is a.s. constant. This means that the sets where it is 0 and 1, respectively, have probability either 0 or 1, i.e. P(A) = 0 or 1. Hence T is ergodic.

Conversely, take an ergodic T, and consider an arbitrary invariant random variable x. We shall show that x is a.s. constant. Define, for real x0,

Ax0 = {ω;x(ω) ≤ x0}.

Then T^{-1}Ax0 = {ω; Tω ∈ Ax0} = {ω; x(Tω) ≤ x0} = Ax0, since x is invariant. But then, by ergodicity, P(Ax0) = 0 or 1, depending on x0, and it is easy to see that then there is an x0 such that P(x = x0) = 1, hence x is constant.

Example 5:6 (Irrational modulo game) We can now show that Tx = x + λ mod 1 is ergodic if λ is irrational. Take any Borel-measurable function f(x) on [0, 1) with ∫_0^1 f²(x) dx < ∞. It can be expanded in a Fourier series

f(x) = ∑_{k=−∞}^{∞} ck e^{2πikx},


with convergence in quadratic mean for almost all x, and ∑ |ck|² < ∞. Then y = f(ω) is a random variable. We assume it is invariant, and prove that it is a.s. constant. But f invariant means that

f(x) = f(Tx) = ∑_{k=−∞}^{∞} ck e^{2πikx} · e^{2πikλ},

which implies ck(1 − e^{2πikλ}) = 0 for all k. But e^{2πikλ} ≠ 1 for k ≠ 0 when λ is irrational, and hence ck = 0 for all k ≠ 0, which means that f(x) = c0, constant. By Theorem 5:2 we conclude that T is ergodic.

The study of ergodic transformations is an important part of dynamical systems and chaos theory; see [21].

5.3.3 The Birkhoff Ergodic theorem

In the introductory remarks, Section 5.3.1, we noted that any limit of the time average

(1/n) ∑_{k=1}^n x(T^{k−1}ω)

is invariant. The limit could be a constant, and it could be a random variable, but it needs to be invariant. If it is constant, we need to find the value of the constant, and if it is a random variable, we want to find out as much as possible about its distribution.

In fact, for measure preserving transformations, the limit always exists, and is equal to the conditional expectation of x given the invariant sets. Since we have not dealt with this concept previously in this course, we state the basic properties of the conditional expectation by giving its definition.

Definition 5:4 If y is a random variable on (Ω,F, P) with E(|y|) < ∞, and A a sub-σ-field of F, then by the conditional expectation of y given A, E(y | A), is meant any A-measurable random variable u that satisfies

∫_{ω∈A} y(ω) dP(ω) = ∫_{ω∈A} u(ω) dP(ω),

for all A ∈ A. Note that the value of E(y | A) = u is defined only almost surely, and that any A-measurable variable which has the same integral as y when integrated over A-sets works equally well as the conditional expectation.

In particular, if A only contains sets which have probability 0 or 1, E(y | A) is a.s. constant and equal to E(y).

In particular, we consider the conditional expectation E(x | J ), given the σ-field J of sets which are invariant under a measure preserving transformation.


5.3.3.1 Time averages always converge

Theorem 5:3 (Birkhoff ergodicity theorem) Let T be a measure preserving transformation on (Ω,F, P). Then for any random variable x with E(|x|) < ∞,

lim_{n→∞} (1/n) ∑_{k=0}^{n−1} x(T^k ω) = E(x | J ), a.s.

The proof is based on the following lemma, shown by Adriano M. Garsia in 1965.

Lemma 5.1 Let T be a measure preserving transformation and x a random variable with E(|x|) < ∞. Define

Sk(ω) = x(ω) + . . . + x(T k−1ω), and Mn(ω) = max(0, S1, S2, . . . , Sn).

Then

∫_{ω; Mn>0} x(ω) dP(ω) ≥ 0.

Proof of lemma: Consider S′k = x(Tω) + . . . + x(T^k ω) = Sk − x(ω) + x(T^k ω), and Mn(Tω) = M′n = max(0, S′1, . . . , S′n). For k = 1, . . . , n, M′n ≥ S′k, so

x + M′n ≥ x + S′k = S_{k+1},

and (for k = 0) x + M′n ≥ S1 (= x). Hence

x ≥ Sk − M′n, for k = 1, 2, . . . , n + 1,

which implies

x ≥ max(S1, . . . , Sn) − M′n.

Thus (with M′n = Mn(Tω)),

∫_{Mn>0} x(ω) dP(ω) ≥ ∫_{Mn>0} {max(S1(ω), . . . , Sn(ω)) − Mn(Tω)} dP(ω).

But on the set {ω; Mn(ω) > 0}, one has that Mn = max(S1, . . . , Sn), and thus

∫_{Mn>0} x(ω) dP(ω) ≥ ∫_{Mn>0} {Mn(ω) − Mn(Tω)} dP(ω) ≥ ∫ {Mn(ω) − Mn(Tω)} dP(ω) = 0,

since increasing the integration area does not change the integral of Mn(ω) while it can only make the integral of Mn(Tω) larger. Further, T is measure


preserving, i.e. shifting the variables one step does not change the distribution, nor the expectation. □

Proof of theorem: We first assume that E(x | J ) = 0 and prove that the average converges to 0, a.s. For the general case, consider x − E(x | J ) and use that

E(x | J )(Tω) = E(x | J )(ω),

since E(x | J ) is invariant by Theorem 5:1(b), page 139.

We show that x̄ = lim sup Sn/n ≤ 0 and, similarly, x̲ = lim inf Sn/n ≥ 0, giving lim Sn/n = 0. Take an ε > 0 and denote D = {ω; x̄ > ε}: we shall show that P(D) = 0. Since, from Example 5:3, x̄ is an invariant random variable, also the event D is invariant. Define a new random variable,

x∗(ω) = x(ω) − ε if ω ∈ D, and x∗(ω) = 0 otherwise,

and set S∗n(ω) = ∑_{k=1}^n x∗(T^{k−1}ω), with M∗n defined from S∗k. From Lemma 5.1 we know

∫_{M∗n>0} x∗(ω) dP(ω) ≥ 0.    (5.2)

We now only have to replace this inequality by an inequality for a similar integral over the set D to be finished. The sets

Fn = {M∗n > 0} = {max_{1≤k≤n} S∗k > 0}

increase towards the set

F = {sup_{k≥1} S∗k > 0} = {sup_{k≥1} S∗k/k > 0} = {sup_{k≥1} Sk/k > ε} ∩ D.

But since sup_{k≥1} Sk/k ≥ lim sup Sk/k = x̄, we have that F = D. In order to take the limit of (5.2) we must be sure the expectations are finite, i.e. E(|x∗|) ≤ E(|x|) + ε, and bounded convergence gives

0 ≤ ∫_{M∗n>0} x∗(ω) dP(ω) → ∫_D x∗(ω) dP(ω).    (5.3)

Here the right hand side is

∫_D x∗(ω) dP(ω) = ∫_D x(ω) dP(ω) − εP(D) = ∫_D E(x | J ) dP − εP(D) = −εP(D),

since ∫_D E(x | J ) dP = 0, by assumption. Together with (5.3) this implies P(D) = 0, and hence x̄ ≤ ε. But ε > 0 was arbitrary, so x̄ ≤ 0, a.s. The same chain of arguments leads to x̲ ≥ 0, and hence lim sup Sn/n ≤ 0 ≤ lim inf Sn/n. The limit therefore exists and is 0, which was to be shown. □


5.3.3.2 Time averages of ergodic transformations go to the mean

It is now a simple corollary that time averages for ergodic transformations converge to the expected value.

Corollary 5.1 If T is a measure preserving ergodic transformation on the probability space (Ω,F, P), then for any random variable x with E(|x|) < ∞,

lim_{n→∞} (1/n) ∑_{k=0}^{n−1} x(T^k ω) = E(x), a.s.

Proof: When T is ergodic, every invariant set has probability 0 or 1, and therefore the conditional expectation is constant, E(x | J ) = E(x), a.s. □

Remark 5:2 If x is non-negative with E(x) = ∞, then Sn/n → ∞ if T is ergodic. Show this as an exercise; Exercise 5:10.

5.3.3.3 Ergodic non-random walks

One can regard a measure preserving transformation T as a non-random walk ω, Tω, T²ω, . . . , over the sample space. In the beginning of this section we interpreted the ergodic statements as convergence results for the number of visits to any neighborhood of a fixed outcome. This can now be made precise. Take a set A ∈ F and consider its indicator function χA(ω). The ergodic theorem says that, if T is ergodic,

(1/n) ∑_{k=0}^{n−1} χA(T^k ω) → P(A), a.s.

Example 5:7 In the irrational modulo game, Tx = x + θ mod 1, the relative number of points falling in an interval [a, b) converges to the length of the interval if θ is irrational. Thus the points become asymptotically equidistributed over [0, 1). This is the weak form of Weyl's equidistribution theorem.
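A quick numerical sketch of this equidistribution (the step θ = √2 − 1, the interval [0.2, 0.5), and the orbit length are arbitrary illustrative choices):

import numpy as np

theta = np.sqrt(2) - 1          # irrational rotation step
a, b = 0.2, 0.5                 # an arbitrary interval [a, b)
n = 200_000

# orbit x, Tx, T^2 x, ... of the rotation, started at x = 0
orbit = (np.arange(n) * theta) % 1.0

# relative number of visits to [a, b) should approach b - a = 0.3
print(np.mean((orbit >= a) & (orbit < b)))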

5.4 The Ergodic theorem, process view

It is easy to formulate the convergence theorem for transformations in terms of time averages of stationary sequences. First we need to define invariant events and ergodicity in (Ω,F, P).

Definition 5:5 (Reformulation of Definition 5:2) Let {xn} be a stationary sequence. An event A ∈ F is called invariant for {xn}, if there exists a B ∈ B∞ such that for any n ≥ 1,

A = {(xn, xn+1, . . .) ∈ B}.


The sequence is called ergodic if every invariant set has probability 0 or 1.

A random variable z is called invariant for {xn} if it is a function of x1, x2, . . . and remains unchanged under the shift transformation, i.e. if there exists a random variable φ on (R∞,B∞) such that z = φ(xn, x_{n+1}, . . .) for all n ≥ 1.

From the correspondence between transformations of the sample space and stationary sequences, it is easy to formulate an ergodicity theorem for a stationary sequence {xn, n ∈ Z}.

Theorem 5:4 (a) If {xn, n ∈ Z} is a stationary sequence with E(|x1|) < ∞, and J denotes the σ-field of invariant sets,

(1/n) ∑_{k=1}^n xk → E(x1 | J ), a.s.

(b) If {xn, n ∈ Z} is stationary and ergodic, then

(1/n) ∑_{k=1}^n xk → E(x1), a.s.

(c) If {xn, n ∈ Z} is stationary and ergodic, and φ(x1, x2, . . .) is measurable on (R∞,B∞), then the process

yn = φ(xn, xn+1, . . .)

is stationary and ergodic. The special case where φ(xn, x_{n+1}, . . .) = φ(xn) depends only on xn should be noted in particular.

Proof: It is easy to reformulate the results of Theorem 5:3 to yield the theorem. However, we shall give the direct proof once more, but use the process formulation, in order to get a slightly better understanding in probabilistic terms. We give a parallel proof of part (a), and leave the rest to the reader.

(a) First the counterpart of Lemma 5.1: From the sequence {xn, n ∈ Z}, define Sk = ∑_{j=1}^k xj and Mn = max(0, S1, S2, . . . , Sn). We prove that

E(x1 | Mn > 0) ≥ 0.    (5.4)

To this end, write S′k = ∑_{j=2}^{k+1} xj = Sk − x1 + x_{k+1}, and define the corresponding maximum, M′n = max(0, S′1, S′2, . . . , S′n), and note that, for k = 1, . . . , n,

x1 + M′n ≥ x1 + S′k = S_{k+1}.

Since M′n ≥ 0 we also have x1 + M′n ≥ S1 (= x1), so

x1 ≥ max(S1, . . . , Sn) − M′n.


Now, when Mn > 0 we can replace max(S1, . . . , Sn) by Mn, so taking the conditional expectation, given Mn > 0, we get

E(x1 | Mn > 0) ≥ E(max(S1, . . . , Sn) − M′n | Mn > 0) = E(Mn − M′n | Mn > 0).

Further, since M′n ≥ 0, one easily argues that

E(Mn − M′n | Mn > 0) ≥ E(Mn − M′n)/P(Mn > 0),

which is 0, since Mn and M′n have the same expectation. This proves (5.4).

We continue with the rest of the proof of part (a). Suppose E(x1 | J ) = 0 and consider the invariant random variable x̄ = lim sup Sn/n. Take an ε > 0 and introduce the invariant event D = {x̄ > ε}. Then, by the assumption,

E(x1 | x̄ > ε) = 0,    (5.5)

a fact which is basic in the proof. We intend to prove that P(D) = 0 for every ε > 0, thereby showing that x̄ ≤ 0.

Similarly, x̲ = lim inf Sn/n can be shown to be non-negative, and hence x̲ = x̄ = 0.

However, before we prove that P(D) = P(x̄ > ε) = 0, we need to discuss the meaning of (5.5). A conditional expectation is defined as a random variable, measurable with respect to the conditioning σ-field, in this case J. In (5.5) we conditioned on one of the events D ∈ J and that is fine if P(D) > 0, but if P(D) = 0, the claim (5.5) makes no sense. The conditional expectation given an event of probability 0 can be given any value we like, since the only requirement on the conditional expectation is that it should give a correct value when integrated over a J-event. If that event has probability 0 the integral is 0 regardless of how the expectation is defined. We return to this at the end of the proof.

From (x1, x2, . . .), define a new sequence of variables

x∗k = xk − ε if x̄ > ε, and x∗k = 0 otherwise,

and define S∗k = ∑_{j=1}^k x∗j and M∗n = max(0, S∗1, S∗2, . . . , S∗n), in analogy with Sk and Mn. The sequence {x∗k} is stationary, since x̄ is an invariant random variable, so we can apply (5.4) to get

E(x∗1 | M∗n > 0) ≥ 0.    (5.6)

On the other hand, from the definition of x∗k we have

E(x∗1 | x̄ > ε) = E(x1 | x̄ > ε) − ε = −ε < 0,    (5.7)


since E(x1 | x̄ > ε) = 0 by the assumption. These two inequalities go in opposite directions, and in fact they will turn out to be in conflict, unless P(D) = 0, proving the assertion.

So we would like to have a relation between the events D = {x̄ > ε} and Fn = {M∗n > 0} = {max_{1≤k≤n} S∗k > 0}, and we see that as n increases, Fn increases to

F = {sup_{k≥1} S∗k > 0} = {sup_{k≥1} S∗k/k > 0} = {sup_{k≥1} Sk/k > ε} ∩ {x̄ > ε}.

But x̄ = lim sup Sk/k ≤ sup_{k≥1} Sk/k, so the right hand side is just D = {x̄ > ε}; Fn ↑ D. This implies (here E(|x∗1|) ≤ E(|x1|) + ε < ∞ is needed),

0 ≤ lim_{n→∞} E(x∗1 | Fn) = E(x∗1 | D) = −ε < 0,

which obviously is impossible.

Where did it go wrong, and where is the contradiction? By definition, for the outcomes where x̄ > ε, the variables x∗k ARE equal to xk − ε, and S∗k/k = Sk/k − ε. But Fn ↑ D does not imply lim_{n→∞} E(x∗1 | Fn) = E(x∗1 | D), since these expressions need not be well defined. If P(D) > 0 our reasoning makes sense and leads to a contradiction; if P(D) = 0 we have argued with undefined quantities in (5.6) and (5.7). The reader who wants to be on the safest possible grounds should return to the formulation and proof of Theorem 5:3. □

5.4.0.4 Ergodic stationary processes

For continuous time processes {x(t), t ∈ R}, one defines the shift transformation Uτ by

(Uτx)(t) = x(t + τ).

If x(t) is stationary, Uτ is measure preserving. For a set of functions B, the shifted set UτB is the set of functions Uτx for x ∈ B. A Borel set B ∈ B_R is called a.s. invariant if B and UτB differ by, at most, sets of P-measure 0. Let J denote the σ-field of invariant sets. The process {x(t), t ∈ R} is called ergodic if all invariant sets have probability 0 or 1.

Theorem 5:5 (a) For any stationary process {x(t), t ∈ R} with E(|x(t)|) < ∞ and integrable sample paths, as T → ∞,

(1/T) ∫_0^T x(t) dt → E(x(0) | J ), a.s.

(b) If further {x(t), t ∈ R} is ergodic, then

(1/T) ∫_0^T x(t) dt → E(x(0)), a.s.


Proof: One first has to show that xn = ∫_{n−1}^n x(t) dt is a stationary sequence, and use the ergodic theorem to get convergence for integer n,

(1/n) ∫_0^n x(t) dt → E(∫_0^1 x(t) dt | J ), a.s.,

as n → ∞. By invariance and stationarity the last expectation is equal to E(x(0) | J ). Finally, with n = [T],

(1/T) ∫_0^T x(t) dt = (n/T) · (1/n) ∫_0^n x(t) dt + (1/T) ∫_n^T x(t) dt.

The first term has the same limit as (1/n) ∫_0^n x(t) dt, while the second term is bounded by

(1/T) ∫_n^{n+1} |x(t)| dt.

But also |x(t)| is a stationary process, to which we can apply Theorem 5:4(a), getting convergence, and hence we can conclude that the last term tends to 0. Thus we obtain the desired limit. □
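A numerical sketch of such a continuous-time average, here for a stationary Gaussian process with exponentially decaying covariance simulated on a fine grid through its exact Gaussian AR(1) transition; the parameter values and the Riemann-sum approximation of the integral are illustrative assumptions:

import numpy as np

# Stationary Gaussian process with r(t) = exp(-alpha*|t|), sampled on a grid;
# between grid points it obeys x(t+dt) = rho*x(t) + Gaussian noise.
alpha, dt, T = 1.0, 0.01, 2000.0
rho = np.exp(-alpha * dt)
rng = np.random.default_rng(2)

n = int(T / dt)
x = np.empty(n)
x[0] = rng.standard_normal()
noise = rng.standard_normal(n) * np.sqrt(1 - rho**2)
for k in range(1, n):
    x[k] = rho * x[k - 1] + noise[k]

# (1/T) * integral_0^T x(t) dt, approximated by the Riemann sum dt*sum(x)/T,
# i.e. simply the mean of the grid values
print(x.mean())        # close to E(x(0)) = 0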

5.5 Ergodic Gaussian sequences and processes

We shall now give simple conditions for ergodicity for Gaussian stationary processes, characterized by their covariance function r(t) in continuous or discrete time.

Theorem 5:6 Let {x(t), t ∈ R} be stationary and Gaussian with E(x(t)) = 0 and V(x(t)) = 1, and let its covariance function be r(t). Then x(t) is ergodic if and only if its spectral distribution function F(ω) is continuous everywhere. If the spectral distribution has a density f(ω), F(ω) = ∫_{−∞}^{ω} f(λ) dλ, then F(ω) is obviously continuous, but that is by no means necessary. It suffices that F(ω) is a continuous function.

Proof of "only if" part: If x(t) is ergodic, so is x²(t), and therefore the time average of x²(t) tends to E(x²(0)) = 1,

S_T/T = (1/T) ∫_0^T x²(t) dt → 1, a.s.,

as T → ∞. It is a property of the Gaussian distribution that E((S_T/T)⁴) ≤ K is bounded for large T, and therefore the almost surely convergent S_T/T also converges in quadratic mean, i.e. E((S_T/T − 1)²) → 0. But this expectation can be calculated. Since, for a standard Gaussian process, E(x(s)² x(t)²) =


1 + 2r(t − s)², one gets

E((S_T/T − 1)²) = (1/T²) E(∫_0^T ∫_0^T x(s)²x(t)² ds dt) − 1
               = (2/T²) ∫_0^T ∫_0^T r²(t − s) ds dt
               = (4/T²) ∫_0^T t · {(1/t) ∫_0^t r²(s) ds} dt.    (5.8)

But according to Theorem 4:7, page 94, relation (4.20), (1/t) ∫_0^t r²(s) ds tends to the sum of squares of all jumps of the spectral distribution function F(ω), ∑(ΔFk)². Hence, if this sum is strictly positive, the right hand side in (5.8) has a positive limit, which contradicts what we proved above, and we have concluded the "only if" part of the theorem. □

The "if" part is more difficult, and we can prove it here only under the additional condition that the process has a spectral density, i.e. the spectral distribution is F(ω) = ∫_{−∞}^{ω} f(x) dx, because then

r(t) = ∫ e^{iωt} f(ω) dω → 0,

as t → ∞, by the Riemann-Lebesgue lemma. (The full statement and proof can be found in [24, 15].) So what we have to show is that if r(t) → 0, then x(t) is ergodic. Since this is worth remembering, we formulate it as a lemma.

Lemma 5.2 A Gaussian stationary process is ergodic if its covariance function r(t) → 0 as t → ∞.

Proof of lemma, and the "if" part of theorem: We show that if r(t) → 0, then every invariant set has probability 0 or 1. Let S be an a.s. invariant set for the x(t)-process, i.e. the translated event Sτ differs from S by an event of probability zero. But every event in F can be approximated arbitrarily well by a finite-dimensional event, B, depending only on x(t) for a finite number of time points tk, k = 1, . . . , n; cf. Section 1.3.3. From stationarity, also Sτ can be approximated by the translated event Bτ = UτB, with the same error, and combining S with Sτ can at most double the error. Thus, we have

|P (S) − P (B)| < ε,

|P (S ∩ Sτ ) − P (B ∩ Bτ )| < 2ε.

Here P(S ∩ Sτ) = P(S) since S is invariant, so P(S) can be approximated arbitrarily well by both P(B) and by P(B ∩ Bτ).

But B depends on x(ti), i = 1, . . . , n, while Bτ is defined from x(τ + tj), j = 1, . . . , n, and these variables are multivariate normal with covariances

Cov(x(ti), x(tj + τ)) = r(τ + tj − ti) → 0


as τ → ∞. Thus these two groups of random variables become asymptotically independent,

P (B ∩ Bτ ) − P (B) · P (Bτ ) → 0,

and by stationarity, P(B ∩ Bτ) → P(B)². Thus both P(B) and P(B)² approximate P(S), and we conclude that P(S) = P(S)². This is possible only if P(S) is either 0 or 1, i.e. x(t) is ergodic. □
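A simulation sketch contrasting the two cases of Theorem 5:6; the particular choices (a single spectral atom at frequency 1 for the non-ergodic case, an AR(1)-type sequence with correlation decaying to 0 for the ergodic case, and the sample sizes) are illustrative, not taken from the text:

import numpy as np

rng = np.random.default_rng(3)
n = 5000
t = np.arange(n)

def mean_of_square(x):
    # time average of x(t)^2, which should tend to E(x^2) = 1 when ergodic
    return np.mean(x**2)

# Case 1: discrete spectrum (one spectral atom):
# x(t) = U cos t + V sin t is stationary Gaussian with r(t) = cos t.
for _ in range(3):
    U, V = rng.standard_normal(2)
    x = U * np.cos(t) + V * np.sin(t)
    print("atom   :", mean_of_square(x))    # random limit (U^2 + V^2)/2

# Case 2: continuous spectrum, r(t) -> 0 (a stationary Gaussian AR(1)):
rho = 0.9
for _ in range(3):
    e = rng.standard_normal(n) * np.sqrt(1 - rho**2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for k in range(1, n):
        x[k] = rho * x[k - 1] + e[k]
    print("mixing :", mean_of_square(x))    # close to 1 every time

In the first case the time average settles at a different random value in each realization; in the second it returns to the ensemble value 1, as the theorem predicts.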

5.6 Mixing and asymptotic independence

How much of the future development of a stochastic process is pre-determined from what has already happened? How much information about the future is there in a piece of observation of a stochastic process, and how closely dependent are disjoint segments of the process?

In this section we will briefly mention some criteria on a stationary process that guarantee perfect predictability and unpredictability, respectively. A differentiable process can be predicted locally by means of a Taylor expansion. Linear prediction theory gives the Cramer-Wold decomposition (Theorem 5:7) into a singular component that can be predicted linearly without error and a regular component for which the predictable part tends to zero with increasing prediction horizon. The spectral representation in Chapter 4 relates the spectrum to the number of harmonic components which are needed to build a stationary process. The ergodic theorem in Chapter 5 touched upon the problem of asymptotic independence; for a Gaussian process with (absolutely) continuous spectrum and asymptotically vanishing covariance function, values far apart are asymptotically independent and a "law of large numbers" holds. In this section we shall try to relate these scattered results to each other. Some comments are for Gaussian processes only, while others are of general nature. A new concept, mixing, will also be introduced.

5.6.1 Singularity and regularity

When predicting from x(s), s ≤ t, a question of both practical and theoretical (or perhaps philosophical) interest is from where the information about x(t + h) originates, and how much new information is added to H(x, t) = S(x(s); s ≤ t) with increasing t.

When t → −∞ , obviously

H(x, t) ↓ H(x,−∞) = ∩t≤t0H(x, t),

and

H(x,−∞) ⊆ H(x, t) ⊆ H(x).

The subspace H(x, t) is the space of random variables that can be obtained as limits of linear combinations of variables x(tk) with tk ≤ t, and H(x,−∞) is


what can be obtained from old variables, regardless of how old they may be. It can be called the infinitely remote past, or the primordial randomness.⁴

Two extremes may occur, depending on the size of H(x,−∞):

• if H(x,−∞) = H(x), then {x(t), t ∈ R} is purely deterministic, or singular,

• if H(x,−∞) = 0, then {x(t), t ∈ R} is purely non-deterministic, or regular.

A process is deterministic if all information about the future that can be obtained from the past at time t, x(s), s ≤ t, can be obtained already from x(s), s ≤ τ < t, arbitrarily far back. An example of this is the band-limited white noise Gaussian process, which we studied in Chapter 4. Such a process is infinitely differentiable – this follows from the general rules for differentiability – and the entire sample functions can be reconstructed from the values in an arbitrarily small interval located anywhere on the time axis. A summary of facts pertaining to reconstruction and prediction is given later in these notes; see Section 5.6 on general results on asymptotic independence and its opposite, complete dependence.

The following theorem was proved by Cramer (1962) in the general parameter case; the discrete stationary case was given by Wold (1954).

Theorem 5:7 (The Cramer-Wold decomposition) Every stochastic process

{x(t), t ∈ R}

with E(|x(t)|2) < ∞, is the sum of two uncorrelated processes,

x(t) = y(t) + z(t),

where {y(t), t ∈ R} is regular (purely non-deterministic) and {z(t), t ∈ R} is singular (deterministic).

Proof: Construct H(x, t) and H(x,−∞) = limt↓−∞ H(x, t), and define

z(t) = P−∞(x(t)) = the projection of x(t) on H(x,−∞),

y(t) = x(t) − z(t).

To prove the theorem, we have to show that

1. H(z,−∞) = H(z, t), and z(t) is deterministic,

2. H(y,−∞) = 0 , and y(t) is non-deterministic,

3. H(y) ⊥ H(z), and y(s) and z(t) are uncorrelated.

⁴ Nowadays called "the primordial soup".


Number (3) follows from the projection properties: the residual y(s) = x(s) − P−∞(x(s)) is uncorrelated with every element in H(x,−∞), i.e. y(s) ⊥ H(x,−∞), and z(t) ∈ H(x,−∞).

Further, H(y, t) ⊆ H(x, t) and H(y, t) ⊥ H(x,−∞). Therefore H(y,−∞) is equal to 0, because if y is an element of H(y,−∞) then both y ∈ H(y, t) ⊂ H(x, t) for all t, i.e. y ∈ H(x,−∞), and at the same time y ⊥ H(x,−∞). The only element that is both in H(x,−∞) and is orthogonal to H(x,−∞) is the zero element, showing (2).

Finally, H(z, t) = H(x,−∞) for every t . To see this, note that

H(x, t) ⊆ H(y, t) ⊕H(z, t)

for all t , and therefore also

H(x,−∞) ⊆ H(y,−∞) ⊕H(z, t).

Since H(y,−∞) = 0 ,

H(x,−∞) ⊆ H(z, t) ⊆ H(x,−∞).

Thus H(x,−∞) = H(z, t), and (1) is proved. □

As the band-limited white noise example shows, there are natural deterministic processes. Other common process models are regular. Examples of processes combining the two properties seem to be rather artificial.

Example 5:8 An AR(1)-process with an added component,

x(t) = ax(t − 1) + e + e(t),

with uncorrelated e and e(t)-variables, can be decomposed into

y(t) = ∑_{k=0}^{∞} a^k e(t − k),

which is regular, and

z(t) = e/(1 − a),

which is singular. The common ARMA-process is regular.
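A small simulation sketch of this decomposition (the value a = 0.8, unit-variance Gaussian variables, and the sample size are illustrative choices):

import numpy as np

rng = np.random.default_rng(4)
a, n = 0.8, 5000

e_const = rng.standard_normal()          # the single "primordial" variable e
e_t = rng.standard_normal(n)             # the innovations e(t)

# x(t) = a x(t-1) + e + e(t), started at its stationary singular level
x = np.empty(n)
x[0] = e_const / (1 - a) + e_t[0]
for t in range(1, n):
    x[t] = a * x[t - 1] + e_const + e_t[t]

# Singular part z(t) = e/(1-a): known from the arbitrarily remote past.
# Regular part  y(t) = x(t) - z(t): a one-sided moving average of the e(t).
z = e_const / (1 - a)
y = x - z
print(z)                        # constant in t
print(np.mean(y), np.std(y))    # mean near 0; the purely non-deterministic part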

5.6.2 Asymptotic independence, regularity and singularity

As we saw in the proof of Lemma 5.2, if r(t) → 0 as t → ∞, then in a Gaussian process, finitely many variables taken sufficiently far apart are almost independent. But this definitely does not mean that the process observed in an entire interval, x(s), s ∈ I, is asymptotically independent of x(s + t), s ∈ I. These two


segments can be completely dependent on each other in a deterministic way. For example, the realizations can be infinitely differentiable and part of an analytic function that can be reconstructed from its derivatives in an arbitrarily small interval. It was shown by Belyaev [3] that if the covariance function of a separable process is an entire function, i.e. analytic in the entire complex plane, then the sample functions are a.s. also entire functions, which can be expressed as a convergent power series

x(t) = ∑_{k=0}^{∞} x^{(k)}(0) t^k/k!.

Examples of such covariance functions are r(t) = e^{−t²/2} and r(t) = (sin t)/t. Yaglom [38, Ch. 8] contains a readable account of prediction in this case; [18, 11] give more mathematical details.

5.6.2.1 Regularity, singularity and the spectrum

The Cramer-Wold decomposition deals with the problem of predicting future values by means of linear combinations of past observations. A singular (or purely deterministic) process can be perfectly predicted linearly from old values. In a regular process the predictable part tends to zero with increasing prediction horizon. Simple conditions for singularity/regularity can be formulated in terms of the spectral distribution function F(ω).

Let f(ω) = dF(ω)/dω be the derivative of the bounded and non-decreasing function F(ω). For almost all ω this derivative exists and is non-negative, and its integral is bounded, ∫_{−∞}^{∞} f(ω) dω ≤ F(∞) − F(−∞). Write

F^{(ac)}(ω) = ∫_{−∞}^{ω} f(x) dx ≤ F(ω).

The spectrum is called absolutely continuous with spectral density f(ω) if

F (ω) = F (ac)(ω).

In general, F^{(ac)}(ω) need not be equal to F(ω). In particular, this is of course the case when the spectrum has jumps ΔFk at frequencies ωk. Write

F^{(d)}(ω) = ∑_{ωk≤ω} ΔFk,

so the spectrum is discrete if F(ω) = F^{(d)}(ω). The part of the spectrum which is neither absolutely continuous nor discrete is called the singular part:

F (s)(ω) = F (ω) − F (ac)(ω) − F (d)(ω).

Note that both F^{(d)}(ω) and F^{(s)}(ω) are bounded non-decreasing functions, differentiable almost everywhere, with zero derivative. The question of singularity or regularity of the process {x(t), t ∈ R} depends on the behavior of the spectrum for large |ω|.


5.6.2.2 Conditions for stationary sequences

Since ∫_{−π}^{π} f(ω) dω < ∞ and −∞ ≤ log f(ω) ≤ f(ω) = dF(ω)/dω, the integral

P = (1/2π) ∫_{−π}^{π} log f(ω) dω    (5.9)

is either finite or equal to −∞.

Theorem 5:8 For a stationary sequence xt, t ∈ Z, the following cases can occur.

a) If P = −∞, then xt is singular.

b) If P > −∞ and the spectrum is absolutely continuous with f(ω) > 0 for almost all ω, then xt is regular.

c) If P > −∞, but F(ω) is either discontinuous, or is continuous with non-vanishing singular part, F^{(s)}(ω) ≠ 0, then xt is neither singular nor regular.

This theorem is quite satisfying, and it is worth making some comments on its implications. First, if the spectrum is discrete with a finite number of jumps, then f(ω) = 0 for almost all ω and P = −∞, so the process is singular. As seen from the spectral representation (4.18), the process then depends only on a finite number of random quantities which can be recovered from a finite number of observed values.

If the spectrum is absolutely continuous with density f(ω), singularity and regularity depend on whether f(ω) comes close to 0 or not. For example, if f(ω) vanishes in an interval, then P = −∞ and x(t) is singular.⁵ If f(ω) ≥ c > 0 for −π < ω ≤ π, then the integral is finite and the process regular.

A stationary sequence x(t) is regular if and only if it can be represented as a one-sided moving average

xt = ∑_{k=−∞}^{t} h_{t−k} yk,

with uncorrelated yk; cf. Theorem 4:8, which also implies that it must have a spectral density.

A sequence that is neither singular nor regular can be represented as a sum of two uncorrelated sequences,

xt = x^{(s)}_t + x^{(r)}_t = x^{(s)}_t + ∑_{k=−∞}^{t} h_{t−k} yk.

⁵ Singularity also occurs if f(ω) = 0 at a single point ω0 and is very close to 0 nearby, such as when f(ω) ∼ exp(−1/(ω − ω0)²) as ω → ω0.


The regular part x^{(r)}_t has absolutely continuous spectrum,

F^{(ac)}(ω) = ∫_{−π}^{ω} f(x) dx,

while the singular part x^{(s)}_t has spectral distribution F^{(d)}(ω) + F^{(s)}(ω).

It is also possible to express the prediction error in terms of the integral

P = (1/2π) ∫_{−π}^{π} log f(ω) dω.

In fact, the one step ahead prediction error

σ²_0 = inf_{h0,h1,...} E(|x_{t+1} − ∑_{k=0}^{∞} hk x_{t−k}|²) = 2π exp(P).

5.6.2.3 Conditions for stationary processes

Conditions for regularity and singularity for stationary processes x(t) with continuous parameter can be expressed in terms of the integral

Q = ∫_{−∞}^{∞} (log f(ω))/(1 + ω²) dω,

where as before, f(ω) = dF(ω)/dω is the a.e. existing derivative of the spectral distribution function.

Theorem 5:9 For a stationary process {x(t), t ∈ R}, one has that

a) if Q = −∞, then x(t) is singular,

b) if Q > −∞ and the spectrum is absolutely continuous then x(t) is regular.

The decomposition of x(t) into one singular component which can be predicted, and one regular component which is a moving average, is analogous to the discrete time case,

x(t) = x^{(s)}(t) + ∫_{u=−∞}^{t} h(t − u) dζ(u),

where {ζ(t), t ∈ R} is a process with uncorrelated increments.

We apply the theorem to the processes with covariance functions r(t) = e^{−t²/2} and r(t) = (sin t)/t. They have absolutely continuous spectra with spectral densities,

f(ω) = (1/√(2π)) e^{−ω²/2} and f(ω) = 1/2 for |ω| < 1,

respectively. Obviously the integral Q diverges to −∞ in both cases, and we have verified the statement that these processes are deterministic, although their


covariance functions tend to 0 as t → ∞. The Ornstein-Uhlenbeck process with covariance function r(t) = e^{−α|t|} and spectral density f(ω) = α/(π(α² + ω²)) is an example of a regular process with Q > −∞.

t → ∞ , but with quite different rates. For the regular Ornstein-Uhlenbeckprocess it tends to 0 exponentially fast, while for the two singular (and hencepredictable) processes, the covariance falls off either much faster, as e−t2/2 , ormuch slower, as 1/t . Hence, we learn that stochastic determinism and non-determinism are complicated matters, even if they are entirely defined in termsof correlations and distribution functions.

5.6.3 Uniform, strong, and weak mixing

Predictability, regularity, and singularity are probabilistic concepts, defined and studied in terms of prediction error moments and correlations. When it comes to ergodicity, we have seen examples of totally deterministic sequences which exhibit ergodic behavior, in the sense that they obey the law of large numbers. To complicate matters, we have seen that a Gaussian process with covariance function tending to 0 at infinity, implying asymptotic independence, is always ergodic, even if the process is, in a deterministic sense, determined by its arbitrarily remote past.

Ergodicity is a law of large numbers: time averages converge to a limit. In statistics, one would also like to have some idea of the asymptotic distribution; in particular, one would like a central limit theorem for normalized sums or integrals,

(∑_{k=1}^{N} x(k) − A_N)/B_N,    (∫_{t=0}^{N} x(t) dt − A_N)/B_N,

as N → ∞. This asks for general concepts that guarantee the asymptotic independence of functionals of a stochastic process.

For a general stationary process, only the ergodic theorem can be expected to hold. For example, if x(t) ≡ x, then B_N^{−1} ∑_{k=1}^{N} x(k) = x with A_N = 0 and B_N = N, and this has the same distribution as x, regardless of N. Here, there is a strong dependence between the x-variables. But even a very weak dependence would not help us much to obtain any interesting limiting distributions. For example, if y(k) is a sequence of independent, identically distributed random variables, and

x(t) = y(t + 1) − y(t),

then

(1/B_N) ∑_{k=1}^{N} x(k) = (y(N + 1) − y(1))/B_N.

This has a non-trivial asymptotic distribution only if B_N has a non-zero limit. Central limit theorems for normalized sums typically require B_N → ∞ and


then some mixing condition has to be imposed. (Martingale arguments are another class of "almost independence" conditions for asymptotic normality; see [36, 37].)

For any stochastic process {x(t), t ∈ R} define the σ-field M_a^b as the σ-field generated by x(t), a ≤ t ≤ b. Taking a and b as ∓∞, we get M_{−∞}^b and M_a^∞ as the σ-fields generated by {x(t); t ≤ b} and by {x(t); t ≥ a}, respectively. The following mixing conditions represent successively milder conditions on the asymptotic independence.

uniform mixing: x(t) is called uniformly mixing (or φ-mixing) if there is a non-negative function φ(n) such that φ(n) → 0 as n → ∞, and for all t and events A ∈ M_{−∞}^t and B ∈ M_{t+n}^∞,

|P(A ∩ B) − P(A)P(B)| ≤ φ(n)P(A).

strong mixing: x(t) is called strongly mixing (or α-mixing) if there is a non-negative function α(n) such that α(n) → 0 as n → ∞, and for all t and events A ∈ M_{−∞}^t and B ∈ M_{t+n}^∞,

|P(A ∩ B) − P(A)P(B)| ≤ α(n).

weak mixing: x(t) is called weakly mixing if, for all events A ∈ M_{−∞}^t and B ∈ M_t^∞,

lim_{n→∞} (1/n) ∑_{k=1}^n |P(A ∩ U^{−k}B) − P(A)P(B)| = 0.

Here, U^{−k}B = {x(·); x(k + ·) ∈ B} ∈ M_k^∞.

Of these, uniform mixing is the most demanding and weak mixing the least.

Theorem 5:10 Let {x(t); t ∈ Z} be a stationary Gaussian sequence. Then

a) x(t) is uniformly mixing if and only if it is m-dependent, i.e. there is an m such that the covariance function r(t) = 0 for |t| > m.

b) x(t) is strongly mixing if it has a continuous spectral density f(ω) ≥ c > 0 on −π < ω ≤ π.

Proof: a) That m-dependence implies uniform mixing is obvious. To prove the necessity, assume that r(t) is not identically 0 for large t. That x(t) is uniformly mixing implies that for all A ∈ M_{−∞}^0 with P(A) > 0 and all B ∈ M_n^∞,

|P(B | A) − P(B)| < φ(n) → 0    (5.10)


as n → ∞. This directly implies that r(t) → 0 as t → ∞, since otherwise there would be an infinite number of t = tk for which r(tk) ≥ r0 > 0, say. Take A = {x(0) > 0} ∈ M_0^0 and B = {x(tk) > 0} ∈ M_{tk}^{tk}. It is easy to see that

P (B | A) − P (B) ≥ c0 > 0

is bounded away from 0 by a positive constant c0, and hence cannot tend to 0.

For simplicity, assume E(x(t)) = 0, E(x(t)²) = 1, and assume there is an infinite number of t = tk for which ρk = r(tk) > 0, but still r(tk) → 0. Define A = {x(0) > 1/ρk} (obviously P(A) > 0) and let B = {x(tk) > 1}. Since x(0), x(tk) have a bivariate normal distribution, the conditional distribution of x(tk) given x(0) = x is normal with mean ρk x and variance 1 − ρk². As ρk → 0, the conditional distribution of x(0) given x(0) > 1/ρk will be concentrated near 1/ρk, and then x(tk) will be approximately N(1, 1). Therefore, as ρk → 0, P(B | A) → 1/2. On the other hand, P(B) = (2π)^{−1/2} ∫_1^∞ exp(−y²/2) dy < 0.2. Hence (5.10) does not hold with φ(tk) → 0.

b) For a proof of this, see [18] or [17]. □

Theorem 5:11 A stationary process x(t) that is strongly mixing is ergodic.

Proof: The proof is almost a complete repetition of the proof of Lemma 5.2. Take an invariant event S and approximate it by a finite-dimensional event B with |P(S) − P(B)| < ε. Suppose B ∈ M_a^b ⊂ M_{−∞}^b. Then the translated event

Bn = U_n B ∈ M_{a+n}^{b+n} ⊂ M_{a+n}^∞,

and hence

P(B ∩ Bn) − P(B) · P(Bn) → 0.

As in the proof of Lemma 5.2, it follows that P(B)² = P(B) and hence P(S)² = P(S). □

The reader could take as a challenge to prove that also weak mixing implies that x(t) is ergodic; see also Exercise 5:14.


Exercises

5:1. Show that the following transformation of Ω = [0, 1), F = B, P = Lebesgue measure, is measurable and measure preserving,

Tx = 2x mod 1.

5:2. (Continued.) Define the random variable x(ω) = 0 if 0 ≤ ω < 1/2, x(ω) = 1 if 1/2 ≤ ω < 1. Show that the sequence xn(ω) = x(T^{n−1}ω) consists of independent zeros and ones, with probability 1/2 each.

5:3. Show that if T is measure preserving on (Ω,F, P) and x is a random variable, then E(x(ω)) = E(x(Tω)).

5:4. Show that every one-sided stationary sequence {xn, n ≥ 0} can be extended to a two-sided sequence {xn, n ∈ Z} with the same finite-dimensional distributions.

5:5. Find a distribution for x0 which makes xk+1 = 4xk(1 − xk) a stationary sequence.

5:6. Prove that if {xn} is a stationary sequence with E(|xn|) < ∞, then P(xn/n → 0, as n → ∞) = 1; this was used on page 138.

5:7. For any sequence of random variables, xn, and event B ∈ B, show that the event {xn ∈ B, infinitely often} is invariant under the shift transformation.

5:8. Give an example of an ergodic transformation T on (Ω,F, P) such that T² is not ergodic.

5:9. Show that xn is ergodic if and only if for every A ∈ B^k, k = 1, 2, . . . ,

(1/n) ∑_{j=1}^n χA(xj, . . . , x_{j+k}) → P((x1, . . . , x_{k+1}) ∈ A).

5:10. Show that if x is non-negative with E(x) = ∞ and xn(ω) = x(T^{n−1}ω), Sn = ∑_{k=1}^n xk, then Sn/n → ∞ if T is ergodic.

5:11. Prove Theorem 5:4.

5:12. Take two stationary and ergodic sequences xn and yn. Take one of the two sequences at random with equal probability, zn = xn, n = 1, 2, . . . or zn = yn, n = 1, 2, . . . . Show that zn is not ergodic.

5:13. Let {xn} and {yn} be two ergodic sequences, both defined on (Ω,F, P), and consider the bivariate sequence zn = (xn, yn). Construct an example that shows that zn need not be ergodic, even if {xn} and {yn} are independent.


5:14. Prove that a sufficient condition for z(n) = (x(n), y(n)) to be ergodic, if {x(n)} and {y(n)} are independent ergodic sequences, is that one of {x(n)} and {y(n)} is weakly mixing.


Chapter 6

Vector processes and random fields

6.1 Cross-spectrum and spectral representation

The internal correlation structure of a stationary process is defined by the covariance function; the spectrum distributes the correlation over different frequencies, and in the spectral representation the process is actually built from individual components, with independent amplitudes and phases, as in the discrete spectrum case (4.18), on page 94. The phases are all independent and uniformly distributed over (0, 2π). The spectrum does not contain any phase information!

When we have two stationary processes {x(t), t ∈ R} and {y(t), t ∈ R} which are correlated, their individual amplitudes and phases are still, of course, independent, but for every frequency the amplitudes in the two processes can be dependent, and there can exist a complicated dependence between the phases. This cross dependence is described by the cross-covariance function and the cross-spectrum.

A stationary vector-valued process is a vector of p stationary processes,

x(t) = (x1(t), . . . , xp(t)),

with stationary cross-covariances, for E(xj(t)) = 0,

rjk(t) = E(xj(s + t) · x̄k(s)) = r̄kj(−t).

If the processes are real, which we usually assume,

rjk(t) = E(xj(s + t) · xk(s)) = rkj(−t).

The covariance function

R(t) = (rjk(t))

is a matrix function of covariances, where each auto-covariance function rkk(t) has its marginal spectral representation rkk(t) = ∫ e^{iωt} dFkk(ω).


6.1.1 Spectral distribution

Theorem 6:1 (a) To every continuous covariance matrix function R(t) there exists a spectral distribution F(ω) such that

R(t) = ∫_{−∞}^{∞} e^{iωt} dF(ω),

where F(ω) is a function of positive type, i.e. for every complex z = (z1, . . . , zn) and every frequency interval ω1 < ω2,

∑_{j,k} (Fjk(ω2) − Fjk(ω1)) zj z̄k ≥ 0.

This says that ΔF(ω) = (Fjk(ω2) − Fjk(ω1)) is a non-negative definite Hermitian matrix.

(b) If Fjj(ω), Fkk(ω) are absolutely continuous with spectral densities fjj(ω), fkk(ω), then Fjk(ω) is absolutely continuous, with

|fjk(ω)|2 ≤ fjj(ω) fkk(ω).

Proof: (a) Take a fixed z and define the stationary process

y(t) = ∑_j zj xj(t)

with covariance function

rz(t) = ∑_{j,k} rjk(t) zj z̄k = ∫ e^{iωt} dGz(ω),

where Gz(ω) is a real, bounded and non-decreasing spectral distribution function. We then take, in order, zj = zk = 1, and zj = i, zk = 1, the rest being 0. This gives two spectral distributions, G1(ω) and G2(ω), say, and we have

rjj(t) + rkk(t) + rjk(t) + rkj(t) = ∫ e^{iωt} dG1(ω),
rjj(t) + rkk(t) + i rjk(t) − i rkj(t) = ∫ e^{iωt} dG2(ω).

Together with rjj(t) = ∫ e^{iωt} dFjj(ω), and rkk(t) = ∫ e^{iωt} dFkk(ω), we get

rjk(t) + rkj(t) = ∫ e^{iωt} (dG1(ω) − dFjj(ω) − dFkk(ω)),
i rjk(t) − i rkj(t) = ∫ e^{iωt} (dG2(ω) − dFjj(ω) − dFkk(ω)),


which implies

rjk(t) = ∫ e^{iωt} · (1/2) (dG1(ω) − i dG2(ω) − (1 − i)(dFjj(ω) + dFkk(ω))) = ∫ e^{iωt} dFjk(ω),

say, which is the spectral representation of rjk(t).

It is easy to see that ΔF(ω) has the stated properties; in particular that

∑_{j,k} ΔFjk(ω) zj z̄k ≥ 0.    (6.1)

(b) From (6.1), by taking only zj and zk to be non-zero, it follows that

ΔFjj |zj|² + ΔFkk |zk|² + 2 Re(ΔFjk zj z̄k) ≥ 0,

which in turn implies that for any ω-interval, |ΔFjk|² ≤ ΔFjj · ΔFkk. Thus, if Fjj and Fkk have spectral densities, so does Fjk, and

|fjk(ω)|2 ≤ fjj(ω) fkk(ω).

For real-valued vector processes, the spectral distributions may be put in real form as for one-dimensional processes. In particular, the cross-covariance function can be written

rjk(t) = ∫_0^∞ {cos ωt dGjk(ω) + sin ωt dHjk(ω)},    (6.2)

where Gjk(ω) and Hjk(ω) are functions of bounded variation.

6.1.2 Spectral representation of x(t)

6.1.2.1 The spectral components

Each component xj(t) in a stationary vector process has its spectral representation xj(t) = ∫ e^{iωt} dZj(ω) in terms of a spectral process Zj(ω) with orthogonal increments. Further, for j ≠ k, the increments of Zj(ω) and Zk(ω) over disjoint ω-intervals are orthogonal, while for equal ω-intervals,

E(dZj(ω) · dZ̄k(ω)) = dFjk(ω),
E(dZj(ω) · dZ̄k(μ)) = 0, for ω ≠ μ.

The cross-correlations between the components of x(t) are determined by the correlations between the spectral components. To see how it works, consider


processes with discrete spectrum, for which the spectral representations are sums of the form (4.29),

xj(t) = ∑_n σj(n) (Uj(n) cos ω(n)t + Vj(n) sin ω(n)t).

Here Uj(n) and Vj(n) are real random variables with mean 0, variance 1, uncorrelated for different n-values. The correlation between the xj- and the xk-process is caused by a correlation between the U:s and V:s in the two representations:

E(Uj(n)Uk(n)) = E(Vj(n)Vk(n)) = ρjk(n), j ≠ k,
E(Uj(n)Vk(n)) = −E(Vj(n)Uk(n)) = −ρ̃jk(n), j ≠ k,
E(Uj(n)Vj(n)) = 0,

for some ρjk(n) and ρ̃jk(n) such that 0 ≤ ρjk(n)² + ρ̃jk(n)² ≤ 1. Direct calculation of auto- and cross-covariances gives

rjk(t) = ∑_n σj(n)σk(n) (ρjk(n) cos ω(n)t + ρ̃jk(n) sin ω(n)t)
       = ∑_n Ajk(n) cos(ω(n)t − Φjk(n)), j ≠ k,    (6.3)

rkj(t) = ∑_n σj(n)σk(n) (ρjk(n) cos ω(n)t − ρ̃jk(n) sin ω(n)t)
       = ∑_n Akj(n) cos(ω(n)t − Φkj(n)), j ≠ k,    (6.4)

rjj(t) = ∑_n σj(n)² cos ω(n)t, j = 1, . . . , p.    (6.5)

Here, Ajk(n) = Akj(n) = σj(n)σk(n)√(ρjk(n)² + ρ̃jk(n)²) represents the correlation between the amplitudes, while Φjk(n) = −Φkj(n), with

cos Φjk(n) = ρjk(n)/√(ρjk(n)² + ρ̃jk(n)²),  sin Φjk(n) = ρ̃jk(n)/√(ρjk(n)² + ρ̃jk(n)²),

represent the phase relations.

The corresponding spectral distributions Fkk(ω) are symmetric with mass

ΔFk = σk(n)²/2

at the frequencies ±ω(n), while for j ≠ k, the cross spectrum Fjk(ω) is skewed if ρ̃jk(n) ≠ 0, with mass

ΔFjk = (1/2)Ajk(n) e^{−iΦjk(n)} = (1/2)Ajk(n) (ρjk(n) − iρ̃jk(n))/√(ρjk(n)² + ρ̃jk(n)²), at ω = ω(n),
ΔFjk = (1/2)Ajk(n) e^{iΦjk(n)} = (1/2)Ajk(n) (ρjk(n) + iρ̃jk(n))/√(ρjk(n)² + ρ̃jk(n)²), at ω = −ω(n).


6.1.2.2 Phase, amplitude, and coherence spectrum

The frequency dependent function Φjk(n) is called the phase spectrum and it defines the time delay between components in xk(s) and xj(s + t); the correlation between the components Uj(n) cos(ω(n)(s + t)) + Vj(n) sin(ω(n)(s + t)) and Uk(n) cos(ω(n)s) + Vk(n) sin(ω(n)s) has its maximum at t = Φjk(n)/ω(n).

Further, Ajk(n)/2 is called the amplitude spectrum, and the squared coherence spectrum is defined as

|ΔFjk(n)|² / (ΔFjj(n) ΔFkk(n)) = ρjk(n)² + ρ̃jk(n)².

For processes with continuous spectra and cross spectral density

fjk(ω) = (1/2) Ajk(ω) e^{iΦjk(ω)},

the amplitude spectrum, phase spectrum, and squared coherence spectrum are defined as Ajk(ω)/2, Φjk(ω), and |fjk(ω)|²/(fjj(ω)fkk(ω)), respectively.

6.1.2.3 Cross-correlation in linear filters

In a linear filter, the cross-covariance and cross spectrum describe the relation between the input process x(t) = ∫ e^{iωt} dZx(ω) and the output process y(t) = ∫ h(t − u)x(u) du = ∫ g(ω)e^{iωt} dZx(ω). One calculates easily,

rxy(t) = E(x(s + t) · y(s)) = E(∫ e^{iω(s+t)} dZx(ω) · ∫ g(ω′) e^{iω′s} dZx(ω′)) = ∫ e^{iωt} g(ω) dFx(ω),

so the cross spectral distribution is

dFxy(ω) = g(ω)dFx(ω).

By estimating the cross-covariance or the cross spectrum between input and output in an unknown filter, one can estimate the frequency function as g∗(ω) = f∗xy(ω)/f∗x(ω) = f̄∗xy(−ω)/f∗x(ω), when f∗x(ω) > 0.

6.2 Some random field theory

A random field is a stochastic process x(t) with multi-dimensional parameter

t = (t1, . . . , tp).

For example, if t = (t1, t2) is two-dimensional we can think of (t1, t2, x(t)) as a random surface. A time-dependent random surface is a field (s1, s2, x(t, s1, s2)) with t = (t, s1, s2), where t is time and (s1, s2) ∈ R² is location. In the general theory we use t as generic notation for the parameter; in special applications to random time dependent surfaces we use (t, (s1, s2)) as parameter.

Originally used mainly in the geosciences, in geology under the name of geostatistics, and in marine science as models for a random sea, cf. Section 1.6.3, random fields are now widely used in all sorts of applications with spatial or spatial-temporal variability.

6.2.1 Homogeneous fields

Define the mean value and covariance functions for random fields in the natural way as m(t) = E(x(t)) and r(t, u) = C(x(t), x(u)).

The analogue of a stationary process is a homogeneous field. The field is called homogeneous if m(t) is constant, m, and r(t, u) depends only on the difference t − u, i.e., assuming a real field with m = 0,
\[
r(t) = r(u + t, u) = E\big(x(u + t)\cdot x(u)\big).
\]

The covariance of the process values at two parameter points depends on distance as well as on direction of the vector between the two points.

In spatial applications it is popular to use the variogram, defined by
\[
2\gamma(u, v) = E\big(|x(u) - x(v)|^2\big),
\]
or the semi-variogram γ(t). The variogram plays the same role as the incremental variance does in a Wiener process; there E(|w(s + t) − w(s)|²) = t·σ² is independent of s. A field for which the variogram only depends on the vector u − v is called intrinsically stationary. The semi-variogram of a homogeneous field is γ(t) = r(0) − r(t). A homogeneous field is also intrinsically stationary, but, as seen from the Wiener process, the converse need not hold.
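The semi-variogram is usually estimated by averaging squared increments over distance bins. The following Python sketch is my own illustration (the locations, the observed field, and the bin edges are hypothetical), not a procedure prescribed in the notes.

\begin{verbatim}
# Minimal sketch: empirical semi-variogram gamma(h) = 0.5 E|x(u)-x(v)|^2
# estimated from pairs of observations whose distance falls in a bin.
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """coords: (n, p) locations, values: (n,) observations."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    sqdev = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)        # count each pair once
    dist, sqdev = dist[iu], sqdev[iu]
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for i in range(len(bin_edges) - 1):
        m = (dist >= bin_edges[i]) & (dist < bin_edges[i + 1])
        if m.any():
            gamma[i] = sqdev[m].mean()
    return gamma

# hypothetical data: a smooth trend plus noise at random 2-D locations
rng = np.random.default_rng(2)
pts = rng.uniform(0, 10, size=(300, 2))
vals = np.sin(pts[:, 0] / 2) + 0.3 * rng.standard_normal(300)
print(empirical_semivariogram(pts, vals, np.linspace(0, 5, 11)))
\end{verbatim}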

Theorem 6:2 (a) The covariance function r(t) of a homogeneous random field has a spectral distribution,
\[
r(t) = \int e^{i\omega\cdot t}\,dF(\omega),
\]
where ω·t = ω1t1 + ... + ωptp, and F(ω) is a p-dimensional spectral distribution function depending on the frequency parameter ω = (ω1, ..., ωp).¹

(b) There exists a stochastic spectral process Z(ω) with orthogonal increments ΔZ(ω) over rectangles Δω = [ω1, ω1 + Δ1] × ... × [ωp, ωp + Δp], such that E(ΔZ(ω)) = 0 and
\[
E(|\Delta Z(\omega)|^2) = \Delta F(\omega), \quad\text{and}\quad E\big(\Delta Z(\omega_1)\cdot\overline{\Delta Z(\omega_2)}\big) = 0,
\]
for disjoint rectangles Δω1 and Δω2, and
\[
x(t) = \int e^{i\omega\cdot t}\,dZ(\omega). \tag{6.6}
\]

¹ F(ω) is equal to a regular p-dimensional probability distribution function multiplied by a positive constant, equal to the variance of the process.

6.2.2 Isotropic fields

In a homogeneous isotropic field, the correlation properties are the same in all directions, i.e. the covariance function r(t) depends only on the distance ‖t‖ = √(t1² + ... + tp²). This type of process model is natural when one cannot identify any directionally dependent stochastic properties in the field, but rather the field keeps its distributions after rotation (and translation). Many phenomena in the natural world share this property, while others naturally do not. Ocean waves are non-isotropic, while the chemical concentration field in the bottom sediment of the ocean, or the disturbance field in mobile phone communication, might well be isotropic.

As we have seen, a spectral distribution for a homogeneous field need only satisfy the requirement that it is non-negative, integrable and symmetric. The spectral distribution for an isotropic field needs to satisfy a special invariance condition, giving it a particularly simple structure.

Theorem 6:3 The covariance function r(t) of a homogeneous isotropic field is a mixture of Bessel functions,
\[
r(t) = r(\|t\|) = \int_0^{\infty} \frac{J_{(p-2)/2}(\omega\,\|t\|)}{(\omega\,\|t\|)^{(p-2)/2}}\,dG(\omega),
\]
where G(ω) is a bounded, non-decreasing function, and Jm(ω) is a Bessel function of the first kind of order m,
\[
J_m(\omega) = \sum_{k=0}^{\infty} \frac{(-1)^k(\omega/2)^{2k+m}}{k!\,\Gamma(k+m+1)}.
\]

Proof: We have r(t) = ∫ e^{iω·t} dF(ω). Introduce spherical coordinates, ω = ω·(ℓ1, ..., ℓp), ω = ‖ω‖, Σℓk² = 1, and let ℓ1 = cos θp−1. For every θp−1, (ℓ2, ..., ℓp) defines a point on the sphere with radius √(1 − cos²θp−1) = sin θp−1.

Since r(t) depends only on ‖t‖, we can calculate the integral for the special point t = (t, 0, ..., 0), to get
\[
r(t) = \int e^{i\omega_1 t}\,dF(\omega) = \int e^{i\omega t\cos\theta_{p-1}}\,dF(\omega).
\]
With G(ω) = ∫_{‖ω‖≤ω} dF(ω) we find that
\[
r(t) = \int_{\omega=0}^{\infty}\Big\{\int_{\theta} e^{i\omega t\cos\theta_{p-1}}\,d\sigma(\theta)\Big\}\,dG(\omega),
\]
where dσ is the rotation invariant measure on the (p − 1)-dimensional unit sphere. For fixed θp−1 we integrate (θ1, ..., θp−2) over the sphere with radius sin θp−1, with area Cp−1 sin^{p−2} θp−1. We find
\[
r(t) = \int_{\omega=0}^{\infty}\Big\{\int_{\theta=0}^{\pi} C_{p-2}\,e^{i\omega t\cos\theta}\sin^{p-2}\theta\,d\theta\Big\}\,dG(\omega),
\]
which is the stated form, if we incorporate the constant Cp−2 into G. □

6.2.2.1 Isotropic fields with special structure

Of course, a homogeneous field x(t1, t2) with two-dimensional parameter, spectral density fx(ω1, ω2), and covariance function rx(t1, t2) gives rise to a stationary process when observed along a single straight line, for example along t2 = 0. Then x(t1, 0) has covariance function r(t1) = rx(t1, 0) and spectral density f(ω1) = ∫_{ω2} fx(ω1, ω2) dω2.

It is important to realise that even if any non-negative definite function can act as covariance function for a stationary process, not all stationary processes, and corresponding covariance functions, can occur as an observation of a section in an isotropic field with two-dimensional parameter. Only those functions that can be expressed as
\[
r(t) = \int_0^{\infty} J_0(\omega\|t\|)\,dG(\omega),
\]
with bounded non-decreasing G(ω), are possible. A simple sufficient condition for a one-dimensional spectral density f(ω) to be obtained from a section of an isotropic field is the following:

• Any function f(ω), ω ∈ R, that is non-increasing for ω ≥ 0 and integrable over R can act as the spectral density for a section, e.g. x(t1, 0), of an isotropic field in two dimensions.

One should also note the particularly simple form of the covariance function for the case p = 3,
\[
r(\|t\|) = \int_0^{\infty} \frac{\sin(\omega\|t\|)}{\omega\|t\|}\,dG(\omega).
\]
Another case that needs special treatment is when the parameter t is composed of both a time and a space parameter, t = (t, s1, ..., sp). Then one could hope for isotropy in (s1, ..., sp) only, in which case the spectral form becomes
\[
r(t, \|(s_1,\ldots,s_p)\|) = \int_{\nu=-\infty}^{\infty}\int_{\omega=0}^{\infty} e^{i\nu t}\,H_p\big(\omega\|(s_1,\ldots,s_p)\|\big)\,dG(\nu,\omega),
\]
where
\[
H_p(x) = (2/x)^{(p-2)/2}\,\Gamma(p/2)\,J_{(p-2)/2}(x).
\]

The form of the covariance function for an isotropic field as a mixture of Bessel functions is useful for non-parametric estimation from data. It is only the weight function G(ω) that needs to be estimated, for example as a discrete distribution.

The theorem gives all possible isotropic covariance functions valid in the specific dimension p. Some functions can be used as covariance functions in any dimension, for example
\[
r(\|t\|) = \sigma^2\exp(-\phi\|t\|^{\alpha}),
\]
for any α ∈ (0, 2).

Another popular class of covariance functions valid in any dimension is the Whittle-Matérn family,
\[
r(\|t\|) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\,\big(2\sqrt{\nu}\,\|t\|\phi\big)^{\nu} K_{\nu}\big(2\sqrt{\nu}\,\|t\|\phi\big),
\]
where Kν is a modified Bessel function of order ν. The value of ν determines the smoothness of the field. When ν → ∞, the Whittle-Matérn covariance function tends to a Gaussian shape, in which case the field is infinitely differentiable; cf. Chapter 2.

In all these forms, φ is a spatial scale parameter. Often one adds an extra variance term σ₀² at the center, for t = 0, to account for the so-called nugget effect, a variance component with a covariance range too short to be observed.
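The two covariance families above are easy to evaluate numerically. The following Python sketch follows the parameterization written above; the parameter values are arbitrary illustrations, and the handling of t = 0 (where the Bessel expression has a limit) and of the nugget term are my own implementation choices.

\begin{verbatim}
# Minimal sketch: powered exponential and Whittle-Matern covariances,
# with an optional nugget variance added at lag zero.
import numpy as np
from scipy.special import kv, gamma   # modified Bessel K_nu and Gamma

def powered_exponential(h, sigma2=1.0, phi=1.0, alpha=1.5):
    return sigma2 * np.exp(-phi * np.asarray(h, float) ** alpha)

def whittle_matern(h, sigma2=1.0, phi=1.0, nu=1.5, nugget=0.0):
    h = np.asarray(h, float)
    arg = 2.0 * np.sqrt(nu) * h * phi
    r = np.empty_like(h)
    small = arg < 1e-12
    r[small] = sigma2 + nugget            # limit r(0) = sigma2 (+ nugget)
    a = arg[~small]
    r[~small] = sigma2 / (2 ** (nu - 1) * gamma(nu)) * a ** nu * kv(nu, a)
    return r

h = np.linspace(0.0, 3.0, 7)
print(powered_exponential(h))
print(whittle_matern(h, nu=0.5))   # nu = 1/2 reproduces an exponential covariance
\end{verbatim}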

A modern account of spatial data modeling is given in [2], also by means of spectral and correlation models.

6.2.3 Randomly moving surfaces

The spectral representation makes it easy to imagine the structure of a randomly moving surface, homogeneous in time as well as in space. The spectral representation (6.6) represents the real field x(t) as a packet of directed waves, Aω cos(ω·t + φω), with random amplitude and phase, and constant on each plane parallel to ω·t = 0. For example, with t = (s1, s2, t), where t is time and (s1, s2) is space, and ω = (κ1, κ2, ω), the elementary waves are
\[
A_{\omega}\cos(\kappa_1 s_1 + \kappa_2 s_2 + \omega t + \phi_{\omega}).
\]
For fixed t this is a cosine function in the plane, which is zero along the lines κ1s1 + κ2s2 + ωt + φω = π/2 + kπ, k integer. For fixed (s1, s2) it is a cosine wave with frequency ω. The parameters κ1 and κ2 are called the wave numbers.

In general there is no particular relation between the time frequency ω and the space frequencies, except for water waves, which we shall deal with later. However, one important application of space-time random fields is the modeling of environmental variables, like the concentration of a hazardous pollutant. Over a reasonably short period of time the concentration variation may be regarded as statistically stationary in time, at least averaged over a 24 hour period. But it is often unlikely that the correlation structure in space is independent of the absolute location. Topography and the location of cities and pollutant sources make the process inhomogeneous in space.

One way to overcome the inhomogeneity is to make a transformation of the space map and move each observation point (s1, s2) to a new location (s̃1, s̃2), so that the field x̃(t, s̃1, s̃2) = x(t, s1, s2) is homogeneous. This may not be exactly attainable, but the technique is often used in environmental statistics for planning of measurements.

6.2.4 Stochastic water waves

Stochastic water waves are special cases of homogeneous random fields, for which there is a special relation between time and space frequencies (wave numbers). For a one-dimensional time dependent Gaussian wave x(t, s), where s is distance along an axis, the elementary waves have the form
\[
A_{\omega}\cos(\omega t - \kappa s + \phi_{\omega}).
\]
By physical considerations one can derive an explicit relation, the dispersion relation, between the wave number κ and the frequency ω. If h is the water depth,
\[
\omega^2 = \kappa g\tanh(h\kappa),
\]
which for infinite depth² reduces to ω² = κg. Here g is the constant of gravity.

A Gaussian random wave is a mixture of elementary waves of this form; in spectral language, with κ > 0 solving the dispersion relation,
\[
x(t, s) = \int_{\omega=-\infty}^{\infty} e^{i(\omega t - \kappa s)}\,dZ^+(\omega) + \int_{\omega=-\infty}^{\infty} e^{i(\omega t + \kappa s)}\,dZ^-(\omega) = x^+(t, s) + x^-(t, s).
\]
Here is a case when it is important to use both positive and negative frequencies; cf. the comments in Section 4.3.3.4. Waves described by x⁺(t, s) move to the right and waves in x⁻(t, s) move to the left with increasing t.
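For finite depth the dispersion relation has no closed-form solution for κ, but it is easily solved numerically. The sketch below is my own illustration (the Newton iteration and the example frequencies are not from the notes).

\begin{verbatim}
# Minimal sketch: solve omega^2 = kappa * g * tanh(h * kappa) for kappa,
# starting from the deep-water value kappa = omega^2 / g.
import numpy as np

def wave_number(omega, depth=np.inf, g=9.81, tol=1e-12, maxit=50):
    omega = np.asarray(omega, float)
    kappa = omega ** 2 / g                      # deep-water solution
    if np.isinf(depth):
        return kappa
    for _ in range(maxit):                      # Newton iteration
        t = np.tanh(depth * kappa)
        f = omega ** 2 - g * kappa * t                     # residual
        df = -g * (t + depth * kappa * (1.0 - t ** 2))     # derivative
        step = f / df
        kappa = kappa - step
        if np.max(np.abs(step)) < tol:
            break
    return kappa

omega = np.array([0.5, 1.0, 2.0])               # rad/s, illustrative values
print(wave_number(omega))                       # infinite depth
print(wave_number(omega, depth=10.0))           # 10 m depth
\end{verbatim}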

Keeping t = t0 or s = s0 fixed, one obtains a space wave, x(t0, s), and a time wave, x(t, s0), respectively. The spectral density of the time wave, x(t, s0) = x⁺(t, s0) + x⁻(t, s0), is called the wave frequency spectrum,
\[
f_x^{\mathrm{freq}}(\omega) = f^+(\omega) + f^-(\omega),
\]
and we see again that it is not possible to distinguish between the two wave directions by just observing the time wave.

² tanh x = (e^x − e^{−x})/(e^x + e^{−x}).


The space wave has a wave number spectrum given by the following relations, for infinite water depth, with ω² = gκ > 0,
\[
f_x^{\mathrm{time}}(\omega) = \frac{2\omega}{g}\,f_x^{\mathrm{space}}(\omega^2/g), \qquad
f_x^{\mathrm{space}}(\kappa) = \frac{1}{2}\sqrt{\frac{g}{\kappa}}\,f_x^{\mathrm{time}}(\sqrt{g\kappa}).
\]
One obvious effect of these relations is that the space process seems to have more short waves than can be inferred from the time observations. Physically this is due to the fact that short waves travel with lower speed than long waves, and they are therefore not observed as easily in the time process. Both the time wave observations and the space registrations are in a sense "biased" as representatives for the full time-space wave field.

In Chapter 3, Remark 3:3, we introduced the mean period 2π√(ω0/ω2) of a stationary time process. The corresponding quantity for the space wave is the mean wave length, i.e. the average distance between two upcrossings of the mean level by the space process x(t0, s). It is expressed in terms of the spectral moments of the wave number spectrum, in particular
\[
\kappa_0 = \int_{\kappa} f_x^{\mathrm{space}}(\kappa)\,d\kappa = \int_{\omega} f_x^{\mathrm{time}}(\omega)\,d\omega = \omega_0,
\]
\[
\kappa_2 = \int_{\kappa} \kappa^2 f_x^{\mathrm{space}}(\kappa)\,d\kappa
= \int_{\omega} \frac{\omega^4}{2g^2}\sqrt{\frac{g^2}{\omega^2}}\,f_x^{\mathrm{time}}(\omega)\,\frac{2\omega}{g}\,d\omega = \omega_4/g^2.
\]
The average wave length is therefore 2πg√(ω0/ω4). We see that the average wave length is more sensitive to the tail of the spectral density than is the average wave period. Considering the difficulties in estimating the high frequency part of the wave spectrum, all statements that rely on high spectral moments are unreliable.
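The frequency-to-wave-number conversion and the resulting mean period and mean wave length are straightforward to compute numerically. The sketch below is my own illustration; the spectral shape used is a hypothetical Pierson-Moskowitz-like density, and plain Riemann sums replace the integrals.

\begin{verbatim}
# Minimal sketch: deep-water conversion f_space(kappa) and the spectral
# moments omega_0, omega_2, omega_4, mean period and mean wave length.
import numpy as np

g = 9.81

def f_time(w):
    # hypothetical one-sided frequency spectrum (illustrative shape only)
    w = np.asarray(w, float)
    return 0.78 / w**5 * np.exp(-1.25 * (0.9 / w) ** 4)

def f_space(kappa):
    # f_space(kappa) = 0.5 * sqrt(g/kappa) * f_time(sqrt(g*kappa))
    kappa = np.asarray(kappa, float)
    return 0.5 * np.sqrt(g / kappa) * f_time(np.sqrt(g * kappa))

w = np.linspace(1e-3, 6.0, 20000)
dw = w[1] - w[0]
w0, w2, w4 = (np.sum(w**k * f_time(w)) * dw for k in (0, 2, 4))

kappa = np.linspace(1e-5, 4.0, 20000)
k0 = np.sum(f_space(kappa)) * (kappa[1] - kappa[0])
print(w0, k0)                                   # zeroth moments agree

print(2 * np.pi * np.sqrt(w0 / w2))             # mean period
print(2 * np.pi * g * np.sqrt(w0 / w4))         # mean wave length
\end{verbatim}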

In the case of a two-dimensional time dependent Gaussian wave x(t, s1, s2), the elementary waves with frequency ω and direction θ become
\[
A_{\omega}\cos\big(\omega t - \kappa(s_1\cos\theta + s_2\sin\theta) + \phi_{\omega}\big), \tag{6.7}
\]
where ω > 0 and κ > 0 is given by the dispersion relation. With this choice of sign, θ determines the direction in which the waves move.

The spectral density for the time-space wave field specifies the contribution to x(t, (s1, s2)) from elementary waves of the form (6.7). Summed (or rather integrated) over all directions 0 ≤ θ < 2π, they give the time wave x(t, s0), in which one cannot identify the different directions. The spectral distribution, called the directional spectrum, is therefore often written in polar form, based on the spectral density f_x^time(ω) for the time wave, as
\[
f(\omega, \theta) = f_x^{\mathrm{time}}(\omega)\,g(\omega, \theta).
\]
The spreading function g(ω, θ), with ∫₀^{2π} g(ω, θ) dθ = 1, specifies the relative contribution of waves from different directions. It may be frequency dependent.


Figure 6.1: Left: Level curves (at 2, 4, 6, 8, 10) for a directional spectrum with frequency dependent spreading. Right: Level curves for a simulated Gaussian space sea.

Example 6:1 Wave spectra for the ocean under different weather conditions are important to characterize the input to (linear or non-linear) ship models. Much effort has been spent on design and estimation of typical wave spectra. One of the most popular is the Pierson-Moskowitz spectrum,
\[
f_{\mathrm{PM}}^{\mathrm{time}}(\omega) = \frac{\alpha}{\omega^5}\,e^{-1.25(\omega_m/\omega)^4},
\]
or the variant, the Jonswap spectrum, in which an extra factor γ > 1 is introduced to enhance the peak of the spectrum,
\[
f_{\mathrm{J}}^{\mathrm{time}}(\omega) = \frac{\alpha}{\omega^5}\,e^{-1.25(\omega_m/\omega)^4}\,\gamma^{\exp\left(-(1-\omega/\omega_m)^2/2\sigma_m^2\right)}.
\]
In both spectra, α is a main parameter for the total variance, and ωm defines the "peak frequency". The parameters γ and σm determine the peakedness of the spectrum. The spectrum and a realization of a Gaussian process with Jonswap spectrum were shown in Example 1:4.
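Both spectral shapes are simple to code directly from the formulas above. The following Python sketch implements exactly these two expressions; the default parameter values are arbitrary illustrations, not recommended design values.

\begin{verbatim}
# Minimal sketch: Pierson-Moskowitz and Jonswap frequency spectra.
import numpy as np

def pierson_moskowitz(w, alpha=0.78, wm=0.9):
    w = np.asarray(w, float)
    return alpha / w**5 * np.exp(-1.25 * (wm / w) ** 4)

def jonswap(w, alpha=0.78, wm=0.9, gam=3.3, sigma_m=0.08):
    w = np.asarray(w, float)
    peak = gam ** np.exp(-(1.0 - w / wm) ** 2 / (2.0 * sigma_m ** 2))
    return pierson_moskowitz(w, alpha, wm) * peak

w = np.linspace(0.01, 4.0, 4000)
dw = w[1] - w[0]
# approximate total variances (integrals of the densities)
print(pierson_moskowitz(w).sum() * dw, jonswap(w).sum() * dw)
\end{verbatim}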

Figure 6.1 shows, to the right, the level curves for a simulated Gaussian wave surface with the directional spectrum with frequency dependent spreading shown on the left. The frequency spectrum is of Jonswap type.

As mentioned in the historical Section 1.6.3, Gaussian waves have been used since the early 1950s, with great success. However, since Gaussian processes are symmetric, x(t) has the same distribution as −x(t) and as x(−t), they are not very realistic for actual water waves except in special situations: deep water, no strong wind. Much research is presently devoted to the development of "non-linear" stochastic wave models, where elementary waves with different frequencies can interact, in contrast to the "linear" Gaussian model, where they just add up.


Exercises

6:1. To be written.


Appendix A

The axioms of probability

Here is a collection of the basic probability axioms, together with proofs of the extension theorem for probabilities on a field (page 5), and of Kolmogorov's extension theorem, from finite-dimensional probabilities to infinite-dimensional ones (page 10).

A.1 The axioms

A probability P is a countably additive measure on a probability space, as defined here.

Definition A:1 (a) A family F0 of subsets of an arbitrary space Ω is called a field (or algebra) if it contains the whole set Ω and is closed under the set operations complement, A*, union, A ∪ B, and intersection, A ∩ B, i.e. if A and B are sets in the family F0, then also the complement A* and the union A ∪ B belong to F0, etc. It then also contains all unions of finitely many sets A1, ..., An in F0.

(b) A field F of subsets is called a σ-field (or σ-algebra) if it contains all countable unions and intersections of its sets, i.e. if it is a field and furthermore,
\[
A_1, A_2, \ldots \in \mathcal{F} \text{ implies } \cup_{n=1}^{\infty} A_n \in \mathcal{F}.
\]
(c) A probability measure P on a sample space Ω with a σ-field F of events is a function defined for every A ∈ F, with the properties

(1) 0 ≤ P(A) ≤ 1 for all A ∈ F.

(2) P(Ω) = 1.

(3) For disjoint sets Ak ∈ F, k = 1, 2, ..., one has P(∪₁^∞ Ak) = Σ₁^∞ P(Ak).

(d) Equivalent with (c) is


(1) 0 ≤ P (A) ≤ 1 for all A ∈ F .

(2) P (Ω) = 1.

(3’) If A1, A2 ∈ F are disjoint, then P (A1 ∪ A2) = P (A1) + P (A2).

(3'') If A1 ⊇ A2 ⊇ ... ∈ F with ∩₁^∞ An = ∅, then lim_{n→∞} P(An) = 0.

A typical field F0 in R is the family of sets which are unions of a finite number of intervals. The smallest σ-field that contains all sets in F0 is the family B of Borel sets.

A.2 Extension of a probability from field to σ-field

How do we define probabilities? For real events, of course, via a statistical distribution function F(x). Given a distribution function F, we can define a probability P for finite unions of real half open disjoint intervals,
\[
P\big(\cup_1^n (a_k, b_k]\big) = \sum_1^n \big(F(b_k) - F(a_k)\big). \tag{A.1}
\]
If we define the probability of a single point as P([a]) = F(a) − lim_{x↑a} F(x), we can define a probability for every finite union of real intervals.

Sets which are unions of a finite number of real intervals, (a, b], (a, b), [a, b), [a, b] with −∞ ≤ a < b ≤ ∞, form a field on R, and they can be given probabilities via a distribution function. The question is, does this also give probabilities to the more complicated events (Borel sets) in the σ-field F generated by the intervals? The answer is yes, as stated in the following extension theorem, Carathéodory's extension theorem, which is valid not only for intervals and Borel sets, but for any field F0 and generated σ-field F. For a proof, the reader is referred to any text book in probability or measure theory, e.g. [36].

Theorem A:1 Suppose P is a function which is defined for all sets in a field F0, there satisfying the probability axioms, i.e. (with three equivalent formulations of Condition 4),

(1) 0 ≤ P(A) ≤ 1 for all A ∈ F0.

(2) P(Ω) = 1.

(3) If A1, A2 ∈ F0 are disjoint, then P(A1 ∪ A2) = P(A1) + P(A2).

(4a) If A1 ⊇ A2 ⊇ ... ∈ F0 with ∩₁^∞ An = ∅, then lim_{n→∞} P(An) = 0.

(4b) If A1, A2, ... ∈ F0 are disjoint and ∪_{k=1}^∞ Ak ∈ F0, then
\[
P\big(\cup_{k=1}^{\infty} A_k\big) = \sum_{k=1}^{\infty} P(A_k).
\]

(4c) If A1, A2, ... ∈ F0 are disjoint and ∪_{k=1}^∞ Ak = Ω, then Σ_{k=1}^∞ P(Ak) = 1.

Then one can extend P to be defined, in one and only one way, for all sets in the σ-field F generated by F0, so that it still satisfies the probability axioms.

We can now state and prove the existence of probability measures on the real line with a given distribution function.

Theorem A:2 Let F(x) be a statistical distribution function on R, i.e. a non-decreasing, right-continuous function with F(−∞) = 0, F(∞) = 1. Then there exists exactly one probability measure P on (R, B), such that P((a, b]) = F(b) − F(a).

Proof: We shall use the extension Theorem A:1. The Borel sets B equal the σ-field generated by the field F0 of unions of finitely many intervals. Equation (A.1), extended by singletons, defines P for each set in F0, and it is easily checked that properties (1), (2), and (3) hold. The only difficult part is (4a), which we prove by contradiction.

The idea is to use Cantor's theorem that every decreasing sequence of compact, non-empty sets has a non-empty intersection. Assume, for a decreasing sequence of sets An ∈ F0, that P(An) ↓ h > 0. We show that then the intersection of the An-sets is not empty. Each An consists of finitely many half open intervals. It is then possible to remove from An a short piece from the left end, to make it closed and bounded, i.e. there exists a compact, nonempty Kn ⊂ An, such that
\[
P(A_n - K_n) \le \varepsilon/2^n.
\]
(Convince yourself that P(An − Kn) is defined.) Then
\[
L_m = \cap_1^m K_n \subseteq K_m \subseteq A_m
\]
form a decreasing sequence,
\[
L_1 \supseteq L_2 \supseteq \ldots.
\]
If we can prove that the Lm can be taken nonempty, we can use Cantor's theorem and conclude that they have a nonempty intersection, i.e. there exists at least one point x ∈ ∩₁^∞ Lm, which also implies x ∈ ∩₁^∞ Am, so the Ak do not decrease to the empty set. The proof would be finished.

Hence, it remains to prove that we can choose each Lm nonempty. Take ε < h. Then
\[
P(A_m - L_m) = P(A_m - \cap_1^m K_n) = P\big(\cup_1^m (A_m - K_n)\big)
\le \sum_1^m P(A_m - K_n) \le \sum_1^m P(A_n - K_n) \le \sum_1^m \varepsilon/2^n \le \varepsilon,
\]
\[
P(L_m) = P(A_m) - P(A_m - L_m) \ge h - \varepsilon > 0,
\]
and Lm is non-empty. □


A.3 Kolmogorov’s extension to R∞

Kolmogorov's existence, or extension, theorem from 1933 allows us to define a stochastic process through its family of finite-dimensional distribution functions. Kolmogorov's book [20] appeared after a period of about 30 years of attempts to give probability theory a solid mathematical foundation; in fact, Hilbert's sixth problem (1900) asked for a logical investigation of the axioms of probability.

Theorem A:3 Extension formulation: Every consistent family {Pn} of probability measures on (Rⁿ, Bⁿ), n = 1, 2, ..., can be uniquely extended to a probability measure P on (R^∞, B^∞), i.e. in such a way that
\[
P\big((a_1, b_1]\times(a_2, b_2]\times\ldots\times(a_n, b_n]\times R^{\infty}\big)
= P_n\big((a_1, b_1]\times(a_2, b_2]\times\ldots\times(a_n, b_n]\big).
\]
Existence formulation: To every consistent family of finite-dimensional distribution functions, F = {F_{t_n}}_{n=1}^{∞}, there exists one and only one probability measure P on (R^∞, B^∞) with
\[
P(x_1 \le b_1, \ldots, x_n \le b_n) = F_n(b_1, \ldots, b_n).
\]

Proof: We prove first the Extension formulation. We are given one probability measure Pn on each of (Rⁿ, Bⁿ), n = 1, 2, .... Consider the intervals in R^∞, i.e. the sets of the form
\[
I = \{x = (x_1, x_2, \ldots);\; x_i \in (a_i, b_i],\; i = 1, \ldots, n\},
\]
for some n, and unions of a finite number of intervals. Define P for each interval by
\[
P(I) = P_n\Big(\prod_1^n (a_i, b_i]\Big).
\]
Let I1 and I2 be two disjoint intervals. They may have different dimensions (n1 ≠ n2), but setting suitable ai or bi equal to ±∞, we may assume that they have the same dimension. The consistency of the family {Pn} guarantees that this does not change their probabilities, and that the additivity property holds, that is, if also I1 ∪ I2 is an interval, then P(I1 ∪ I2) = P(I1) + P(I2). It is easy to extend P with additivity to all finite unions of intervals. By this we have defined P on the field F0 of finite unions of intervals, and checked that properties (1), (2), and (3) of Theorem A:1 hold.

Now check property (4a) in the same way as for Theorem A:2, for a decreasing sequence of non-empty intervals with empty intersection,
\[
I_1 \supseteq I_2 \supseteq \ldots, \quad\text{with } \cap_1^{\infty} I_n = \emptyset,
\]
and suppose P(In) ↓ h > 0.¹ We can always assume In to have dimension n,
\[
I_n = \{x \in R^{\infty};\; a_j^{(n)} < x_j \le b_j^{(n)},\; j = 1, \ldots, n\},
\]
and we can always assume the aj and bj to be bounded. As in the proof of Theorem A:2, remove a small piece of the lower side of each interval In to get a compact Kn, and define Lm = ∩₁^m Kn. By removing a small enough piece one can obtain that P(Lm) ≥ h/2 > 0, so Lm is non-empty.

If we write
\[
\begin{aligned}
L_1:\;& \alpha_1^{(1)} \le x_1 \le \beta_1^{(1)},\\
L_2:\;& \alpha_1^{(2)} \le x_1 \le \beta_1^{(2)},\; \alpha_2^{(2)} \le x_2 \le \beta_2^{(2)},\\
L_3:\;& \alpha_1^{(3)} \le x_1 \le \beta_1^{(3)},\; \alpha_2^{(3)} \le x_2 \le \beta_2^{(3)},\; \alpha_3^{(3)} \le x_3 \le \beta_3^{(3)},\\
&\;\;\vdots
\end{aligned}
\]
then, for each j, [α_j^{(n)}, β_j^{(n)}], n = j, j+1, ..., is a decreasing sequence of non-empty, closed and bounded intervals, and by Cantor's theorem they have at least one common point, x_j ∈ ∩_{n=j}^{∞}[α_j^{(n)}, β_j^{(n)}]. Then x = (x1, x2, ...) ∈ Ln for all n. Hence x ∈ Ln ⊆ In for all n and the intersection ∩₁^∞ In is not empty. This contradiction shows that P(In) ↓ 0, and (4a) is shown to hold.

The conditions (1), (2), (3), and (4a) of Theorem A:1 are all satisfied, and hence P can be extended to the σ-field F generated by the intervals.

To get the Existence formulation, just observe that the family of finite-dimensional distributions uniquely defines Pn on (Rⁿ, Bⁿ), and use the extension.

Exercises

A:1. Let Z be the integers, and A the family of subsets A such that either A or its complement Ac is finite. Let P(A) = 0 in the first case and P(A) = 1 in the second case. Show that P cannot be extended to a probability on σ(A), the smallest σ-field that contains A.

¹ Property (4a) deals with a decreasing sequence of finite unions of intervals. It is easy to convince oneself that it suffices to show that (4a) holds for a decreasing sequence of intervals.


Appendix B

Stochastic convergence

Here we summarize the basic types of stochastic convergence and the ways we have to check the convergence of a random sequence with specified distributions.

Definition B:1 Let {xn}_{n=1}^{∞} be a sequence of random variables x1(ω), x2(ω), ... defined on the same probability space, and let x = x(ω) be a random variable defined on the same probability space. Then the convergence xn → x as n → ∞ can be defined in three ways:

• almost surely, with probability one (xn a.s.→ x): P({ω; xn → x}) = 1;

• in quadratic mean (xn q.m.→ x): E(|xn − x|²) → 0;

• in probability (xn P→ x): for every ε > 0, P(|xn − x| > ε) → 0.

Furthermore, xn tends in distribution to x (in symbols xn L→ x) if
\[
P(x_n \le a) \to P(x \le a)
\]
for all a such that P(x ≤ u) is a continuous function of u at u = a.

B.1 Criteria for convergence almost surely

In order for a random sequence xn to converge almost surely (i.e. with probability one) to the random variable x, it is necessary and sufficient that
\[
\lim_{m\to\infty} P\big(|x_n - x| > \delta \text{ for at least one } n \ge m\big) = 0 \tag{B.1}
\]
for every δ > 0.


To prove this, note that if ω is an outcome such that the real sequence xn(ω) does not converge to x(ω), then
\[
\omega \in \bigcup_{q=1}^{\infty}\bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}\big\{|x_n(\omega) - x(\omega)| > 1/q\big\}.
\]
Here, the innermost event has probability
\[
P\big(\cup_{n=m}^{\infty}\{|x_n(\omega) - x(\omega)| > 1/q\}\big) = P\big(|x_n - x| > 1/q \text{ for at least one } n \ge m\big),
\]
and this is 0 for all q if and only if (B.1) holds for all δ > 0. The reader should complete the argument, using that P(∪k Ak) = 0 if and only if P(Ak) = 0 for all k, and the fact that if B1 ⊇ B2 ⊇ ... is a non-increasing sequence of events, then P(∩k Bk) = lim_{k→∞} P(Bk). Now,
\[
P\big(|x_n - x| > \delta \text{ for at least one } n \ge m\big) \le \sum_{n=m}^{\infty} P(|x_n - x| > \delta),
\]
and hence a simple sufficient condition for (B.1), and thus a sufficient condition for almost sure convergence, is that for all δ > 0,
\[
\sum_{n=1}^{\infty} P(|x_n - x| > \delta) < \infty. \tag{B.2}
\]
(In fact, the first Borel-Cantelli lemma directly shows that (B.2) is sufficient for almost sure convergence.)

A simple moment condition is obtained from the inequality P(|xn − x| > δ) ≤ E(|xn − x|^h)/δ^h, giving that a sufficient condition for almost sure convergence is
\[
\sum_{n=1}^{\infty} E(|x_n - x|^h) < \infty, \tag{B.3}
\]
for some h > 0.

A Cauchy convergence type of sufficient condition for almost sure convergence is the following: if there exist two sequences of positive numbers δn and εn such that Σ_{n=1}^∞ δn < ∞ and Σ_{n=1}^∞ εn < ∞, and such that
\[
P(|x_{n+1} - x_n| > \delta_n) < \varepsilon_n, \tag{B.4}
\]
then there exists a random variable x such that xn a.s.→ x.

To see this, use the Borel-Cantelli lemma to conclude that

P (|xn+1 − xn| > δn for infinitely many n) = 0.

Thus, for almost all ω, there is a number N, depending on the outcome ω, such that
\[
|x_{n+1} - x_n| < \delta_n \quad\text{for all } n \ge N.
\]
Since Σ δn < ∞, the sequence xn(ω) converges to a limit x(ω) for these outcomes. For ω where the limit does not exist, set x(ω) = 0, for example. Then xn a.s.→ x, as was to be proved.


B.1.0.1 Uniform convergence of random functions

A sequence of random variables can converge almost surely, and we have just given sufficient conditions for this. But we shall also need convergence of a sequence of random functions {xn(t); t ∈ T}, where T = [a, b] is a closed bounded interval.

Definition B:2 A sequence of functions {xn(t); a ≤ t ≤ b} converges uniformly to the function {x(t); a ≤ t ≤ b} if
\[
\max_{a\le t\le b}|x_n(t) - x(t)| \to 0, \quad\text{as } n\to\infty,
\]
that is, if xn lies close to the limiting function x in the entire interval [a, b] for all sufficiently large n.

It is a basic result in real analysis that if a sequence of continuous functions converges uniformly in a closed and bounded interval, then the limiting function is also continuous. This fact will be useful when we show the almost sure sample function continuity of a random function.

Condition (B.4) can be restated to deal with almost sure uniform convergence of random functions: if there exist two sequences of positive numbers δn and εn such that Σ_{n=1}^∞ δn < ∞ and Σ_{n=1}^∞ εn < ∞, and such that
\[
P\Big(\max_{a\le t\le b}|x_{n+1}(t) - x_n(t)| > \delta_n\Big) < \varepsilon_n, \tag{B.5}
\]
then there exists a random function x(t), a ≤ t ≤ b, such that xn(t) a.s.→ x(t) uniformly for t ∈ [a, b].

B.2 Criteria for convergence in quadratic mean

Some of the representation theorems for stationary processes express a process as a complex stochastic integral, defined as a limit in quadratic mean of approximating sums of complex-valued random variables. To define a quadratic mean integral, or other limit of that kind, one needs simple convergence criteria for when xn q.m.→ x for a sequence of random variables with E(|xn|²) < ∞.

The Cauchy convergence criterion for convergence in quadratic mean states that a necessary and sufficient condition for the existence of a (possibly complex) random variable x such that xn q.m.→ x is that
\[
E(|x_m - x_n|^2) \to 0, \tag{B.6}
\]
as n and m tend to infinity, independently of each other. (In mathematical language, this is the completeness of the space L².)

The limit x has E(|x|²) = lim E(|xn|²) < ∞, and E(xn) → E(x). If there are two convergent sequences, xn q.m.→ x and yn q.m.→ y, then
\[
E(x_n\overline{y_n}) \to E(x\overline{y}). \tag{B.7}
\]
To show quadratic mean convergence of stochastic integrals, the following criterion is useful:

the Loève criterion: the sequence xn converges in quadratic mean if and only if
\[
E(x_m\overline{x_n}) \text{ has a finite limit } c, \tag{B.8}
\]
when m and n tend to infinity independently of each other.

The "if" part follows from
\[
E(|x_m - x_n|^2) = E(x_m\overline{x_m}) - E(x_m\overline{x_n}) - E(x_n\overline{x_m}) + E(x_n\overline{x_n}) \to c - c - c + c = 0.
\]
The "only if" part follows from E(xm x̄n) → E(x x̄) = E(|x|²).

B.3 Criteria for convergence in probability

Both almost sure convergence and convergence in quadratic mean imply convergence in probability. Further, if xn P→ x, then there exists a subsequence nk → ∞ as k → ∞, such that xnk a.s.→ x.

To prove this, we use criterion (B.2). Take any sequence εk > 0 such that
\[
\sum_{k=1}^{\infty}\varepsilon_k < \infty.
\]
If xn P→ x, take any δ > 0 and consider P(|xn − x| > δ) → 0 as n → ∞. The meaning of the convergence is that for each εk there is an N_{εk} such that
\[
P(|x_n - x| > \delta) < \varepsilon_k,
\]
for all n ≥ N_{εk}. In particular, with nk = N_{εk}, one has
\[
\sum_{k=1}^{\infty} P(|x_{n_k} - x| > \delta) < \sum_{k=1}^{\infty}\varepsilon_k,
\]
which is finite by construction. The sufficient criterion (B.2) gives the desired almost sure convergence of the subsequence x_{n_k}.

Exercises

B:1. Prove the Borel-Cantelli lemma:


a) If Ak are events in a probability space (Ω, F, P), then Σk P(Ak) < ∞ implies P(Ak infinitely often) = 0.

b) If the events Ak are independent, then Σk P(Ak) = ∞ implies P(Ak infinitely often) = 1.

B:2. Let x1, x2, ... be independent identically distributed random variables. Show that E(|xk|) < ∞ if and only if P(|xk| > k infinitely often) = 0.

B:3. Suppose the random sequences xn and x′n have the same distribution. Prove that if xn a.s.→ x, then there exists a random variable x′ such that x′n a.s.→ x′.


Appendix C

Hilbert space and random variables

C.1 Hilbert space and scalar products

A Hilbert space is a set of elements which can be added and multiplied by complex numbers, and for which there is defined an inner product. The inner product in a Hilbert space has the same mathematical properties as the covariance between two random variables with mean zero, and therefore it is natural to think of random variables as elements in a Hilbert space. We summarize here the basic properties of a Hilbert space, for use in Chapters 3 and 4. For further reading on Hilbert spaces and on metric spaces, see e.g. the classical book by Royden [29].

Definition C:1 A general Hilbert space H over the complex numbers C is a set of elements, usually called points or vectors, with the following properties:

1. The operations addition and subtraction are defined, there exists a unique "zero" element 0 ∈ H, and to each x ∈ H there is a unique inverse −x:
\[
x + y = y + x \in H, \qquad x + 0 = x, \qquad x + (-x) = 0.
\]

2. Multiplication with a complex scalar is defined (usually written cx = c·x):
\[
c\cdot x \in H, \qquad 0\cdot x = 0, \qquad 1\cdot x = x.
\]

3. A scalar (inner) product (x, y) is defined such that:
\[
(x, y) = \overline{(y, x)} \in C, \quad (ax + by, z) = a(x, z) + b(y, z), \quad (x, x) \ge 0, \quad (x, x) = 0 \text{ if and only if } x = 0.
\]

4. A norm ‖x‖ and a distance d(x, y) = ‖x − y‖ are defined, and convergence has the standard meaning: if x ∈ H then ‖x‖ = (x, x)^{1/2}, and if xn, x ∈ H then lim_{n→∞} xn = x if and only if ‖xn − x‖ → 0.

5. The space is complete in the sense that if xn ∈ H and ‖xm − xn‖ → 0 as m, n → ∞, then there is a point x ∈ H such that lim_{n→∞} xn = x.

Remark C:1 If H is a space that satisfies (1-3) in the definition, then it can be completed and made a Hilbert space that also satisfies (5).

We list some further properties of Hilbert spaces and scalar products, which will be seen to have parallels as concepts for random variables:

Schwarz inequality: |(x, y)| ≤ ‖x‖·‖y‖, with equality if and only if (y, x)x = (x, x)y,

Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖,

Continuity: if xn → x and yn → y, then (xn, yn) → (x, y),

Pythagorean theorem: if x and y are orthogonal, i.e. (x, y) = 0, then
\[
\|x + y\|^2 = \|x\|^2 + \|y\|^2.
\]

C.1.0.2 Linear subspaces:

Let L = {xj ∈ H; j = 1, 2, ...} be a set of elements in a Hilbert space H, and let
\[
\mathcal{M}_0 = \{a_1x_1 + \ldots + a_kx_k;\; k = 1, 2, \ldots;\; a_j \in C\}
\]
be the family of all finite linear combinations of elements in L. Then
\[
\mathcal{M} = \overline{\mathcal{M}_0} = S(L) = \Big\{x \in H;\; x = \lim_{n\to\infty} x_n \text{ for some } x_n \in \mathcal{M}_0\Big\}
\]
is called the subspace of H spanned by L. It consists of all elements in H which are linear combinations of elements in L or are limits of such linear combinations. It is a subspace in the sense that it is closed under addition, multiplication by scalar, and passage to a limit.


C.2 Projections in Hilbert space

Two elements in a Hilbert space are called orthogonal, written x ⊥ y, if (x, y) = 0. Two subsets L1 and L2 are said to be orthogonal, L1 ⊥ L2, if all elements x ∈ L1 are orthogonal to all elements y ∈ L2. Similarly, two subspaces M1 and M2 are orthogonal, M1 ⊥ M2, if all elements in M1 are orthogonal to all elements in M2. The reader should check that if L1 ⊥ L2, then S(L1) ⊥ S(L2).

For a sequence of subspaces M1, ..., Mk of H, write
\[
\mathcal{V} = \mathcal{M}_1 \oplus \ldots \oplus \mathcal{M}_k
\]
for the vector sum of M1, ..., Mk, which is the set of all vectors x1 + ... + xk, where xj ∈ Mj, for j = 1, ..., k.

C.2.0.3 The projection theorem

Let M be a subspace of a Hilbert space H, and let x be a point in H not in M. Then x can be written in exactly one way as a sum
\[
x = y + z
\]
with y ∈ M and z = (x − y) ⊥ M. Furthermore, y is the point in M which is closest to x,
\[
d(x, y) = \min_{w\in\mathcal{M}} d(x, w),
\]
and the minimum is attained only for w = y.

The most common use of the projection theorem is to approximate a point x in a general Hilbert space by a linear combination, or a limit thereof, of a finite or infinite number of certain elements in H.

C.2.0.4 Separable spaces and orthogonal bases

A Hilbert space H is called separable if it contains a countable set of elements x1, x2, ... such that the subspace spanned by all the xj is equal to H. If the x-variables are linearly independent, i.e. there is no non-trivial linear combination a1x1 + ... + anxn equal to 0, it is possible to find orthogonal elements y1, y2, ..., such that
\[
\begin{aligned}
y_1 &= c_{11}x_1,\\
y_2 &= c_{21}x_1 + c_{22}x_2,\\
&\;\;\vdots\\
y_n &= c_{n1}x_1 + c_{n2}x_2 + \ldots + c_{nn}x_n,\\
&\;\;\vdots
\end{aligned}
\]


This is the Gram-Schmidt orthogonalization process, and orthogonal means that
\[
(y_j, y_k) = \delta_{jk} = \begin{cases} 1, & j = k,\\ 0, & j \neq k.\end{cases}
\]
The sequence y1, y2, ... is called a complete orthogonal basis for the Hilbert space H. It is a basis, i.e. every element in H can be written as a linear combination of yk-elements or as a limit of such combinations, and it is orthogonal by construction. It is furthermore complete, i.e. there is no element z ∈ H such that
\[
\|z\| > 0, \quad (z, y_j) = 0 \text{ for all } j.
\]
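The Gram-Schmidt construction is easy to carry out numerically. The following minimal numpy sketch (my own illustration, not from the notes) uses ordinary vectors and the Euclidean inner product in place of Hilbert-space elements and (x, y).

\begin{verbatim}
# Minimal sketch: Gram-Schmidt orthonormalization of linearly independent
# columns x_1, ..., x_n.
import numpy as np

def gram_schmidt(X):
    """Columns of X are x_1, ..., x_n; returns Y with orthonormal columns."""
    Y = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        v = X[:, j].astype(float)
        for k in range(j):
            v = v - np.dot(Y[:, k], X[:, j]) * Y[:, k]   # remove projections
        Y[:, j] = v / np.linalg.norm(v)
    return Y

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Y = gram_schmidt(X)
print(np.round(Y.T @ Y, 10))   # identity matrix: (y_j, y_k) = delta_jk
\end{verbatim}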

C.3 Stochastic processes and Hilbert spaces

A Hilbert space is a set of elements which can be added and multiplied by complex numbers, and for which there is defined an inner product. The inner product in a Hilbert space has the same mathematical properties as the covariance between two random variables with mean zero, and therefore it is natural to think of random variables as elements in a Hilbert space; see the preceding sections of this appendix for a summary of elementary properties of Hilbert spaces.

We shall consider a very special Hilbert space, namely the space of all random variables x on a probability space (Ω, F, P) which have zero mean and finite variance.

Theorem C:1 If (Ω, F, P) is a probability space, then
\[
H = \{\text{random variables } x \text{ on } (\Omega,\mathcal{F},P) \text{ such that } E(x) = 0,\; E(|x|^2) < \infty\}
\]
with the scalar product
\[
(x, y) = E(x\overline{y})
\]
is a Hilbert space; it will be denoted H(Ω).

First, it is clear that (x, y) = E(x ȳ) has the properties of a scalar product; check that. It is also clear that we can add random variables with mean zero and finite variance to obtain new random variables with the same properties. Also, ‖x‖ = √E(|x|²), which means that if ‖x‖ = 0, then P(x = 0) = 1, so random variables which are zero with probability one are, in this context, defined to be equal to the zero element 0. Convergence in the norm ‖·‖ is equal to convergence in quadratic mean of random variables, and if a sequence of random variables xn is a Cauchy sequence, i.e. ‖xm − xn‖ → 0 as m, n → ∞, then we know that it converges to a random variable x with finite mean, which means that H(Ω) is complete. Therefore it has all the properties of a Hilbert space.


C.3.0.5 A stochastic process as a curve in H(Ω)

A random variable with E(x) = 0 and finite variance is a point in the Hilbert space H(Ω). Two equivalent random variables x and y are represented by the same point in H(Ω), since P(x = y) = 1 and hence ‖x − y‖² = E(|x − y|²) = 0.

A stochastic process is a family of random variables, and thus a stochastic process {x(t), t ∈ R} with one-dimensional parameter t is a curve in H(Ω). Further, from the definition of the norm ‖x‖ = √E(|x|²), we see that convergence in this norm is equivalent to convergence in quadratic mean. In other words, if a stochastic process is continuous in quadratic mean, then the corresponding curve in H(Ω) is continuous.

C.3.0.6 The generated subspace

A set of points in a Hilbert space generates a subspace, which consists of all finite linear combinations and their limits. If {x(t); t ∈ T} is a stochastic process, write
\[
H(x) = S(x)
\]
for the subspace spanned by x(·). Also, for a process {x(t), t ∈ R}, define
\[
H(x, t) = S(x(s); s \le t)
\]
as the subspace spanned by all variables observed up till time t. At time t it contains all variables which can be constructed by linear operations on the available observations. Examples of random variables in H(x, t) are

\[
\frac{x(t) + x(t-1) + \ldots + x(t-n+1)}{n}, \qquad
\int_{-\infty}^{t} e^{-(t-u)}x(u)\,du, \qquad x'_-(t) + 3x''_-(t),
\]
where x'₋(t), x''₋(t) denote left derivatives.

Example C:1 Take an MA(1)-process, i.e. from uncorrelated variables
\[
e(t), \quad t = \ldots, -1, 0, 1, 2, \ldots,
\]
with E(e(t)) = 0, V(e(t)) = 1, we construct
\[
x(t) = e(t) + b_1 e(t-1).
\]
If |b1| < 1, the process can be inverted and e(t) simply retrieved from x(s), s ≤ t:
\[
e(t) = x(t) - b_1 e(t-1) = x(t) - b_1\big(x(t-1) - b_1 e(t-2)\big)
= \sum_{k=0}^{n}(-b_1)^k x(t-k) + (-b_1)^{n+1} e(t-n-1) = y_n(t) + z_n(t), \text{ say.}
\]


Here,
\[
y_n(t) \in S(x(s); s = t-n, \ldots, t) \subseteq S(x(s); s \le t) = H(x, t),
\]
while
\[
\|z_n(t)\| = |b_1|^{n+1} \to 0
\]
as n → ∞. Thus e(t) − yn(t) → 0, and we have that
\[
e(t) = \sum_{k=0}^{\infty}(-b_1)^k x(t-k) = \lim_{n\to\infty}\sum_{k=0}^{n}(-b_1)^k x(t-k) \in H(x, t)
\]
if |b1| < 1. The representation of e(t) as a limit of finite linear combinations of x(t − k)-values is explicit and obvious.
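The quality of the truncated inversion is easy to check by simulation. The Python sketch below is my own numerical illustration of the identity above; the value of b1, the series length, and the truncation point n are arbitrary choices.

\begin{verbatim}
# Minimal sketch: for |b1| < 1 the truncated sum y_n(t) = sum_k (-b1)^k x(t-k)
# recovers e(t) with quadratic-mean error |b1|^(n+1).
import numpy as np

rng = np.random.default_rng(3)
b1, T, n = 0.7, 10000, 30
e = rng.standard_normal(T)
x = e.copy()
x[1:] += b1 * e[:-1]                     # x(t) = e(t) + b1 * e(t-1)

t = np.arange(n, T)
y_n = sum((-b1) ** k * x[t - k] for k in range(n + 1))
rms_error = np.sqrt(np.mean((y_n - e[t]) ** 2))
print(rms_error, abs(b1) ** (n + 1))     # the two numbers are comparable
\end{verbatim}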

For |b1| = 1 it is less obvious that e(t) ∈ H(x, t), but it is still possible to represent e(t) as a limit. For example, if b1 = −1, then x(t) = e(t) − e(t − 1), and zn(t) = e(t − n − 1) does not converge to anything. But in any case,
\[
e(t) = \sum_{k=0}^{n} x(t-k) + e(t-n-1),
\]
and so, since the left hand side does not depend on n,
\[
e(t) = \frac{1}{N}\sum_{n=1}^{N} e(t)
= \frac{1}{N}\sum_{n=1}^{N}\sum_{k=0}^{n} x(t-k) + \frac{1}{N}\sum_{n=1}^{N} e(t-n-1)
= \sum_{k=0}^{N}\Big(1 - \frac{k}{N}\Big)x(t-k) + \frac{1}{N}\sum_{n=1}^{N} e(t-n-1) = y_N(t) + z_N(t).
\]
Now, zN(t) = (1/N) Σ_{n=1}^{N} e(t − n − 1) = e(t) − yN(t) → 0 by the law of large numbers, since all e(t) are uncorrelated with E(e(t)) = 0 and V(e(t)) = 1. We have shown that e(t) is in fact the limit of finite linear combinations of x(s)-variables, i.e. e(t) ∈ H(x, t).


Appendix D

Spectral simulation of random processes

D.1 The Fast Fourier Transform, FFT

A stationary process {x(t), t ∈ R} with continuous spectrum f(ω) can be efficiently simulated by Fourier methods from its spectral representation. One then has to discretize the continuous spectrum and use the approximation (4.29) from Section 4.3.3.

Fourier simulation is most effectively performed with the help of the Fast Fourier Transform (FFT), or rather the inverse transform. This algorithm transforms a sequence of real or complex numbers Z(0), Z(1), ..., Z(N − 1) into its (inverse) discrete Fourier transform
\[
z(n) = \sum_{k=0}^{N-1} Z(k)\exp(i2\pi kn/N), \tag{D.1}
\]
for n = 0, 1, ..., N − 1, where the integer N is a power of 2, N = 2^m. In the literature, there are as many ways to write the Fourier sum as there are combinatorial possibilities, with or without a factor N in the denominator and with or without a minus sign in the exponential function. Almost every mathematical computer software toolbox contains efficient algorithms to perform the FFT according to (D.1).

The basis for the use of (D.1) to generate a sample sequence lies in the representation of a stationary process as an approximating sum of harmonic functions with random phase and amplitude; see (4.29) and the alternative form (4.30). The Z(k) will then be chosen as complex random variables with absolute value and argument equal to the desired amplitude and phase. When using the formula for simulation purposes, there are however a number of questions which have to be resolved, concerning the relation between the sampling interval and the frequency resolution, as well as the aliasing problem.

Before we describe the steps in the simulation we repeat the basic facts about processes with discrete spectrum, and the special problems that arise when sampling a continuous time process.

D.2 Random phase and amplitude

To see how (D.1) can be used to generate a sample function, we consider first the special stationary process (4.18) with discrete spectrum in Section 4.3.3, or the normalized form (4.30). Including the spectral jump at zero frequency it has the form
\[
x(t) = \rho_0 + \sum_{k=1}^{\infty}\rho_k\cos(\omega_k t + \phi_k). \tag{D.2}
\]
Here ρ0 is a random level shift, while {ρk} are the amplitudes and {φk} the phases of the different harmonic components of x(t). The frequencies ωk > 0 can be any set of fixed positive frequencies.

If we define
\[
Z(0) = \rho_0, \qquad Z(k) = \rho_k\exp(i\phi_k), \quad\text{for } k = 1, 2, \ldots,
\]
it is easy to see that x(t) in (D.2) is the real part of a complex sum, so if we write y(t) for the imaginary part, then
\[
x(t) + iy(t) = \sum_{k=0}^{\infty} Z(k)\exp(i\omega_k t). \tag{D.3}
\]
We repeat the fundamental properties of this representation.

If the amplitudes and phases in (D.2) are independent and the phases φk are uniformly distributed over [0, 2π), then {x(t), t ∈ R} is stationary and has a discrete spectral distribution with mass σk² = ½E(ρk²) and σ0² = E(ρ0²) at the frequencies ωk > 0 and ω0 = 0, respectively. Further, Z(k) = ρk exp(iφk) = σk(Uk + iVk) have the desired properties if the real and imaginary parts are independent standardized Gaussian random variables, with E(Uk) = E(Vk) = 0 and variance V(Uk) = V(Vk) = 1.

It is possible to approximate every spectral distribution by a discrete spectrum. The corresponding process is then an approximation of the original process.

D.3 Aliasing

If a stationary process {x(t), t ∈ R} with continuous two-sided spectral density fx(ω) is sampled with a sampling interval d, the sequence {x(nd), n = 0, ±1, ...} has a spectral density fx^{(d)}(ω) that can be restricted to any interval of length 2π/d, for example the interval (−π/d, π/d]. There it can be written as a folding of the original spectral density,
\[
f_x^{(d)}(\omega) = \sum_{j=-\infty}^{\infty} f_x\Big(\omega + \frac{2\pi j}{d}\Big), \quad\text{for } -\pi/d < \omega \le \pi/d.
\]
The corresponding one-sided spectral density gx^{(d)}(ω) can then be defined on [0, π/d] as
\[
f_x^{(d)}(\omega) + f_x^{(d)}(-\omega).
\]
For reasons that will become clear later (Section D.6) we prefer to define it instead on [0, 2π/d) by
\[
g_x^{(d)}(\omega) = \sum_{j=-\infty}^{\infty} f_x\Big(\omega + \frac{2\pi j}{d}\Big), \quad\text{for } 0 \le \omega < 2\pi/d. \tag{D.4}
\]

D.4 Simulation scheme

In view of (D.1) and (D.3) we would like to generate a finite part of the sum in (D.3) to get z(n), and then take the real part to get x(t) for t = nd, n = 0, 1, ..., N − 1. To see the analogy clearly we repeat the expressions:
\[
z(n) = \sum_{k=0}^{N-1} Z(k)\exp(i2\pi kn/N), \tag{D.5}
\]
\[
x(t) = \mathrm{Re}\sum_{k=0}^{\infty} Z(k)\exp(i\omega_k t). \tag{D.6}
\]
Here is the scheme to follow.

We have: A real spectral density gx(ω) for ω ≥ 0 for a stationary process {x(t), t ∈ R}.

We want: A discrete time sample x(nd), n = 0, 1, ..., N − 1, of {x(t), t ∈ R}, of size N = 2^m, with sampling interval d, equally spaced over the time interval [0, T) with T = Nd.

Means: Generate random variables Z(k) = σk(Uk + iVk), k = 0, 1, ..., N − 1, with distribution described below, and take z(n) = Σ_{k=0}^{N−1} Z(k) exp(i2πkn/N) according to (D.1). Then set
\[
x(nd) = \mathrm{Re}\,z(n), \quad n = 0, 1, \ldots, N-1.
\]

This will give the desired realization.


D.5 Difficulties and details

The Fourier simulation scheme raises a number of questions which have to be dealt with before it can be implemented. Here we shall comment on the important issues.

Frequency spacing: We have requested N time points regularly spaced in [0, T) in steps of d, and we want to use the special sum (D.1). This will impose a restriction both on the frequency spacing and on the maximum frequency accounted for. Comparing (D.5) and (D.6), bearing in mind that t = nd, we find that only frequencies of the form
\[
\omega_k = \frac{2\pi k}{Nd} = \frac{2\pi k}{T} \quad\text{for } k = 0, 1, \ldots, N-1,
\]
appear in the simulation, and further that the highest frequency in the sum is 2π(N − 1)/(dN), just barely below
\[
\omega_{\max} = \frac{2\pi}{d}.
\]

Discretization of spectrum: The continuous spectrum with density gx(ω) has to be replaced by a discrete spectrum with mass only at the frequencies ωk = 2πk/(dN) which enter into the sum (D.5). The mass at ωk should be equal to
\[
\sigma_k^2 = \frac{2\pi}{dN}\,g_x^{(d)}(\omega_k), \quad k = 0, 1, \ldots, N-1. \tag{D.7}
\]

Generation of the Z(n): Generate independent random variables
\[
Z(k) = \sigma_k(U_k + iV_k)
\]
with Uk and Vk from a normal distribution with mean zero and variance 1, for instance by the Box-Muller technique,
\[
U_k = \cos(2\pi R_1)\sqrt{-2\ln R_2}, \qquad V_k = \sin(2\pi R_1)\sqrt{-2\ln R_2},
\]
where R1 and R2 are independent random numbers uniformly distributed in (0, 1].

Aliasing: The restricted frequency range in (D.5) implies that the generated x(nd) will have variance Σ_{k=0}^{N−1} σk², where each σk² is an infinite sum:
\[
\sigma_k^2 = \frac{2\pi}{dN}\sum_{j=-\infty}^{\infty} f_x\Big(\omega_k + \frac{2\pi j}{d}\Big).
\]
In practice one has to truncate the infinite series and use
\[
\sigma_k^2 = \frac{2\pi}{dN}\sum_{j=-J}^{J} f_x\Big(\omega_k + \frac{2\pi j}{d}\Big), \quad k = 0, 1, \ldots, N-1, \tag{D.8}
\]
where J is taken large enough. If fx(ω) ≈ 0 for ω ≥ ωmax one can take J = 0.


D.6 Simulation of the envelope

The Fourier simulation will not only yield a realization of x(nd) = Re z(n) but also of its Hilbert transform y(nd) = Im z(n). Therefore we can get the envelope as a byproduct,
\[
\sqrt{x(nd)^2 + y(nd)^2}.
\]
Thus generation of 2N Gaussian random numbers Uk, Vk, for k = 0, 1, ..., N − 1, will result in 2N useful data points. If the aim is to generate only the x(nd)-series, one could restrict the sum (D.5) to only n = 0, 1, ..., N/2 − 1 and thus generate only N Gaussian variates.

D.7 Summary

In order to simulate a sample sequence of a stationary process {x(t), t ∈ R} with spectral density fx(ω) over a finite time interval one should do the following:

1. Choose the desired time interval [0, T).

2. Choose a sampling interval d or the number of sample points N = 2^m. This will give a sequence of N process values x(nd), n = 0, 1, ..., N − 1.

3. Calculate and truncate the real discretized spectrum
\[
\sigma_k^2 = \frac{2\pi}{dN}\sum_{j=-J}^{J} f_x\Big(\omega_k + \frac{2\pi j}{d}\Big), \quad k = 0, 1, \ldots, N-1,
\]
and take J so large that fx(ω) ≈ 0 for ω > 2π(J + 1)/d.

4. Generate independent standard normal variables Uk, Vk, for k = 0, 1, ..., N − 1, with mean zero and variance 1.

5. Set Z(k) = σk(Uk + iVk) and calculate the (inverse) Fourier transform
\[
z(n) = \sum_{k=0}^{N-1} Z(k)\exp(i2\pi kn/N).
\]

6. Take the real part,
\[
x(nd) = \mathrm{Re}\,z(n), \quad n = 0, 1, \ldots, N-1;
\]
this is the desired sequence.

7. To generate the envelope, take the imaginary part
\[
y(nd) = \mathrm{Im}\,z(n), \quad n = 0, 1, \ldots, N-1;
\]
the envelope is then √(x(nd)² + y(nd)²).
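The seven steps above translate almost line by line into code. The following Python sketch is my own implementation of the scheme under stated assumptions: the example two-sided spectral density fx is an arbitrary low-pass test spectrum, the truncation J = 1 is an illustrative choice, and numpy's inverse FFT (which includes the factor 1/N) is rescaled to match (D.1).

\begin{verbatim}
# Minimal sketch of the spectral simulation scheme (steps 1-7).
import numpy as np

def spectral_simulation(fx, T=500.0, m=12, J=1, rng=None):
    """Simulate x(nd), n = 0..N-1, and its envelope, from a two-sided density fx."""
    rng = np.random.default_rng() if rng is None else rng
    N = 2 ** m
    d = T / N                                     # sampling interval, T = N*d
    omega = 2 * np.pi * np.arange(N) / (N * d)    # omega_k = 2*pi*k/(N*d)

    # discretized, folded and truncated spectrum, (D.7)-(D.8)
    sigma2 = np.zeros(N)
    for j in range(-J, J + 1):
        sigma2 += fx(omega + 2 * np.pi * j / d)
    sigma2 *= 2 * np.pi / (d * N)

    # Z(k) = sigma_k * (U_k + i V_k) with standard normal U_k, V_k
    Z = np.sqrt(sigma2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

    # z(n) = sum_k Z(k) exp(i 2 pi k n / N); numpy's ifft carries a 1/N factor
    z = N * np.fft.ifft(Z)
    x = z.real                                    # simulated sequence x(nd)
    envelope = np.abs(z)                          # sqrt(x^2 + y^2), y = Im z
    return d, x, envelope

# example: two-sided low-pass spectrum fx(omega) = 1 for |omega| <= 1, else 0
fx = lambda w: np.where(np.abs(w) <= 1.0, 1.0, 0.0)
d, x, env = spectral_simulation(fx, rng=np.random.default_rng(4))
print(x.var())   # should be close to 2, the integral of fx over (-1, 1)
\end{verbatim}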


Literature

[1] Adler, R. (1990): An introduction to continuity, extrema, and related topics for general Gaussian processes. IMS Lecture Notes-Monograph Series, Vol. 12.

[2] Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004): Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, Boca Raton.

[3] Belyaev, Yu.K. (1959): Analytic random processes. Theory Probab. and its Applications, English edition, 4, 402.

[4] Belyaev, Yu.K. (1961): Continuity and Hölder's conditions for sample functions of stationary Gaussian processes. Proc. Fourth Berk. Symp. on Math. Stat. and Probability, 2, 22-33.

[5] Breiman, L. (1968): Probability. Addison-Wesley, Reading. Reprinted 199? in SIAM series.

[6] Cramér, H. (1942): On harmonic analysis in certain function spaces. Arkiv Mat. Astron. Fysik, 28B, no. 12.

[7] Cramér, H. (1945): On the theory of stochastic processes. Proc. Tenth Scand. Congr. of Math., Copenhagen, pp. 28-39.

[8] Cramér, H. (1945): Mathematical Methods of Statistics. Princeton University Press.

[9] Cramér, H. and Leadbetter, M.R. (1967): Stationary and related stochastic processes. Wiley, New York. Reprinted by Dover Publications, 2004.

[10] Dobrushin, R.L. (1960): Properties of the sample functions of a stationary Gaussian process. Teoriya Veroyatnostei i ee Primeneniya, 5, 132-134.

[11] Doob, J.L. (1953): Stochastic processes. Wiley, New York.

[12] Durrett, R. (1996): Probability: Theory and examples. Duxbury Press.

[13] Einstein, A. (1905): Investigations on the theory of Brownian movement. Reprinted by Dover Publications, 1956.


[14] Jordan, D.W. and Smith, P. (1999): Nonlinear ordinary differential equations, 3rd Ed. Oxford University Press.

[15] Grenander, U. (1950): Stochastic processes and statistical inference. Arkiv Mat., 1, 195-277.

[16] Gut, A. (1995): An intermediate course in probability. Springer-Verlag.

[17] Ibragimov, I.A. and Linnik, Yu.V. (1971): Independent and stationary sequences of random variables. Wolters-Noordhoff, Groningen.

[18] Ibragimov, I.A. and Rozanov, Y.A. (1978): Gaussian random processes. Springer-Verlag, New York.

[19] Kac, M. and Slepian, D. (1959): Large excursions of Gaussian processes. Ann. Math. Statist., 30, 1215–1228.

[20] Kolmogorov, A. (1933): Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin.

[21] Lasota, A. and Mackey, M.C. (1994): Chaos, fractals, and noise; stochastic aspects of dynamics. Springer-Verlag, New York.

[22] Leadbetter, M.R., Lindgren, G. and Rootzén, H. (1983): Extremes and related properties of random sequences and processes. Springer-Verlag, New York.

[23] Lindgren, G. (1975): Prediction from a random time point. Annals of Probability, 3, 412–423.

[24] Maruyama, G. (1949): The harmonic analysis of stationary stochastic processes. Mem. Fac. Sci. Kyusyu Univ., A4, 45-106.

[25] Petersen, K. (1983): Ergodic Theory. Cambridge University Press, Cambridge.

[26] von Plato, J. (1994): Creating modern probability. Cambridge University Press, Cambridge.

[27] Rice, S.O. (1944, 1945): Mathematical analysis of random noise. Bell System Technical Journal, 23, 282-332, and 24, 46-156. Reprinted in: Wax, N. (1954): Selected papers on noise and stochastic processes. Dover Publications, New York.

[28] Rice, S.O. (1963): Noise in FM-receivers. In: Time Series Analysis, Ed: M. Rosenblatt, Chapter 25, pp. 395-422. Wiley, New York.

[29] Royden, H.L. (1988): Real Analysis, 3rd Ed. Prentice Hall.


[30] Rychlik, I. (2000): On some reliability applications of Rice's formula for the intensity of level crossings. Extremes, 3, 331–348.

[31] Slepian, D. (1963): On the zeros of Gaussian noise. In: Time Series Analysis, Ed: M. Rosenblatt, pp. 104-115. Wiley, New York.

[32] St. Denis, M. and Pierson, W.J. (1954): On the motion of ships in confused seas. Transactions, Soc. Naval Architects and Marine Engineers, Vol. 61, (1953) pp. 280-357.

[33] van Trees, H. (1968): Detection, Estimation, and Modulation Theory, Part I. John Wiley & Sons.

[34] WAFO, Wave Analysis for Fatigue and Oceanography. Available at http://www.maths.lth.se/matstat/wafo/.

[35] Wong, E. and Hajek, B. (1985): Stochastic processes in engineering systems. Springer-Verlag, New York.

[36] Williams, D. (1991): Probability with Martingales. Cambridge University Press, Cambridge.

[37] Williams, D. (2001): Weighing the Odds. Cambridge University Press, Cambridge.

[38] Yaglom, A.M. (1962): An introduction to the theory of stationary random functions. Prentice Hall, Englewood Cliffs.

[39] Øksendal, B. (2000): Stochastic differential equations, an introduction with applications, 5th Ed. Springer-Verlag.


Index

α, spectral width parameter, 63, 77
ε, spectral width parameter, 77
ω
  angular frequency, 2, 14
  elementary outcome, 2
σ-algebra, 3, 175
σ-field, 3, 175
  generated by events, 3
  generated by random variables, 6
  of invariant sets, 139
Øksendal, B., 201
additive
  countably, 4, 175
  finitely, 4
Adler, R., 199
alarm prediction, 64
algebra, see σ-algebra
aliasing, 88, 194, 196
amplitude, 89, 94
  for narrow band process, 118
ARMA-process, 152
auto-covariance function, 161
auto-regressive process, 152
Banach's theorem, 59
band-limited white noise, 151
Banerjee, S., 199
Belyaev, Yu.K., 36, 153, 199
Bennet, W.R., 97
Bessel function, 167
  modified, 169
Birkhoff ergodic theorem, 142
Bochner's theorem, 82
Boltzmann, L., 133
Boole's inequality, 32
Borel
  field in C, 37
  field in R^∞, 9
  field in R^T, 11
  field in R, 3
  measurable function, 6
  set, 3
Borel-Cantelli lemma, 25, 32, 53, 184
Box-Muller's method, 196
Breiman, L., 199
Brown, R., ix, 19
Brownian motion, ix, 19, 112
Bulinskaya, E.V., 43, 56
Cantor's theorem, 177, 179
Carathéodory's extension theorem, 176
Cauchy principal value, 86
characteristic equation, 108
characteristic function, 83
Chebysjev's inequality, 34
complete
  basis, 123, 190
  space, 92, 188
completion
  of probability measure, 5
  of vector space, 188
conditional covariance
  normal distribution, 17, 69
conditional expectation, 7, 64, 141
  horizontal window, 66
  normal distribution, 17, 69
  vertical window, 66
consistent family of distributions, 8


continuity
  absolute, 13
  conditions for, 33
  in L^2, 27, 45, 90
  in quadratic mean, 27, 45, 90
  modulus, 41
  of Gaussian process, 35
  of sample function, 30
  of stationary process, 34, 35
  probabilities on C, 37
  summary of conditions, 49
convergence
  almost surely, 27, 41, 181
  Cauchy condition, 182
  in distribution, 181
  in probability, 27, 181
  in quadratic mean, 27, 181, 190
  Loève criterion, 184
  uniform, of random functions, 183
countable
  additivity, 10
  basis, 11
covariance function, 13
Cramér, H., 21, 24, 151, 199
Cramér-Wold decomposition, 151, 190
cross spectrum, 161
cross-covariance function, 161
cross-spectrum, 162
crossings
  expected number, 21, 57
  finite number, 44
derivative
  as linear filter, 103
  at upcrossing, 56
deterministic process, 151, 153
differentiability
  conditions for, 37
  in quadratic mean, 28, 46
  of Gaussian process, 39
  of sample function, 37, 46
  summary of conditions, 49
differential equation
  linear, 108
directional spectrum, 171
dispersion relation, 23, 96, 170
Dobrushin, R.L., 36, 199
Doob, J.L., 199
Durbin's formula, 74
Durbin, J., 74
Durrett, R., 199
eigenfunction, 122
Einstein, A., ix, 20, 97, 199
ensemble average, 52, 139
envelope, 63, 117
  of narrow band process, 118
  simulation, 197
equivalent processes, 29
ergodicity, 65, 133
  almost surely, 52
  and mixing, 156
  definition, 140
  in continuous time, 147
  in quadratic mean, 52
  Markov chain, 139
  of Gaussian process, 148
  of random sequence, 145
  upcrossings, 67
excursion
  above a level, 73
exponential smoothing, 106
extension
  of probability measure, 5, 176
fast Fourier transform, 193
FFT, 193
field, see σ-field
  homogeneous field, 166
  isotropic random field, 15, 167
  random, 165
filter, 101, 106
  amplification, 103
  causal, 104
  cross-correlation, 165
  linear, 100
  phase shift, 103
  time-invariant, 101


finite-dimensional
  approximation by event, 12
  distribution, 8
FM-noise, 21
folding, 88
Fourier
  integral, 13
  inversion, 86
  transform, 21, 193
frequency, 89
  mean frequency, 63
frequency function for filter, 101
Fubini's theorem, 60
Garsia, A.M., 142
Gaussian process, 16, 96
  continuity, 35
  differentiability, 39
  ergodicity, 148
  Slepian model, 68
  unboundedness, 36
Gelfand, A.E., 199
generating function, 108
geostatistics, 166
Gram-Schmidt orthogonalization, 190
Grenander, U., ix, 24, 127, 200
Gut, A., 200
Hölder condition, 40–42
Hajek, B., 201
Hamming, R.W., 22
harmonic oscillator, 106
Herglotz' lemma, 88
Hermitian function, 80
Hilbert space, 187, 190
  generated by a process, 191
Hilbert transform, 115, 197
Hilbert's sixth problem, 178
homogeneous field
  spectral distribution, 15, 166
  spectral representation, 15, 166
horizontal window conditioning, 66
Ibragimov, I.A., 200
impulse response, 101, 104
inner product, 188, 190
integrability
  conditions for, 49
  in quadratic mean, 49
interval
  in R^∞, 9
invariant
  measure, 137
  random variable, 138
  set, 138
inversion of spectrum
  continuous time, 86
  discrete time, 94
isometry, 90
JONSWAP spectrum, 14, 74, 77, 172
Jordan, D.W., 200
jump discontinuity, 40
Kac and Slepian, horizontal window, 66
Kac, M., 57, 58, 200
Karhunen-Loève expansion, 122
Kolmogorov
  existence theorem, 10, 80, 178
  extension theorem, 178
  Grundbegriffe, 9, 178
  probability axiom, 3, 175
Kolmogorov, A.N., 3, 200
Langevin's equation, 20, 98
Lasota, A., 200
law of large numbers, 192
Leadbetter, M.R., 199, 200
level crossing, 57
likelihood ratio test, 128
Lindgren, G., 200
linear
  filter, 100, 101
  interpolation, 100
  oscillator, 14, 106
  prediction, 18, 100
  process, 104
  regression, 101


  subspace, 188
  time-invariant filter, 102
Linnik, Yu.V., 200
Lipschitz condition, 41
Loève criterion, 45, 50, 184
Lord Rayleigh, 22, 97
m-dependence, 157
Mackey, M.C., 200
Markov's inequality, 34
Maruyama, G., 200
Matérn, B., 169
Mercer's theorem, 127
metrically transitive, 140
mixing
  for Gaussian processes, 157
  relation to ergodicity, 158
  strong, 157
  uniform, 157
  weak, 157
modulo game, 135
  as ergodic sequence, 140
Monotone convergence, 61
Monte Carlo simulation, 135, 193
moving average
  discrete time, 191
  infinite, 105
multivariate normal distribution, 15
narrow banded spectrum, 14, 63
non-deterministic process, 151
non-negative function, 80
non-random walk
  as ergodic sequence, 144
norm, 188
normal distribution
  bivariate, 56
  conditional, 16
  inequalities, 18
  multivariate, 15
nugget effect, 169
observable, 9, 11, 124
Ornstein-Uhlenbeck process, 39, 98, 112, 156
orthogonal
  basis, 189
  increments, 89
oscillator
  harmonic, 106
  linear, 106
Palm distribution, 66
Perrin, J.B., 20
Petersen, K., viii, 200
phase, 89, 94
Pierson, W.J., ix, 22, 201
Pierson-Moskowitz spectrum, 172
point process, 66
Poisson process, 40
prediction, 100
  after upcrossing, 68
  linear, 18
  quadratic mean, 7
primordial
  randomness, 151
  soup, 151
principal components, 121
probability measure, 175
projection
  in Hilbert space, 100, 189
  uncorrelated residuals, 101, 152
  uniqueness, 101
pseudo randomness, 134
Pythagorean theorem, 188
quadratic mean
  continuity, 45
  convergence, 45
    Loève criterion, 45
  differentiability, 46
  integrability, 49
  optimal prediction, 7
random
  amplitude, 89, 94, 194
  moving surface, 23, 169
  phase, 89, 94, 194
  waves, 23, 170
Rayleigh distribution, 117


  for slope at upcrossing, 70
RC-filter, 106
reconstruction, 18
rectangle
  n-dimensional, 4, 25
  generalized, 4, 9
regression approximation, 74
regular process, 150, 151
  as moving average, 154
Rice's formula, 21, 57
  alternative proof, 60
  for Gaussian process, 62
  for non-stationary process, 61
Rice, S.O., ix, 21, 22, 57, 58, 79, 97, 200
Riemann-Lebesgue's lemma, 149
Rootzén, H., 200
Royden, H.L., viii, 200
Rozanov, Y.A., 200
Rychlik, I., 201
sample function average, 52
sampling theorem, 119
scalar product, 188
Schwarz' inequality, 188
second order stationary process, 79
separable
  process, 29
  space, 189
shift transformation, 147
signal detection, 127
simulation
  by Fourier method, 193
  of envelope, 197
singular
  process, 150, 151
  spectral conditions, 154
  spectrum, 153
Slepian model, 57, 68
Slepian, D., 57, 200, 201
Smith, P., 200
spectral
  density, 13, 86
  distribution, 13
    for process, 82
    for random field, 166
    for sequence, 88
    for vector process, 162
  inversion, 86
  moment, 13, 49
  representation
    of process, 21, 89
    of random field, 166
    of sequence, 100
    of vector process, 163
  width, 63, 77
spectrum
  absolutely continuous, 94, 153
  amplitude spectrum, 165
  coherence spectrum, 165
  directional, 171
  discrete, 93, 153
  encountered, 23
  JONSWAP, 14, 74, 77, 172
  one-sided, 87, 95
  phase spectrum, 165
  Pierson-Moskowitz, 172
  singular, 153
  Whittle-Matérn, 169
spreading function, 23, 96, 171
stationary
  intrinsically, 166
stationary process, 13
  as curve in Hilbert space, 191
  complex, 82
  second order, 13
  strictly, 79
  weakly, 13, 79
St. Denis, M., ix, 22, 201
stochastic calculus, 113
stochastic differential equation, 98, 108
Stone-Weierstrass theorem, 92
strong Markov property, 64
subspace
  generated, 191
  spanned, 191


tangents, non-existence of, 43
time average, 52, 138
time-invariant filter, 101
total variation, 59
transfer function, 101
transformation
  generated by stationary process, 137
  generating stochastic process, 136
  measurable, 136
  measure preserving, 136
  quadratic, 135
  shift, 137
triangle inequality, 188
Tukey, J.W., 22
upcrossings
  counting, 60
  expected number, 57
  intensity, 58
  slope at, 56
variogram, 166
vertical window conditioning, 66
von Plato, J., 133, 200
WAFO, 72, 74, 201
wave
  average length, 171
  average period, 171
  characteristics, 75
  frequency, 23, 169, 170
  number, 23, 169, 170
  Pierson-Moskowitz spectrum, 119
wave direction, 171
wave frequency spectrum, 170
wave number spectrum, 171
Weyl's equidistribution theorem, 144
white noise, 14, 94, 98, 112
Whittle, P., 169
Whittle-Matérn spectrum, 169
Wiener process, 19, 112
  as spectral process, 97
  continuity, 34
  derivative of, 20, 99, 112
  Hölder condition, 42
  infinite variation, 55
  integral, 51
  Karhunen-Loève expansion, 124
Wiener, N., 22
Williams, D., viii, 201
Wold, H., 151
Wong, E., 201
Yaglom, A.M., 153, 201


October 2006

Mathematical Statistics
Centre for Mathematical Sciences

Lund University
Box 118, SE-221 00 Lund, Sweden

http://www.maths.lth.se/