TOPICS ON MAX-STABLE PROCESSES AND
THE CENTRAL LIMIT THEOREM
by
Yizao Wang
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
(Statistics)
in The University of Michigan
2012
Doctoral Committee:
Associate Professor Stilian A. Stoev, Chair
Professor Tailen Hsing
Professor Robert W. Keener
Professor Roman Vershynin
Professor Emeritus Michael B. Woodroofe
ACKNOWLEDGEMENTS
First of all, I am indebted to my thesis advisor Professor Stilian A. Stoev for his
help and support since 2008. He has been a great mentor for me in my research
career. At the same time, he has also provided me with much help and advice in daily
life. This dissertation would not have been possible without him. In particular, the
first part of this dissertation is under his supervision.
Second, I am grateful to Professor Emeritus Michael Woodroofe. He sets a
very high standard for scholars, and as a young researcher I have been deeply influenced by
him in many aspects. The second part of this dissertation is under his supervision.
I would also like to thank Professor Yves Atchade, Professor Tailen Hsing, Pro-
fessor Bob Keener and Professor Parthanil Roy (from Michigan State University)
for many insightful and inspiring discussions on research. I also appreciate Profes-
sor Tailen Hsing, Professor Bob Keener, Professor Roman Vershynin and Professor
Michael Woodroofe for serving on my thesis committee.
I owe many thanks to all the faculty members and students in the Department
of Statistics at the University of Michigan. I have really enjoyed my last five years as a
graduate student in Ann Arbor.
Finally, I am greatly indebted to my parents for their unconditional support of
my pursuit of an academic career abroad during the past years. Without their support I
could have achieved no success. I am also grateful to my wife, Fei Xu, for her companionship
full of encouragement, support and consideration.
ii
TABLE OF CONTENTS
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
CHAPTER
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Max-stable Processes . . . . . . . . . . 2
1.2 Central Limit Theorems for Random Fields . . . . . . . . . . 5
II. Preliminaries on Max-stable Processes . . . . . . . . . . . . . . . . . . . . . . 8
2.1 Spectral Representation and Extremal Integrals . . . . . . . . . . 9
2.2 Spectrally Continuous and Discrete α-Frechet processes . . . . . . . . . . 14
III. Association of Sum- and Max-stable Processes . . . . . . . . . . . . . . . . . 16
3.1 Preliminaries . . . . . . . . . . 19
3.2 Identification of Max-linear and Positive-linear Isometries . . . . . . . . . . 22
3.3 Association of Sum- and Max-stable Processes . . . . . . . . . . 24
3.4 Association of Classifications . . . . . . . . . . 29
3.5 Proofs of Auxiliary Results . . . . . . . . . . 31
IV. Decomposability of Sum- and Max-stable Processes . . . . . . . . . . . . . . 35
4.1 SαS Components . . . . . . . . . . 37
4.2 Stationary SαS Components and Flows . . . . . . . . . . 41
4.3 Decomposability of Max-stable Processes . . . . . . . . . . 49
4.4 Proof of Theorem IV.1 . . . . . . . . . . 52
V. Conditional Sampling for Max-stable Processes . . . . . . . . . . . . . . . . . 58
5.1 Overview . . . . . . . . . . 59
5.2 Conditional Probability in Max-linear Models . . . . . . . . . . 62
5.3 Conditional Sampling: Computational Efficiency . . . . . . . . . . 68
5.4 MARMA Processes . . . . . . . . . . 73
5.5 Discrete Smith Model . . . . . . . . . . 80
5.6 Proofs of Theorems V.4 and V.9 . . . . . . . . . . 82
VI. Central Limit Theorems for Stationary Random Fields . . . . . . . . . . . . 89
6.1 Main Result . . . . . . . . . . 91
6.2 m-Dependent Approximation . . . . . . . . . . 93
6.3 A Central Limit Theorem . . . . . . . . . . 95
6.4 An Invariance Principle . . . . . . . . . . 98
6.5 Orthomartingales . . . . . . . . . . 101
6.6 Stationary Causal Linear Random Fields . . . . . . . . . . 105
6.7 A Moment Inequality . . . . . . . . . . 109
6.8 Auxiliary Proofs . . . . . . . . . . 112
VII. Asymptotic Normality of Kernel Density Estimators for Stationary Random Fields . . . . . . . . . . 115
7.1 Assumptions and Main Result . . . . . . . . . . 118
7.2 Examples and Discussions . . . . . . . . . . 121
7.3 A Central Limit Theorem for m-Dependent Random Fields . . . . . . . . . . 123
7.4 Asymptotic Normality by m-Approximation . . . . . . . . . . 125
7.5 Proofs . . . . . . . . . . 128
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
LIST OF FIGURES
Figure
5.1 Four samples from the conditional distribution of the discrete Smith model (see Section 5.5), given the observed values (all equal to 5) at the locations marked by crosses. . . . . . 61
5.2 Prediction of a MARMA(3,0) process with φ1 = 0.7, φ2 = 0.5 and φ3 = 0.3, based on the observation of the first 100 values of the process. . . . . . 77
5.3 Conditional medians (left) and 0.95-th conditional marginal quantiles (right). Each cross indicates an observed location of the random field, with the observed value at right. . . . . . 81
LIST OF TABLES
Table
5.1 Means and standard deviations (in parentheses) of the running times (in seconds) for the decomposition of the hitting matrix H, based on 100 independent observations X = A ⊙ Z, where A is an (n × p) matrix corresponding to a discretized Smith model. . . . . . 73
5.2 Cumulative probabilities that the projection predictors correspond to at time 100+t, based on 1000 simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Coverage rates (CR) and the widths of the upper 95% confidence intervals at time100 + t, based on 1000 simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . 78
CHAPTER I
Introduction
This dissertation consists of results in two distinct areas of probability theory.
One is extreme value theory; the other is the central limit theorem.
In extreme value theory, the focus is on max-stable processes. Such processes
play an increasingly important role in characterizing and modeling extremal phenom-
ena in finance, environmental sciences and statistical mechanics. Several structural
and ergodic properties of max-stable processes are investigated via their spectral representations. In addition, the conditional distributions of max-stable processes are also
studied, and a computationally efficient algorithm is developed. This algorithm has
many potential applications in the prediction of extremal phenomena.
In the central limit theorem, the asymptotic normality for partial sums of sta-
tionary random fields is studied, with a focus on the projective conditions on the
dependence. Such conditions, easy to check for many stochastic processes and random fields, have recently drawn much attention for (one-dimensional) time series
models in statistics and econometrics. Here, the focus is on (high-dimensional) sta-
tionary random fields. In particular, a general central limit theorem for stationary
random fields and orthomartingales is established. The method is then extended to
establish the asymptotic normality for the kernel density estimator of linear random
fields.
Below is an overview of the following chapters of this dissertation.
1.1 Max-stable Processes
Max-stable processes arise in the limit of maxima of independent and identically
distributed processes. It is well known that all max-stable processes can be trans-
formed to α-Frechet processes. A random variable Y is α-Frechet with α > 0, if
P(Y ≤ y) = exp(−σ^α y^{−α}),  y > 0.
A stochastic process {Y_t}_{t∈T} is α-Frechet, if all its max-linear combinations, of the form max_{i=1,...,n} a_i Y_{t_i} ≡ ⋁_{i=1}^n a_i Y_{t_i}, a_i > 0, t_i ∈ T, i = 1, . . . , n, n ∈ N, are α-Frechet.
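An α-Frechet variable is easy to simulate by inverting its distribution function: if U is uniform on (0, 1), then Y = σ(−log U)^{−1/α} satisfies P(Y ≤ y) = exp(−σ^α y^{−α}). The following Python sketch (our own illustration, not part of the dissertation; the helper name `rfrechet` is ours) also checks max-stability empirically: the maximum of n i.i.d. standard α-Frechet variables, rescaled by n^{−1/α}, is again standard α-Frechet.

```python
import math
import random

def rfrechet(alpha, sigma=1.0, rng=random):
    """Sample an alpha-Frechet variable by inversion:
    P(Y <= y) = exp(-sigma**alpha * y**(-alpha))."""
    u = rng.random()
    return sigma * (-math.log(u)) ** (-1.0 / alpha)

random.seed(0)
alpha, n, reps = 2.0, 50, 20000

# Max-stability: n^(-1/alpha) times the maximum of n i.i.d. standard
# alpha-Frechet variables is again standard alpha-Frechet.  Compare the
# empirical CDF of the rescaled maxima with exp(-y^(-alpha)).
maxima = [n ** (-1.0 / alpha) * max(rfrechet(alpha) for _ in range(n))
          for _ in range(reps)]
for y in (0.5, 1.0, 2.0):
    emp = sum(m <= y for m in maxima) / reps
    assert abs(emp - math.exp(-y ** (-alpha))) < 0.02, (y, emp)
```

Here the equality is exact in distribution (not merely asymptotic), which is precisely the max-stability property.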
It is known since de Haan [23] that under mild regularity conditions, for every α-Frechet process {Y_t}_{t∈T}, there exists a class of non-negative, L^α-integrable functions {f_t}_{t∈T} ⊂ L^α_+(S, B_S, µ), such that

(1.1)  P(Y_{t_1} ≤ y_1, . . . , Y_{t_n} ≤ y_n) = exp{ − ∫_S ⋁_{i=1}^n ( f_{t_i}(s)/y_i )^α µ(ds) }.
Indeed, every such process has an extremal integral representation

(1.2)  {Y_t}_{t∈T} d= { ∫^e_S f_t(s) M^∨_α(ds) }_{t∈T},

where ‘∫^e’ denotes the extremal integral and M^∨_α is an α-Frechet random
sup-measure (see Stoev and Taqqu [101]).
Preliminary results on max-stable processes can be found in Chapter II. Then,
starting with such representation results, structural properties of max-stable processes are investigated. Moreover, a careful investigation of their conditional distributions
also yields an exact conditional sampling algorithm, which has potential applications
in spatial extremes.
Association of max-stable processes to sum-stable processes
The association of α-Frechet processes to the symmetric α-stable (SαS) processes
is established in Chapter III. Namely, under mild assumptions, every α-Frechet process can be associated to an SαS process via spectral representations. This provides
theoretical support for the longstanding folklore that the two classes of processes share
many similar structural results. However, the converse is not true: roughly speaking,
the class of SαS processes has a richer structure than the class of α-Frechet processes.
The association method has become a convenient tool to translate results on
SαS processes (e.g. Rosinski [83] and Samorodnitsky [92]) to α-Frechet processes.
By the association method, many structural results on SαS processes have natural
counterparts for α-Frechet processes. See also Kabluchko [50] for an independent
treatment with different tools.
Decomposability of max-stable processes
Decomposability properties have been extensively studied for probability distributions. The notion of decomposability can be generalized to α-Frechet processes.
Namely, letting Y = {Y_t}_{t∈T} be an α-Frechet process as in (1.2), a natural question
is, when can we write

(1.3)  {Y_t}_{t∈T} d= { Y_t^{(1)} ∨ Y_t^{(2)} }_{t∈T},

where Y^{(i)} = {Y_t^{(i)}}_{t∈T}, i = 1, 2, are two independent α-Frechet processes? If such
processes Y^{(1)}, Y^{(2)} exist, what kind of α-Frechet processes can they be? To what
extent are their structures determined by Y?
A characterization of all possible α-Frechet components Y^{(i)} is established in
Chapter IV. Furthermore, when Y is stationary, a necessary and sufficient condition
for its α-Frechet components to be stationary is established. In some cases, Y may
have only trivial stationary α-Frechet components (scaled copies cY with c ∈
(0, 1)), and such a process is said to be indecomposable. These indecomposable
processes can be viewed as the elementary building blocks for all stationary α-Frechet
processes. Therefore, to study stationary α-Frechet processes, it suffices to focus on
the indecomposable ones. The decomposability of stationary α-Frechet processes
also provides a different point of view on the classification problem for stationary
α-Frechet processes.
Similar decomposability results also hold for sum-stable processes. This is clear
from the association point of view. In fact, we first establish the decomposability
result for sum-stable processes, and then obtain results for max-stable processes by
the association method.
Conditional sampling for max-stable random fields
Given an α-Frechet random field {Y_t}_{t∈Z^d}, what is the conditional distribution

(1.4)  P( (Y_{s_1}, . . . , Y_{s_m}) ∈ · | Y_{t_1}, . . . , Y_{t_n} ) = ?
The conditional distribution formula is established for a dense class of α-Frechet
random fields (the spectrally discrete ones). For such random fields, an explicit exact formula for the conditional distribution (1.4) is obtained. The hard part of the problem
is to provide an efficient algorithm applicable in practice. Such an algorithm is developed, thanks to a certain conditional independence structure of spectrally discrete
max-stable random fields.
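The spectrally discrete random fields in question reduce to max-linear models X = A ⊙ Z with X_i = ⋁_j A_{ij} Z_j and i.i.d. standard α-Frechet Z_j (the notation X = A ⊙ Z also appears in Table 5.1). The minimal Python sketch below is our own toy illustration, not the dissertation's algorithm; it shows the forward model and the hard constraints that conditioning imposes — each observation X_i forces Z_j ≤ X_i / A_{ij} for every j, which is what makes exact conditional sampling tractable.

```python
import math
import random

def rfrechet(alpha, rng):
    """Standard alpha-Frechet variable via inversion."""
    return (-math.log(rng.random())) ** (-1.0 / alpha)

def max_linear(A, Z):
    """X = A (*) Z, i.e. X_i = max_j A[i][j] * Z[j]."""
    return [max(a_ij * z_j for a_ij, z_j in zip(row, Z)) for row in A]

rng = random.Random(1)
alpha = 1.0
A = [[1.0, 0.5, 0.2],     # a 2 x 3 "spectral" matrix (arbitrary values)
     [0.3, 1.0, 0.4]]
Z = [rfrechet(alpha, rng) for _ in range(3)]
X = max_linear(A, Z)

# Conditioning on X constrains each latent Z_j from above:
# Z_j <= upper_j := min_i X_i / A[i][j], and in each row i at least
# one index j attains A[i][j] * Z_j = X_i.
upper = [min(X[i] / A[i][j] for i in range(len(A))) for j in range(3)]
assert all(z <= u + 1e-12 for z, u in zip(Z, upper))
```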
As a potential application, such an algorithm would play an important role in the
prediction problem. The prediction problem arises in many scenarios from different
areas. For example, suppose observations of heavy rainfalls are available in an area
at certain locations. Engineers often need estimates (predictions) of the rainfall
over the entire area, and this information is useful for building flood-protection
infrastructure. Max-stable random fields are natural models for such problems
focusing on extremal phenomena.
Remark I.1. The main results in Chapters II, III, IV and V have already been
published in peer-reviewed journals ([111], [110], [112] and [113] respectively).
1.2 Central Limit Theorems for Random Fields
In probability theory, the central limit theorem is one of the problems with the longest
history: when does
(1.5)  (1/√n) ∑_{k=1}^n (X_k − E X_k) ⇒ N(0, σ²)
occur? While the case where {X_k}_{k=1,...,n} are independent has been completely solved
for more than half a century, establishing central limit theorems in the dependent case
is still an active area of research. Such limit results are of fundamental importance in
various areas, particularly in statistics, where it is important to characterize
the cumulative behavior of large numbers of individuals.
This dissertation investigates two problems, focusing on central limit theorems for
random fields. Namely, given a stationary random field {X_{i,j}}_{(i,j)∈Z²}, we establish
conditions on the dependence such that
(1.6)  (1/n) ∑_{i=1}^n ∑_{j=1}^n (X_{i,j} − E X_{i,j}) ⇒ N(0, σ²).
This problem has been investigated by many researchers. Many results in the lit-
erature are based on mixing-type conditions (see e.g. Bradley [8]). These conditions
are sometimes difficult to check in applications. Here, the focus is on projective-type
conditions that are easy to check. Such conditions have recently drawn much atten-
tion in the study of central limit theorems for (one-dimensional) stochastic processes,
with applications in statistics and econometrics. See for example Dedecker et al. [30]
and Wu [118, 119], among others. The extension of the aforementioned results to
high dimensions (i.e. random fields) is not trivial, as the main technical tool used
there, the martingale approximation method, is in general not applicable in the
multiparameter setting. Instead, we take an m-approximation approach.
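To illustrate (1.6) numerically, one can take a toy 1-dependent (hence m-dependent) moving-average field X_{i,j} = 0.5(ε_{i,j} + ε_{i+1,j} + ε_{i,j+1} + ε_{i+1,j+1}) with i.i.d. standard normal innovations, for which the limiting variance is σ² = (sum of the coefficients)² = 4. The Monte Carlo sketch below is our own illustration with our own parameter choices, not from the dissertation:

```python
import random

def field_sum(n, rng):
    """S_n = sum_{i,j} X_{i,j} over an n x n block for the 1-dependent field
    X_{i,j} = 0.5*(e_{i,j} + e_{i+1,j} + e_{i,j+1} + e_{i+1,j+1})."""
    e = [[rng.gauss(0.0, 1.0) for _ in range(n + 1)] for _ in range(n + 1)]
    s = 0.0
    for i in range(n):
        for j in range(n):
            s += 0.5 * (e[i][j] + e[i + 1][j] + e[i][j + 1] + e[i + 1][j + 1])
    return s

rng = random.Random(7)
n, reps = 40, 300
norm_sums = [field_sum(n, rng) / n for _ in range(reps)]   # (1/n) S_n as in (1.6)
mean = sum(norm_sums) / reps
var = sum((s - mean) ** 2 for s in norm_sums) / (reps - 1)
# Limiting variance: (4 * 0.5)^2 = 4; for n = 40 boundary effects make
# the exact finite-n value about 3.9.
assert 2.8 < var < 5.2, var
```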
A general central limit theorem
A central limit theorem for stationary random fields (1.6) is established in Chap-
ter VI. A particular example is given by functionals of linear random fields

(1.7)  X_{i,j} = g( ∑_{k=0}^∞ ∑_{l=0}^∞ a_{k,l} ε_{i−k,j−l} ),  (i, j) ∈ Z²,
where {ε_{i,j}}_{(i,j)∈Z²} are i.i.d. random variables, and g is often a Lipschitz function. Such models from statistics have recently attracted much attention (see
e.g. [15]). Another example is when {X_{i,j}}_{(i,j)∈Z²} are orthomartingale differences (see
e.g. Khoshnevisan [53]). In this case, a new central limit theorem for orthomartin-
gales follows from the previous result, generalizing known results [3, 62, 63, 73] in
the literature.
Asymptotic normality of kernel density estimators
Consider a causal linear random field {X_{i,j}}_{(i,j)∈Z²} (as in (1.7) with g(x) = x).
The kernel density estimator

f_n(x) = (1 / (n² b_n)) ∑_{i=1}^n ∑_{j=1}^n K( (x − X_{i,j}) / b_n )
is said to be asymptotically normal, if
(1.8)  √(n² b_n) ( f_n(x) − E f_n(x) ) ⇒ N(0, σ_x²),

where σ_x² = p(x) ∫ K²(s) ds and p(x) is the density of X_{0,0} at x. Such estimators
were first considered for i.i.d. sequences by Rosenblatt [82] and Parzen [65], and have
been widely studied since then (see e.g. Wu and Mielniczuk [120] for a treatment
for stationary sequences). Sufficient conditions for (1.8) to hold are provided in
Chapter VII.
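A minimal implementation of the estimator f_n(x) is straightforward. The sketch below is our own illustration — a Gaussian kernel and an arbitrary bandwidth, neither prescribed by the dissertation — evaluating f_n(0) on a simulated causal linear field whose marginal density p is standard normal:

```python
import math
import random

def kde_at(x, data, b):
    """f_n(x) = (1 / (N * b)) * sum_v K((x - v) / b) with Gaussian kernel K;
    N = len(data) plays the role of n^2 in (1.8)."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - v) / b) ** 2) for v in data) / (len(data) * b)

rng = random.Random(3)
n = 60
# Causal linear field X_{i,j} = 0.6*e_{i,j} + 0.8*e_{i-1,j}: dependent
# across rows, but marginally standard normal (0.36 + 0.64 = 1).
e = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n + 1)]
data = [0.6 * e[i][j] + 0.8 * e[i - 1][j]
        for i in range(1, n + 1) for j in range(n)]

fh = kde_at(0.0, data, b=0.3)
p0 = 1.0 / math.sqrt(2.0 * math.pi)   # true density p(0) of X_{0,0}
assert abs(fh - p0) < 0.06, (fh, p0)
```

The residual gap between `fh` and `p0` combines the usual smoothing bias and the sampling variance that (1.8) quantifies.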
Remark I.2. The results in Chapters VI and VII can be found in [114] and [115],
which have been submitted to peer-reviewed journals at the time of writing this
dissertation.
CHAPTER II
Preliminaries on Max-stable Processes
Max-stable processes have been studied extensively in the past 30 years. The
works of Balkema and Resnick [2], de Haan [22, 23], de Haan and Pickands [26],
Gine et al. [38] and Resnick and Roy [78], among many others, have led to a wealth
of knowledge on max-stable processes. The seminal works of de Haan [23] and de
Haan and Pickands [26] laid the foundations of the spectral representations of max-
stable processes and established important structural results for stationary max-
stable processes. Since then, however, while many authors focused on various im-
portant aspects of max-stable processes, the general theory of their representation
and structural properties had not been thoroughly explored. At the same time, the
structure and the classification of sum-stable processes have been vigorously studied. Rosinski [83], building on the seminal works of Hardin [45, 46] about minimal
representations, developed the important connection between stationary sum-stable
processes and flows. This led to a number of important contributions on the struc-
ture of sum-stable processes (see, e.g. [86, 84, 70, 71, 92]). There are relatively few
results of this nature about the structure of max-stable processes, with the notable
exceptions of de Haan and Pickands [26], Davis and Resnick [20] and the very recent
works of Kabluchko et al. [51] and Kabluchko [50].
This chapter collects preliminary results on max-stable processes and their
(stochastic) extremal integral representation introduced by Stoev and Taqqu [101]
(see also Wang and Stoev [111]). This representation, essentially equivalent to the
one by de Haan [23], provides a natural connection to sum-stable processes (see
e.g. Samorodnitsky and Taqqu [93]). This connection is explored in Chapters III
and IV.
2.1 Spectral Representation and Extremal Integrals
It is well known that the univariate marginals of a max-stable process are nec-
essarily extreme value distributions, i.e. up to rescaling and shift they are either
Frechet, Gumbel or negative Frechet. The extreme value distributions arise as limits
of normalized maxima of independent and identically distributed random variables:
( ⋁_{i=1}^n X_i − b_n ) / a_n ⇒ Z.
If the weak convergence holds and Z is non-degenerate, then it must have one of
the above mentioned distributions (see e.g. [76], Proposition 0.3). Similarly, given
independent and identically distributed stochastic processes {X_t^{(i)}}_{t∈T}, i ∈ N, if

{ ( ⋁_{i=1}^n X_t^{(i)} − b_n(t) ) / a_n(t) }_{t∈T} ⇒ {Z_t}_{t∈T},

for some {a_n(t)}_{t∈T} ∈ R^T_+, {b_n(t)}_{t∈T} ∈ R^T, then the limiting process is necessarily a
max-stable process.
We focus on a special class of max-stable processes: the α-Frechet processes.
Recall that a positive random variable Z ≥ 0 has α-Frechet distribution, α > 0, if
P(Z ≤ x) = exp{−σ^α x^{−α}},  x ∈ (0, ∞).

Here ‖Z‖_α := σ > 0 stands for the scale coefficient of Z. A stochastic process
{X_t}_{t∈T} is α-Frechet, if all max-linear combinations:
(2.1)  max_{1≤j≤n} a_j X_{t_j} ≡ ⋁_{j=1}^n a_j X_{t_j},  for all a_j > 0, t_j ∈ T, j = 1, . . . , n,
are α-Frechet random variables. Any max-stable process can be transformed into an
α-Frechet process by simply transforming its one-dimensional distributions into
α-Frechet ones (see e.g. [76], Chapter 5.4).
The seminal work of de Haan [23] provides convenient spectral representations for
stochastically continuous α-Frechet processes in terms of functionals of Poisson point
processes on (0, 1)× (0,∞). Here, we adopt the slightly more general, but essentially
equivalent, approach of representing max-stable processes through extremal integrals
with respect to random sup-measures (see Stoev and Taqqu [101]). We do so in order
to emphasize the analogies with the well-developed theory of sum-stable processes
(see e.g. Samorodnitsky and Taqqu [93] and Chapter III below).
Given a measure space (S, B_S, µ) and α > 0, {M_α(A)}_{A∈B_S} is said to be an α-Frechet random sup-measure with control measure µ, if:

(i) the M_α(A_i)'s are independent random variables for disjoint A_i ∈ B_S, 1 ≤ i ≤ n,

(ii) M_α(A) is α-Frechet with scale coefficient ‖M_α(A)‖_α = µ(A)^{1/α}, and

(iii) for all disjoint A_i's, i ∈ N, we have M_α(⋃_{i∈N} A_i) = ⋁_{i∈N} M_α(A_i), almost surely.
One can then define the extremal integral of a non-negative simple function f(u) := ⋁_{i=1}^n a_i 1_{A_i}(u) ≥ 0 with disjoint A_1, . . . , A_n ∈ B_S:

∫^e_S f dM_α ≡ ∫^e_S f(u) M_α(du) := ⋁_{1≤i≤n} a_i M_α(A_i).
One can show that ∫^e_S f dM_α is an α-Frechet random variable with scale coefficient (∫_S f^α dµ)^{1/α}. The definition of ∫^e_S f dM_α can, by continuity in probability,
be extended to integrands f in the space of nonnegative, L^α-integrable measurable
functions L^α_+(S, µ) := {f ∈ L^α(S, µ) : f ≥ 0}. Here and in the sequel, we may write
(S, µ) = (S, B_S, µ) for simplicity.
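For a simple function the extremal integral can be simulated directly from the defining properties of the sup-measure: the M_α(A_i) are drawn as independent α-Frechet variables with scales µ(A_i)^{1/α}, and the scale coefficient of the resulting maximum should be (∑_i a_i^α µ(A_i))^{1/α} = (∫_S f^α dµ)^{1/α}. A Python sketch (our own illustration, with arbitrary a_i and µ(A_i)):

```python
import math
import random

rng = random.Random(5)
alpha = 1.5
a = [2.0, 1.0, 0.5]    # values a_i of the simple function f
mu = [0.3, 1.0, 0.7]   # mu(A_i) for the disjoint sets A_i

def frechet(scale):
    """alpha-Frechet variable with the given scale, via inversion."""
    return scale * (-math.log(rng.random())) ** (-1.0 / alpha)

def extremal_integral():
    """e-integral of f dM_alpha = max_i a_i * M_alpha(A_i), where the
    M_alpha(A_i) are independent alpha-Frechet with scale mu(A_i)^(1/alpha)."""
    return max(ai * frechet(m ** (1.0 / alpha)) for ai, m in zip(a, mu))

# Scale coefficient of the integral: (integral of f^alpha dmu)^(1/alpha).
sigma = sum(ai ** alpha * m for ai, m in zip(a, mu)) ** (1.0 / alpha)

reps = 20000
samples = [extremal_integral() for _ in range(reps)]
# For an alpha-Frechet variable with scale sigma, P(Y <= sigma) = exp(-1).
emp = sum(s <= sigma for s in samples) / reps
assert abs(emp - math.exp(-1.0)) < 0.02, emp
```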
Extremal integrals are sometimes referred to as stochastic extremal integrals to
emphasize that they are random variables. We omit the term ‘stochastic’ for the sake
of simplicity. Extremal integrals parallel the notion of stochastic integrals
based on SαS random measures ([101]). In particular, two important properties of
extremal integrals are:

(i) the random variables ∫^e_S f_j dM_α, j = 1, . . . , n, are independent if and only if the f_j's have pairwise disjoint supports (mod µ), and

(ii) the extremal integral is max-linear: ∫^e_S (af ∨ bg) dM_α = a ∫^e_S f dM_α ∨ b ∫^e_S g dM_α, for all a, b > 0 and f, g ∈ L^α_+(S, µ).
For more details, see Stoev and Taqqu [101].
Now, for any collection of deterministic functions {f_t}_{t∈T} ⊂ L^α_+(S, µ), one can
construct the stochastic process:

(2.2)  X_t = ∫^e_S f_t(u) M_α(du),  for all t ∈ T.
In view of the max-linearity of the extremal integrals and (2.1), the resulting process
X = {X_t}_{t∈T} is α-Frechet. Furthermore, for any n ∈ N, x_i > 0, t_i ∈ T, i = 1, . . . , n:

(2.3)  P(X_{t_1} ≤ x_1, . . . , X_{t_n} ≤ x_n) = exp{ − ∫_S ⋁_{i=1}^n ( x_i^{−1} f_{t_i}(u) )^α µ(du) }.
This shows that the deterministic functions {f_t}_{t∈T} completely characterize the finite-dimensional distributions of the process X. In general, if

(2.4)  {X_t}_{t∈T} d= { ∫^e_S f_t dM_α }_{t∈T},

for some {f_t}_{t∈T} ⊂ L^α_+(S, µ), we shall say that the process X has the extremal integral
representation or spectral representation {f_t}_{t∈T} over the space L^α_+(S, µ). The f_t's in
(2.4) are also referred to as spectral functions of X. In this dissertation, we let ‘d=’
denote ‘equal in finite-dimensional distributions’.
Many α-Frechet processes of practical interest have tractable spectral representations, with (S, B_S, µ) being a standard Lebesgue space. A measurable space (S, 𝒮, ν)
is a standard Lebesgue space, if (S, 𝒮) is a standard Borel space and ν is a σ-finite
measure. A standard Borel space is a measurable space measurably isomorphic (i.e.,
there exists a one-to-one, onto and bi-measurable map) to a Borel subset of a Polish
space. For example, a Polish space with σ-finite measure on its Borel sets is stan-
dard Lebesgue, and one often chooses (S,BS, µ) = ([0, 1],B[0,1], Leb) in (2.4). (For
more discussions on standard Lebesgue spaces and stationary sum-stable processes,
see Appendix A in [71].)
As shown in Proposition 3.2 in [101], an α-Frechet process X has a representation
(2.4) with (S, B_S, µ) being standard Lebesgue, if and only if X satisfies Condition S.
Condition S. There exists a countable subset T_0 ⊆ T such that for every t ∈ T, we
have X_{t_n} →^P X_t for some {t_n}_{n∈N} ⊂ T_0.
Note that without Condition S, every max-stable process X can still have a
spectral representation as in (2.4), but the space (S, µ) may not be standard Lebesgue
(see Theorem 1 in [50]).
Remark II.1. The assumption that (S, µ) is a standard Lebesgue space implies that
the space of integrands L^α_+(S, µ) is a complete and separable metric space with respect
to the metric

(2.5)  ρ_{µ,α}(f, g) = ∫_S |f^α − g^α| dµ.
This metric is natural to use when handling extremal integrals, since as n → ∞,

(2.6)  ∫^e_S f_n dM_α →^P ξ  if and only if  ρ_{µ,α}(f_n, f) = ∫_S |f_n^α − f^α| dµ → 0,

where ξ = ∫^e_S f dM_α (see e.g. [101]). (Such a metric naturally induces a metric on the
space of jointly α-Frechet random variables.) By default, we equip the space L^α_+(S, µ)
with the metric ρ_{µ,α} and often write ‖f‖_{L^α_+(S,µ)} for (∫_S f^α dµ)^{1/α}. Here ‖·‖_{L^α_+(S,µ)} is
not a norm unless α ≥ 1.
We focus only on the rich class of α-Frechet processes that satisfy Condition S. In
particular, we want f_t(s) to be jointly measurable as a function on T × S taking values in R_+.
Here, we suppose 𝒯 is a σ-algebra on T and the measurability is w.r.t. the product
σ-algebra 𝒯 ⊗ B_S := σ(𝒯 × B_S). The following result clarifies the connection between
the joint measurability of the spectral functions f_t(s) and the measurability of the
corresponding α-Frechet process. The proof can be found in [111].
Proposition II.2. Let (S, µ) be a standard Lebesgue space and M_α (α > 0) be an
α-Frechet random sup-measure on S with control measure µ. Suppose (T, ρ_T) is a
separable metric space and 𝒯 is the Borel σ-algebra.

(i) Let X = {X_t}_{t∈T} have a spectral representation {f_t}_{t∈T} ⊂ L^α_+(S, µ) as in (2.4). Then, X has a measurable modification if and only if {f_t(s)}_{t∈T} has a jointly measurable modification, i.e., there exists a 𝒯 ⊗ B_S-measurable mapping (t, s) → g_t(s), such that f_t(s) = g_t(s) µ-a.e. for all t ∈ T.

(ii) If an α-Frechet process {X_t}_{t∈T} has a measurable modification, then it satisfies Condition S, and hence it has a representation as in (2.4).
We always assume (T, ρ_T) is a separable metric space and 𝒯 is the Borel σ-algebra.
By Proposition II.2, any measurable α-Frechet process {X_t}_{t∈T} has a jointly
measurable spectral representation and satisfies Condition S.
2.2 Spectrally Continuous and Discrete α-Frechet processes
Definition II.3. Consider an α-Frechet process X = {X_t}_{t∈T}. We say X is spectrally
discrete, if X can be represented as

{X_t}_{t∈T} d= { ⋁_{i∈Z} f_t(i) Z_i }_{t∈T},

where {Z_i}_{i∈Z} are i.i.d. standard α-Frechet random variables, and for each t ∈ T,
the map f_t : Z → R_+ satisfies ∑_i f_t(i)^α < ∞. The α-Frechet process X is spectrally
continuous, if X cannot be represented as

{X_t}_{t∈T} d= { X_t^{(1)} ∨ X_t^{(2)} }_{t∈T}

with two independent non-degenerate α-Frechet processes {X_t^{(1)}}_{t∈T}, {X_t^{(2)}}_{t∈T}, such
that one of them is spectrally discrete.
Theorem II.4. Let {X_t}_{t∈T} be an α-Frechet process with jointly measurable representation {f_t}_{t∈T} ⊂ L^α_+(S, µ). Then, there exist spectrally continuous and spectrally discrete
α-Frechet processes {X_t^{cont}}_{t∈T} and {X_t^{disc}}_{t∈T}, such that the two processes are independent, and

{X_t}_{t∈T} d= { X_t^{cont} ∨ X_t^{disc} }_{t∈T}.

Furthermore, this decomposition is unique in distribution.
The proof can be found in [111]. The processes {X_t^{cont}}_{t∈T} and {X_t^{disc}}_{t∈T} are
referred to as the spectrally continuous and spectrally discrete components of X,
respectively.
Example II.5. Let Z_i, i ∈ N, be independent standard α-Frechet variables and let
f_t(i) ≥ 0, t ∈ T, be such that ∑_{i∈N} f_t(i)^α < ∞, for all t ∈ T. Spectrally discrete
α-Frechet processes have a stochastic extremal integral representation

X_t := ⋁_{i∈N} f_t(i) Z_i ≡ ∫^e_N f_t dM_α,  t ∈ T,

where M_α is an α-Frechet random sup-measure on N with the counting measure as its control measure.
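For counting control measure, the finite-dimensional distribution formula (2.3) specializes to P(X_{t_1} ≤ x_1, . . . , X_{t_n} ≤ x_n) = exp{−∑_i ⋁_t (f_t(i)/x_t)^α}. A Python sketch (with our own toy spectral functions) that simulates such a spectrally discrete process and compares the empirical joint CDF with the formula:

```python
import math
import random

rng = random.Random(11)
alpha = 1.0
# Toy spectral functions f_t(i) on S = {0, 1, 2} for two time points.
f = {0: [1.0, 0.6, 0.1], 1: [0.2, 0.8, 0.9]}

def sample_path():
    """X_t = max_i f_t(i) * Z_i with i.i.d. standard alpha-Frechet Z_i."""
    Z = [(-math.log(rng.random())) ** (-1.0 / alpha) for _ in range(3)]
    return {t: max(ft_i * z for ft_i, z in zip(f[t], Z)) for t in f}

# Formula (2.3) with counting control measure:
x0, x1 = 2.0, 1.5
theo = math.exp(-sum(max(f[0][i] / x0, f[1][i] / x1) ** alpha
                     for i in range(3)))

reps = 20000
hits = 0
for _ in range(reps):
    p = sample_path()
    hits += (p[0] <= x0 and p[1] <= x1)
emp = hits / reps
assert abs(emp - theo) < 0.02, (emp, theo)
```

Note that X_0 and X_1 are dependent because they share the same Z_i's, and the formula captures this dependence exactly.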
Spectrally discrete max-stable processes have a simple structure of conditional distributions, which will be explored in Chapter V.
Example II.6. Consider the well-known α-Frechet extremal process (α > 0):

(2.7)  {X_t}_{t∈R_+} d= { ∫^e_{R_+} 1_{(0,t]}(u) M_α(du) }_{t∈R_+},

where M_α has Lebesgue control measure on R_+ (see e.g. [76], Chapter 4). The
α-Frechet extremal process X is spectrally continuous.
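By property (iii) of the sup-measure, X in (2.7) can be simulated on a grid as a running maximum of independent increments M_α((t_{k−1}, t_k]), each α-Frechet with scale (t_k − t_{k−1})^{1/α}; the marginal X_t is then α-Frechet with scale t^{1/α}. A Python sketch (our own illustration):

```python
import math
import random

rng = random.Random(17)
alpha = 2.0

def extremal_process(times):
    """X_t = e-integral of 1_{(0,t]} dM_alpha on a grid: a running maximum
    of independent alpha-Frechet increments M_alpha((t_{k-1}, t_k]),
    each with scale (t_k - t_{k-1})^(1/alpha)."""
    x, prev, path = 0.0, 0.0, []
    for t in times:
        scale = (t - prev) ** (1.0 / alpha)
        x = max(x, scale * (-math.log(rng.random())) ** (-1.0 / alpha))
        path.append(x)
        prev = t
    return path

times = [k / 10.0 for k in range(1, 11)]   # grid on (0, 1]
reps = 20000
emp = sum(extremal_process(times)[-1] <= 1.0 for _ in range(reps)) / reps
# X_1 is alpha-Frechet with scale 1^(1/alpha) = 1, so P(X_1 <= 1) = exp(-1).
assert abs(emp - math.exp(-1.0)) < 0.02, emp
```

The sampled paths are nondecreasing step functions, as expected for an extremal process.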
CHAPTER III
Association of Sum- and Max-stable Processes
The deep connection between sum- and max-stable processes has long been sus-
pected. As observed, for example, in [19] the moving maxima and the moving aver-
ages are statistically indistinguishable in the extremes. Also, the maxima of indepen-
dent copies of a sum-stable process (appropriately rescaled) converge in distribution
to a max-stable process and the two processes have very similar spectral represen-
tations (see e.g. [101], Theorem 5.1). In [100], the ergodic properties of max-stable
processes were characterized by borrowing ideas and drawing parallels to existing
work in the sum-stable domain.
In this chapter, we introduce the notion of association of sum- and max-stable
processes, by relating their spectral functions. It provides a theoretical support for
the long-standing folklore that the two classes of processes share similar structures.
Furthermore, we will see that the association method also helps ‘translate’ structural
properties of sum-stable processes to max-stable processes.
We focus on infinite variance symmetric α-stable (SαS, α ∈ (0, 2)) sum-stable
processes and α-Frechet max-stable processes. Recall that an infinite variance SαS
variable X has characteristic function ϕ_X(t) = E exp{itX} = exp{−σ^α |t|^α}, ∀t ∈
R, where α ∈ (0, 2). On the other hand, Y has an α-Frechet distribution if F_Y(y) =
P(Y ≤ y) = exp{−σ^α y^{−α}}, ∀y ∈ (0, ∞), where now α is in (0, ∞). The σ's in both
cases are positive parameters referred to as scale coefficients.
Recall that X = {X_t}_{t∈T} is an SαS stochastic process if all its finite linear combinations ∑_{i=1}^n a_i X_{t_i}, a_i ∈ R, t_i ∈ T, are SαS. These processes have convenient integral
(or spectral) representations:

(3.1)  {X_t}_{t∈T} d= { ∫_S f_t(s) M_{α,+}(ds) }_{t∈T}.

Here {f_t}_{t∈T} ⊂ L^α(S, µ), ‘∫’ stands for the stable integral, and M_{α,+} is an SαS random
measure on the measure space (S, µ) with control measure µ (see [93], Chapters 3 and
13). The representation (3.1) implies that
(3.2)  E exp{ i ∑_{j=1}^n a_j X_{t_j} } = exp{ − ∫_S | ∑_{j=1}^n a_j f_{t_j}(s) |^α µ(ds) },  a_j ∈ R, t_j ∈ T,

which determines the finite-dimensional distributions (f.d.d.) of the SαS process
{X_t}_{t∈T}.
On the other hand, we have seen in Chapter II that every α-Frechet process has
an extremal integral representation

(3.3)  {Y_t}_{t∈T} d= { ∫^e_S f_t(s) M_{α,∨}(ds) }_{t∈T},

with {f_t}_{t∈T} ⊂ L^α_+(S, µ), and

(3.4)  P(Y_{t_1} ≤ a_1, . . . , Y_{t_n} ≤ a_n) = exp{ − ∫_S ⋁_{j=1}^n ( f_{t_j}(s)/a_j )^α µ(ds) },  a_j ≥ 0, t_j ∈ T.
The ft’s in (3.1) and (3.3) are called the spectral functions of the sum- or max-
stable processes, respectively. Based on the spectral representations above, we define
association as follows:
Definition III.1 (Associated SαS and α-Frechet processes). We say that an SαS
process {X_t}_{t∈T} and an α-Frechet process {Y_t}_{t∈T} are associated, if there exist
{f_t}_{t∈T} ⊂ L^α_+(S, µ) such that:

{X_t}_{t∈T} d= { ∫_S f_t dM_{α,+} }_{t∈T}  and  {Y_t}_{t∈T} d= { ∫^e_S f_t dM_{α,∨} }_{t∈T}.

In this case, we say {X_t}_{t∈T} and {Y_t}_{t∈T} are associated by {f_t}_{t∈T}.
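On a finite set S = {1, . . . , m} with counting measure, the two sides of Definition III.1 can be written down explicitly: X = ∑_j f(j) ξ_j with i.i.d. standard SαS ξ_j, and Y = ⋁_j f(j) Z_j with i.i.d. standard α-Frechet Z_j. The sketch below is our own illustration (the dissertation gives no code); it draws the SαS variables with the classical Chambers–Mallows–Stuck method and checks two distributional facts: X is symmetric, and Y has scale coefficient (∑_j f(j)^α)^{1/α}.

```python
import math
import random

rng = random.Random(23)
alpha = 1.5
f = [1.0, 0.7, 0.4]    # one spectral function on S = {1, 2, 3}

def rsas():
    """Standard SaS variable via the Chambers-Mallows-Stuck method (beta = 0)."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = -math.log(rng.random())                     # Exp(1)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

def rfrechet():
    """Standard alpha-Frechet variable via inversion."""
    return (-math.log(rng.random())) ** (-1.0 / alpha)

# The associated pair built from the same spectral function f.
reps = 20000
pairs = [(sum(fj * rsas() for fj in f), max(fj * rfrechet() for fj in f))
         for _ in range(reps)]

sigma = sum(fj ** alpha for fj in f) ** (1.0 / alpha)
assert abs(sum(x > 0 for x, _ in pairs) / reps - 0.5) < 0.02
assert abs(sum(y <= sigma for _, y in pairs) / reps - math.exp(-1.0)) < 0.02
```

Association is a statement about distributions, so X and Y need not be built from the same randomness; only the shared spectral function matters.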
We need to show that this definition is consistent. That is, if {f_t^{(1)}}_{t∈T} and
{f_t^{(2)}}_{t∈T} are two different spectral representations of a certain SαS (α-Frechet, resp.)
process, then the associated α-Frechet (SαS, resp.) processes are equal in finite-dimensional distributions. This is ensured by the following theorem, which is proved
in Section 3.2 below.
Theorem III.2. Consider two arbitrary collections of functions f_1^{(i)}, . . . , f_n^{(i)} ∈
L^α_+(S_i, µ_i), i = 1, 2, 0 < α < 2. Then,

(3.5)  ‖ ∑_{j=1}^n a_j f_j^{(1)} ‖_{L^α(S_1,µ_1)} = ‖ ∑_{j=1}^n a_j f_j^{(2)} ‖_{L^α(S_2,µ_2)},  for all a_j ∈ R,

if and only if

(3.6)  ‖ ⋁_{j=1}^n a_j f_j^{(1)} ‖_{L^α_+(S_1,µ_1)} = ‖ ⋁_{j=1}^n a_j f_j^{(2)} ‖_{L^α_+(S_2,µ_2)},  for all a_j ≥ 0.
Furthermore, Theorem III.2 entails that our notion of association is not merely
formal. For example, stationary or self-similar max-stable processes are associated
with stationary or self-similar sum-stable ones, respectively (see Corollary III.12).
We will also see, however, that there are SαS processes that cannot be associ-
ated to any α-Frechet process (see Theorem III.13). In particular, we provide a
practical characterization of the max-associable SαS processes {X_t}_{t∈T} with station-
ary increments generated by dissipative flows, indexed by T = R or T = Z (see
Proposition III.16).
This chapter is organized as follows. In Section 3.1, some preliminaries are pro-
vided. In Section 3.2, we prove Theorem III.2. In Section 3.3, we establish the asso-
ciation of SαS and α-Frechet processes and give examples of both max-associable and
non max-associable SαS processes. In Section 3.4, we show how the association can
serve as a tool to translate available structural results for SαS processes to α-Frechet
processes, and vice versa.
3.1 Preliminaries
We draw a connection between the linear isometries and max-linear isometries,
which play important roles in relating two representations of a given SαS or an α-
Frechet process, respectively. The notion of a linear isometry is well known. To
define a max-linear isometry, we say that a subset F ⊂ Lα
+(S, µ) is a max-linear
space if for all n ∈ N, fi ∈ F , ai > 0,
n
i=1 aifi ∈ F and if F is closed w.r.t. the
metric ρµ,α defined by ρµ,α(f, g) =S|fα − gα|dµ (and recall Remark II.1).
Definition III.3 (Max-linear isometry). Let α > 0 and consider two measure
spaces (S_1, µ_1) and (S_2, µ_2) with positive and σ-finite measures µ_1 and µ_2. Let
F_1 ⊂ L^α_+(S_1, µ_1) be a max-linear space. A mapping U : F_1 → L^α_+(S_2, µ_2) is said to
be a max-linear isometry, if:

(i) for all f_1, f_2 ∈ F_1 and a_1, a_2 ≥ 0, U(a_1 f_1 ∨ a_2 f_2) = a_1(Uf_1) ∨ a_2(Uf_2), µ_2-a.e.,
and

(ii) for all f ∈ F_1, ‖Uf‖_{L^α_+(S_2,µ_2)} = ‖f‖_{L^α_+(S_1,µ_1)}.
A linear (max-linear resp.) isometry may be defined only on a small linear (max-
linear resp.) subspace of L^α(S, µ) (L^α_+(S, µ) resp.). However, this linear (max-linear
resp.) isometry can be extended uniquely to the extended ratio space (extended
positive ratio space resp.), which will turn out to be closed w.r.t. both linear and
max-linear combinations.
Definition III.4. Let F be a collection of functions in L^α(S, µ).

(i) The ratio σ-field of F, written ρ(F) := σ(f_1/f_2 : f_1, f_2 ∈ F), is defined as the
σ-field generated by the ratios of functions in F, with the conventions ±1/0 = ±∞
and 0/0 = 0;

(ii) The extended ratio space of F, written R_e(F), is defined as:

(3.7)    R_e(F) := { rf : rf ∈ L^α(S, µ), r ∼ ρ(F), f ∈ F },

where r ∼ ρ(F) means that r is ρ(F)-measurable. Similarly, we define the extended positive ratio space:

(3.8)    R_{e,+}(F) := { rf : rf ∈ L^α_+(S, µ), r ∼ ρ(F), r ≥ 0, f ∈ F }.
The following result is due to [45] and [109].
Theorem III.5. Let F be a linear (max-linear resp.) subspace of L^α(S_1, µ_1) with
0 < α < 2 (of L^α_+(S_1, µ_1) with 0 < α < ∞ resp.). If U is a linear (max-linear resp.)
isometry from F to U(F), then U can be uniquely extended to a linear (max-linear
resp.) isometry U : R_e(F) → R_e(U(F)) (U : R_{e,+}(F) → R_{e,+}(U(F)) resp.), with
the form

(3.9)    U(rf) = T̄(r) U(f),

for all rf ∈ R_e(F) as in (3.7) (rf ∈ R_{e,+}(F) as in (3.8) resp.). Here T̄ is the mapping
from L^α(S_1, ρ(F), µ_1) to L^α(S_2, ρ(U(F)), µ_2), induced by a regular set isomorphism
T from ρ(F) to ρ(U(F)).
For the precise definition of a regular set isomorphism T and the induced mapping
T̄, see [56], [45] or [109]. The following remark provides some intuition. Part (iii) is
especially important since it shows that the two types of isometries can be identified.
Remark III.6. (i) U is well defined in the sense that for any r_i f_i ∈ R_e(F), i = 1, 2,
as in (3.7), if r_1 f_1 = r_2 f_2, µ_1-a.e., then U(r_1 f_1) = U(r_2 f_2), µ_2-a.e. A similar result
holds for r_i f_i ∈ R_{e,+}(F) as in (3.8).

(ii) T maps any two almost disjoint sets to almost disjoint sets. See [56].

(iii) The mapping T̄ is both linear and max-linear, i.e., for a, b ≥ 0,

(3.10)    T̄(af + bg) = a T̄f + b T̄g   and   T̄(af ∨ bg) = a T̄f ∨ b T̄g.

This follows from the definition T̄ 1_A = 1_{T(A)} for measurable A ⊂ S_1 and the
construction of T̄ via simple functions. It is via T̄ that linearity and max-
linearity are identified.
To make good use of (iii) in Remark III.6, we introduce the notion of positive-
linearity. We say a linear isometry U is positive-linear, if U maps all nonnegative
functions to nonnegative functions. Accordingly, we say that F ⊂ L^α_+(S, µ) is a
positive-linear space, if it is closed w.r.t. the metric ρ_{µ,α} and all positive-linear com-
binations, i.e., for all n ∈ N, f_i ∈ F, a_i ≥ 0, we have g := ∑_{i=1}^n a_i f_i ∈ F. Note that
the metric (f, g) → ‖f − g‖^{1∧α}_{L^α(S,µ)}, restricted to L^α_+(S, µ), generates the same topology
as the metric ρ_{µ,α}. Clearly, Theorem III.5 holds if F is a positive-linear (instead of
a linear) subspace of L^α_+(S, µ). In this case, U is also positive-linear. We conclude
this section with the following refinement of statement (iii) in Remark III.6.
Proposition III.7. Let U be as in Theorem III.5. If F is a positive-linear subspace
of L^α_+(S_1, µ_1), then the linear isometry U in (3.9) is also a max-linear isometry
from R_{e,+}(F) to R_{e,+}(U(F)). If F is a max-linear subspace of L^α_+(S_1, µ_1), then
the max-linear isometry U in (3.9) is also a positive-linear isometry from R_e(F) to
R_e(U(F)).
Proof. Suppose F is max-linear and U is a max-linear isometry. We show U
is also positive-linear. First, if U in (3.9) is max-linear, then the mapping T̄
from L^α_+(S_1, ρ(F), µ_1) to L^α_+(S_2, ρ(U(F)), µ_2) is both max-linear and linear, by Re-
mark III.6 (iii). Moreover, it is easily seen that T̄ is positive-linear. Now, for
r_1 f_1, r_2 f_2 ∈ R_{e,+}(F) as in (3.8), we have

    U(a_1 r_1 f_1 + a_2 r_2 f_2) = U( [ a_1 r_1 f_1/(f_1 ∨ f_2) + a_2 r_2 f_2/(f_1 ∨ f_2) ] (f_1 ∨ f_2) )
    = T̄( a_1 r_1 f_1/(f_1 ∨ f_2) + a_2 r_2 f_2/(f_1 ∨ f_2) ) U(f_1 ∨ f_2) = a_1 U(r_1 f_1) + a_2 U(r_2 f_2).

That is, U is positive-linear. The proof of the other case is similar, except that
we need the existence of a function with full support in F, guaranteed by Lemma 3.2
in [45].
3.2 Identification of Max-linear and Positive-linear Isometries
In this section we prove Theorem III.2. It will be used to relate SαS and α-
Frechet processes in the next section. To do so, we need to introduce subspaces of
L^α_+(S, µ) which are closed w.r.t. the max-linear and positive-linear combinations. For
any F ⊂ L^α_+(S, µ), let

(3.11)    F_+ := span_+ F   and   F_∨ := ∨-span F

denote the smallest positive-linear and the smallest max-linear subspace of L^α_+(S, µ) containing
the collection of functions F, respectively. We call them the positive-linear and max-
linear spaces generated by F, respectively. (We also write span F for the
smallest linear subspace of L^α(S, µ) containing F.) In general, we have F_+ ≠ F_∨.
This means both F_+ and F_∨ are too small to be closed w.r.t. both the '+' and '∨'
operators. However, these two subspaces generate the same extended positive ratio
space, on which the two types of isometries are identical. The following fact is proved
in Section 3.5.
Proposition III.8. Suppose F ⊂ L^α_+(S, µ). Then R_{e,+}(F_+) = R_{e,+}(F_∨).
Proof of Theorem III.2. Let F^{(i)} := {f^{(i)}_1, …, f^{(i)}_n} ⊂ L^α_+(S_i, µ_i). We prove the 'if'
part. Suppose Relation (3.6) holds; we will show (3.5). Relation (3.6) implies that
there exists a unique max-linear isometry U from F^{(1)}_∨ onto F^{(2)}_∨, such that Uf^{(1)}_j =
f^{(2)}_j, 1 ≤ j ≤ n. Thus, Theorem III.5 implies that the mapping

    U : R_{e,+}(F^{(1)}_∨) → R_{e,+}(U(F^{(1)}_∨))

with form (3.9) is a max-linear isometry. By Proposition III.7, U is
also a positive-linear isometry. By Proposition III.8, U is a positive-linear isometry
defined on R_{e,+}(F^{(1)}_+), which implies (3.5). The proof of the 'only if' part is similar.
To conclude this section, we address the following question: for f^{(1)}_1, …, f^{(1)}_n ∈
L^α(S_1, µ_1), do there always exist nonnegative f^{(2)}_1, …, f^{(2)}_n ∈ L^α_+(S_2, µ_2) such that
Relation (3.5) holds for all a_j ∈ R? The answer is 'No'. As a consequence, in the
next section we will see that there are SαS processes which cannot be associated to
any α-Frechet process.
Proposition III.9. Consider f^{(1)}_j ∈ L^α(S_1, µ_1), 1 ≤ j ≤ n. Then, there exist some
f^{(2)}_j ∈ L^α_+(S_2, µ_2), 1 ≤ j ≤ n, such that (3.5) holds, if and only if

(3.12)    f^{(1)}_i(s) f^{(1)}_j(s) ≥ 0, µ_1-a.e., for all 1 ≤ i, j ≤ n.

When (3.12) is true, one can take f^{(2)}_i(s) := |f^{(1)}_i(s)|, 1 ≤ i ≤ n, and (S_2, µ_2) ≡
(S_1, µ_1) for (3.5) to hold.

The proof is given in Section 3.5. We will call (3.12) the associability condition.
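On a discretized spectral space, the associability condition (3.12) is a pointwise sign check: at each point s, the values f^{(1)}_1(s), …, f^{(1)}_n(s) may not take strictly opposite signs, and when the check passes, |f^{(1)}_i| gives the nonnegative choice from the proposition. A small illustrative sketch (the grid, function names, and values are hypothetical):

```python
import numpy as np

def associable(F):
    """Check (3.12) on a grid: F[j, s] = f_j(s). Returns True iff at every
    grid point s no two functions take strictly opposite signs."""
    has_pos = (F > 0).any(axis=0)
    has_neg = (F < 0).any(axis=0)
    return not bool((has_pos & has_neg).any())

def associated_frechet_spectral(F):
    """When (3.12) holds, |f_j| is a valid nonnegative spectral choice."""
    assert associable(F)
    return np.abs(F)
```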
3.3 Association of Sum- and Max-stable Processes
In this section, by essentially applying Theorem III.2, we associate an SαS process
to every α-Frechet process by Definition III.1. The associated processes will be shown
to have similar properties. However, we will also see that not all SαS processes
can be associated to α-Frechet processes. We conclude with several examples.
Remark III.10. In Definition III.1, the associated SαS and α-Frechet processes have
the same α ∈ (0, 2). It is easy to see that, for any α-Frechet process {Y_t}_{t∈T} with
spectral functions {f_t}_{t∈T}, the process {Y_t^β}_{t∈T} is α/β-Frechet with spectral functions {f_t^β}_{t∈T},
for all 0 < α, β < ∞. This transformation shows that the parameter α plays essen-
tially no role in characterizing the dependence structure of the α-Frechet process.
Given an SαS process with nonnegative spectral functions, one could associate it
to the 1-Frechet process with spectral functions {f_t^α}_{t∈T}. This leads to no loss of
generality. Here, we chose to pair up the two α's for technical convenience.
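The power transformation in Remark III.10 can be checked directly on the one-dimensional marginals: if Y is α-Frechet with P(Y ≤ y) = exp(−(σ/y)^α), then Y^β is α/β-Frechet with scale σ^β. A quick numerical sanity check (parameter values are arbitrary):

```python
import math

def frechet_cdf(y, sigma, alpha):
    """CDF of an alpha-Frechet variable with scale sigma, evaluated at y > 0."""
    return math.exp(-((sigma / y) ** alpha))

def power_transform_cdf(y, sigma, alpha, beta):
    """P(Y^beta <= y) = P(Y <= y^(1/beta)) for Y alpha-Frechet with scale sigma."""
    return frechet_cdf(y ** (1.0 / beta), sigma, alpha)
```

For every y > 0, power_transform_cdf(y, σ, α, β) agrees with frechet_cdf(y, σ^β, α/β), which is the marginal version of the spectral-function statement.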
The following result, a simple application of Theorem III.2, shows the consistency
of Definition III.1, i.e., the notion of association is independent of the choice of the
spectral functions.
Theorem III.11. Suppose an SαS process {X_t}_{t∈T} and an α-Frechet process {Y_t}_{t∈T}
are associated by {f^{(1)}_t}_{t∈T} ⊂ L^α_+(S_1, µ_1). Then, {f^{(2)}_t}_{t∈T} ⊂ L^α_+(S_2, µ_2) is a spectral
representation of {X_t}_{t∈T}, if and only if it is a spectral representation of {Y_t}_{t∈T}.
Namely,

    { ∫_{S_1} f^{(1)}_t dM^{(1)}_{α,+} }_{t∈T} =^d { ∫_{S_2} f^{(2)}_t dM^{(2)}_{α,+} }_{t∈T},

if and only if

    { ∫^e_{S_1} f^{(1)}_t dM^{(1)}_{α,∨} }_{t∈T} =^d { ∫^e_{S_2} f^{(2)}_t dM^{(2)}_{α,∨} }_{t∈T},

where M^{(i)}_{α,+} and M^{(i)}_{α,∨} are SαS random measures and α-Frechet random sup-
measures, respectively, on S_i with control measure µ_i, i = 1, 2.
As an immediate consequence, stationarity and self-similarity are preserved under
association. Here we assume T = R^d or Z^d.

Corollary III.12. Suppose an SαS process {X_t}_{t∈T} and an α-Frechet process
{Y_t}_{t∈T} are associated. Then,

(i) {X_t}_{t∈T} is stationary if and only if {Y_t}_{t∈T} is stationary.

(ii) {X_t}_{t∈T} is self-similar with exponent H, if and only if {Y_t}_{t∈T} is self-similar
with exponent H.

Proof. Suppose {X_t}_{t∈T} and {Y_t}_{t∈T} are associated by {f_t}_{t∈T} ⊂ L^α_+(S, µ). (i) For
any h ∈ T, letting g_t = f_{t+h}, ∀t ∈ T, by stationarity of {X_t}_{t∈T} we obtain {g_t}_{t∈T}
as another spectral representation. Namely, { ∫_S f_t dM_{α,+} }_{t∈T} =^d { ∫_S g_t dM_{α,+} }_{t∈T}.
By Theorem III.11, the previous statement is equivalent to { ∫^e_S f_t dM_{α,∨} }_{t∈T} =^d
{ ∫^e_S g_t dM_{α,∨} }_{t∈T}, which is equivalent to the fact that {Y_t}_{t∈T} is stationary. The
proof of part (ii) is similar and thus omitted.
Observe that not all SαS processes can be associated to α-Frechet processes, since
not all SαS processes have nonnegative spectral representations. For an SαS process
{X_t}_{t∈T} with spectral representation {f_t}_{t∈T} to have an associated α-Frechet process,
a necessary and sufficient condition is that for all t_1, …, t_n ∈ T, the functions f_{t_1}, …, f_{t_n} satisfy
the associability condition (3.12). We say such SαS processes are max-associable.
Now, Proposition III.9 becomes:

Theorem III.13. An SαS process {X_t}_{t∈T} with representation (3.1) is max-
associable, if and only if for all t_1, t_2 ∈ T,

(3.13)    f_{t_1}(s) f_{t_2}(s) ≥ 0, µ-a.e.

Indeed, by Theorem III.13, for any max-associable spectral representation {f_t}_{t∈T},
{|f_t|}_{t∈T} is also a spectral representation of the same process. Clearly, if the spectral
functions are nonnegative, then the SαS process is max-associable. We give two
simple examples next.
Example III.14 (Association of mixed fractional motions). Consider the self-similar
SαS processes {X_t}_{t∈R_+} with the representations

(3.14)    {X_t}_{t∈R_+} =^d { ∫_E ∫_0^∞ t^{H−1/α} g(x, u/t) M_{α,+}(dx, du) }_{t∈R_+},  H ∈ (0, ∞),

where (E, E, ν) is a standard Lebesgue space, M_{α,+} is an SαS random measure on
E × R_+ with control measure m(dx, du) = ν(dx)du, and g ∈ L^α(E × R_+, m). Such
processes are called mixed fractional motions (see [10]). When g ≥ 0 a.e., the process
{X_t}_{t∈R_+} is max-associable, and Corollary III.12 implies that the associated α-Frechet
process is H-self-similar.
Example III.15 (Association of Chentzov SαS random fields). Recall that {X_t}_{t∈R^n}
is a Chentzov SαS random field, if

    {X_t}_{t∈R^n} ≡ {M_{α,+}(V_t)}_{t∈R^n} =^d { ∫_S 1_{V_t}(u) M_{α,+}(du) }_{t∈R^n}.

Here, 0 < α < 2, (S, µ) is a measure space and {V_t, t ∈ R^n} is a family of measurable
sets such that µ(V_t) < ∞ for all t ∈ R^n (see Ch. 8 in [93]). Since 1_{V_t}(u) ≥ 0, all
Chentzov SαS random fields are max-associable.
We conclude this section with some examples of SαS processes that are not max-
associable. In particular, recall that the SαS processes with stationary increments
(zero at t = 0) characterized by dissipative flows were shown in [103] to have the repre-
sentation

(3.15)    {X_t}_{t∈R} =^d { ∫_E ∫_R (G(x, t + u) − G(x, u)) M_{α,+}(dx, du) }_{t∈R}.

Here, (E, E, ν) is a standard Lebesgue space, M_{α,+}, α ∈ (0, 2), is an SαS random
measure with control measure m(dx, du) = ν(dx)du, and G : E × R → R is a
measurable function such that, for all t ∈ R,

    G_t(x, u) := G(x, t + u) − G(x, u), x ∈ E, u ∈ R,

belongs to L^α(E × R, m). The process {X_t}_{t∈R} in (3.15) is called a mixed moving
average with stationary increments. The following result provides a partial character-
ization of the max-associable SαS processes {X_t}_{t∈T} which have the representation
(3.15). We shall suppose that E is equipped with a metric ρ and endow E × R with
the product topology.
Proposition III.16. Consider an SαS process {X_t}_{t∈R} with representation (3.15).
Suppose there exists a closed set N ⊂ E × R, such that m(N) = 0 and the function
G is continuous at all (x, u) ∈ N^c := E × R \ N, w.r.t. the product topology. Then,
{X_t}_{t∈R} is max-associable, if and only if

(3.16)    G(x, u) = f(x) 1_{A_x}(u) + c(x)  on N^c.

Namely, for all x ∈ E, G(x, u) can take at most two values on N^c.

Proof. By Theorem III.13, {X_t}_{t∈R} is max-associable, if and only if for all t_1, t_2 ∈ R,

(3.17)    G_{t_1}(x, u) G_{t_2}(x, u) = (G(x, t_1 + u) − G(x, u))(G(x, t_2 + u) − G(x, u)) ≥ 0,
          m-a.e. (x, u) ∈ E × R.
First, we show the 'if' part. Define G̃(x, u) := G(x, u) (given by (3.16)) on N^c
and G̃(x, u) := f(x) 1_{A_x}(u) + c(x) on N (if A_x and c(x) are not defined, then set
G̃(x, u) = 0). Set G̃_t(x, u) = G̃(x, u + t) − G̃(x, u). Note that {G̃_t}_{t∈R} is another
spectral representation of {X_t}_{t∈R} and, for all (x, u), {1_{A_x}(u + t) − 1_{A_x}(u)}_{t∈R} can
take at most 2 values, one of which is 0. This observation implies (3.17) with G_t(x, u)
replaced by G̃_t(x, u), whence {X_t}_{t∈R} is max-associable.

Next, we prove the 'only if' part. We show that (3.17) is violated, if G(x, u) takes
more than 2 different values on ({x} × R) ∩ N^c for some x ∈ E. Suppose there exist
x ∈ E, u_i ∈ R such that (x, u_i) ∈ N^c and the g^x_i := G(x, u_i) are mutually different, for
i = 1, 2, 3. Without loss of generality we may suppose that g^x_1 < g^x_2 < g^x_3.
Then, by the continuity of G, there exists ε > 0 such that the sets B_i := B(x, ε) × (u_i − ε, u_i +
ε), i = 1, 2, 3, are disjoint, where B(x, ε) := {y ∈ E : ρ(x, y) < ε} and ρ is the metric
on E, and

(3.18)    sup_{B_1 ∩ N^c} G(x, u) < inf_{B_2 ∩ N^c} G(x, u) ≤ sup_{B_2 ∩ N^c} G(x, u) < inf_{B_3 ∩ N^c} G(x, u).

Put t_1 = u_1 − u_2 and t_2 = u_3 − u_2. Inequality (3.18) implies that G_{t_1}(x, u) G_{t_2}(x, u) < 0
on B_2 ∩ N^c. This, in view of Theorem III.13, contradicts the max-associability. We
have thus shown (3.16).
We now give two classes of SαS processes which, by Proposition III.16, cannot be
associated to any α-Frechet process.
Example III.17 (Non-associability of linear fractional stable motions). The linear
fractional stable motions (see Ch. 7.4 in [93]) have the following spectral representa-
tions:

    {X_t}_{t∈R} =^d { ∫_R [ a( (t + u)_+^{H−1/α} − u_+^{H−1/α} ) + b( (t + u)_−^{H−1/α} − u_−^{H−1/α} ) ] M_{α,+}(du) }_{t∈R}.

Here H ∈ (0, 1), α ∈ (0, 2), H ≠ 1/α, a, b ∈ R and |a| + |b| > 0. By Proposition III.16,
these processes are not max-associable.
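The failure of (3.13) for a linear fractional stable motion can be seen numerically: with b = 0, the increment kernel g_t(u) = (t + u)_+^{H−1/α} − u_+^{H−1/α} is nonnegative for t > 0 but takes negative values for t < 0, so g_{t_1} g_{t_2} < 0 somewhere once t_1 > 0 > t_2. A sketch (parameter values are chosen for illustration, with H − 1/α > 0):

```python
import numpy as np

H, alpha = 0.9, 1.2
kappa = H - 1.0 / alpha          # ~ 0.067 > 0 for these illustrative values

def g(t, u):
    """LFSM increment kernel with a = 1, b = 0."""
    return np.maximum(t + u, 0.0) ** kappa - np.maximum(u, 0.0) ** kappa

u = 0.5
# opposite signs at the same u: condition (3.13) fails on T = R
assert g(1.0, u) > 0 > g(-1.0, u)
```

Restricted to t ∈ R_+ this kernel is nonnegative for every u, consistent with Remark III.19.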
Example III.18 (Non-associability of Telecom processes). The Telecom process
offers an extension of fractional Brownian motion consistent with heavy-tailed fluc-
tuations. It arises as a large-scale limit of renewal reward processes, and it can be obtained
by choosing the distribution of the rewards accordingly (see [57] and [72]). A Telecom
process {X_t}_{t∈R} has the following representation

    {X_t}_{t∈R} =^d { ∫_R ∫_R e^{(H−1)s/α} ( F(e^s(t + u)) − F(e^s u) ) M_{α,+}(ds, du) }_{t∈R},

where 1 < α < 2, 1/α < H < 1, F(z) = (z ∧ 0 + 1)_+, z ∈ R, and the SαS random
measure M_{α,+} has control measure m_α(ds, du) = ds du. By Proposition III.16,
the Telecom process is not max-associable.
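The same sign obstruction appears for the Telecom kernel: F is nondecreasing, so F(e^s(t + u)) − F(e^s u) is nonnegative for t > 0 and nonpositive for t < 0, and both signs occur on sets of positive measure. A small check (only the kernel F from the example is used; the sample points are arbitrary):

```python
import math

def F(z):
    """Kernel from the example: F(z) = (min(z, 0) + 1)_+,
    i.e. 0 for z <= -1, 1 + z on [-1, 0], and 1 for z >= 0."""
    return max(min(z, 0.0) + 1.0, 0.0)

def increment(t, s, u):
    """F(e^s (t + u)) - F(e^s u); since F is nondecreasing, its sign matches the sign of t."""
    return F(math.exp(s) * (t + u)) - F(math.exp(s) * u)

# both signs occur, so the product condition (3.13) fails on T = R
assert increment(1.0, 0.0, -0.5) > 0 > increment(-1.0, 0.0, -0.5)
```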
Remark III.19. It is important that the index set T in Proposition III.16 is the entire real
line R. Indeed, in both Examples III.17 and III.18, when the time index is restricted
to the half-line T = R_+ (or T = R_−), the processes {X_t}_{t∈T} satisfy condition (3.13)
and are therefore max-associable.
3.4 Association of Classifications
In this section, we show how to apply the association technique to relate various
classification results for SαS and α-Frechet processes. Note that many classifications
of SαS (as well as α-Frechet) processes are induced by suitable decompositions of the
measure space (S, µ). The following theorem provides an essential tool for translating
classification results from SαS to α-Frechet processes, and vice versa.
Theorem III.20. Suppose an SαS process {X_t}_{t∈T} and an α-Frechet process {Y_t}_{t∈T}
are associated by two spectral representations {f^{(i)}_t}_{t∈T} ⊂ L^α_+(S_i, µ_i), i = 1, 2. That
is,

    {X_t}_{t∈T} =^d { ∫_{S_i} f^{(i)}_t dM^{(i)}_{α,+} }_{t∈T}   and   {Y_t}_{t∈T} =^d { ∫^e_{S_i} f^{(i)}_t dM^{(i)}_{α,∨} }_{t∈T},  i = 1, 2.
Then, for any measurable subsets A_i ⊂ S_i, i = 1, 2, we have

    { ∫_{A_1} f^{(1)}_t dM^{(1)}_{α,+} }_{t∈T} =^d { ∫_{A_2} f^{(2)}_t dM^{(2)}_{α,+} }_{t∈T}

if and only if

    { ∫^e_{A_1} f^{(1)}_t dM^{(1)}_{α,∨} }_{t∈T} =^d { ∫^e_{A_2} f^{(2)}_t dM^{(2)}_{α,∨} }_{t∈T}.

The proof follows from Theorem III.2 by restricting the measures µ_i onto the sets A_i, i = 1, 2.
For an SαS process {X_t}_{t∈T} with spectral functions {f_t}_{t∈T} ⊂ L^α(S, µ), a de-
composition typically takes the form {X_t}_{t∈T} =^d { ∑_{j=1}^n X^{(j)}_t }_{t∈T}, where X^{(j)}_t =
∫_{A^{(j)}} f_t dM_{α,+} for all t ∈ T, and A^{(j)}, 1 ≤ j ≤ n, are disjoint subsets of S = ∪_{j=1}^n A^{(j)}.
The components {X^{(j)}_t}_{t∈T}, 1 ≤ j ≤ n, are independent SαS processes. When
{X_t}_{t∈T} is max-associable, Theorem III.20 enables us to define the associated de-
composition for the α-Frechet process {Y_t}_{t∈T} associated with {X_t}_{t∈T}. Namely,
we have {Y_t}_{t∈T} =^d { ∨_{j=1}^n Y^{(j)}_t }_{t∈T}, where Y^{(j)}_t = ∫^e_{A^{(j)}} |f_t| dM_{α,∨} for all t ∈ T. Con-
versely, given a decomposition for α-Frechet processes, we can define a corresponding
decomposition for the associated SαS processes.
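On a discrete space, restricting the integrals to a partition {A^{(j)}} splits the exponent in both the SαS characteristic function and the Frechet distribution function additively, which is exactly what makes the sum- and max-decompositions parallel. A hedged numerical illustration (the partition and all values are hypothetical):

```python
import numpy as np

alpha = 1.5
mu = np.array([0.5, 1.0, 2.0, 0.25])          # weights of a 4-point spectral space S
f = np.array([1.0, -2.0, 0.5, 3.0])           # values f_t(s) of one spectral function
parts = [np.array([0, 1]), np.array([2, 3])]  # disjoint sets A^(1), A^(2) with union S

# exponent of the full integral vs. the sum of the exponents over the parts
total = np.sum(np.abs(f) ** alpha * mu)
by_parts = sum(np.sum(np.abs(f[A]) ** alpha * mu[A]) for A in parts)
assert abs(total - by_parts) < 1e-12

# hence exp(-total) factors into a product over the partition: the components
# defined on the A^(j) are independent, for the sum- and the max-stable side alike
prod = np.prod([np.exp(-np.sum(np.abs(f[A]) ** alpha * mu[A])) for A in parts])
assert abs(np.exp(-total) - prod) < 1e-12
```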
Example III.21 (Conservative-dissipative decomposition). In seminal work, [83]
established the conservative-dissipative decomposition for SαS processes. Namely, for
any {X_t}_{t∈T} with representation (3.1), one has

    {X_t}_{t∈T} =^d { X^C_t + X^D_t }_{t∈T},

where X^C_t = ∫_C f_t dM_{α,+} and X^D_t = ∫_D f_t dM_{α,+} for all t ∈ T, with C and D defined
by

(3.19)    C := { s ∈ S : ∫_T |f_t(s)|^α λ(dt) = ∞ }   and   D := S \ C.

When {X_t}_{t∈T} is stationary, the sets C and D correspond to the Hopf decom-
position S = C ∪ D of the non-singular flow associated with {X_t}_{t∈T} (see [83]
for details). Therefore, {X^C_t}_{t∈T} and {X^D_t}_{t∈T} are referred to as the conservative
and dissipative components of {X_t}_{t∈T}, respectively. Theorem III.20 enables us to
use (3.19) to establish the parallel decomposition of the associated α-Frechet process
{Y_t}_{t∈T}. Namely, for the associated {Y_t}_{t∈T}, we have {Y_t}_{t∈T} =^d { Y^C_t ∨ Y^D_t }_{t∈T},
where Y^C_t = ∫^e_C |f_t| dM_{α,∨} and Y^D_t = ∫^e_D |f_t| dM_{α,∨} for all t ∈ T. This decomposition
was established in [109] by using different tools.
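The defining integral in (3.19) can be probed numerically: for a given kernel, the profile T ↦ ∫_{−T}^{T} |f_t(s)|^α dt stays bounded at points s ∈ D and diverges at points s ∈ C. A sketch with two toy kernels (both kernels, the grid, and the classification heuristic are illustrative, not from the text):

```python
import numpy as np

def profile(kernel, s, T, alpha, dt=0.01):
    """Approximate int_{-T}^{T} |f_t(s)|^alpha dt on a regular grid in t."""
    t = np.arange(-T, T, dt)
    return float(np.sum(np.abs(kernel(t, s)) ** alpha) * dt)

# moving-average kernel f_t(s) = 1_{[0,1]}(s - t): profile bounded in T, so s lies in D
ma = lambda t, s: ((s - t >= 0) & (s - t <= 1)).astype(float)
# kernel constant in t, f_t(s) = 1: profile grows linearly in T, so s lies in C
const = lambda t, s: np.ones_like(t)
```

Comparing the profile at T and 2T separates the two regimes: it is essentially flat for the moving average and doubles for the constant kernel.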
Remark III.22. Similar associations can be established for other decompositions,
including the positive-null decomposition (see [92] and [109]), and the decompositions of
the above two types for random fields (T = Z^d or R^d; see [91] and [108]). A more
specific decomposition for SαS processes with representation (3.15) was developed
in [70], and one can obtain the corresponding decomposition for the associated α-
Frechet process by Theorem III.20.
3.5 Proofs of Auxiliary Results
We first need the following lemma.
Lemma III.23. If F ⊂ L^α_+(S, µ), then

(i) ρ(F) = ρ(span_+(F)) = ρ(∨-span(F)), and

(ii) for any f^{(1)} ∈ span_+(F) and f^{(2)} ∈ ∨-span(F), f^{(1)}/f^{(2)} ∈ ρ(F), i.e., f^{(1)}/f^{(2)} is ρ(F)-measurable.
Proof. (i) First, for any f_i, g_i ∈ F, a_i ≥ 0, b_i ≥ 0, i ∈ N, we have

    { (∨_{i∈N} a_i f_i) / (∨_{j∈N} b_j g_j) ≤ x } = ∩_{i∈N} { a_i f_i / (∨_{j∈N} b_j g_j) ≤ x }
    = ∩_{i∈N} ∩_{k∈N} ∪_{j∈N} { a_i f_i / (b_j g_j) < x + 1/k },

hence ρ(∨-span(F)) ⊂ ρ(span_+(F)).
To show ρ(span_+(F)) ⊂ ρ(∨-span(F)), we shall first prove that ρ(span°_+(F)) ⊂
ρ(∨-span(F)), where span°_+(F) denotes the set of finite positive linear combinations of
functions in F. For all f_1, f_2, g_1 ∈ F and a_1, a_2, b_1 ≥ 0, we have

    { (a_1 f_1 + a_2 f_2) / (b_1 g_1) ≤ x } = ∪_{q∈Q} ( { a_1 f_1 / (b_1 g_1) ≤ q } ∩ { a_2 f_2 / (b_1 g_1) ≤ x − q } ).

This shows that (a_1 f_1 + a_2 f_2)/(b_1 g_1) is ρ(∨-span(F))-measurable. By using the fact
that F contains only nonnegative functions and since

    { b_1 g_1 / (a_1 f_1 + a_2 f_2) ≤ x } = { (a_1 f_1 + a_2 f_2) / (b_1 g_1) ≥ 1/x },  x > 0,

we similarly obtain that (a_1 f_1 + a_2 f_2)/(b_1 g_1 + b_2 g_2) is ρ(∨-span(F))-measurable. Sim-
ilar arguments can be used to show that (∑_{i=1}^n a_i f_i)/(∑_{i=1}^n b_i g_i) is ρ(∨-span(F))-
measurable for all a_i, b_i ≥ 0, f_i, g_i ∈ F, 1 ≤ i ≤ n.

We have thus shown that ρ(span°_+(F)) ⊂ ρ(∨-span(F)). If now f, g ∈ span_+(F),
then there exist two sequences f_n, g_n ∈ span°_+(F), such that f_n → f and g_n → g a.e.
Thus, h_n := f_n/g_n → h := f/g as n → ∞, a.e. Since the h_n are ρ(span°_+(F))-measurable
for all n ∈ N, so is h. Hence ρ(span_+(F)) = ρ(span°_+(F)) ⊂ ρ(∨-span(F)).
(ii) By the previous argument, it is enough to focus on finite linear and max-
linear combinations. Suppose f^{(1)} = ∑_{i=1}^n a_i f_i and f^{(2)} = ∨_{j=1}^p b_j g_j for some f_i, g_j ∈
F, a_i, b_j ≥ 0, 1 ≤ i ≤ n, 1 ≤ j ≤ p. Then, for all x > 0,

    { (∑_{i=1}^n a_i f_i) / (∨_{j=1}^p b_j g_j) < x } = ∪_{j=1}^p { ∑_{i=1}^n a_i f_i / g_j < x b_j } ∈ ρ(F).

It follows that f^{(1)}/f^{(2)} ∈ ρ(F).
Proof of Proposition III.8. First we show R_{e,+}(F_∨) ⊃ R_{e,+}(F_+), where F_∨ and F_+
are defined in (3.11). By (3.8), it suffices to show that, for any r_2 ∼ ρ(F_+) and f^{(2)} ∈ F_+,
there exist r_1 ∼ ρ(F_∨) and f^{(1)} ∈ F_∨, such that

(3.20)    r_1 f^{(1)} = r_2 f^{(2)}.

To obtain (3.20), we need the concept of full support. We say a function g has full
support in F (an arbitrary collection of functions defined on (S, µ)), if g ∈ F and for
all f ∈ F, µ(supp(f) \ supp(g)) = 0. Here supp(f) := {s ∈ S : f(s) ≠ 0}. By Lemma
3.2 in [109], there exists a function f^{(1)} ∈ F_∨ which has full support in F_∨. One can
show that this function also has full support in F_+. Indeed, let g ∈ F_+ be arbitrary.
Then, there exist g_n = ∑_{i=1}^{k_n} a_{ni} g_{ni}, a_{ni} ≥ 0, g_{ni} ∈ F ⊂ F_∨, such that g_n →_µ g as
n → ∞. Note that µ(supp(g_n) \ supp(f^{(1)})) = 0 for all n. Thus, for all ε > 0, we have
µ(|g_n − g| > ε) ≥ µ({|g| > ε} \ supp(f^{(1)})). Since µ(|g_n − g| > ε) → 0 as n → ∞, it
follows that µ({|g| > ε} \ supp(f^{(1)})) = 0 for all ε > 0, i.e., µ(supp(g) \ supp(f^{(1)})) = 0.
We have thus shown that f^{(1)} has full support in F_+.

Now, setting r_1 := r_2 f^{(2)}/f^{(1)}, we have (3.20). (Note that f^{(2)} = 0, µ-a.e. on
S \ supp(f^{(1)}). By setting 0/0 = 0, f^{(2)}/f^{(1)} is well defined.) Lemma III.23 (ii)
implies that f^{(2)}/f^{(1)} ∈ ρ(F), whence r_1 ∼ ρ(F) = ρ(F_∨). We have thus shown
R_{e,+}(F_∨) ⊃ R_{e,+}(F_+). In a similar way one can show R_{e,+}(F_∨) ⊂ R_{e,+}(F_+).
Proof of Proposition III.9. First, suppose (3.12) does not hold but (3.5) holds. Then,
without loss of generality, we can assume that there exists S^{(1)}_0 ⊂ S_1 such that
f^{(1)}_1(s) > 0, f^{(1)}_2(s) < 0 for all s ∈ S^{(1)}_0 and µ_1(S^{(1)}_0) > 0. It follows from (3.5)
that there exists a linear isometry U such that, by Theorem III.5, Uf^{(1)}_i = f^{(2)}_i =
T̄(r_i)U(f), with a certain f and r_i = f^{(1)}_i/f, for i = 1, 2. In particular, f can be taken
with full support. Note that sign(r_1) ≠ sign(r_2) on S^{(1)}_0. It follows that f^{(2)}_1 and f^{(2)}_2
have different signs on a set of positive measure (indeed, this set is the image of
S^{(1)}_0 under the regular set isomorphism T). This contradicts the fact that f^{(2)}_1 and
f^{(2)}_2 are both nonnegative on S_2.

On the other hand, suppose (3.12) is true. Define Uf^{(1)}_i := |f^{(1)}_i|. It follows
from (3.12) that U can be extended to a positive-linear isometry from L^α(S_1, µ_1) to
L^α_+(S_2, µ_2), which implies (3.5).
CHAPTER IV
Decomposability of Sum- and Max-stable Processes
In this chapter, we investigate the general decomposability problem for both SαS
and α-Frechet processes with 0 < α < 2. We first focus on SαS processes. Then,
by the association method introduced in Chapter III, the counterpart results for
α-Frechet processes are proved with little extra effort in Section 4.3.
Let X = {X_t}_{t∈T} be an SαS process. We are interested in the case when X can
be written as

(4.1)    {X_t}_{t∈T} =^d { X^{(1)}_t + ⋯ + X^{(n)}_t }_{t∈T},

where '=^d' stands for equality in finite-dimensional distributions, and X^{(k)} =
{X^{(k)}_t}_{t∈T}, k = 1, …, n, are independent SαS processes. We will write X =^d X^{(1)} +
⋯ + X^{(n)} in short, and each X^{(k)} will be referred to as a component of X. The sta-
bility property readily implies that (4.1) holds with X^{(k)} =^d n^{−1/α} X ≡ {n^{−1/α} X_t}_{t∈T}.
Components equal in finite-dimensional distributions to a constant multiple of X
will be referred to as trivial. We are interested in the general structure of non-trivial
SαS components of X.
Many important decompositions (4.1) of SαS processes (with non-trivial compo-
nents) are already available in the literature: see for example Cambanis et al. [12],
Rosinski [83], Rosinski and Samorodnitsky [86], Surgailis et al. [103], Pipiras and
Taqqu [70, 71], and Samorodnitsky [92], to name a few. These results were mo-
tivated by studies of various probabilistic and structural aspects of the underlying
SαS processes such as ergodicity, mixing, stationarity, self-similarity, etc. Notably,
Rosinski [83] established a fundamental connection between stationary SαS processes
and non-singular flows. He developed important tools based on minimal represen-
tations of SαS processes and inspired multiple decomposition results motivated by
connections to ergodic theory.
In this chapter, we adopt a different perspective. Our main goal is to charac-
terize all possible SαS decompositions (4.1). Our results show how the dependence
structure of an SαS process determines the structure of its components.
Consider SαS processes {X_t}_{t∈T} indexed by a complete separable metric space T,
with an integral representation

(4.2)    {X_t}_{t∈T} =^d { ∫_S f_t(s) M_α(ds) }_{t∈T},

with spectral functions {f_t}_{t∈T} ⊂ L^α(S, B_S, µ). Recall that for all n ∈ N, t_j ∈ T, a_j ∈ R,

(4.3)    E exp{ −i ∑_{j=1}^n a_j X_{t_j} } = exp{ −∫_S | ∑_{j=1}^n a_j f_{t_j} |^α dµ }.

Without loss of generality, we always assume that the spectral functions {f_t}_{t∈T} ⊂
L^α(S, B_S, µ) have full support, i.e., S = supp{f_t, t ∈ T}.

We first state the main result of this chapter. To this end, we recall that the ratio
σ-algebra of a spectral representation F = {f_t}_{t∈T} (of {X_t}) is defined as

(4.4)    ρ(F) ≡ ρ{f_t, t ∈ T} := σ( f_{t_1}/f_{t_2} : t_1, t_2 ∈ T ).
The following result characterizes the structure of all SαS decompositions.
Theorem IV.1. Suppose {X_t}_{t∈T} is an SαS process (0 < α < 2) with spectral
representation

    {X_t}_{t∈T} =^d { ∫_S f_t(s) M_α(ds) }_{t∈T},

with {f_t}_{t∈T} ⊂ L^α(S, B_S, µ). Let {X^{(k)}_t}_{t∈T}, k = 1, …, n, be independent SαS pro-
cesses.

(i) The decomposition

(4.5)    {X_t}_{t∈T} =^d { X^{(1)}_t + ⋯ + X^{(n)}_t }_{t∈T}

holds, if and only if there exist measurable functions r_k : S → [−1, 1], k =
1, …, n, such that

(4.6)    {X^{(k)}_t}_{t∈T} =^d { ∫_S r_k(s) f_t(s) M_α(ds) }_{t∈T},  k = 1, …, n.

In this case, necessarily ∑_{k=1}^n |r_k(s)|^α = 1, µ-almost everywhere on S.

(ii) If (4.5) holds, then the r_k's in (4.6) can be chosen to be non-negative and ρ(F)-
measurable. Such r_k's are unique modulo µ.
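The necessity of ∑_k |r_k|^α = 1 in part (i) can be illustrated on a discrete space: if the r_k satisfy this identity pointwise, the α-th powers of the component scales in (4.6) sum back to the scale of X for every linear combination, matching (4.3). A numerical sketch (all functions and values are hypothetical):

```python
import numpy as np

alpha = 1.3
mu = np.array([0.5, 1.0, 2.0])
F = np.array([[1.0, -2.0, 0.5],
              [0.0, 1.0, 3.0]])                  # rows f_{t_1}, f_{t_2} on S = {0, 1, 2}
r1 = np.array([0.2, 0.9, 0.5])
r2 = (1.0 - r1 ** alpha) ** (1.0 / alpha)        # so that r1^alpha + r2^alpha = 1 pointwise

def scale_alpha(a, r):
    """alpha-th power of the scale of sum_j a_j * (component with modulation r)."""
    lin = a @ F
    return float(np.sum(np.abs(r * lin) ** alpha * mu))

for a in (np.array([1.0, 0.0]), np.array([-2.0, 3.0])):
    full = scale_alpha(a, np.ones(3))            # scale^alpha of sum_j a_j X_{t_j}
    assert abs(full - (scale_alpha(a, r1) + scale_alpha(a, r2))) < 1e-10
```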
The rest of the chapter is structured as follows. In Section 4.1, we provide some
consequences of Theorem IV.1 for general SαS processes. The stationary case is
discussed in Section 4.2. Parallel results on max-stable processes are presented in
Section 4.3. The proof of Theorem IV.1 is given in Section 4.4.
4.1 SαS Components
In this section, we provide a few examples to illustrate the consequences of our
main result Theorem IV.1. The first one is about SαS processes with independent
increments. Recall that we always assume 0 < α < 2.
Corollary IV.2. Let X = {X_t}_{t∈R_+} be an arbitrary SαS process with independent
increments and X_0 = 0. Then all SαS components of X also have independent
increments.

Proof. Write m(t) = ‖X_t‖_α^α, where ‖X_t‖_α denotes the scale coefficient of the SαS
random variable X_t. By the independence of the increments of X, it follows that
m is a non-decreasing function with m(0) = 0. First, we consider the simple case
when m(t) is right-continuous. Consider the Borel measure µ on [0, ∞) determined
by µ([0, t]) := m(t). The independence of the increments of X readily implies that
X has the representation:

(4.7)    {X_t}_{t∈R_+} =^d { ∫_0^∞ 1_{[0,t]}(s) M_α(ds) }_{t∈R_+},

where M_α is an SαS random measure with control measure µ.

Now, for any SαS component Y (≡ X^{(k)}) of X, we have that (4.6) holds with
f_t(s) = 1_{[0,t]}(s) and some function r(s) (≡ r_k(s)). This implies that the increments of
Y are also independent since, for example, for any 0 ≤ t_1 < t_2, the spectral functions
r(s) f_{t_1}(s) = r(s) 1_{[0,t_1]}(s) and r(s) f_{t_2}(s) − r(s) f_{t_1}(s) = r(s) 1_{(t_1,t_2]}(s) have disjoint
supports.

It remains to prove the general case. The difficulty is that m(t) may have (at
most countably many) discontinuities, and a representation as in (4.7) is not always
possible. Nevertheless, introduce the right-continuous functions t → m_i(t), i = 0, 1,

    m_0(t) := m(t+) − ∑_{τ≤t} (m(τ) − m(τ−))   and   m_1(t) := ∑_{τ≤t} (m(τ) − m(τ−)),

and let M_α be an SαS random measure on R_+ × {0, 1} with control measure µ([0, t] ×
{i}) := m_i(t), i = 0, 1, t ∈ R_+. In this way, as in (4.7) one can show that

    {X_t}_{t∈T} =^d { ∫_{R_+×{0,1}} ( 1_{[0,t)×{0}}(s, v) + 1_{[0,t]×{1}}(s, v) ) M_α(ds, dv) }_{t∈T}.

The rest of the proof remains similar and is omitted.
Remark IV.3. Theorem IV.1 and Corollary IV.2 do not apply to the Gaussian case
(α = 2). For the sake of simplicity, take T = {1, 2} and n = 2 (two components)
in (4.1). In this case, all the (in)dependence information of the mean-zero Gaussian
process {X_t}_{t∈T} is characterized by the covariance matrix Σ of the Gaussian vector
(X^{(1)}_1, X^{(2)}_1, X^{(1)}_2, X^{(2)}_2). A counterexample can be easily constructed by choosing
Σ appropriately. This reflects the drastic difference between the geometries of L^α spaces
for α < 2 and α = 2.
The next natural question to ask is whether two SαS processes have common
components. Namely, the SαS process Z is a common component of the SαS processes
X and Y, if X =^d Z + X^{(1)} and Y =^d Z + Y^{(1)}, where X^{(1)} and Y^{(1)} are both SαS
processes independent of Z.

To study the common components, the co-spectral point of view introduced in
Wang and Stoev [111] is helpful. Consider a measurable SαS process {X_t}_{t∈T} with
spectral representation (4.2), where the index set T is equipped with a measure λ
defined on the σ-algebra B_T. Without loss of generality, we take f(·, ·) : (S × T, B_S ×
B_T) → (R, B_R) to be jointly measurable (see Theorems 9.4.2 and 11.1.1 in [93]).
The co-spectral functions, f_·(s) ≡ f(s, ·), are elements of L^0(T) ≡ L^0(T, B_T, λ), the
space of B_T-measurable functions modulo λ-null sets. The co-spectral functions are
indexed by s ∈ S, in contrast to the spectral functions f_t(·) indexed by t ∈ T. Recall
also that a set P ⊂ L^0(T) is a cone, if cP = P for all c ∈ R \ {0} and 0 ∈ P. We
write {f_·(s)}_{s∈S} ⊂ P modulo µ, if for µ-almost all s ∈ S, f_·(s) ∈ P.
Proposition IV.4. Let X^{(i)} = {X^{(i)}_t}_{t∈T} be SαS processes with measurable represen-
tations {f^{(i)}_t}_{t∈T} ⊂ L^α(S_i, B_{S_i}, µ_i), i = 1, 2. If there exist two cones P_i ⊂ L^0(T), i =
1, 2, such that {f^{(i)}_·(s)}_{s∈S_i} ⊂ P_i modulo µ_i, for i = 1, 2, and P_1 ∩ P_2 = {0}, then
the two processes have no common component.

Proof. Suppose Z is a component of X^{(1)}. Then, by Theorem IV.1, Z has a spectral
representation {r^{(1)} f^{(1)}_t}_{t∈T}, for some B_{S_1}-measurable function r^{(1)}. By the definition
of cones, the co-spectral functions of Z are included in P_1, i.e., {r^{(1)}(s) f^{(1)}_·(s)}_{s∈S_1} ⊂
P_1 modulo µ_1. If Z is also a component of X^{(2)}, then by the same argument,
{r^{(2)}(s) f^{(2)}_·(s)}_{s∈S_2} ⊂ P_2 modulo µ_2, for some B_{S_2}-measurable function r^{(2)}. Since
P_1 ∩ P_2 = {0}, it then follows that µ_i(supp(r^{(i)})) = 0, i = 1, 2, or equivalently Z = 0,
the degenerate case.
We conclude this section with an application to SαS moving averages.
Corollary IV.5. Let X^(1) and X^(2) be two SαS moving averages

{X_t^(i)}_{t∈R^d} d= { ∫_{R^d} f^(i)(t+s) M_α^(i)(ds) }_{t∈R^d}

with kernel functions f^(i) ∈ L^α(R^d, B_{R^d}, λ), i = 1, 2. Then, either

(4.8) X^(1) d= c X^(2) for some c > 0,

or X^(1) and X^(2) have no common component. Moreover, (4.8) holds, if and only if, for some τ ∈ R^d and ϵ ∈ {±1},

(4.9) f^(1)(s) = c ϵ f^(2)(s + τ), for λ-almost all s ∈ R^d.
Proof. Clearly (4.9) implies (4.8). Conversely, if (4.8) holds, then (4.9) follows as in the proof of Corollary 4.2 in [111], with slight modification (the proof therein was for positive cones). When (4.8) (or equivalently (4.9)) does not hold, consider the smallest cones containing {f^(i)(s+·)}_{s∈R^d}, i = 1, 2, respectively. Since these two cones have trivial intersection {0}, Proposition IV.4 implies that X^(1) and X^(2) have no common component.
4.2 Stationary SαS Components and Flows

Let X = {X_t}_{t∈T} be a stationary SαS process with representation (4.2), where now T = R^d or T = Z^d, d ∈ N. The seminal work of Rosinski [83] established an important connection between stationary SαS processes and flows. A family of functions {φ_t}_{t∈T} is said to be a flow on (S, B_S, µ), if for all t_1, t_2 ∈ T, φ_{t_1+t_2}(s) = φ_{t_1}(φ_{t_2}(s)) for all s ∈ S, and φ_0(s) = s for all s ∈ S. We say that a flow is non-singular, if µ(φ_t(A)) = 0 is equivalent to µ(A) = 0, for all A ∈ B_S, t ∈ T. Given a flow {φ_t}_{t∈T}, {c_t}_{t∈T} is said to be a cocycle if c_{t+τ}(s) = c_t(s) c_τ(φ_t(s)) µ-almost surely for all t, τ ∈ T, and c_t ∈ {±1} for all t ∈ T.
To understand the relation between the structure of stationary SαS processes
and flows, it is necessary to work with minimal representations of SαS processes,
introduced by Hardin [45, 46]. The minimality assumption is crucial in many results
on the structure of SαS processes, although it is in general difficult to check (see
e.g. Rosinski [85] and Pipiras [69]).
Definition IV.6. The spectral functions F ≡ {f_t}_{t∈T} (and the corresponding spectral representation (4.2)) are said to be minimal, if the ratio σ-algebra ρ(F) in (4.4) is equivalent to B_S, i.e., for all A ∈ B_S, there exists B ∈ ρ(F) such that µ(A∆B) = 0, where A∆B = (A \ B) ∪ (B \ A).
Rosinski ([83], Theorem 3.1) proved that if {f_t}_{t∈T} is minimal, then there exist a modulo µ unique non-singular flow {φ_t}_{t∈T} and a corresponding cocycle {c_t}_{t∈T}, such that for all t ∈ T,

(4.10) f_t(s) = c_t(s) ( d(µ∘φ_t)/dµ (s) )^{1/α} f_0(φ_t(s)), µ-almost everywhere.

Conversely, suppose that (4.10) holds for some non-singular flow {φ_t}_{t∈T}, a corresponding cocycle {c_t}_{t∈T}, and a function f_0 ∈ L^α(S, µ) ({f_t}_{t∈T} not necessarily minimal). Then, clearly the SαS process X in (4.2) is stationary. In this case, we shall say that X is generated by the flow {φ_t}_{t∈T}.
Consider now an SαS decomposition (4.1) of X, where the independent components {X_t^(k)}_{t∈T} are stationary. This will be referred to as a stationary SαS decomposition, and the {X_t^(k)}_{t∈T}'s as stationary components of X. Our goal in this section is to characterize the structure of all possible stationary components. This characterization involves the invariant σ-algebra with respect to the flow {φ_t}_{t∈T}:

(4.11) F_φ = { A ∈ B_S : µ(φ_τ(A)∆A) = 0, for all τ ∈ T }.

Given a function g and a σ-algebra G, we write g ∈ G, if g is measurable with respect to G.
Theorem IV.7. Let {X_t}_{t∈T} be a stationary and measurable SαS process with spectral functions {f_t}_{t∈T} as in (4.10), i.e.,

X_t = ∫_S c_t(s) ( d(µ∘φ_t)/dµ (s) )^{1/α} f_0(φ_t(s)) M_α(ds), t ∈ T.

(i) Suppose that {X_t}_{t∈T} has a stationary SαS decomposition

(4.12) {X_t}_{t∈T} d= { X_t^(1) + ··· + X_t^(n) }_{t∈T}.

Then, each component {X_t^(k)}_{t∈T} has a representation

(4.13) {X_t^(k)}_{t∈T} d= { ∫_S r_k(s) f_t(s) M_α(ds) }_{t∈T}, k = 1, ..., n,

where the r_k's can be chosen to be non-negative and ρ(F)-measurable. This choice is unique modulo µ and these r_k's are φ-invariant, i.e. r_k ∈ F_φ.

(ii) Conversely, for any φ-invariant r_k's such that Σ_{k=1}^n |r_k(s)|^α = 1, µ-almost everywhere on S, decomposition (4.12) holds with the X^(k)'s as in (4.13).
Proof. By using (4.10), a change of variables, and the φ-invariance of the functions r_k, one can show that the X^(k)'s in (4.13) are stationary. This fact and Theorem IV.1 yield part (ii).

We now show (i). Suppose that X^(k) is a stationary (SαS) component of X. Theorem IV.1 implies that there exists a unique modulo µ, non-negative and ρ(F)-measurable function r_k for which (4.13) holds. By the stationarity of X^(k), it also follows that for all τ ∈ T, {r_k(s) f_{t+τ}(s)}_{t∈T} is also a spectral representation of X^(k). By the flow representation (4.10), it follows that for all t, τ ∈ T,

(4.14) f_{t+τ}(s) = c_τ(s) f_t(φ_τ(s)) ( d(µ∘φ_τ)/dµ )^{1/α}(s), µ-almost everywhere,

and we obtain that for all τ, t_j ∈ T, a_j ∈ R, j = 1, ..., n:

∫_S |Σ_{j=1}^n a_j r_k(s) f_{t_j+τ}(s)|^α µ(ds) = ∫_S |Σ_{j=1}^n a_j r_k(φ_{−τ}(s)) f_{t_j}(s)|^α µ(ds),

which shows that {r_k(φ_{−τ}(s)) f_t(s)}_{t∈T} is also a representation of X^(k), for all τ ∈ T.

Observe that from (4.14), for all t_1, t_2, τ ∈ T and λ ∈ R,

{ f_{t_1+τ}/f_{t_2+τ} ≤ λ } = φ_τ^{−1}( { f_{t_1}/f_{t_2} ≤ λ } ) modulo µ.

It then follows that for all τ ∈ T, the σ-algebra φ_{−τ}(ρ(F)) ≡ (φ_τ)^{−1}(ρ(F)) is equivalent to ρ(F). This, by the uniqueness of r_k ∈ ρ(F) (Theorem IV.1), implies that r_k∘φ_τ = r_k modulo µ, for all τ. Then, r_k ∈ F_φ follows from a standard measure-theoretic argument. The proof is complete.
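The normalization in Theorem IV.7(ii) can be illustrated on a discrete space: if Σ_k |r_k(s)|^α = 1 pointwise, the component scale integrals ∫|r_k f|^α dµ add up exactly to ∫|f|^α dµ, which is the computation that makes (4.12) hold for the associated characteristic functions (φ-invariance additionally makes each component stationary). The following sketch uses an arbitrary three-point S; all numeric values are hypothetical.

```python
# Discrete illustration of the normalization sum_k |r_k|^alpha = 1 in
# Theorem IV.7(ii): component scale integrals add up to the total one.
alpha = 1.2
mu = [0.4, 0.1, 0.5]   # weights of a three-point space S (hypothetical)
f = [1.0, 3.0, 2.0]    # a spectral function evaluated on S (hypothetical)

w = [0.3, 0.6, 0.5]    # arbitrary values in (0, 1)
r1 = [ws ** (1 / alpha) for ws in w]          # r1(s)^alpha = w(s)
r2 = [(1 - ws) ** (1 / alpha) for ws in w]    # r2(s)^alpha = 1 - w(s)

total = sum(abs(fs) ** alpha * ms for fs, ms in zip(f, mu))
components = sum(
    sum(abs(rs * fs) ** alpha * ms for rs, fs, ms in zip(rk, f, mu))
    for rk in (r1, r2)
)
assert abs(total - components) < 1e-9  # scale integrals match
```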
Remark IV.8. The structure of the stationary SαS components of stationary SαS processes (including random fields) has attracted much interest since the seminal work of Rosinski [83, 84]. See, for example, Pipiras and Taqqu [71], Samorodnitsky [92], Roy [87, 88], Roy and Samorodnitsky [91], Roy [89, 90], and Wang et al. [108]. In view of Theorem IV.7, the components considered in these works correspond to indicator functions r_k(s) = 1_{A_k}(s) of certain disjoint flow-invariant sets A_k arising from ergodic theory (see e.g. Krengel [55] and Aaronson [1]).
Theorem IV.7 can be applied to check indecomposability of stationary SαS pro-
cesses. Recall that a stationary SαS process is said to be indecomposable, if all its
stationary SαS components are trivial (i.e. constant multiples of the original process).
Corollary IV.9. Consider {X_t}_{t∈T} as in Theorem IV.7. If F_φ is trivial, then {X_t}_{t∈T} is indecomposable. The converse is true when, in addition, {f_t}_{t∈T} is minimal.
Proof. If F_φ is trivial, the result follows from Theorem IV.7. Conversely, let {f_t}_{t∈T} be minimal and X indecomposable, and suppose, toward a contradiction, that F_φ is not trivial. Then one can choose A ∈ F_φ such that µ(A) > 0 and µ(S \ A) > 0. Consider

{X_t^A}_{t∈T} d= { ∫_S 1_A(s) f_t(s) M_α(ds) }_{t∈T}.

By Theorem IV.7, X^A is a stationary component of X. It suffices to show that X^A is a non-trivial component of X, which contradicts the indecomposability.

Suppose that X^A is trivial, i.e., cX^A d= X for some c > 0. Then, by Theorem IV.7, cX^A has a representation as in (4.13), with r_k := c1_A. On the other hand, since cX^A d= X, we also have the trivial representation with r_k := 1. Since A ∈ ρ(F), the uniqueness of r_k implies that 1 = c1_A modulo µ, which contradicts µ(A^c) > 0. Therefore, X^A is non-trivial.
The indecomposable stationary SαS processes can be seen as the elementary build-
ing blocks for the construction of general stationary SαS processes. We conclude this
section with two examples.
Example IV.10 (Mixed moving averages). Consider a mixed moving average in the sense of [102]:

(4.15) {X_t}_{t∈R^d} d= { ∫_{R^d×V} f(t+s, v) M_α(ds, dv) }_{t∈R^d}.

Here, M_α is an SαS random measure on R^d × V with the control measure λ × ν, where λ is the Lebesgue measure on (R^d, B_{R^d}), ν is a probability measure on (V, B_V), and f(s, v) ∈ L^α(R^d × V, B_{R^d×V}, λ × ν). Given a disjoint union V = ∪_{j=1}^n A_j, where the A_j's are measurable subsets of V, the mixed moving average can clearly be decomposed as in (4.12) with

{X_t^(k)}_{t∈R^d} d= { ∫_{R^d×A_k} f(t+s, v) M_α(ds, dv) }_{t∈R^d}, for all k = 1, ..., n.
Any moving average process

(4.16) {X_t}_{t∈R^d} d= { ∫_{R^d} f(t+s) M_α(ds) }_{t∈R^d}

trivially has a mixed moving average representation. The next result shows when the converse is true.
Corollary IV.11. The mixed moving average X in (4.15) is indecomposable, if and
only if it has a moving average representation as in (4.16).
Proof. By Corollary IV.9, the moving average process (4.16) is indecomposable, since
in this case φt(s) = t + s, t, s ∈ Rd and therefore Fφ is trivial. This proves the ‘if’
part.
Suppose now that X in (4.15) is indecomposable. In Section 5 of Pipiras [69] it was
shown that SαS processes with mixed moving average representations and stationary
increments also have minimal representations of the mixed moving average type. By
using similar arguments, one can show that this is also true for the class of stationary
mixed moving average processes.
Thus, without loss of generality, we assume that the representation in (4.15) is
minimal. Suppose now that there exists a set A ∈ BV with ν(A) > 0 and ν(Ac) > 0.
Since R^d × A and R^d × A^c are flow-invariant, we have the stationary decomposition {X_t}_{t∈R^d} d= {X_t^A + X_t^{A^c}}_{t∈R^d}, where

X_t^B := ∫_{R^d×V} 1_B(v) f(t+s, v) M_α(ds, dv), B ∈ {A, A^c}.

Note that both components X^A = {X_t^A}_{t∈R^d} and X^{A^c} = {X_t^{A^c}}_{t∈R^d} are non-zero because the representation of X has full support.

Now, since X is indecomposable, there exist positive constants c_1 and c_2, such that X d= c_1 X^A d= c_2 X^{A^c}. The minimality of the representation and Theorem IV.7 imply that c_1 1_A = c_2 1_{A^c} modulo ν, which is impossible. This contradiction shows that the set V cannot be partitioned into two disjoint sets of positive measure. That is, V is a singleton and the mixed moving average is in fact a moving average.
Example IV.12 (Doubly stationary processes). Consider a stationary process ξ = {ξ_t}_{t∈T} (T = Z^d) supported on the probability space (E, E, µ) with ξ_t ∈ L^α(E, E, µ). Without loss of generality, we may suppose that ξ_t(u) = ξ_0(φ_t(u)), where {φ_t}_{t∈T} is a µ-measure-preserving flow.

Let M_α be an SαS random measure on (E, E, µ) with control measure µ. The stationary SαS process X = {X_t}_{t∈T},

(4.17) X_t := ∫_E ξ_t(u) M_α(du), t ∈ T,

is said to be doubly stationary (see Cambanis et al. [11]). By Corollary IV.9, if ξ is ergodic, then X is indecomposable.
A natural and interesting question raised by a referee is: what happens when X is decomposable and hence ξ is non-ergodic? Can we have a direct integral decomposition of the process X into indecomposable components? The following remark partly addresses this question.
Remark IV.13. The doubly stationary SαS processes are a special case of stationary SαS processes generated by positively recurrent flows (actions). As shown in Samorodnitsky [92], Remark 2.6, each such stationary SαS process X = {X_t}_{t∈T} can be expressed through a measure-preserving flow (action) on a finite measure space. Namely,

(4.18) {X_t}_{t∈T} d= { ∫_E f_t(u) M_α^(µ)(du) }_{t∈T}, with f_t(u) := c_t(u) f_0(φ_t(u)),

where M_α^(µ) is an SαS random measure with a finite control measure µ on (E, E), φ = {φ_t}_{t∈T} is a µ-preserving flow (action), and {c_t}_{t∈T} is a cocycle with respect to φ. In the case when the cocycle is trivial (c_t ≡ 1) and µ(E) = 1, the process X is doubly stationary.
For simplicity, suppose that T = Z^d and without loss of generality let (E, E, µ) be a standard Lebesgue space with µ(E) = 1. The ergodic decomposition theorem (see e.g. Keller [52], Theorem 2.3.3) implies that there exist conditional probability distributions {µ_u}_{u∈E} with respect to the invariant σ-algebra such that φ is measure-preserving and ergodic with respect to the measures µ_u for µ-almost all u ∈ E. Let ν be another φ-invariant measure on (E, E) dominating the conditional probabilities µ_u, so that the Radon–Nikodym derivatives p(x, u) = (dµ_u/dν)(x) are jointly measurable on (E × E, E ⊗ E, ν × µ). Consider

g_t(x, u) = f_t(x) p(φ_t(x), u)^{1/α}.

Recall that ν and the µ_u are φ-invariant, whence

p(φ_t(x), u) = (dµ_u/dν)(φ_t(x)) = (dµ_u/dν)(x) = p(x, u), modulo ν × µ.
Thus, g_t(x, u) = f_t(x)(dµ_u/dν)^{1/α}(x), and for all a_j ∈ R, t_j ∈ T, j = 1, ..., n, we have

∫_{E×E} |Σ_{j=1}^n a_j g_{t_j}(x, u)|^α ν(dx)µ(du) = ∫_{E×E} |Σ_{j=1}^n a_j f_{t_j}(x)|^α (dµ_u/dν)(x) ν(dx)µ(du)
= ∫_{E×E} |Σ_{j=1}^n a_j f_{t_j}(x)|^α µ_u(dx)µ(du)
= ∫_E |Σ_{j=1}^n a_j f_{t_j}(x)|^α µ(dx),

where the last equality follows from the identity

∫_E h(x)µ(dx) = ∫_{E×E} h(x)µ_u(dx)µ(du), for all h ∈ L^1(E, E, µ).
We have thus shown that {X_t}_{t∈T} defined by (4.18) has another spectral representation

(4.19) {X_t}_{t∈T} d= { ∫_{E×E} g_t(x, u) M_α^(ν×µ)(dx, du) }_{t∈T},

where M_α^(ν×µ) is an SαS random measure on E × E with control measure ν × µ. It also follows that for µ-almost all u ∈ E, the process defined by

X_t^(u) := ∫_E g_t(x, u) M_α^(ν)(dx), t ∈ T,

is indecomposable, where M_α^(ν) has control measure ν. Indeed, as above, one can show that

{X_t^(u)}_{t∈T} d= { ∫_E f_t(x) M_α^(µ_u)(dx) }_{t∈T},

where M_α^(µ_u) has control measure µ_u. The ergodic decomposition theorem implies that the flow (action) φ is ergodic with respect to µ_u, which by Corollary IV.9 implies the indecomposability of X^(u) = {X_t^(u)}_{t∈T}. In this way, (4.19) parallels the mixed moving average representation for stationary SαS processes generated by dissipative flows (see e.g. Rosinski [83]).
Remark IV.14. The above construction of the decomposition (4.19) assumes the existence of a φ-invariant measure ν dominating all conditional probabilities µ_u, u ∈ E. If the measure µ, restricted to the invariant σ-algebra F_φ, is discrete, i.e. F_φ consists of countably many atoms under µ, then one can take ν ≡ µ. In this case, the process X is decomposed into a (possibly infinite) sum of its indecomposable components:

X_t = Σ_k ∫_{E_k} f_t(x) M_α^(µ)(dx),

where the E_k's are disjoint φ-invariant measurable sets, such that E = ∪_k E_k and φ|_{E_k} is ergodic, for each k. In this case, the E_k's are the atoms of F_φ.

In general, when µ|_{F_φ} is not discrete, the dominating measure ν, if it exists, may not be σ-finite. Indeed, since the φ_t's are ergodic for each µ_u, it follows that either µ_u = µ_{u′}, or µ_u and µ_{u′} are singular, for µ-almost all u, u′ ∈ E. Thus, if F_φ is "too rich", this singularity feature implies that the measure ν may not be chosen to be σ-finite.
4.3 Decomposability of Max-stable Processes

In this section, we state and prove some results on the (max-)decomposability of max-stable processes. Again, we focus on α-Frechet processes.

Let Y = {Y_t}_{t∈T} be an α-Frechet process. If

(4.20) {Y_t}_{t∈T} d= { Y_t^(1) ∨ ··· ∨ Y_t^(n) }_{t∈T},

for some independent α-Frechet processes Y^(k) = {Y_t^(k)}_{t∈T}, k = 1, ..., n, then we say that the Y^(k)'s are components of Y. By the max-stability of Y, (4.20) trivially holds if the Y^(k)'s are independent copies of {n^{−1/α} Y_t}_{t∈T}. The constant multiples of Y are referred to as trivial components of Y and, as in the SαS case, we are interested in the structure of the non-trivial ones.
The association method can be readily applied to transfer decomposability results for SαS processes to the max-stable setting. Let Y = {Y_t}_{t∈T} be an α-Frechet (α ∈ (0, 2)) process with extremal representation

(4.21) {Y_t}_{t∈T} d= { ∫^e_S f_t(s) M_α^∨(ds) }_{t∈T},

where {f_t}_{t∈T} ⊂ L^α_+(S, B_S, µ) are spectral functions, and recall that

(4.22) P(Y_{t_i} ≤ y_i, i = 1, ..., n) = exp( −∫_S max_{1≤i≤n} ( f_{t_i}(s)/y_i )^α µ(ds) ),

for all y_i > 0, t_i ∈ T, i = 1, ..., n.
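When S is finite, the integral in (4.22) becomes a weighted sum and the joint distribution function can be evaluated directly. The sketch below (our illustration; the spectral values, weights, and α are hypothetical) does this and checks the one-dimensional marginal against the closed form P(Y_t ≤ y) = exp(−y^{−α} ∫_S f_t^α dµ).

```python
import math

def frechet_cdf(F, mu, y, alpha):
    """Discrete-S version of (4.22): F[i][s] holds f_{t_i}(s), mu the weights.

    Returns P(Y_{t_1} <= y[0], ..., Y_{t_n} <= y[n-1])."""
    n = len(y)
    integral = sum(
        max(F[i][s] / y[i] for i in range(n)) ** alpha * mu[s]
        for s in range(len(mu))
    )
    return math.exp(-integral)

# Hypothetical example: alpha = 1.5, three-point S, two spectral functions.
alpha = 1.5
mu = [0.2, 0.5, 0.3]
F = [[1.0, 2.0, 0.5],
     [0.3, 1.0, 2.0]]

# The one-dimensional marginal is alpha-Frechet with scale
# sigma^alpha = sum_s f_t(s)^alpha mu(s).
y = 2.0
sigma_alpha = sum(f ** alpha * m for f, m in zip(F[0], mu))
closed_form = math.exp(-sigma_alpha / y ** alpha)
assert abs(frechet_cdf(F[:1], mu, [y], alpha) - closed_form) < 1e-12
```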
Assume 0 < α < 2. Recall that an SαS process X and an α-Frechet process Y are said to be associated if they have a common spectral representation. That is, if for some non-negative {f_t}_{t∈T} ⊂ L^α_+(S, B_S, µ), Relations (4.2) and (4.21) hold.
To illustrate the association method in Chapter III, we prove the max-stable
counterpart of our main result Theorem IV.1. From the proof, we can see that the
other results in the sum-stable setting have their natural max-stable counterparts by
association. We briefly state some of these results at the end of this section.
Theorem IV.15. Suppose {Y_t}_{t∈T} is an α-Frechet process with spectral representation (4.21), where F ≡ {f_t}_{t∈T} ⊂ L^α_+(S, B_S, µ). Let {Y_t^(k)}_{t∈T}, k = 1, ..., n, be independent α-Frechet processes. Then the decomposition (4.20) holds, if and only if there exist measurable functions r_k : S → [0, 1], k = 1, ..., n, such that

(4.23) {Y_t^(k)}_{t∈T} d= { ∫^e_S r_k(s) f_t(s) M_α^∨(ds) }_{t∈T}, k = 1, ..., n.

In this case, Σ_{k=1}^n r_k(s)^α = 1, µ-almost everywhere on S, and the r_k's in (4.23) can be chosen to be ρ(F)-measurable, uniquely modulo µ.
Proof. The 'if' part follows from a straightforward calculation of the cumulative distribution functions (4.22). To show the 'only if' part, suppose (4.20) holds and Y^(k) has spectral functions {g_t^(k)}_{t∈T} ⊂ L^α_+(V_k, B_{V_k}, ν_k), k = 1, ..., n. Without loss of generality, assume the {V_k}_{k=1,...,n} to be mutually disjoint and define g_t(v) = Σ_{k=1}^n g_t^(k)(v) 1_{V_k}(v) ∈ L^α_+(V, B_V, ν) for an appropriately defined (V, B_V, ν) (see the proof of Theorem IV.1).

Now, consider the SαS process X associated to Y. It has spectral functions {f_t}_{t∈T} and {g_t}_{t∈T}. Consider the SαS processes X^(k) associated to the Y^(k) via spectral functions {g_t^(k)}_{t∈T}, k = 1, ..., n. By checking the characteristic functions, one can show that the {X^(k)}_{k=1,...,n} form a decomposition of X as in (4.1). Then, by Theorem IV.1, each SαS component X^(k) has a spectral representation (4.6) with spectral functions {r_k f_t}_{t∈T}. But we introduced X^(k) as the SαS process associated to Y^(k) via spectral representation {g_t^(k)}_{t∈T}. Hence, X^(k) has spectral functions {g_t^(k)}_{t∈T} and {r_k f_t}_{t∈T}, and so does Y^(k) by Theorem III.11. Therefore, (4.23) holds and the rest of the desired results follow.
Further parallel results can be established by the association method. Consider a stationary α-Frechet process Y. If Y^(k), k = 1, ..., n, are independent stationary α-Frechet processes such that (4.20) holds, then we say each Y^(k) is a stationary α-Frechet component of Y. The process Y is said to be indecomposable, if it has no non-trivial stationary component. The following results on (mixed) moving maxima (see e.g. [101] and [50] for more details) follow from Theorem IV.15 and the association method, in parallel to Corollary IV.11 on (mixed) moving averages in the sum-stable setting.
Corollary IV.16. The mixed moving maxima process

{Y_t}_{t∈R^d} d= { ∫^e_{R^d×V} f(t+s, v) M_α^∨(ds, dv) }_{t∈R^d}

is indecomposable, if and only if it has a moving maxima representation

{Y_t}_{t∈R^d} d= { ∫^e_{R^d} f(t+s) M_α^∨(ds) }_{t∈R^d}.
4.4 Proof of Theorem IV.1

We will first show that Theorem IV.1 is true when {f_t}_{t∈T} is minimal (Proposition IV.18), and then we complete the proof by relating a general spectral representation to a minimal one. This technique is standard in the literature on representations of SαS processes (see e.g. Rosinski [83], Remark 2.3). We start with a useful lemma.
Lemma IV.17. Let {f_t}_{t∈T} ⊂ L^α(S, B_S, µ) be a minimal representation of an SαS process. For any two bounded B_S-measurable functions r^(1) and r^(2), we have

{ ∫_S r^(1) f_t dM_α }_{t∈T} d= { ∫_S r^(2) f_t dM_α }_{t∈T},

if and only if |r^(1)| = |r^(2)| modulo µ.
Proof. The 'if' part is trivial. We shall now prove the 'only if' part. Let S^(k) := supp(r^(k)), k = 1, 2, and note that since {f_t}_{t∈T} is minimal, the {r^(k) f_t}_{t∈T} are minimal representations, restricted to S^(k), k = 1, 2, respectively. Since the latter two representations correspond to the same process, by Theorem 2.2 in [83], there exist a bi-measurable, one-to-one and onto point mapping Ψ : S^(1) → S^(2) and a function h : S^(1) → R \ {0}, such that, for all t ∈ T,

(4.24) r^(1)(s) f_t(s) = r^(2)(Ψ(s)) f_t(Ψ(s)) h(s), for almost all s ∈ S^(1),

and

(4.25) d(µ∘Ψ)/dµ = |h|^α, µ-almost everywhere.

It then follows that, for almost all s ∈ S^(1),

(4.26) f_{t_1}(s)/f_{t_2}(s) = ( r^(1)(s) f_{t_1}(s) )/( r^(1)(s) f_{t_2}(s) ) = f_{t_1}(Ψ(s))/f_{t_2}(Ψ(s)).

Define R_λ(t_1, t_2) = { s : f_{t_1}(s)/f_{t_2}(s) ≤ λ } and note that by (4.26), for all A ≡ R_λ(t_1, t_2),

(4.27) µ( Ψ(A ∩ S^(1)) ∆ (A ∩ S^(2)) ) = 0.

In fact, one can show that Relation (4.27) is also valid for all A ∈ ρ(F) ≡ σ( { R_λ(t_1, t_2) : λ ∈ R, t_1, t_2 ∈ T } ). Then, by minimality, (4.27) holds for all A ∈ B_S. In particular, taking A equal to S^(1) and S^(2), respectively, it follows that µ(S^(1) ∆ S^(2)) = 0. Therefore, writing S̃ := S^(1) ∩ S^(2), we have

(4.28) µ( Ψ(A ∩ S̃) ∆ (A ∩ S̃) ) = 0, for all A ∈ B_S.

This implies that Ψ(s) = s, for µ-almost all s ∈ S̃. To see this, let B_{S̃} = B_S ∩ S̃ denote the σ-algebra B_S restricted to S̃. Observe that for all A ∈ B_{S̃}, we have 1_A = 1_A∘Ψ, for µ-almost all s ∈ S̃, and trivially σ(1_A : A ∈ B_{S̃}) = B_{S̃}. Thus, by the second part of Proposition 5.1 in [85], it follows that Ψ(s) = s modulo µ on S̃. This and (4.25) imply that h(s) ∈ {±1}, almost everywhere. Plugging Ψ and h into (4.24) yields the desired result.
Proposition IV.18. Theorem IV.1 is true when {f_t}_{t∈T} is minimal.

Proof. We first prove the 'if' part. The result follows readily by using characteristic functions. Indeed, suppose that the X^(k) = {X_t^(k)}_{t∈T}, k = 1, ..., n, are independent and have representations as in (4.6). Then, for all a_j ∈ R, t_j ∈ T, j = 1, ..., m, we have

(4.29) E exp( i Σ_{j=1}^m a_j X_{t_j} ) = exp( −∫_S |Σ_{j=1}^m a_j f_{t_j}|^α dµ )
= Π_{k=1}^n exp( −∫_S |Σ_{j=1}^m a_j r_k f_{t_j}|^α dµ ) = Π_{k=1}^n E exp( i Σ_{j=1}^m a_j X_{t_j}^(k) ),

where the second equality follows from the fact that Σ_{k=1}^n |r_k(s)|^α = 1, for µ-almost all s ∈ S. Relation (4.29) implies the decomposition (4.1).
We now prove the 'only if' part. Suppose that (4.1) holds and let {f_t^(k)}_{t∈T} ⊂ L^α(V_k, B_{V_k}, ν_k), k = 1, ..., n, be representations of the independent components {X_t^(k)}_{t∈T}, k = 1, ..., n, respectively, and without loss of generality assume that the {V_k}_{k=1,...,n} are mutually disjoint. Introduce the measure space (V, B_V, ν), where V := ∪_{k=1}^n V_k, B_V := { ∪_{k=1}^n A_k : A_k ∈ B_{V_k}, k = 1, ..., n }, and ν(A) := Σ_{k=1}^n ν_k(A ∩ V_k) for all A ∈ B_V.

By decomposition (4.1), it follows that {X_t}_{t∈T} d= { ∫_V g_t dM_α }_{t∈T}, with g_t(u) := Σ_{k=1}^n f_t^(k)(u) 1_{V_k}(u) and M_α an SαS random measure on (V, B_V) with control measure ν.

Thus, {f_t}_{t∈T} ⊂ L^α(S, B_S, µ) and {g_t}_{t∈T} ⊂ L^α(V, B_V, ν) are two representations of the same process X, and by assumption the former is minimal. Therefore, by Remark 2.5 in [83], there exist modulo ν unique functions Φ : V → S and h : V → R \ {0}, such that, for all t ∈ T,

(4.30) g_t(u) = h(u) f_t(Φ(u)), for almost all u ∈ V,

where moreover µ = ν_h∘Φ^{−1} with dν_h = |h|^α dν.

Recall that V is the union of the mutually disjoint sets {V_k}_{k=1,...,n}. For each k = 1, ..., n, let Φ_k : V_k → S_k := Φ(V_k) be the restriction of Φ to V_k, and define the measure µ_k(·) := ν_{h,k}∘Φ_k^{−1}(· ∩ S_k) on (S, B_S) with dν_{h,k} := |h|^α dν_k. Note that µ_k has support S_k, and the Radon–Nikodym derivative dµ_k/dµ exists. We claim that (4.6) holds with r_k := (dµ_k/dµ)^{1/α}. To see this, observe that for all m ∈ N, a_1, ..., a_m ∈ R, t_1, ..., t_m ∈ T,

∫_S |Σ_{j=1}^m a_j r_k f_{t_j}|^α dµ = ∫_{S_k} |Σ_{j=1}^m a_j f_{t_j}|^α dµ_k = ∫_{V_k} |Σ_{j=1}^m a_j h f_{t_j}∘Φ_k|^α dν_k,

which, combined with (4.30), yields (4.6) because g_t|_{V_k} = f_t^(k).

Note also that Σ_{k=1}^n µ_k = µ and thus Σ_{k=1}^n r_k^α = 1. This completes the proof of part (i) of Theorem IV.1 in the case when {f_t}_{t∈T} is minimal.

To prove part (ii), note that the r_k's above are in fact non-negative and B_S-measurable. Note also that, by minimality, the r_k's have versions r̃_k that are ρ(F)-measurable, i.e. r_k = r̃_k modulo µ. Their uniqueness follows from Lemma IV.17.
Proof of Theorem IV.1. (i) The 'if' part follows by using characteristic functions as in the proof of Proposition IV.18 above.

Now, we prove the 'only if' part. Let {f̃_t}_{t∈T} ⊂ L^α(S̃, B_{S̃}, µ̃) be a minimal representation of X. As in the proof of Proposition IV.18, by Remark 2.5 in [83], there exist modulo µ unique functions Φ : S → S̃ and h : S → R \ {0}, such that, for all t ∈ T,

(4.31) f_t(s) = h(s) f̃_t(Φ(s)), for almost all s ∈ S,

where µ̃ = µ_h∘Φ^{−1} with dµ_h = |h|^α dµ.

Now, by Proposition IV.18, if the decomposition (4.1) holds, then there exist unique non-negative functions r̃_k, k = 1, ..., n, such that

(4.32) {X_t^(k)}_{t∈T} d= { ∫_{S̃} r̃_k f̃_t dM̃_α }_{t∈T}, k = 1, ..., n,

and Σ_{k=1}^n r̃_k^α = 1 modulo µ̃. Here M̃_α is an SαS random measure on (S̃, B_{S̃}) with control measure µ̃. Let r_k(s) := r̃_k(Φ(s)) and note that, by using (4.31) and a change of variables, for all a_j ∈ R, t_j ∈ T, j = 1, ..., m, we obtain

(4.33) ∫_S |Σ_{j=1}^m a_j r_k(s) f_{t_j}(s)|^α µ(ds) = ∫_{S̃} |Σ_{j=1}^m a_j r̃_k(s) f̃_{t_j}(s)|^α µ̃(ds).

This, in view of Relation (4.32), implies (4.6). Further, the fact that Σ_{k=1}^n r̃_k^α = 1 implies Σ_{k=1}^n r_k^α = 1, modulo µ, because the mapping Φ is non-singular, i.e. µ∘Φ^{−1} ∼ µ̃. This completes the proof of part (i).
We now focus on proving part (ii). Suppose that (4.6) holds for two choices of r_k, namely r_k and r′_k, both non-negative and measurable with respect to ρ(F). We claim that

(4.34) ρ(F) ∼ Φ^{−1}(ρ(F̃)),

and defer the proof to the end. Then, since minimality implies that B_{S̃} ∼ ρ(F̃), both r_k and r′_k are measurable with respect to ρ(F) ∼ Φ^{−1}(B_{S̃}). Now, the Doob–Dynkin lemma (see e.g. Rao [75], p. 30) implies that

(4.35) r_k(s) = r̃_k(Φ(s)) and r′_k(s) = r̃′_k(Φ(s)), for µ-almost all s,

where r̃_k and r̃′_k are two B_{S̃}-measurable functions. By using the last relation and a change of variables, we obtain that (4.33) holds with (r_k, r̃_k) replaced by (r′_k, r̃′_k) as well. Thus both {r̃_k f̃_t}_{t∈T} and {r̃′_k f̃_t}_{t∈T} are representations of the k-th component of X. Since {f̃_t}_{t∈T} is a minimal representation of X, Lemma IV.17 implies that r̃_k = r̃′_k modulo µ̃. This, by (4.35) and the non-singularity of Φ, yields r_k = r′_k modulo µ.

It remains to prove (4.34). Relation (4.31) and the fact that h(s) ≠ 0 imply that for all λ and t_1, t_2 ∈ T, { f_{t_1}/f_{t_2} ≤ λ } = Φ^{−1}( { f̃_{t_1}/f̃_{t_2} ≤ λ } ) modulo µ. Thus the classes of sets C := { { f_{t_1}/f_{t_2} ≤ λ } : t_1, t_2 ∈ T, λ ∈ R } and C̃ := { Φ^{−1}( { f̃_{t_1}/f̃_{t_2} ≤ λ } ) : t_1, t_2 ∈ T, λ ∈ R } are equivalent. That is, for all A ∈ C, there exists Ã ∈ C̃ with µ(A ∆ Ã) = 0, and vice versa.

Define

G = { Φ^{−1}(A) : A ∈ ρ(F̃) such that µ(Φ^{−1}(A) ∆ B) = 0 for some B ∈ σ(C) }.

Notice that G is a σ-algebra and, since C̃ ⊂ G ⊂ Φ^{−1}(ρ(F̃)), we obtain that σ(C̃) = Φ^{−1}(ρ(F̃)) ≡ G. This, in view of the definition of G, shows that for every Ã ∈ σ(C̃) there exists A ∈ σ(C) with µ(A ∆ Ã) = 0. In a similar way one can show that each element of σ(C) is equivalent to an element of σ(C̃), which completes the proof of the desired equivalence of the σ-algebras.
CHAPTER V
Conditional Sampling for Max-stable Processes
The modeling and parameter estimation of the univariate marginal distributions
of the extremes have been studied extensively (see e.g. Davison and Smith [21], de
Haan and Ferreira [24], Resnick [77] and the references therein). Many of the recent
developments of statistical inference in extreme value theory focus on the character-
ization, modeling and estimation of the dependence for multivariate extremes. In
this context, building adequate max-stable processes and random fields plays a key
role. See for example de Haan and Pereira [25], Buishand et al. [9], Schlather [94],
Schlather and Tawn [95], Cooley et al. [17], and Naveau et al. [64].
This chapter is motivated by an important and long-standing challenge, namely, prediction for max-stable random processes and fields. Suppose that one already has a suitable max-stable model for the dependence structure of a random field {X_t}_{t∈T}. The field is observed at several locations t_1, ..., t_n ∈ T and one wants to predict the values X_{s_1}, ..., X_{s_m} of the field at some other locations s_1, ..., s_m ∈ T. The optimal predictors involve the conditional distribution of {X_t}_{t∈T}, given the data. Even if the finite-dimensional distributions of the field {X_t}_{t∈T} are available in analytic form, it is typically impossible to obtain a closed-form solution for the conditional distribution. Naive Monte Carlo approximations are not practical either, since they involve conditioning on events of infinitesimal probability, which leads to mounting errors and computational costs.
Prior studies of Davis and Resnick [19, 20] and Cooley et al. [17], among others,
have shown that the prediction problem in the max-stable context is challenging,
and it does not have an elegant analytical solution. On the other hand, the growing
popularity and the use of max-stable processes in various applications, make this an
important problem. This motivated us to seek a computational solution.
5.1 Overview
In this chapter, we develop theory and methodology for sampling from the conditional distributions of spectrally discrete max-stable models. More precisely, we provide an algorithm that can efficiently generate exact independent samples from the regular conditional probability of (X_{s_1}, ..., X_{s_m}), given the values (X_{t_1}, ..., X_{t_n}). For the sake of simplicity, we write X = (X_1, ..., X_n) ≡ (X_{t_1}, ..., X_{t_n}). The algorithm applies to the general max-linear model:

(5.1) X_i = max_{j=1,...,p} a_{i,j} Z_j ≡ ⋁_{j=1}^p a_{i,j} Z_j, i = 1, ..., n,

where the a_{i,j}'s are known non-negative constants and the Z_j's are independent continuous non-negative random variables. Any multivariate max-stable distribution can be approximated arbitrarily well via a max-linear model with sufficiently large p (see e.g. Remark II.1).
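The max-linear map itself is simple to evaluate: it is a matrix–vector product with sums replaced by maxima. A minimal sketch (the toy matrix A and factor values z are hypothetical, ours for illustration):

```python
def max_linear(A, z):
    """Max-linear map X_i = max_j A[i][j] * z[j], as in (5.1)."""
    return [max(a * zj for a, zj in zip(row, z)) for row in A]

# Toy model: n = 2 observed values driven by p = 3 independent factors.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]
z = [3.0, 4.0, 1.0]
print(max_linear(A, z))  # [3.0, 4.0]
```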
The main idea is to first generate samples from the regular conditional probability distribution of Z | X = x, where Z = (Z_j)_{j=1,...,p}. Then, the conditional distributions of

X_{s_k} = ⋁_{j=1}^p b_{k,j} Z_j, k = 1, ..., m,

given X = x can be readily obtained, for any given b_{k,j}'s. In this chapter, we assume that the model is completely known, i.e., the parameters a_{i,j} and b_{k,j} are given. The statistical inference for these parameters is beyond the scope of this chapter.
Observe that if X = x, then (5.1) implies natural equality and inequality con-
straints on the Zj’s. More precisely, (5.1) gives rise to a set of so-called hitting
scenarios. In each hitting scenario, a subset of the Zj’s equal, in other words hit,
their upper bounds and the rest of the Zj’s can take arbitrary values in certain open
intervals. We will show that the regular conditional probability of Z | X = x is
a weighted mixture of the various distributions of the vector Z, under all possible
hitting scenarios corresponding to X = x.
The resulting formula, however, involves determining all hitting scenarios, which
becomes computationally prohibitive for large and even moderate values of p. This
issue is closely related to the NP-hard set-covering problem in computer science (see
e.g. [13]).
Fortunately, further detailed analysis of the probabilistic structure of the max-
linear models allows us to obtain a different formula of the regular conditional prob-
ability (Theorem V.9). It yields an exact and computationally efficient algorithm,
which in practice can handle complex max-linear models with p in the order of thou-
sands, on a conventional desktop computer. The algorithm is implemented in the R
([74]) package maxLinear [107], with the core part written in C/C++. We also used
the R package fields ([37]) to generate some of the figures in this chapter.
We illustrate the performance of our algorithm over two classes of processes: the
max-autoregressive moving average (MARMA) time series (Davis and Resnick [19]),
and the Smith model (Smith [98]) for spatial extremes. The MARMA processes
are spectrally discrete max-stable processes, and our algorithm applies directly. In
61
Section 5.4, we demonstrate the prediction of MARMA processes by conditional
sampling and compare our result to the projection predictors proposed in [19]. To
apply our algorithm to the Smith model, on the other hand, we first need to discretize
the (spectrally continuous) model. Section 5.5 is devoted to conditional sampling
for the discretized Smith model. Thanks to the computational efficiency of our
algorithm, we can choose a mesh fine enough to obtain a satisfactory discretization.
Figure 5.1 shows four realizations from such a discretized Smith model, conditioning
only on 7 observations (with assumed value 5). The algorithm applies in the same
way to more complex models.
[Contour plots omitted.]

Figure 5.1: Four samples from the conditional distribution of the discrete Smith model (see Section 5.5), given the observed values (all equal to 5) at the locations marked by crosses. Parameters: ρ = 0, β_1 = 1, β_2 = 1.
Remark V.1. We shall focus on spectrally discrete max-stable processes (see Chap-
ter II):
Xt := ⋁_{j=1}^p φj(t) Zj , t ∈ T,
where the φj(t)’s are non-negative deterministic functions. By taking sufficiently
large p’s and with judicious φj(t)’s, one can build flexible models that can replicate
the behavior of an arbitrary max-stable process (recall the metric (2.5) characterizing
the convergence of stochastic extremal integrals). From this point of view, a satisfac-
tory computational solution must be able to deal with max-linear models with large
p’s.
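In plain terms, the operation ⊙ replaces the sums of ordinary matrix–vector multiplication by maxima: Xi = max_{1≤j≤p} ai,j Zj. A minimal sketch in Python (the helper name max_linear is ours, not part of the maxLinear package):

```python
import numpy as np

def max_linear(A, Z):
    """Max-linear product: (A ⊙ Z)_i = max_j a_{i,j} * Z_j."""
    A = np.asarray(A, dtype=float)
    Z = np.asarray(Z, dtype=float)
    return (A * Z[np.newaxis, :]).max(axis=1)

# A triangular coefficient matrix; here X1 <= X2 <= X3 always holds.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
print(max_linear(A, np.array([1.0, 2.0, 3.0])))  # [1. 2. 3.]
```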
Remark V.2. After our work [112] was published, the exact conditional distributions
of spectrally continuous max-stable processes were addressed by Dombry and Eyi–
Minko [31] via a different approach. Nevertheless, they also arrive at a notion similar to the hitting scenarios introduced below.
5.2 Conditional Probability in Max-linear Models
Consider the max-linear model in (5.1). We shall denote this model by:
(5.2) X = A⊙ Z,
where A = (ai,j)n×p is a matrix with non-negative entries, X = (X1, . . . , Xn) and
Z = (Z1, . . . , Zp) are column vectors. We assume that the Zj’s, j = 1, . . . , p, are
independent non-negative random variables having probability densities.
In this section, we provide an explicit formula for the regular conditional probability
of Z with respect to X (see Theorem V.4 below). We start with some intuition and
notation. Throughout this chapter, we assume that the matrix A has at least one
nonzero entry in each of its rows and columns. This will be referred to as Assumption
A.
Observe that if x = A ⊙ z with x ∈ R^n_+, z ∈ R^p_+, then

(5.3) 0 ≤ zj ≤ z̄j ≡ z̄j(A,x) := min_{1≤i≤n} xi/ai,j , j = 1, . . . , p.

That is, the max-linear model (5.2) imposes certain inequality and equality constraints on the Zj's, given a set of observed Xi's. Namely, some of the upper bounds z̄j(A,x) in (5.3) must be attained, or hit, i.e., zj = z̄j(A,x), in such a way that

xi = ai,j(i) z̄j(i) , i = 1, . . . , n,

with judicious j(i) ∈ {1, . . . , p}. The next example helps to understand the inequality and equality constraints.
Example V.3. Suppose that n = p = 3 and

A = [ 1 0 0 ]
    [ 1 1 0 ]
    [ 1 1 1 ] .

Let x = A ⊙ z for some z ∈ R^3_+. In this case, it necessarily follows that x1 ≤ x2 ≤ x3. Moreover, (5.3) yields z̄ = x.

(i) If x = (1, 2, 3), then it trivially follows that z = z̄ = (1, 2, 3), which is an equality constraint on z.

(ii) If x = (1, 1, 3), then it follows that z1 = z̄1 = 1, z2 ≤ z̄2 = 1 and z3 = z̄3 = 3. Here, the "equality constraints" must hold for z1 = z̄1 and z3 = z̄3, while z2 only needs to satisfy the "inequality constraint" 0 ≤ z2 ≤ z̄2.
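The upper bounds (5.3) are immediate to compute; entries with ai,j = 0 impose no constraint (formally xi/ai,j = ∞). A sketch, with the hypothetical helper name z_bar:

```python
import numpy as np

def z_bar(A, x):
    """Upper bounds z̄_j(A, x) = min_{i : a_{i,j} > 0} x_i / a_{i,j}, as in (5.3)."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    # Entries a_{i,j} = 0 contribute no constraint, encoded as +infinity.
    ratios = np.where(A > 0, x[:, None] / np.where(A > 0, A, 1.0), np.inf)
    return ratios.min(axis=0)

A = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]])
print(z_bar(A, np.array([1., 1., 3.])))  # [1. 1. 3.] -- i.e. z̄ = x, as in case (ii)
```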
Write

C(A,x) := {z ∈ R^p_+ : x = A ⊙ z} ,
and note that the conditional distribution of Z | X = x concentrates on the set
C(A,x). The observation in Example V.3 can be generalized and formulated as
follows.
• Every z ∈ C(A,x) corresponds to a set of active (equality) constraints J ⊂ {1, . . . , p}, which we refer to as a hitting scenario of (A,x), such that

(5.4) zj = z̄j(A,x), j ∈ J and zj < z̄j(A,x), j ∈ J^c := {1, . . . , p} \ J.

Observe that if j ∈ J^c, then there are no further constraints and zj can take any value in [0, z̄j), regardless of the values of the other components of the vector z ∈ C(A,x).

• Every value x may give rise to many different hitting scenarios J ⊂ {1, . . . , p}. Let J(A,x) denote the collection of all such J's. We refer to J(A,x) as the hitting distribution of x w.r.t. A:

J(A,x) ≡ {J ⊂ {1, . . . , p} : there exists z ∈ C(A,x) such that (5.4) holds} .

To illustrate the notions of hitting scenario and hitting distribution, consider again Example V.3. Therein, we have J(A,x) = {{1, 2, 3}} in case (i), and J(A,x) = {{1, 3}, {1, 2, 3}} in case (ii).
The hitting distribution J (A,x) is a finite set and thus can always be identified.
However, the identification procedure is the key difficulty in providing an efficient
algorithm for conditional sampling in practice. This issue is addressed in Section 5.3.
In the rest of this section, suppose that J(A,x) is given. Then, we can partition C(A,x) as follows:

C(A,x) = ⋃_{J ∈ J(A,x)} CJ(A,x) ,

where

CJ(A,x) = {z ∈ R^p_+ : zj = z̄j, j ∈ J and zj < z̄j, j ∈ J^c} .
The sets CJ(A,x), J ∈ J (A,x) are disjoint since they correspond to different hitting
scenarios in J (A,x). Let
(5.5) r(J(A,x)) = min_{J ∈ J(A,x)} |J| ,

where |J| is the number of elements in J. We call r(J(A,x)) the rank of the hitting distribution J(A,x). It equals the minimal number of equality constraints among the hitting scenarios in J(A,x). It will turn out that the hitting scenarios J ∈ J(A,x) with |J| > r(J(A,x)) occur with (conditional) probability zero and can be ignored. We therefore focus on the set of all relevant hitting scenarios:

Jr(A,x) = {J ∈ J(A,x) : |J| = r(J(A,x))} .
Theorem V.4. Consider the max-linear model in (5.2), where the Zj's are independent random variables with densities fZj and distribution functions FZj, j = 1, . . . , p. Let A = (ai,j)n×p have non-negative entries satisfying Assumption A, and let R_{R^p_+} be the class of all rectangles (e, f], e, f ∈ R^p_+, in R^p_+.

For all J ∈ J(A,x), E ∈ R_{R^p_+}, and x ∈ R^n_+, define

(5.6) νJ(x, E) := ∏_{j∈J} δ_{z̄j}(πj(E)) · ∏_{j∈J^c} P(Zj ∈ πj(E) | Zj < z̄j) ,

where πj(z1, . . . , zp) = zj and δa is a unit point mass at a.

Then, the regular conditional probability ν(x, E) of Z w.r.t. X equals:

(5.7) ν(x, E) = Σ_{J ∈ Jr(A,x)} pJ(A,x) νJ(x, E) , E ∈ R_{R^p_+} ,

for PX-almost all x ∈ A ⊙ (R^p_+), where for all J ∈ Jr(A,x),

(5.8) pJ(A,x) = wJ / Σ_{K ∈ Jr(A,x)} wK with wJ = ∏_{j∈J} z̄j fZj(z̄j) · ∏_{j∈J^c} FZj(z̄j) .

In the special case when the Zj's are α-Fréchet with scale coefficient 1, we have wJ = ∏_{j∈J} (z̄j)^{−α}.
Remark V.5. We state (5.7) only for rectangle sets E because the projections πj(B) of an arbitrary Borel set B ⊂ R^p_+ are not always Borel (see e.g. [99]). Nevertheless, the measure extension theorem ensures that Formula (5.7) completely specifies the regular conditional probability.
We do not provide a proof of Theorem V.4 directly. Instead, we will first provide
an equivalent formula for ν(x, E) in Theorem V.9 in Section 5.3, and then prove that
ν(x, E) is the desired regular conditional probability. All the proofs are deferred to
Section 5.6. The next example gives the intuition behind Formula (5.7).
Example V.6. Continue with Example V.3.

(i) If X = x = (1, 2, 3), then z̄ = x and J(A,x) = {{1, 2, 3}}. Therefore, r(J(A,x)) = 3 and Formula (5.7) yields

ν(x, E) = νJ(x, E) = δ_{z̄1}(π1(E)) δ_{z̄2}(π2(E)) δ_{z̄3}(π3(E)) ≡ δ_{z̄}(E) ,

a degenerate distribution with a single unit point mass at z̄.

(ii) If X = x = (1, 1, 3), then z̄ = x, J(A,x) = {{1, 3}, {1, 2, 3}}, and r(J(A,x)) = 2. Therefore, Jr(A,x) = {{1, 3}} and Formula (5.7) yields:

ν(x, E) = ν_{1,3}(x, E) = δ_{z̄1}(π1(E)) P(Z2 ∈ π2(E) | Z2 < z̄2) δ_{z̄3}(π3(E)) .

In this case, the conditional distribution concentrates on the one-dimensional set {1} × (0, 1) × {3}.

(iii) Finally, if X = x = (1, 1, 1), then z̄ = x and J(A,x) = {{1}, {1, 2}, {1, 2, 3}}. Then, Jr(A,x) = {{1}} and

ν(x, E) = ν_{1}(x, E) = δ_{z̄1}(π1(E)) ∏_{j=2}^{3} P(Zj ∈ πj(E) | Zj < z̄j) .

The conditional distribution concentrates on the set {1} × (0, 1) × (0, 1).
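For small models, the hitting scenarios can be enumerated by brute force: J ∈ J(A,x) exactly when, for every row i, some column j ∈ J attains ai,j z̄j = xi (this is the set-covering view made precise in Section 5.3). A sketch, exponential in p and for intuition only (names ours, 0-based indices):

```python
import itertools
import numpy as np

def hitting_scenarios(A, x, tol=1e-12):
    """All hitting scenarios J(A, x) and the relevant ones Jr(A, x),
    by exhaustive search over subsets of columns (0-based indices)."""
    A, x = np.asarray(A, float), np.asarray(x, float)
    n, p = A.shape
    zb = np.min(np.where(A > 0, x[:, None] / np.where(A > 0, A, 1.0), np.inf), axis=0)
    H = np.abs(A * zb[None, :] - x[:, None]) < tol   # "column j hits row i"
    scenarios = [set(J) for r in range(1, p + 1)
                 for J in itertools.combinations(range(p), r)
                 if H[:, list(J)].any(axis=1).all()]  # every row hit by some j in J
    rank = min(len(J) for J in scenarios)
    return scenarios, [J for J in scenarios if len(J) == rank]

A = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]])
scen, Jr = hitting_scenarios(A, np.array([1., 1., 3.]))
print(scen, Jr)  # [{0, 2}, {0, 1, 2}] [{0, 2}] -- {1,3} and {1,2,3} in 1-based notation
```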
We conclude this section by showing that the conditional distributions (5.7) arise as suitable limits. This result can be viewed as a heuristic justification of Theorem V.4. Let ε > 0 and consider

(5.9) C^ε_J(A,x) := {z ∈ R^p_+ : zj ∈ [z̄j(1−ε), z̄j(1+ε)], j ∈ J, zk < z̄k(1−ε), k ∈ J^c} ,

and set

(5.10) C^ε(A,x) := ⋃_{J ∈ J(A,x)} C^ε_J(A,x) .

Note that the sets A ⊙ (C^ε(A,x)) shrink to the point x, as ε ↓ 0.

Proposition V.7. Under the assumptions of Theorem V.4, for all x ∈ A ⊙ (R^p_+), we have, as ε ↓ 0,

(5.11) P(Z ∈ E | Z ∈ C^ε(A,x)) −→ ν(x, E) , E ∈ R_{R^p_+} .
Proof. Recall the definition of C^ε_J in (5.9). Observe that for all ε > 0, the sets {C^ε_J(A,x)}_{J ∈ J(A,x)} are mutually disjoint. Thus, writing C^ε ≡ C^ε(A,x) and C^ε_J ≡ C^ε_J(A,x), by (5.10) we have

(5.12) P(Z ∈ E | Z ∈ C^ε) = Σ_{J ∈ J(A,x)} P(Z ∈ E | Z ∈ C^ε_J) P(Z ∈ C^ε_J | Z ∈ C^ε)
    = Σ_{J ∈ J(A,x)} P(Z ∈ E | Z ∈ C^ε_J) · P(Z ∈ C^ε_J) / Σ_{K ∈ J(A,x)} P(Z ∈ C^ε_K) ,

where the terms with P(Z ∈ C^ε_J) = 0 are ignored. One can see that P(Z ∈ E | Z ∈ C^ε_J) converges to νJ(x, E) in (5.6), as ε ↓ 0. The independence of the Zj's also implies that

(5.13) P(Z ∈ C^ε_J) = ∏_{j∈J} P(Zj ∈ [z̄j(1−ε), z̄j(1+ε)]) · ∏_{k∈J^c} P(Zk ≤ z̄k(1−ε))
    = ∏_{j∈J} ( z̄j fZj(z̄j) · 2ε + o(ε) ) · ∏_{k∈J^c} ( FZk(z̄k) + o(1) ) .

Observe that for J ∈ Jr(A,x), the latter expression equals (2ε)^{|J|} wJ (1 + o(1)) as ε ↓ 0, and the terms with |J| > r become negligible since they are of smaller order. Therefore, Relation (5.13) yields (5.7), and the proof is thus complete.
The proof of Proposition V.7 provides insight into the expressions of the weights wJ in (5.8) and the components νJ in (5.6). In particular, it explains why only hitting scenarios of rank r are involved in the expression of the conditional probability. The formal proof of Theorem V.4, however, requires a different argument.
5.3 Conditional Sampling: Computational Efficiency
We discuss here important computational issues related to sampling from the reg-
ular conditional probability in (5.7). It turns out that identifying all hitting scenarios
amounts to solving the set covering problem, which is NP-hard (see e.g. [13]). The
probabilistic structure of the max-linear models, however, will lead us to an alter-
native efficient solution, valid with probability one. In particular, we will provide
a new formula for the regular conditional probability, showing that Z can be de-
composed into conditionally independent vectors, given X = x. As a consequence,
with probability one we are not in the ‘bad’ situation that the corresponding set
covering problem requires exponential time to solve. Indeed, this will lead us to an
efficient and linearly-scalable algorithm for conditional sampling, which works well
for max-linear models with large dimensions n× p arising in applications.
To fix ideas, observe that Theorem V.4 implies the following simple algorithm.
Algorithm I:
1. Compute z̄j for j = 1, . . . , p.
2. Identify J (A,x), compute r = r(J (A,x)) and focus on the set of relevant
hitting scenarios Jr = Jr(A,x).
3. Compute {wJ}_{J∈Jr} and {pJ}_{J∈Jr}.
4. Sample Z ∼ ν(x, ·) according to (5.7).
Step 1 is immediate. Provided that Step 2 is done, Step 3 is trivial, and Step 4 can be carried out by first picking a hitting scenario J ∈ Jr(A,x) (with probability pJ(A,x)), setting Zj = z̄j for j ∈ J, and then resampling independently the remaining Zj's from the truncated distributions Zj | Zj < z̄j, for all j ∈ {1, . . . , p} \ J.
The most computationally intensive aspect of this algorithm is to identify the
set of all relevant hitting scenarios Jr(A,x) in Step 2. This is closely related to
the NP-hard set covering problem in theoretical computer science (see e.g. [13]),
which is formulated next. Let H = (hi,j)n×p be a matrix of 0’s and 1’s, and let
c = (cj)_{j=1}^p ∈ Z^p_+ be a p-dimensional cost vector. For simplicity, introduce the notation

[m] ≡ {1, 2, . . . , m} , m ∈ N.

For the matrix H, we say that the column j ∈ [p] covers the row i ∈ [n] if hi,j = 1. The goal of the set-covering problem is to find a minimum-cost subset J ⊂ [p] such that every row is covered by at least one column j ∈ J. This is equivalent to solving

(5.14) min_{δj ∈ {0,1}, j ∈ [p]} Σ_{j∈[p]} cj δj , subject to Σ_{j∈[p]} hi,j δj ≥ 1 , i ∈ [n] .
We can relate the problem of identifying Jr(A,x) to the set covering problem by defining

(5.15) hi,j = 1_{{ai,j z̄j = xi}} ,

where A = (ai,j)n×p and x = (xi)_{i=1}^n are as in (5.2), and cj = 1, j ∈ [p]. It is easy to see that every J ∈ Jr(A,x) corresponds to a solution of (5.14), and vice versa. Namely, for {δj}_{j∈[p]} minimizing (5.14), we have J = {j ∈ [p] : δj = 1} ∈ Jr(A,x). The set Jr(A,x) corresponds to the set of all solutions of (5.14), which depends only on the matrix H. Therefore, in the sequel we write Jr(H) for Jr(A,x), and

(5.16) H = (hi,j)n×p ≡ H(A,x) ,

with hi,j as in (5.15), will be referred to as the hitting matrix.
Example V.8. Recall Example V.6. The following hitting matrices correspond to the three cases of x discussed therein:

H(i) = [ 1 0 0 ]    H(ii) = [ 1 0 0 ]    H(iii) = [ 1 0 0 ]
       [ 0 1 0 ]            [ 1 1 0 ]             [ 1 1 0 ]
       [ 0 0 1 ] ,          [ 0 0 1 ]             [ 1 1 1 ] .
Observe that solving for Jr(H) is even more challenging than solving the set
covering problem (5.14), where only one minimum-cost subset J is needed, and
often an approximation of the optimal solution is acceptable. Here, we need to
identify exhaustively all J ’s such that (5.14) holds. Fortunately, this problem can
be substantially simplified, thanks to the probabilistic structure of the max-linear
model.
We first study the distribution of H. In view of (5.16), we have that H = H(A,X),
with X = A⊙Z, is a random matrix. It will turn out that, with probability one, H
has a nice structure, leading to an efficient conditional sampling algorithm.
For any hitting matrix H, we will decompose the set [p] ≡ {1, . . . , p} into a certain disjoint union [p] = ⋃_{s=1}^r J̄(s). The vectors (Zj)_{j ∈ J̄(s)} , s = 1, . . . , r, will turn out to be conditionally independent (in s), given X = x. Therefore, ν(x, E) will be expressed as a product of (conditional) probabilities.
We start by decomposing the set [n] ≡ {1, . . . , n}. First, for all i1, i2 ∈ [n] and j ∈ [p], we write i1 ~j i2 if hi1,j = hi2,j = 1. Then, we define an equivalence relation on [n]:

(5.17) i1 ∼ i2 , if i1 = k0 ~j1 k1 ~j2 · · · ~jm km = i2 ,

for some m ≤ n, k0, k1, . . . , km ∈ [n] and j1, . . . , jm ∈ [p]. That is, '∼' is the transitive closure of the relations '~j'. Consequently, we obtain a partition of [n], denoted by

(5.18) [n] = ⋃_{s=1}^r Is ,

where Is, s = 1, . . . , r, are the equivalence classes w.r.t. (5.17). Based on (5.18), we define further

(5.19) J(s) = {j ∈ [p] : hi,j = 1 for all i ∈ Is} ,
(5.20) J̄(s) = {j ∈ [p] : hi,j = 1 for some i ∈ Is} .

The sets {J(s), J̄(s)}_{s ∈ [r]} will determine the factorization form of ν(x, E).
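Computing the partition (5.18) is a connected-components problem: rows i1, i2 are linked whenever some column covers both. A small union–find sketch (the name decompose is ours):

```python
import numpy as np

def decompose(H):
    """Equivalence classes I_s of (5.17)-(5.18), together with the column
    sets J(s) ('for all') and J̄(s) ('for some') of (5.19)-(5.20)."""
    H = np.asarray(H, dtype=bool)
    n, p = H.shape
    parent = list(range(n))
    def find(i):                              # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for j in range(p):                        # rows sharing column j are merged
        rows = np.flatnonzero(H[:, j])
        for i in rows[1:]:
            parent[find(i)] = find(rows[0])
    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return [(I, set(np.flatnonzero(H[I, :].all(axis=0))),   # J(s)
                 set(np.flatnonzero(H[I, :].any(axis=0))))  # J̄(s)
            for I in classes.values()]

# H(iii) of Example V.8: a single class with J(s) = {0}, J̄(s) = {0, 1, 2}.
print(decompose([[1, 0, 0], [1, 1, 0], [1, 1, 1]]))
```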
Theorem V.9. Let Z be as in Theorem V.4. Let also H be the hitting matrix corresponding to (A,X) with X = A ⊙ Z, and let {J(s), J̄(s)}_{s ∈ [r]} be the sets defined in (5.19) and (5.20). Then, with probability one, we have

(i) r = r(J(A,X)),

(ii) for all J ⊂ [p], J ∈ Jr(A, A ⊙ Z) if and only if J can be written as

(5.21) J = {j1, . . . , jr} with js ∈ J(s) , s ∈ [r] ,

(iii) for ν(x, E) defined in (5.7),

(5.22) ν(X, E) = ∏_{s=1}^r ν(s)(X, E) with ν(s)(X, E) = Σ_{j ∈ J(s)} w(s)_j(X) ν(s)_j(X, E) / Σ_{j ∈ J(s)} w(s)_j(X) ,
where for all j ∈ J(s),

(5.23) w(s)_j(x) := z̄j fZj(z̄j) · ∏_{k ∈ J̄(s) \ {j}} FZk(z̄k) ,
(5.24) ν(s)_j(x, E) := δ_{z̄j}(πj(E)) · ∏_{k ∈ J̄(s) \ {j}} P(Zk ∈ πk(E) | Zk < z̄k) ,

with z̄j = z̄j(x) as in (5.3).
The proof of Theorem V.9 is given in Section 5.6.
Remark V.10. Note that this result does not claim that ν(x, E) in (5.22) is the regular
conditional probability. It merely provides an equivalent expression for (5.7), which
is valid with probability one. We still need to show that (5.7), or equivalently (5.22),
is indeed the regular conditional probability.
From (5.23) and (5.24), one can see that ν(s) is the conditional distribution of (Zj)_{j ∈ J̄(s)}. Therefore, Relation (5.22) implies that the vectors (Zj)_{j ∈ J̄(s)} , s ∈ [r], are conditionally independent (in s), given X = x. This leads to the following improved conditional sampling algorithm:
Algorithm II:
1. Compute z̄j for j = 1, . . . , p and the hitting matrix H = H(A,x).

2. Identify {J(s), J̄(s)}_{s ∈ [r]} by (5.19) and (5.20).

3. Compute {w(s)_j}_{j ∈ J(s)} for all s ∈ [r] by (5.23).

4. Sample (Zj)_{j ∈ J̄(s)} | X = x ∼ ν(s)(x, ·) independently for s = 1, . . . , r.

5. Combine the sampled (Zj)_{j ∈ J̄(s)} , s = 1, . . . , r, to obtain a sample Z.
This algorithm identifies all hitting scenarios in an efficient way. To illustrate its efficiency compared to Algorithm I, suppose that r = 10 and |J(s)| = 10 for all s ∈ [10]. Then, applying Formula (5.7) in Algorithm I requires storing in memory the weights of all 10^10 hitting scenarios. In contrast, the implementation of (5.22) requires saving only 10 × 10 weights. This improvement is critical in practice since it allows us to handle large, realistic models.

Table 5.1: Means and standard deviations (in parentheses) of the running times (in seconds) for the decomposition of the hitting matrix H, based on 100 independent observations X = A ⊙ Z, where A is an (n × p) matrix corresponding to a discretized Smith model.

  p \ n      1            5            10           50
  2500       0.03 (0.02)  0.13 (0.03)  0.24 (0.04)  1.25 (0.09)
  10000      0.11 (0.04)  0.50 (0.05)  1.00 (0.08)  4.98 (0.33)
Table 5.1 demonstrates the running times of Algorithm II as a function of the
dimensions n × p of the matrix A. It is based on a discretized 2-d Smith model
(Section 5.5) and measured on an Intel(R) Core(TM)2 Duo CPU E4400 2.00GHz
with 2GB RAM. It is remarkable that the times scale linearly in both n and p.
5.4 MARMA Processes
In this section, we apply our result to the max-autoregressive moving average
(MARMA) processes studied by Davis and Resnick [19]. A stationary process
{Xt}_{t∈Z} is a MARMA(m, q) process if it satisfies the MARMA recursion:

(5.25) Xt = φ1Xt−1 ∨ · · · ∨ φmXt−m ∨ Zt ∨ θ1Zt−1 ∨ · · · ∨ θqZt−q ,

for all t ∈ Z, where φi ≥ 0, θj ≥ 0, i = 1, . . . , m, j = 1, . . . , q, are the parameters, and {Zt}_{t∈Z} are i.i.d. 1-Fréchet random variables. Proposition 2.2 in [19] shows that (5.25) has a unique solution of the form

(5.26) Xt = ⋁_{j=0}^∞ ψj Zt−j < ∞ , almost surely,
with ψj ≥ 0, j ≥ 0, Σ_{j=0}^∞ ψj < ∞, if and only if φ* = ⋁_{i=1}^m φi < 1. In this case,

ψj = ⋁_{k=0}^{j∧q} αj−k θk ,

where {αj}_{j∈Z} are determined recursively by αj = 0 for all j < 0, α0 = 1 and

(5.27) αj = φ1αj−1 ∨ φ2αj−2 ∨ · · · ∨ φmαj−m , ∀ j ≥ 1 .

In the sequel, we will focus on the MARMA process (5.25) with unique stationary solution (5.26). In this case, the MARMA process is a spectrally discrete max-stable process. Without loss of generality, we also assume {Zk}_{k∈Z} to be standard 1-Fréchet.
We consider the prediction of the MARMA process in the following framework: suppose at each time t ∈ {1, . . . , n} we observe the value Xt of the process, and the goal is to predict {Xs}_{n<s≤n+N}. We do so by generating i.i.d. samples from the conditional distribution of {Xs}_{n<s≤n+N} | {Xt}_{t=1,...,n}. To apply our result, it suffices to provide a max-linear representation of this model. We will truncate (5.26) to obtain

(5.28) X̂t = ⋁_{j=0}^p ψj Zt−j , ∀ t = 1, . . . , n + N .

The truncated process can approximate the original one arbitrarily well, if we take p large enough. Indeed, by using the independence and max-stability of the Zt's, one can show that
(5.29) P(X̂t = Xt) = P( ⋁_{j=0}^p ψj Zt−j ≥ ⋁_{j=p+1}^∞ ψj Zt−j ) = 1 − ( Σ_{j=p+1}^∞ ψj ) / ( Σ_{j=0}^∞ ψj ) −→ 1 ,

as p → ∞. Moreover, by induction on αj in (5.27), one can show that αj ≤ (φ*)^{j/m} for all j ∈ N, and thus the convergence in (5.29) above is geometrically fast.
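For instance, the agreement probability (5.29) can be tabulated for the MAR(3) process with φ = (0.7, 0.5, 0.3) used later in this section (here q = 0, so ψj = αj):

```python
import numpy as np

phi = [0.7, 0.5, 0.3]
alpha = [1.0]
for j in range(1, 2001):                      # effectively the full series
    alpha.append(max(phi[i] * alpha[j - 1 - i] for i in range(min(3, j))))
alpha = np.array(alpha)
sigma = alpha.sum()                           # total mass Σψ_j ≈ 3.4
match = {p: alpha[: p + 1].sum() / sigma for p in (5, 10, 20, 50)}
print({p: round(v, 4) for p, v in match.items()})
# {5: 0.875, 10: 0.9779, 20: 0.9993, 50: 1.0} -- geometric convergence to 1
```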
Now, we reformulate the prediction problem with the model (5.28) as follows:

observe X̂[1,n] = A ⊙ Z , and predict Ŷ[1,N] = B ⊙ Z | X̂[1,n] ,

with the notation X̂[1,n] = (X̂1, . . . , X̂n), Ŷ[1,N] = (X̂n+1, . . . , X̂n+N) and Z = (Z1−p, Z2−p, . . . , Zn+N). Here, A ∈ R^{n×(p+n+N)}_+ and B ∈ R^{N×(p+n+N)}_+ are determined by (5.28). In particular,

(5.30)
[ A ]   [ ψp  ψp−1  · · ·  ψ0   0    0   · · ·  0  ]
[   ] = [ 0   ψp   ψp−1  · · ·  ψ0   0   · · ·  0  ]
[ B ]   [ . . .      . . .       . . .      . . .  ]
        [ 0   · · ·  0    ψp  ψp−1  · · ·  ψ0   0  ]
        [ 0   · · ·  0    0    ψp  ψp−1  · · ·  ψ0 ] ,

that is, each row contains the weights ψp, . . . , ψ0, shifted one column to the right relative to the row above. In practice, given the observations X̂[1,n], we use our algorithm to sample from the conditional distribution Z | X̂[1,n]. Therefore, we can sample

(5.31) Ŷ[1,N] | X̂[1,n] =_d B ⊙ Z | X̂[1,n] .
Our approach is different from the prediction considered in [19], which we now briefly review. Davis and Resnick took the classic time series point of view and investigated how to approximate Xs by a max-linear combination of {Xt}_{t=1,...,n}, w.r.t. a certain metric d. Namely, for all Y ∈ H with

H = { ⋁_{j=−∞}^∞ αj Zj : αj ≥ 0, Σ_{j=−∞}^∞ αj < ∞ } ,

they considered a projection of Y onto the space Fn, max-linearly spanned by {Xt}_{t=1,...,n}: Fn = { ⋁_{j=0}^∞ bj Xn−j : bj ≥ 0, Σ_{j=0}^∞ bj < ∞ }. That is, consider the projection P̂nY defined by

(5.32) P̂nY = argmin_{Ŷ ∈ Fn} d(Ŷ, Y) ,

with the metric d induced by d(⋁_j αj Zj, ⋁_j βj Zj) = Σ_j |αj − βj|. For specific MARMA processes, [19] provided predictors based on the projection (5.32). We will refer to these predictors as the projection predictors.
In general, the conditional samples reflect the conditional distribution (5.31), and they provide more information than the projection predictors. Sampling multiple times from (5.31), we can calculate, e.g., conditional medians, conditional means, and quantiles, which are optimal predictors with respect to various loss functions.
Example V.11 (MAR(m) processes). Consider the MAR(m) ≡ MARMA(m, 0) process with

(5.33) Xt = φ1Xt−1 ∨ · · · ∨ φmXt−m ∨ Zt .

The projection predictor for this model can be obtained recursively by

(5.34) X̂t+k = φ1X̂t+k−1 ∨ · · · ∨ φm X̂t+k−m ,

with X̂t = Xt, t = 1, . . . , n (see [19], p. 799).
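The recursion (5.34) is easy to implement; a sketch (names ours):

```python
import numpy as np

def projection_predictor(x_obs, phi, N):
    """Recursive projection predictor (5.34) for a MAR(m) process:
    X̂_{t+k} = max_i φ_i X̂_{t+k-i}, seeded with the observed values."""
    m = len(phi)
    path = list(x_obs)                 # observed X_1, ..., X_n
    for _ in range(N):
        path.append(max(phi[i] * path[-1 - i] for i in range(m)))
    return np.array(path[len(x_obs):])

# With a single large observation, the forecast decays by the max-recursion:
pred = projection_predictor([0.1, 0.2, 10.0], [0.7, 0.5, 0.3], 3)
print(pred)  # [7.  5.  3.5]
```

Note that the predictor can only decay from the last observed maxima; it never anticipates a new arrival Zt, which is the source of the underestimation discussed next.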
Figure 5.2 illustrates an application of our conditional sampling algorithm in this case. Consider an MAR(3) process {Xt}_{t=1}^{150} with φ1 = 0.7, φ2 = 0.5 and φ3 = 0.3. In effect, we use the truncated model {X̂t}_{t∈N} in (5.28) with p = 500, but we still write Xt for the sake of simplicity. Treating the first 100 values as observed, we plot the projection predictor, the conditional upper 95%-quantiles and the conditional medians of {Xs}_{s=101}^{150}, based on 500 independent samples from the conditional distribution.
Observe that the value of the projection predictor in Figure 5.2 is always below the conditional median. This "underestimation" phenomenon was typical in all the simulations we performed. It can be explained by the fact that the projection predictor in (5.34) does not account for the jumps of the process caused by new arrivals {Zt}_{t>100}. Indeed, a large new arrival Zt will cause the process to jump immediately to Zt at time t, but this will never occur for the projection predictor X̂t.
[Figure 5.2 here: sample path of the MARMA process over t = 1, . . . , 150, together with the conditional 95% quantile, the conditional median, and the projection predictor.]

Figure 5.2: Prediction of a MARMA(3,0) process with φ1 = 0.7, φ2 = 0.5 and φ3 = 0.3, based on the observation of the first 100 values of the process.
Next, we apply our algorithm to examine the bias of the projection predictor. To do this, for each generated MARMA process, we calculated the conditional cumulative probability attained by the projection predictor at each location s = 101, . . . , 150. Namely, using 500 independent samples {X(k)_s}_{s=101}^{150}, k = 1, . . . , 500, from the conditional distribution, we calculated

(5.35) P(Xs ≤ X̂s | {Xt}_{t=1}^{100}) ≈ (1/500) Σ_{k=1}^{500} 1_{{X(k)_s ≤ X̂s}} , ∀ s > 100 ,

where X̂s is the projection predictor in (5.34). This procedure was repeated 1000 times for independent realizations of {Xt}_{t=1}^{100}, and the means of the (estimated) probabilities in (5.35) are reported in Table 5.2. Note that as the time lag increases, the conditional cumulative probabilities attained by the projection predictors decrease. In this way, our conditional sampling algorithm helps quantify numerically the underestimation phenomenon observed in Figure 5.2.
Table 5.2: Cumulative probabilities attained by the projection predictors at time 100 + t, based on 1000 simulations.

  t      1      2      3      4      5      10    20    30   40
  mean   70.6%  50.3%  35.6%  25.3%  17.8%  2.9%  0.1%  0%   0%
Finally, we compare the generated conditional samples to the true process values at times s = 101, . . . , 150. Our goal is to demonstrate the validity of our conditional sampling algorithm. The idea is that, at each location s = 101, . . . , 150, the true process should lie below the predicted 95% upper confidence bound of Xs | {Xt}_{t=1}^{100} with probability at least 95%. (Note that due to the presence of atoms in the conditional distributions, the coverage probability may in principle be higher than 95%.) Motivated by this, we repeat the procedure in the previous paragraph and record the proportion of times that Xs is below the predicted confidence quantile, for each s. We refer to these values as the coverage rates. As discussed, the coverage rates should be close to 95%. This is supported by our simulation results, shown in Table 5.3.
Table 5.3: Coverage rates (CR) and the widths of the upper 95% confidence intervals at time 100 + t, based on 1000 simulations.

  t       1      2     3     4     5     10    20    30    40
  CR      0.956  0.952 0.954 0.957 0.966 0.947 0.943 0.951 0.955
  width   13.06  26.6  37.8  45.6  51.2  62.8  66.0  66.2  65.4
Table 5.3 also shows the widths of the upper 95%-confidence intervals. Note that these widths are not equal to the upper confidence bounds, given by the conditional 95%-quantiles, since the left end-points of the conditional distributions are greater than zero. When the time lag is small, the left end-point is large and the widths are small, due to the strong influence of the past of the process {Xt}_{t=1}^{100}. On the other hand, because of the weak temporal dependence of the MAR(3) process, this influence decreases fast as the lag increases. Consequently, the conditional distribution converges to the unconditional one, and the conditional quantile to the unconditional one. Note that the (unconditional) 95%-quantile of Xs for the MARMA process (5.26) can be calculated via the formula 0.95 = P(σZ ≤ u) = exp(−σ/u), with σ = Σ_{j=0}^p ψj. For the MAR(3) process we chose, we have σ ≈ 3.4, and the 95%-quantile of Xs equals 66.29. This is consistent with the widths in Table 5.3 for large lags.
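The closing computation can be checked directly:

```python
import math

# Unconditional 95%-quantile of σZ for standard 1-Fréchet Z:
# solve 0.95 = P(σZ ≤ u) = exp(-σ/u), i.e. u = σ / (-log 0.95).
sigma = 3.4
u = sigma / (-math.log(0.95))
print(round(u, 2))  # 66.29
```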
Remark V.12. As pointed out by an anonymous referee, in this case one can directly generate samples from {Xs}_{s=n+1}^{n+N} | {Xt}_{t=1}^{n} by generating independent Fréchet random variables and iterating (5.33). We selected this example only for illustrative purposes and to be able to compare with the projection predictors in [19]. One can modify the prediction problem slightly, so that our algorithm still applies by adjusting (5.30) accordingly, while both the projection predictor and the direct method based on (5.33) do not apply. For example, consider the prediction problem with respect to the conditional distribution P({Xs}_{s=2n+1}^{2n+N} ∈ · | Xt : t = 1, 3, . . . , 2n − 1) (prediction with only partial history observed) or P({Xs}_{s=2}^{n−1} ∈ · | X1, Xn) (prediction of the middle path with the beginning and the end-point (in the future) given). In other words, our algorithm has no restriction on the locations of the observations. This feature is of great importance in spatial prediction problems.
5.5 Discrete Smith Model
Consider the following moving maxima random field model in R²:

(5.36) Xt = ∫e_{R²} φ(t − u) Mα(du) , t = (t1, t2) ∈ R² ,

where ∫e denotes the extremal integral and Mα is an α-Fréchet random sup-measure on R² with the Lebesgue control measure. Smith [98] proposed to use for φ the bivariate Gaussian density:

(5.37) φ(t1, t2) := β1β2 / (2π √(1 − ρ²)) · exp( − (β1² t1² − 2ρ β1β2 t1t2 + β2² t2²) / (2(1 − ρ²)) ) ,

with correlation ρ ∈ (−1, 1) and variances σi² = 1/βi², i = 1, 2. Consistent and
asymptotically normal estimators for the parameters ρ, β1 and β2 were obtained by
de Haan and Pereira [25]. Here, we will assume that these parameters are known
and will illustrate the conditional sampling methodology over a discretized version
of the random field (5.36). Namely, we truncate the extremal integral in (5.36) to
the square region [−M, M]² and consider a uniform mesh of size h := M/q, q ∈ N. We then set

(5.38) Xt := ⋁_{−q ≤ j1,j2 ≤ q−1} h^{2/α} φ(t − u_{j1 j2}) Z_{j1 j2} ,

where u_{j1 j2} = ((j1 + 1/2)h, (j2 + 1/2)h) and h^{2/α} Z_{j1 j2} =_d Mα((j1h, (j1 + 1)h] × (j2h, (j2 + 1)h]). This discretized model (5.38) can be made arbitrarily close to the spectrally continuous one in (5.36) by taking a fine mesh h and a sufficiently large M (see e.g. [101]).
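A sketch of how the coefficient matrix A of the resulting max-linear model can be assembled from (5.37)–(5.38) (names ours; here q = 25, so that the mesh has 50 × 50 = 2500 cells, matching p = 2500 used in this section):

```python
import numpy as np

def smith_density(t1, t2, beta1=1.0, beta2=1.0, rho=0.0):
    """Bivariate Gaussian density (5.37) with variances 1/β_i²."""
    norm = beta1 * beta2 / (2 * np.pi * np.sqrt(1 - rho ** 2))
    quad = beta1**2 * t1**2 - 2 * rho * beta1 * beta2 * t1 * t2 + beta2**2 * t2**2
    return norm * np.exp(-quad / (2 * (1 - rho ** 2)))

def smith_matrix(obs, M=4.0, q=25, alpha=1.0):
    """Row i: observed location t_i; column (j1, j2): mesh cell of (5.38),
    with entry h^{2/α} φ(t_i - u_{j1 j2}), h = M/q."""
    h = M / q
    centers = (np.arange(-q, q) + 0.5) * h           # cell centers u_{j1 j2}
    u1, u2 = np.meshgrid(centers, centers, indexing="ij")
    return np.array([(h ** (2 / alpha) * smith_density(t[0] - u1, t[1] - u2)).ravel()
                     for t in obs])

A = smith_matrix([(0.0, 0.0), (1.0, -1.0)])
print(A.shape)  # (2, 2500)
```

Feeding this A, together with the observed values x, into the Algorithm II sampler of Section 5.3 yields the conditional realizations shown in Figure 5.1.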
Suppose that the random field X in (5.38) is observed at n locations: Xti = xi, ti ∈ [−M, M]², i = 1, . . . , n. In view of (5.38), we have the max-linear model X = A ⊙ Z, with X = (Xti)_{i=1}^n and Z = (Zj)_{j=1}^p, p = (2q)². By sampling from the conditional distribution of Z | X = x, we can predict the random field Xs at arbitrary locations s ∈ R².
To illustrate our algorithm, we used the model (5.38) with parameter values ρ = 0, β1 = β2 = 1, M = 4, p = (2q)² = 2500, and n = 7 observed locations. We generated N = 500 independent samples from the conditional distribution of the random field Xs, where s takes values on a uniform 100 × 100 grid in the region [−2, 2] × [−2, 2].
We have already seen four of these realizations in Figure 5.1. Figure 5.3 illustrates
the median and 0.95-th quantile of the conditional distribution. The former provides
the optimal predictor for the values of the random field given the observed data, with
respect to the absolute deviation loss. The marginal quantiles, on the other hand,
provide important confidence regions for the random field, given the data.
Certainly, conditional sampling may be used to address more complex functional
prediction problems. In particular, given a two-dimensional threshold surface, one
can readily obtain the correct probability that the random field exceeds or stays
below this surface, conditionally on the observed values. This is much more than
what marginal conditional distributions can provide.
[Figure 5.3 here: two contour plots, "Conditional Median of the Smith model" (left) and "Conditional Marginal Quantile of the Smith model" (right), with parameters ρ = 0, β1 = 1, β2 = 1, q = 0.95.]

Figure 5.3: Conditional medians (left) and 0.95-th conditional marginal quantiles (right). Each cross indicates an observed location of the random field, with the observed value at right.
5.6 Proofs of Theorems V.4 and V.9
In this section, we prove Theorems V.4 and V.9. We will first prove Theorem V.9,
which simplifies the regular conditional probability formula (5.7) in Theorem V.4.
Then, we show the simplified new formula is the desired regular conditional probabil-
ity, which completes the proof of Theorem V.4. The key step to prove Theorem V.9
is the following lemma. Write H·j = {i ∈ [n] : hi,j = 1}.

Lemma V.13. Under the assumptions of Theorem V.9, with probability one,

(i) J(s) is nonempty for all s ∈ [r], and

(ii) for all j ∈ [p], H·j ∩ Is ≠ ∅ implies H·j ⊂ Is.
Proof. Note that to show part (ii) of Lemma V.13, it suffices to observe that, since Is is an equivalence class w.r.t. Relation (5.17), H·j \ Is and H·j ∩ Is cannot both be nonempty. Thus, it remains to show part (i). We proceed by excluding several P-measure zero sets, on which the desired results may not hold.

First, observe that for all i ∈ [n], the maximum value of {ai,j Zj}_{j∈[p]} is achieved for a unique j ∈ [p] with probability one, since the Zj's are independent and have continuous distributions. Thus, the set

N1 := ⋃_{i∈[n], j1,j2∈[p], j1≠j2} { ai,j1 Zj1 = ai,j2 Zj2 = max_{j∈[p]} ai,j Zj }

has P-measure zero. From now on, we focus on the event N1^c and set j(i) = argmax_{j∈[p]} ai,j Zj for all i ∈ [n].

Next, we show that with probability one, i1 ~j i2 implies j(i1) = j(i2). That is, the set

N2 := ⋃_{j∈[p], i1,i2∈[n], i1≠i2} Nj,i1,i2 with Nj,i1,i2 := { j(i1) ≠ j(i2), i1 ~j i2 }
has P-measure 0. It suffices to show P(Nj,i1,i2) = 0 for all i1 ≠ i2. If not, since [p] and [n] are finite sets, there exists N0 ⊂ Nj,i1,i2, such that j(i1) = j1 ≠ j2 = j(i2) on N0, and P(N0) > 0. At the same time, however, observe that i1 ~j i2 implies hi1,j = hi2,j = 1, which yields

a_{ik,j} z̄j = x_{ik} = a_{ik,j(ik)} Z_{j(ik)} = a_{ik,jk} Z_{jk} , k = 1, 2 .

It then follows that on N0, Zj1/Zj2 = a_{i1,j} a_{i2,j2} / (a_{i2,j} a_{i1,j1}), which is a constant. This constant is strictly positive and finite. Indeed, this is because on N1^c, ai,j(i) > 0 by Assumption A, and hi,j = 1 implies ai,j > 0. Since Zj1 and Zj2 are independent continuous random variables, it then follows that P(N0) = 0.

Finally, we focus on the event (N1 ∪ N2)^c. Then, for any i1, i2 ∈ Is, we have i1 ∼ i2; let k0, . . . , km be as in (5.17). It then follows that j(i1) = j(k0) = j(k1) = · · · = j(km) = j(i2). Note that for all i ∈ [n], hi,j(i) = 1 by the definition of j(i). Hence, j(i1) = j(i2) ∈ J(s). We have thus completed the proof.
Proof of Theorem V.9. Since the Is, s ∈ [r], are disjoint with ⋃_{s∈[r]} Is = [n], in the language of the set-covering problem, to cover [n] we need to cover each Is. By part (ii) of Lemma V.13, two different classes Is1 and Is2 cannot be covered by a single set H·j. Thus, we need at least r sets to cover [n]. On the other hand, with probability one we can select one js from each J(s) (by part (i) of Lemma V.13), which yields a valid cover. That is, with probability one, r = r(J(H)), and any valid minimum-cost cover of [n] must be as in (5.21), and vice versa. We have thus proved parts (i) and (ii).
To show (iii), by straightforward calculation, we have, with probability one,
\begin{align*}
\sum_{J\in\mathcal J_r(A,x)} w_J
&= \sum_{j_1\in \mathcal J(1)}\cdots\sum_{j_r\in \mathcal J(r)} w_{j_1,\dots,j_r}\\
&= \sum_{j_1\in \mathcal J(1)}\cdots\sum_{j_{r-1}\in \mathcal J(r-1)} \prod_{s=1}^{r-1} z_{j_s} f_{Z_{j_s}}(z_{j_s}) \prod_{\substack{j\notin J^{(r)}\\ j\neq j_1,\dots,j_{r-1}}} F_{Z_j}(z_j)
\times \sum_{j\in \mathcal J(r)} z_j f_{Z_j}(z_j) \prod_{k\in J^{(r)}\setminus\{j\}} F_{Z_k}(z_k)\\
&= \prod_{s=1}^{r} \sum_{j\in \mathcal J(s)} z_j f_{Z_j}(z_j) \prod_{k\in J^{(s)}\setminus\{j\}} F_{Z_k}(z_k)
= \prod_{s=1}^{r} \sum_{j\in \mathcal J(s)} w^{(s)}_j\,. \tag{5.39}
\end{align*}
Similarly, we have
\[
\sum_{J\in\mathcal J_r(A,x)} w_J \nu_J(x, E) = \prod_{s=1}^{r} \sum_{j\in \mathcal J(s)} w^{(s)}_j \nu^{(s)}_j(x, E)\,. \tag{5.40}
\]
By plugging (5.39) and (5.40) into (5.7), we obtain the desired result and complete the proof.
Proof of Theorem V.4. To prove that ν in (5.7) yields the regular conditional probability of Z given X, it is enough to show that
\[
P(X \in D,\ Z \in E) = \int_D \nu(x, E)\, P_X(dx)\,, \tag{5.41}
\]
for all rectangles $D \subset \mathbb R^n_+$ and $E \subset \mathbb R^p_+$. In view of Theorem V.9, it is enough to work with $\nu(x, E)$ given by (5.22).

We shall prove (5.41) by breaking the integration into a suitable sum of integrals over regions corresponding to all hitting matrices H for the max-linear model $X = A \odot Z$. We say such a hitting matrix H is nice if $\mathcal J(s)$ defined in (5.19) is nonempty for all $s \in r$. In view of Lemma V.13, it suffices to focus on the set $\mathcal H(A)$ of nice hitting matrices H. Notice that the set $\mathcal H(A)$ is finite, since the elements of the hitting matrices are 0's and 1's.
For all rectangles $D \subset \mathbb R^n_+$, let
\[
D_H = \big\{ x = A \odot z\,:\, H(A,x) = H,\ x \in D \big\}
\]
be the set of all $x \in \mathbb R^n_+$ that give rise to the hitting matrix H. By Lemma V.13 (i), for the random vector $X = A \odot Z$, with probability one, we have
\[
X = \sum_{H\in\mathcal H(A)} X\, \mathbf 1_{D_H}(X)
\]
and hence
\[
\int_D \nu(x, E)\, P_X(dx) = \sum_{H\in\mathcal H(A)} \int_{D_H} \nu(x, E)\, P_X(dx)\,. \tag{5.42}
\]
Now fix an arbitrary and non-random nice hitting matrix $H \in \mathcal H(A)$. Let $\{I_s\}_{s\in r}$ denote the partition of n determined by (5.17), and let $\mathcal J(s)$, $J^{(s)}$, $s = 1, \dots, r$ be as in (5.19). Recall that $\mathcal J(s) \subset J^{(s)}$ and the sets $J^{(s)}$, $s = 1, \dots, r$ are disjoint.

Focus on the set $D_H \subset \mathbb R^n_+$. Without loss of generality, and for notational convenience, suppose that $s \in I_s$ for all $s = 1, \dots, r$. That is,
\[
I_1 = \{1, i_{1,2}, \dots, i_{1,k_1}\},\ I_2 = \{2, i_{2,2}, \dots, i_{2,k_2}\},\ \cdots,\ I_r = \{r, i_{r,2}, \dots, i_{r,k_r}\}\,.
\]
Define the projection mapping $P_H : D_H \to \mathbb R^r_+$ onto the first r coordinates:
\[
P_H(x_1, \dots, x_n) = (x_1, \dots, x_r) \equiv x_r\,.
\]
Note that $P_H$, restricted to $D_H$, is one-to-one. Indeed, for all $i \in I_s$, we have $x_i = a_{i,j}z_j$ and $x_s = a_{s,j}z_j$ for all $j \in \mathcal J(s)$ (recall (5.19)). This implies $x_i = (a_{i,j}/a_{s,j})x_s$ for all $i \in I_s$ and all $s = 1, \dots, r$. Hence, $P_H(x) = P_H(x')$ implies $x = x'$.

Consequently, we can write $x = P_H^{-1}(x_r)$, $x_r \in P_H(D_H)$, and
\[
\int_{D_H} \nu(x, E)\, P_X(dx) = \int_{P_H(D_H)} \nu(x, E)\, Q^{X_r}_H(dx_1 \dots dx_r)\,,
\]
where $Q^{X_r}_H := P_X \circ P_H^{-1}$ is the induced measure on the set $P_H(D_H)$.
Lemma V.14. The measure $Q^{X_r}_H$ has a density with respect to the Lebesgue measure on the set $P_H(D_H)$. The density is given by
\[
Q^{X_r}_H(dx_r) = \mathbf 1_{P_H(D_H)}(x_r)\, \prod_{s=1}^{r} \Big( \sum_{j\in \mathcal J(s)} w^{(s)}_j(x) \Big)\, \frac{dx_1}{x_1}\cdots\frac{dx_r}{x_r}\,. \tag{5.43}
\]
The proof of this result is given below. In view of (5.43) and (5.22), we obtain
\begin{align*}
\int_{P_H(D_H)} \nu(x, E)\, Q^{X_r}_H(dx_r)
&= \int_{P_H(D_H)} \underbrace{\prod_{s=1}^{r} \frac{\sum_{j\in \mathcal J(s)} w^{(s)}_j(x)\, \nu^{(s)}_j(x, E)}{\sum_{k\in \mathcal J(s)} w^{(s)}_k(x)}}_{=\,\nu(x,E)} \times \underbrace{\prod_{s=1}^{r} \Big( \sum_{j\in \mathcal J(s)} w^{(s)}_j(x) \Big) \frac{dx_1}{x_1}\cdots\frac{dx_r}{x_r}}_{=\,Q^{X_r}_H(dx_r)}\\
&= \int_{P_H(D_H)} \prod_{s=1}^{r} \Big( \sum_{j\in \mathcal J(s)} w^{(s)}_j(x)\, \nu^{(s)}_j(x, E) \Big) \frac{dx_1}{x_1}\cdots\frac{dx_r}{x_r}\,,
\end{align*}
which equals
\[
\sum_{j_1\in \mathcal J(1),\,\cdots,\, j_r\in \mathcal J(r)}\ \underbrace{\int_{P_H(D_H)} \prod_{s=1}^{r} w^{(s)}_{j_s}(x)\, \nu^{(s)}_{j_s}(x, E)\, \frac{dx_1}{x_1}\cdots\frac{dx_r}{x_r}}_{=:\, I(j_1,\dots,j_r)}\,. \tag{5.44}
\]
Fix $j_1 \in \mathcal J(1), \cdots, j_r \in \mathcal J(r)$ and focus on the integral $I(j_1, \cdots, j_r)$. Define
\[
\Omega^r_H(D_H) := \big\{ (z_{j_1}, \dots, z_{j_r})\,:\, z_{j_s} = x_s/a_{s,j_s},\ s = 1, \dots, r,\ x_r = (x_s)_{s=1}^{r} \in P_H(D_H) \big\}\,.
\]
We have, by (5.23), (5.24), and replacing $x_s$ with $a_{s,j_s}z_{j_s}$, $s = 1, \dots, r$ (simple change of variables),
\begin{align*}
I(j_1, \cdots, j_r)
&= \int_{\Omega^r_H(D_H)} \prod_{s=1}^{r} \Big( z_{j_s} f_{Z_{j_s}}(z_{j_s}) \prod_{k\in J^{(s)}\setminus\{j_s\}} F_{Z_k}(z_k) \times \delta_{\pi_{j_s}(E)}(z_{j_s}) \prod_{k\in J^{(s)}\setminus\{j_s\}} P(Z_k \in \pi_k(E) \mid Z_k < z_k) \Big) \frac{dz_{j_1}}{z_{j_1}}\cdots\frac{dz_{j_r}}{z_{j_r}}\\
&= \int_{\Omega^r_H(D_H)} \prod_{s=1}^{r} f_{Z_{j_s}}(z_{j_s})\, \delta_{\pi_{j_s}(E)}(z_{j_s}) \times \prod_{k\in p\setminus\{j_1,\dots,j_r\}} P(Z_k \in \pi_k(E),\ Z_k < z_k)\, dz_{j_1}\cdots dz_{j_r}\,. \tag{5.45}
\end{align*}
Define
\[
\Omega_{H;j_1,\dots,j_r}(D_H) = \big\{ z \in \mathbb R^p_+\,:\, x = A \odot z \in D_H,\ z_{j_s} = x_s/a_{s,j_s},\ s = 1, \dots, r,\ z_k < z_k(x),\ k \in p \setminus \{j_1, \dots, j_r\} \big\}\,.
\]
By the independence of the $Z_k$'s, (5.45) becomes
\[
I(j_1, \dots, j_r) = P\big( Z \in \Omega_{H;j_1,\dots,j_r}(D_H) \cap E \big)\,. \tag{5.46}
\]
By plugging (5.46) into (5.44), we obtain
\[
\int_{D_H} \nu(x, E)\, P_X(dx) = \int_{P_H(D_H)} \nu(x, E)\, Q^{X_r}_H(dx_r)
= \sum_{j_1\in \mathcal J(1),\,\cdots,\, j_r\in \mathcal J(r)} P\big( Z \in \Omega_{H;j_1,\cdots,j_r}(D_H) \cap E \big) = P(A \odot Z \in D_H,\ Z \in E)\,, \tag{5.47}
\]
because the summation over $(j_1, \dots, j_r)$ accounts for all relevant hitting scenarios corresponding to the matrix H. Plugging (5.47) into (5.42), we have
\[
\int_D \nu(x, E)\, P_X(dx) = \sum_{H\in\mathcal H(A)} P(X \equiv A \odot Z \in D_H,\ Z \in E) = P(X \in D,\ Z \in E)\,.
\]
This completes the proof of Theorem V.4.
Proof of Lemma V.14. Consider the random vector $X_r = (X_1, \dots, X_r)$. Observe that by the definition of the set $P_H(D_H)$, on the event $\{X_r \in P_H(D_H)\}$, we have
\[
X_r = \sum_{j_1\in \mathcal J(1),\,\cdots,\, j_r\in \mathcal J(r)} \begin{pmatrix} a_{1,j_1}Z_{j_1}\\ \vdots\\ a_{r,j_r}Z_{j_r} \end{pmatrix} \prod_{s=1}^{r} \underbrace{\mathbf 1\Big\{ \bigcap_{k\in J^{(s)}\setminus\{j_s\}} \big\{ a_{s,k}Z_k < a_{s,j_s}Z_{j_s} \big\} \Big\}}_{=:\,\mathbf 1_{C_{s,j_s}}}\,. \tag{5.48}
\]
Note that since $\mathcal J(s) \subset J^{(s)}$, $s = 1, \dots, r$, the events $\bigcap_{s=1}^{r} C_{s,j_s}$ are disjoint for all r-tuples $(j_1, \dots, j_r) \in \mathcal J(1) \times \cdots \times \mathcal J(r)$.
Recall that our goal is to establish (5.43). By the fact that the sum in (5.48) involves, with probability one, only one non-zero term (the one corresponding to some $(j_1, \dots, j_r)$), we have that for all measurable sets $\Delta \subset P_H(D_H)$, writing $\xi_{j_s} = a_{s,j_s}Z_{j_s}$,
\[
Q^{X_r}_H(\Delta) \equiv P(X_r \in \Delta) = \sum_{j_1\in \mathcal J(1),\,\cdots,\, j_r\in \mathcal J(r)} P\Big( \big\{ (\xi_{j_1}, \cdots, \xi_{j_r}) \in \Delta \big\} \cap \bigcap_{s=1}^{r} C_{s,j_s} \Big)\,. \tag{5.49}
\]
Now, consider the last probability, for fixed $(j_1, \dots, j_r)$. The random variables $\xi_{j_s}$, $s = 1, \dots, r$ are independent and have densities $f_{Z_{j_s}}(x_s/a_{s,j_s})/a_{s,j_s}$, $x_s \in \mathbb R_+$. We also have that the events $C_{s,j_s}$, $s = 1, \dots, r$ are mutually independent, since their definitions involve $Z_k$'s indexed by the disjoint sets $J^{(s)}$, $s = 1, \dots, r$. By conditioning on the $\xi_{j_s}$'s, we obtain that the probability on the right-hand side of (5.49) equals
\begin{align*}
&\int_\Delta \prod_{s=1}^{r} \frac{1}{a_{s,j_s}} f_{Z_{j_s}}(x_s/a_{s,j_s}) \times \prod_{s=1}^{r} P\Big( \bigcap_{k\in J^{(s)}\setminus\{j_s\}} \big\{ a_{s,k}Z_k < x_s \big\} \Big)\, dx_1\cdots dx_r\\
&= \int_\Delta \prod_{s=1}^{r} \Big( \frac{1}{a_{s,j_s}} f_{Z_{j_s}}(x_s/a_{s,j_s}) \prod_{k\in J^{(s)}\setminus\{j_s\}} F_{Z_k}(x_s/a_{s,k}) \Big)\, dx_1\cdots dx_r\,.
\end{align*}
In view of (5.48) and (5.23), replacing $\sum_{j_1\in \mathcal J(1),\,\cdots,\, j_r\in \mathcal J(r)} \prod_{s=1}^{r}$ by $\prod_{s=1}^{r} \sum_{j\in \mathcal J(s)}$, we obtain that the measure $Q^{X_r}_H$ has a density on $P_H(D_H)$, given by (5.43).
CHAPTER VI
Central Limit Theorems for Stationary Random Fields
The central limit theorem studies the asymptotic behavior of partial sums of random variables $S_n = X_1 + \cdots + X_n$. In the case that the random variables are independent, it is well understood when the normalized partial sums converge to a normal distribution:
\[
\frac{S_n - \mathbb E S_n}{\sqrt n} \Rightarrow \mathcal N(0, \sigma^2)\,.
\]
The case that $\{X_i\}_{i\in\mathbb N}$ are dependent also has a long history. This dates back to at least 1910, when Markov [58] proved a central limit theorem for a two-state Markov chain. Since then, the central limit theorem for stationary processes has been an active research area in probability theory.
In this chapter, our focus is on establishing central limit theorems for stationary random fields. That is, for stationary random variables $\{X_{i,j}\}_{(i,j)\in\mathbb N^2}$, when do we have
\[
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} (X_{i,j} - \mathbb E X_{i,j})}{n} \Rightarrow \mathcal N(0, \sigma^2)\,? \tag{6.1}
\]
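The simplest instance of (6.1) can be checked by simulation. The sketch below (illustrative only; all names and parameter choices are ours) draws replications of an $n\times n$ field of independent standard normals, for which $S_n/n$ is exactly $\mathcal N(0,1)$ for every n:

```python
import random
import statistics

def normalized_sum(field):
    """(sum_{i<=n} sum_{j<=n} X_{i,j}) / n for an n-by-n field, as in (6.1)."""
    n = len(field)
    return sum(sum(row) for row in field) / n

random.seed(42)
n, reps = 30, 2000
samples = [normalized_sum([[random.gauss(0.0, 1.0) for _ in range(n)]
                           for _ in range(n)])
           for _ in range(reps)]
# For i.i.d. N(0,1) entries the normalization n = sqrt(n^2) = |V_n|^{1/2} is
# exact: S_n/n is N(0,1), so the sample mean is near 0 and the variance near 1.
mean = statistics.fmean(samples)
var = statistics.variance(samples)
```

The interesting question, addressed below, is when the same normalization works for dependent fields.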
This problem has already been considered by many researchers. For example,
Bolthausen [5], Goldie and Morrow [40] and Bradley [7] studied this problem under
suitable mixing conditions. Basu and Dorea [3], Nahapetian [62], and Poghosyan and Roelly [73] considered the problem for multiparameter martingales. Another important result is due to Dedecker [27, 28], whose approach was based on an adaptation of
the Lindeberg method. As a particular case, Cheng and Ho [15] established a central
limit theorem for functionals of linear random fields, based on a lexicographically
ordered martingale approximation.
Here, we aim at establishing so-called projective-type conditions under which the central limit theorem (6.1) holds. Such conditions have recently drawn much attention in central limit theorems for stationary sequences, as they are easy to verify when applying such results to stochastic processes from statistics and econometrics (see e.g. Wu [119]). However, central limit theorems for stationary random fields based on projective conditions have been much less explored.
This problem is not a simple extension of a one-dimensional problem to a high-dimensional one. An important reason is that the main technique for establishing central limit theorems with projective conditions in one dimension, the martingale approximation approach, does not apply to (high-dimensional) random fields as successfully as to (one-dimensional) stochastic processes. This obstacle has been known among researchers for more than 30 years. For example, Bolthausen [5] remarked that 'Gordin uses an approximation by martingales, but his method appears difficult to generalize to dimensions ≥ 2.' (For the literature on martingale approximation, see e.g. Gordin and Lifsic [43], Kipnis and Varadhan [54], Woodroofe [116], Maxwell and Woodroofe [59], Wu and Woodroofe [121], Dedecker et al. [30], Peligrad et al. [68], among others, and Merlevede et al. [60] for a survey.)

In this chapter, we establish a central limit theorem and an invariance principle for stationary multiparameter random fields, using an m-dependent approximation approach. We first state the main result in the next section.
6.1 Main Result
We start with some notation. We consider a product probability space $(\Omega, \mathcal A, P)$, i.e., a $\mathbb Z^d$-indexed product of i.i.d. probability spaces of the form
\[
(\Omega, \mathcal A, P) \equiv \big( \mathbb R^{\mathbb Z^d}, \mathcal B^{\mathbb Z^d}, P^{\mathbb Z^d} \big)\,.
\]
Write $\epsilon_k(\omega) = \omega_k$ for all $\omega \in \mathbb R^{\mathbb Z^d}$ and $k \in \mathbb Z^d$. Then, $\{\epsilon_k\}_{k\in\mathbb Z^d}$ are i.i.d. random variables with distribution P. On such a space, we define the natural filtration $\{\mathcal F_k\}_{k\in\mathbb Z^d}$ by
\[
\mathcal F_k := \sigma\big( \epsilon_l : l \preceq k,\ l \in \mathbb Z^d \big)\,, \quad \mbox{for all } k \in \mathbb Z^d\,. \tag{6.2}
\]
Here and in the sequel, for all vectors $x \in \mathbb R^d$, we write $x = (x_1, \dots, x_d)$, and for all $l, k \in \mathbb R^d$, we let $l \preceq k$ stand for $l_i \leq k_i$, $i = 1, \dots, d$.
We focus on mean-zero stationary random fields defined on a product probability space. Let $\{T_k\}_{k\in\mathbb Z^d}$ denote the group of shift operators on $\mathbb R^{\mathbb Z^d}$ with $(T_k\omega)_l = \omega_{k+l}$, for all $k, l \in \mathbb Z^d$, $\omega \in \mathbb R^{\mathbb Z^d}$. Then, we consider random fields of the form
\[
\big\{ f \circ T_k \big\}_{k\in\mathbb Z^d}\,,
\]
where f is in the class $L^p_0 = \{ f \in L^p(\mathcal F_\infty) : \int f\, dP = 0 \}$, $p \geq 2$, with $\mathcal F_\infty = \bigvee_{k\in\mathbb Z^d} \mathcal F_k$.
Throughout this chapter, we consider a sequence $\{V_n\}_{n\in\mathbb N}$ of finite rectangular subsets of $\mathbb Z^d$, of the form
\[
V_n = \prod_{i=1}^{d} \big\{ 1, \dots, m^{(n)}_i \big\} \subset \mathbb N^d\,, \quad \mbox{for all } n \in \mathbb N\,, \tag{6.3}
\]
with $m^{(n)}_i$ increasing to infinity as $n \to \infty$ for all $i = 1, \dots, d$. Let
\[
S_n(f) \equiv S(V_n, f) = \sum_{k\in V_n} f \circ T_k \tag{6.4}
\]
denote the partial sums with respect to $V_n$. Moreover, write, for $t \in [0, 1]^d$, $V_n(t) = \prod_{i=1}^{d} [0, m^{(n)}_i t_i] \subset \mathbb R^d$ and $R_k = \prod_{i=1}^{d} (k_i - 1, k_i] \subset \mathbb R^d$ for all $k \in \mathbb Z^d$.
We also write
\[
B_{n,t}(f) \equiv B_{V_n,t}(f) = \sum_{k\in\mathbb N^d} \lambda\big( V_n(t) \cap R_k \big)\, f \circ T_k\,, \tag{6.5}
\]
where λ is the Lebesgue measure on $\mathbb R^d$, and consider weak convergence in the space $C[0, 1]^d$, the space of continuous functions on $[0, 1]^d$ equipped with the uniform metric. Recall that the standard d-parameter Brownian sheet on $[0, 1]^d$, denoted by $\{B(t)\}_{t\in[0,1]^d}$, is a mean-zero Gaussian random field with covariance $\mathbb E(B(s)B(t)) = \prod_{i=1}^{d} \min(s_i, t_i)$, $s, t \in [0, 1]^d$. Write $0 = (0, \dots, 0)$, $1 = (1, \dots, 1) \in \mathbb Z^d$. Our condition involves the following term:
\[
\Delta_{d,p}(f) := \sum_{k\in\mathbb N^d} \frac{\big\| \mathbb E(f \circ T_k \mid \mathcal F_1) \big\|_p}{\prod_{i=1}^{d} k_i^{1/2}}\,. \tag{6.6}
\]
Our main result is the following.
Theorem VI.1. Consider a product probability space described above. If $f \in L^2_0$, $f \in \mathcal F_0$ and $\Delta_{d,2}(f) < \infty$, then
\[
\sigma^2 = \lim_{n\to\infty} \frac{\mathbb E(S_n(f)^2)}{|V_n|} < \infty
\]
exists and
\[
\frac{S_n(f)}{|V_n|^{1/2}} \Rightarrow \mathcal N(0, \sigma^2)\,.
\]
In addition, if $f \in L^p_0$ and $\Delta_{d,p}(f) < \infty$ for some $p > 2$, then
\[
\frac{B_{n,\cdot}(f)}{|V_n|^{1/2}} \Rightarrow \sigma B(\cdot) \tag{6.7}
\]
in $C[0, 1]^d$.
For the sake of simplicity, we will prove Theorem VI.1 in the case d = 2 in
Sections 6.3 and 6.4.
Remark VI.2. Conditions involving conditional expectations, such as $\Delta_{d,p}(f) < \infty$ here, are referred to as projective conditions. Compared to mixing-type conditions (see e.g. Bradley [8]), projective ones are often easy to check in practice. One such example is given in Section 6.6. See for example [30] for comparisons of projective conditions.
The rest of the chapter is organized as follows. In Section 6.2 we provide preliminary
results on m-dependent approximation. We establish the central limit theorem in
Section 6.3 and then the invariance principle in Section 6.4. Sections 6.5 and 6.6 are
devoted to the applications to orthomartingales and functionals of stationary linear
random fields, respectively. In Section 6.7, we prove a moment inequality, which
plays a crucial role in proving our limit results. Some other auxiliary proofs are
given in Section 6.8.
6.2 m-Dependent Approximation
We describe the general procedure of m-dependent approximation in this section. Here, we assume neither a product structure on the underlying probability space nor any filtration structure. Instead, we simply assume $f \in L^2_0 = \{ f \in L^2(\Omega, \mathcal A, P) : \int f\, dP = 0 \}$, and that $\{T_k\}_{k\in\mathbb Z^d}$ is an Abelian group of bimeasurable, measure-preserving, one-to-one and onto maps on $(\Omega, \mathcal A, P)$.

The notion of m-dependence was first introduced by Hoeffding and Robbins [49]. We say a random variable f is m-dependent if $f \circ T_k$ and $f \circ T_l$ are independent whenever $|k - l|_\infty := \max_{i=1,\dots,d} |k_i - l_i| > m$. The following result on the asymptotic normality of sums of m-dependent random variables is due to Bolthausen [5] (see also Rosen [81]). Recall $\{V_n\}_{n\in\mathbb N}$ given in (6.3).
Theorem VI.3. Suppose $f_m \in L^2_0$ is m-dependent. Write
\[
\sigma_m^2 = \sum_{k\in\mathbb Z^d} \mathbb E\big[ f_m\, (f_m \circ T_k) \big]\,. \tag{6.8}
\]
Then,
\[
\frac{S_n(f_m)}{|V_n|^{1/2}} \Rightarrow \mathcal N(0, \sigma_m^2)\,.
\]
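For a concrete m-dependent $f_m$, the series (6.8) can be evaluated exactly. The sketch below (a hypothetical example, not from the text) takes $f_m$ to be a finite moving average of i.i.d. mean-zero, unit-variance innovations with coefficient window c; each lag covariance $\mathbb E[f_m (f_m\circ T_k)]$ is then a sum of overlapping coefficient products, and summing over all lags telescopes to $(\sum_u c_u)^2$:

```python
def sigma_m_squared(c):
    """Evaluate (6.8): sigma_m^2 = sum_k E[f_m (f_m o T_k)] for
    f_m = sum_u c[u] * eps_u, with {eps_u} i.i.d., mean zero, unit variance,
    and c a finite coefficient window (so f_m is m-dependent)."""
    m1, m2 = len(c), len(c[0])
    total = 0.0
    for k1 in range(-(m1 - 1), m1):          # all lags with nonzero covariance
        for k2 in range(-(m2 - 1), m2):
            total += sum(c[u1][u2] * c[u1 + k1][u2 + k2]
                         for u1 in range(m1) for u2 in range(m2)
                         if 0 <= u1 + k1 < m1 and 0 <= u2 + k2 < m2)
    return total

c = [[0.5, -0.25], [1.0, 0.75]]
val = sigma_m_squared(c)
expected = sum(map(sum, c)) ** 2   # summing over all lags gives (sum_u c_u)^2
```

This also shows that $\sigma_m^2$ can vanish (take coefficients summing to zero), in which case the limit in Theorem VI.3 is degenerate.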
Now, consider the function $f \in L^2_0(P)$ and define
\[
\| f \|_{V,+} = \limsup_{n\to\infty} \frac{\| S_n(f) \|_2}{|V_n|^{1/2}}\,. \tag{6.9}
\]
We refer to the pseudo-norm $\|\cdot\|_{V,+}$ as the plus-norm.

Lemma VI.4. Suppose $f, f_1, f_2, \cdots \in L^2_0(P)$ and $f_m$ is m-dependent for all $m \in \mathbb N$. If
\[
\lim_{m\to\infty} \| f - f_m \|_{V,+} = 0\,, \tag{6.10}
\]
then
\[
\lim_{m\to\infty} \sigma_m = \lim_{m\to\infty} \| f_m \|_{V,+} =: \sigma < \infty \tag{6.11}
\]
exists, and
\[
\frac{S_n(f)}{|V_n|^{1/2}} \Rightarrow \mathcal N(0, \sigma^2)\,. \tag{6.12}
\]
Proof. It suffices to prove (6.11). We will show that $\{\sigma_m\}_{m\in\mathbb N}$ forms a Cauchy sequence in $\mathbb R_+$. Observe that since $f_m$ is m-dependent with zero mean,
\[
\sigma_m = \lim_{n\to\infty} \frac{\| S_n(f_m) \|_2}{|V_n|^{1/2}}\,.
\]
It then follows that
\[
|\sigma_{m_1} - \sigma_{m_2}| \leq \limsup_{n\to\infty} \frac{\| S_n(f_{m_1} - f_{m_2}) \|_2}{|V_n|^{1/2}} \leq \| f_{m_1} - f \|_{V,+} + \| f_{m_2} - f \|_{V,+}\,,
\]
which can be made arbitrarily small by taking $m_1, m_2$ large enough. We have thus shown that $\{\sigma_m\}_{m\in\mathbb N}$ is a Cauchy sequence in $\mathbb R_+$.
Remark VI.5. The idea of establishing the central limit theorem by controlling the quantity $\| f - f_m \|_{V,+}$ dates back to Gordin [42], where $f_m$ was selected from a different subspace. In the one-dimensional case, when $V_n = \{1, \dots, n\}$, Zhao and Woodroofe [123] named $\|\cdot\|_{V,+}$ the plus-norm, and established a necessary and sufficient condition for the martingale approximation in terms of the plus-norm. See Gordin and Peligrad [41] and Peligrad [66] for improvements and more discussions of such conditions.
In the next section, we will establish conditions under which (6.10) holds.
6.3 A Central Limit Theorem
From this section on, we will focus on stationary multiparameter random fields defined on product probability spaces. On such a space, any integrable function has a natural $L^2$-approximation by m-dependent functions, and there is a natural commuting filtration.

For the sake of simplicity, we consider only 2-parameter random fields in the sequel, and simply say 'random fields' for short. We will prove a central limit theorem here and then an invariance principle in the next section. The argument, however, can be generalized easily to d-parameter random fields, and the result has been stated in Theorem VI.1.
We start with a product probability space with i.i.d. random variables $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$. Recall that $\{T_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is the group of shift operators on $\mathbb R^{\mathbb Z^2}$, and write $\mathcal F_{\infty,\infty} = \sigma(\epsilon_{i,j} : (i,j) \in \mathbb Z^2)$. We focus on the class of functions $L^p_0 = \{ f \in L^p(\mathcal F_{\infty,\infty}) : \mathbb E f = 0 \}$, $p \geq 2$. For all measurable functions $f \in L^2_0$, define, for all $m \in \mathbb N$,
\[
f_m := \mathbb E(f \mid \mathcal F_m) \quad \mbox{ with } \quad \mathcal F_m = \sigma\big( \epsilon_j : j \in \{-m, \dots, m\}^2 \big)\,. \tag{6.13}
\]
Clearly, $f_m \in L^2_0$, $\| f - f_m \|_2 \to 0$ as $m \to \infty$, and $\{ f_m \circ T_{i,j} \}_{(i,j)\in\mathbb Z^2}$ are m-dependent functions.
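For a linear functional, the approximation error $\| f - f_m \|_2$ in (6.13) is explicit: conditioning on $\mathcal F_m$ simply truncates the series to the window, so the error is the $\ell^2$-norm of the tail coefficients. A small sketch (the coefficients are a hypothetical square-summable choice of ours):

```python
def l2_approx_error(a, m):
    """||f - f_m||_2 for f = sum_{i,j>=0} a[i][j] * eps_{-i,-j} with i.i.d.
    unit-variance innovations: f_m = E(f | F_m) keeps exactly the terms with
    max(i, j) <= m, so the error is the l2-norm of the remaining tail."""
    return sum(a[i][j] ** 2
               for i in range(len(a)) for j in range(len(a[0]))
               if max(i, j) > m) ** 0.5

N = 60   # finite truncation of the coefficient array, for illustration
a = [[(i + 1) ** -2 * (j + 1) ** -2 for j in range(N)] for i in range(N)]
errors = [l2_approx_error(a, m) for m in range(6)]   # decreasing toward 0
```

The strictly decreasing sequence `errors` is a concrete instance of $\| f - f_m \|_2 \to 0$.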
Now, recall the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ defined by $\mathcal F_{k,l} = \sigma(\epsilon_{i,j} : i \leq k,\ j \leq l)$. This is a 2-parameter filtration, i.e.,
\[
\mathcal F_{i,j} \subset \mathcal F_{k,l} \quad \mbox{ if } i \leq k,\ j \leq l\,. \tag{6.14}
\]
Also,
\[
T_{-i,-j}\mathcal F_{k,l} = \mathcal F_{k+i,l+j}\,, \quad \forall (i,j), (k,l) \in \mathbb Z^2\,. \tag{6.15}
\]
Moreover, the notion of commuting filtration is of importance to us.

Definition VI.6. A filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is commuting if, for all $\mathcal F_{k,l}$-measurable bounded random variables Y, $\mathbb E(Y \mid \mathcal F_{i,j}) = \mathbb E(Y \mid \mathcal F_{i\wedge k,\, j\wedge l})$.

Since $\{\epsilon_{k,l}\}_{(k,l)\in\mathbb Z^2}$ are independent random variables, $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is commuting (see Proposition VI.22 in Section 6.8). This implies that the marginal filtrations
\[
\mathcal F_{i,\infty} = \bigvee_{j\geq 0} \mathcal F_{i,j} \quad \mbox{ and } \quad \mathcal F_{\infty,j} = \bigvee_{i\geq 0} \mathcal F_{i,j} \tag{6.16}
\]
are commuting, in the sense that for all $Y \in L^1(P)$,
\[
\mathbb E\big[ \mathbb E(Y \mid \mathcal F_{i,\infty}) \mid \mathcal F_{\infty,j} \big] = \mathbb E\big[ \mathbb E(Y \mid \mathcal F_{\infty,j}) \mid \mathcal F_{i,\infty} \big] = \mathbb E(Y \mid \mathcal F_{i,j})\,. \tag{6.17}
\]
For more details on commuting filtrations, see Khoshnevisan [53].
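The commuting property (6.17) can be verified exactly on a toy product space. In the sketch below (all names are ours), four independent fair bits play the role of $\epsilon_{i,j}$, $(i,j) \in \{1,2\}^2$, and conditional expectations are computed by averaging over the unobserved coordinates:

```python
from itertools import product

P = [(1, 1), (1, 2), (2, 1), (2, 2)]      # sites of the four i.i.d. bits eps_{i,j}
OMEGA = list(product([0, 1], repeat=4))   # the (uniform) product sample space

def cond_exp(Y, observed):
    """E(Y | sigma(eps_p : p in observed)): average Y over the fiber of
    sample points agreeing with omega on the observed coordinates."""
    idx = [P.index(p) for p in observed]
    def EY(w):
        fiber = [v for v in OMEGA if all(v[i] == w[i] for i in idx)]
        return sum(Y(v) for v in fiber) / len(fiber)
    return EY

Y = lambda w: w[0] + 2 * w[1] * w[3] - 3 * w[2]   # some bounded random variable
F_1_inf = [(1, 1), (1, 2)]   # marginal field: eps_{i,j} with i <= 1
F_inf_1 = [(1, 1), (2, 1)]   # marginal field: eps_{i,j} with j <= 1
F_1_1 = [(1, 1)]             # F_{1,1}: eps_{i,j} with i <= 1 and j <= 1
lhs = cond_exp(cond_exp(Y, F_1_inf), F_inf_1)   # E[ E(Y | F_{1,oo}) | F_{oo,1} ]
rhs = cond_exp(Y, F_1_1)                        # E(Y | F_{1,1})
gap = max(abs(lhs(w) - rhs(w)) for w in OMEGA)  # (6.17): the gap vanishes
```

The identity holds pointwise here precisely because the coordinates are independent; with dependent coordinates the iterated conditioning generally does not collapse to $\mathbb E(Y \mid \mathcal F_{1,1})$.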
For all $\mathcal F_{0,0}$-measurable functions $f \in L^2_0$, write
\[
S_{m,n}(f) = \sum_{i=1}^{m} \sum_{j=1}^{n} f \circ T_{i,j}\,. \tag{6.18}
\]
Thanks to the commuting structure of the filtration, applying the maximal inequality in [68] twice, we can prove the following moment inequality with $p \geq 2$:
\[
\| S_{m,n}(f) \|_p \leq C\, m^{1/2} n^{1/2}\, \Delta_{(m,n),p}(f) \tag{6.19}
\]
with
\[
\Delta_{(m,n),p}(f) = \sum_{k=1}^{m} \sum_{l=1}^{n} \frac{\big\| \mathbb E(S_{k,l}(f) \mid \mathcal F_{1,1}) \big\|_p}{k^{3/2} l^{3/2}}\,.
\]
In fact, we will prove a stronger inequality without the assumptions of a product probability space and the $\mathcal F_{0,0}$-measurability of f. See Section 6.7, Proposition VI.20 and Corollary VI.21.
Recall that
\[
\Delta_{2,p}(f) = \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \frac{\big\| \mathbb E(f \circ T_{k,l} \mid \mathcal F_{1,1}) \big\|_p}{k^{1/2} l^{1/2}}\,. \tag{6.20}
\]
Now, we can prove the following central limit theorem for adapted stationary random fields.
Theorem VI.7. Consider the product probability space discussed above. Let $\{V_n\}_{n\in\mathbb N}$ be as in (6.3) with $d = 2$. Suppose $f \in L^2_0$, $f \in \mathcal F_{0,0}$, and define $f_m$ as in (6.13). If $\Delta_{2,2}(f) < \infty$, then
\[
\lim_{m\to\infty} \| f - f_m \|_{V,+} = 0\,.
\]
Therefore, $\sigma := \lim_{m\to\infty} \| f_m \|_{V,+} < \infty$ exists and $S_n(f)/|V_n|^{1/2} \Rightarrow \mathcal N(0, \sigma^2)$.
Proof. The second part follows immediately from Lemma VI.4. It suffices to prove $\| f - f_m \|_{V,+} \to 0$ as $m \to \infty$. First, by the fact that
\[
\big\| \mathbb E(S_{k,l}(f) \mid \mathcal F_{1,1}) \big\|_2 \leq \sum_{i=1}^{k} \sum_{j=1}^{l} \big\| \mathbb E(f \circ T_{i,j} \mid \mathcal F_{1,1}) \big\|_2
\]
and Fubini's theorem, we have $\Delta_{(\infty,\infty),2}(f) \leq 9\Delta_{2,2}(f)$. So, by (6.9) and (6.19), it suffices to show
\[
\Delta_{2,2}(f - f_m) = \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \frac{\big\| \mathbb E[(f - f_m) \circ T_{k,l} \mid \mathcal F_{1,1}] \big\|_2}{k^{1/2} l^{1/2}} \to 0 \tag{6.21}
\]
as $m \to \infty$. Clearly, the summand in (6.21) converges to 0 for each fixed k, l, since (6.13) implies $\| f - f_m \|_2 \to 0$ as $m \to \infty$ and $\| \mathbb E[(f - f_m) \circ T_{k,l} \mid \mathcal F_{1,1}] \|_2 \leq \| f - f_m \|_2$. Moreover, observe that
\begin{align*}
\mathbb E(f_m \circ T_{k,l} \mid \mathcal F_{1,1}) &= \mathbb E\big[ \mathbb E(f \circ T_{k,l} \mid T_{-k,-l}(\mathcal F_m)) \mid \mathcal F_{1,1} \big]\\
&= \mathbb E\big[ \mathbb E(f \circ T_{k,l} \mid \mathcal F_{1,1}) \mid T_{-k,-l}(\mathcal F_m) \big]\,,
\end{align*}
where in the second equality we can exchange the order of conditional expectations by the definitions of $\mathcal F_{1,1}$ and $T_{-k,-l}(\mathcal F_m)$ (see Proposition VI.22 in Section 6.8 for a detailed treatment). Therefore,
\begin{align*}
\big\| \mathbb E[(f - f_m) \circ T_{k,l} \mid \mathcal F_{1,1}] \big\|_2
&\leq \big\| \mathbb E(f \circ T_{k,l} \mid \mathcal F_{1,1}) \big\|_2 + \big\| \mathbb E(f_m \circ T_{k,l} \mid \mathcal F_{1,1}) \big\|_2\\
&\leq 2 \big\| \mathbb E(f \circ T_{k,l} \mid \mathcal F_{1,1}) \big\|_2\,.
\end{align*}
Then, the condition $\Delta_{2,2}(f) < \infty$ combined with the dominated convergence theorem yields (6.21). The proof is thus completed.
6.4 An Invariance Principle
Recall the space $C[0, 1]^2$ and the 2-parameter Brownian sheet $\{B(t)\}_{t\in[0,1]^2}$.
Theorem VI.8. Under the assumptions of Theorem VI.7, suppose in addition that $f \in L^p_0$ and $\Delta_{2,p}(f) < \infty$ for some $p > 2$. Write $B_{n,t}(f)$ as in (6.5) with $d = 2$. Then,
\[
\frac{B_{n,\cdot}(f)}{|V_n|^{1/2}} \Rightarrow \sigma B(\cdot)\,,
\]
where '$\Rightarrow$' stands for weak convergence in $C[0, 1]^2$.
Proof. It suffices to show that the finite-dimensional distributions converge and that $\{ B_{n,t}(f)/|V_n|^{1/2} \}_{t\in[0,1]^2}$ is tight.

We first show that, for all $t = (t^{(1)}, \dots, t^{(k)}) \subset [0, 1]^2$,
\[
\Big( \frac{B_{n,t^{(1)}}(f)}{|V_n|^{1/2}}, \cdots, \frac{B_{n,t^{(k)}}(f)}{|V_n|^{1/2}} \Big) \Rightarrow \sigma\big( B(t^{(1)}), \cdots, B(t^{(k)}) \big) =: \sigma B_t\,. \tag{6.22}
\]
Consider the m-dependent function $f_m$ defined in (6.13). Then, the convergence of the finite-dimensional distributions (6.22) with f replaced by $f_m$ follows from the invariance principle for m-dependent random fields (see e.g. [96]). Furthermore, by Theorem VI.7, $\Delta_{2,2}(f) \leq \Delta_{2,p}(f) < \infty$, so that $\| f - f_m \|_{V,+} \to 0$ as $m \to \infty$, and therefore, letting $B_{n,t}(f)/|V_n|^{1/2}$ denote the left-hand side of (6.22), $B_{n,t}(f_m - f)/|V_n|^{1/2} \to (0, \dots, 0) \in \mathbb R^k$ in probability. The convergence of the finite-dimensional distributions (6.22) follows.
Now, we prove the tightness of $\{B_{n,t}(f)\}_{t\in[0,1]^2}$. Fix n and consider
\[
V_n = \{1, \dots, n_1\} \times \{1, \dots, n_2\}\,.
\]
Write $B_{n,t} \equiv B_{n,t}(f)$ and $S_{m,n} \equiv S_{m,n}(f)$ for short. For all $0 \leq r_1 < s_1 \leq 1$, $0 \leq r_2 < s_2 \leq 1$, set
\[
B_n\big( (r_1, s_1] \times (r_2, s_2] \big) := B_{n,(s_1,s_2)} - B_{n,(r_1,s_2)} - B_{n,(s_1,r_2)} + B_{n,(r_1,r_2)}\,.
\]
We will show that there exists a constant C, independent of $n, r_1, r_2, s_1$ and $s_2$, such that
\[
(n_1 n_2)^{-1/2} \big\| B_n\big( (r_1, s_1] \times (r_2, s_2] \big) \big\|_p \leq C \big[ (s_1 - r_1)(s_2 - r_2) \big]^{1/2} \Delta_{2,p}(f)\,. \tag{6.23}
\]
Inequality (6.23) implies tightness, by Nagai [61], Theorem 1.

Now, we prove (6.23) to complete the proof. From now on, the constant C may change from line to line. Write $m_i = \lfloor n_i s_i \rfloor - \lfloor n_i r_i \rfloor$, $i = 1, 2$. If $m_i \geq 2$, $i = 1, 2$, then
\begin{align*}
\big\| B_n\big( (r_1, s_1] \times (r_2, s_2] \big) \big\|_p
&\leq \| S_{m_1,m_2} \|_p + 2 \| S_{m_1,1} \|_p + 2 \| S_{1,m_2} \|_p + 4 \| S_{1,1} \|_p\\
&\leq C (m_1 m_2)^{1/2}\, \Delta_{2,p}(f) \tag{6.24}
\end{align*}
for some constant C, by (6.19). Note that $m_i \geq 2$ also implies $n_i(s_i - r_i) > 1$. Therefore, $m_i \leq n_i(s_i - r_i) + 1 < 2 n_i(s_i - r_i)$, and (6.24) can be bounded by $C (n_1 n_2)^{1/2} [(s_1 - r_1)(s_2 - r_2)]^{1/2}\, \Delta_{2,p}(f)$, which yields (6.23).
In the case $m_1 < 2$ or $m_2 < 2$, obtaining (6.23) requires more careful analysis. We only show the case $m_1 = 1$, $m_2 \geq 2$, as the proofs for the other cases are similar. Observe that $m_1 = 1$ implies $\lfloor n_1 r_1 \rfloor \leq n_1 r_1 < \lfloor n_1 s_1 \rfloor \leq n_1 s_1$. Then,
\[
\big\| B_n\big( (r_1, s_1] \times (r_2, s_2] \big) \big\|_p \leq n_1(s_1 - r_1) \big( \| S_{1,m_2} \|_p + 2 \| S_{1,1} \|_p \big) \leq C\, n_1(s_1 - r_1)\, m_2^{1/2}\, \Delta_{2,p}(f)\,.
\]
Observe that $m_1 = 1$ also implies $n_1(s_1 - r_1) \in (0, 2)$. If $n_1(s_1 - r_1) \leq 1$, then $n_1(s_1 - r_1) \leq [n_1(s_1 - r_1)]^{1/2}$. If $n_1(s_1 - r_1) \in (1, 2)$, then $n_1(s_1 - r_1) < \sqrt 2\, [n_1(s_1 - r_1)]^{1/2}$. It then follows that (6.23) still holds.
Remark VI.9. To prove the invariance principle for stationary random fields, most results require finite moments of order strictly larger than 2. See for example Berkes and Morrow [4], Goldie and Greenwood [39] and Dedecker [28]. This is in contrast to the one-dimensional case, where the invariance principle can be established under a finite second moment assumption.

To the best of our knowledge, there are two invariance principles for stationary random fields requiring only finite second moments. One is due to Sashkin [96], who assumed the field to be BL(θ)-dependent (a class including m-dependent stationary random fields). In general, BL(θ)-dependence is difficult to check. The other is due to Basu and Dorea [3], who proved an invariance principle for martingale-difference random fields under a finite second moment assumption. However, they have stringent conditions on the filtration (see Remark VI.13 below). In our case, it remains an open problem whether $\Delta_{2,2}(f) < \infty$ implies the invariance principle. See also a similar conjecture by Dedecker in [28], Remark 1.
6.5 Orthomartingales
The central limit theorems and invariance principles for multiparameter martingales are more difficult to establish than in the one-dimensional case. This is due to the complex structure of multiparameter martingales. We will focus on orthomartingales first and establish an invariance principle, and then compare the results with other types of multiparameter martingales.

The idea of orthomartingales is due to R. Cairoli and J. B. Walsh. See e.g. the references in Khoshnevisan [53], which also provides a nice introduction to the material. For the sake of simplicity, we suppose $d = 2$. Consider a probability space $(\Omega, \mathcal A, P)$ and recall the definition of a 2-parameter filtration (6.14). We restrict ourselves to filtrations indexed by $\mathbb N^2$.
Definition VI.10. Given a commuting 2-parameter filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$ on $(\Omega, \mathcal A, P)$, we say a family of random variables $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is a 2-parameter orthomartingale on $(\Omega, \mathcal A, P)$, with respect to $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$, if for all $(i,j) \in \mathbb N^2$, $M_{i,j}$ is $\mathcal F_{i,j}$-measurable, and $\mathbb E(M_{i+1,j} \mid \mathcal F_{i,\infty}) = \mathbb E(M_{i,j+1} \mid \mathcal F_{\infty,j}) = M_{i,j}$, almost surely.
In our case, for $\mathcal F_{0,0}$-measurable $f \in L^2_0$, $M_{m,n} = S_{m,n}(f)$ as in (6.18) yields a 2-parameter orthomartingale if
\[
\mathbb E(f \circ T_{i+1,j} \mid \mathcal F_{i,\infty}) = \mathbb E(f \circ T_{i,j+1} \mid \mathcal F_{\infty,j}) = 0 \quad \mbox{almost surely}, \tag{6.25}
\]
for all $(i,j) \in \mathbb N^2$. In this case, we say $\{ f \circ T_{i,j} \}_{(i,j)\in\mathbb N^2}$ are 2-parameter orthomartingale differences.
Remark VI.11. In our case, $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is also a 2-parameter martingale in the usual sense, i.e., $\mathbb E(M_{i,j} \mid \mathcal F_{k,l}) = M_{i\wedge k,\, j\wedge l}$, almost surely. Indeed,
\[
\mathbb E(M_{i,j} \mid \mathcal F_{k,l}) = \mathbb E\big[ \mathbb E(M_{i,j} \mid \mathcal F_{k,\infty}) \mid \mathcal F_{\infty,l} \big] = \mathbb E(M_{i\wedge k,\,j} \mid \mathcal F_{\infty,l}) = M_{i\wedge k,\, j\wedge l}\,.
\]
In general, however, the converse is not true, i.e., multiparameter martingales are not necessarily orthomartingales (see e.g. [53], p. 33). The two notions are equivalent when the filtration is commuting (see e.g. [53], Chapter I, Theorem 3.5.1).
Theorem VI.12. Consider a product probability space $(\Omega, \mathcal A, P)$ with the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$. Suppose $f \in L^2_0$ and $f \in \mathcal F_{0,0}$. If $\{ f \circ T_{i,j} \}_{(i,j)\in\mathbb N^2}$ are 2-parameter orthomartingale differences, i.e., (6.25) holds, then $\sigma^2 = \lim_{n\to\infty} \mathbb E(S_n(f)^2)/|V_n| < \infty$ exists, and
\[
\frac{S_n(f)}{|V_n|^{1/2}} \Rightarrow \sigma \mathcal N(0, 1)\,.
\]
In addition, if $f \in L^p_0$ for some $p > 2$, then the invariance principle (6.7) holds.

Proof. Observe that (6.25) implies $\mathbb E(f \circ T_{i,j} \mid \mathcal F_{1,1}) = 0$ if $i > 1$ or $j > 1$. Then, for $f \in L^p_0$, $p \geq 2$,
\[
\Delta_{d,p}(f) = \big\| \mathbb E(f \circ T_{1,1} \mid \mathcal F_{1,1}) \big\|_p = \| f \|_p < \infty\,.
\]
The result then follows immediately from Theorem VI.1. Note that the argument holds for general d-parameter orthomartingales ($d \geq 2$) defined in [53].
Remark VI.13. Our result is more general than [3], [62] and [73] in the following way. Let $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$ be i.i.d. random variables. In [62], the central limit theorem was established for the so-called martingale-difference random fields $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ with $M_{i,j} = \sum_{k=1}^{i} \sum_{l=1}^{j} D_{k,l}$, such that
\[
\mathbb E\big[ D_{i,j} \mid \sigma(\epsilon_{k,l} : (k,l) \in \mathbb Z^2,\ (k,l) \neq (i,j)) \big] = 0\,, \quad \mbox{for all } (i,j) \in \mathbb N^2\,.
\]
In [3] and [73], the authors considered multiparameter martingales $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ with respect to the filtration defined by
\[
\mathcal F_{i,j} = \sigma\big( \epsilon_{k,l} : k \leq i \mbox{ or } l \leq j \big)\,.
\]
It is easy to see that, in both cases above, their assumptions are stronger, in the sense that they imply that $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is an orthomartingale with respect to the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$ in (6.2). On the other hand, however, the results mentioned above only assume that $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is a stationary random field, which is weaker than our assumption.
At last, we point out that the product structure of the probability space plays an important role. We provide an example of an orthomartingale with a different underlying probability structure. In this case, the limit behavior is quite different from the cases that we have studied so far.

Example VI.14. Suppose $\{\epsilon_k\}_{k\in\mathbb Z}$ and $\{\eta_k\}_{k\in\mathbb Z}$ are two families of i.i.d. random variables. Define $\mathcal G_i = \sigma(\epsilon_j : j \leq i)$ and $\mathcal H_i = \sigma(\eta_j : j \leq i)$ for all $i \in \mathbb N$. Then, $\mathcal G = \{\mathcal G_n\}_{n\in\mathbb N}$ and $\mathcal H = \{\mathcal H_n\}_{n\in\mathbb N}$ are two filtrations.

Now, let $\{Y_n\}_{n\in\mathbb N}$ and $\{Z_n\}_{n\in\mathbb N}$ be two arbitrary martingales with stationary increments with respect to the filtrations $\mathcal G$ and $\mathcal H$, respectively. Suppose $Y_n = \sum_{i=1}^{n} D_i$, $Z_n = \sum_{i=1}^{n} E_i$, where $\{D_n\}_{n\in\mathbb N}$ and $\{E_n\}_{n\in\mathbb N}$ are stationary martingale differences. Then, $\{D_i E_j\}_{(i,j)\in\mathbb N^2}$ is a stationary random field and
\[
M_{m,n} := \sum_{i=1}^{m} \sum_{j=1}^{n} D_i E_j = Y_m Z_n
\]
is an orthomartingale with respect to the filtration $\{\mathcal G_i \vee \mathcal H_j\}_{(i,j)\in\mathbb N^2}$. Clearly,
\[
\frac{M_{n,n}}{n} = \frac{Y_n}{\sqrt n} \cdot \frac{Z_n}{\sqrt n} \Rightarrow \mathcal N(0, \sigma_Y^2) \times \mathcal N(0, \sigma_Z^2)\,,
\]
where the limit is the distribution of the product of two independent normal random variables (a Gaussian chaos). That is, $S_n(f)/n$ has an asymptotically non-normal distribution.

One can also define $M_{m,n} = Y_m + Z_n$, which again gives an orthomartingale, and $\{D_i + E_j\}_{(i,j)\in\mathbb N^2}$ is the corresponding stationary random field. This time, one can show that
\[
\frac{M_{n,n}}{\sqrt n} = \frac{Y_n}{\sqrt n} + \frac{Z_n}{\sqrt n} \Rightarrow \mathcal N(0, \sigma_Y^2 + \sigma_Z^2)\,.
\]
Here, the limit is a normal distribution, but the normalizing sequence is $\sqrt n$ instead of n.
This example demonstrates that for general orthomartingales, to obtain a central limit theorem one must assume extra conditions on the structure of the underlying probability space. For the structure mentioned above, there is no m-dependent approximation for the random field. Indeed, the example corresponds to the sample space $\Omega = \mathbb R^{\mathbb Z} \times \mathbb R^{\mathbb Z}$ with $[T_{k,l}(\epsilon, \eta)]_{i,j} = (\epsilon_{i+k}, \eta_{j+l})$, and if we define $f_m$ similarly as in (6.13) with
\[
\mathcal F_m := \sigma\big( \epsilon_i, \eta_j : -m \leq i, j \leq m \big)\,,
\]
then f and $f \circ T_{k,l}$ are independent if and only if $\min(k, l) > m$. That is, the dependence can be very strong along the horizontal (resp. vertical) direction of the random field.
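The non-normal limit in Example VI.14 is easy to see by simulation. The sketch below (illustrative; the Gaussian increments and all parameter choices are ours) takes $D_i, E_j$ i.i.d. standard normal, so each factor $Y_n/\sqrt n$, $Z_n/\sqrt n$ is exactly $\mathcal N(0,1)$; the fourth moment of the product $M_{n,n}/n$ is then near $\mathbb E[Y^4]\mathbb E[Z^4] = 9$ rather than the Gaussian value 3:

```python
import random
import statistics

random.seed(7)
n, reps = 128, 8000

def normalized_martingale(k):
    """Y_k / sqrt(k) for Y_k a sum of k i.i.d. N(0,1) increments."""
    return sum(random.gauss(0.0, 1.0) for _ in range(k)) / k ** 0.5

# M_{n,n}/n = (Y_n/sqrt(n)) * (Z_n/sqrt(n)): a product of two independent
# factors, each exactly N(0,1) here -- a Gaussian chaos, not a normal law.
samples = [normalized_martingale(n) * normalized_martingale(n)
           for _ in range(reps)]
var = statistics.variance(samples)               # theoretical value: 1
m4 = statistics.fmean(x ** 4 for x in samples)   # theoretical value: 9, not 3
```

The inflated fourth moment (heavy tails) is the numerical fingerprint of the non-Gaussian chaos limit.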
6.6 Stationary Causal Linear Random Fields
We establish a central limit theorem for functionals of stationary causal linear random fields. We focus on $d = 2$. Consider a stationary linear random field $\{Z_{i,j}\}_{(i,j)\in\mathbb Z^2}$ defined by
\[
Z_{i,j} = \sum_{r\in\mathbb Z} \sum_{s\in\mathbb Z} a_{r,s}\, \epsilon_{i-r,j-s} = \sum_{r\in\mathbb Z} \sum_{s\in\mathbb Z} a_{i-r,j-s}\, \epsilon_{r,s}\,, \tag{6.26}
\]
with coefficients $\{a_{i,j}\}_{(i,j)\in\mathbb Z^2}$ satisfying $\sum_{(i,j)\in\mathbb Z^2} a_{i,j}^2 < \infty$. We restrict ourselves to causal linear random fields, i.e., $a_{i,j} = 0$ unless $i \geq 0$ and $j \geq 0$. They are also referred to as adapted to the filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$.
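A truncated version of (6.26) is straightforward to simulate. The sketch below (hypothetical, finitely supported coefficients, standing in for the general square-summable case) generates a causal field by discrete convolution and checks the stationary variance $\sum_{r,s} a_{r,s}^2$ away from the boundary:

```python
import random
import statistics

def linear_field(a, eps):
    """Z_{i,j} = sum_{r,s>=0} a[r][s] * eps[i-r][j-s]: a causal linear
    random field as in (6.26), with finitely supported coefficients a."""
    h, n = len(a), len(eps)
    return [[sum(a[r][s] * eps[i - r][j - s]
                 for r in range(h) for s in range(h)
                 if i - r >= 0 and j - s >= 0)
             for j in range(n)] for i in range(n)]

random.seed(1)
h, n = 4, 120
a = [[0.8 ** (r + s) for s in range(h)] for r in range(h)]   # square-summable
eps = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
Z = linear_field(a, eps)
# In the "bulk" (indices >= h) every Z_{i,j} has the full stationary law,
# with Var(Z_{i,j}) = sum_{r,s} a_{r,s}^2 (unit-variance innovations).
bulk = [Z[i][j] for i in range(h, n) for j in range(h, n)]
emp_var = statistics.variance(bulk)
true_var = sum(a[r][s] ** 2 for r in range(h) for s in range(h))
```

Applying a function K to h × h blocks of such a field gives exactly the functionals studied next.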
Now, consider the random fields $\{ f \circ T_{k,l} \}_{(k,l)\in\mathbb Z^2}$ with the more specific form $f = K(\{Z_{i,j}\}^{0,0}_h)$, where h is a fixed strictly positive integer, K is a measurable function from $\mathbb R^{h^2}$ to $\mathbb R$, and for all $(k,l) \in \mathbb Z^2$,
\[
\{Z_{i,j}\}^{k,l}_h := \big\{ Z_{i,j} : k - h + 1 \leq i \leq k,\ l - h + 1 \leq j \leq l \big\}
\]
is viewed as a random vector in $\mathbb R^{h^2}$ with covariates lexicographically ordered. In the sequel, the same definition applies similarly to $\{x_{i,j}\}^{k,l}_h$, given $\{x_{i,j}\}_{(i,j)\in\mathbb Z^2}$. Assume that
\[
\mathbb E K(\{Z_{i,j}\}^{0,0}_h) = 0 \quad \mbox{ and } \quad \mathbb E \big| K(\{Z_{i,j}\}^{0,0}_h) \big|^p < \infty \tag{6.27}
\]
for some $p \geq 2$. In this way,
\[
f \circ T_{k,l} = K(\{Z_{i,j}\}^{k,l}_h)\,. \tag{6.28}
\]
The model (6.28) is a natural extension of the functionals of causal linear processes considered by Wu [117].
Next, we introduce a few notations similarly as in [48] and [117]. Here, our ultimate goal is to translate Condition (6.20) into a condition on the regularity of K and the summability of $\{a_{i,j}\}_{(i,j)\in\mathbb Z^2}$. For all $(i,j) \in \mathbb Z^2$, let
\[
\Gamma(i,j) = \big\{ (r,s) \in \mathbb Z^2 : r \leq i,\ s \leq j \big\}\,, \tag{6.29}
\]
and write
\begin{align*}
Z_{i,j} &= \sum_{(r,s)\in\Gamma(i,j)} a_{i-r,j-s}\, \epsilon_{r,s}\\
&= \sum_{(r,s)\in\Gamma(i,j)\setminus\Gamma(1,1)} a_{i-r,j-s}\, \epsilon_{r,s} + \sum_{(r,s)\in\Gamma(1,1)} a_{i-r,j-s}\, \epsilon_{r,s}\\
&=: Z_{i,j,+} + Z_{i,j,-}\,. \tag{6.30}
\end{align*}
Write $W_{k,l,-} = \{Z_{i,j,-}\}^{k,l}_h$ and define, for all $(k,l) \in \mathbb Z^2$,
\[
K_{k,l}(\{x_{i,j}\}^{k,l}_h) = \mathbb E K(\{Z_{i,j,+} + x_{i,j}\}^{k,l}_h)\,.
\]
In this way,
\[
\mathbb E(f \circ T_{k,l} \mid \mathcal F_{1,1}) = K_{k,l}(\{Z_{i,j,-}\}^{k,l}_h) =: K_{k,l}(W_{k,l,-})\,. \tag{6.31}
\]
Plugging (6.31) into (6.20), we obtain a central limit theorem for functionals of stationary causal linear random fields.

Theorem VI.15. Consider the functionals of stationary causal linear random fields (6.28). If Condition (6.27) holds and
\[
\sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \frac{\| K_{k,l}(W_{k,l,-}) \|_p}{k^{1/2} l^{1/2}} < \infty\,, \tag{6.32}
\]
for $p = 2$, then $\sigma^2 = \lim_{n\to\infty} \mathbb E(S_n^2)/n^2 < \infty$ exists and $S_n/|V_n|^{1/2} \Rightarrow \mathcal N(0, \sigma^2)$. If the two conditions hold with $p > 2$, then the invariance principle (6.7) holds.
Next, we provide conditions on K and $\{a_{i,j}\}_{(i,j)\in\mathbb Z^2}$ such that (6.32) holds. For all $\Lambda \subset \mathbb Z^2$, write
\[
Z_\Lambda = \sum_{(i,j)\in\Lambda} a_{i,j}\, \epsilon_{-i,-j} \quad \mbox{ and } \quad A_\Lambda = \sum_{(i,j)\in\Lambda} a_{i,j}^2\,. \tag{6.33}
\]
In particular, our conditions involve summations of $a_{i,j}$ over the following type of regions:
\[
\Lambda(k,l) := \big\{ (i,j) \in \mathbb Z^2 : i \geq k,\ j \geq l \big\}\,, \quad (k,l) \in \mathbb Z^2\,.
\]
For the sake of simplicity, we write $A_{k,l} \equiv A_{\Lambda(k,l)}$. The following lemma is a simple extension of Lemma 2, part (b) in [117].
Lemma VI.16. Suppose that there exist $\alpha, \beta \in \mathbb R$ such that $0 < \alpha \leq 1 \leq \beta < \infty$ and $\mathbb E(|\epsilon|^{2\beta}) < \infty$. If
\[
\mathbb E M^2_{\alpha,\beta}(W_{1,1}) < \infty \quad \mbox{ with } \quad M_{\alpha,\beta}(x) = \sup_{y\in\mathbb R^{h^2},\ y\neq x} \frac{|K(x) - K(y)|}{|x - y|^\alpha + |x - y|^\beta}\,, \tag{6.34}
\]
then, for all $p \geq 2$,
\[
\| K_{k,l}(W_{k,l,-}) \|_p = O\big( A^{\alpha/2}_{k+1-h,\,l+1-h} \big)\,. \tag{6.35}
\]
Consequently, Condition (6.32) can be replaced by specific conditions on $A_{k,l}$.

Corollary VI.17. Assume there exist $\alpha, \beta \in \mathbb R$ as in Lemma VI.16. Consider the functionals of stationary linear random fields of the form (6.28). Suppose Condition (6.34) holds and
\[
\sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \frac{A^{\alpha/2}_{k+1-h,\,l+1-h}}{k^{1/2} l^{1/2}} < \infty\,. \tag{6.36}
\]
If $\mathbb E(|\epsilon|^p) < \infty$ and (6.27) hold with $p = 2$, then $S_n/n \Rightarrow \mathcal N(0, \sigma^2)$ for some $\sigma < \infty$. If $\mathbb E(|\epsilon|^p) < \infty$ and (6.27) hold with $p > 2$, then the invariance principle (6.7) holds.
Next, we compare our Condition (6.36) on the summability of $\{a_{i,j}\}_{(i,j)\in\mathbb Z^2}$ with the one considered by Cheng and Ho [15]. They only established central limit theorems for functionals of stationary linear random fields, so we restrict to the case $p = 2$. Cheng and Ho [15] assumed
\[
\sum_{i=0}^{\infty} \sum_{j=0}^{\infty} |a_{i,j}|^{1/2} < \infty\,, \tag{6.37}
\]
and provided different regularity conditions on K. Namely,
\[
\sup_{\Lambda\subset\mathbb Z^2} \mathbb E K^2(x + Z_\Lambda) < \infty
\]
for all $x \in \mathbb R$, with $Z_\Lambda$ defined in (6.33), and that for any two independent random variables X and Y with $\mathbb E(K^2(X) + K^2(Y) + K^2(X+Y)) < M < \infty$,
\[
\mathbb E\big[ (K(X+Y) - K(X))^2 \big] \leq C \big[ \mathbb E(Y^2) \big]^\gamma \tag{6.38}
\]
for some $\gamma \geq 1/2$. In general, Cheng and Ho [15]'s condition and ours on the regularity of K are not comparable, and thus they have different ranges of application. Below, we focus on the simple case that $h = 1$ and K is Lipschitz, which is covered by both conditions. This corresponds to $\alpha = \beta = 1$ in (6.34) and $\gamma = 1$ in (6.38). In the following two examples, our Condition (6.36) turns out to be weaker than Condition (6.37).
Example VI.18. Consider $a_{i,j} = (i+j+1)^{-q}$ for all $i, j \geq 0$ and some $q > 1$. Then,
$A = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_{i,j}^2 < \infty$ and
$A_{k,l} = \sum_{j=1}^{\infty} j (k+l+j)^{-2q} = O\big((k+l)^{2-2q}\big)$.
Then (6.36) is bounded, up to a multiplicative constant, by
$\sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \dfrac{(k+l)^{1-q}}{k^{1/2} l^{1/2}} < \Big( \sum_{k=1}^{\infty} \dfrac{k^{(1-q)/2}}{k^{1/2}} \Big) \Big( \sum_{l=1}^{\infty} \dfrac{l^{(1-q)/2}}{l^{1/2}} \Big) \leq \Big( \sum_{k=1}^{\infty} k^{-q/2} \Big)^2$.
Therefore, Condition (6.36) requires $q > 2$. In this case, Condition (6.37) requires
$q > 4$.
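As a quick numerical illustration (ours, not part of the text), one can check the decay rate $A_{k,l} = O((k+l)^{2-2q})$ for these coefficients; the exponent $q = 1.5$ and the truncation level are arbitrary choices.

```python
q = 1.5  # illustrative exponent, q > 1

def A_kl(k, l, N=100000):
    # Tail sum A_{k,l} = sum_{i>=k, j>=l} a_{i,j}^2 for a_{i,j} = (i+j+1)^{-q};
    # grouping the tail along diagonals i + j = const gives sum_j j*(k+l+j)^{-2q}.
    return sum(j * (k + l + j) ** (-2 * q) for j in range(1, N))

# The ratio A_{k,l} / (k+l)^{2-2q} should stay bounded as k + l grows.
for s in (10, 20, 40, 80):
    print(s, A_kl(s, s) / (2 * s) ** (2 - 2 * q))
```

The printed ratios stabilize near a constant, consistent with the stated order of decay.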
Example VI.19. Consider $a_{i,j} = (i+1)^{-q}(j+1)^{-q}$ for all $i, j \geq 0$ and some $q > 1$.
Then, $A = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_{i,j}^2 < \infty$ and
(6.39)   $A_{k,l} = \sum_{i=k}^{\infty} \sum_{j=l}^{\infty} a_{i,j}^2 = O\big(k^{-(2q-1)} l^{-(2q-1)}\big)$.
One can thus check that Condition (6.36) requires $q > 3/2$, while Condition (6.37)
requires $q > 2$.
6.7 A Moment Inequality

We establish a moment inequality for stationary 2-parameter random fields on
general probability spaces, without assuming the product structure. We first review
the Peligrad–Utev inequality, a maximal $L^p$-inequality in dimension one, with $p \geq 2$.
Let $\{X_k\}_{k\in\mathbb{Z}}$ be a stationary process with $X_k = f \circ T^k$ for all $k \in \mathbb{Z}$, where $f$ is a
measurable function from a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ to $\mathbb{R}$, and $T$ is a bimeasurable,
measure-preserving, one-to-one and onto map on $(\Omega, \mathcal{A}, \mathbb{P})$. Consider
(6.40)   $S_n(f) = \sum_{k=1}^{n} f \circ T^k$.
Let $\{\mathcal{F}_k\}_{k\in\mathbb{Z}}$ be a filtration on $(\Omega, \mathcal{A}, \mathbb{P})$ such that $T^{-1}\mathcal{F}_k = \mathcal{F}_{k+1}$ for all $k \in \mathbb{Z}$.
Suppose $\int f^2 d\mathbb{P} < \infty$, $\int f d\mathbb{P} = 0$, $f \in \mathcal{F}_0$ (i.e., the sequence is adapted) and
$f \in L^2(\mathcal{F}_\infty) \ominus L^2(\mathcal{F}_{-\infty})$, with $\mathcal{F}_\infty = \bigvee_{k\in\mathbb{Z}} \mathcal{F}_k$ and $\mathcal{F}_{-\infty} = \bigcap_{k\in\mathbb{Z}} \mathcal{F}_k$.
Let $C$ denote a constant that may change from line to line. It is known that for
all $f \in L^p(\mathcal{F}_\infty)$ with $E(f \mid \mathcal{F}_{-\infty}) = 0$,
(6.41)   $\Big\| \max_{1 \leq k \leq n} |S_k(f)| \Big\|_p \leq C n^{1/2} \Big( \|E(f \mid \mathcal{F}_0)\|_p + \|f - E(f \mid \mathcal{F}_0)\|_p + \sum_{k=1}^{n} \dfrac{\|E(S_k(f) \mid \mathcal{F}_0)\|_p}{k^{3/2}} + \sum_{k=1}^{n} \dfrac{\|S_k(f) - E(S_k(f) \mid \mathcal{F}_k)\|_p}{k^{3/2}} \Big)$.
The inequality above was first established for adapted stationary sequences in
Peligrad and Utev [67], and then extended to an $L^p$-inequality for $p \geq 2$ in Peligrad et
al. [68]. The case $p \in (1, 2)$ was addressed by Wu and Zhao [122], and the non-adapted
case for $p \geq 2$ by Volny [105].
For the sake of simplicity, we simplify the bound in (6.41) by regrouping the
summations. Observe that $\|E(S_k(f) \mid \mathcal{F}_0)\|_p \leq \|E(S_k(f) \mid \mathcal{F}_1)\|_p$, $\|E(f \mid \mathcal{F}_0)\|_p = \|E(S_1(f) \mid \mathcal{F}_1)\|_p$ and $\|f - E(f \mid \mathcal{F}_0)\|_p = \|S_1(f) - E(S_1(f) \mid \mathcal{F}_1)\|_p$. Thus, we obtain
(6.42)   $\Big\| \max_{1 \leq k \leq n} |S_k(f)| \Big\|_p \leq C n^{1/2} \Big( \sum_{k=1}^{n} \dfrac{\|E(S_k(f) \mid \mathcal{F}_1)\|_p}{k^{3/2}} + \sum_{k=1}^{n} \dfrac{\|S_k(f) - E(S_k(f) \mid \mathcal{F}_k)\|_p}{k^{3/2}} \Big)$.
Now, consider a general probability space $(\Omega, \mathcal{A}, \mathbb{P})$, and suppose there exist
a commuting 2-parameter filtration $\{\mathcal{F}_{i,j}\}_{(i,j)\in\mathbb{Z}^2}$ and an Abelian group of bimea-
surable, measure-preserving, one-to-one and onto maps $\{T_{i,j}\}_{(i,j)\in\mathbb{Z}^2}$ on $(\Omega, \mathcal{A}, \mathbb{P})$,
such that (6.15) holds. Define $\mathcal{F}_{\infty,\infty} = \bigvee_{(i,j)\in\mathbb{Z}^2} \mathcal{F}_{i,j}$, $\mathcal{F}_{-\infty,\infty} = \bigcap_{i\in\mathbb{Z}} \mathcal{F}_{i,\infty}$ and
$\mathcal{F}_{\infty,-\infty} = \bigcap_{j\in\mathbb{Z}} \mathcal{F}_{\infty,j}$. Note that when $(\Omega, \mathcal{A}, \mathbb{P})$ is a product probability space,
$\mathcal{F}_{-\infty,\infty}$ and $\mathcal{F}_{\infty,-\infty}$ are trivial, by Kolmogorov's zero-one law.
Recall the definition of $S_{m,n}(f)$ in (6.18). Given $f$, write $S_{m,n} \equiv S_{m,n}(f)$ for the
sake of simplicity.
Proposition VI.20. Consider $(\Omega, \mathcal{A}, \mathbb{P})$, $\{T_{i,j}\}_{(i,j)\in\mathbb{Z}^2}$ and $\{\mathcal{F}_{i,j}\}_{(i,j)\in\mathbb{Z}^2}$ as described
above. Suppose $p \geq 2$, $f \in L^p(\mathcal{F}_{\infty,\infty})$ and $E(f \mid \mathcal{F}_{-\infty,\infty}) = E(f \mid \mathcal{F}_{\infty,-\infty}) = 0$. Then,
$\|S_{m,n}\|_p \leq C m^{1/2} n^{1/2} \sum_{k=1}^{m} \sum_{l=1}^{n} \dfrac{d_{k,l}(f)}{k^{3/2} l^{3/2}}$
with
$d_{k,l}(f) = \|E(S_{k,l} \mid \mathcal{F}_{1,1})\|_p + \|E(S_{k,l} \mid \mathcal{F}_{1,\infty}) - E(S_{k,l} \mid \mathcal{F}_{1,l})\|_p + \|E(S_{k,l} \mid \mathcal{F}_{\infty,1}) - E(S_{k,l} \mid \mathcal{F}_{k,1})\|_p + \|S_{k,l} - E(S_{k,l} \mid \mathcal{F}_{k,\infty}) - E(S_{k,l} \mid \mathcal{F}_{\infty,l}) + E(S_{k,l} \mid \mathcal{F}_{k,l})\|_p$.
Corollary VI.21. Suppose the assumptions in Proposition VI.20 hold.
(i) If $f \in \mathcal{F}_{0,0}$, then
$\|S_{m,n}(f)\|_p \leq C m^{1/2} n^{1/2} \sum_{k=1}^{m} \sum_{l=1}^{n} \dfrac{\|E(S_{k,l}(f) \mid \mathcal{F}_{1,1})\|_p}{k^{3/2} l^{3/2}}$.
(ii) If $\{f \circ T_{i,j}\}_{(i,j)\in\mathbb{Z}^2}$ are two-dimensional martingale differences, in the sense that
$f \in L^p(\mathcal{F}_{0,0})$ and $E(f \mid \mathcal{F}_{0,-1}) = E(f \mid \mathcal{F}_{-1,0}) = 0$, then
$\|S_{m,n}(f)\|_p \leq C m^{1/2} n^{1/2} \|f\|_p$.
The proof of Corollary VI.21 is trivial. We only remark that the second case recov-
ers Burkholder's inequality for multiparameter martingale differences established
in [36].
Proof of Proposition VI.20. Fix $f$. Define $S_{0,n} = \sum_{j=1}^{n} f \circ T_{0,j}$. Clearly,
(6.43)   $S_{m,n} = \sum_{i=1}^{m} \sum_{j=1}^{n} f \circ T_{i,j} = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} f \circ T_{0,j} \Big) \circ T_{i,0} = \sum_{i=1}^{m} S_{0,n} \circ T_{i,0}$.
Fix $n$. Observe that $E S_{0,n} = 0$ and $\{S_{0,n} \circ T_{i,0}\}_{i\in\mathbb{Z}}$ is a stationary sequence. Furthermore,
$\{\mathcal{F}_{i,\infty}\}_{i\in\mathbb{Z}}$ is a filtration, $T^{-1}_{i,0}\mathcal{F}_{j,\infty} = T_{-i,0}\mathcal{F}_{j,\infty} = \mathcal{F}_{i+j,\infty}$ and $E(S_{0,n} \mid \mathcal{F}_{-\infty,\infty}) = 0$.
Therefore, we can apply the Peligrad–Utev inequality (6.42) and obtain
(6.44)   $\|S_{m,n}\|_p \leq C m^{1/2} \Big( \underbrace{\sum_{k=1}^{m} k^{-3/2} \|E(S_{k,n} \mid \mathcal{F}_{1,\infty})\|_p}_{\Lambda_1} + \underbrace{\sum_{k=1}^{m} k^{-3/2} \|S_{k,n} - E(S_{k,n} \mid \mathcal{F}_{k,\infty})\|_p}_{\Lambda_2} \Big)$.
We first deal with $\Lambda_1$. Define $S_{m,0} = \sum_{i=1}^{m} f \circ T_{i,0}$. Similarly as in (6.43), $S_{k,n} = \sum_{j=1}^{n} S_{k,0} \circ T_{0,j}$, and
$E(S_{k,n} \mid \mathcal{F}_{1,\infty}) = \sum_{j=1}^{n} E(S_{k,0} \circ T_{0,j} \mid \mathcal{F}_{1,\infty}) = \sum_{j=1}^{n} E\big(S_{k,0} \circ T_{0,j} \mid T_{0,-j}(\mathcal{F}_{1,\infty})\big)$,
where in the last equality we used the fact that $T_{0,j}(\mathcal{F}_{i,\infty}) = \mathcal{F}_{i,\infty}$ for all $i, j \in \mathbb{Z}$.
Now, by the identity $E(f \mid \mathcal{F}) \circ T = E(f \circ T \mid T^{-1}(\mathcal{F}))$, we have
(6.45)   $E(S_{k,n} \mid \mathcal{F}_{1,\infty}) = \sum_{j=1}^{n} E(S_{k,0} \mid \mathcal{F}_{1,\infty}) \circ T_{0,j}$.
Observe that (6.45) is again a summation in the form of (6.40). Then, applying the
Peligrad–Utev inequality (6.42) again, we obtain
$\Lambda_1 \leq C n^{1/2} \Big( \sum_{l=1}^{n} l^{-3/2} \big\|E[E(S_{k,l} \mid \mathcal{F}_{1,\infty}) \mid \mathcal{F}_{\infty,1}]\big\|_p + \sum_{l=1}^{n} l^{-3/2} \big\|E(S_{k,l} \mid \mathcal{F}_{1,\infty}) - E[E(S_{k,l} \mid \mathcal{F}_{1,\infty}) \mid \mathcal{F}_{\infty,l}]\big\|_p \Big)$.
By the commuting property of the marginal filtrations (6.17), the above inequality
becomes
(6.46)   $\Lambda_1 \leq C n^{1/2} \Big( \sum_{l=1}^{n} l^{-3/2} \|E(S_{k,l} \mid \mathcal{F}_{1,1})\|_p + \sum_{l=1}^{n} l^{-3/2} \|E(S_{k,l} \mid \mathcal{F}_{1,\infty}) - E(S_{k,l} \mid \mathcal{F}_{1,l})\|_p \Big)$.
Similarly, one can show
$\Lambda_2 = \Big\| \sum_{j=1}^{n} [S_{k,0} - E(S_{k,0} \mid \mathcal{F}_{k,\infty})] \circ T_{0,j} \Big\|_p$
$\leq C n^{1/2} \Big( \sum_{l=1}^{n} l^{-3/2} \|E(S_{k,l} \mid \mathcal{F}_{\infty,1}) - E(S_{k,l} \mid \mathcal{F}_{k,1})\|_p + \sum_{l=1}^{n} l^{-3/2} \|S_{k,l} - E(S_{k,l} \mid \mathcal{F}_{k,\infty}) - E(S_{k,l} \mid \mathcal{F}_{\infty,l}) + E(S_{k,l} \mid \mathcal{F}_{k,l})\|_p \Big)$.   (6.47)
Combining (6.44), (6.46) and (6.47), we have thus proved Proposition VI.20.
6.8 Auxiliary Proofs

For arbitrary $\sigma$-fields $\mathcal{F}, \mathcal{G}$, let $\mathcal{F} \vee \mathcal{G}$ denote the smallest $\sigma$-field that contains $\mathcal{F}$
and $\mathcal{G}$.
Proposition VI.22. Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space and let $\mathcal{F}, \mathcal{G}, \mathcal{H}$ be mutually
independent sub-$\sigma$-fields of $\mathcal{B}$. Then, for every $\mathcal{B}$-measurable random variable $X$ with $E|X| < \infty$, we
have
(6.48)   $E[E(X \mid \mathcal{F} \vee \mathcal{G}) \mid \mathcal{G} \vee \mathcal{H}] = E(X \mid \mathcal{G})$  a.s.
Proposition VI.22 is closely related to the notion of conditional independence (see
e.g. [16], Chapter 7.3). Namely, given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and sub-$\sigma$-
fields $\mathcal{G}_1, \mathcal{G}_2$ and $\mathcal{G}_3$ of $\mathcal{F}$, $\mathcal{G}_1$ and $\mathcal{G}_2$ are said to be conditionally independent given
$\mathcal{G}_3$ if, for all $A_1 \in \mathcal{G}_1$, $A_2 \in \mathcal{G}_2$, $\mathbb{P}(A_1 \cap A_2 \mid \mathcal{G}_3) = \mathbb{P}(A_1 \mid \mathcal{G}_3)\mathbb{P}(A_2 \mid \mathcal{G}_3)$ almost surely.
Proof of Proposition VI.22. First, we show that $\mathcal{F} \vee \mathcal{G}$ and $\mathcal{G} \vee \mathcal{H}$ are conditionally
independent given $\mathcal{G}$. By Theorem 7.3.1 (ii) in [16], it is equivalent to show that, for all
$F \in \mathcal{F}$, $G \in \mathcal{G}$, $\mathbb{P}(F \cap G \mid \mathcal{G} \vee \mathcal{H}) = \mathbb{P}(F \cap G \mid \mathcal{G})$ almost surely. This is true since
$\mathbb{P}(F \cap G \mid \mathcal{G} \vee \mathcal{H}) = \mathbf{1}_G E(\mathbf{1}_F \mid \mathcal{G} \vee \mathcal{H}) = \mathbf{1}_G E(\mathbf{1}_F \mid \mathcal{G}) = \mathbb{P}(F \cap G \mid \mathcal{G})$  a.s.
Next, by Theorem 7.3.1 (iv) in [16], the conditional independence obtained above
yields $E(X \mid \mathcal{G} \vee \mathcal{H}) = E(X \mid \mathcal{G})$ almost surely, for every $\mathcal{F} \vee \mathcal{G}$-measurable $X$ with $E|X| < \infty$.
Replacing $X$ by $E(X \mid \mathcal{F} \vee \mathcal{G})$, we have thus proved (6.48).
Proof of Lemma VI.16. Write $W_{k,l} = (Z_{i,j})_{k,l}^{h}$. Define (and recall) $W_{k,l,\pm} = (Z_{i,j,\pm})_{k,l}^{h}$. Let $\widetilde{W}_{k,l,-}$ be a copy of $W_{k,l,-}$, independent of $W_{k,l,\pm}$. Set $\widetilde{W}_{k,l} := W_{k,l,+} + \widetilde{W}_{k,l,-}$.
Recall that $K_{k,l}(W_{k,l,-}) = E(K(W_{k,l}) \mid \mathcal{F}_{1,1})$ in (6.31). Observe that by (6.30), $W_{k,l,-} \in \mathcal{F}_{1,1}$, and $W_{k,l,+}, \widetilde{W}_{k,l,-}$ are independent of $\mathcal{F}_{1,1}$. Therefore, $E(K(\widetilde{W}_{k,l}) \mid \mathcal{F}_{1,1}) = E(K(\widetilde{W}_{k,l})) = 0$, and
$|K_{k,l}(W_{k,l,-})| = |E(K(W_{k,l}) - K(\widetilde{W}_{k,l}) \mid \mathcal{F}_{1,1})| \leq E\big(|K(W_{k,l}) - K(\widetilde{W}_{k,l})| \,\big|\, \mathcal{F}_{1,1}\big)$.
Observe that by (6.34),
$|K(W_{k,l}) - K(\widetilde{W}_{k,l})| \leq M_{\alpha,\beta}(W_{k,l}) \big( |W_{k,l,-} - \widetilde{W}_{k,l,-}|^{\alpha} + |W_{k,l,-} - \widetilde{W}_{k,l,-}|^{\beta} \big)$.
Write $U_{k,l} = W_{k,l,-} - \widetilde{W}_{k,l,-}$. By the Cauchy–Schwarz inequality, and noting that
$E(|M_{\alpha,\beta}(W_{k,l})|^2 \mid \mathcal{F}_{1,1}) = \|M_{\alpha,\beta}(W_{k,l})\|_2^2 = \|M_{\alpha,\beta}(W_{1,1})\|_2^2$, we have
$|K_{k,l}(W_{k,l,-})| \leq \|M_{\alpha,\beta}(W_{1,1})\|_2 \, E\big[ (|U_{k,l}|^{\alpha} + |U_{k,l}|^{\beta})^2 \,\big|\, \mathcal{F}_{1,1} \big]^{1/2}$,
whence, for $p \geq 2$,
(6.49)   $\|K_{k,l}(W_{k,l,-})\|_p \leq \|M_{\alpha,\beta}(W_{1,1})\|_2 \, \big\| |U_{k,l}|^{\alpha} + |U_{k,l}|^{\beta} \big\|_p \leq \|M_{\alpha,\beta}(W_{1,1})\|_2 \big( \| |U_{k,l}|^{\alpha} \|_p + \| |U_{k,l}|^{\beta} \|_p \big)$.
Finally, since for all $\gamma > 0$ and $n \in \mathbb{N}$ there exists a constant $C(\gamma, n) > 0$ such that,
for every vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n$,
$|w|^{2\gamma} = \Big( \sum_{i=1}^{n} w_i^2 \Big)^{\gamma} \leq C(\gamma, n) \sum_{i=1}^{n} w_i^{2\gamma}$,
it follows that for all $\gamma > 0$,
$E(|U_{k,l}|^{2\gamma}) = E\big(|W_{k,l,-} - \widetilde{W}_{k,l,-}|^{2\gamma}\big) = E\big( \big| (Z_{i,j,-} - \widetilde{Z}_{i,j,-})_{k,l}^{h} \big|^{2\gamma} \big) = O\Big( E \sum_{k-h < i \leq k} \sum_{l-h < j \leq l} (Z_{i,j,-} - \widetilde{Z}_{i,j,-})^{2\gamma} \Big)$.
By Wu [117], Lemma 4, under the notation (6.33), $E(|\epsilon|^{2 \vee 2\gamma}) < \infty$ implies that for
all $\Lambda \subset \mathbb{Z}^2$, $E(|Z_\Lambda|^{2\gamma}) \leq C A_\Lambda^{\gamma}$ for some universal constant $C$. It then follows that
$E(|U_{k,l}|^{2\gamma}) = O(A^{\gamma}_{k+1-h,\,l+1-h})$. Consequently, (6.49) yields
$\|K_{k,l}(W_{k,l,-})\|_p \leq \|M_{\alpha,\beta}(W_{1,1})\|_2 \big( O(A^{\alpha/2}_{k+1-h,\,l+1-h}) + O(A^{\beta/2}_{k+1-h,\,l+1-h}) \big) = O(A^{\alpha/2}_{k+1-h,\,l+1-h})$.
The proof is thus completed.
CHAPTER VII

Asymptotic Normality of Kernel Density Estimators for Stationary Random Fields

Let $\{X_i\}_{i\in\mathbb{Z}^d}$, $d \in \mathbb{N}$, be a stationary zero-mean random field such that the
marginal probability density function $p(\cdot)$ exists. We are interested in the Parzen–
Rosenblatt kernel density estimator of $p(x)$, in the form of
(7.1)   $f_n(x) = \dfrac{1}{n^d b_n} \sum_{i \in [1,n]^d} K\Big( \dfrac{x - X_i}{b_n} \Big)$,  $x \in \mathbb{R}$.
Throughout this chapter, we assume that the kernel $K : \mathbb{R} \to \mathbb{R}$ is a bounded
Lipschitz-continuous density function, and that the bandwidth $b_n$ satisfies
(7.2)   $b_n \to 0$ and $n^d b_n \to \infty$ as $n \to \infty$.
We also write, for $a, b \in \mathbb{Z}$, $[a, b] \equiv \{a, a+1, \ldots, b\}$.
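To make (7.1) concrete, here is a minimal sketch for $d = 2$ (our illustration, not from the dissertation); the Gaussian kernel and the specific bandwidth are illustrative choices satisfying the boundedness and Lipschitz assumptions.

```python
import numpy as np

def parzen_rosenblatt(field, x, b_n):
    """Kernel density estimate f_n(x) from an n x n observation grid (d = 2).

    field : 2-D array of observations X_i on the grid [1, n]^2
    x     : point at which to estimate the marginal density p(x)
    b_n   : bandwidth
    """
    K = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    n_d = field.size  # n^d grid points
    return K((x - field) / b_n).sum() / (n_d * b_n)

# sanity check on i.i.d. standard normal data: f_n(0) should be close to
# the true density p(0) = 1/sqrt(2*pi) ~ 0.3989
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))
print(parzen_rosenblatt(X, 0.0, b_n=0.1))
```

In the dependent case studied in this chapter, the same estimator is used; only the asymptotic analysis changes.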
This problem was first considered by Rosenblatt [82] and Parzen [65] in the case
where the $X_i$'s are independent and identically distributed (i.i.d.) random variables: in
particular, one can show the consistency
$\lim_{n\to\infty} E[(f_n(x) - p(x))^2] = 0$,
and the asymptotic normality
(7.3)   $(n^d b_n)^{1/2} (f_n(x) - E f_n(x)) \Rightarrow \mathcal{N}(0, \sigma_x^2)$ as $n \to \infty$,
where $\sigma_x^2 = p(x) \int K^2(s)\, ds$. See for example Silverman [97] for more references on
density estimation problems with i.i.d. data.
The case where the $X_i$'s are dependent, however, presents more challenges, and we
focus on establishing the asymptotic normality (7.3) in this chapter. The dependent
one-dimensional case has been considered by Robinson [80], Castellana and Leadbet-
ter [14], Bosq et al. [6], Wu and Mielniczuk [120] and Dedecker and Merlevede [29],
among others. In particular, Wu and Mielniczuk [120] investigated thoroughly the
case where $\{X_i\}_{i\in\mathbb{Z}}$ is a linear process, that is,
$X_i = \sum_{k=-\infty}^{\infty} a_k \epsilon_{i-k}$,  $i \in \mathbb{Z}$,
where $\sum_k a_k^2 < \infty$ and the innovations $\{\epsilon_i\}_{i\in\mathbb{Z}}$ are i.i.d. random variables. Linear
processes are important in the study of stationary processes, as any stationary process
can be represented as a linear combination of linear processes (the so-called superlinear
processes) with martingale-difference innovations (Volny et al. [106]).
The asymptotic normality of kernel density estimators for random fields has been
considered by Tran [104], Hallin et al. [44] and El Machkouri [33, 34], among others.
The extension of results from one dimension to higher dimensions, however, is not trivial.
As summarized in Hallin et al. [44], 'the points of $\mathbb{Z}^d$ do not have a natural ordering.
As a result, most techniques available for one-dimensional processes do not extend
to random fields.' See [44] for more references and related discussions.
In particular, a notorious difficulty for kernel density estimation of random fields
is that one often needs more assumptions on the bandwidth $b_n$ than the minimal
condition (7.2). This condition is minimal in the sense that it is the natural condition
for the asymptotic normality (7.3) to hold when the $X_i$'s are i.i.d. To the best of our
knowledge, only the recent results of El Machkouri [33, 34] assume no condition on $b_n$
beyond the minimal one (7.2) for dependent random fields.
We focus on kernel density estimation for causal linear random fields $\{X_i\}_{i\in\mathbb{Z}^d}$
($d \in \mathbb{N}$) in the form of
(7.4)   $X_i = \sum_{k \in \mathbb{Z}^d,\, k \succeq 0} a_k \epsilon_{i-k}$,  $i \in \mathbb{Z}^d$,
where $\sum_{i \succeq 0} a_i^2 < \infty$ and $\{\epsilon_i\}_{i\in\mathbb{Z}^d}$ are i.i.d. zero-mean random variables with finite
second moments. Throughout this chapter, we let '$i \succeq k$' denote '$i_\tau \geq k_\tau$ for all
$\tau = 1, \ldots, d$' for $i, k \in \mathbb{Z}^d$, and write $\mathbf{0} = (0, \ldots, 0)$, $\mathbf{1} = (1, \ldots, 1) \in \mathbb{Z}^d$.
We provide new conditions on the coefficients $\{a_i\}_{i\in\mathbb{Z}^d}$ such that the asymptotic
normality (7.3) holds (see Theorem VII.4 below), and compare them with the results obtained
by Hallin et al. [44] and El Machkouri [34]. In both cases, our conditions on the
coefficients $\{a_i\}_{i\in\mathbb{Z}^d}$ are weaker. On the other hand, our condition on the bandwidth
improves the one in [44], but it is still stronger than the minimal one (7.2) assumed
in [34].
Our proof is based on the $m$-approximation approach. As we will see, to address
this problem one has to establish an $m$-approximation with unbounded $m$ ($m_n \to \infty$
as $n \to \infty$). As a key step of our approach, we establish a central limit theo-
rem for triangular arrays of stationary $m$-dependent random fields with unbounded
$m$ (Theorem VII.12). This result improves a central limit theorem established by
Heinrich [47]. Our $m$-approximation method also involves certain moment
inequalities for stationary random fields (Lemma VII.16), which are based on Propo-
sition VI.20. A different proof of the asymptotic normality of kernel density esti-
mators for stationary random fields is due to El Machkouri [33, 34], who also es-
tablished $m$-approximations with unbounded $m$, combined with Lindeberg's method
(see e.g. Rio [79] and Dedecker [27]).
Finally, we point out that when the asymptotic normality (7.3) holds, the ran-
dom variables are often said to be weakly dependent, in the sense that they behave
asymptotically like i.i.d. random variables. On the other hand, when the dependence
is strong enough, the normalization needed to obtain a limiting distribution is of a different
order from $n^d b_n$ in (7.3), and the limit may no longer be Gaussian (see
e.g. Csorgo and Mielniczuk [18] for the one-dimensional case). These two regimes are
sometimes referred to as short-range dependence and long-range dependence, respec-
tively. For linear processes, Wu and Mielniczuk [120] addressed both the short-range
and the long-range dependence cases. For linear random fields, however, to the best
of our knowledge, the long-range dependence case remains open. It seems that the
$m$-approximation method is limited to the short-range dependence case; therefore,
the long-range dependence case is beyond the scope of this chapter.
The chapter is organized as follows. Our assumptions and main results are pre-
sented in Section 7.1. Examples and comparison with other results are provided in
Section 7.2. Section 7.3 is devoted to the central limit theorem for triangular ar-
rays of m-dependent random fields. Section 7.4 establishes asymptotic normality by
m-approximation. Auxiliary proofs are given in Section 7.5.
7.1 Assumptions and Main Result
We first introduce our conditions. For each $m \in \mathbb{N}$, $i \in \mathbb{Z}^d$, write
(7.5)   $X_{i,m} = \sum_{k \in [0,m-1]^d} a_k \epsilon_{i-k}$  and  $\widetilde{X}_{i,m} = X_i - X_{i,m}$.
Let $p$, $p_m$ and $\widetilde{p}_m$ denote the probability density functions of $X_0$, $X_{0,m}$ and $\widetilde{X}_{0,m}$,
respectively. Let $p_i$ and $p_{i,m}$ denote the joint density functions of $(X_0, X_i)$ and
$(X_{0,m}, X_{i,m})$, respectively. Our first condition is on the regularity of the density
functions. Define the suprema $\bar{p} = \sup_x p(x)$, $\bar{p}_i = \sup_{x,y} p_i(x,y)$, and similarly $\bar{p}_m$
and $\bar{p}_{i,m}$.
Condition VII.1. (i) The density functions $p$ and $\{p_m\}_{m\in\mathbb{N}}$ exist. They are
$c_0$-Lipschitz continuous with some constant $c_0 < \infty$ independent of $m$ (i.e.,
$\max(|p(x) - p(y)|, |p_m(x) - p_m(y)|) \leq c_0 |x - y|$). Furthermore,
(7.6)   $\bar{p} < \infty$  and  $\sup_m \bar{p}_m < \infty$.
(ii) The density functions $p_i$ and $p_{i,m}$ exist for all $i \neq 0$, $m \in \mathbb{N}$. Furthermore,
(7.7)   $\sup_{i \neq 0} \bar{p}_i < \infty$  and  $\sup_m \sup_{i \neq 0} \bar{p}_{i,m} < \infty$.
Condition VII.1 can be satisfied, for example, by simply assuming that the prob-
ability density function $p_\epsilon$ of $\epsilon_0$ exists and is Lipschitz. This was also assumed in Wu
and Mielniczuk [120].
Lemma VII.2. If $p_\epsilon$ exists and is Lipschitz, then Condition VII.1 holds.
The proof is deferred to Section 7.5.
Our second condition concerns the decay of the coefficients and the bandwidth $b_n$. Define
$A_k = \Big( \sum_{i \succeq k} a_i^2 \Big)^{1/2}$, $k \in \mathbb{Z}^d$,  and  $B_m = \Big( \sum_{i \in [0,\infty]^d,\, |i|_\infty \geq m} a_i^2 \Big)^{1/2}$, $m \in \mathbb{N}$,
with $|i|_\infty = \max_{\tau=1,\ldots,d} |i_\tau|$. Write
$\Delta_n = \sum_{k \in [1,n]^d} \dfrac{A_{k-\mathbf{1}}}{\prod_{\tau=1}^{d} k_\tau^{1/2}}$.
Condition VII.3. There exists a sequence of integers $\{m_n\}_{n\in\mathbb{N}}$ such that $m_n \to \infty$
as $n \to \infty$, and the following limits hold:
(7.8)   $\lim_{n\to\infty} b_n^{1/2} \Delta_n = 0$,
(7.9)   $\lim_{n\to\infty} B_{m_n}/b_n = 0$,
(7.10)   $\lim_{n\to\infty} m_n^d b_n = 0$,
(7.11)   $\lim_{n\to\infty} \dfrac{m_n^d \log^d n}{n^d b_n} = 0$.
Theorem VII.4. If Conditions VII.1 and VII.3 hold and $E(|\epsilon_0|^\alpha) < \infty$ for some
$\alpha > 2$, then the asymptotic normality (7.3) holds.
We will prove Theorem VII.4 in Section 7.4. We conclude this section with a few
remarks.
Remark VII.5. We briefly comment on each part of Condition VII.3.
(i) Condition (7.8) is slightly weaker than
$\Delta_\infty \equiv \sum_{k \in [1,\infty]^d} \dfrac{A_{k-\mathbf{1}}}{\prod_{\tau=1}^{d} k_\tau^{1/2}} < \infty$.
It was shown in Corollary VI.17 that the above condition implies the asymptotic
normality of $\sum_{i \in [1,n]^d} [f(X_i) - E f(X_0)]/n^{d/2}$ for Lipschitz continuous functions
$f$ such that $E f^2(X_0) < \infty$.
(ii) Condition (7.9) implies that
(7.12)   $\lim_{n\to\infty} E|\widetilde{X}_{0,m_n}|/b_n = 0$.
Indeed, Wu [117], Lemma 4, showed that for i.i.d. zero-mean random variables
$\{\epsilon_i\}_{i\in\mathbb{Z}}$ with $E(|\epsilon_0|^{2 \vee 2p}) < \infty$, $p > 0$,
(7.13)   $E\Big| \sum_i a_i \epsilon_i \Big|^{2p} \leq C \Big( \sum_i a_i^2 \Big)^{p}$.
Intuitively, $\widetilde{X}_{0,m_n}$ can be viewed as the remainder of $X_0$ after the $m_n$-truncation.
Condition (7.12) says that $m_n$ needs to tend to infinity fast enough for the
central limit theorem to hold.
(iii) Conditions (7.10) and (7.11) are used when we apply a central limit theorem
for $m$-dependent random variables with unbounded $m$ in Proposition VII.14
below.
Throughout this chapter, let C denote constants that do not depend on
i, k,m, n, x, y. The value of C may change from line to line.
7.2 Examples and Discussions

Theorem VII.4, and particularly Condition VII.3, is not convenient to apply to
concrete models. Instead, we provide a corollary for practical purposes. Write
$A_{[n]} = \max\{A_{n,1,\ldots,1}, A_{1,n,1,\ldots,1}, \ldots, A_{1,\ldots,1,n}\}$.
Corollary VII.6. Suppose $A_{[n]} \leq c_1 n^{-\beta}$ with $\beta > 0$, and $b_n = c_2 n^{-\gamma}$. Then a
sufficient condition for Condition VII.3 to hold is
(7.14)   $\gamma < \dfrac{d\beta}{d+\beta}$  and  $\beta > d$.
Consequently, if $E(|\epsilon_0|^\alpha) < \infty$ for some $\alpha > 2$, and Condition VII.1 and (7.14) hold,
then the asymptotic normality (7.3) follows.
Proof. Assume that $m_n$ takes the form $n^\delta$. Observe that $B_{m_n}$ is of the same
order as $A_{[m_n]}$ as $n \to \infty$. Then, the limit conditions (7.9), (7.10) and (7.11) are
implied by
$\lim_{n\to\infty} \big( n^{-\beta\delta+\gamma} + n^{d\delta-\gamma} + n^{\delta-1+\gamma/d} \big) = 0$,
which is equivalent to $\gamma/\beta < \delta < \min\{\gamma/d,\, 1 - \gamma/d\}$. Since $\beta > d$ implies that
$\Delta_\infty < \infty$, the desired result follows.
Remark VII.7. Under the assumptions of Corollary VII.6, Condition (7.14) is very
close to necessary for Condition VII.3 to hold. Indeed, if $A_{[n]} = l(n) n^{-\beta}$ with
$\lim_{n\to\infty} l(n) = c_2 > 0$, then the same argument as above yields that Condition VII.3 is
equivalent to (7.14).
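As a quick sanity check (ours, not the author's), one can verify numerically that a valid truncation exponent $\delta$ exists exactly when (7.14) holds:

```python
def delta_exists(beta, gamma, d):
    # a valid delta must satisfy gamma/beta < delta < min(gamma/d, 1 - gamma/d)
    return gamma / beta < min(gamma / d, 1 - gamma / d)

def condition_714(beta, gamma, d):
    # the sufficient condition (7.14): gamma < d*beta/(d+beta) and beta > d
    return gamma < d * beta / (d + beta) and beta > d

d = 2
for beta in (1.5, 2.5, 4.0):
    for gamma in (0.3, 1.0, 1.6):
        assert delta_exists(beta, gamma, d) == condition_714(beta, gamma, d)
print("the two conditions agree on all tested (beta, gamma) pairs")
```

This matches the algebra in the proof: $\gamma/\beta < \gamma/d$ iff $\beta > d$, and $\gamma/\beta < 1 - \gamma/d$ iff $\gamma < d\beta/(d+\beta)$.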
Below, we provide examples of coefficients for which Condition VII.3 holds. We
assume that $b_n = n^{-\gamma}$ for some $\gamma \in (0, d)$.
Example VII.8. We compare our conditions with the ones of Hallin et al. [44].
They considered the case $|a_i| \leq C|i|_\infty^{-q}$, $i \succeq 0$, and they require
(7.15)   $q > \max(d+3,\, 2d+1/2)$  and  $\lim_{n\to\infty} n^d b_n^{(2q-1+6d)/(2q-1-4d)} = \infty$.
Our condition (7.14) imposes weaker assumptions in this case (with $b_n = n^{-\gamma}$). First,
observe that
$A^2_{n,1,\ldots,1} \leq B^2_n \leq C \sum_{i=n}^{\infty} i^{d-1} i^{-2q} \leq C n^{d-2q}$.
We can apply Corollary VII.6 with $\beta = q - d/2$. Then, (7.14) becomes
(7.16)   $q > \dfrac{3d}{2}$  and  $\gamma < d\, \dfrac{q - d/2}{q + d/2}$.
Thus, to establish the asymptotic normality (7.3), our condition (7.16) is less restric-
tive than (7.15) on both $q$ and $\gamma$.
Example VII.9. We compare our conditions with the ones of El Machkouri [34].
Note that his results apply to general stationary random fields, of which linear random
fields are a special case. In particular, he showed that for causal linear random fields,
if
(7.17)   $\sum_{i \in \mathbb{Z}^d} |i|_\infty^{q} |a_i| < \infty$
with $q = 5d/2$, then the asymptotic normality follows.
In this case, our condition on the coefficients is weaker, requiring only $q > d$.
Indeed, suppose (7.17) holds with some $q > 0$. Then, to apply Corollary VII.6, it
suffices to observe
$A^2_{n,1,\ldots,1} = \sum_{i_1=n}^{\infty} \sum_{i_2,\ldots,i_d \in \mathbb{N}} |a_i|^2 \leq n^{-2q} \sum_{i_1=n}^{\infty} \sum_{i_2,\ldots,i_d \in \mathbb{N}} |i|_\infty^{2q} |a_i|^2 < C n^{-2q}$,
and take $\beta = q$.
At the same time, our result requires $\gamma < dq/(q+d)$ for the bandwidth, in addition
to the minimal condition (7.2) assumed in [34]. Recall also that we assume $E(|\epsilon_0|^\alpha) < \infty$ for
some $\alpha > 2$, while El Machkouri's result needs only a finite-second-moment assumption
on $\epsilon_0$.
Remark VII.10. Finally, we compare our result to Wu and Mielniczuk [120]. In the
one-dimensional case, to obtain asymptotic normality they assume only finite variance
of $\epsilon_0$ and a weaker assumption on the coefficients:
(7.18)   $\sum_{i=0}^{\infty} |a_i| < \infty$.
This is weaker than our condition in one dimension (with $q > d = 1$ in (7.17)).
Wu and Mielniczuk followed a martingale approximation approach. It remains an
open question whether, in higher dimensions, the condition $q > d$ in (7.17) can be
improved to match (7.18) in dimension one.
7.3 A Central Limit Theorem for m-Dependent Random Fields

In this section, we prove a central limit theorem for stationary triangular arrays of
$m$-dependent random fields. Throughout this section, let $\{\{Y_{n,i} : i \in \mathbb{N}^d\}\}_{n\in\mathbb{N}}$ denote
stationary zero-mean triangular arrays. That is, for each $n$, $\{Y_{n,i}\}_{i\in\mathbb{N}^d}$ is stationary
and $Y_{n,i}$ has zero mean. Furthermore, we assume that $\{Y_{n,i}\}_{i\in\mathbb{N}^d}$ is $m_n$-dependent, in
the sense that $Y_{n,i}$ and $Y_{n,j}$ are independent if $|i - j|_\infty \geq m_n$. We provide conditions
such that
(7.19)   $\dfrac{S_n(Y)}{n^{d/2}} \equiv \dfrac{\sum_{i \in [1,n]^d} Y_{n,i}}{n^{d/2}} \Rightarrow \mathcal{N}(0, \sigma^2)$  as $n \to \infty$.
A key condition is the following:
(7.20)   $\Big\| \sum_{i \in \mathbb{N}^d,\, \mathbf{1} \preceq i \preceq j} Y_{n,i} \Big\|_2 \leq C (j_1 \cdots j_d)^{1/2}$  for all $n \in \mathbb{N}$, $j \in \mathbb{N}^d$.
Remark VII.11. Observe that Proposition VI.20 provides conditions under which (7.20)
holds. In fact, inequality (7.20) has been established, under various conditions on
the dependence of stationary random fields, by Dedecker [28] and El Machkouri et
al. [35], among others.
Theorem VII.12. Suppose that there exists a constant $C$ such that (7.20) holds. If
there exists a sequence $\{l_n\}_{n\in\mathbb{N}} \subset \mathbb{N}$ with $m_n/l_n \to 0$ and $l_n/n \to 0$ as $n \to \infty$, such that
(7.21)   $\lim_{n\to\infty} \dfrac{1}{l_n^d}\, E\Big( \sum_{k \in [1,l_n]^d} Y_{n,k} \Big)^2 = \sigma^2$,
(7.22)   $\lim_{n\to\infty} \dfrac{1}{l_n^d}\, E\Big[ \Big( \sum_{k \in [1,l_n]^d} Y_{n,k} \Big)^2 \mathbf{1}\Big\{ \Big| \sum_{k \in [1,l_n]^d} Y_{n,k} \Big| > \epsilon n^{d/2} \Big\} \Big] = 0$
for all $\epsilon > 0$, then (7.19) holds.
Proof. Consider partial sums over big blocks of size $l_n^d$, denoted by
$\eta_{n,k} = \sum_{i \in [1,l_n]^d} Y_{n,\, i + k(l_n + m_n)}$,  $k \in \mathbb{N}^d$.
In this way, for each $n \in \mathbb{N}$, $\{\eta_{n,k}\}_{k\in\mathbb{N}^d}$ are i.i.d., as we separate neighboring blocks
by distance $m_n$, and $\{Y_{n,i}\}_{i\in\mathbb{Z}^d}$ are $m_n$-dependent. Set
$S_n(\eta) = \sum_{k \in [0,\, \lfloor n/(l_n+m_n) \rfloor - 1]^d} \eta_{n,k}$,  $n \in \mathbb{N}$.
Then, (7.20) implies that
$\Big\| \dfrac{S_n(Y)}{n^{d/2}} - \dfrac{S_n(\eta)}{n^{d/2}} \Big\|_2 \to 0$  as $n \to \infty$.
To see this, for the sake of simplicity, we consider the case $n/(l_n + m_n) = \lfloor n/(l_n + m_n) \rfloor$. Indeed, by the triangle inequality, the left-hand side above can
be bounded by sums of the form $\| \sum_{i \in B} Y_{n,i} \|_2 / n^{d/2}$, where $B$ can be a rectangle of
size $n^{d-r} m_n^r$ with $r \in \{1, \ldots, d-1\}$. Focusing on the dominant term with $r = 1$,
we then bound the left-hand side above by $C (n/(l_n + m_n))^{1/2} (n^{d-1} m_n)^{1/2}/n^{d/2} = C m_n^{1/2}/(l_n + m_n)^{1/2} \to 0$ as $n \to \infty$.
As a consequence, it suffices to show $S_n(\eta)/n^{d/2} \Rightarrow \mathcal{N}(0, \sigma^2)$. This, under condi-
tions (7.21) and (7.22), follows from the standard central limit theorem for triangular
arrays of independent random variables (see e.g. [32], Chapter 2, Theorem 4.5).
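The big-block/small-block bookkeeping in the proof can be illustrated for $d = 1$ (our illustration; the block sizes are arbitrary): blocks of length $l_n$ separated by gaps of length $m_n$ involve disjoint, non-adjacent index ranges, so the corresponding block sums of an $m_n$-dependent sequence are independent.

```python
def big_blocks(n, l, m):
    """Index ranges of the big blocks eta_k, k = 0, 1, ..., for d = 1.

    Block k covers indices k*(l+m)+1, ..., k*(l+m)+l, so consecutive blocks
    are separated by a gap of exactly m indices."""
    count = n // (l + m)
    return [range(k * (l + m) + 1, k * (l + m) + 1 + l) for k in range(count)]

blocks = big_blocks(n=100, l=20, m=3)
gaps = [b2.start - b1.stop for b1, b2 in zip(blocks, blocks[1:])]
print(len(blocks), gaps)
```

The discarded indices (the gaps, plus a remainder near $n$) are what the negligible-remainder estimate in the proof controls.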
Remark VII.13. Central limit theorems for $m_n$-dependent random fields have been
considered by Heinrich [47]. His result has recently been applied, with $m_n = m$
fixed, by El Machkouri et al. [35] to establish a central limit theorem for stationary
random fields.
Our application requires us to take $m_n \to \infty$. In this case our condition in
Theorem VII.12 is weaker than Heinrich's. In particular, he assumed
(7.23)   $\lim_{n\to\infty} \dfrac{m_n^{2d}}{n^d} \sum_{i \in [1,n]^d} E\big[ Y_{n,i}^2 \mathbf{1}\{|Y_{n,i}| > \epsilon n^{d/2} m_n^{-2d}\} \big] = 0$  for all $\epsilon > 0$.
This is stronger than (7.22).
7.4 Asymptotic Normality by m-Approximation

In this section, we prove Theorem VII.4 by an $m$-approximation argument. Fix
$x \in \mathbb{R}$ and write
$Z_{n,i} = \dfrac{1}{\sqrt{b_n}} K\Big( \dfrac{x - X_i}{b_n} \Big)$  and  $\zeta_{n,i} = \dfrac{1}{\sqrt{b_n}} K\Big( \dfrac{x - X_{i,m_n}}{b_n} \Big)$,  $i \in \mathbb{Z}^d$.
In this way, $\{\zeta_{n,i}\}_{i\in\mathbb{Z}^d}$ are $m_n$-dependent. We will use $\{\{\zeta_{n,i} : i \in \mathbb{Z}^d\}\}_{n\in\mathbb{N}}$ to approx-
imate $\{\{Z_{n,i} : i \in \mathbb{Z}^d\}\}_{n\in\mathbb{N}}$. We also write $\bar{Z}_{n,i} = Z_{n,i} - E Z_{n,i}$ and $\bar{\zeta}_{n,i} = \zeta_{n,i} - E \zeta_{n,i}$.
Setting
$S_n(\bar\zeta) = \sum_{i \in [1,n]^d} \bar{\zeta}_{n,i}$  and  $S_n(\bar Z - \bar\zeta) = \sum_{i \in [1,n]^d} (\bar{Z}_{n,i} - \bar{\zeta}_{n,i})$,
we decompose
(7.24)   $(n^d b_n)^{1/2} (f_n(x) - E f_n(x)) = \dfrac{S_n(\bar\zeta)}{n^{d/2}} + \dfrac{S_n(\bar Z - \bar\zeta)}{n^{d/2}}$.
To prove Theorem VII.4, it suffices to establish the following two results.
Proposition VII.14. Under Condition VII.1 and (7.8), (7.10), (7.11) of Condi-
tion VII.3,
(7.25)   $\dfrac{S_n(\bar\zeta)}{n^{d/2}} \Rightarrow \mathcal{N}(0, \sigma_x^2)$.
Proposition VII.15. Under Condition VII.1 and (7.8), (7.9) of Condition VII.3,
(7.26)   $\dfrac{S_n(\bar Z - \bar\zeta)}{n^{d/2}} \stackrel{P}{\longrightarrow} 0$.
To prove the above two propositions, a key step is to establish the following
moment inequalities.
Lemma VII.16. There exists a constant $C > 0$ such that for all $n \in \mathbb{N}$,
(7.27)   $\|S_n(\bar Z - \bar\zeta)\|_2 \leq C n^{d/2} \big( \|\bar{Z}_{n,0} - \bar{\zeta}_{n,0}\|_2 + b_n^{1/2} \Delta_n \big)$.
In addition, if $E(|\epsilon_0|^\alpha) < \infty$ for some $\alpha \geq 2$, then
(7.28)   $\Big\| \sum_{i \in \mathbb{N}^d,\, \mathbf{1} \preceq i \preceq j} \bar{\zeta}_{n,i} \Big\|_\alpha \leq C (j_1 \cdots j_d)^{1/2} \big( \|\bar{\zeta}_{n,0}\|_\alpha + b_n^{1/2} \Delta_n \big)$  for all $j \in \mathbb{N}^d$.
The proof is deferred to Section 7.5.
Proof of Proposition VII.14. Observing that $S_n(\bar\zeta)/n^{d/2}$ is a partial sum of $m_n$-
dependent random fields, we apply Theorem VII.12. Observe that since $\|\bar\zeta_{n,0}\|_2 \to \sigma_x$
as $n \to \infty$, (7.28) with $\alpha = 2$ and assumption (7.8) entail (7.20). Thus, to
prove (7.25), it suffices to show, for $l_n = m_n \log n$,
(7.29)   $\lim_{n\to\infty} \dfrac{1}{l_n^d}\, E\Big( \sum_{i \in [1,l_n]^d} \bar\zeta_{n,i} \Big)^2 = \sigma_x^2$,
and, writing $\xi_n = \sum_{i \in [1,l_n]^d} \bar\zeta_{n,i}$,
(7.30)   $\lim_{n\to\infty} \dfrac{1}{l_n^d}\, E\big( \xi_n^2 \mathbf{1}\{|\xi_n| > \epsilon n^{d/2}\} \big) = 0$  for all $\epsilon > 0$.
By a standard calculation, under (7.7) of Condition VII.1, for all $n \in \mathbb{N}$ and $i \neq 0$,
$|E(\bar\zeta_{n,0}\bar\zeta_{n,i})| \leq C \bar{p}_{i,m_n} b_n \leq C b_n$.
Therefore,
$\Big| \dfrac{1}{l_n^d}\, E\Big( \sum_{i \in [1,l_n]^d} \bar\zeta_{n,i} \Big)^2 - E\bar\zeta_{n,0}^2 \Big| \leq 2 \sum_{i \in [-m_n,m_n]^d} |E(\bar\zeta_{n,0}\bar\zeta_{n,i})| \mathbf{1}\{i \neq 0\} \leq C m_n^d b_n$.
Thus, assumption (7.10) entails (7.29). To prove (7.30), observe that
$E(\xi_n^2 \mathbf{1}\{|\xi_n| > \epsilon n^{d/2}\}) \leq \|\xi_n\|_\alpha^2\, \mathbb{P}(|\xi_n| > \epsilon n^{d/2})^{(\alpha-2)/\alpha} \leq \|\xi_n\|_\alpha^2 \Big( \dfrac{\|\xi_n\|_2^2}{\epsilon^2 n^d} \Big)^{(\alpha-2)/\alpha}$.
This time, (7.28) and (7.8) yield $\|\xi_n\|_2 \leq C l_n^{d/2}$. For $\alpha > 2$, observe that, since $K$ is
bounded,
$\|\bar\zeta_{n,0}\|_\alpha = (E|\bar\zeta_{n,0}|^\alpha)^{1/\alpha} \leq \Big( \dfrac{C}{b_n^{(\alpha-2)/2}} \|\bar\zeta_{n,0}\|_2^2 \Big)^{1/\alpha} \leq C b_n^{-(\alpha-2)/(2\alpha)}$.
So, $\|\xi_n\|_\alpha^2 \leq C l_n^d b_n^{-(\alpha-2)/\alpha}$. To sum up, we have obtained that
$\dfrac{1}{l_n^d}\, E(\xi_n^2 \mathbf{1}\{|\xi_n| > \epsilon n^{d/2}\}) \leq C \Big( \dfrac{l_n^d}{n^d b_n} \Big)^{(\alpha-2)/\alpha}$.
Now, (7.11) entails (7.30).
Proof of Proposition VII.15. To obtain the desired result, it suffices to com-
bine (7.27), assumptions (7.8) and (7.9), and Lemma VII.17 below.
Lemma VII.17. Under the assumptions of Condition VII.1, there exists a constant
$C$ such that for all $n \in \mathbb{N}$,
(7.31)   $\|\bar\zeta_{n,0} - \bar{Z}_{n,0}\|_2 \leq C \Big( \Big( \dfrac{B_{m_n}}{b_n} \Big)^{1/2} + b_n^{1/2} \Big)$.
The proof is deferred to Section 7.5.
7.5 Proofs

Proof of Lemma VII.2. (i) The existence and Lipschitz continuity of $p$ and $p_m$ have
been proved by Wu and Mielniczuk [120], Lemma 1. To prove (7.6), observe that
(7.32)   $|p_m(y) - p(y)| \leq \int |p_m(y) - p_m(y - x)|\, \widetilde{p}_m(x)\, dx \leq C \int |x|\, \widetilde{p}_m(x)\, dx = C E|\widetilde{X}_{0,m}|$.
This entails that $p_m(x) \to p(x)$ uniformly in $x \in \mathbb{R}$ as $m \to \infty$. Therefore, (7.6)
holds.
(ii) Fix $i \in \mathbb{Z}^d \setminus \{0\}$ and let $F_i$ denote the joint distribution function of $(X_0, X_i)$.
For the sake of simplicity, we treat the case $a_0 = 1$. Write $R = X_0 - \epsilon_0$ and
$R_i = X_i - \epsilon_i - a_i \epsilon_0$. Now, $R$ and $R_i$ are dependent random variables. First, we show
that
(7.33)   $p_i(x, y) \equiv \dfrac{\partial^2}{\partial x \partial y} F_i(x, y) = E[p_\epsilon(x - R)\, p_\epsilon(y - R_i - a_i x)]$.
Indeed,
(7.34)   $F_i(x, y) = \mathbb{P}(X_0 \leq x, X_i \leq y) = \mathbb{P}(\epsilon_0 + R \leq x,\ \epsilon_i + a_i \epsilon_0 + R_i \leq y) = E\Phi_i(x - R, y - R_i)$,
with, letting $F_\epsilon$ denote the cumulative distribution function of $\epsilon_0$,
$\Phi_i(x, y) = \int_{-\infty}^{x} F_\epsilon(y - a_i x')\, F_\epsilon(dx')$.
Differentiating (7.34) yields (7.33) (see e.g. [32], Appendix A.9 on the validity of the
exchange of differentiation and expectation).
Next, we prove (7.7) by establishing the following two steps:
(7.35)   $\lim_{|i|_\infty \to \infty} \sup_{x,y} |p_i(x, y) - p(x)\, p(y - a_i x)| = 0$,
and
(7.36)   $\lim_{m\to\infty} \sup_{x,y,i} |p_i(x, y) - p_{i,m}(x, y)| = 0$.
Then, (7.35) implies the first part of (7.7), and the two limits together imply the second part.
To prove (7.35), set
$D_i = E\big(R_i \mid \sigma(\epsilon_k : k \not\preceq 0)\big)$  and  $\widetilde{D}_i = R_i - D_i$,  $i \in \mathbb{Z}^d$.
By definition, $D_i$ and $R$ are independent. Introducing the intermediate term $E[p_\epsilon(x - R)\, p_\epsilon(y - D_i - a_i x)] = p(x)\, E p_\epsilon(y - D_i - a_i x)$, we then bound $|p_i(x,y) - p(x)p(y - a_i x)| \leq \Psi_1 + \Psi_2$ with, under the assumption that $p_\epsilon$ is bounded and Lipschitz,
$\Psi_1 = |p_i(x, y) - E[p_\epsilon(x - R)\, p_\epsilon(y - D_i - a_i x)]| \leq C E[p_\epsilon(x - R)\, |R_i - D_i|] \leq C E|\widetilde{D}_i|$,
and
$\Psi_2 = |p(x)\, p(y - a_i x) - E[p_\epsilon(x - R)\, p_\epsilon(y - D_i - a_i x)]| \leq p(x)\, E|p_\epsilon(y - a_i x - R_i - a_i \epsilon_0) - p_\epsilon(y - D_i - a_i x)| \leq C(E|\widetilde{D}_i| + |a_i|)$.
By (7.13), $|p_i(x, y) - p(x)\, p(y - a_i x)| \to 0$ as $|i|_\infty \to \infty$.
To prove (7.36), define $R_m = X_{0,m} - \epsilon_0$ and $R_{i,m} = X_{i,m} - \epsilon_i - a_i \epsilon_0 \mathbf{1}\{|i|_\infty < m\}$.
Then, similarly to (7.33), one has
$p_{i,m}(x, y) = E[p_\epsilon(x - R_m)\, p_\epsilon(y - a_i x \mathbf{1}\{|i|_\infty < m\} - R_{i,m})]$.
Introducing the intermediate term $E[p_\epsilon(x - R)\, p_\epsilon(y - a_i x \mathbf{1}\{|i|_\infty < m\} - R_{i,m})]$, we obtain
that
$|p_{i,m}(x, y) - p_i(x, y)| \leq E[p_\epsilon(x - R)(|a_i x| \mathbf{1}\{|i|_\infty \geq m\} + |R_i - R_{i,m}|)] + C E|R - R_m| \leq C\big( |x|\, p(x)\, |a_i| \mathbf{1}\{|i|_\infty \geq m\} + E|R - R_m| + E|R_i - R_{i,m}| \big)$.
Since $X_0$ has a finite second moment and $p$ is bounded and Lipschitz, $\sup_x |x|\, p(x) < \infty$. The summability assumption on $\{a_i\}$ implies that $\lim_{m\to\infty} \sup_{|i|_\infty \geq m} |a_i| = 0$, and
$\sup_i (E|R - R_m| + E|R_i - R_{i,m}|) \to 0$ as $m \to \infty$ (recall (7.13)). Therefore, we have
proved (7.36).
Proof of Lemma VII.16. We only prove (7.27); the proof of (7.28) is similar. By
Proposition VI.20, there exists a constant $C$ such that
(7.37)   $\dfrac{\|S_n(\bar Z - \bar\zeta)\|_2}{n^{d/2}} \leq C \sum_{k \in [1,n]^d} \dfrac{\|E(\bar{Z}_{n,k} - \bar{\zeta}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_2}{\prod_{\tau=1}^{d} k_\tau^{1/2}}$,
where $\mathcal{F}_{\mathbf{1}} = \sigma(\epsilon_k : k \in \mathbb{Z}^d, k \preceq \mathbf{1})$. By the definition of $\bar\zeta_{n,i}$, (7.37) is bounded (up to the
multiplicative constant $C$) by
$\sum_{k \in [1,n]^d \setminus [1,m_n]^d} \dfrac{\|E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_2}{\prod_{\tau} k_\tau^{1/2}} + \sum_{k \in [1,m_n]^d} \dfrac{\|E(\bar{Z}_{n,k} - \bar{\zeta}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_2}{\prod_{\tau} k_\tau^{1/2}}$
$\leq \|\bar{Z}_{n,0} - \bar{\zeta}_{n,0}\|_2 + \sum_{k \in [1,n]^d} \dfrac{\|E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_2}{\prod_{\tau} k_\tau^{1/2}} + \sum_{k \in [1,m_n]^d} \dfrac{\|E(\bar{\zeta}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_2}{\prod_{\tau} k_\tau^{1/2}}$
$\leq C \Big( \|\bar{Z}_{n,0} - \bar{\zeta}_{n,0}\|_2 + b_n^{1/2} \sum_{k \in [1,n]^d} \dfrac{A_{k-\mathbf{1}}}{\prod_{\tau=1}^{d} k_\tau^{1/2}} \Big)$,
where the last inequality follows from Lemma VII.18 below.
where the last inequality follows from Lemma VII.18 below.
Lemma VII.18. Suppose that, in addition to Condition VII.1, $E(|\epsilon_0|^\alpha) < \infty$ for
some $\alpha \geq 2$. Then, for all $k \in \mathbb{N}^d$, $k \neq \mathbf{1}$,
(7.38)   $\|E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_\alpha \leq C b_n^{1/2} A_{k-\mathbf{1}}$,
(7.39)   $\|E(\bar{\zeta}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_\alpha \leq C b_n^{1/2} A_{k-\mathbf{1}}$.
Proof of Lemma VII.18. First, we control $\|E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_\alpha$. For each $k \in \mathbb{Z}^d$, intro-
duce the notation
(7.40)   $\Gamma(k) := \{i \in \mathbb{Z}^d : i \preceq k\}$,
and write
$X_k = \sum_{i \in \Gamma(k)} a_{k-i} \epsilon_i = \Big( \sum_{i \in \Gamma(\mathbf{1})} + \sum_{i \in \Gamma(k)\setminus\Gamma(\mathbf{1})} \Big) a_{k-i} \epsilon_i =: D_k + T_k$.
For the sake of simplicity, write $D \equiv D_k$, $T \equiv T_k$, and, given a random variable $Y$, let
$E_Y(\cdot) \equiv E(\cdot \mid Y)$ denote the conditional expectation given the $\sigma$-algebra generated
by $Y$. Since $k \succeq \mathbf{1}$, $k \neq \mathbf{1}$, $T_k$ is a non-degenerate random variable. Then,
$E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}}) = \dfrac{1}{\sqrt{b_n}} \Big( E_D K\Big( \dfrac{x - D - T}{b_n} \Big) - E K\Big( \dfrac{x - D - T}{b_n} \Big) \Big)$.
Let $\widetilde{D}$ be a copy of $D$, independent of $D$ and $T$. Then, letting $p_T$ denote the density of $T$, the above identity becomes
$\dfrac{1}{\sqrt{b_n}}\, E_D E_{D,\widetilde{D}} \Big( K\Big( \dfrac{x - D - T}{b_n} \Big) - K\Big( \dfrac{x - \widetilde{D} - T}{b_n} \Big) \Big) = b_n^{1/2}\, E_D \int K(t) \big( p_T(x - b_n t - D) - p_T(x - b_n t - \widetilde{D}) \big)\, dt$.
Since $p_T$ is Lipschitz, the absolute value of the above term is bounded by
$C \int |K(s)|\, ds\; b_n^{1/2}\, E_D|D - \widetilde{D}|$, almost surely. (Here $p_T$ depends on $k, n$, but one
can show that the Lipschitz constant can be chosen independently of $k, n$; see
e.g. [117], Lemma 1.) To sum up, we have
$\|E(\bar{Z}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_\alpha \leq C b_n^{1/2} \big\| E_D|D - \widetilde{D}| \big\|_\alpha \leq C b_n^{1/2} \|D\|_\alpha \leq C b_n^{1/2} A_{k-\mathbf{1}}$,
where the last inequality follows from (7.13). We have thus proved (7.38). To
prove (7.39), a similar argument yields $\|E(\bar{\zeta}_{n,k} \mid \mathcal{F}_{\mathbf{1}})\|_\alpha \leq C b_n^{1/2} A_{k,m_n}$ with $A_{k,m_n} = \big( \sum_{i \in [0,m_n-1]^d,\, i \succeq k - \mathbf{1}} a_i^2 \big)^{1/2} \leq A_{k-\mathbf{1}}$.
Proof of Lemma VII.17. For the random variables $Z_{n,0}, \bar{Z}_{n,0}, \zeta_{n,0}, \bar\zeta_{n,0}$, we drop the
index '$n,0$' and write $Z_n, \bar{Z}_n, \zeta_n, \bar\zeta_n$ for the sake of simplicity. First observe that
$(E Z_n)^2 + (E \zeta_n)^2 \leq C(\bar{p}^2 b_n + \bar{p}_{m_n}^2 b_n) \leq C b_n$,
where in the last step we applied (7.6). Then,
(7.41)   $|E(\bar\zeta_n^2 - \bar{Z}_n^2)| \leq \int K^2(y) |p_{m_n}(x - b_n y) - p(x - b_n y)|\, dy + C b_n \leq \sup_y |p_{m_n}(y) - p(y)| \int K^2(s)\, ds + C b_n \leq C(B_{m_n} + b_n)$,
where the last inequality follows from (7.32). Next, write
(7.42)   $\|\bar\zeta_n - \bar{Z}_n\|_2^2 = E\bar{Z}_n^2 - E\bar\zeta_n^2 + 2\big( E\bar\zeta_n^2 - E(\bar{Z}_n\bar\zeta_n) \big)$.
For the last term on the right-hand side of (7.42), observe that $E(\bar{Z}_n\bar\zeta_n) = E(Z_n\zeta_n) - E Z_n E\zeta_n = E(Z_n\zeta_n) + O(\bar{p}\,\bar{p}_{m_n} b_n)$. We claim that $E(Z_n\zeta_n)$ is very close to $E\zeta_n^2$ under
our restriction on the choice of $m_n$. Indeed,
(7.43)   $|E(Z_n\zeta_n) - E\zeta_n^2| \equiv \Big| E(Z_n\zeta_n) - \int K^2(y)\, p_{m_n}(x - b_n y)\, dy \Big|$,
and
$E(Z_n\zeta_n) = \int\!\!\int \dfrac{1}{b_n} K\Big( \dfrac{x - y - z}{b_n} \Big) K\Big( \dfrac{x - y}{b_n} \Big) p_{m_n}(y)\, \widetilde{p}_{m_n}(z)\, dy\, dz = \int K(y)\, E K\Big( y - \dfrac{\widetilde{X}_{0,m_n}}{b_n} \Big) p_{m_n}(x - b_n y)\, dy$.
Therefore, since $K$ is Lipschitz, (7.43) can be bounded by
$\int |K(y)|\, E\Big| K\Big( y - \dfrac{\widetilde{X}_{0,m_n}}{b_n} \Big) - K(y) \Big|\, p_{m_n}(x - b_n y)\, dy \leq \dfrac{E|\widetilde{X}_{0,m_n}|}{b_n} \int |K(y)|\, p_{m_n}(x - b_n y)\, dy$,
and $E|\widetilde{X}_{0,m_n}| \leq C B_{m_n}$ by (7.13). To sum up, we have thus shown that, under (7.6) (recall
that $b_n \downarrow 0$, whence $B_{m_n}$ is dominated by $B_{m_n}/b_n$),
$\|\bar\zeta_n - \bar{Z}_n\|_2^2 \leq C\Big( \dfrac{B_{m_n}}{b_n} + b_n \Big)$.
BIBLIOGRAPHY
[1] J. Aaronson. An introduction to infinite ergodic theory, volume 50 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1997.
[2] A. A. Balkema and S. I. Resnick. Max-infinite divisibility. J. Appl. Probability, 14(2):309–319, 1977.
[3] A. K. Basu and C. C. Y. Dorea. On functional central limit theorem for stationary martingale random fields. Acta Math. Acad. Sci. Hungar., 33(3-4):307–316, 1979.
[4] I. Berkes and G. J. Morrow. Strong invariance principles for mixing random fields. Z. Wahrsch. Verw. Gebiete, 57(1):15–37, 1981.
[5] E. Bolthausen. On the central limit theorem for stationary mixing random fields. Ann. Probab., 10(4):1047–1050, 1982.
[6] D. Bosq, F. Merlevede, and M. Peligrad. Asymptotic normality for density kernel estimators in discrete and continuous time. J. Multivariate Anal., 68(1):78–95, 1999.
[7] R. C. Bradley. A caution on mixing conditions for random fields. Statist. Probab. Lett., 8(5):489–491, 1989.
[8] R. C. Bradley. Introduction to strong mixing conditions. Vol. 1. Kendrick Press, Heber City, UT, 2007.
[9] T. Buishand, L. de Haan, and C. Zhou. On spatial extremes: With application to a rainfall problem. Ann. Appl. Stat., 2(2):624–642, 2008.
[10] K. Burnecki, J. Rosinski, and A. Weron. Spectral representation and structure of stable self-similar processes. In Stochastic processes and related topics, Trends Math., pages 1–14. Birkhauser Boston, Boston, MA, 1998.
[11] S. Cambanis, C. D. Hardin, Jr., and A. Weron. Ergodic properties of stationary stable processes. Stochastic Process. Appl., 24(1):1–18, 1987.
[12] S. Cambanis, M. Maejima, and G. Samorodnitsky. Characterization of linear and harmonizable fractional stable motions. Stochastic Process. Appl., 42(1):91–110, 1992.
[13] A. Caprara, P. Toth, and M. Fischetti. Algorithms for the set covering problem. Ann. Oper. Res., 98:353–371 (2001), 2000. Optimization theory and its application (Perth, 1998).
[14] J. V. Castellana and M. R. Leadbetter. On smoothed probability density estimation for stationary processes. Stochastic Process. Appl., 21(2):179–193, 1986.
[15] T.-L. Cheng and H.-C. Ho. Central limit theorems for instantaneous filters of linear random fields on Z^2. In Random walk, sequential analysis and related topics, pages 71–84. World Sci. Publ., Hackensack, NJ, 2006.
[16] Y. S. Chow and H. Teicher. Probability theory. Springer-Verlag, New York, 1978. Independence, interchangeability, martingales.
[17] D. Cooley, D. Nychka, and P. Naveau. Bayesian spatial modeling of extreme precipitation return levels. J. Amer. Statist. Assoc., 102(479):824–840, 2007.
[18] S. Csorgo and J. Mielniczuk. Density estimation under long-range dependence. Ann. Statist., 23(3):990–999, 1995.
[19] R. A. Davis and S. I. Resnick. Basic properties and prediction of max-ARMA processes. Adv. in Appl. Probab., 21(4):781–803, 1989.
[20] R. A. Davis and S. I. Resnick. Prediction of stationary max-stable processes. Ann. Appl. Probab., 3(2):497–525, 1993.
[21] A. C. Davison and R. L. Smith. Models for exceedances over high thresholds. J. Roy. Statist. Soc. Ser. B, 52(3):393–442, 1990. With discussion and a reply by the authors.
[22] L. de Haan. A characterization of multidimensional extreme-value distributions. Sankhya Ser. A, 40(1):85–88, 1978.
[23] L. de Haan. A spectral representation for max-stable processes. Ann. Probab., 12(4):1194–1204, 1984.
[24] L. de Haan and A. Ferreira. Extreme value theory. Springer Series in Operations Research and Financial Engineering. Springer, New York, 2006. An introduction.
[25] L. de Haan and T. T. Pereira. Spatial extremes: Models for the stationary case. The Annals of Statistics, 34:146–168, 2006.
[26] L. de Haan and J. Pickands, III. Stationary min-stable stochastic processes. Probab. Theory Relat. Fields, 72(4):477–492, 1986.
[27] J. Dedecker. A central limit theorem for stationary random fields. Probab. Theory RelatedFields, 110(3):397–426, 1998.
[28] J. Dedecker. Exponential inequalities and functional central limit theorems for a randomfields. ESAIM Probab. Statist., 5:77–104 (electronic), 2001.
[29] J. Dedecker and F. Merlevede. Necessary and sufficient conditions for the conditional centrallimit theorem. Ann. Probab., 30(3):1044–1081, 2002.
[30] J. Dedecker, F. Merlevede, and D. Volny. On the weak invariance principle for non-adaptedsequences under projective criteria. J. Theoret. Probab., 20(4):971–1004, 2007.
[31] C. Dombry and F. Eyi-Minko. Regular conditional distributions of max infinitely divisibleprocesses. Submitted, available at http://arxiv.org/abs/1109.6492, 2011.
[32] R. Durrett. Probability: theory and examples. Duxbury Press, Belmont, CA, second edition,1996.
[33] M. El Machkouri. Asymptotic normality of the parzen-rosenblatt density estimator forstrongly mixing random fields. Stat. Inference Stoch. Process., 14(1):73–84, 2011.
[34] M. El Machkouri. Kernel density estimation for stationary random fields. preprint, availableat http://arxiv.org/abs/1109.2694, 2011.
[35] M. El Machkouri, D. Volny, and W. B. Wu. A central limit theorem for stationary randomfields. Submitted, available at http://arxiv.org/abs/1109.0838, 2011.
137
[36] I. Fazekas. Burkholder’s inequality for multiindex martingales. Ann. Math. Inform., 32:45–51,2005.
[37] R. Furrer, D. Nychka, and S. Sain. fields: Tools for spatial data, 2009. R package version6.01.
[38] E. Gine, M. G. Hahn, and P. Vatan. Max-infinitely divisible and max-stable sample continu-ous processes. Probab. Theory Related Fields, 87(2):139–165, 1990.
[39] C. M. Goldie and P. E. Greenwood. Variance of set-indexed sums of mixing random variablesand weak convergence of set-indexed processes. Ann. Probab., 14(3):817–839, 1986.
[40] C. M. Goldie and G. J. Morrow. Central limit questions for random fields. In Dependencein probability and statistics (Oberwolfach, 1985), volume 11 of Progr. Probab. Statist., pages275–289. Birkhauser Boston, Boston, MA, 1986.
[41] M. Gordin and M. Peligrad. On the functional CLT via martingale approximation. preprint,http://arxiv.org/abs/0910.3448, 2009.
[42] M. I. Gordin. The central limit theorem for stationary processes. Dokl. Akad. Nauk SSSR,188:739–741, 1969.
[43] M. I. Gordin and B. A. Lifsic. Central limit theorem for stationary Markov processes. Dokl.Akad. Nauk SSSR, 239(4):766–767, 1978.
[44] M. Hallin, Z. Lu, and L. T. Tran. Density estimation for spatial linear processes. Bernoulli,7(4):657–668, 2001.
[45] C. D. Hardin, Jr. Isometries on subspaces of Lp. Indiana Univ. Math. J., 30(3):449–465,1981.
[46] C. D. Hardin, Jr. On the spectral representation of symmetric stable processes. J. Multivari-ate Anal., 12(3):385–401, 1982.
[47] L. Heinrich. Asymptotic behaviour of an empirical nearest-neighbour distance function forstationary Poisson cluster processes. Math. Nachr., 136:131–148, 1988.
[48] H.-C. Ho and T. Hsing. Limit theorems for functionals of moving averages. Ann. Probab.,25(4):1636–1669, 1997.
[49] W. Hoeffding and H. Robbins. The central limit theorem for dependent random variables.Duke Math. J., 15:773–780, 1948.
[50] Z. Kabluchko. Spectral representations of sum- and max-stable processes. Extremes,12(4):401–424, 2009.
[51] Z. Kabluchko, M. Schlather, and L. de Haan. Stationary max-stable fields associated tonegative definite functions. Ann. Probab., 37(5):2042–2065, 2009.
[52] G. Keller. Equilibrium states in ergodic theory, volume 42 of London Mathematical SocietyStudent Texts. Cambridge University Press, Cambridge, 1998.
[53] D. Khoshnevisan. Multiparameter processes. Springer Monographs in Mathematics. Springer-Verlag, New York, 2002. An introduction to random fields.
[54] C. Kipnis and S. R. S. Varadhan. Central limit theorem for additive functionals of reversibleMarkov processes and applications to simple exclusions. Comm. Math. Phys., 104(1):1–19,1986.
[55] U. Krengel. Ergodic theorems, volume 6 of de Gruyter Studies in Mathematics. Walter deGruyter & Co., Berlin, 1985. With a supplement by Antoine Brunel.
138
[56] J. Lamperti. On the isometries of certain function-spaces. Pacific J. Math., 8:459–466, 1958.
[57] J. B. Levy and M. S. Taqqu. Renewal reward processes with heavy-tailed inter-renewal timesand heavy-tailed rewards. Bernoulli, 6(1):23–44, 2000.
[58] A. Markov. Recherches sur un cas remarquable d’epreuves dependantes. Acta Math.,33(1):87–104, 1910.
[59] M. Maxwell and M. Woodroofe. Central limit theorems for additive functionals of Markovchains. Ann. Probab., 28(2):713–724, 2000.
[60] F. Merlevede, M. Peligrad, and S. Utev. Recent advances in invariance principles for station-ary sequences. Probab. Surv., 3:1–36 (electronic), 2006.
[61] T. Nagai. A simple tightness condition for random elements on C([0, 1]2). Bull. Math.Statist., 16(1-2):67–70, 1974/75.
[62] B. Nahapetian. Billingsley-Ibragimov theorem for martingale-difference random fields andits applications to some models of classical statistical physics. C. R. Acad. Sci. Paris Ser. IMath., 320(12):1539–1544, 1995.
[63] B. S. Nahapetian and A. N. Petrosian. Martingale-difference Gibbs random fields and centrallimit theorem. Ann. Acad. Sci. Fenn. Ser. A I Math., 17(1):105–110, 1992.
[64] P. Naveau, A. Guillou, D. Cooley, and J. Diebolt. Modelling pairwise dependence of maximain space. Biometrika, 96(1):1–17, 2009.
[65] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Statist.,33:1065–1076, 1962.
[66] M. Peligrad. Conditional central limit theorem via martingale approximation. In Berkes,Bradley, Dehling, Peligrad, and Tichy, editors, Dependence in Probability, Analysis and Num-ber Theory, pages 295–309. Kendrick Press, 2010.
[67] M. Peligrad and S. Utev. A new maximal inequality and invariance principle for stationarysequences. Ann. Probab., 33(2):798–815, 2005.
[68] M. Peligrad, S. Utev, and W. B. Wu. A maximal Lp-inequality for stationary sequences andits applications. Proc. Amer. Math. Soc., 135(2):541–550 (electronic), 2007.
[69] V. Pipiras. Nonminimal sets, their projections and integral representations of stable processes.Stochastic Process. Appl., 117(9):1285–1302, 2007.
[70] V. Pipiras and M. S. Taqqu. The structure of self–similar stable mixed moving averages.Ann. Probab., 30(2):898–932, 2002.
[71] V. Pipiras and M. S. Taqqu. Stable stationary processes related to cyclic flows. Ann. Probab.,32(3A):2222–2260, 2004.
[72] V. Pipiras, M. S. Taqqu, and J. B. Levy. Slow, fast and arbitrary growth conditions forrenewal-reward processes when both the renewals and the rewards are heavy-tailed. Bernoulli,10(1):121–163, 2004.
[73] S. Poghosyan and S. Rœlly. Invariance principle for martingale-difference random fields.Statist. Probab. Lett., 38(3):235–245, 1998.
[74] R Development Core Team. R: A Language and Environment for Statistical Computing. RFoundation for Statistical Computing, Vienna, Austria, 2009. ISBN 3-900051-07-0.
[75] M. M. Rao. Conditional measures and applications, volume 271 of Pure and Applied Mathe-matics (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, second edition, 2005.
139
[76] S. I. Resnick. Extreme values, regular variation, and point processes, volume 4 of AppliedProbability. A Series of the Applied Probability Trust. Springer-Verlag, New York, 1987.
[77] S. I. Resnick. Heavy-tail phenomena. Springer Series in Operations Research and FinancialEngineering. Springer, New York, 2007. Probabilistic and statistical modeling.
[78] S. I. Resnick and R. Roy. Random usc functions, max-stable processes and continuous choice.Ann. Appl. Probab., 1(2):267–292, 1991.
[79] E. Rio. About the Lindeberg method for strongly mixing sequences. ESAIM Probab. Statist.,1:35–61 (electronic), 1995/97.
[80] P. M. Robinson. Nonparametric estimators for time series. J. Time Ser. Anal., 4(3):185–207,1983.
[81] B. Rosen. A note on asymptotic normality of sums of higher-dimensionally indexed randomvariables. Ark. Mat., 8:33–43, 1969.
[82] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math.Statist., 27:832–837, 1956.
[83] J. Rosinski. On the structure of stationary stable processes. Ann. Probab., 23(3):1163–1187,1995.
[84] J. Rosinski. Decomposition of stationary α-stable random fields. Ann. Probab., 28(4):1797–1813, 2000.
[85] J. Rosinski. Minimal integral representations of stable processes. Probab. Math. Statist.,26(1):121–142, 2006.
[86] J. Rosinski and G. Samorodnitsky. Classes of mixing stable processes. Bernoulli, 2(4):365–377, 1996.
[87] E. Roy. Ergodic properties of Poissonian ID processes. Ann. Probab., 35(2):551–576, 2007.
[88] E. Roy. Poisson suspensions and infinite ergodic theory. Ergodic Theory Dynam. Systems,29(2):667–683, 2009.
[89] P. Roy. Ergodic theory, abelian groups and point processes induced by stable random fields.Ann. Probab., 38(2):770–793, 2010.
[90] P. Roy. Nonsingular group actions and stationary SαS random fields. Proc. Amer. Math.Soc., 138(6):2195–2202, 2010.
[91] P. Roy and G. Samorodnitsky. Stationary symmetric α-stable discrete parameter randomfields. J. Theoret. Probab., 21(1):212–233, 2008.
[92] G. Samorodnitsky. Null flows, positive flows and the structure of stationary symmetric stableprocesses. Ann. Probab., 33(5):1782–1803, 2005.
[93] G. Samorodnitsky and M. S. Taqqu. Stable non-Gaussian random processes. StochasticModeling. Chapman & Hall, New York, 1994. Stochastic models with infinite variance.
[94] M. Schlather. Models for stationary max–stable random fields. Extremes, 5:33–44, 2002.
[95] M. Schlather and J. A. Tawn. A dependence measure for multivariate and spatial extremevalues: Properties and inference. Biometrika, 90:139–156, 2003.
[96] A. P. Shashkin. The invariance principle for a (BL, θ)-dependent random field. Uspekhi Mat.Nauk, 58(3(351)):193–194, 2003.
140
[97] B. W. Silverman. Density estimation for statistics and data analysis. Monographs on Statis-tics and Applied Probability. Chapman & Hall, London, 1986.
[98] R. L. Smith. Max–stable processes and spatial extremes. unpublished manuscript, 1990.
[99] S. M. Srivastava. A course on Borel sets, volume 180 of Graduate Texts in Mathematics.Springer-Verlag, New York, 1998.
[100] S. A. Stoev. On the ergodicity and mixing of max-stable processes. Stochastic Process. Appl.,118(9):1679–1705, 2008.
[101] S. A. Stoev and M. S. Taqqu. Extremal stochastic integrals: a parallel between max-stableprocesses and α-stable processes. Extremes, 8(4):237–266 (2006), 2005.
[102] D. Surgailis, J. Rosinski, V. Mandrekar, and S. Cambanis. Stable mixed moving averages.Probab. Theory Related Fields, 97(4):543–558, 1993.
[103] D. Surgailis, J. Rosinski, V. Mandrekar, and S. Cambanis. On the mixing structure of sta-tionary increment and self–similar SαS processes. Unpublished results., 1998.
[104] L. T. Tran. Kernel density estimation on random fields. J. Multivariate Anal., 34(1):37–53,1990.
[105] D. Volny. A nonadapted version of the invariance principle of Peligrad and Utev. C. R.Math. Acad. Sci. Paris, 345(3):167–169, 2007.
[106] D. Volny, M. Woodroofe, and O. Zhao. Central limit theorems for superlinear processes.Stoch. Dyn., 11(1):71–80, 2011.
[107] Y. Wang. maxLinear: Conditional sampling for max-linear models, 2010. R package version1.0.
[108] Y. Wang, P. Roy, and S. A. Stoev. Ergodic properties of sum– and max–stable stationaryrandom fields via null and positive group actions. To appear in Ann. Probab., available athttp://arxiv.org/abs/0911.0610, 2012.
[109] Y. Wang and S. A. Stoev. On the structure and representations of max–stableprocesses. Technical Report 487, Department of Statistics, University of Michigan,http://arxiv.org/abs/0903.3594, 2009.
[110] Y. Wang and S. A. Stoev. On the association of sum- and max-stable processes. Statist.Probab. Lett., 80(5-6):480–488, 2010.
[111] Y. Wang and S. A. Stoev. On the structure and representations of max–stable processes.Adv. in Appl. Probab., 42(3):855–877, 2010.
[112] Y. Wang and S. A. Stoev. Conditional sampling for spectrally-discrete max-stable randomfields. Adv. in Appl. Probab., 43(2):463–481, 2011.
[113] Y. Wang, S. A. Stoev, and P. Roy. Decomposability for stable processes. Stochastic Process.Appl., 122(3):1093–1109, 2012.
[114] Y. Wang and M. Woodroofe. A new condition on invariance principles for stationary randomfields. Submitted, available at http://arxiv.org/abs/1101.5195, 2011.
[115] Y. Wang and M. Woodroofe. On the asymptotic normality of kernel density estimators forlinear random fields. Submitted, available at http://arxiv.org/abs/1201.0238, 2012.
[116] M. Woodroofe. A central limit theorem for functions of a Markov chain with applications toshifts. Stochastic Process. Appl., 41(1):33–44, 1992.
141
[117] W. B. Wu. Central limit theorems for functionals of linear processes and their applications.Statist. Sinica, 12(2):635–649, 2002.
[118] W. B. Wu. Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci.USA, 102(40):14150–14154 (electronic), 2005.
[119] W. B. Wu. Asymptotic theory for stationary processes. Stat. Interface, 4(2):207–226, 2011.
[120] W. B. Wu and J. Mielniczuk. Kernel density estimation for linear processes. Ann. Statist.,30(5):1441–1459, 2002.
[121] W. B. Wu and M. Woodroofe. Martingale approximations for sums of stationary processes.Ann. Probab., 32(2):1674–1690, 2004.
[122] W. B. Wu and Z. Zhao. Moderate deviations for stationary processes. Statist. Sinica,18(2):769–782, 2008.
[123] O. Zhao and M. Woodroofe. On martingale approximations. Ann. Appl. Probab., 18(5):1831–1847, 2008.
ABSTRACT
Topics on Max-stable Processes and the Central Limit Theorem
by
Yizao Wang
Chair: Stilian A. Stoev
This dissertation consists of results in two distinct areas of probability theory: extreme value theory and the central limit theorem.

In extreme value theory, the focus is on max-stable processes. Such processes play an increasingly important role in characterizing and modeling extremal phenomena in finance, the environmental sciences and statistical mechanics. In particular, the association and the decomposability of sum- and max-stable processes are investigated. In addition, the conditional distributions of max-stable processes are studied, and a computationally efficient conditional sampling algorithm is developed. This algorithm has many potential applications in the prediction of extremal phenomena.
For the central limit theorem, the asymptotic normality of partial sums of stationary random fields is studied, with a focus on projective conditions on the dependence. Such conditions, which are easy to check for many stochastic processes and random fields, have recently drawn much attention for (one-dimensional) time series models in statistics and econometrics. Here, the focus is on (high-dimensional) stationary random fields. In particular, a general central limit theorem for stationary random fields and orthomartingales is established. The method is then extended to establish the asymptotic normality of the kernel density estimator for linear random fields.
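A minimal numerical illustration of the random-field central limit theorem, under assumptions far stronger than the projective conditions studied here (namely i.i.d. entries, the simplest orthomartingale-difference field): partial sums over an $n \times n$ block, normalized by $n$, should be approximately standard normal.

```python
import numpy as np

def block_sums(n, reps, rng):
    """Normalized partial sums S_n / n over n x n blocks of an i.i.d.
    Rademacher field; Var(S_n) = n^2, so dividing by n gives variance 1."""
    field = rng.integers(0, 2, size=(reps, n, n)) * 2 - 1
    return field.sum(axis=(1, 2)) / n

rng = np.random.default_rng(7)
S = block_sums(n=40, reps=4000, rng=rng)
# S should have mean near 0 and variance near 1
```

The content of the dissertation's results is that the same Gaussian limit survives under much weaker projective conditions on the dependence, where no independence is available.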