
White Noise Testing and Model Diagnostic Checking for Functional

Time Series

Xianyang Zhang∗

Texas A&M University

December 28, 2015

Abstract This paper is concerned with white noise testing and model diagnostic checking for stationary

functional time series. To test for the functional white noise null hypothesis, we propose a Cramér-von Mises

type test based on the functional periodogram introduced by Panaretos and Tavakoli (2013a). Using the

Hilbert space approach, we derive the asymptotic distribution of the test statistic under suitable assumptions.

A new block bootstrap procedure is introduced to obtain the critical values from the non-pivotal limiting

distribution. Compared to existing methods, our approach is robust to the dependence within white noise and

it does not involve the choices of functional principal components and lag truncation number. We employ the

proposed method to check the adequacy of functional linear models and functional autoregressive models of

order one by testing the uncorrelatedness of the residuals. Monte Carlo simulations are provided to demonstrate

the empirical advantages of the proposed method over existing alternatives. Our method is illustrated via an

application to cumulative intradaily returns.

Keywords: Block bootstrap, Functional autoregressive models, Functional linear models, Functional time

series, Goodness-of-fit, Model diagnostic checking, Periodogram, Spectra-based test, White noise testing.

1 Introduction

Functional data analysis (FDA) has emerged as an important area of statistics which pro-

vides convenient and informative tools for the analysis of data objects of high dimension/high

resolution, and it is generally applicable to problems which are difficult to cast into a frame-

work of scalar or vector observations. In many applications, especially if data are collected

sequentially over time, it is natural to expect that the observations exhibit certain degrees

of dependence. During the past decade, there is a growing body of research on parametric

and nonparametric inference for dependent functional data. We refer the interested readers

to the excellent monograph by Horváth and Kokoszka (2012). In this article, our interest

∗Department of Statistics, Texas A&M University, College Station, TX 77843, USA. E-mail: [email protected]. I am grateful to Xiaofeng Shao for numerous helpful suggestions and comments on an earlier version. The author would like to thank the Editor, the Associate Editor and two reviewers for their insightful comments and constructive suggestions, which substantially improved the paper.


concerns white noise testing (testing for serial correlation) for functional observations, and its

application to model diagnostic checking.

In the univariate/multivariate time series context, white noise testing is a classical problem

which has attracted considerable attention. There is a huge literature on the white noise

testing problem and the existing tests can be roughly categorized into two types: time domain

correlation-based tests and frequency domain periodogram-based tests [see e.g. Durlauf (1991);

Hong (1996); Deo (2000) among others]. In the functional time series context, developments

have been mainly devoted to the time domain based approaches. For example, Gabrys and

Kokoszka (2007) proposed a portmanteau test for testing the uncorrelatedness of a sequence

of functional observations. Horváth et al. (2013) proposed an independence test based on

the sum of the $L^2$ norms of the empirical correlation functions. The validity of these tests is

justified under the independent and identically distributed (i.i.d) assumption, and thus they

are not robust to dependent white noise. In the univariate setting, the distinction between an

i.i.d sequence and an uncorrelated sequence in the diagnostic checking context has been found

to be important. On one hand, some commonly used nonlinear time series models, such as

ARCH/GARCH models imply white noise but are dependent. On the other hand, the limiting

null distributions of some commonly used test statistics, such as Box and Pierce’s portmanteau

test, are obtained under the i.i.d assumption, and they are no longer valid under the assumption

of dependent white noise [Romano and Thombs (1996); Lobato et al. (2002); Francq et al.

(2005); Horowitz et al. (2006); Shao (2011b)]. In the functional setting, this distinction is

again important. For example, a functional ARCH(1) process (FARCH(1)) is a functional

white noise but is dependent [Hörmann et al. (2013)]. In this paper, we propose a spectra-

based testing procedure which is robust to the dependence within the white noise sequence.

Our test is constructed based on the periodogram function recently introduced by Panaretos

and Tavakoli (2013a), and it has nontrivial power against Pitman's local alternatives that

are within a $\sqrt{T}$-neighborhood of the null hypothesis, where $T$ denotes the sample size. Our

test also avoids projecting functional objects onto a space whose dimension is held fixed in the

asymptotics. Under suitable weak dependence assumptions, we show that the spectra-based

test has a non-pivotal limiting distribution. To conduct inference, we introduce a novel block

bootstrap procedure which is able to imitate the limiting null distribution. It is worth pointing

out that when the white noise is a martingale difference sequence in a functional space, the

block size in our bootstrap procedure is allowed to be one.

In statistical modeling, diagnostic checking is an integral part of model building. A

common way of testing the adequacy of the proposed time series model is by checking the

assumption of white noise residuals. Systematic departure from this assumption implies the

inadequacy of the fitted model. We employ the proposed white noise testing procedure to

test the goodness of fit for functional linear models and functional autoregressive models of

order one (FAR(1)) with uncorrelated but possibly dependent errors. Diagnostic checking in


functional linear models has been studied by Chiou and Müller (2007) and Gabrys et al. (2010).

The latter authors proposed two inferential tests (GHK tests hereafter) for error correlation.

The main differences between our test and the GHK tests are threefold. First, our test is

constructed based on the L2 norm of the periodogram function, and it does not involve the

choices of functional principal components and lag truncation number which are required in

the GHK tests. Second, we do not assume any asymptotic independence under the null of the

errors while the asymptotic validity of the GHK tests is established under the independence

assumption. The asymptotic null distribution of our test depends on the underlying data

generating process (DGP) and is no longer pivotal. We justify the validity of the proposed

block bootstrap procedure when applied to the residuals in Section 3.1. Third, we employ

the truncated regularization to estimate the functional linear operator, where the underlying

dimension is allowed to grow slowly with the sample size. In contrast, in Gabrys et al.'s approaches,

a linear model is constructed and estimated via least squares after projecting the functional

objects onto a space whose dimension is fixed in the asymptotic analysis.

Furthermore, we apply the proposed method to check the adequacy of FAR(1) models. The

FAR(1) model is conceptually simple as it is an extension of the univariate AR(1) model to the

functional setup, yet very flexible because the autoregressive operator acts on a Hilbert space

whose elements can exhibit any degree of nonlinearity. Various nonparametric and prediction

methods for the FAR(1) models have been developed, and numerous applications have been

found; see Besse et al. (2000), Laukaitis and Rackauskas (2002), Antoniadis and Sapatinas

(2003), Fernandez de Castro et al. (2005), Kargin and Onatski (2008), among others. In

spite of the wide use of the FAR(1) models, there seem to be no systematic methods available to

check the goodness of fit. In this paper, we shall fill this gap by applying the spectra-

based test to test the uncorrelatedness of the residuals. In contrast to the case of functional

linear models, the estimation effect induced by replacing the unobservable innovations with

their estimates is not asymptotically negligible due to the dependence structure of the FAR(1)

models. To circumvent the difficulty, we further propose a modified block bootstrap that takes

the estimation effect into account.

Finally, we point out that spectral analysis of stationary functional time series has been

recently advanced by Panaretos and Tavakoli (2013a), Panaretos and Tavakoli (2013b) and

Hörmann et al. (2015). We refer to Panaretos and Tavakoli (2013a) for many interesting

details on estimation and asymptotics of the spectral density operator.

The layout of the article is as follows. We introduce the spectra-based test in Section 2.1.

To obtain the critical values from the limiting null distribution, we propose a block bootstrap

procedure in Section 2.2. A general class of test statistics is discussed in Section 2.3. We

employ the proposed method to test the goodness of fit for the functional linear models and

the FAR(1) models in Section 3. Section 4 is devoted to the finite sample performance of the

proposed method including simulations and an application to cumulative intradaily returns.


Section 5 concludes. The proofs are postponed to Section 6 and the online supplementary

material.

2 White noise testing

Notation: Let $\imath = \sqrt{-1}$ be the imaginary unit. Denote by $\mathcal{I}$ a compact set of a Euclidean space. Let $\mathcal{I}^k$ be the Cartesian product of $k$ copies of $\mathcal{I}$ with $k \in \mathbb{N}$, and $\mathcal{A} := [0,1] \times \mathcal{I} \times \mathcal{I}$. Denote by $L^2(\mathcal{J})$ the Hilbert space of square integrable functions defined on $\mathcal{J}$, where $\mathcal{J} = \mathcal{I}^k$ or $\mathcal{A}$. For any functions $f, g \in L^2(\mathcal{J})$, the inner product between $f$ and $g$ is defined as $\langle f, g\rangle = \int_{\mathcal{J}} f(\tau)g(\tau)\,d\tau$, and $\|\cdot\|_{\mathcal{J}}$ denotes the inner product induced norm. Let $\|\cdot\| := \|\cdot\|_{\mathcal{I}}$. Assume that the random elements all come from the same probability space $(\Omega, \mathcal{F}, P)$. Denote by $L^p_H$ the space of $H := L^2(\mathcal{I})$ valued random variables $X$ such that $(E\|X\|^p)^{1/p} < \infty$. For any compact operator, denote by $|||\cdot|||_1$, $\|\cdot\|_{\mathcal{L}}$ and $\|\cdot\|_{\mathcal{S}}$ the nuclear norm, the uniform norm and the Hilbert-Schmidt norm respectively. Let $\text{Re}(\cdot)$ and $\text{Im}(\cdot)$ be the real part and the imaginary part of a complex number. Without ambiguity, we shall use the same symbol for an operator and the kernel associated with it in the following discussion.

2.1 Spectra-based test

For the ease of presentation, we consider a sequence of mean-zero stationary functional time series $\{X_t(\tau)\}_{t=1}^{+\infty}$ defined on a compact set $\mathcal{I}$. With suitable modifications on the arguments, the results are expected to extend to the situation with a nonzero mean function. The lag-$h$ autocovariance function for $X_t$ is defined as $\gamma_h(\tau_1, \tau_2) = EX_t(\tau_1)X_{t-h}(\tau_2)$ with $\tau_1, \tau_2 \in \mathcal{I}$, and the corresponding autocovariance operator is given by $\gamma_h(\cdot) = E[\langle X_{t-h}, \cdot\rangle X_t]$. Following Panaretos and Tavakoli (2013a), we define the spectral density kernel as
\[
f_\omega(\tau_1, \tau_2) := \frac{1}{2\pi}\sum_{h=-\infty}^{+\infty}\gamma_h(\tau_1, \tau_2)\exp(-\imath h\omega), \qquad \omega \in [-\pi, \pi]. \tag{1}
\]

Given the functional observations $\{X_t\}_{t=1}^{T}$, where $T$ is the sample size, we are interested in testing whether the functional time series $\{X_t\}$ has serial correlation in a functional space. In other words, we want to test the null hypothesis
\[
H_0: f_\omega(\tau_1, \tau_2) = \gamma_0(\tau_1, \tau_2)/(2\pi), \quad \text{for any } \omega \in [-\pi, \pi],
\]
versus the alternative that
\[
H_a: f_\omega(\tau_1, \tau_2) \neq \gamma_0(\tau_1, \tau_2)/(2\pi), \quad \text{for some } \omega \in [-\pi, \pi],
\]
i.e., $\gamma_h(\tau_1, \tau_2) \neq 0$ for some lag $h$. To introduce the testing procedure, we define the discrete Fourier transform (DFT) of $\{X_t\}_{t=1}^{T}$ to be
\[
\tilde{X}_\omega(\tau) := \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T}X_t(\tau)\exp(-\imath t\omega). \tag{2}
\]
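To make (2) concrete: for curves stored as a $T \times n$ array of grid values, the DFT at all Fourier frequencies $\omega_k = 2\pi k/T$ can be obtained with a fast Fourier transform over the time index. A minimal sketch (the grid representation, variable names and the use of NumPy are illustrative assumptions, not part of the paper):

```python
import numpy as np

def functional_dft(X):
    """Discrete Fourier transform of functional data, cf. (2).

    X : (T, n) array; row t holds the curve X_t on a grid of n points.
    Returns a (T, n) complex array whose k-th row approximates
    X~_{omega_k} with omega_k = 2*pi*k/T, normalized by 1/sqrt(2*pi*T).
    (np.fft.fft starts the sum at t = 0 rather than t = 1; the resulting
    phase factor cancels in the periodogram below.)
    """
    T = X.shape[0]
    return np.fft.fft(X, axis=0) / np.sqrt(2 * np.pi * T)

# Example: the periodogram kernel at omega_1, cf. (3); for real curves
# X~_{-omega}(tau2) equals the complex conjugate of X~_{omega}(tau2).
rng = np.random.default_rng(0)
Xdft = functional_dft(rng.standard_normal((128, 50)))
P1 = np.outer(Xdft[1], np.conj(Xdft[1]))
```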

The periodogram function is then given by
\[
P_\omega(\tau_1, \tau_2) := \tilde{X}_\omega(\tau_1)\tilde{X}_{-\omega}(\tau_2) = \frac{1}{2\pi T}\sum_{1\le t, t'\le T}X_t(\tau_1)X_{t'}(\tau_2)\exp\{\imath(t'-t)\omega\} = \frac{1}{2\pi}\sum_{h=1-T}^{T-1}\hat{\gamma}_h(\tau_1, \tau_2)\exp(-\imath h\omega), \quad \tau_1, \tau_2 \in \mathcal{I}, \tag{3}
\]
where $\hat{\gamma}_h(\tau_1, \tau_2) = \sum_{1\le t-h, t\le T}X_t(\tau_1)X_{t-h}(\tau_2)/T$ is the sample autocovariance function. Let $\Psi_h(\lambda) = \sin(h\pi\lambda)/(h\pi) \in L^2([0,1])$ and $V_{T,h}(\tau_1, \tau_2) = \hat{\gamma}_h(\tau_1, \tau_2) + \hat{\gamma}_h(\tau_2, \tau_1)$. Define the following process based on the real part of the periodogram function
\[
\mathcal{T}_T(\lambda, \tau_1, \tau_2) = \sqrt{T}\left\{\int_0^{\pi\lambda}\text{Re}\,P_\omega(\tau_1, \tau_2)\,d\omega - \lambda\int_0^{\pi}\text{Re}\,P_\omega(\tau_1, \tau_2)\,d\omega\right\} = \frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V_{T,h}(\tau_1, \tau_2)\Psi_h(\lambda) = \frac{1}{2\sqrt{T}}\sum_{h=1}^{T-1}\sum_{t=2}^{T}Y_{th}(\tau_1, \tau_2)\Psi_h(\lambda), \tag{4}
\]
where $Y_{th}(\tau_1, \tau_2) = X_t(\tau_1)X_{t-h}(\tau_2) + X_t(\tau_2)X_{t-h}(\tau_1)$ for $h+1 \le t \le T$ and $Y_{th}(\tau_1, \tau_2) = 0$ for $t \le h$. We consider the Cramér-von Mises type statistic, defined as
\[
T_{T,CM} = \|\mathcal{T}_T\|_{\mathcal{A}}^2 = \frac{T}{8\pi^2}\sum_{h=1}^{T-1}\|V_{T,h}\|_{\mathcal{I}^2}^2/h^2, \tag{5}
\]
where we have used the fact that $\langle\Psi_h, \Psi_{h'}\rangle = \mathbf{1}\{h = h'\}/(2\pi^2 h^2)$. Below we derive the asymptotic distribution of $T_{T,CM}$. As the functional objects and the associated operators both lie on some Hilbert spaces, it seems natural to adopt the Hilbert space approach [see e.g., Politis and Romano (1994); Escanciano and Velasco (2006); Shao (2011a)] for the asymptotic analysis.
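Since the statistic (5) only involves the sample autocovariance surfaces, it can be computed directly on a grid. A minimal sketch under the same $T \times n$ representation as above, with the $\mathcal{I}^2$ integral approximated by a Riemann sum on $\mathcal{I} = [0,1]$ (again a simplifying assumption):

```python
import numpy as np

def cvm_statistic(X):
    """Cramer-von Mises type statistic T_{T,CM} in (5).

    X : (T, n) array of curves on an equispaced grid of [0, 1].
    """
    T, n = X.shape
    stat = 0.0
    for h in range(1, T):
        # Sample lag-h autocovariance surface gamma-hat_h(tau1, tau2), cf. (3).
        gamma_h = X[h:].T @ X[:-h] / T          # (n, n)
        V_h = gamma_h + gamma_h.T               # V_{T,h} = gamma-hat_h + its adjoint
        # ||V_{T,h}||^2 over I^2, approximated by a Riemann sum with spacing 1/n.
        stat += np.sum(V_h ** 2) / n ** 2 / h ** 2
    return T * stat / (8 * np.pi ** 2)
```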

To quantify the dependence within $\{X_t\}$, we shall use the $L^p$-$m$-approximating dependence measure [Hörmann and Kokoszka (2010)] for its broad applicability to linear and nonlinear functional processes.

Definition 2.1. Assume that $X_i \in L^p_{H_0}$ for some separable Hilbert space $H_0$ and $p > 0$ admits the representation
\[
X_i = f(\xi_i, \xi_{i-1}, \dots), \quad i = 1, 2, \dots, \tag{6}
\]
where the $\xi_i$'s are i.i.d elements taking values in a measurable space $S$ and $f$ is a measurable function $f: S^{\infty} \to H_0$. For each $i \in \mathbb{N}$, let $\{\xi_j^{(i)}\}_{j\in\mathbb{Z}}$ be an independent copy of $\{\xi_j\}_{j\in\mathbb{Z}}$. The sequence $\{X_i\}$ is said to be $L^p$-$m$-approximable if
\[
\sum_{m=1}^{\infty}\left(E\|X_1 - X_1^{(m)}\|^p\right)^{1/p} < \infty, \tag{7}
\]
where $X_i^{(m)} = f(\xi_i, \xi_{i-1}, \dots, \xi_{i-m+1}, \xi_{i-m}^{(i)}, \xi_{i-m-1}^{(i)}, \dots)$.

Before presenting the weak convergence result, we introduce the cumulant operator which characterizes the high order properties of functional time series.

Definition 2.2. Let $R_{t_1,t_2,\dots,t_{2k-1}}(\tau_1, \dots, \tau_{2k-1}, \tau_{2k}) = \text{cum}(X_{t_1}(\tau_1), \dots, X_{t_{2k-1}}(\tau_{2k-1}), X_0(\tau_{2k}))$ be the cumulant function [Brillinger (1975)]. Define the $2k$th order cumulant operator $R_{t_1,t_2,\dots,t_{2k-1}}: L^2(\mathcal{I}^k) \to L^2(\mathcal{I}^k)$ as
\[
(R_{t_1,t_2,\dots,t_{2k-1}}g)(\tau_1, \dots, \tau_k) = \int_{\mathcal{I}^k}R_{t_1,t_2,\dots,t_{2k-1}}(\tau_1, \dots, \tau_{2k})g(\tau_{k+1}, \dots, \tau_{2k})\,d\tau_{k+1}\cdots d\tau_{2k}. \tag{8}
\]

We are now in a position to present one of the main results in this section. Recall that $|||\cdot|||_1$ denotes the nuclear norm of a compact operator; see the definition in Bosq (2000) and Zhu (2007).

Theorem 2.1. Suppose $\{X_t\}$ is an $L^4$-$m$-approximable process. Under the assumption that $\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1 < \infty$ and $\sum_{s_1,s_2,s_3=-\infty}^{+\infty}|||R_{s_1,s_2,s_3}|||_1 < \infty$, we have
\[
G_T(\lambda, \tau_1, \tau_2) := \mathcal{T}_T(\lambda, \tau_1, \tau_2) - E\mathcal{T}_T(\lambda, \tau_1, \tau_2) \Rightarrow G(\lambda, \tau_1, \tau_2), \tag{9}
\]
where "$\Rightarrow$" denotes weak convergence in $L^2(\mathcal{A})$, and $G$ is a mean-zero Gaussian process whose covariance structure is specified by
\[
\text{cov}(G(\lambda, \tau_1, \tau_2), G(\lambda', \tau_1', \tau_2')) = \frac{1}{4}\sum_{i,j=1}^{+\infty}\sum_{s=-\infty}^{+\infty}\Psi_i(\lambda)\Psi_j(\lambda')\,\text{cov}\big(X_{s+i}(\tau_1)X_s(\tau_2) + X_{s+i}(\tau_2)X_s(\tau_1),\; X_j(\tau_1')X_0(\tau_2') + X_j(\tau_2')X_0(\tau_1')\big). \tag{10}
\]

Theorem 2.1 shows that $G_T(\lambda, \tau_1, \tau_2)$ converges to a Gaussian process whose covariance operator depends on the fourth order cumulant of the functional time series. Notice that under $H_0$, $EY_{th}(\tau_1, \tau_2) = 0$ for any $h \ge 1$ and thus $E\mathcal{T}_T(\lambda, \tau_1, \tau_2) = 0$. As an immediate consequence, we obtain the following result by the continuous mapping theorem, whose proof is omitted to conserve space.

Theorem 2.2. Under the assumptions in Theorem 2.1 and the null hypothesis $H_0$, we have
\[
T_{T,CM} \to_d \int_0^1\int_{\mathcal{I}^2}G^2(\lambda, \tau_1, \tau_2)\,d\tau_1 d\tau_2 d\lambda. \tag{11}
\]
Moreover, under the alternative $H_a$, we have
\[
T_{T,CM}/T \to_p \frac{1}{8\pi^2}\sum_{h=1}^{+\infty}\int_{\mathcal{I}^2}\{\gamma_h(\tau_1, \tau_2) + \gamma_h(\tau_2, \tau_1)\}^2\,d\tau_1 d\tau_2/h^2. \tag{12}
\]

Theorem 2.2 shows that the Cramér-von Mises type statistic converges to a non-pivotal limiting null distribution. Under any fixed alternative with $\int_{\mathcal{I}^2}\{\gamma_h(\tau_1, \tau_2) + \gamma_h(\tau_2, \tau_1)\}^2\,d\tau_1 d\tau_2 \neq 0$ for some $h \in \mathbb{N}$, the Cramér-von Mises type test is consistent. It is worth pointing out that the Hilbert space approach employed here is not applicable to statistics of the Kolmogorov-Smirnov type because the "sup" functional is no longer a continuous mapping in the Hilbert space.

Below we consider the local alternative $H_{a,T}: f_\omega(\tau_1, \tau_2) = \gamma_0(\tau_1, \tau_2)\{1 + g_\omega(\tau_1, \tau_2)/\sqrt{T}\}/(2\pi)$, where $g_\omega(\tau_1, \tau_2)$ is a $2\pi$-periodic function that satisfies $g_{-\omega}(\tau_1, \tau_2) = g_\omega(\tau_2, \tau_1)$ and $\int_{-\pi}^{\pi}g_\omega(\tau_1, \tau_2)\,d\omega = 0$. Under $H_{a,T}$, $\gamma_h(\tau_1, \tau_2) = \frac{\gamma_0(\tau_1, \tau_2)}{2\pi\sqrt{T}}\int_{-\pi}^{\pi}g_\omega(\tau_1, \tau_2)\exp(\imath h\omega)\,d\omega$ for $h \neq 0$. Using the fact that $\frac{1}{2}\sum_{h=1}^{T-1}\Psi_h(\lambda)\int_{-\pi}^{\pi}\{g_\omega(\tau_1, \tau_2) + g_\omega(\tau_2, \tau_1)\}\exp(\imath h\omega)\,d\omega \to \int_0^{\pi\lambda}\text{Re}\,g_\omega(\tau_1, \tau_2)\,d\omega$, we have $\mathcal{T}_T(\lambda, \tau_1, \tau_2) \Rightarrow G(\lambda, \tau_1, \tau_2) + \varphi(\tau_1, \tau_2)$, where $\varphi(\tau_1, \tau_2) = \gamma_0(\tau_1, \tau_2)\int_0^{\pi\lambda}\text{Re}\,g_\omega(\tau_1, \tau_2)\,d\omega/(2\pi)$. Therefore, we obtain the following result.

Theorem 2.3. Under the assumptions in Theorem 2.1 and the local alternative hypothesis $H_{a,T}$,
\[
T_{T,CM} \to_d \int_0^1\int_{\mathcal{I}^2}\{G(\lambda, \tau_1, \tau_2) + \varphi(\tau_1, \tau_2)\}^2\,d\tau_1 d\tau_2 d\lambda. \tag{13}
\]

2.2 Block bootstrap

Although the non-pivotal limiting distribution in (11) can be approximated numerically for given parametric models, the critical values are case-by-case and they can be unreliable when the assumed model is misspecified. To overcome this problem, we introduce a block bootstrap approach which is able to mimic the limiting distribution of the test statistic without imposing restrictive parametric assumptions. Assume for simplicity that $b_T M_T = T - 1$ with $b_T, M_T \in \mathbb{Z}$. Our procedure can be described as follows:

1. Let $\hat{Y}_{th}(\tau_1, \tau_2) = X_t(\tau_1)X_{t-h}(\tau_2) + X_t(\tau_2)X_{t-h}(\tau_1) - \hat{\gamma}_h(\tau_1, \tau_2) - \hat{\gamma}_h(\tau_2, \tau_1)$ for $h+1 \le t \le T$ and $\hat{Y}_{th}(\tau_1, \tau_2) = 0$ for $t \le h$, where $2 \le t \le T$ and $1 \le h \le T-1$;

2. Define the non-overlapping blocks of indices $B_s = \{b_T(s-1)+2, \dots, b_T s + 1\}$ with $1 \le s \le M_T$;

3. Pick $M_T$ blocks with replacement from $\{B_s\}_{s=1}^{M_T}$. Join the $M_T$ blocks and let $(t_2^*, \dots, t_T^*)$ be the corresponding indices for the bootstrapped samples. Denote $\hat{Y}^*_{lh} = \hat{Y}_{t_l^* h}$ for $2 \le l \le T$;

4. Compute the bootstrap statistic $T^*_{T,CM}$ by replacing $V_{T,h}$ with $V^*_{T,h} = \sum_{l=2}^{T}\hat{Y}^*_{lh}/T$ in the definition of $T_{T,CM}$;

5. Repeat the above steps a large number of times to get the bootstrap critical values for $T_{T,CM}$.
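A minimal sketch of one bootstrap draw (steps 1-4), assuming $b_T$ divides $T-1$ and reusing the grid representation above; `rng` is a NumPy generator, and the implementation is only meant to convey the resampling logic:

```python
import numpy as np

def block_bootstrap_stat(X, b_T, rng):
    """One draw of the bootstrap statistic T*_{T,CM} (steps 1-4).

    X : (T, n) array of curves; b_T : block size, assuming b_T * M_T = T - 1.
    """
    T, n = X.shape
    M_T = (T - 1) // b_T
    # Steps 2-3: resample M_T non-overlapping blocks of the time labels
    # {2, ..., T} (0-based rows 1, ..., T-1), then join them.
    chosen = rng.integers(0, M_T, size=M_T)
    t_star = np.concatenate(
        [np.arange(s * b_T + 1, s * b_T + 1 + b_T) for s in chosen])
    stat = 0.0
    for h in range(1, T):
        gamma_h = X[h:].T @ X[:-h] / T          # sample autocovariance surface
        V_center = gamma_h + gamma_h.T
        keep = t_star[t_star >= h]              # Y-hat_{th} = 0 for t <= h
        if keep.size == 0:
            continue
        cross = X[keep].T @ X[keep - h] / T
        # Step 4: V*_{T,h} = sum_l Y-hat_{t*_l, h} / T.
        V_star = cross + cross.T - keep.size / T * V_center
        stat += np.sum(V_star ** 2) / n ** 2 / h ** 2
    return T * stat / (8 * np.pi ** 2)
```

Repeating this a large number of times and taking the empirical $(1-\alpha)$ quantile of the draws gives the bootstrap critical value of step 5.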

Our method is closely related to the blockwise wild bootstrap [Shao (2011a); Zhu and Li

(2015)] while it does not require generating auxiliary variables from a common distribution.

Compared to the usual block bootstrap procedure, the proposed method is nonstandard in the

sense that it resamples the blocks containing the transformed data Yth(τ1, τ2) rather than the

blocks formed by the original observations. Below we establish the consistency of the block

bootstrap procedure.

Theorem 2.4. Assume that $b_T \to +\infty$ and $b_T/T \to 0$ as $T \to +\infty$. Suppose $\sum_{s=-\infty}^{+\infty}|s|\cdot|||\gamma_s|||_1 < \infty$ and $\sum_{s_1,s_2,s_3=-\infty}^{+\infty}|s_i|\cdot|||R_{s_1,s_2,s_3}|||_1 < \infty$ with $i = 1, 2, 3$. Further assume that
\[
\sum_{s_1,\dots,s_J=-\infty}^{+\infty}|s_j|\cdot\|\text{cum}(X_{s_1}, X_{s_2}, \dots, X_{s_J}, X_0)\|_{\mathcal{I}^{J+1}} < \infty, \quad j = 1, 2, \dots, J, \tag{14}
\]
for $J = 1, 2, \dots, 7$. Under the null hypothesis $H_0$, or any fixed or local alternatives, we have
\[
\rho_w\left[\mathcal{L}\left(\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V^*_{T,h}(\tau_1, \tau_2)\Psi_h(\lambda)\,\bigg|\,\mathcal{X}_T\right),\; \mathcal{L}\big(G(\lambda, \tau_1, \tau_2)\big)\right] \to 0, \tag{15}
\]
in probability as $T \to +\infty$, where $\mathcal{L}(\cdot|\mathcal{X}_T)$ denotes the distribution conditional on the sample $\mathcal{X}_T = \{X_t\}_{t=1}^{T}$ and $\rho_w$ is any metric that metricizes weak convergence in $L^2(\mathcal{A})$. As a consequence, the bootstrap statistic $T^*_{T,CM}$ converges to the limiting distribution specified in (11) in probability.

Under the martingale difference assumption, i.e., $E[X_s(\tau)\,|\,X_{s'}(\tau'), \tau' \in \mathcal{I}] = 0$ with $s' < s$ and $\tau \in \mathcal{I}$, the covariance between $G(\lambda, \tau_1, \tau_2)$ and $G(\lambda', \tau_1', \tau_2')$ reduces to
\[
\frac{1}{4}\sum_{i,j=1}^{+\infty}\Psi_i(\lambda)\Psi_j(\lambda')\,\text{cov}\big(X_0(\tau_1)X_{-i}(\tau_2) + X_0(\tau_2)X_{-i}(\tau_1),\; X_0(\tau_1')X_{-j}(\tau_2') + X_0(\tau_2')X_{-j}(\tau_1')\big). \tag{16}
\]
Following the arguments in the proof of Theorem 2.4, we can show that the bootstrap method is able to replicate the limiting distribution with $b_T = 1$; see Remark 6.1 and Escanciano and Velasco (2006).

In practice, the performance of the bootstrap procedure could depend on the choice of $b_T$. We adapt the minimal volatility method [see Chapter 9 of Politis et al. (1999)] to select the block size in our setting. The basic idea is to compute the bootstrap critical values for a number of block sizes $b_T$ and then look for a region where the bootstrap critical values do not change very much. The procedure can be described as follows:

1. For $b_T = 1$ to $b_T = \sqrt{T}$, compute the bootstrap critical values $C_{T,b_T}(1-\alpha)$ for the desired significance level $\alpha$;

2. For each $b_T$, define the volatility index $VI_{b_T}$ as the standard deviation of the values $\{C_{T,b_T-k}(1-\alpha), C_{T,b_T-k+1}(1-\alpha), \dots, C_{T,b_T+k}(1-\alpha)\}$ for $k = 3$;

3. Pick $b^*$ corresponding to the smallest volatility index and use $C_{T,b^*}(1-\alpha)$ as the critical value of the test.

We shall investigate the finite sample performance of this procedure in Section 4.
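A sketch of this selection rule, assuming a wrapper `bootstrap_critical_value(X, b, alpha)` around the resampling above (the wrapper name/signature and the clipping of windows at the boundary are illustrative choices):

```python
import numpy as np

def select_block_size(X, alpha, bootstrap_critical_value, k=3):
    """Minimal volatility choice of b_T (steps 1-3 above)."""
    T = X.shape[0]
    b_grid = np.arange(1, int(np.sqrt(T)) + 1)
    # Step 1: bootstrap critical value for each candidate block size.
    crit = np.array([bootstrap_critical_value(X, b, alpha) for b in b_grid])
    # Step 2: volatility index = sd of critical values in a window of
    # half-width k (clipped at the boundary).
    vol = np.array([np.std(crit[max(i - k, 0): i + k + 1])
                    for i in range(len(b_grid))])
    # Step 3: pick the block size with the smallest volatility index.
    i_star = int(np.argmin(vol))
    return int(b_grid[i_star]), float(crit[i_star])
```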

Remark 2.1. Let $\hat{W}_{th}(\tau_1, \tau_2) = X_{t+h}(\tau_1)X_t(\tau_2) + X_{t+h}(\tau_2)X_t(\tau_1) - \hat{\gamma}_h(\tau_1, \tau_2) - \hat{\gamma}_h(\tau_2, \tau_1)$ for $1 \le t+h \le T$ and $\hat{W}_{th}(\tau_1, \tau_2) = 0$ for $t+h > T$, where $1 \le t, h \le T-1$. In view of the proof of Theorem 2.4, the block bootstrap procedure based on the transformed data $\{\hat{W}_{th}, 1 \le t, h \le T-1\}$ is also valid.

Remark 2.2. It might be possible to construct a pivotal test by replacing the periodogram

function by its normalized counterpart. However, such a normalization step requires the esti-

mation of the fourth order property of the white noise sequence as well as estimation of an

inverse operator, which can be quite challenging from a practical viewpoint. To avoid these

complications, we suggest the use of the block bootstrap procedure to calibrate the sampling

distribution of TT,CM .

2.3 A general class of test statistics

In this section, we provide some discussion of a general class of test statistics. The Cramér-von Mises statistic is the weighted sum of the $L^2$ norms of $\{V_{T,h}\}_{h=1}^{T-1}$, where $V_{T,h} = \hat{\gamma}_h + \hat{\gamma}_h^*$ with $\hat{\gamma}_h^*$ denoting the adjoint operator of $\hat{\gamma}_h$. Because the Cramér-von Mises statistic puts rather heavy weights on the low order sample autocovariance operators, it can lose power against strong dependence. Motivated by the form of $T_{T,CM}$, we consider a general class of statistics. Let $\{\zeta_h(\lambda)\}_{h=1}^{+\infty}$ be a sequence of orthogonal basis functions in $L^2[0,1]$ such that $\sum_{h=1}^{+\infty}\|\zeta_h\|^2 < \infty$. Then a general class of test statistics can be defined as
\[
T_{T,G} = S\Bigg(\frac{1}{\sqrt{T}}\sum_{h=1}^{T-1}\sum_{t=2}^{T}Y_{th}\zeta_h\Bigg), \tag{17}
\]
where $S: L^2(\mathcal{A}) \to \mathbb{R}$ is a continuous functional. In view of the proofs of Theorems 2.1 and 2.4, we can show that under suitable assumptions, $T_{T,G}$ converges to a non-pivotal limiting null distribution and the block bootstrap is still valid. If $S$ is the $L^2$ norm and $\|\zeta_h\| = \mathbf{1}\{h \le H\}$ with $1 \le H \le T-1$, the general test statistic has the form
\[
T_{T,G} = T\sum_{h=1}^{H}\|\hat{\gamma}_h + \hat{\gamma}_h^*\|_{\mathcal{S}}^2, \tag{18}
\]
which can be viewed as a functional version of the Box-Pierce test. When $H$ is allowed to grow slowly with the sample size $T$, Horváth et al. (2013) showed that
\[
\frac{T\sum_{h=1}^{H}\|\hat{\gamma}_h\|_{\mathcal{I}^2}^2 - H|||\hat{\gamma}_0|||_1^2}{(2H\|\hat{\gamma}_0\|_{\mathcal{S}}^4)^{1/2}} \to N(0, 1), \tag{19}
\]
where $|||\hat{\gamma}_0|||_1 = \int_{\mathcal{I}}\hat{\gamma}_0(\tau, \tau)\,d\tau$ and $\|\hat{\gamma}_0\|_{\mathcal{S}} = \big(\int_{\mathcal{I}^2}\hat{\gamma}_0^2(\tau_1, \tau_2)\,d\tau_1 d\tau_2\big)^{1/2}$. This result is closely related to the findings in Hong (1996) for univariate time series [also see Hong and Lee (2003); Shao (2011b)].
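For comparison, the truncated statistic (18) and the standardized ratio (19) admit the same grid implementation; a minimal sketch (Riemann-sum approximations as before):

```python
import numpy as np

def hhr_test(X, H):
    """Functional Box-Pierce statistic (18) and standardized statistic (19)."""
    T, n = X.shape
    bp = 0.0     # running sum for (18): ||gamma_h + gamma_h^*||^2
    s2 = 0.0     # running sum for (19): ||gamma_h||^2
    for h in range(1, H + 1):
        gamma_h = X[h:].T @ X[:-h] / T
        bp += np.sum((gamma_h + gamma_h.T) ** 2) / n ** 2
        s2 += np.sum(gamma_h ** 2) / n ** 2
    gamma_0 = X.T @ X / T
    tr = np.trace(gamma_0) / n               # nuclear norm: int gamma_0(t, t) dt
    hs2 = np.sum(gamma_0 ** 2) / n ** 2      # squared Hilbert-Schmidt norm
    z = (T * s2 - H * tr ** 2) / np.sqrt(2 * H * hs2 ** 2)  # (19), approx N(0, 1)
    return T * bp, z
```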

3 Model diagnostics

In this section, we apply the proposed white noise testing procedure to test the goodness

of fit for functional linear models and FAR(1) models. The basic idea here is to check the

assumption of white noise errors by examining the residuals from the fitted model. In the case

of FAR(1) models, a modified block bootstrap is introduced to capture the estimation effect.

3.1 Functional linear models

Testing the error correlation for functional linear models has been considered in Gabrys

et al. (2010). The authors proposed two time domain tests with different ways of defining

the finite-dimensional residuals. The GHK tests involve the choices of functional principal

components and lag truncation number for the autocovariance function. In practice, the time

domain tests can be sensitive to the choice of the tuning parameters especially when the first

few principal components of the errors are uncorrelated or the errors are only correlated at

large lags (also see the simulations in Section 4). Moreover, the GHK tests are not robust

to uncorrelated but dependent processes such as the FARCH process because the asymptotic

validity of their tests is established under the independence assumption. In contrast, our

spectra-based test is of omnibus type, and it has nontrivial power against

local alternatives that are within a $\sqrt{T}$-neighborhood of the null hypothesis. One important

feature of our testing procedure is its robustness to the dependence within the error sequence.


To fix the idea, we consider the functional linear model
\[
Y_t(\tau_1) = \eta(X_t)(\tau_1) + \epsilon_t(\tau_1), \quad t = 1, 2, \dots, \quad \tau_1 \in \mathcal{I}, \tag{20}
\]
where $\eta(X)(\tau_1) = \int_{\mathcal{I}}K(\tau_1, \tau_2)X(\tau_2)\,d\tau_2$ for $K \in L^2(\mathcal{I}^2)$ and $X \in H$, and $\{\epsilon_t(\tau)\}$ is a sequence of white noise which is uncorrelated with the covariates $\{X_t(\tau)\}$ (with some abuse of notation, we use $X_t$ to denote the covariate in this subsection). Suppose $\{\epsilon_t\}$ and $\{X_t\}$ are both mean-zero sequences and they are jointly stationary. Define the covariance operator of $\{X_t\}$ as $\gamma_X(\cdot) = \gamma_{X,X}(\cdot) = E[\langle X_t, \cdot\rangle X_t]$ and the cross-covariance operator between $\{X_t\}$ and $\{Y_t\}$ as $\gamma_{Y,X}(\cdot) = E[\langle X_t, \cdot\rangle Y_t]$. For $u, u' \in H$, note that
\[
\langle\gamma_{Y,X}(u), u'\rangle = E\langle X_t, u\rangle\langle Y_t, u'\rangle = E\langle X_t, u\rangle\langle\eta(X_t), u'\rangle = \langle\eta\gamma_X(u), u'\rangle,
\]

which implies that $\gamma_{Y,X} = \eta\gamma_X$. By Mercer's lemma [Riesz and Sz.-Nagy (1955)], the covariance operator $\gamma_X$ admits the spectral decomposition
\[
\gamma_X(\cdot) = \sum_{j=1}^{+\infty}\lambda_j\langle\phi_j, \cdot\rangle\phi_j, \tag{21}
\]
where $\lambda_j$ and $\phi_j$ are the eigenvalues and eigenfunctions respectively. Suppose that the eigenvalues are ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$. Define the sample covariance operator $\hat{\gamma}_X(\cdot) = \sum_{t=1}^{T}\langle X_t, \cdot\rangle X_t/T$ and the sample cross-covariance operator $\hat{\gamma}_{Y,X}(\cdot) = \sum_{t=1}^{T}\langle X_t, \cdot\rangle Y_t/T$. Let $\{\hat{\lambda}_j\}_{j=1}^{T}$ be the eigenvalues of $\hat{\gamma}_X$ with $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_T \ge 0$, and let $\{\hat{\phi}_j\}_{j=1}^{T}$ be the corresponding eigenfunctions. When $\lambda_p > 0$ and $\lambda_{p+1} = 0$ for some $p \in \mathbb{N}$, $X_t$ lies on a $p$-dimensional space spanned by $\{\phi_j\}_{j=1}^{p}$, which is isomorphic to $\mathbb{R}^p$. In this case, $\eta$ can be estimated by $\hat{\gamma}_{Y,X}\hat{\gamma}_X^{-1}$, where $\hat{\gamma}_X^{-1}$ is the inverse of $\hat{\gamma}_X$ restricted to the space spanned by $\{\hat{\phi}_j\}_{j=1}^{p}$. In what follows, we consider the case where $\lambda_j > 0$ for all $j \in \mathbb{N}$, which reflects more fully the infinite-dimensional nature of functional data. Estimating the inverse covariance operator $\gamma_X^{-1}$ is challenging due to the so-called ill-posed problem. To tackle this problem, we consider the truncated estimator
\[
\hat{\gamma}_X^{-1}(\cdot) = \sum_{j=1}^{k_T}\hat{\lambda}_j^{-1}\langle\hat{\phi}_j, \cdot\rangle\hat{\phi}_j, \tag{22}
\]
where $k_T$ grows slowly with the sample size. To regularize the inverse covariance operator, one may alternatively consider the Tikhonov regularization, i.e., $\hat{\gamma}_X^{-1}(\cdot) = \sum_{j=1}^{T}\frac{\hat{\lambda}_j}{\hat{\lambda}_j^2 + a_T}\langle\hat{\phi}_j, \cdot\rangle\hat{\phi}_j$, or the penalization, i.e., $\hat{\gamma}_X^{-1}(\cdot) = \sum_{j=1}^{T}\frac{1}{\hat{\lambda}_j + a_T}\langle\hat{\phi}_j, \cdot\rangle\hat{\phi}_j$, with $a_T$ being a positive sequence that converges to zero. Readers are referred to Arsenin and Tikhonov (1977) and Groetsch (1993) for more details about the ill-posed problem. Based on the truncated regularization, a moment estimator for $\eta$ is given by
\[
\hat{\eta}_T(\cdot) = \hat{\gamma}_{Y,X}\hat{\gamma}_X^{-1}(\cdot) = \frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{k_T}\hat{\lambda}_j^{-1}\langle X_t, \hat{\phi}_j\rangle\langle\hat{\phi}_j, \cdot\rangle Y_t. \tag{23}
\]

Because $k_T$ is allowed to grow slowly with $T$ in our asymptotic analysis, our approach is significantly different from some existing works [e.g. Gabrys et al. (2010)] where functional objects are projected onto a finite-dimensional space with the underlying dimension being fixed in the asymptotics. In practice, $k_T$ can be chosen using the simple cumulative variation rule, i.e.,
\[
k_T = \arg\min_{1\le k_T\le T}\left\{\frac{\sum_{j=1}^{k_T}\hat{\lambda}_j}{\sum_{j=1}^{T}\hat{\lambda}_j} > \alpha_0\right\},
\]
where $\alpha_0 \in (0, 1)$, e.g. $\alpha_0 = 85\%$, $90\%$ or $95\%$. However, a serious investigation about the choice of $k_T$ is beyond the scope of the current paper.
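On a grid, the truncated regularization (22)-(23) amounts to principal component regression on the leading $k_T$ empirical eigenpairs. A minimal sketch combining (22), (23) and the cumulative variation rule (the Riemann sums and the eigendecomposition of the discretized covariance operator are simplifying assumptions):

```python
import numpy as np

def fit_flm_truncated(X, Y, alpha0=0.9):
    """Moment estimator (23) of eta via the truncated inverse (22).

    X, Y : (T, n) arrays of covariate and response curves on an equispaced
    grid of [0, 1]; inner products are approximated by Riemann sums.
    Returns the estimated kernel K-hat (n, n) and the residual curves.
    """
    T, n = X.shape
    gamma_X = X.T @ X / T / n                 # discretized covariance operator
    lam, phi = np.linalg.eigh(gamma_X)
    lam, phi = lam[::-1], phi[:, ::-1] * np.sqrt(n)  # decreasing, L2-normalized
    # Cumulative variation rule for k_T.
    k_T = int(np.searchsorted(np.cumsum(lam) / np.sum(lam), alpha0)) + 1
    scores = X @ phi[:, :k_T] / n             # <X_t, phi-hat_j>
    coef = (Y.T @ scores / T) / lam[:k_T]     # lambda_j^{-1} (1/T) sum_t <X_t, phi_j> Y_t
    fitted = scores @ coef.T                  # eta-hat_T(X_t) on the grid, cf. (23)
    K_hat = coef @ phi[:, :k_T].T             # kernel of eta-hat_T
    return K_hat, Y - fitted
```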

Denote by $\gamma_{X,s}$ and $\gamma_{\epsilon,s}$ the autocovariance operators for $\{X_t\}$ and $\{\epsilon_t\}$ at lag $s$ respectively. Let $R^X_{s_1,s_2,s_3}$ and $R^\epsilon_{s_1,s_2,s_3}$ be the cumulant operators for $\{X_t\}$ and $\{\epsilon_t\}$. We make the following assumptions, under which we establish the consistency of $\hat{\eta}_T$ under the uniform norm.

Assumption 3.1. Suppose $\{X_t\}$ and $\{\epsilon_t\}$ are jointly stationary. Assume that $\{X_t\}$ is $L^4$-$m$-approximable with $\sum_{s=-\infty}^{+\infty}|||\gamma_{X,s}|||_1 < +\infty$ and $\sum_{s_1,s_2,s_3=-\infty}^{+\infty}|||R^X_{s_1,s_2,s_3}|||_1 < +\infty$, and the same assumption holds for $\{\epsilon_t\}$. Define $R^{\epsilon,X}_{s+h,s,h}(\tau_1, \tau_2, \tau_3, \tau_4) = \text{cum}(\epsilon_{s+h}(\tau_1), X_s(\tau_2), \epsilon_h(\tau_3), X_0(\tau_4))$, and suppose that $\sup_{h\in\mathbb{Z}}\sum_{s=-\infty}^{+\infty}|||R^{\epsilon,X}_{s+h,s,h}|||_1 < \infty$.

Assumption 3.2. Assume that $\lambda_j > 0$ for all $j$.

Assumption 3.3. Let $\alpha_1 = 2\sqrt{2}(\lambda_1 - \lambda_2)^{-1}$ and $\alpha_j = 2\sqrt{2}\max\{(\lambda_{j-1} - \lambda_j)^{-1}, (\lambda_j - \lambda_{j+1})^{-1}\}$ for $j \ge 2$. Assume that $T^{-1}\lambda_{k_T}^{-2} = o(1)$, $T^{-1/4}\sum_{j=1}^{k_T}\alpha_j/\lambda_j = o(1)$, $\lambda_{k_T}^{-1}T^{-1/2}\sum_{j=1}^{k_T}1/\lambda_j^2 = o(1)$ and $\sum_{j>k_T}\|\eta(\phi_j)\|^2 = o(1/\sqrt{T})$.

Lemma 3.1. Under Assumptions 3.1-3.3, $\|\hat{\eta}_T - \eta\|_{\mathcal{L}} = o_p(T^{-1/4})$.

Assumption 3.3 imposes restrictions regarding the spacing and decay rate of the eigenvalues. In particular, it requires the distinct eigenvalues condition $\lambda_1 > \lambda_2 > \lambda_3 > \cdots$, which is common in the literature [see e.g. Horváth and Kokoszka (2012)]. The above assumptions allow us to show the $o_p(T^{-1/4})$ rate, which further implies that the estimation effect caused by replacing $\eta$ with its estimate $\hat{\eta}_T$ is asymptotically negligible.

Based on the consistent estimator $\hat{\eta}_T$, the errors can be naturally estimated by $\hat{\epsilon}_t = Y_t - \hat{\eta}_T(X_t)$. Let $V_{\epsilon,T,h}(\tau_1, \tau_2) = \hat{\gamma}_{\epsilon,h}(\tau_1, \tau_2) + \hat{\gamma}_{\epsilon,h}(\tau_2, \tau_1)$ with $\hat{\gamma}_{\epsilon,h}(\tau_1, \tau_2) = \sum_{t=h+1}^{T}\hat{\epsilon}_t(\tau_1)\hat{\epsilon}_{t-h}(\tau_2)/T$. We define the test statistic as
\[
T_{\epsilon,T,CM} = \frac{T}{8\pi^2}\sum_{h=1}^{T-1}\|V_{\epsilon,T,h}\|_{\mathcal{I}^2}^2/h^2. \tag{24}
\]

Theorem 3.1. Suppose that $\{\epsilon_t\}$ satisfies the assumptions in Theorem 2.1. Under Assumptions 3.1-3.3 and the null hypothesis (i.e., the error sequence is uncorrelated), we have
\[
T_{\epsilon,T,CM} \to_d \int_0^1\int_{\mathcal{I}^2}G_\epsilon^2(\lambda, \tau_1, \tau_2)\,d\tau_1 d\tau_2 d\lambda, \tag{25}
\]
where $G_\epsilon$ is a mean-zero Gaussian process whose covariance structure is given by
\[
\text{cov}(G_\epsilon(\lambda, \tau_1, \tau_2), G_\epsilon(\lambda', \tau_1', \tau_2')) = \frac{1}{4}\sum_{i,j=1}^{+\infty}\sum_{s=-\infty}^{+\infty}\Psi_i(\lambda)\Psi_j(\lambda')\,\text{cov}\big(\epsilon_{s+i}(\tau_1)\epsilon_s(\tau_2) + \epsilon_{s+i}(\tau_2)\epsilon_s(\tau_1),\; \epsilon_j(\tau_1')\epsilon_0(\tau_2') + \epsilon_j(\tau_2')\epsilon_0(\tau_1')\big).
\]

Theorem 3.1 indicates that the effect caused by replacing the errors with their estimates is asymptotically negligible (also see (10)), which is essentially due to the exogeneity of $\{X_t\}$. To conduct inference based on $T_{\epsilon,T,CM}$, we can employ the block bootstrap described in Section 2.2. Define $\hat{Y}_{\epsilon,th}(\tau_1, \tau_2) = \hat{\epsilon}_t(\tau_1)\hat{\epsilon}_{t-h}(\tau_2) + \hat{\epsilon}_t(\tau_2)\hat{\epsilon}_{t-h}(\tau_1) - \hat{\gamma}_{\epsilon,h}(\tau_1, \tau_2) - \hat{\gamma}_{\epsilon,h}(\tau_2, \tau_1)$ for $h+1 \le t \le T$ and $\hat{Y}_{\epsilon,th}(\tau_1, \tau_2) = 0$ for $t \le h$, where $2 \le t \le T$ and $1 \le h \le T-1$. Replacing $\hat{Y}_{th}$ with $\hat{Y}_{\epsilon,th}$ and following the steps in Section 2.2, we obtain the (approximate) critical values of $T_{\epsilon,T,CM}$. The consistency of the block bootstrap procedure is established in the following theorem.

Theorem 3.2. Suppose the assumptions in Theorem 2.4 hold for $\{\epsilon_t\}$. Then under Assumptions 3.1-3.3 and the null hypothesis,
\[
\rho_w\left[\mathcal{L}\left(T^*_{\epsilon,T,CM}\,\big|\,\mathcal{Z}_T\right),\; \mathcal{L}\left(\int_0^1\int_{\mathcal{I}^2}G_\epsilon^2(\lambda, \tau_1, \tau_2)\,d\tau_1 d\tau_2 d\lambda\right)\right] \to 0,
\]
in probability, where $T^*_{\epsilon,T,CM}$ denotes the bootstrap statistic and $\mathcal{Z}_T = \{(X_t, Y_t)\}_{t=1}^{T}$.

3.2 Functional autoregressive models

A sequence of $H$-valued random variables $\{X_t(\tau)\}$ is called a FAR(1) process if it is stationary and satisfies
\[
X_t(\tau_1) - \mu(\tau_1) = \rho(X_{t-1} - \mu)(\tau_1) + \varepsilon_t(\tau_1), \quad t = 1, 2, \dots, \tag{26}
\]
where $\rho(X)(\tau_1) = \int_{\mathcal{I}}\psi(\tau_1, \tau_2)X(\tau_2)\,d\tau_2$ for $\psi \in L^2(\mathcal{I}^2)$ and $X \in H$, and $\{\varepsilon_t(\tau)\}$ is a sequence of mean-zero $H$-valued (white noise) innovations in $L^2_H$. When $\|\rho^j\|_{\mathcal{L}} < 1$ for some $j \in \mathbb{N}$, there is a unique stationary solution $X_t(\tau) = \mu(\tau) + \sum_{j=0}^{+\infty}\rho^j(\varepsilon_{t-j})(\tau)$ to (26). For the sake of clarity, we assume that $\mu(\tau) = 0$ throughout the following discussion. With slight abuse of notation, define the autocovariance operator of $\{X_t\}$ at lag $s$ as $\gamma_{X,s}(\cdot) = E\langle X_{t-s}, \cdot\rangle X_t$ with $\gamma_X = \gamma_{X,0}$. Let $\{\hat{\lambda}_j\}_{j=1}^{T}$ and $\{\hat{\phi}_j\}_{j=1}^{T}$ be respectively the eigenvalues and eigenfunctions of the sample covariance operator $\hat{\gamma}_X = \sum_{t=1}^{T-1}\langle X_t, \cdot\rangle X_t/T$. By Theorem 3.2 of Bosq (2000), we have $\rho\gamma_X = \gamma_{X,1}$. This result motivates us to consider the moment estimator
\[
\hat{\rho}_T(\cdot) = \hat{\gamma}_{X,1}\hat{\gamma}_X^{-1}(\cdot) = \frac{1}{T}\sum_{t=2}^{T}\sum_{j=1}^{k_T}\hat{\lambda}_j^{-1}\langle X_{t-1}, \hat{\phi}_j\rangle\langle\hat{\phi}_j, \cdot\rangle X_t, \tag{27}
\]
where $\hat{\gamma}_{X,1} = \sum_{t=2}^{T}\langle X_{t-1}, \cdot\rangle X_t/T$ and $\hat{\gamma}_X^{-1}(\cdot) = \sum_{j=1}^{k_T}\hat{\lambda}_j^{-1}\langle\hat{\phi}_j, \cdot\rangle\hat{\phi}_j$ is the truncated estimator. Denote by $\gamma_{\varepsilon,0}$ and $R^\varepsilon_{s_1,s_2,s_3}$ the covariance operator and the cumulant operator for $\{\varepsilon_t\}$ respectively. To establish the consistency of $\hat{\rho}_T$, we make the following assumptions.

Assumption 3.4. Assume that $\|\rho\|_{\mathcal{S}} < 1$ and $\sum_{s_1,s_2,s_3=1}^{+\infty}\|R^\varepsilon_{s_1,s_2,s_3}\|_{\mathcal{S}} < +\infty$.

Assumption 3.5. Assume that $\lambda_j > 0$ for all $j$.

Assumption 3.6. Let $\alpha_1 = 2\sqrt{2}(\lambda_1 - \lambda_2)^{-1}$ and $\alpha_j = 2\sqrt{2}\max\{(\lambda_{j-1} - \lambda_j)^{-1}, (\lambda_j - \lambda_{j+1})^{-1}\}$ for $j \ge 2$. Assume that $T^{-1/4}\sum_{j=1}^{k_T}\alpha_j/\lambda_j = o(1)$, $\lambda_{k_T}^{-1}T^{-1/2}\sum_{j=1}^{k_T}1/\lambda_j^2 = o(1)$ and $\sum_{j>k_T}\|\rho(\phi_j)\|^2 = o(1/\sqrt{T})$.

Lemma 3.2. Under Assumptions 3.4-3.6, $\|\hat{\rho}_T - \rho\|_{\mathcal{L}} = o_p(T^{-1/4})$.

Lemma 3.2 shows that $\hat{\rho}_T$ is consistent under the uniform norm (with the rate $o_p(T^{-1/4})$), which slightly generalizes the results in Chapter 8 of Bosq (2000) by allowing the errors to be dependent. The residual from the fitted model is thus given by $\hat{\varepsilon}_t = X_t - \hat{\rho}_T(X_{t-1})$ with $1 \le t \le T$ and $X_0 = 0$. Let $V_{\varepsilon,T,h}(\tau_1, \tau_2) = \hat{\gamma}_{\varepsilon,h}(\tau_1, \tau_2) + \hat{\gamma}_{\varepsilon,h}(\tau_2, \tau_1)$, where $\hat{\gamma}_{\varepsilon,h}(\tau_1, \tau_2) = \sum_{t=h+1}^{T}\hat{\varepsilon}_t(\tau_1)\hat{\varepsilon}_{t-h}(\tau_2)/T$. Our test statistic is defined to be
\[
T_{\varepsilon,T,CM} = \frac{T}{8\pi^2}\sum_{h=1}^{T-1}\|V_{\varepsilon,T,h}\|_{\mathcal{I}^2}^2/h^2. \tag{28}
\]
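The estimator (27) admits the same grid implementation as in the functional linear model; a minimal sketch producing the residuals that enter (28) (same assumptions as the earlier sketches):

```python
import numpy as np

def fit_far1_truncated(X, alpha0=0.9):
    """Moment estimator (27) of the FAR(1) operator and residuals for (28).

    X : (T, n) array of mean-zero curves on an equispaced grid of [0, 1].
    """
    T, n = X.shape
    gamma_X = X.T @ X / T / n                 # discretized covariance operator
    lam, phi = np.linalg.eigh(gamma_X)
    lam, phi = lam[::-1], phi[:, ::-1] * np.sqrt(n)
    k_T = int(np.searchsorted(np.cumsum(lam) / np.sum(lam), alpha0)) + 1
    scores = X @ phi[:, :k_T] / n             # <X_t, phi-hat_j>
    # rho-hat_T kernel coordinates: lambda_j^{-1} (1/T) sum_t <X_{t-1}, phi_j> X_t
    coef = (X[1:].T @ scores[:-1] / T) / lam[:k_T]
    fitted = scores @ coef.T                  # rho-hat_T(X_t) on the grid
    resid = X.copy()
    resid[1:] -= fitted[:-1]                  # eps-hat_t = X_t - rho-hat_T(X_{t-1}); X_0 = 0
    return coef @ phi[:, :k_T].T, resid, scores, lam[:k_T]
```

The statistic (28) is then obtained by applying the earlier `cvm_statistic` sketch to the residual curves.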

To study the asymptotic behavior of $T_{\varepsilon,T,CM}$, we present the following result when $X_t$ is infinite-dimensional. Let $\pi_{k_T}(\cdot) = \sum_{j=1}^{k_T}\langle\phi_j, \cdot\rangle\phi_j$ be the projection operator and $\gamma_X^{-1/2}(\cdot) = \sum_{j=1}^{+\infty}\lambda_j^{-1/2}\langle\phi_j, \cdot\rangle\phi_j$. Define $\tilde{V}_{\varepsilon,T,h}(\tau_1, \tau_2) = \sum_{t=h+1}^{T}\{\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2) + \varepsilon_t(\tau_2)\varepsilon_{t-h}(\tau_1)\}/T$, and denote by $\rho_*$ the adjoint operator of $\rho$.

Proposition 3.1. Consider the FAR(1) model [see (26)] with $\{\varepsilon_t\}$ being a sequence of i.i.d random variables in $L^2_H$. If
\[
|||\gamma_X^{-1/2}\rho^{h-1}\gamma_{\varepsilon,0}^2\rho_*^{h-1}\gamma_X^{-1/2}|||_1 = \sum_{j=1}^{+\infty}\frac{\|\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_j)\|^2}{\lambda_j} < \infty, \tag{29}
\]

uniformly for all $h \ge 1$, we have
\[
\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\Big[\tilde{V}_{\varepsilon,T,h}(\tau_1, \tau_2) - \frac{1}{T}\sum_{t=2}^{T}\big\{\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2) + \varepsilon_t(\tau_2)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_1)\big\}\Big]\Psi_h(\lambda) \Rightarrow G_\varepsilon(\lambda, \tau_1, \tau_2), \tag{30}
\]
where $G_\varepsilon$ is a mean-zero Gaussian process such that
\[
\text{cov}(G_\varepsilon(\lambda, \tau_1, \tau_2), G_\varepsilon(\lambda', \tau_1', \tau_2')) = \frac{1}{4}\sum_{i,j=1}^{+\infty}\Psi_i(\lambda)\Psi_j(\lambda')\,\text{cov}\big(h_{0,i}(\tau_1, \tau_2) + h_{0,i}(\tau_2, \tau_1),\; h_{0,j}(\tau_1', \tau_2') + h_{0,j}(\tau_2', \tau_1')\big),
\]
with $h_{s,i}(\tau_1, \tau_2) = \varepsilon_s(\tau_1)\varepsilon_{s-i}(\tau_2) - \varepsilon_s(\tau_1)\gamma_{\varepsilon,0}\rho_*^{i-1}\gamma_X^{-1}(X_{s-1})(\tau_2)$.

Condition (29) basically requires that $\gamma_{\varepsilon,0}\rho_*^{h-1}$ should be at least as smooth as $\gamma_X^{1/2}$. In particular, if $\rho(\cdot) = \sum_{j=1}^{+\infty}\lambda_{\rho,j}\langle\phi_j, \cdot\rangle\phi_j$ with $0 < \inf_j|\lambda_{\rho,j}| \le \sup_j|\lambda_{\rho,j}| < 1$, and $\gamma_{\varepsilon,0}(\cdot) = \sum_{j=1}^{+\infty}\lambda_{\varepsilon,j}\langle\phi_j, \cdot\rangle\phi_j$ (i.e., $\rho$ and $\gamma_{\varepsilon,0}$ share the same set of eigenfunctions), we have $\gamma_X(\cdot) = \sum_{j=1}^{+\infty}\frac{\lambda_{\varepsilon,j}}{1-\lambda_{\rho,j}^2}\langle\phi_j, \cdot\rangle\phi_j$. Condition (29) is fulfilled because $|||\gamma_{\varepsilon,0}|||_1 = \sum_{j=1}^{+\infty}\lambda_{\varepsilon,j} < \infty$.

We are now in a position to present the main result in this section.

Theorem 3.3. Suppose Assumptions 3.4-3.6 are fulfilled. Under the assumptions in Proposition 3.1, we have
\[
\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V_{\varepsilon,T,h}(\tau_1, \tau_2)\Psi_h(\lambda) \Rightarrow G_\varepsilon(\lambda, \tau_1, \tau_2), \tag{31}
\]
and
\[
T_{\varepsilon,T,CM} \to_d \int_0^1\int_{\mathcal{I}^2}G_\varepsilon^2(\lambda, \tau_1, \tau_2)\,d\tau_1 d\tau_2 d\lambda. \tag{32}
\]

We prove Theorem 3.3 by showing the asymptotic equivalence between $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V_{\varepsilon,T,h}(\tau_1, \tau_2)\Psi_h(\lambda)$ and the LHS of (30). Note that the covariance structure of the limiting process $G_\varepsilon(\lambda, \tau_1, \tau_2)$ is different from that of the limiting process of $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\tilde{V}_{\varepsilon,T,h}(\tau_1, \tau_2)\Psi_h(\lambda)$, which is defined based on the unobserved innovations (see e.g. (10)). As seen from the arguments in the appendix, the estimation effect caused by replacing the unobservable innovations with their estimates is not asymptotically negligible due to the autocorrelation structure of the FAR(1) models (in contrast, for functional linear models, the estimation effect is negligible as the covariate is exogenous). The block bootstrap procedure in Section 2.2 cannot be applied directly in this case as it does not capture the randomness from $\hat{\rho}_T$ in the asymptotics. To overcome the difficulty, we consider a modified block bootstrap procedure. Define
\[
B_{t,h}(\tau_1, \tau_2) = \sum_{j=h+1}^{T}e_{t-1,j-1}\hat{\varepsilon}_t(\tau_1)\hat{\varepsilon}_{j-h}(\tau_2)/T,
\]

where $e_{t,j} = \sum_{k=1}^{k_T}\langle X_t, \hat{\phi}_k\rangle\langle X_j, \hat{\phi}_k\rangle/\hat{\lambda}_k$. Here $B_{t,h}$ is introduced to capture the estimation effect. Let $\hat{Y}_{\varepsilon,th}(\tau_1, \tau_2) = \hat{\varepsilon}_t(\tau_1)\hat{\varepsilon}_{t-h}(\tau_2) + \hat{\varepsilon}_t(\tau_2)\hat{\varepsilon}_{t-h}(\tau_1) - \hat{\gamma}_{\varepsilon,h}(\tau_1, \tau_2) - \hat{\gamma}_{\varepsilon,h}(\tau_2, \tau_1) - B_{t,h}(\tau_1, \tau_2) - B_{t,h}(\tau_2, \tau_1)$ for $h+1 \le t \le T$ and $\hat{Y}_{\varepsilon,th}(\tau_1, \tau_2) = -B_{t,h}(\tau_1, \tau_2) - B_{t,h}(\tau_2, \tau_1)$ for $2 \le t \le h$, where $2 \le t \le T$ and $1 \le h \le T-1$. The modified block bootstrap can be described in a similar way as before by replacing $\hat{Y}_{th}$ with $\hat{Y}_{\varepsilon,th}$ in the procedure presented in Section 2.2. Proposition 3.2 below provides theoretical justification for the modified block bootstrap under Assumption 6.1 in the technical appendix.
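For concreteness, a sketch of the correction surfaces $B_{t,h}$, computed from the outputs of the previous FAR(1) sketch (`resid`, `scores` and `lam` are the residual curves, scores $\langle X_t, \hat{\phi}_k\rangle$ and truncated eigenvalues from there); only the construction of the correction is shown, the resampling then proceeds exactly as in Section 2.2 with $\hat{Y}_{th}$ replaced by $\hat{Y}_{\varepsilon,th}$:

```python
import numpy as np

def correction_B(resid, scores, lam, h):
    """Correction surfaces B_{t,h} of Section 3.2 on the grid.

    resid  : (T, n) residual curves; scores : (T, k_T); lam : (k_T,).
    Returns B of shape (T + 1, n, n), indexed by the time label t = 0, ..., T
    (entries for t < 2 are unused and left at zero).
    """
    T, n = resid.shape
    e = scores @ (scores / lam).T        # e[t, j] = sum_k <X_t, phi_k><X_j, phi_k>/lam_k
    B = np.zeros((T + 1, n, n))
    for t in range(2, T + 1):            # time labels; curve eps-hat_t is resid[t - 1]
        # B_{t,h} = (1/T) sum_{j=h+1}^{T} e_{t-1,j-1} eps_t(tau1) eps_{j-h}(tau2)
        w = e[t - 2, h - 1: T - 1]                            # e_{t-1,j-1}, j = h+1..T
        avg = (w[:, None] * resid[:T - h]).sum(axis=0) / T    # weighted eps_{j-h} curve
        B[t] = np.outer(resid[t - 1], avg)
    return B
```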

Proposition 3.2. Assume $\limsup b_T/\sqrt{T} < \infty$. Under Assumptions 3.4-3.6 and Assumption 6.1, we have
\[
\rho_w\left[\mathcal{L}\left(T^*_{\varepsilon,T,CM}\,\big|\,\mathcal{X}_T\right),\; \mathcal{L}\left(\int_0^1\int_{\mathcal{I}^2}G_\varepsilon^2(\lambda, \tau_1, \tau_2)\,d\tau_1 d\tau_2 d\lambda\right)\right] \to 0,
\]
in probability, where $T^*_{\varepsilon,T,CM}$ denotes the bootstrap statistic based on the modified block bootstrap and $\mathcal{X}_T = \{X_t\}_{t=1}^{T}$.

To conclude this section, we point out that the above results extend to FAR($p$) models. This is because if $X_t$ is generated from a FAR($p$) model, then $(X_t, \dots, X_{t-p+1})$ is a FAR(1) process in $L^2(\mathcal{I}^p)$.

4 Numerical results

In this section, we evaluate the finite sample performance of the proposed method through

simulations and an application to cumulative intradaily returns. Throughout the simulations,

the sample sizes considered are T = 101 and 257, and the number of Monte Carlo replications

is set to be 1000 except for the minimal volatility method, where the number of replications

is 200 (due to the computational cost).

4.1 White noise testing

Under the null hypothesis, we simulate i.i.d standard Brownian motions (BMs), Brownian

bridges (BBs) and the FARCH(1) process which is defined as,

Xt(τ1) = εt(τ1)

τ1 +

∫ 1

0

cψ exp

(τ 21 + τ 22

2

)X2t−1(τ2)dτ2, t = 1, 2, . . . , τ1 ∈ [0, 1], (33)

where $\{\varepsilon_t\}$ is a sequence of i.i.d standard BMs and $c_\psi = 0.3418$ so that the Hilbert-Schmidt norm of the operator associated with $c_\psi\exp((\tau_1^2 + \tau_2^2)/2)$ is around 0.5. The data are generated on a grid of 1000 equispaced points in $[0, 1]$ for each functional observation. The functional data are obtained by using B-splines with 20 basis functions. We also tried 100 basis functions and found that the number of basis functions does not affect our results much. Under the alternatives, we simulate data from FAR(1) models with i.i.d innovations (BM or BB), and Gaussian kernel $K_g(\tau_1, \tau_2) = c_g\exp((\tau_1^2 + \tau_2^2)/2)$ or Wiener kernel $K_w(\tau_1, \tau_2) = c_w\min(\tau_1, \tau_2)$, where $c_g$ and $c_w$ are chosen so that the corresponding Hilbert-Schmidt norm is equal to 0.3. We

present the empirical sizes and powers for the bootstrap assisted spectra-based test (denoted

by “Boot”), the portmanteau test in Gabrys and Kokoszka (2007) (the GK test), and the

independence test in Horvath et al. (2013) (the HHR test) in Tables 1-2. We also implement

the minimum volatility method described in Section 2.2.
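For reference, a sketch of the FARCH(1) recursion (33), with standard Brownian motion innovations approximated by scaled cumulative sums on the grid (the burn-in length and seed are arbitrary illustrative choices):

```python
import numpy as np

def simulate_farch1(T, n=100, c_psi=0.3418, burn=50, seed=0):
    """Simulate the FARCH(1) process (33) on an equispaced grid of [0, 1]."""
    rng = np.random.default_rng(seed)
    tau = np.arange(1, n + 1) / n
    psi = c_psi * np.exp((tau[:, None] ** 2 + tau[None, :] ** 2) / 2)
    X = np.zeros((T + burn, n))
    prev = np.zeros(n)
    for t in range(T + burn):
        eps = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)  # standard BM on the grid
        vol = tau + psi @ (prev ** 2) / n    # bracketed factor in (33), integral as Riemann sum
        X[t] = eps * vol
        prev = X[t]
    return X[burn:]
```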

Denote by $d$ and $H$ the number of principal components and lag truncation number involved

in the GK test and the HHR test. Note that H is fixed in the asymptotics for the GK test,

while it is allowed to grow slowly in the HHR test. We choose d such that the cumulative

variation explained by the first d principal components is above 90%, and H = 1, 3 and 5 for

the GK test, and set H = 10, 30 and 50 for the HHR test. It is worth noting that the GK test

and HHR test use respectively the chi-square and standard normal distribution, which make

them easier to implement as compared to the bootstrap-assisted test. Bootstrap versions for

the GK and HHR tests are also possible and will likely lead to more accurate approximation.

From Table 1, we observe that the GK test exhibits severe upward size-distortion for

the FARCH(1) process because it is not robust to the dependence within the white noise.

We also note that the GK test tends to be undersized when H gets large. The HHR test

generally provides accurate size but we still observe some upward size-distortion at 1% level

since the convergence of the sampling distribution of the HHR test statistic to the standard

normal distribution appears to be slow in the tails. This problem can be alleviated by suitable

power transformation [see Horvath et al. (2013)]. The spectra-based test provides accurate

size with bT = 1, which is due to the fact that the processes considered under the null are

martingale difference sequences and the block bootstrap is valid with bT = 1 as commented

in Section 2.2. When bT becomes larger, the size is more distorted at all levels, but the size

distortion decreases as T increases. In terms of the power, the proposed test outperforms the

two alternative methods in most cases and its power is robust to the choice of block size. The

GK test and the HHR test are comparably less powerful and they tend to deliver less power as

H increases because of the correlation structure of the FAR(1) models. It is also worth noting

that the minimum volatility method tends to perform well. Overall, our method has

very good size and power properties.

4.2 Model diagnostic checking

Consider the functional linear model in (20) with η being the linear operator associated with

the Gaussian or Wiener kernels with Hilbert-Schmidt norm 0.5. The covariate Xt is gener-

ated from the FAR(1) model with Gaussian kernel and BB innovations that are independent of

the errors ǫt. Under the null, ǫt are i.i.d BMs, BBs, or are generated from the FARCH(1)


process defined in (33). Under the alternatives, we simulate ǫt from the FAR(1) model with

i.i.d innovations (BM or BB), and Gaussian or Wiener kernels with the Hilbert-Schmidt norm

equal to 0.3. For the spectra-based test, $k_T$ is chosen so that $\sum_{j=1}^{k_T}\hat{\lambda}_j/\sum_{j=1}^{T}\hat{\lambda}_j > 0.9$ (unreported simulation results show that the spectra-based test is in fact quite robust to the choice of $k_T$). Tables 3-4 summarize the empirical sizes and powers for the spectra-based test and the GHK tests. The GHK tests deliver accurate sizes for i.i.d errors while their size distortion is severe for the FARCH(1) process. Note that as $H$ gets larger, the empirical size and power

the GHK tests. The GHK tests deliver accurate sizes for i.i.d errors while its size distortion

is severe for the FARCH(1) process. Note that as H gets larger, the empirical size and power

of the GHK tests both decrease. The spectra-based test provides accurate size with bT = 1.

When the block size increases, the size distortion becomes more obvious but it gets improved

for T = 257. As for the power, the spectra-based test generally outperforms the GHK tests

with H = 3, 5. And its power is not quite sensitive to the choice of block size.

Next we consider the goodness of fit testing for the FAR(1) models. Under the null, we

simulate FAR(1) processes with BM, BB or FARCH(1) innovations, and Gaussian or Wiener

kernels with the Hilbert-Schmidt norm equal to 0.5. We employ the modified block bootstrap

procedure described in Section 3.2 to obtain the critical values for the spectra-based test. To

evaluate the power of the proposed method, we further consider the FAR(2) model which is

defined as

Xt(τ1) = ρ1(Xt−1)(τ1) + ρ2(Xt−2)(τ1) + εt(τ1), t = 1, 2, . . . , (34)

where ρ1 and ρ2 are linear operators associated with the Gaussian or Wiener kernels, with

||ρ1||S = 0.5 and ||ρ2||S = 0.3. Table 5 presents the empirical sizes and powers of the spectra-

based test. For bT = 1, the spectra-based test delivers accurate size across all models. The size

becomes more distorted for larger bT , but it gets improved when the sample size T increases.

Moreover, as seen from Table 5, the spectra-based test delivers reasonable power, and it

becomes more powerful as the sample size increases. We also note that the minimum volatility

method performs well in this case. To summarize, the simulation results clearly demonstrate

the usefulness of the proposed method.

4.3 Application to cumulative intradaily returns

Intradaily price curves have been the subject of interest in the investment community and

their properties are known to be quite different from those of daily or weekly closing prices; see

e.g. Tsay (2005). To illustrate the usefulness of the proposed method, we apply the testing

procedures developed in previous sections to the cumulative intradaily returns (CIDRs) of the

stocks of IBM and McDonald’s Corporation (MCD) from 2005 to 2007. Suppose Pt(s) is the

stock price (or more generally any financial asset price) at time s on day t. Following Gabrys

et al. (2010) and Kokoszka and Reimherr (2013), the CIDRs are defined as
\[
R_t(s) = 100\{\log(P_t(s)) - \log(P_t(s_0))\}, \quad s \in [0, 1], \tag{35}
\]
with $s_0$ being some reference time point (such as the market opening price) and the interval $[0, 1]$ being the rescaled trading period (in our example, 9:30 to 16:00 EST). We consider daily trading data that are recorded on a 5-minute grid. The CIDRs are computed at finite grids (78 points for each trading day) and are transformed into smooth curves using cubic B-splines with 40 basis functions and least squares fitting. The top panels of Figure 1 present the CIDRs for IBM and MCD from 2005 to 2007.
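A sketch of (35) from a day-by-grid matrix of prices (the `prices` array is hypothetical and the B-spline smoothing step is omitted):

```python
import numpy as np

def cidr(prices):
    """Cumulative intradaily returns (35).

    prices : (days, m) array of intraday prices on a common grid,
             column 0 holding the price at the reference time s0.
    """
    logp = np.log(prices)
    return 100.0 * (logp - logp[:, [0]])   # R_t(s) = 100{log P_t(s) - log P_t(s_0)}
```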

A natural question to ask here is whether the shape of the CIDRs curves on previous days provides any hint to the shape on the following day. To

partly answer this question, we employ the three tests that have been examined in Section 4.1

to test the uncorrelatedness of the CIDRs curves. The bottom panels of Figure 1 depict the

p-values for the spectra-based test with bT = 1, the GHK test with d = 4 and H = 3, 5, and

the HHR test with H = 30, 50. The three tests generally deliver the same conclusion except

that the HHR test suggests possible departure from the white noise assumption for the CIDRs

of IBM in 2006. Our empirical results are consistent with those in Kokoszka and Reimherr (2013),

which suggest that the shapes of CIDRs curves are generally not predictable. However, it is

interesting to find that all three tests reject the null at the 5% level for the CIDRs of IBM in

2007, the year preceding the collapse of 2008. We further employ the test in Section 3.2 with

the modified block bootstrap to check the adequacy of a FAR(1) model. The testing result

does not provide strong support for the fit of FAR(1) model as the corresponding p-values

are 8.8%, 2.8% and 5.2% for bT = 1, 13, 19 respectively (note the sample size T = 248 in this

case). The test based on the minimum volatility method also rejects the null hypothesis at

the 10% level. As seen from Figure 1, the CIDRs of IBM appear to be more volatile and

exhibit certain degrees of nonstationarity in 2007, the year before the financial crisis. More

sophisticated modeling that takes these features into account might be helpful for explaining

the CIDRs patterns.

5 Conclusion

We have proposed new spectra-based tests for white noise testing and model diagnostic

checking in the context of functional time series. Compared with existing methods, the pro-

posed method has a number of appealing features: (i) with the aid of block bootstrap, our

procedure is able to capture the dependence within the white noise or error sequence, and

thus is widely applicable to a large class of uncorrelated nonlinear functional processes; (ii)

our tests are constructed based on the $L^2$ norm of the spectral distribution function, and they

avoid the choices of functional principal components and lag truncation number which are

usually required for existing tests. In particular, when the white noise or the error sequence

is a martingale difference sequence, the bootstrap method is able to replicate the limiting

distribution even with the block size being one. Based on the empirical results in Section 4,

we suggest the block size bT = 1 in the bootstrap assisted spectra-based test for testing the


martingale difference null hypothesis; (iii) with $k_T$ growing slowly with the sample size, our

procedure allows the complexity of the estimated model to grow slowly with the sample size

[also see the recent work Fremdt et al. (2013)]. In contrast, the complexity of the estimated

model is typically fixed in the asymptotics for existing methods.

Although spectral methods have proven to be very powerful for analyzing time series and

spatial processes, they are not very well developed for functional data. It is hoped that our

modest contribution can spawn interest among researchers working in the spectral analysis for

functional time series.

6 Technical appendix

The appendix contains proofs of the main results in Sections 2-3. Proofs of some secondary

lemmas are provided in the online supplementary material.

6.1 Proofs of the main results in Section 2.1

Let $\{\alpha_i(\lambda), \lambda \in [0,1]\}_{i=1}^{+\infty}$ and $\{u_i(\tau), \tau \in \mathcal{I}\}_{i=1}^{+\infty}$ be two sequences of orthonormal bases of the Hilbert spaces $L^2([0,1])$ and $L^2(\mathcal{I})$ respectively. Then $\{\alpha_i(\lambda)u_j(\tau_1)u_l(\tau_2): \lambda \in [0,1], \tau_1, \tau_2 \in \mathcal{I}\}_{i,j,l=1}^{+\infty}$ forms a basis of $L^2(\mathcal{A})$. Define $W_{h,i} = \langle\Psi_h, \alpha_i\rangle$ and $X_{t,j} = \langle X_t, u_j\rangle$. Denote by $L^2(\mathcal{A})^{\otimes k}$ the Cartesian product of $k$ copies of $L^2(\mathcal{A})$.

Lemma 6.1. Suppose $\{X_t\}$ is an $L^4$-$m$-approximable process. Under the assumption that $\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1 < \infty$ and $\sum_{s_1,s_2,s_3=-\infty}^{+\infty}|||R_{s_1,s_2,s_3}|||_1 < \infty$, we have
\[
\{\sqrt{T}\tilde{V}_{T,h}(\tau_1, \tau_2)\Psi_h(\lambda): 1 \le h \le k\} \Rightarrow \{\mathcal{Y}_h(\lambda, \tau_1, \tau_2): 1 \le h \le k\}, \quad k \in \mathbb{N},
\]
where $\tilde{V}_{T,h} = V_{T,h} - EV_{T,h}$ and $\{\mathcal{Y}_h(\lambda, \tau_1, \tau_2): 1 \le h \le k\}$ is a mean-zero Gaussian process in $L^2(\mathcal{A})^{\otimes k}$ whose cross-covariance structure is given in (38).

Proof of Lemma 6.1. Under the assumption that $\{X_t\}$ is $L^4$-$m$-approximable, the sequence $\{W_{h,i}(X_{t,j}X_{t-h,l} + X_{t,l}X_{t-h,j} - EX_{t,j}X_{t-h,l} - EX_{t,l}X_{t-h,j}): 1 \le h \le k\}_{t=1}^{+\infty}$ is $L^2$-$m$-approximable [Hörmann and Kokoszka (2010)]. Thus $\{\frac{1}{\sqrt{T}}\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l} + X_{t,l}X_{t-h,j} - EX_{t,j}X_{t-h,l} - EX_{t,l}X_{t-h,j}): 1 \le h \le k\}$ converges to a multivariate normal distribution [Theorem A.1 of Aue et al. (2009)]. We show that for $k \in \mathbb{N}$ and $1 \le h \le k$,
\[
\lim_{\max(N_1,N_2,N_3)\to+\infty}\sup_T\frac{1}{T}E\sum_{i\ge N_1, j\ge N_2, l\ge N_3}\Bigg(\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l} - EX_{t,j}X_{t-h,l})\Bigg)^2 = 0, \tag{36}
\]
which implies the tightness of the sequence in $L^2(\mathcal{A})^{\otimes k}$ [see e.g., Theorem 2.7 of Bosq (2000)]. Note that
\[
\begin{aligned}
&\sup_T\frac{1}{T}E\sum_{i\ge N_1, j\ge N_2, l\ge N_3}\Bigg(\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l} - EX_{t,j}X_{t-h,l})\Bigg)^2\\
&= \sup_T\frac{1}{T}\sum_{i\ge N_1, j\ge N_2, l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}W_{h,i}^2\,\text{cov}(X_{t,j}X_{t-h,l}, X_{t',j}X_{t'-h,l})\\
&= \sum_{i\ge N_1}W_{h,i}^2\sup_T\frac{1}{T}\sum_{j\ge N_2, l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}\big\{\text{cum}(X_{t,j}, X_{t-h,l}, X_{t',j}, X_{t'-h,l}) + EX_{t,j}X_{t',j}EX_{t-h,l}X_{t'-h,l} + EX_{t,j}X_{t'-h,l}EX_{t-h,l}X_{t',j}\big\}\\
&= \sum_{i\ge N_1}W_{h,i}^2\sup_T\frac{1}{T}\sum_{j\ge N_2, l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}\big\{\langle R_{t-t'+h,t-t',h}(u_j\otimes u_l), u_j\otimes u_l\rangle + \langle\gamma_{t-t'}u_j, u_j\rangle\langle\gamma_{t-t'}u_l, u_l\rangle + \langle\gamma_{t-t'+h}u_l, u_j\rangle\langle\gamma_{t-t'-h}u_j, u_l\rangle\big\}\\
&\le \sum_{i\ge N_1}W_{h,i}^2\sum_{j\ge N_2, l\ge N_3}\sum_{s=-\infty}^{+\infty}\Big|\langle R_{s+h,s,h}(u_j\otimes u_l), u_j\otimes u_l\rangle + \langle\gamma_s u_j, u_j\rangle\langle\gamma_s u_l, u_l\rangle + \langle\gamma_{s+h}u_l, u_j\rangle\langle\gamma_{s-h}u_j, u_l\rangle\Big|.
\end{aligned}
\]
By the summability condition, we deduce that
\[
\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle\gamma_s u_j, u_j\rangle\langle\gamma_s u_l, u_l\rangle| \le \sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1^2 \le \Bigg(\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1\Bigg)^2 < \infty,
\]
\[
\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle\gamma_{s+h}u_l, u_j\rangle\langle\gamma_{s-h}u_j, u_l\rangle| \le \sum_{s=-\infty}^{+\infty}\Bigg(\sum_{j,l=1}^{+\infty}|\langle\gamma_{s+h}u_l, u_j\rangle|^2\Bigg)^{1/2}\Bigg(\sum_{j,l=1}^{+\infty}|\langle\gamma_{s-h}u_j, u_l\rangle|^2\Bigg)^{1/2} \le \sum_{s=-\infty}^{+\infty}\|\gamma_s\|_{\mathcal{S}}^2 \le \Bigg(\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1\Bigg)^2 < \infty,
\]
and
\[
\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle R_{s+h,s,h}(u_j\otimes u_l), u_j\otimes u_l\rangle| \le \sum_{s=-\infty}^{+\infty}|||R_{s+h,s,h}|||_1 < \infty. \tag{37}
\]
Therefore (36) holds. From the above derivation, it is not hard to see that the cross-covariance operator between $\mathcal{Y}_i$ and $\mathcal{Y}_j$ is given by
\[
(\mathcal{R}_{i,j}\phi)(\lambda, \tau_1, \tau_2) := \sum_{s=-\infty}^{+\infty}\Psi_i(\lambda)\int_{[0,1]\times\mathcal{I}^2}\Psi_j(\lambda')\,\text{cov}\big(X_{s+i}(\tau_1)X_s(\tau_2) + X_{s+i}(\tau_2)X_s(\tau_1),\; X_j(\tau_1')X_0(\tau_2') + X_j(\tau_2')X_0(\tau_1')\big)\phi(\lambda', \tau_1', \tau_2')\,d\lambda' d\tau_1' d\tau_2', \tag{38}
\]
where $\phi(\lambda', \tau_1', \tau_2') \in L^2(\mathcal{A})$. ♦

Proof of Theorem 2.1. Write $G_T(\lambda, \tau_1, \tau_2)$ as
\[
G_T(\lambda, \tau_1, \tau_2) = \frac{\sqrt{T}}{2}\sum_{h=1}^{k}\tilde{V}_{T,h}(\tau_1, \tau_2)\Psi_h(\lambda) + \frac{\sqrt{T}}{2}\sum_{h=k+1}^{T-1}\tilde{V}_{T,h}(\tau_1, \tau_2)\Psi_h(\lambda) = J_{1,k,T}(\lambda, \tau_1, \tau_2) + J_{2,k,T}(\lambda, \tau_1, \tau_2).
\]
For any fixed $k$, Lemma 6.1 and the continuous mapping theorem imply that $J_{1,k,T}(\lambda, \tau_1, \tau_2) \Rightarrow \sum_{h=1}^{k}\mathcal{Y}_h(\lambda, \tau_1, \tau_2)$. We show that for any $\epsilon > 0$ and $\delta > 0$, there exists a large enough $k$ so that $\lim_{T\to+\infty}P(\|J_{2,k,T}\|_{\mathcal{A}} > \epsilon) < \delta$. Simple calculation yields that $\|J_{2,k,T}\|_{\mathcal{A}}^2 = \frac{T}{8\pi^2}\sum_{h=k+1}^{T-1}\|\tilde{V}_{T,h}\|_{\mathcal{I}^2}^2/h^2$, where we have used the fact that $\langle\Psi_h, \Psi_{h'}\rangle = \frac{1}{2\pi^2 h^2}\mathbf{1}\{h = h'\}$. It thus follows that for large enough $k$ and $T \ge k+2$,
\[
\begin{aligned}
P(\|J_{2,k,T}\|_{\mathcal{A}} > \epsilon) &\le \frac{E\|J_{2,k,T}\|_{\mathcal{A}}^2}{\epsilon^2} \le \frac{T}{2\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}E\|\hat{\gamma}_h - E\hat{\gamma}_h\|_{\mathcal{I}^2}^2/h^2\\
&= \frac{1}{2T\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{i,j=1}^{+\infty}\sum_{t,t'=h+1}^{T}\text{cov}(X_{t,i}X_{t-h,j}, X_{t',i}X_{t'-h,j})/h^2\\
&= \frac{1}{2T\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{i,j=1}^{+\infty}\sum_{t,t'=h+1}^{T}\big\{\text{cum}(X_{t,i}, X_{t-h,j}, X_{t',i}, X_{t'-h,j}) + EX_{t,i}X_{t',i}EX_{t-h,j}X_{t'-h,j} + EX_{t,i}X_{t'-h,j}EX_{t-h,j}X_{t',i}\big\}/h^2\\
&\le \frac{1}{2\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{s=-\infty}^{+\infty}\big(|||\gamma_s|||_1^2 + |||R_{s+h,s,h}|||_1 + \|\gamma_{s-h}\|_{\mathcal{S}}\|\gamma_{s+h}\|_{\mathcal{S}}\big)/h^2 < \delta. \tag{39}
\end{aligned}
\]
The tightness of $G_T$ follows from the above arguments. Finally, we show that $\sum_{h=1}^{k}\mathcal{Y}_h \Rightarrow \sum_{h=1}^{\infty}\mathcal{Y}_h$ as $k \to +\infty$, which implies the finite-dimensional convergence of $G_T$ [see Proposition 6.3.9 of Brockwell and Davis (1991)]. By (38), we have
\[
\begin{aligned}
E\Bigg\|\sum_{h=k+1}^{+\infty}\mathcal{Y}_h\Bigg\|_{\mathcal{A}}^2 &\le \sum_{h,h'=k+1}^{+\infty}E\langle\mathcal{Y}_h, \mathcal{Y}_{h'}\rangle\\
&\le \frac{1}{2\pi^2}\sum_{h=k+1}^{+\infty}\sum_{s=-\infty}^{+\infty}\int_{\mathcal{I}^2}\text{cov}\big(X_{s+h}(\tau_1)X_s(\tau_2) + X_{s+h}(\tau_2)X_s(\tau_1),\; X_h(\tau_1)X_0(\tau_2) + X_h(\tau_2)X_0(\tau_1)\big)\,d\tau_1 d\tau_2/h^2\\
&\le \frac{1}{\pi^2}\sum_{h=k+1}^{+\infty}\sum_{s=-\infty}^{+\infty}\big(|||\gamma_s|||_1^2 + |||R_{s+h,s,h}|||_1 + \|\gamma_{s-h}\|_{\mathcal{S}}\|\gamma_{s+h}\|_{\mathcal{S}} + |||R_{s,s-h,-h}|||_1 + |||\gamma_{s-h}|||_1|||\gamma_{s+h}|||_1 + \|\gamma_s\|_{\mathcal{S}}^2\big)/h^2 \to 0,
\end{aligned}
\]
as $k \to +\infty$, which thus completes the proof. ♦

6.2 Proofs of the main results in Section 2.2

Let $E^*$ and $\text{var}^*$ be the expectation and variance in the bootstrap world (conditional on the sample $\{X_t\}_{t=1}^{T}$). Denote by $C$ a generic positive constant which is different from line to line. Throughout the following proofs, we shall use the unbiased estimator $\tilde{\gamma}_h(\tau_1, \tau_2) = \frac{1}{T-h}\sum_{t=h+1}^{T}X_t(\tau_1)X_{t-h}(\tau_2)$ for $\gamma_h(\tau_1, \tau_2)$. It is straightforward to verify that the difference caused by replacing $\frac{1}{T}\sum_{t=h+1}^{T}X_t(\tau_1)X_{t-h}(\tau_2)$ with $\frac{1}{T-h}\sum_{t=h+1}^{T}X_t(\tau_1)X_{t-h}(\tau_2)$ is asymptotically negligible. Let $\hat{\Gamma}_{th,\mathbf{l}} = (X_{t,l_2}X_{t-h,l_3} + X_{t,l_3}X_{t-h,l_2} - \langle\tilde{\gamma}_h u_{l_3}, u_{l_2}\rangle - \langle\tilde{\gamma}_h u_{l_2}, u_{l_3}\rangle)W_{h,l_1}$ and $\Gamma_{th,\mathbf{l}} = (X_{t,l_2}X_{t-h,l_3} + X_{t,l_3}X_{t-h,l_2} - \langle\gamma_h u_{l_3}, u_{l_2}\rangle - \langle\gamma_h u_{l_2}, u_{l_3}\rangle)W_{h,l_1}$ with $t \ge h+1$, $\mathbf{l} = (l_1, l_2, l_3)$ and $l_1, l_2, l_3 \in \mathbb{N}$. Set $\hat{\Gamma}_{th,\mathbf{l}} = \Gamma_{th,\mathbf{l}} = 0$ for $t \le h$.

Lemma 6.2. Suppose the assumptions in Theorem 2.4 hold. Then for any $k \in \mathbb{N}$,
\[
\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B_s^*}\hat{\Gamma}_{th,\mathbf{l}} \to_d \Bigg\langle\sum_{h=1}^{k}\mathcal{Y}_h/2,\; \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\Bigg\rangle, \tag{40}
\]
in probability.

Proof of Lemma 6.2. First note that
\[
\text{var}^*\Bigg(\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B_s^*}\hat{\Gamma}_{th,\mathbf{l}}\Bigg) = \frac{1}{4T}\sum_{s=1}^{M_T}\Bigg(\sum_{h=1}^{k}\sum_{t\in B_s}\hat{\Gamma}_{th,\mathbf{l}}\Bigg)^2 - \frac{1}{4TM_T}\Bigg(\sum_{h=1}^{k}\sum_{t=h+1}^{T}\hat{\Gamma}_{th,\mathbf{l}}\Bigg)^2 = I_{1,\mathbf{l},T} - I_{2,\mathbf{l},T}.
\]
For $I_{1,\mathbf{l},T}$, we have
\[
\begin{aligned}
EI_{1,\mathbf{l},T} &= \frac{1}{4T}\sum_{s=1}^{M_T}\sum_{h,h'=1}^{k}\sum_{t,t'\in B_s}E\Gamma_{th,\mathbf{l}}\Gamma_{t'h',\mathbf{l}}\\
&= \frac{1}{4T}\sum_{s=1}^{M_T}\sum_{h,h'=1}^{k}\sum_{\substack{t\in B_s, t\ge h+1\\ t'\in B_s, t'\ge h'+1}}W_{h,l_1}W_{h',l_1}\,\text{cov}\big(X_{t,l_2}X_{t-h,l_3} + X_{t,l_3}X_{t-h,l_2},\; X_{t',l_2}X_{t'-h',l_3} + X_{t',l_3}X_{t'-h',l_2}\big)\\
&\to \frac{1}{4}\sum_{h,h'=1}^{k}W_{h,l_1}W_{h',l_1}\sum_{s=-\infty}^{\infty}\text{cov}\big(X_{s+h,l_2}X_{s,l_3} + X_{s+h,l_3}X_{s,l_2},\; X_{h',l_2}X_{0,l_3} + X_{h',l_3}X_{0,l_2}\big)\\
&= \frac{1}{4}\sum_{h,h'=1}^{k}\langle\mathcal{R}_{h,h'}(\alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}), \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\rangle. \tag{41}
\end{aligned}
\]
Based on similar calculations, it is not hard to show that $EI_{2,\mathbf{l},T} = O(1/M_T)$, which implies that $I_{2,\mathbf{l},T} = o_p(1)$. Next we show that $\text{var}(I_{1,\mathbf{l},T}) = o(1)$. Simple algebra yields that

var(I1,l,T ) =1

16T 2

MT∑

s,s′=1

k∑

h1,h2=1

k∑

h′1,h′

2=1

t1,t2∈Bs

t′1,t′2∈Bs′

cov(Γt1h1,lΓt2h2,l,Γt′1h′1,lΓt′2h′2,l)

=1

16T 2

MT∑

s,s′=1

k∑

h1,h2=1

k∑

h′1,h′

2=1

t1,t2∈Bs

t′1,t′2∈Bs′

cum(Γt1h1,l,Γt2h2,l,Γt′1h′1,l,Γt′2h′2,l)

+ cov(Γt1h1,l,Γt′1h′1,l)cov(Γt2h2,l,Γt′2h′2,l) + cov(Γt1h1,l,Γt′2h′2,l)cov(Γt2h2,l,Γt′1h′1,l)

=1

16T 2

MT∑

s,s′=1

k∑

h1,h2,h′1,h′2=1

t1,t2∈Bs

t′1,t′2∈Bs′

(J1,l,T + J2,l,T + J3,l,T ).

We first deal with J1,l,T . By Lemma F.9 in Panaretos and Tavakolithe (2013a), Theorem ii.2 in

Rosenblatt (1985) and the triangle inequality, we deduce that

|cum(Xt1,l2Xt1−h1,l3 ,Xt2,l2Xt2−h2,l3 ,Xt′1,l2Xt′

1−h′

1,l3 ,Xt′

2,l2Xt′

2−h′

2,l3)|

≤||cum(Xt1Xt1−h1 ,Xt2Xt2−h2 ,Xt′1Xt′

1−h′

1,Xt′

2Xt′

2−h′

2)||I8

≤∑

v=v1∪···∪vp||cum(X1 : 1 ∈ v1) · · · cum(Xp : p ∈ vp)||I8

≤∑

v=v1∪···∪vp||cum(X1 : 1 ∈ v1)||I|v1| · · · ||cum(Xp : p ∈ vp)||I|vp| ,

where |vj | denotes the cardinality of the set vj and the summation is over all indecomposable partitions

of the two way table

t1 t1 − h1

t2 t2 − h2

t′1 t′1 − h′1

t′2 t′2 − h′2.

By the linearity of joint cumulant, we have

J1,l,T ≤ C|Wh1,l1Wh2,l1Wh′1,l1Wh′

2,l1 |

v=v1∪···∪vp||cum(X1 : 1 ∈ v1)||I|v1| · · · ||cum(Xp : p ∈ vp)||I|vp| .

Using some standard arguments [see e.g., Shao (2011a)], we are able to show that the contribution

from J1,l,T is of order o(1) under condition (14). The arguments for J2,l,T and J3,l,T are in fact the

24

same by switching t′1 with t′2. For J2,l,T , it contains a typical term as,

cov(Xt1,l2Xt1−h1,l3 ,Xt′1,l2Xt′

1−h′

1,l3)cov(Xt2,l2Xt2−h2,l3 ,Xt′

2,l2Xt′

2−h′

2,l3)

=

< Rt1−t′1+h′1,t1−t′1−h1+h′1,h′1(ul2 ⊗ ul3), ul2 ⊗ ul3 > + < γt1−t′1ul2 , ul2 >< γt1−t′1−h1+h′1ul3 , ul3 >

+ < γt1−t′1−h1ul2 , ul3 >< γt1−t′1+h′1ul3 , ul2 >

×

< Rt2−t′2+h′2,t2−t′2−h2+h′2,h′2(ul2 ⊗ ul3), ul2 ⊗ ul3 >

+ < γt2−t′2ul2 , ul2 >< γt2−t′2−h2+h′2ul3 , ul3 > + < γt2−t′2−h2ul2 , ul3 >< γt2−t′2+h′2ul3 , ul2 >

.

When s = s′, summation over the above terms is of order O(1/MT ) and when s 6= s′, summation

over the above terms is of order O(MT /T2). Similar arguments can be applied to the other terms

contained in J2,l,T and hence J2,l,T = o(1). The above arguments then imply that

var∗

1

2√T

MT∑

s=1

k∑

h=1

t∈B∗s

Γth,l

→p 1

4

k∑

h,h′=1

< Rh,h′(αl1 ⊗ ul2 ⊗ ul3), αl1 ⊗ ul2 ⊗ ul3 > . (42)

Conditional on the sample XtTt=1, we note that

MT∑

s=1

k∑

h=1

t∈B∗s

Γth,l

/(2

√T ) =

MT∑

s=1

Ds,l

is the summation of i.i.d mean zero random variables, where Ds,l =∑k

h=1

∑t∈B∗

sΓth,l/(2

√T ). Let

Γh,l = Γth,l − Γth,l and note that

Evar∗

1

2√T

MT∑

s=1

k∑

h=1

t∈B∗s

Γh,l

=

1

4TE

MT∑

s=1

(k∑

h=1

t∈Bs

Γh,l

)2

− 1

4TMTE

(k∑

h=1

T∑

t=h+1

Γh,l

)2

≤CbTE

(k∑

h=1

Γh,l

)2

→ 0.

(43)

From (42) and (43), we have var∗(

12√T

∑MT

s=1

∑kh=1

∑t∈B∗

sΓth,l

)→p 1

4

∑kh,h′=1 < Rh,h′(αl1 ⊗ ul2 ⊗

ul3), αl1 ⊗ ul2 ⊗ ul3 > . The conclusion follows from Lemma 6.3, which shows that the Lyapunov

condition holds in probability. ♦

Lemma 6.3. Let Ds,l =(∑k

h=1

∑t∈B∗

sΓth,l

)/(2

√T ). Under the assumptions in Theorem 2.4,

E∑MT

s=1 E∗|Ds,l|4 → 0.

Consider√T2

∑T−1h=1 V∗

h(τ1, τ2)Ψh(λ), where V∗h(τ1, τ2) =

∑Tl=2 Y∗

lh/T and Ylh is defined by re-

placing γh with γh in the definition of Ylh. Let V∗h(τ1, τ2) = V∗

h(τ1, τ2) − V∗h(τ1, τ2). Then we

have√T2

∑T−1h=1 V∗

h(τ1, τ2)Ψh(λ) =√T2

∑T−1h=1 V∗

h(τ1, τ2)Ψh(λ) +√T2

∑T−1h=1 V∗

h(τ1, τ2)Ψh(λ). Denote by

J∗2,k,T =

√T2

∑T−1h=k+1 V∗

h(τ1, τ2)Ψh(λ) and J∗2,k,T =

√T2

∑T−1h=k+1 V∗

h(τ1, τ2)Ψh(λ). We have the following

result.

25

Lemma 6.4. Under the assumptions in Theorem 2.4, limk→+∞ limT→+∞ EE∗||J∗

2,k,T ||2A = 0 and

limk→+∞ limT→+∞ EE∗||J∗

2,k,T ||2A = 0.

By Lemma 6.2, Lemma 6.4 and Proposition 6.3.9 of Brockwell and Davis (1991), we establish the

finite-dimensional convergence for√T2

∑kh=1 V∗

h(τ1, τ2)Ψh(λ). Note that√T2

∑T−1h=1 V∗

h(τ1, τ2)Ψh(λ) =

1√MT

∑MT

s=1

√T−1T

∑T−1h=1

∑t∈B∗

sYth(τ1, τ2)Ψh(λ)/(2

√bT ), where for different s, the random vari-

ables inside the bracket are independent conditional on the data. Thus to show the tightness, we

only need to verify that

EE∗

∣∣∣∣∣∣

∣∣∣∣∣∣

T−1∑

h=1

t∈B∗s

YthΨh/(2√

bT )

∣∣∣∣∣∣

∣∣∣∣∣∣

2

A

=1

8π2bTE

T−1∑

h=1

E∗

∣∣∣∣∣∣

∣∣∣∣∣∣

t∈B∗s

Yth

∣∣∣∣∣∣

∣∣∣∣∣∣

2

I2

/h2

≤ 1

4π2(T − 1)E

T−1∑

h=1

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

Yth∣∣∣∣∣

∣∣∣∣∣

2

I2

/h2 +1

4π2(T − 1)E

T−1∑

h=1

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

(Yth − Yth)∣∣∣∣∣

∣∣∣∣∣

2

I2

/h2 < ∞.

The arguments are similar to those in the proof of Lemma 6.4. The details are omitted.

Remark 6.1. Under the martingale difference assumption, it can be shown that the bootstrap pro-

cedure with bT = 1 is still valid. The key is to note that EI1,l,T in (41) converges to

1

4

k∑

h,h′=1

Wh,l1Wh′,l1cov(X0,l2X−h,l3 +X0,l3X−h,l2 ,X0,l2X−h′,l3 +X0,l3X−h′,l2).

As k → +∞, the above quantity further converges to < R(αl1 ⊗ul2 ⊗ul3), αl1 ⊗ul2 ⊗ul3 >, where R

is the linear operator associated with the kernel defined in (16). The proof is basically a repetition

of the argument presented above.

6.3 Proofs of the main results in Section 3.1

Lemma 6.5. Assume that Xt and ǫt are both mean-zero stationary with∑+∞

s=−∞(|||RXs,0,s|||1 +

||γX,s||2S) < +∞,∑+∞

s=−∞ |||Rǫs,0,s|||1 < +∞, and ||γǫ,0||S < +∞. Then 1

T

∑Tt=1 ||Yt||2 →p E||Y1||2.

Proof of Theorem 3.1. We only need to show that

T

∣∣∣∣∣

T−1∑

h=1

||Vǫ,T,h||2I2/h2 −

T−1∑

h=1

||Vǫ,T,h||2I2/h2

∣∣∣∣∣ = op(1). (44)

The conclusion follows from Theorem 2.2. Let ϑT = η − ηT . Then ǫt = Yt − ηT (Xt) = ǫt + ϑT (Xt).

Note that Vǫ,T,h(τ1, τ2) = Vǫ,T,h(τ1, τ2) + gh,T (τ1, τ2) + gadh,T (τ1, τ2), where gadh,T (τ1, τ2) = gh,T (τ2, τ1)

and

gh,T (τ1, τ2) =1

T

T∑

t=h+1

gt,h,T (τ1, τ2) =1

T

T∑

t=h+1

ǫt(τ1)ϑT (Xt−h)(τ2) + ϑT (Xt)(τ1)ǫt−h(τ2)

+ ϑT (Xt)(τ1)ϑT (Xt−h)(τ2)

.

(45)

26

Then by simple algebra, we get

∣∣∣∣∣

T−1∑

h=1

1

h2(||Vǫ,T,h||2I2 − ||Vǫ,T,h||2I2

)∣∣∣∣∣ ≤

T−1∑

h=1

1

h2

(2||gh,T + gadh,T ||I2 · ||Vǫ,T,h||I2 + ||gh,T + gadh,T ||2I2

)

≤2

(T−1∑

h=1

1

h2||gh,T + gadh,T ||2I2

)1/2(T−1∑

h=1

1

h2||Vǫ,T,h||2I2

)1/2

+T−1∑

h=1

1

h2||gh,T + gadh,T ||2I2

≤4

(T−1∑

h=1

1

h2||gh,T ||2I2

)1/2(T−1∑

h=1

1

h2||Vǫ,T,h||2I2

)1/2

+ 4T−1∑

h=1

1

h2||gh,T ||2I2 .

The triangle inequality yields that

T

T−1∑

h=1

1

h2||gh,T ||2I2 ≤C

T

T−1∑

h=1

1

h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ǫtϑT (Xt−h)

∣∣∣∣∣

∣∣∣∣∣

2

I2

+

T−1∑

h=1

1

h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ϑT (Xt)ǫt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

+

T−1∑

h=1

1

h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ϑT (Xt)ϑT (Xt−h)

∣∣∣∣∣

∣∣∣∣∣

2

I2

= C(L1,T + L2,T + L3,T ).

We first consider L1,T . Note that∣∣∣∣∣∣∑T

t=h+1 ǫtϑT (Xt−h)∣∣∣∣∣∣I2

≤ ||ϑT ||L∣∣∣∣∣∣∑T

t=h+1 ǫtXt−h∣∣∣∣∣∣I2

and

E

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ǫtXt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

=

T−1∑

h=1

1

Th2

T∑

t,t′=h+1

E < ǫt, ǫt′ >< Xt−h,Xt′−h >

=E

T−1∑

h=1

1

Th2

T∑

t,t′=h+1

I2

ǫt(τ1)ǫt′(τ1)Xt−h(τ2)Xt′−h(τ2)dτ1dτ2

≤T−1∑

h=1

1

h2

T−1∑

s=1−T

∣∣∣∣∣

I2

Rǫ,Xs+h,s,h(τ1, τ2, τ1, τ2) + γǫ,s(τ1, τ1)γX,s(τ2, τ2)

dτ1dτ2

∣∣∣∣∣

≤T−1∑

h=1

1

h2

+∞∑

s=−∞

(|||Rǫ,X

s+h,s,h|||1 + |||γǫ,s|||1 · |||γX,s|||1)< ∞,

where γǫ,s = 0 for s 6= 0. Similarly we have

E

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Xtǫt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤T−1∑

h=1

1

h2

+∞∑

s=−∞

(|||Rǫ,X

s−h,s,−h|||1 + |||γǫ,s|||1 · |||γX,s|||1)< ∞.

Because ||ϑT ||L = op(1), we obtain L1,T = op(1) and L2,T = op(1). Finally, using the fact that

T ||ϑT ||4L = op(1) and

E

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

XtXt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

=T∑

t,t′=h+1

E < Xt,Xt′ >< Xt−h,Xt′−h >≤ T 2E||X1||4,

27

we deduce that

L3,T ≤T ||ϑT ||4LT−1∑

h=1

1

T 2h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

XtXt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

= op(1),

which completes the proof. ♦

Proof of Theorem 3.2. We aim to show that

E∗

∣∣∣∣∣

∣∣∣∣∣

T−1∑

h=1

V∗ǫ,T,hΨh −

T−1∑

h=1

V∗ǫ,T,hΨh

∣∣∣∣∣

∣∣∣∣∣

2

A

= op(1/T ). (46)

And the conclusion follows from Theorem 2.4. Notice that

Yǫ,th(τ1, τ2) =Yǫ,th(τ1, τ2) + gt,h,T (τ1, τ2) + γǫ,h(τ1, τ2)− γǫ,h(τ1, τ2)

+ gt,h,T (τ2, τ1) + γǫ,h(τ2, τ1)− γǫ,h(τ2, τ1).

Simple calculation then yields

E∗

∣∣∣∣∣

∣∣∣∣∣

T−1∑

h=1

V∗ǫ,T,hΨh −

T−1∑

h=1

V∗ǫ,T,hΨh

∣∣∣∣∣

∣∣∣∣∣

2

A

=1

2π2

T−1∑

h=1

1

h2E∗||V∗

ǫ,T,h − V∗ǫ,T,h||2I2

=1

2T 2π2

T−1∑

h=1

1

h2E∗ <

MT∑

s=1

t∈B∗s

(Yǫ,th −Yǫ,th),MT∑

s=1

t∈B∗s

(Yǫ,th −Yǫ,th) >

≤ 1

2T 2π2

T−1∑

h=1

1

h2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

(Yǫ,th − Yǫ,th)∣∣∣∣∣

∣∣∣∣∣

2

I2

+1

2T 2π2

T−1∑

h=1

1

h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

(Yǫ,th − Yǫ,th)∣∣∣∣∣

∣∣∣∣∣

2

I2

. (47)

For the second term in (47), we have 12T 2π2

∑T−1h=1

1h2

∣∣∣∣∣∣∑T

t=h+1(Yǫ,th − Yǫ,th)∣∣∣∣∣∣2

I2= op(1/T ). Using

similar arguments as in the proof of Theorem 3.1, it is not hard to show that

1

2T 2π2

T−1∑

h=1

1

h2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

(Yǫ,th − Yǫ,th)∣∣∣∣∣

∣∣∣∣∣

2

I2

≤C

T−1∑

h=1

1

T 2h2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

gt,h,T

∣∣∣∣∣

∣∣∣∣∣

2

I2

+ op(1/T ) = op(1/T ),

which completes the proof. ♦

6.4 Proofs of the main results in Section 3.2

Lemma 6.6. Assume that Xt follows a FAR(1) process with the innovation sequence εt and

||ρ||S < 1. Assume that ||γε,0||S < +∞ and∑+∞

s1,s2,s3=1 ||Rεs1,s2,s3 ||S < ∞. Then 1

T

∑Tt=1 ||Xt||2 →p

E||X1||2.

Proof of Proposition 3.1. To be clear, we only show the weak convergence of1√T

∑T−1h=1

∑Tt=h+1 εt(τ1)εt−h(τ2) − ∑T

t=2 εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)Ψh(λ). The result

for the symmetrized version follows from essentially the same arguments. Recall the basis

28

αi(λ)uj(τ1)ul(τ2) : λ ∈ [0, 1], τ1, τ2 ∈ I+∞i,j,l=1 defined at the beginning of Section 6.1. Let

βt,j =< Xt, φj >, εt,j =< εt, uj > with εt,j = 0 for t < 0, and Wh,i =< Ψh, αi >. For any

1 ≤ h ≤ T − 1, define the projection sequence,

ST,ijl(h) :=1√T

T∑

t=2

Wh,i

(εt,jεt−h,l − εt,j < γε,0ρ

h−1∗ γ−1

X πkT (Xt−1), ul >)

=1√T

T∑

t=2

Wh,i

(εt,jεt−h,l − εt,j

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) >

),

where we have used the fact that γ−1X πkT (·) =

∑kTk=1 λ

−1k < φk, · > φk and

< γε,0ρh−1∗ γ−1

X πkT (Xt−1), ul >=< γ−1X πkT (Xt−1), ρ

h−1γε,0(ul) >

= <

kT∑

k=1

λ−1k βt−1,kφk, ρ

h−1γε,0(ul) >=

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) > .

Notice that EST,ijl(h) = 0 and

var(ST,ijl(h)) =1

TE

T∑

t=2

Wh,i

(εt,jεt−h,l − εt,j

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) >

)2

≤ 2

TE

(T∑

t=2

Wh,iεt,jεt−h,l

)2

+2

TE

(T∑

t=2

Wh,iεt,j

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) >

)2

≤2W 2h,iEε

21,jEε

21,l +

2

TE

( T∑

t,t′=2

W 2h,iεt,jεt′,j

kT∑

k,k′=1

λ−1k λ−1

k′ βt−1,kβt′−1,k′ < φk, ρh−1γε,0(ul) >

< φk′ , ρh−1γε,0(ul) >

)

=2W 2h,iEε

21,jEε

21,l +

2

T

( T∑

t=2

W 2h,iEε

2t,j

kT∑

k,k′=1

λ−1k λ−1

k′ Eβt−1,kβt−1,k′ < φk, ρh−1γε,0(ul) >

< φk′ , ρh−1γε,0(ul) >

)

≤2W 2h,iEε

21,jEε

21,l + 2W 2

h,iEε21,j

+∞∑

k=1

λ−1k < γε,0ρ

h−1∗ (φk), ul >

2< ∞,

where we have used the fact that Eεt,jεt′,jβt−1,kβt′−1,k′ = 0 for t 6= t′ and Eβt,jβt,j′ = 1j = j′λj .Under the assumption that εt is a sequence of i.i.d random variables in L2

H, we have Xt is L2-

m-approximable [see Example 2.1 of Hormann and Kokoszka (2010)]. Thus by the independence

between εt and Xt−1, and Lemma 2.1 of Hormann and Kokoszka (2010), we know εtXt−1 and the

corresponding projection sequence are both L2-m-approximable. The finite-dimensional convergence

follows from similar arguments in the proof of Theorem A.1 of Aue et al. (2009). On the other hand,

29

under Condition (29), we have

limmax(N1,N2,N3)→+∞

supT

i≥N1,j≥N2,l≥N3

var(ST,ijl(h))

≤ limmax(N1,N2,N3)→+∞

supT

i≥N1,j≥N2,l≥N3

2W 2h,i

Eε21,jEε

21,l + Eε21,j

+∞∑

k=1

λ−1k < γε,0ρ

h−1∗ (φk), ul >

2

= 0.

The above arguments imply that

1√T

T∑

t=2

εt(τ1)εt−h(τ2)− εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)Ψh(λ) ⇒ Nh(λ, τ1, τ2),

jointly for 1 ≤ h ≤ k with k ∈ N being fixed, where Nh(λ, τ1, τ2) : 1 ≤ h ≤ k is a mean-zero

Gaussian process in L(A)⊗k. And the cross-covariance operator between Ni and Nj is given by

Hij φ(λ, τ1, τ2) = limkT→+∞

Ψi(λ)

[0,1]2×IΨj(λ

′)cov(εt(τ1)εt−i(τ2)− εt(τ1)γε,0ρi−1∗ γ−1

X πkT (Xt−1)(τ2),

εt(τ′1)εt−j(τ

′2)− εt(τ

′1)γε,0ρ

j−1∗ γ−1

X πkT (Xt−1)(τ′2))φ(λ

′, τ ′1, τ′2)dλ

′dτ ′1dτ′2,

for φ ∈ L2(A). The rest of the arguments are similar to those in the Proof of Theorem 2.1. Define

Jk,T = 1√T

∑T−1h=k+1

∑Tt=h+1 εt(τ1)εt−h(τ2)−

∑Tt=2 εt(τ1)γε,0ρ

h−1∗ γ−1

X πkT (Xt−1)(τ2)Ψh(λ). It suffices

to show that: (1) for any ǫ, δ > 0, there exists a large enough k such that P (||Jk,T ||A > ǫ) < δ and

(2) E||∑+∞h=k+1 Nh||2A → 0 as k → +∞. For the first goal, we note that

P (||Jk,T ||A > ǫ) ≤ E||Jk,T ||2Aǫ2

=1

2π2ǫ2T

T−1∑

h=k+1

1

h2E

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

εtεt−h −T∑

t=2

εtγε,0ρh−1∗ γ−1

X πkT (Xt−1)

∣∣∣∣∣

∣∣∣∣∣

2

I2

=1

2π2ǫ2T

T−1∑

h=k+1

1

h2

+∞∑

j,l=1

E

T∑

t=2

(εt,jεt−h,l − εt,j

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) >)

2

≤I1 + I2,

where

I1 =1

π2ǫ2T

T−1∑

h=k+1

1

h2

+∞∑

j,l=1

E

T∑

t=2

(εt,jεt−h,l)

2

≤ 1

π2ǫ2

T−1∑

h=k+1

1

h2|||γε,0|||21 < δ/2,

30

and

I2 =1

π2ǫ2T

T−1∑

h=k+1

1

h2

+∞∑

j,l=1

E

T∑

t=2

εt,j

kT∑

k=1

λ−1k βt−1,k < φk, ρ

h−1γε,0(ul) >

2

≤ 1

π2ǫ2

T−1∑

h=k+1

1

h2

+∞∑

j,l=1

Eε2t,j

+∞∑

k=1

λ−1k < γε,0ρ

h−1∗ (φk), ul >

2

=1

π2ǫ2

T−1∑

h=k+1

1

h2|||γε,0|||1

+∞∑

k=1

λ−1k ||γε,0ρh−1

∗ (φk)||2 < δ/2,

for large enough k. As for the second goal, we have

E

∣∣∣∣∣

∣∣∣∣∣

+∞∑

h=k+1

Nh

∣∣∣∣∣

∣∣∣∣∣

2

A

=

+∞∑

h,h′=k+1

E < Nh, Nh′ >

=1

2π2

+∞∑

h=k+1

I2

var(εt(τ1)εt−h(τ2)− εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2))dτ1dτ2/h2

≤ 1

π2

+∞∑

h=k+1

I2

var(εt(τ1)εt−h(τ2)) + cov(εt(τ1)γε,0ρ

h−1∗ γ−1

X πkT (Xt−1)(τ2))dτ1dτ2/h

2

=1

π2

+∞∑

h=k+1

(|||γε,0|||21 + |||γε,0|||1

+∞∑

k=1

λ−1k ||γε,0ρh−1

∗ (φk)||2)/h2 → 0,

as k → +∞. The proof is thus completed. ♦

Proof of Theorem 3.3. Let DT = ρ − ρT . Then εt = Xt − ρT (Xt−1) = εt + DT (Xt−1). Note

that Vε,T,h(τ1, τ2) = Vε,T,h(τ1, τ2) + 1T

∑Tt=h+1DT (Xt−1)(τ1)εt−h(τ2) + DT (Xt−1)(τ2)εt−h(τ1) +

fh,T (τ1, τ2) + fh,T (τ2, τ1), where

fh,T (τ1, τ2) =1

T

T∑

t=h+1

εt(τ1)DT (Xt−h−1)(τ2) +DT (Xt−1)(τ1)DT (Xt−h−1)(τ2)

. (48)

We first show that T∑T−1

h=1 ||fh,T ||2I2/h2 = op(1). To this end, notice that

E

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

εtXt−h−1

∣∣∣∣∣

∣∣∣∣∣

2

I2

=

T−1∑

h=1

1

Th2

T∑

t,t′=h+1

+∞∑

j,j′=0

E < ρj(εt−h−1−j), ρj′(εt′−h−1−j′) >< εt, εt′ >

=E

T−1∑

h=1

1

Th2

T∑

t,t′=h+1

+∞∑

j,j′=0

I3

Pj,j′(τ3, τ′3)εt(τ1)εt−h−1−j(τ3)εt′(τ1)εt′−h−1−j′(τ

′3)dτ1dτ3dτ

′3

≤T−1∑

h=1

1

h2

T−1∑

s=1−T

∣∣∣∣∣

+∞∑

j,j′=0

I3

Pj,j′(τ3, τ′3)

Rεs+h+1+j′,s−j+j′,h+1+j′(τ1, τ3, τ1, τ

′3)

+ γε,s(τ1, τ1)γε,s+j′−j(τ3, τ′3) + γε,s+h+1+j′(τ1, τ

′3)γε,s−h−1−j(τ3, τ1)

dτ1dτ3dτ

′3

∣∣∣∣∣,

31

where Pj,j′(τ3, τ′3) =

∫ρj(τ2, τ3)ρ

j′(τ2, τ′3)dτ2 and γε,s = 0 for s 6= 0. Because ||Pj,j′ ||S ≤ ||ρ||j+j′S , we

obtain

I3

Pj,j′(τ3, τ′3)Rε

s+h+1+j′,s−j+j′,h+1+j′(τ1, τ3, τ1, τ′3)dτ1dτ3dτ

′3

≤||Pj,j′ ||S∫

I2

(∫

IRεs+h+1+j′,s−j+j′,h+1+j′(τ1, τ3, τ1, τ

′3)dτ1

)2

dτ3dτ′3

1/2

≤||ρ||j+j′S |||Rεs+h+1+j′,s−j+j′,h+1+j′|||1.

It is also not hard to show that

I3

Pj,j′(τ3, τ′3)γε,s(τ1, τ1)γε,s+j′−j(τ3, τ

′3)dτ1dτ3dτ

′3 ≤ |||γε,s|||1||ρ||j+j

S ||γε,s+j′−j||S ,

and

I3

Pj,j′(τ3, τ′3)γε,s+h+1+j′(τ1, τ

′3)γε,s−h−1−j(τ3, τ1)dτ1dτ3dτ

′3

≤||Pj,j′||S∫

I2

(∫

Iγε,s+h+1+j′(τ1, τ

′3)γε,s−h−1−j(τ3, τ1)dτ1

)2

dτ3dτ′3

1/2

≤||ρ||j+j′S ||γε,s+h+1+j′||S ||γε,s−h−1−j||S .

Combing the above derivations along with the summability conditions, we de-

duce that∑T−1

h=11Th2

∣∣∣∣∣∣∑T

t=h+1 εtXt−h−1

∣∣∣∣∣∣2

I2= Op(1). Using the fact that

||∑Tt=h+1 εtDT (Xt−h−1)||I2 ≤ ||DT ||L||

∑Tt=h+1 εtXt−h−1||I2 and ||DT ||L = op(1), we obtain

that∑T−1

h=11Th2 ||

∑Tt=h+1 εtDT (Xt−h−1)||2I2 = op(1). By Lemma 3.2 and the fact that

E

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Xt−1Xt−h−1

∣∣∣∣∣

∣∣∣∣∣

2

I2

=

T∑

t,t′=h+1

E < Xt−1,Xt′−1 >< Xt−h−1,Xt′−h−1 >≤ T 2E||X1||4,

we get

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

DT (Xt−1)DT (Xt−h−1)

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤ T ||DT ||4LT−1∑

h=1

1

T 2h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Xt−1Xt−h−1

∣∣∣∣∣

∣∣∣∣∣

2

I2

= op(1),

which implies that T∑T−1

h=1 ||fh,T ||2I2/h2 = op(1). Therefore, we only need to show that

√T

2

T−1∑

h=1

[Vε,T,h(τ1, τ2) +

1

T

T∑

t=h+1

DT (Xt−1)(τ1)εt−h(τ2) +DT (Xt−1)(τ2)εt−h(τ1)]Ψh(λ)

32

converges to Gε(λ, τ1, τ2) in the corresponding functional space. Consider

−DT (Xt−1)(τ1)εt−h(τ2) = (ρT − ρ)(Xt−1)(τ1)εt−h(τ2)

=γX,1γ−1X (Xt−1)− ργX γ

−1X (Xt−1) + ργX γ

−1X (Xt−1)− ρ(Xt−1)(τ1)εt−h(τ2)

=1

T

T∑

j=2

< Xj−1, γ−1X (Xt−1) > εj(τ1)εt−h(τ2) + ργX γ−1

X (Xt−1)− ρ(Xt−1)(τ1)εt−h(τ2).

By Lemma 6.7, we know that the contribution from ργX γ−1X (Xt−1)−ρ(Xt−1)(τ1)εt−h(τ2) is asymp-

totically negligible. Switching the index t and j, we get

T∑

t=h+1

T∑

j=2

< Xj−1, γ−1X (Xt−1) > εj(τ1)εt−h(τ2) =

T∑

t=2

T∑

j=h+1

< Xt−1, γ−1X (Xj−1) > εt(τ1)εj−h(τ2).

By Lemma 6.8 and (30), we have√T2

∑T−1h=1 Vε,T,h(τ1, τ2)Ψh(λ) ⇒ Gε(λ, τ1, τ2). Finally, by the contin-

uous mapping theorem, we obtain Tε,T,CM →d∫ 10

∫I2 G2

ε (λ, τ1, τ2)dτ1dτ2λ which completes the proof.

Lemma 6.7. Under the assumptions in Theorem 3.3,

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ργX γ−1X − ρ(Xt−1)εt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

= op(1). (49)

Lemma 6.8. Under the assumptions in Theorem 3.3,

T−1∑

h=1

1

Th2

∣∣∣∣∣∣∣∣1

T

T∑

t=2

T∑

j=h+1

< Xt−1, γ−1X (Xj−1) > εt(τ1)εj−h(τ2)

−T∑

t=2

γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)εt(τ1)

∣∣∣∣∣∣∣∣2

I2

= op(1).

(50)

6.5 Theoretical results for the modified block bootstrap

For bTMT = T − 1 with bT ,MT ∈ Z, define the non-overlapping blocks of indices Bs = bT (s −1) + 2, . . . , bT s + 1 with 1 ≤ s ≤ MT . Pick MT blocks with replacement from BsMT

s=1. Joint the

MT blocks and let (t∗2, . . . , t∗T ) be the corresponding indices.

Assumption 6.1. Let Yε,th(τ1, τ2) = εt(τ1)εt−h(τ2) + εt(τ2)εt−h(τ1) − γε,h(τ1, τ2) −γε,h(τ2, τ1) − Bt,h(τ1, τ2) − Bt,h(τ2, τ1) with Bt,h(τ1, τ2) = εt(τ1)γε,0ρ

h−1∗ γ−1

X πkT (Xt−1)(τ2) −∑T

t=2 εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)/T . Define V∗ε,T,h =

∑Tl=2 Y∗

ε,lh/T , where Y∗ε,lh = Yε,t∗

lh.

Assume that

ρw

[L(√

T

2

T−1∑

h=1

V∗ε,T,h(τ1, τ2)Ψh(λ)

∣∣∣∣XT),L(Gε(λ, τ1, τ2))

]→ 0, (51)

and

suph

1

TE

∣∣∣∣∣

∣∣∣∣∣

T∑

t=2

εtγε,0ρh−1∗ γ−1

X πkT (Xt−1)

∣∣∣∣∣

∣∣∣∣∣I2

< ∞. (52)

33

When Xt is finite-dimensional, (51) can be verified using similar arguments in Section 6.2. In

view of Proposition 3.1, we expect that (51) still holds when Xt is infinite-dimensional and the

errors are i.i.d. It is worth noting that when εt are i.i.d, 1T E

∣∣∣∣∣∣∑T

t=2 εtγε,0ρh−1∗ γ−1

X πkT (Xt−1)∣∣∣∣∣∣I2

≤|||γε,0|||1|||γε,0ρh−1

∗ γ−1X πkT γXπkT γ

−1X ρh−1γε,0|||1. For h ≥ 2, |||γε,0ρh−1

∗ γ−1X πkT γXπkT γ

−1X ρh−1γε,0|||1 ≤

||γε,0||2S ||γ−1/2X ρh−1||2L < ∞ provided that ||γ−1/2

X ρ||L < ∞ [also see Assumption A1 in Mas (2007)].

The same conclusion holds when h = 1, because γε,0 = γX − ργXρ∗ [Theorem 3.2 of Bosq (2000)].

The key idea here is to show the asymptotic equivalence of the modified block bootstrap to

the procedure described in Assumption 6.1 which is based on the unobserved innovation sequence.

Notice that Yε,th(τ1, τ2) = Uε,th(τ1, τ2)+Uε,th(τ2, τ1) with Uε,th(τ1, τ2) = εt(τ1)εt−h(τ2)− γε,h(τ1, τ2)−Bt,h(τ1, τ2). We have the decomposition,

Uε,th(τ1, τ2) = εt(τ1)εt−h(τ2)− γε,h(τ1, τ2)− Bt,h(τ1, τ2) = εt(τ1)εt−h(τ2)− γε,h(τ1, τ2)

− εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2) +1

T

T∑

t=2

εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)

+

εt(τ1)γε,0ρ

h−1∗ γ−1

X πkTT (Xt−1)− εt(τ1)IT,h(Xt−1)(τ2)

+

εt(τ1)DT (Xt−h−1)(τ2) +DT (Xt−1)(τ1)DT (Xt−h−1)(τ2)

+

γε,h(τ1, τ2)− γε,h(τ1, τ2)

+

DT (Xt−1)(τ1)εt−h(τ2)−DT (Xt−1)(τ1)IT,h(Xt−1)(τ2)

+

εt(τ1)IT,h(Xt−1)(τ2)− Bt,h(τ1, τ2)

1

T

T∑

t=2

εt(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)

= Uε,th(τ1, τ2) + L1,t,h + L2,t,h + L3,t,h + L4,t,h + L5,t,h − L6,t,h,

where Uε,th(τ1, τ2) = εt(τ1)εt−h(τ2)− γε,h(τ1, τ2)− Bt,h(τ1, τ2) and Lj,t,h’s are defined implicity as the

random variables inside the brackets.

Lemma 6.9. Under the assumptions in Proposition 3.2,∑T−1

h=11Th2

∑MT

s=1 ||∑

t∈BsLj,t,h||2I2 = op(1)

with j = 1, 2, 3, 4, 5 and 6.

Proof of Proposition 3.2. We aim to show that

TE∗

∣∣∣∣∣

∣∣∣∣∣

T−1∑

h=1

V∗ε,T,hΨh −

T−1∑

h=1

V∗ε,T,hΨh

∣∣∣∣∣

∣∣∣∣∣

2

A

= op(1). (53)

34

The conclusion follows from Assumptions 6.1. Simple calculation yields that

TE∗

∣∣∣∣∣

∣∣∣∣∣

T−1∑

h=1

V∗ε,T,hΨh −

T−1∑

h=1

V∗ε,T,hΨh

∣∣∣∣∣

∣∣∣∣∣

2

A

=T

2π2

T−1∑

h=1

1

h2E∗||V∗

ε,T,h − V∗ε,T,h||2I2

=1

2Tπ2

T−1∑

h=1

1

h2E∗ <

MT∑

s=1

t∈B∗s

(Yε,th − Yε,th),MT∑

s=1

t∈B∗s

(Yε,th − Yε,th) >

≤ 2

Tπ2

T−1∑

h=1

1

h2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

(Uε,th − Uε,th)

∣∣∣∣∣

∣∣∣∣∣

2

I2

+1

Tπ2

T−1∑

h=1

1

h2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Yε,th∣∣∣∣∣

∣∣∣∣∣

2

I2

+

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Yε,th∣∣∣∣∣

∣∣∣∣∣

2

I2

.

(54)

By Lemma 6.9, we get 1Tπ2

∑T−1h=1

1h2∑MT

s=1

∣∣∣∣∣∣∑

t∈Bs(Uε,th − Uε,th)

∣∣∣∣∣∣2

I2= op(1). To deal with the second

term in (54), we note that

1

T

T∑

t=2

< γ−1X (Xt−1),Xj−1 > εt(τ1)

=1

T

T∑

t=2

< γ−1X (Xt−1),Xj−1 > Xt(τ1)−

1

T

T∑

t=2

< γ−1X (Xt−1),Xj−1 > ρT (Xt−1)(τ1)

=1

T

T∑

t=2

< γ−1X (Xt−1),Xj−1 > Xt(τ1)−

1

T 2

T∑

t,l=2

< Xt−1, γ−1X (Xj−1) >< Xt−1, γ

−1X (Xl−1) > Xl(τ1)

=1

T

T∑

t=2

< Xt−1, γ−1X (Xj−1) > Xt(τ1)−

1

T

T∑

l=2

< γX γ−1X (Xj−1), γ

−1X (Xl−1) > Xl(τ1)

=1

T

T∑

t=2

< Xt−1 − γX γ−1X (Xt−1), γ

−1X (Xj−1) > Xt(τ1) = 0,

which implies that∑T

t=2 Bt,h(τ1, τ2) = 0 for 1 ≤ h ≤ T − 1. Thus it is straightforward to verify that

the second term in (54) is of order op(1). ♦

References

[1] Antoniadis, A., and Sapatinas, T. (2003), “Wavelet Methods for Continuous-time Prediction

Using Hilbert-valued Autoregressive Processes,” Journal of Multivariate Analysis, 87, 133-158.

[2] Arsenin, V., and Tikhonov, A. (1977), Solutions of Ill-posed Problems, Washington D.C.: Winston

and Sons.

[3] Aue, A., Horvath, S., Hormann, L., and Reimherr, M. (2009), “Detecting Changes in the Covari-

ance Structure of Multivariate Time Series Models,” Annals of Statistics, 37, 4046-4087.

[4] Besse, P., Cardot, H., and Stephenson, D. (2000), “Autoregressive Forecasting of Some Functional

Climatic Variations,” Scandinavian Journal of Statistics, 27, 673-687.

35

[5] Bosq, D. (2000), Linear Process in Function Spaces: Theory and Applications, New York:

Springer.

[6] Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, San Francisco: Holden-Day.

[7] Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods, Springer.

[8] Chiou, J.-M., Muller, H.-G., and Wang, J.-L. (2004), “Functional Response Models,” Statistica

Sinica, 14, 675-693.

[9] Dahlhaus, R. (1985), “On the Asymptotic Distribution of Bartlett’s Up-statistic,” Journal of

Time Series Analysis, 6, 213-227.

[10] Deo, R. S. (2000), “Spectral Tests for the Martingale Hypothesis under Conditional Het-

eroscedasticity,” Journal of Econometrics, 99, 291-315.

[11] Durlauf, S. (1991), “Spectral-based Test for the Martingale Hypothesis,” Journal of Economet-

rics, 50, 1-19.

[12] Escanciano, J., and Velasco, C. (2006), “Generalized Spectral Tests for the Martingale Difference

Hypothesis,” Journal of Econometrics, 134, 151-185.

[13] Fernandez de Castroa, B., Guillasb, S., and Gonzalez, W. (2005), “Manteigac Functional Samples

and Bootstrap for Predicting Sulfur Dioxide Levels,” Technometrics, 47, 212-222.

[14] Francq, C., Roy, R., and Zakoıan, J. M. (2005), “Diagnostic Checking in ARMA Models with

Uncorrelated Errors,” Journal of the American Statistical Association, 100, 532-544.

[15] Fremdt, S., Horvath, L., Kokoszka, P., and Steinebach, J. G. (2014), “Functional Data Analysis

with Increasing Number of Projections,” Journal of Multivariate Analysis, 124, 313-332

[16] Gabrys, R., Horvath, L., and Kokoszka, P. (2010), “Tests for Error Correlation in the Functional

Linear Model,” Journal of the American Statistical Association, 105, 1113-1125.

[17] Gabrys, R., and Kokoszka, P. (2007), “Portmanteau Test of Independence for Functional Ob-

servations,” Journal of the American Statistical Association, 102, 1338-348.

[18] Groetsch. C. (1993), Inverse Problems in the Mathematical Sciences, Wiesbaden: Vieweg.

[19] Hardle, W., Horowitz, J., and Kreiss, J-P. (2003), “Bootstrap Methods for Time Series,” Inter-

national Statistical Review, 71, 435-459.

[20] Hong, Y. (1996), “Consistent Testing for Serial Correlation of Unknown Form,” Econometrica,

64, 837-864.

[21] Hong, Y., and Lee, Y. J. (2003), “Consistent Testing for Serial Uncorrelation of Unknown

Form under General Conditional Heteroscedasticity,” Preprint, Cornell University, Department

of Economics.

36

[22] Hormann, S., Horvath, L., and Reeder, R. (2013), “A Functional Version of the ARCH Model,”

Econometric Theory, 29, 267-288.

[23] Hormann, S., Kidzinski, L., and Hallin, M. (2015), “Dynamic Functional Principal Components,”

Journal of the Royal Statistical Society, Ser. B, 77, 319-348.

[24] Hormann, S., and Kokoszka, P. (2010), “Weakly Dependent Functional Data,” Annals of Statis-

tistics, 38, 1845-1884.

[25] Horowitz, J. L., Lobato, I. N., Nankervis, J. C., and Savin, N. E. (2006), “Bootstrapping the

Box-Pierce Q test: A Robust Test of Uncorrelatedness,” Journal of Econometrics, 133, 841-862.

[26] Horvath, L., Huskova, M., and Rice, G. (2013), “Test of Independence for Functional Data,”

Journal of Multivariate analysis, 117, 100-119.

[27] Horvath, L., and Kokoszka, P. (2012), Inference for Functional Data with Applications, New

York: Springer.

[28] Kargin, V., and Onatski, A. (2008), “Curve Forecasting by Functional Autoregression,” Journal

of Multivariate Analysis, 99, 2508-2526.

[29] Kokoszka, P., and Reimherr, M. (2013), “Predictability of Shapes of Intraday Price Curves,”

Econometrics Journal, 16, 285-308.

[30] Laukaitis, A., and Rackauskas, A. (2002), “Functional Data Analysis of Payment Systems,”

Nonlinear Analysis: Modeling and Control, 7, 53-68.

[31] Lobato, I. N., Nankervis, J. C., and Savin, N. E. (2002), “Testing for Zero Autocorrelation in

the Presence of Statistical Dependence,” Econometric Theory, 18, 730-743.

[32] Mas, A. (2007), “Weak Convergence in the Functional Autoregressive Model,” Journal of Mul-

tivariate analysis, 98, 1231-1261

[33] Panaretos, V. M., and Tavakoli, S. (2013a), “Fourier Analysis of Stationary Time Series in

Function Space,” Annals of Statistics, 41, 568-603.

[34] Panaretos, V. M., and Tavakoli, S. (2013b), “Cramer-Karhunen-Loeve Representation and Har-

monic Principal Component Analysis of Functional Time Series,” Stochastic Processes and their

Applications, 123, 2779-2807.

[35] Politis, D. N., and Romano, J. (1994), “Limit Theorems for Weakly Dependent Hilbert Space

Valued Random Variables with Application to the Stationary Bootstrap,” Statistica Sinica, 4,

461-476.

[36] Politis, D. N., Romano, J., and Wolf, M. (1999), Subsampling, New York: Springer-Verlag.

[37] Riesz, F., and Sz-Nagy, B. (1955), Functional Analysis, New York: Ungar.

37

[38] Romano, J. L., and Thombs, L. A. (1996), “Inference for Autocorrelations under Weak Assump-

tions,” Journal of the American Statistical Association, 91, 590-600.

[39] Shao, X. (2011a), “A Bootstrap-assisted Spectral Test of White Noise under Unknown Depen-

dence,” Journal of Econometrics, 162, 213-224.

[40] Shao, X. (2011b), “Testing for White Noise under Unknown Dependence and Its Applications

to Goodness-of-fit for Time Series Models,” Econometric Theory, 27, 312-343.

[41] Tsay, R. S. (2005), Analysis of Financial Time Series, New York: Wiley.

[42] Zhu, K. (2007), Operator Theory in Function Spaces, Vol. 138 of Mathematical surveys and

monographs, American Mathematical Society, Providence, RI.

[43] Zhu, K., and Li, W. (2015), “A Bootstrapped Spectral Test for Adequacy in Weak ARMA

Models,” Journal of Econometrics, 187, 113-130.

38

Table 1: Empirical sizes in percentage for the GK test, the HHR test, and the spectra-based test. The

sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications

is 1000. The bootstrap critical values are computed based on B = 299 resamples.

BM BB FARCH

10% 5% 1% 10% 5% 1% 10% 5% 1%

GK, H = 1 8.6 4.0 0.7 8.4 3.9 0.6 19.2 12.2 4.1

GK, H = 3 10.0 4.5 0.7 7.7 4.2 0.4 16.9 9.9 3.4

GK, H = 5 7.5 3.5 0.3 6.7 3.2 0.8 13.1 8.1 2.2

HHR, H = 10 9.5 5.3 1.8 10.7 7.2 3.3 9.8 7.4 3.2

HHR, H = 30 8.9 5.3 2.3 11.1 6.9 2.7 10.7 7.1 3.3

HHR, H = 50 8.3 5.5 1.9 11.2 7.1 2.9 9.7 6.2 2.4

Boot, bT = 1 8.6 4.4 1.0 10.6 5.0 1.4 9.7 4.7 0.5

Boot, bT = 5 9.2 5.6 1.5 10.1 5.2 0.9 12.9 5.7 1.4

Boot, bT = 10 15.0 8.6 2.5 10.9 4.9 1.8 15.0 7.4 1.3

Boot, bT = b∗ 11.5 7.0 1.5 10.5 5.5 1.0 9.5 5.5 1.0

GK, H = 1 10.4 5.0 1.6 9.1 4.9 0.7 23.5 14.1 4.4

GK, H = 3 9.8 4.8 1.2 10.1 4.4 1.3 17.2 11.1 3.4

GK, H = 5 10.0 4.3 0.6 8.7 4.6 0.8 14.6 9.3 2.0

HHR, H = 10 9.0 5.8 2.7 10.9 7.3 3.6 12.2 8.6 4.5

HHR, H = 30 10.0 6.9 2.8 11.2 6.8 2.0 11.4 7.8 3.5

HHR, H = 50 11.7 7.4 2.6 9.9 6.2 3.7 11.7 7.7 3.1

Boot, bT = 1 11.1 5.8 1.1 11.0 5.5 1.6 11.2 5.4 1.6

Boot, bT = 8 10.5 5.4 1.8 10.4 5.5 0.7 10.8 5.3 2.1

Boot, bT = 16 10.8 6.3 2.2 10.3 4.8 0.9 11.8 6.0 1.7

Boot, bT = b∗ 9.5 4.5 2.0 11.0 6.0 1.5 13.0 6.0 1.5

Note: b∗ denotes the block size chosen by the minimum volatility method.

39

Table 2: Empirical powers in percentage for the GK test, the HHR test, and the spectra-based

test. The sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo

replications is 1000. The bootstrap critical values are computed based on B = 299 resamples.

FAR(1)-G-BM FAR(1)-W-BM FAR(1)-G-BB FAR(1)-W-BB

10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%

GK, H = 1 77.1 66.1 39.3 69.4 57.1 31.8 76.9 63.6 35.5 69.1 53.3 25.2

GK, H = 3 55.4 42.2 23.7 49.1 35.5 19.8 51.2 38.4 15.6 43.5 31.8 12.0

GK, H = 5 45.2 34.3 17.4 39.0 29.1 15.1 38.0 25.6 10.5 32.0 20.7 8.5

HHR, H = 10 54.2 45.8 32.0 57.7 49.5 35.5 49.1 38.8 25.6 46.1 37.4 25.1

HHR, H = 30 42.7 34.2 23.3 45.7 37.8 26.5 37.3 28.0 16.4 37.0 28.6 15.9

HHR, H = 50 39.9 32.0 20.4 42.7 35.8 24.1 32.1 24.0 13.7 31.8 24.3 13.7

Boot, bT = 1 87.0 78.5 55.6 89.5 82.9 62.8 80.8 71.9 47.9 80.9 69.9 45.2

Boot, bT = 5 84.9 73.6 43.0 87.9 78.0 48.4 77.7 65.4 32.2 76.9 64.4 32.1

Boot, bT = 10 85.2 74.7 45.5 87.8 78.0 51.1 79.5 65.8 36.2 79.0 64.7 35.3

Boot, bT = b∗ 85.5 81.0 49.5 89.0 83.5 55.0 78.5 66.5 37.5 76.5 63.5 36.5

GK, H = 1 99.5 99.2 95.8 99.0 97.9 93.0 100.0 99.8 98.2 99.3 98.0 91.7

GK, H = 3 97.7 94.4 83.3 95.2 90.7 77.7 97.7 94.4 83.8 93.3 86.5 71.6

GK, H = 5 93.3 88.0 73.1 89.9 83.0 66.3 92.3 85.8 65.8 83.5 74.5 52.0

HHR, H = 10 95.9 93.6 86.0 96.3 94.6 88.9 93.3 89.7 78.6 92.4 87.7 76.5

HHR, H = 30 84.4 78.5 64.3 87.1 82.0 69.3 77.2 68.8 51.8 76.3 67.8 51.1

HHR, H = 50 78.0 70.5 54.3 82.2 75.9 60.6 68.3 59.5 42.6 67.8 58.5 42.2

Boot, bT = 1 99.8 99.3 97.1 99.7 99.5 98.6 99.9 99.0 94.5 99.5 99.1 93.7

Boot, bT = 8 99.5 99.1 94.3 99.8 99.1 96.0 99.4 97.7 90.2 99.4 97.7 89.3

Boot, bT = 16 99.8 99.4 93.5 99.9 99.6 94.9 99.4 97.5 87.8 99.0 97.3 87.3

Boot, bT = b∗ 99.5 99.5 97.0 99.5 99.5 97.5 100.0 99.5 94.0 100.0 100.0 94.0

Note: b∗ denotes the block size chosen by the minimum volatility method.

40

Table 3: Empirical sizes in percentage for the GKH tests and the spectra-based test for testingthe error correlation in functional linear models. Note that K(·, ·) is the Gaussian kernel (i)or the Wiener kernel (ii). The sample size T = 101 (upper panel), 257 (lower panel), and thenumber of Monte Carlo replications is 1000. The bootstrap critical values are computed basedon B = 299 resamples.

G-BM G-BB G-FARCH

10% 5% 1% 10% 5% 1% 10% 5% 1%

GHK1, H = 1 (i) 8.6 4.3 1.0 8.5 3.9 1.1 22.7 14.3 5.4(ii) 8.7 4.2 1.0 8.6 4.2 1.1 22.7 14.4 5.4

GHK1, H = 3 (i) 8.6 3.5 1.0 6.6 2.9 0.6 17.0 9.9 3.6(ii) 8.6 3.5 1.0 6.9 2.9 0.7 17.1 9.9 3.6

GHK1, H = 5 (i) 6.1 3.1 0.8 6.7 2.5 0.7 14.0 8.2 1.8(ii) 6.0 3.1 0.8 6.8 2.5 0.6 14.1 8.4 1.8

GHK2, H = 1 (i) 8.9 3.5 0.8 9.2 4.0 1.1 19.9 11.9 3.6(ii) 9.0 3.4 0.7 9.1 3.5 1.3 19.9 11.9 3.8

GHK2, H = 3 (i) 7.7 3.7 0.9 7.0 2.9 0.4 15.8 9.5 2.5(ii) 8.1 3.9 0.9 7.4 3.3 0.5 15.7 9.2 2.4

GHK2, H = 5 (i) 6.3 3.1 0.5 7.2 2.9 0.4 13.2 8.1 2.0(ii) 6.7 3.1 0.6 7.0 2.6 0.4 13.1 7.7 1.9

Boot, bT = 1 (i) 11.1 5.3 1.3 11.3 5.5 1.0 11.5 5.5 0.6(ii) 11.2 5.3 1.2 11.3 5.3 1.1 11.6 5.5 0.6

Boot, bT = 5 (i) 12.0 5.4 1.5 10.1 4.7 0.8 11.9 5.2 1.1(ii) 12.1 5.4 1.4 9.9 4.9 0.8 11.8 5.2 1.1

Boot, bT = 10 (i) 14.6 6.7 1.3 11.8 5.9 1.2 14.1 7.3 2.4(ii) 14.5 6.8 1.1 11.5 6.0 1.2 14.4 7.4 2.4

Boot, bT = b∗ (i) 13.5 8.0 2.5 11.5 7.0 1.0 14.0 4.0 0.5(ii) 13.5 8.0 2.5 11.5 6.5 1.0 14.5 3.5 0.5

GHK1, H = 1 (i) 10.9 5.3 1.4 9.2 5.2 1.0 29.7 17.5 7.5(ii) 10.9 5.3 1.4 9.4 5.1 1.0 29.5 17.5 7.5

GHK1, H = 3 (i) 10.2 5.2 1.3 8.9 4.3 1.2 23.3 14.1 4.9(ii) 9.8 5.3 1.3 8.9 4.4 1.2 23.1 13.9 5.0

GHK1, H = 5 (i) 9.6 4.4 0.5 8.7 4.2 0.5 20.3 11.7 2.9(ii) 9.6 4.5 0.5 8.7 4.1 0.5 20.4 12.2 2.9

GHK2, H = 1 (i) 11.1 6.0 1.2 10.0 5.3 1.0 22.1 13.8 5.5(ii) 11.1 6.1 1.1 9.8 5.8 1.0 22.2 13.8 5.3

GHK2, H = 3 (i) 9.3 4.6 0.4 9.2 4.7 0.7 18.0 11.1 3.3(ii) 9.5 4.6 0.4 9.4 4.1 0.8 17.9 10.7 3.1

GHK2, H = 5 (i) 9.4 4.0 0.5 7.6 3.7 0.8 15.5 9.2 2.3(ii) 9.4 4.2 0.5 7.9 4.1 0.6 15.6 9.3 2.1

Boot, bT = 1 (i) 10.1 5.0 1.1 10.6 4.3 1.0 10.9 4.8 1.2(ii) 10.0 4.9 1.0 10.6 4.2 1.0 10.8 4.9 1.2

Boot, bT = 8 (i) 10.0 4.7 1.6 10.5 5.1 1.1 12.9 6.5 1.1(ii) 10.1 4.8 1.7 10.5 5.1 1.1 12.9 6.5 1.1

Boot, bT = 16 (i) 13.6 7.6 2.0 10.1 5.1 1.7 11.7 6.1 1.8(ii) 13.4 7.6 1.9 10.3 5.0 1.7 11.6 6.2 1.8

Boot, bT = b∗ (i) 10.0 5.0 1.5 12.0 5.5 2.0 8.0 3.0 0.0(ii) 9.0 5.0 1.5 12.0 6.0 2.0 8.5 3.0 0.0

Note: b∗ denotes the block size chosen by the minimum volatility method.

41

Table 4: Empirical powers in percentage for the GKH tests and the spectra-based test fortesting the error correlation in functional linear models. Note that K(·, ·) is the Gaussiankernel (i) or the Wiener kernel (ii). The sample size T = 101 (upper panel), 257 (lowerpanel), and the number of Monte Carlo replications is 1000. The bootstrap critical values arecomputed based on B = 299 resamples.

FAR(1)-G-BM FAR(1)-W-BM FAR(1)-G-BB FAR(1)-W-BB

10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%

GHK1, H = 1 (i) 84.0 72.1 45.5 37.6 26.0 8.9 78.6 67.1 40.3 67.5 52.6 27.0(ii) 83.8 72.3 45.6 37.4 25.6 8.8 78.2 67.4 40.5 67.4 52.7 26.8

GHK1, H = 3 (i) 58.8 44.7 20.3 23.2 13.8 3.8 51.2 38.4 18.2 42.0 31.9 12.0(ii) 59.0 44.8 20.4 23.2 14.0 3.6 51.3 38.7 17.9 41.6 31.7 11.9

GHK1, H = 5 (i) 45.7 31.0 10.9 18.4 10.5 3.5 38.4 27.2 11.2 31.7 21.0 7.3(ii) 45.6 31.2 10.8 18.4 10.5 3.4 38.4 26.9 11.2 32.1 20.7 7.2

GHK2, H = 1 (i) 73.7 62.2 36.7 63.7 51.5 28.3 82.6 72.6 48.7 71.4 58.7 31.7(ii) 70.7 59.3 34.0 64.0 52.1 28.6 76.1 65.9 38.0 72.0 58.9 31.3

GHK2, H = 3 (i) 53.3 41.3 19.5 45.8 33.0 14.7 58.6 44.5 21.7 46.1 33.4 14.6(ii) 51.5 38.7 18.7 46.0 33.4 14.7 51.1 36.3 16.7 45.8 33.5 14.6

GHK2, H = 5 (i) 44.0 32.1 14.5 38.2 27.0 12.0 43.6 30.3 11.6 35.1 23.1 7.9(ii) 42.0 30.7 13.7 38.5 27.3 11.9 38.1 25.1 8.9 34.8 23.7 8.7

Boot, bT = 1 (i) 76.2 67.3 44.1 79.6 72.0 49.1 71.5 59.2 35.7 70.1 58.4 34.6(ii) 76.3 67.3 44.3 79.6 71.9 49.2 71.7 59.1 35.4 70.2 58.5 34.5

Boot, bT = 5 (i) 80.2 68.4 37.9 84.1 72.5 42.3 71.2 56.1 27.0 70.3 54.5 26.1(ii) 80.2 67.9 38.2 83.9 72.6 42.2 71.0 56.0 27.0 70.4 54.2 25.9

Boot, bT = 10 (i) 80.9 65.3 40.1 83.8 71.0 45.7 70.9 53.5 26.5 69.0 52.2 24.9(ii) 80.9 65.2 40.6 83.7 70.9 45.7 71.0 53.4 26.2 68.8 52.6 25.2

Boot, bT = b∗ (i) 79.0 67.5 38.0 82.5 72.0 42.5 67.5 57.0 27.5 68.5 57.5 25.0(ii) 79.0 67.5 38.5 82.5 71.5 42.5 68.5 55.0 28.5 68.5 57.0 25.0

GHK1, H = 1 (i) 100.0 100.0 99.4 86.6 78.0 56.8 99.9 99.9 99.4 99.8 99.4 97.0(ii) 100.0 100.0 99.4 86.5 77.9 57.0 99.9 99.9 99.4 99.8 99.4 96.9

GHK1, H = 3 (i) 99.2 97.4 91.7 65.5 54.2 31.3 99.3 98.1 92.5 96.9 92.4 79.0(ii) 99.0 97.5 92.1 65.8 54.3 31.3 99.3 98.1 92.6 96.9 92.3 78.6

GHK1, H = 5 (i) 95.3 92.1 79.5 55.2 42.6 22.9 97.2 91.6 78.6 89.2 82.1 59.8(ii) 95.3 92.0 79.5 55.5 42.3 22.9 97.1 91.7 78.4 89.2 82.2 60.1

GHK2, H = 1 (i) 99.4 98.6 93.0 98.6 97.7 89.3 100.0 100.0 99.9 100.0 99.6 97.4(ii) 99.3 98.0 90.8 98.8 97.8 89.5 100.0 99.8 98.4 99.9 99.7 97.6

GHK2, H = 3 (i) 95.1 91.0 79.2 93.6 86.7 72.3 99.3 98.5 92.1 97.1 94.0 81.8(ii) 94.0 89.9 77.0 93.5 86.9 72.2 98.4 95.4 86.2 97.1 93.5 81.7

GHK2, H = 5 (i) 90.9 84.6 68.5 86.7 80.0 61.6 96.7 93.0 79.2 91.2 84.1 63.5(ii) 89.5 82.6 66.8 86.8 80.2 61.7 93.3 88.2 67.7 92.4 84.1 64.2

Boot, bT = 1 (i) 99.7 99.1 96.0 99.8 99.5 97.6 99.4 98.1 91.6 99.0 97.9 90.6(ii) 99.7 99.1 95.9 99.8 99.5 97.7 99.4 98.1 91.7 99.1 97.9 90.8

Boot, bT = 8 (i) 99.5 99.0 92.1 99.7 99.4 94.5 99.3 96.9 84.9 98.4 96.9 84.1(ii) 99.5 99.0 92.1 99.7 99.3 94.5 99.3 97.0 84.9 98.5 96.8 84.9

Boot, bT = 16 (i) 99.7 99.1 92.0 99.9 99.6 95.0 99.3 97.1 85.0 99.0 96.4 84.1(ii) 99.7 99.1 92.1 99.9 99.6 95.0 99.3 97.0 84.6 98.9 96.5 84.0

Boot, bT = b∗ (i) 100.0 99.0 95.0 100.0 99.5 96.0 99.0 96.5 90.0 99.0 96.0 89.5(ii) 100.0 99.0 95.0 100.0 99.5 96.0 99.0 96.5 90.0 99.0 96.0 90.0

Note: b∗ denotes the block size chosen by the minimum volatility method.

42

Table 5: Empirical sizes and powers in percentage for the spectra-based test for testing the goodness

of fit for FAR(1) models with BM (i), BB (ii) or FARCH(1) (iii) innovations. Under the alternatives,

the data are generated from FAR(2) models with the corresponding innovations. The sample size

T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications is 1000. The

bootstrap critical values are computed based on B = 299 resamples.

FAR(1)-G FAR(1)-W FAR(2)-G FAR(2)-W

10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%

Boot, bT = 1 (i) 9.5 5.5 1.2 9.5 5.7 1.3 75.0 63.0 39.5 81.7 72.8 49.8

(ii) 9.6 4.6 1.3 9.1 4.7 1.1 49.8 37.0 15.8 57.3 43.3 20.2

(iii) 10.9 4.8 1.1 11.0 5.9 1.2 56.3 42.4 18.7 73.0 61.2 33.8

Boot, bT = 5 (i) 12.9 6.4 1.4 12.4 7.6 1.6 72.4 59.8 30.2 81.7 69.3 38.5

(ii) 10.1 5.2 0.8 11.1 5.3 0.8 46.0 30.6 10.8 51.9 35.7 14.6

(iii) 10.3 4.1 1.0 11.4 5.3 0.9 55.0 39.3 16.0 70.8 55.8 26.8

Boot, bT = 10 (i) 11.5 6.0 0.7 11.4 5.9 1.8 75.9 61.0 30.8 81.8 67.5 38.6

(ii) 10.4 4.3 0.7 10.5 4.5 0.6 41.9 27.9 9.1 48.0 32.1 12.2

(iii) 12.8 6.0 0.7 13.7 7.7 1.8 56.5 39.6 18.1 73.2 57.0 30.0

Boot, bT = b∗ (i) 10.0 4.0 1.0 9.5 6.0 1.0 83.0 71.5 47.0 84.5 77.5 52.0

(ii) 9.0 2.5 0.5 10.0 4.0 0.5 47.5 32.0 17.0 55.5 38.0 16.0

(iii) 9.0 4.5 1.5 9.0 5.0 1.5 61.0 45.5 20.5 72.0 60.5 31.5

Boot, bT = 1 (i) 9.1 5.5 1.4 9.6 5.2 1.2 99.6 98.5 93.3 99.8 99.6 96.6

(ii) 9.9 5.0 1.1 10.0 4.3 1.0 93.2 86.3 70.1 94.9 91.4 75.8

(iii) 11.1 5.2 1.3 11.7 6.8 1.3 92.8 87.7 70.2 98.2 96.6 90.2

Boot, bT = 8 (i) 10.3 5.5 1.2 10.5 5.7 1.0 99.1 97.8 88.5 99.6 99.1 94.1

(ii) 9.0 4.0 0.7 9.1 3.9 0.7 89.9 80.5 57.7 93.5 87.3 65.7

(iii) 11.5 5.1 0.9 11.8 5.6 1.3 91.5 82.4 58.4 99.3 97.4 82.4

Boot, bT = 16 (i) 11.3 6.1 1.2 10.9 6.2 1.7 99.0 97.0 86.3 99.7 98.9 92.4

(ii) 8.8 3.4 0.5 8.7 3.1 0.6 89.4 80.9 53.7 93.2 86.5 60.8

(iii) 11.2 5.9 1.6 11.9 6.5 2.3 92.1 84.2 60.4 98.6 96.5 83.8

Boot, bT = b∗ (i) 9.5 6.5 1.5 12.0 6.0 2.5 98.0 97.0 90.5 99.5 98.5 95.0

(ii) 13.0 5.5 1.0 12.0 5.5 1.0 89.0 81.5 57.5 93.0 87.5 67.0

(iii) 9.5 3.5 1.0 9.0 3.5 1.0 96.5 90.0 73.5 99.5 99.5 91.0

Note: b∗ denotes the block size chosen by the minimum volatility method.

43

−4

−2

02

IBM

Intr

ad

aily

cu

mu

lativ

e r

etu

rns

2005 2006 2007

−4

−2

02

4

MCD

Intr

ad

aily

cu

mu

lativ

e r

etu

rns

2005 2006 2007

0.0 0.2 0.4 0.6 0.8 1.0

−2

−1

01

2

IBM

valu

es

PC1, 84.0%PC2, 7.4%PC3, 2.6%

0.0 0.2 0.4 0.6 0.8 1.0

−2

−1

01

2

MCD

valu

es

PC1, 86.1%PC2, 6.5%PC3, 2.4%

0.0

0.1

0.2

0.3

0.4

0.5

IBM

Year

P−

valu

es

2005 2006 2007

boot, bN=1GK, H=3GK, H=5HHR, H=30HHR, H=50

0.0

0.2

0.4

0.6

0.8

MCD

Year

P−

valu

es

2005 2006 2007

boot, bN=1GK, H=3GK, H=5HHR, H=30HHR, H=50

Figure 1: Top panels: CIDRs for IBM and MCD from 2005 to 2007; Middle panels: the first

three PCs of the CIDRs for IBM and MCD at 2007; Bottom panels: p-values for testing the

uncorrelatedness of the CIDRs for IBM and MCD from 2005 to 2007.

44

Supplement to “White Noise Testing and Model Diagnostic Checking

for Functional Time Series”

Xianyang Zhang∗

Texas A&M University

December 27, 2015

This supplement contains the proofs of Lemmas 3.1-3.2 and 6.3-6.9 in the main paper.

1 Proofs of Lemmas

Proof of Lemma 6.3. Write Ds,l =(∑k

h=1

∑t∈B∗

sΓth,l

)/(2

√T ) and Ds,l =

(∑kh=1

∑t∈B∗

sΓh,l

)/(2

√T ). By the CR-inequality, we have E

∗|Ds,l|4 ≤ C(E∗|Ds,l|4 + E

∗|Ds,l|4).

For the first term, we have

E

MT∑

s=1

E∗|Ds,l|4 =

1

16T 2

MT∑

s=1

k∑

h1,h2,h3,h4=1

t1,t2,t3,t4∈Bs

EΓt1h1,lΓt2h2,lΓt3h3,lΓt4h4,l

=1

16T 2

MT∑

s=1

k∑

h1,h2,h3,h4=1

t1,t2,t3,t4∈Bs

cum(Γt1h1,l,Γt2h2,l,Γt3h3,l,Γt4h4,l)

+ cov(Γt1h1,l,Γt2h2,l)cov(Γt3h3,l,Γt4h4,l) + cov(Γt1h1,l,Γt3h3,l)cov(Γt2h2,l,Γt4h4,l)

+ cov(Γt1h1,l,Γt4h4,l)cov(Γt2h2,l,Γt3h3,l)

→ 0,

by similar arguments in Section 6.2. For the second term, it is not hard to show

E

MT∑

s=1

E∗|Ds,l|4 ≤

MT b4T

16T 2

k∑

h1,h2,h3,h4=1

EΓh1,lΓh2,lΓh3,lΓh4,l

≤CMT b4T

T 6

k∑

h1,h2,h3,h4=1

Wh1,l1Wh2,l1Wh3,l1Wh4,l1

T∑

t1=h1+1

T∑

t2=h2+1

T∑

t3=h3+1

T∑

t4=h4+1

E

4∏

i=1

(Xti,l2Xti−hi,l3 +Xti,l3Xti−hi,l2 − EXti,l2Xti−hi,l3 − EXti,l3Xti−hi,l2) → 0.

Therefore, E∑MT

s=1 E∗|Ds,l|4 ≤ CE

∑MT

s=1

(E∗|Ds,l|4 + E

∗|Ds,l|4)→ 0. ♦

∗Department of Statistics, Texas A&M University, College Station, TX 77843, USA. E-mail: [email protected].

1

Proof of Lemma 6.4. We only prove the result for J∗2,k,T . The argument for J∗

2,k,T is similar. Notethat

EE∗||J∗

2,k,T ||2A =E1

8π2T

T−1∑

h=k+1

1

h2E∗ <

MT∑

s=1

t∈B∗s

Yth,

MT∑

s′=1

t′∈B∗s′

Yt′h >

=E1

8π2T

T−1∑

h=k+1

1

h2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

Yth

∣∣∣∣∣

∣∣∣∣∣

2

I2

+ (1− 1/MT )

∣∣∣∣∣

∣∣∣∣∣

MT∑

s=1

t∈Bs

Yth

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤E1

2π2T

T−1∑

h=k+1

1

h2

+∞∑

i,j=1

MT∑

s=1

(∑

t∈Bs

(Xt,iXt−h,j − EXt,iXt−h,j)

)2

+ (1− 1/MT )

(MT∑

s=1

t∈Bs

(Xt,iXt−h,j − EXt,iXt−h,j)

)2

=E1

2π2T

T−1∑

h=k+1

1

h2(J1,h,T + J2,h,T ).

For the first term, we have EJ1,h,T/T = 1T

∑+∞i,j=1

∑MT

s=1

∑t,t′∈Bs

cov(Xt,iXt−h,j ,Xt′,iXt′−h,j). Follow-ing the arguments in the proof of Theorem 2.1, we can show that the contribution from J1,h,T is oforder O(1) uniformly for h. On the other hand, we note that,

EJ2,h,T/T ≤ 1

T

+∞∑

i,j=1

MT∑

s,s′=1

t∈Bs

t′∈Bs′

cov(Xt,iXt−h,j,Xt′,iXt′−h,j)

=1

T

+∞∑

i,j=1

s 6=s′

t∈Bs

t′∈Bs′

cov(Xt,iXt−h,j,Xt′,iXt′−h,j) +1

T

+∞∑

i,j=1

MT∑

s=1

t,t′∈Bs

cov(Xt,iXt−h,j ,Xt′,iXt′−h,j).

(S.1)

The second term in (S.1) can be handled using similar arguments in Section 6.2. And for the firstterm, we have

1

T

+∞∑

i,j=1

s 6=s′

t∈Bs

t′∈Bs′

cov(Xt,iXt−h,j,Xt′,iXt′−h,j)

≤CMT

T

+∞∑

s=−∞|s|(|||γs|||21 + |||Rs+h,s,h|||1 + ||γs−h||S ||γs+h||S

).

Therefore, limk→+∞ limT→+∞ EE∗||J∗

2,k,T ||2 = 0. ♦

Proof of Lemma 6.5. We aim to show that var(∑T

t=1 ||Yt||2/T ) = o(1). Using the fact that Yt =η(Xt) + ǫt and η is a bounded linear operator, we deduce that

1

T 2var

(T∑

t=1

||Yt||2)

≤ C

T 2

var

(T∑

t=1

||Xt||2)

+ var

(T∑

t=1

||ǫt||2)

.

2

Direct calculation yields that

1

T 2var

(T∑

t=1

||Xt||2)

=1

T 2

T∑

t,t′=1

cov(||Xt||2, ||Xt′ ||2

)=

1

T 2

T∑

t,t′=1

I2

cov(X2t (τ1),X

2t′(τ2))dτ1dτ2.

≤ 1

T

+∞∑

s=−∞

∣∣∣∣∫

I2

RXs,0,s(τ1, τ2, τ1, τ2) + 2γ2X,s(τ1, τ2)dτ1dτ2

∣∣∣∣

≤ 1

T

+∞∑

s=−∞(|||RX

s,0,s|||1 + 2||γX,s||2S) = O(1/T ).

The same argument applies to the term associated with ǫt. ♦

Proof of Lemma 3.1. Under the assumption that Xt and ǫt are both L4-m-approximable, it isstraightforward to show that ||CY,X − CY,X ||S = Op(1/

√T ) and ||CX − CX ||S = Op(1/

√T ) [see

Theorem 3.1 of Hormann and Kokoszka (2010)]. Define the projection operator πkT (·) =∑kT

j=1 <

φj , · > φj and πkT (·) =∑kT

j=1 < φj , · > φj . For any x ∈ L2(I) with ||x|| ≤ 1, we have thedecomposition ηT (x) − η(x) = ηT πkT (x) − ηπkT (x) + ηπkT (x) − η(x) = A1,T (x) + A2,T (x). Wefurther decompose A1,T (x) as follows,

A1,T (x) =ηT πkT (x)− ηπkT (x) = CY,X

kT∑

j=1

(λ−1j − λ−1

j ) < φj , x > φj

+ CY,X

kT∑

j=1

λ−1j < φj − φj , x > φj

+ CY,X

kT∑

j=1

λ−1j < φj , x > (φj − φj)

+ (CY,X − CY,X)

kT∑

j=1

λ−1j < φj, x > φj

=A1,1,T (x) +A1,2,T (x) +A1,3,T (x) +A1,4,T (x).

By the Cauchy Schwartz inequality, we can show that ||CY,X(φj)||2 ≤ λj∑T

t=1 ||Yt||2/T . Because

|λj − λj| ≤ ||CX − CX ||L, we deduce that

||A1,1,T (x)|| ≤kT∑

j=1

|λj − λj|λjλj

| < φj , x > | · ||CY,X(φj)||

≤(

T∑

t=1

||Yt||2/T)1/2

||CX − CX ||LkT∑

j=1

1

λj λ1/2j

| < φj , x > |

≤(

T∑

t=1

||Yt||2/T)1/2

λ−1/2kT

kT∑

j=1

1

λ2j

1/2

||CX − CX ||L.

As Tλ2kT

→ +∞, we have P (||CX − CX ||L ≤ λkT /2) → 1, which implies that P (λkT > λkT /2) → 1.

On the event λkT > λkT /2, we have

||A1,1,T (x)|| ≤√2

(T∑

t=1

||Yt||2/T)1/2

λ−1/2kT

kT∑

j=1

1

λ2j

1/2

||CX − CX ||L.

3

By Lemma 6.5 and the assumption that (∑kT

j=1 1/λ2j )/(λkT

√T ) = o(1), we have

sup||x||≤1 ||A1,1,T (x)|| = op(1/T1/4). Further by Lemma 4.3 of Bosq (2000), we have ||φj − φj || ≤

αj ||CX − CX ||L. Thus we obtain

||A1,2,T (x)|| ≤(

T∑

t=1

||Yt||2/T)1/2

||CX − CX ||LkT∑

j=1

λ1/2j λ−1

j αj

≤C

(T∑

t=1

||Yt||2/T)1/2

||CX − CX ||LkT∑

j=1

λ−1j αj = op(1/T

1/4),

by the assumption that∑kT

j=1 λ−1j αj/

√T = o(1/T 1/4). Turning to A1,3,T (x) and A1,4,T (x), we have

||A1,3,T (x)|| ≤ ||CY,X ||L||CX − CX ||LkT∑

j=1

λ−1j aj = op(1/T

1/4),

and

||A1,4,T (x)|| ≤ ||CY,X − CY,X ||LkT∑

j=1

λ−1j = op(1/T

1/4),

which implies that sup||x||≤1 ||A1,T (x)|| = op(1/T1/4). Finally, we note that

||A2,T (x)|| ≤∑

j>kT

| < x, φj > | · ||η(φj)|| ≤

j>kT

||η(φj)||2

1/2

= o(1/T 1/4).

Summarizing the above results, we have ||ηT − η||L = sup||x||≤1 ||ηT (x)− η(x)|| = op(1/T1/4). ♦

Proof of Lemma 6.6. By stationarity, we only need to show that var(∑T

t=1 ||Xt||2)/T 2 = o(1). Letρj(·, ·) be the kernel associated with the operator ρj . Define Pj,j′(τ

′1, τ

′2) =

∫I ρ

j(τ1, τ′1)ρ

j′(τ1, τ′2)dτ1,

which is the kernel associated with the operator ρj∗ρj′where ρj∗ denotes the adjoint operator of ρj.

Using the fact that Xt =∑+∞

j=0 ρj(εt−j) [see Theorem 3.1 of Bosq (2000)], we have

1

T 2var

(T∑

t=1

||Xt||2)

=1

T 2

T∑

t,t′=1

cov

∣∣∣∣∣∣

∣∣∣∣∣∣

+∞∑

j1=0

ρj1(εt−j1)

∣∣∣∣∣∣

∣∣∣∣∣∣

2

,

∣∣∣∣∣∣

∣∣∣∣∣∣

+∞∑

j2=0

ρj2(εt′−j2)

∣∣∣∣∣∣

∣∣∣∣∣∣

2

=1

T 2

T∑

t,t′=1

I2

cov

+∞∑

j1=0

ρj1(εt−j1)(τ1)

2

,

+∞∑

j2=0

ρj2(εt′−j2)(τ2)

2 dτ1dτ2

=1

T 2

T∑

t,t′=1

T∑

j1,j2,j3,j4=0

I4

Pj1,j2(τ′1, τ

′2)Pj3,j4(τ

′3, τ

′4)cov(εt−j1(τ

′1)εt−j2(τ

′2), εt′−j3(τ

′3)εt′−j4(τ

′4))dτ

′1dτ

′2dτ

′3dτ

′4

=1

T 2

T−1∑

s=1−T

(T − |s|)T∑

j1,j2,j3,j4=0

I4

Pj1,j2(τ′1, τ

′2)Pj3,j4(τ

′3, τ

′4)

s+j4−j1,s+j4−j2,j4−j3(τ′1, τ

′2, τ

′3, τ

′4)

+ γε,s+j3−j1(τ′1, τ

′3)γε,s+j4−j2(τ

′2, τ

′4) + γε,s+j4−j1(τ

′1, τ

′4)γε,s+j3−j2(τ

′2, τ

′3)

dτ ′1dτ

′2dτ

′3dτ

′4,

4

where γε,s = 0 for s 6= 0. Because ||Pj1,j2 ||S ≤ ||ρ||j1+j2S , by the Cauchy-Schwarz inequality, we deduce

that∫

I4

Pj1,j2(τ′1, τ

′2)Pj3,j4(τ

′3, τ

′4)Rε

s+j4−j1,s+j4−j2,j4−j3(τ′1, τ

′2, τ

′3, τ

′4)dτ

′1dτ

′2dτ

′3dτ

′4

≤||ρ||j1+j2+j3+j4S ||Rε

s+j4−j1,s+j4−j2,j4−j3 ||S .

Similarly we have

I4

Pj1,j2(τ′1, τ

′2)Pj3,j4(τ

′3, τ

′4)γε,s+j3−j1(τ

′1, τ

′3)γε,s+j4−j2(τ

′2, τ

′4)dτ

′1dτ

′2dτ

′3dτ

′4

≤||ρ||j1+j2+j3+j4S ||γε,s+j3−j1 ||S ||γε,s+j4−j2 ||S ,

and∫

I4

Pj1,j2(τ′1, τ

′2)Pj3,j4(τ

′3, τ

′4)γε,s+j4−j1(τ

′1, τ

′4)γε,s+j3−j2(τ

′2, τ

′3)dτ

′1dτ

′2dτ

′3dτ

′4

≤||ρ||j1+j2+j3+j4S ||γε,s+j4−j1 ||S ||γε,s+j3−j2 ||S ,

The conclusion follows from the summability conditions. ♦

Proof of Lemma 3.2. Under the assumption that E||ε1||4 < ∞, we know Xt is L4-m-approximable[see Example 2.1 of Hormann and Kokoszka (2010)]. It is straightforward to show that ||γX,i −γX,i||S = Op(1/

√T ) with i = 0, 1. The rest of the proof is essentially similar to those in Section 8.6 of

Bosq (2000) and the proof of Lemma 3.1. We sketch the steps below. Define the projection operatorsπkT (·) =

∑kTj=1 < φj , · > φj and πkT (·) =

∑kTj=1 < φj , · > φj . For x ∈ L2(I) with ||x|| ≤ 1, we have

the decomposition ρT (x) − ρ(x) = ρT πkT (x) − ρπkT (x) + ρπkT (x) − ρ(x) = A1,T (x) + A2,T (x).Further decompose A1,T (x) as follows,

A1,T (x) =ρT πkT (x)− ρπkT (x) = γX,1

kT∑

j=1

(λ−1j − λ−1

j ) < φj , x > φj

+ γX,1

kT∑

j=1

λ−1j < φj − φj , x > φj

+ γX,1

kT∑

j=1

λ−1j < φj, x > (φj − φj)

+ (γX,1 − γ1)

kT∑

j=1

λ−1j < φj , x > φj

=A1,1,T (x) +A1,2,T (x) +A1,3,T (x) +A1,4,T (x).

By similar arguments in the proof of Lemma 3.1 [also see equations (8.83), (8.85), (8.87) and (8.89)

5

of Bosq (2000)], we have

||A1,1,T (x)|| ≤ 2√2||γX − γX ||L

(1

T

T∑

t=1

||Xt||2)1/2

λ−1/2kT

kT∑

j=1

λ−2j

1/2

= op(1/T1/4),

||A1,2,T (x)|| ≤ C

(1

T

T∑

t=1

||Xt||2)1/2

kT∑

j=1

λ−1j αj

||γX − γX ||L = op(1/T

1/4),

||A1,3,T (x)|| ≤ ||γX,1||L

kT∑

j=1

λ−1j αj

||γX − γX ||L = op(1/T

1/4),

||A1,4,T (x)|| ≤ ||γX,1 − γX,1||L

kT∑

j=1

λ−1j

= op(1/T

1/4).

Here we have used Lemma 6.6 and Assumption 3.6. Thus we obtain that sup||x||≤1 ||A1,T (x)|| =

op(1/T1/4). The conclusion follows by noting that ||A2,T (x)|| ≤

∑j>kT

| < φj , x > | · ||ρ(φj)|| ≤(∑

j>kT||ρ(φj)||2)1/2 = o(1/T 1/4). ♦

Proof of Lemma 6.7. We first note that

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

ργX γ−1X − ρ(Xt−1)εt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤∣∣∣∣ργX γ−1

X − ρ∣∣∣∣2L

T−1∑

h=1

1

Th2

∣∣∣∣∣

∣∣∣∣∣

T∑

t=h+1

Xt−1εt−h

∣∣∣∣∣

∣∣∣∣∣

2

I2

.

Thus we only need to show that∣∣∣∣ργX γ−1

X − ρ∣∣∣∣L = op(1). For any x ∈ L2(I) with ||x|| ≤ 1, simple

algebra yields that γX γ−1X (x) − x = 1

T

∑Tt=2

∑kTj=1

1λj

< Xt−1, φj >< x, φj > Xt−1 − x = −∑j>kT<

x, φj > φj . It thus implies ||ργX γ−1X (x) − ρ(x)||2 = ||∑j>kT

< x, φj > ρ(φj)||2 ≤ (∑

j>kT| <

x, φj > | · ||ρ(φj)||)2 ≤∑j>kT< x, φj >

2∑

j>kT||ρ(φj)||2 ≤∑j>kT

||ρ(φj)||2. Under the assumption∑kT

j=1 αj/√T = o(1), we can show

∑j>kT

||ρ(φj)||2 = op(1) by similar arguments in the proof of

Lemma 8.2 in Bosq (2000). It thus implies that∣∣∣∣ργX γ−1

X − ρ∣∣∣∣2L ≤ ∑

j>kT||ρ(φj)||2 = op(1). The

conclusion follows by noting that∑T−1

h=11

Th2

∣∣∣∣∣∣∑T

t=h+1Xt−1εt−h

∣∣∣∣∣∣2

I2= Op(1). ♦

Proof of Lemma 6.8. Define the operator IT,h(·) = 1T

∑Tj=h+1 < Xj−1, γ

−1X (·) > εj−h = γε1,Xh

γ−1X (·)

with γε1,Xh= 1

T

∑Tj=h+1 < Xj−1, · > εj−h. We note that the RHS of (50) is bounded by

1T

∣∣∣∣∣∣∑T

t=2Xt−1εt

∣∣∣∣∣∣2

I2

∑T−1h=1

1h2 ||IT,h − γε,0ρ

h−1∗ γ−1

X πkT ||2L. Because 1T

∣∣∣∣∣∣∑T

t=2 Xt−1εt

∣∣∣∣∣∣2

I2= Op(1), we

only need to show that∑T−1

h=11h2 ||IT,h−γε,0ρ

h−1∗ γ−1

X πkT ||2L = op(1). For any x ∈ L2(I) with ||x|| ≤ 1,the triangle inequality yields that

||IT,h(x)− γε,0ρh−1∗ γ−1

X πkT (x)||L ≤||γε1,Xh− γε,0ρ

h−1∗ ||L||γ−1

X (x)||+ ||γε,0||L||ρ∗||h−1

L ||γ−1X (x)− γ−1

X πkT (x)||.

Using similar arguments in the proof of Lemma 3.2, we can show that ||γ−1X − γ−1

X πkT ||L = op(1).Moreover, under the assumption that |||γε,0|||1 < ∞ and

∑+∞s1,s2,s3=1 |||Rε

s1,s2,s3 |||1 < ∞, it can be

shown∑T−1

h=11h2E||γε1,Xh

−γε,0ρh−1∗ ||2S = O(1/T ) (see arguments in the proof of Theorem 3.3). Notice

that ||γ−1X (x)|| ≤ ||γ−1

X πkT (x)|| + op(1) = λ−1kT

||x|| + op(1). The conclusion follows provided that

Tλ2kT

→ +∞. ♦

6

Proof of Lemma 6.9. Using the arguments in the proof of Lemma 6.8 and the proof of Theorem3.3, we can show that

∑T−1h=1

1Th2

∑MT

s=1 ||∑

t∈BsLj,t,h||2I2 = op(1) with j = 1, 2, 3 provided that

lim sup bT /√T < ∞. Below we consider L4,t,h. Decompose L4,t,h as,

L4,t,h =DTρh−1γε,0(τ1, τ2)−DTγXπkT γ

−1X ρh−1γε,0(τ1, τ2)

+DT (Xt−1)(τ1)εt−h(τ2)−DTρh−1γε,0(τ1, τ2)

+DTγXπkT γ−1X ρh−1γε,0(τ1, τ2)−DT (Xt−1)(τ1)γε,0ρ

h−1∗ γ−1

X πkT (Xt−1)(τ2)

+DT (Xt−1)(τ1)γε,0ρh−1∗ γ−1

X πkT (Xt−1)(τ2)−DT (Xt−1)(τ1)IT,h(Xt−1)(τ2)

=L(1)4,t,h + L

(2)4,t,h + L

(3)4,t,h + L

(4)4,t,h. (S.2)

We note that∑T−1

h=11

Th2

∑MT

s=1 ||∑

t∈BsL(1)4,t,h||2I2 ≤ ∑T−1

h=1bTh2 ||DT ||2L||ρh−1γε,0||2S = op(1) because

lim sup bT /√T < ∞. It is straightforward to show that

∑T−1h=1

1Th2

∑MT

s=1 ||∑

t∈BsL(2)4,t,h||2I2 = o(1)

by using similar arguments as before. For the third term in (S.2), notice that

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

DTγXπkT γ−1X ρh−1γε,0(τ1, τ2)−DT (Xt−1)(τ1)γε,0ρ

h−1∗ γ−1

X πkT (Xt−1)(τ2)∣∣∣∣∣

∣∣∣∣∣

2

I2

=||DT ||2L||γε,0ρh−1∗ γ−1

X πkT ||2L

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

Xt−1(τ1)Xt−1(τ2)− γX(τ1, τ2)∣∣∣∣∣

∣∣∣∣∣

2

I2

.

Using the fact that ||γε,0ρh−1∗ γ−1

X πkT ||2L ≤ C/λ2kT

and the assumption that 1/(Tλ4kT

) = o(1) (as

T−1/4∑kT

j=1 αj/λj = o(1)), we can show∑T−1

h=11

Th2

∑MT

s=1 ||∑

t∈BsL(3)4,t,h||2I2 = op(1). Following

the proof of Lemma 6.8 we have∑T−1

h=11h2 ||IT,h − γε,0ρ

h−1∗ γ−1

X πkT ||L = op(1). It is then not

hard to show that∑T−1

h=11

Th2

∑MT

s=1 ||∑

t∈BsL(4)4,t,h||2I2 = op(1). The contribution from L6,t,h is

asymptotically negligible provided that 1T E||

∑Tt=2 εtγε,0ρ

h−1∗ γ−1

X πkT (Xt−1)||I2 = O(1). Finally, let

γX,−h(·) =∑T

j=h+1 < Xj−1, · > Xj−h−1 and note that

T−1∑

h=1

1

Th2

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

L5,t,h

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤T−1∑

h=1

1

Th2||DT γX,−hγ

−1X ||2L

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

εtXt−1

∣∣∣∣∣

∣∣∣∣∣

2

I2

≤||DT ||2L||γ−1X ||2L

T−1∑

h=1

1

Th2||γX,−h||2S

MT∑

s=1

∣∣∣∣∣

∣∣∣∣∣∑

t∈Bs

εtXt−1

∣∣∣∣∣

∣∣∣∣∣

2

I2

= op(1),

where we have used the fact that ||DT ||2L/λ2kT

→ 0, 1T

∑MT

s=1

∣∣∣∣∑t∈Bs

εtXt−1

∣∣∣∣2I2 = Op(1), and

∑T−1h=1

1h2 ||γX,−h||2S = Op(1). ♦

References

[1] Bosq, D. (2000), Linear Process in Function Spaces: Theory and Applications, New York:Springer.

[2] Hormann, S., and Kokoszka, P. (2010), “Weakly Dependent Functional Data,” Annals of Statis-

tistics, 38, 1845-1884.

7