White Noise Testing and Model Diagnostic Checking for Functional
Time Series
Xianyang Zhang∗
Texas A&M University
December 28, 2015
Abstract This paper is concerned with white noise testing and model diagnostic checking for stationary
functional time series. To test for the functional white noise null hypothesis, we propose a Cramer-von Mises
type test based on the functional periodogram introduced by Panaretos and Tavakoli (2013a). Using the
Hilbert space approach, we derive the asymptotic distribution of the test statistic under suitable assumptions.
A new block bootstrap procedure is introduced to obtain the critical values from the non-pivotal limiting
distribution. Compared to existing methods, our approach is robust to the dependence within white noise and
it does not involve the choices of functional principal components and lag truncation number. We employ the
proposed method to check the adequacy of functional linear models and functional autoregressive models of
order one by testing the uncorrelatedness of the residuals. Monte Carlo simulations are provided to demonstrate
the empirical advantages of the proposed method over existing alternatives. Our method is illustrated via an
application to cumulative intradaily returns.
Keywords: Block bootstrap, Functional autoregressive models, Functional linear models, Functional time
series, Goodness-of-fit, Model diagnostic checking, Periodogram, Spectra-based test, White noise testing.
1 Introduction
Functional data analysis (FDA) has emerged as an important area of statistics which pro-
vides convenient and informative tools for the analysis of data objects of high dimension/high
resolution, and it is generally applicable to problems which are difficult to cast into a frame-
work of scalar or vector observations. In many applications, especially if data are collected
sequentially over time, it is natural to expect that the observations exhibit certain degrees
of dependence. During the past decade, there is a growing body of research on parametric
and nonparametric inference for dependent functional data. We refer the interested readers
to the excellent monograph by Horvath and Kokoszka (2012). In this article, our interest
∗Department of Statistics, Texas A&M University, College Station, TX 77843, USA. E-mail: zhangxiany@stat.tamu.edu. I am grateful to Xiaofeng Shao for numerous helpful suggestions and comments on an earlier version. The author would like to thank the Editor, the Associate Editor and two reviewers for their insightful comments and constructive suggestions, which substantially improved the paper.
concerns white noise testing (testing for serial correlation) for functional observations, and its
application to model diagnostic checking.
In the univariate/multivariate time series context, white noise testing is a classical problem
which has attracted considerable attention. There is a huge literature on the white noise
testing problem and the existing tests can be roughly categorized into two types: time domain
correlation-based tests and frequency domain periodogram-based tests [see e.g. Durlauf (1991);
Hong (1996); Deo (2000) among others]. In the functional time series context, developments
have been mainly devoted to the time domain based approaches. For example, Gabrys and
Kokoszka (2007) proposed a portmanteau test for testing the uncorrelatedness of a sequence
of functional observations. Horvath et al. (2013) proposed an independence test based on
the sum of the L2 norms of the empirical correlation functions. The validity of these tests is
justified under the independent and identically distributed (i.i.d) assumption, and thus they
are not robust to dependent white noise. In the univariate setting, the distinction between an
i.i.d sequence and an uncorrelated sequence in the diagnostic checking context has been found
to be important. On one hand, some commonly used nonlinear time series models, such as
ARCH/GARCH models imply white noise but are dependent. On the other hand, the limiting
null distributions of some commonly used test statistics, such as Box and Pierce’s portmanteau
test, are obtained under the i.i.d assumption, and they are no longer valid under the assumption
of dependent white noise [Romano and Thombs (1996); Lobato et al. (2002); Francq et al.
(2005); Horowitz et al. (2006); Shao (2011b)]. In the functional setting, this distinction is
again important. For example, a functional ARCH(1) process (FARCH(1)) is a functional
white noise but is dependent [Hormann et al. (2013)]. In this paper, we propose a spectra-
based testing procedure which is robust to the dependence within the white noise sequence.
Our test is constructed based on the periodogram function recently introduced by Panaretos
and Tavakoli (2013a), and it has nontrivial power against Pitman's local alternatives that
are within a √T-neighborhood of the null hypothesis, where T denotes the sample size. Our
test also avoids projecting functional objects onto a space whose dimension is held fixed in the
asymptotics. Under suitable weak dependence assumptions, we show that the spectra-based
test has a non-pivotal limiting distribution. To conduct inference, we introduce a novel block
bootstrap procedure which is able to imitate the limiting null distribution. It is worth pointing
out that when the white noise is a martingale difference sequence in a functional space, the
block size in our bootstrap procedure is allowed to be one.
In statistical modeling, diagnostic checking is an integral part of model building. A
common way of testing the adequacy of the proposed time series model is by checking the
assumption of white noise residuals. Systematic departure from this assumption implies the
inadequacy of the fitted model. We employ the proposed white noise testing procedure to
test the goodness of fit for functional linear models and functional autoregressive models of
order one (FAR(1)) with uncorrelated but possibly dependent errors. Diagnostic checking in
functional linear models has been studied by Chiou and Muller (2007) and Gabrys et al. (2010).
The latter authors proposed two inferential tests (GHK tests hereafter) for error correlation.
The main differences between our test and the GHK tests are threefold. First, our test is
constructed based on the L2 norm of the periodogram function, and it does not involve the
choices of functional principal components and lag truncation number which are required in
the GHK tests. Second, we do not assume any asymptotic independence under the null of the
errors while the asymptotic validity of the GHK tests is established under the independence
assumption. The asymptotic null distribution of our test depends on the underlying data
generating process (DGP) and is no longer pivotal. We justify the validity of the proposed
block bootstrap procedure when applied to the residuals in Section 3.1. Third, we employ
the truncated regularization to estimate the functional linear operator, where the underlying
dimension is allowed to grow slowly with the sample size. In contrast, Gabrys et al.'s approaches
construct and estimate a linear model via least squares after projecting the functional
objects onto a space whose dimension is fixed in the asymptotic analysis.
Furthermore, we apply the proposed method to check the adequacy of FAR(1) models. The
FAR(1) model is conceptually simple as it is an extension of the univariate AR(1) model to the
functional setup, yet very flexible because the autoregressive operator acts on a Hilbert space
whose elements can exhibit any degree of nonlinearity. Various nonparametric and prediction
methods for the FAR(1) models have been developed, and numerous applications have been
found; see Besse et al. (2000), Laukaitis and Rackauskas (2002), Antoniadis and Sapatinas
(2003), Fernandez de Castro et al. (2005), Kargin and Onatski (2008), among others. In
spite of the wide use of FAR(1) models, there seem to be no systematic methods available to
check the goodness of fit. In this paper, we fill this gap by applying the spectra-
based test to test the uncorrelatedness of the residuals. In contrast to the case of functional
linear models, the estimation effect induced by replacing the unobservable innovations with
their estimates is not asymptotically negligible due to the dependence structure of the FAR(1)
models. To circumvent the difficulty, we further propose a modified block bootstrap that takes
the estimation effect into account.
Finally, we point out that spectral analysis of stationary functional time series has been
recently advanced by Panaretos and Tavakoli (2013a), Panaretos and Tavakoli (2013b) and
Hormann et al. (2015). We refer to Panaretos and Tavakoli (2013a) for many interesting
details on estimation and asymptotics of the spectral density operator.
The layout of the article is as follows. We introduce the spectra-based test in Section 2.1.
To obtain the critical values from the limiting null distribution, we propose a block bootstrap
procedure in Section 2.2. A general class of test statistics is discussed in Section 2.3. We
employ the proposed method to test the goodness of fit for the functional linear models and
the FAR(1) models in Section 3. Section 4 is devoted to the finite sample performance of the
proposed method including simulations and an application to cumulative intradaily returns.
Section 5 concludes. The proofs are postponed to Section 6 and the online supplementary
material.
2 White noise testing
Notation: Let ı = √−1 be the imaginary unit. Denote by I a compact subset of a Euclidean space. Let I^k be the Cartesian product of k copies of I with k ∈ N, and set A := [0, 1] × I × I. Denote by L²(J) the Hilbert space of square integrable functions defined on J, where J = I^k or A. For any functions f, g ∈ L²(J), the inner product between f and g is defined as ⟨f, g⟩ = ∫_J f(τ)g(τ) dτ, and ||·||_J denotes the induced norm. Let ||·|| := ||·||_I. Assume that all random elements are defined on a common probability space (Ω, F, P). Denote by L^p_H the space of H := L²(I)-valued random variables X such that (E||X||^p)^{1/p} < ∞. For any compact operator, denote by |||·|||_1, ||·||_L and ||·||_S the nuclear norm, the uniform (operator) norm and the Hilbert-Schmidt norm, respectively. Let Re(·) and Im(·) be the real part and the imaginary part of a complex number. Without ambiguity, we shall use the same symbol for an operator and the kernel associated with the operator in the following discussion.
2.1 Spectra-based test
For ease of presentation, we consider a sequence of mean-zero stationary functional time series {X_t(τ)}_{t=1}^{+∞} defined on a compact set I. With suitable modifications of the arguments, the results are expected to extend to the case of a nonzero mean function. The lag-h autocovariance function of X_t is defined as γ_h(τ1, τ2) = E[X_t(τ1) X_{t−h}(τ2)] with τ1, τ2 ∈ I, and the corresponding autocovariance operator is given by γ_h(·) = E[⟨X_{t−h}, ·⟩ X_t]. Following Panaretos and Tavakoli (2013a), we define the spectral density kernel as

    f_ω(τ1, τ2) := (1/(2π)) Σ_{h=−∞}^{+∞} γ_h(τ1, τ2) exp(−ıhω),   ω ∈ [−π, π].   (1)
Given the functional observations {X_t}_{t=1}^T, where T is the sample size, we are interested in testing whether the functional time series {X_t} exhibits serial correlation in a functional space. In other words, we want to test the null hypothesis

    H_0 : f_ω(τ1, τ2) = γ_0(τ1, τ2)/(2π), for any ω ∈ [−π, π],

versus the alternative

    H_a : f_ω(τ1, τ2) ≠ γ_0(τ1, τ2)/(2π), for some ω ∈ [−π, π],

i.e., γ_h(τ1, τ2) ≠ 0 for some lag h. To introduce the testing procedure, we define the discrete Fourier transform (DFT) of {X_t}_{t=1}^T to be

    X̃_ω(τ) := (1/√(2πT)) Σ_{t=1}^T X_t(τ) exp(−ıtω).   (2)
The periodogram function is then given by

    P_ω(τ1, τ2) := X̃_ω(τ1) X̃_{−ω}(τ2) = (1/(2πT)) Σ_{1≤t,t′≤T} X_t(τ1) X_{t′}(τ2) exp{ı(t′ − t)ω}
                = (1/(2π)) Σ_{h=1−T}^{T−1} γ̂_h(τ1, τ2) exp(−ıhω),   τ1, τ2 ∈ I,   (3)

where γ̂_h(τ1, τ2) = Σ_{1≤t−h, t≤T} X_t(τ1) X_{t−h}(τ2)/T is the sample autocovariance function. Let
Ψ_h(λ) = sin(hπλ)/(hπ) ∈ L²([0, 1]) and V_{T,h}(τ1, τ2) = γ̂_h(τ1, τ2) + γ̂_h(τ2, τ1). Define the following process based on the real part of the periodogram function:
    T_T(λ, τ1, τ2) = √T { ∫_0^{πλ} Re P_ω(τ1, τ2) dω − λ ∫_0^π Re P_ω(τ1, τ2) dω }
                  = (√T/2) Σ_{h=1}^{T−1} V_{T,h}(τ1, τ2) Ψ_h(λ) = (1/(2√T)) Σ_{h=1}^{T−1} Σ_{t=2}^T Y_th(τ1, τ2) Ψ_h(λ),   (4)
where Y_th(τ1, τ2) = X_t(τ1) X_{t−h}(τ2) + X_t(τ2) X_{t−h}(τ1) for h + 1 ≤ t ≤ T and Y_th(τ1, τ2) = 0 for t ≤ h. We consider the Cramer-von Mises type statistic, which is defined as

    T_{T,CM} = ||T_T||²_A = (T/(8π²)) Σ_{h=1}^{T−1} ||V_{T,h}||²_{I²} / h²,   (5)

where we have used the fact that ⟨Ψ_h, Ψ_{h′}⟩ = 1{h = h′}/(2π²h²). Below we derive the asymptotic distribution of T_{T,CM}. As the functional objects and the associated operators both lie in Hilbert spaces, it is natural to adopt the Hilbert space approach [see e.g., Politis and Romano (1994); Escanciano and Velasco (2006); Shao (2011a)] for the asymptotic analysis.
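As a concrete illustration, the statistic in (5) can be computed directly from the sample autocovariance kernels once the curves are observed on a grid. The following sketch is our own (not from the paper): it discretizes I = [0, 1] with n equally spaced points and approximates the L² norms by Riemann sums.

```python
import numpy as np

def cvm_statistic(X):
    """Cramer-von Mises type statistic T_{T,CM} of (5).

    X : (T, n) array; row t holds the curve X_t evaluated on an equally
        spaced grid of I = [0, 1] with mesh 1/n."""
    T, n = X.shape
    d = 1.0 / n                              # grid mesh for Riemann sums
    X = X - X.mean(axis=0)                   # enforce the mean-zero assumption
    stat = 0.0
    for h in range(1, T):
        # sample autocovariance kernel gamma_hat_h(tau1, tau2) on the grid
        gamma_h = X[h:].T @ X[:-h] / T
        V = gamma_h + gamma_h.T              # V_{T,h} = gamma_hat_h + its adjoint
        stat += np.sum(V**2) * d * d / h**2  # ||V_{T,h}||^2_{I^2} / h^2
    return T / (8 * np.pi**2) * stat
```

Large values of the statistic indicate serial correlation; since the limiting null distribution is non-pivotal, critical values come from the block bootstrap of Section 2.2.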
To quantify the dependence within {X_t}, we shall use the L^p-m-approximating dependence measure [Hormann and Kokoszka (2010)] for its broad applicability to linear and nonlinear functional processes.
Definition 2.1. Assume that X_i ∈ L^p_{H_0} for some separable Hilbert space H_0 and p > 0, and that X_i admits the representation

    X_i = f(ξ_i, ξ_{i−1}, . . . ),   i = 1, 2, . . . ,   (6)

where the ξ_i's are i.i.d elements taking values in a measurable space S and f : S^∞ → H_0 is a measurable function. For each i ∈ N, let {ξ_j^{(i)}}_{j∈Z} be an independent copy of {ξ_j}_{j∈Z}. The sequence {X_i} is said to be L^p-m-approximable if

    Σ_{m=1}^∞ (E||X_1 − X_1^{(m)}||^p)^{1/p} < ∞,   (7)

where X_i^{(m)} = f(ξ_i, ξ_{i−1}, . . . , ξ_{i−m+1}, ξ^{(i)}_{i−m}, ξ^{(i)}_{i−m−1}, . . . ).
Before presenting the weak convergence result, we introduce the cumulant operator, which characterizes the higher order properties of a functional time series.

Definition 2.2. Let R_{t1,t2,...,t_{2k−1}}(τ1, . . . , τ_{2k−1}, τ_{2k}) = cum(X_{t1}(τ1), . . . , X_{t_{2k−1}}(τ_{2k−1}), X_0(τ_{2k})) be the cumulant function [Brillinger (1975)]. Define the 2kth order cumulant operator R_{t1,t2,...,t_{2k−1}} : L²(I^k) → L²(I^k) as

    (R_{t1,t2,...,t_{2k−1}} g)(τ1, . . . , τ_k) = ∫_{I^k} R_{t1,t2,...,t_{2k−1}}(τ1, . . . , τ_{2k}) g(τ_{k+1}, . . . , τ_{2k}) dτ_{k+1} · · · dτ_{2k}.   (8)
We are now in a position to present one of the main results of this section. Recall that |||·|||_1 denotes the nuclear norm of a compact operator; see the definition in Bosq (2000) and Zhu (2007).

Theorem 2.1. Suppose {X_t} is an L⁴-m-approximable process. Under the assumption that Σ_{s=−∞}^{+∞} |||γ_s|||_1 < ∞ and Σ_{s1,s2,s3=−∞}^{+∞} |||R_{s1,s2,s3}|||_1 < ∞, we have

    G_T(λ, τ1, τ2) := T_T(λ, τ1, τ2) − E T_T(λ, τ1, τ2) ⇒ G(λ, τ1, τ2),   (9)

where "⇒" denotes weak convergence in L²(A), and G is a mean-zero Gaussian process whose covariance structure is specified by

    cov(G(λ, τ1, τ2), G(λ′, τ′1, τ′2)) = (1/4) Σ_{i,j=1}^{+∞} Σ_{s=−∞}^{+∞} Ψ_i(λ) Ψ_j(λ′) cov(X_{s+i}(τ1) X_s(τ2) + X_{s+i}(τ2) X_s(τ1), X_j(τ′1) X_0(τ′2) + X_j(τ′2) X_0(τ′1)).   (10)
Theorem 2.1 shows that G_T(λ, τ1, τ2) converges to a Gaussian process whose covariance operator depends on the fourth order cumulants of the functional time series. Notice that under H_0, E Y_th(τ1, τ2) = 0 for any h ≥ 1 and thus E T_T(λ, τ1, τ2) = 0. As an immediate consequence, we obtain the following result by the continuous mapping theorem, whose proof is omitted to conserve space.
Theorem 2.2. Under the assumptions in Theorem 2.1 and the null hypothesis H_0, we have

    T_{T,CM} →_d ∫_0^1 ∫_{I²} G²(λ, τ1, τ2) dτ1 dτ2 dλ.   (11)

Moreover, under the alternative H_a, we have

    T_{T,CM}/T →_p (1/(8π²)) Σ_{h=1}^{+∞} ∫_{I²} {γ_h(τ1, τ2) + γ_h(τ2, τ1)}² dτ1 dτ2 / h².   (12)
Theorem 2.2 shows that the Cramer-von Mises type statistic converges to a non-pivotal limiting null distribution. Under any fixed alternative with ∫_{I²} {γ_h(τ1, τ2) + γ_h(τ2, τ1)}² dτ1 dτ2 ≠ 0 for some h ∈ N, the Cramer-von Mises type test is consistent. It is worth pointing out that the Hilbert space approach employed here is not applicable to a statistic of Kolmogorov-Smirnov type because the "sup" functional is no longer a continuous mapping in the Hilbert space.
Below we consider the local alternative H_{a,T} : f_ω(τ1, τ2) = γ_0(τ1, τ2){1 + g_ω(τ1, τ2)/√T}/(2π), where g_ω(τ1, τ2) is a 2π-periodic function that satisfies g_{−ω}(τ1, τ2) = g_ω(τ2, τ1) and ∫_{−π}^π g_ω(τ1, τ2) dω = 0. Under H_{a,T},

    γ_h(τ1, τ2) = (γ_0(τ1, τ2)/(2π√T)) ∫_{−π}^π g_ω(τ1, τ2) exp(ıhω) dω   for h ≠ 0.

Using the fact that (1/2) Σ_{h=1}^{T−1} Ψ_h(λ) ∫_{−π}^π {g_ω(τ1, τ2) + g_ω(τ2, τ1)} exp(ıhω) dω → ∫_0^{πλ} Re g_ω(τ1, τ2) dω, we have T_T(λ, τ1, τ2) ⇒ G(λ, τ1, τ2) + ϕ(τ1, τ2), where ϕ(τ1, τ2) = γ_0(τ1, τ2) ∫_0^{πλ} Re g_ω(τ1, τ2) dω/(2π). Therefore, we obtain the following result.
Theorem 2.3. Under the assumptions in Theorem 2.1 and the local alternative H_{a,T}, we have

    T_{T,CM} →_d ∫_0^1 ∫_{I²} {G(λ, τ1, τ2) + ϕ(τ1, τ2)}² dτ1 dτ2 dλ.   (13)
2.2 Block bootstrap
Although the non-pivotal limiting distribution in (11) can be approximated numerically for given parametric models, the critical values are case-by-case and can be unreliable when the assumed model is misspecified. To overcome this problem, we introduce a block bootstrap approach which is able to mimic the limiting distribution of the test statistic without imposing restrictive parametric assumptions. Assume for simplicity that b_T M_T = T − 1 with b_T, M_T ∈ Z. Our procedure can be described as follows:

1. Let Y_th(τ1, τ2) = X_t(τ1) X_{t−h}(τ2) + X_t(τ2) X_{t−h}(τ1) − γ̂_h(τ1, τ2) − γ̂_h(τ2, τ1) for h + 1 ≤ t ≤ T and Y_th(τ1, τ2) = 0 for t ≤ h, where 2 ≤ t ≤ T and 1 ≤ h ≤ T − 1;

2. Define the non-overlapping blocks of indices B_s = {b_T(s − 1) + 2, . . . , b_T s + 1} with 1 ≤ s ≤ M_T;

3. Pick M_T blocks with replacement from {B_s}_{s=1}^{M_T}. Join the M_T blocks and let (t*_2, . . . , t*_T) be the corresponding indices for the bootstrapped samples. Denote Y*_{lh} = Y_{t*_l h} for 2 ≤ l ≤ T;

4. Compute the bootstrap statistic T*_{T,CM} by replacing V_{T,h} with V*_{T,h} = Σ_{l=2}^T Y*_{lh}/T in the definition of T_{T,CM};

5. Repeat the above steps a large number of times to obtain the bootstrap critical values for T_{T,CM}.
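A single bootstrap draw of the statistic can be sketched as follows. This is our own grid-based illustration of steps 1-4 above, under the convention b_T M_T = T − 1 and the same Riemann-sum discretization of I = [0, 1] as before.

```python
import numpy as np

def bootstrap_cvm(X, b, rng):
    """One draw of the bootstrap statistic T*_{T,CM} (steps 1-4 above).

    X : (T, n) array of curves on an equally spaced grid of [0, 1];
    b : block size b_T, assumed to satisfy b * M = T - 1 for integer M."""
    T, n = X.shape
    d = 1.0 / n
    M = (T - 1) // b
    # Step 2: non-overlapping blocks of the time indices 2, ..., T
    # (0-based row indices 1, ..., T-1 here).
    blocks = np.arange(1, b * M + 1).reshape(M, b)
    # Step 3: draw M blocks with replacement and join them.
    t_star = blocks[rng.integers(0, M, size=M)].ravel()
    stat = 0.0
    for h in range(1, T):
        g = X[h:].T @ X[:-h] / T            # sample autocovariance kernel
        c = g + g.T                         # centering term gamma_hat_h + adjoint
        V = np.zeros((n, n))
        for t in t_star:                    # Step 1: Y_th = 0 for t <= h
            if t >= h:
                V += np.outer(X[t], X[t - h]) + np.outer(X[t - h], X[t]) - c
        V /= T                              # Step 4: V*_{T,h}
        stat += np.sum(V**2) * d * d / h**2
    return T / (8 * np.pi**2) * stat
```

Repeating the draw a large number of times (step 5) and taking the empirical (1 − α)-quantile of the draws gives the bootstrap critical value.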
Our method is closely related to the blockwise wild bootstrap [Shao (2011a); Zhu and Li (2015)], but it does not require generating auxiliary variables from a common distribution. Compared to the usual block bootstrap, the proposed method is nonstandard in the sense that it resamples the blocks containing the transformed data Y_th(τ1, τ2) rather than the blocks formed by the original observations. Below we establish the consistency of the block bootstrap procedure.
Theorem 2.4. Assume that b_T → +∞ and b_T/T → 0 as T → +∞. Suppose Σ_{s=−∞}^{+∞} |s| · |||γ_s|||_1 < ∞ and Σ_{s1,s2,s3=−∞}^{+∞} |s_i| · |||R_{s1,s2,s3}|||_1 < ∞ for i = 1, 2, 3. Further assume that

    Σ_{s1,...,sJ=−∞}^{+∞} |s_j| · ||cum(X_{s1}, X_{s2}, . . . , X_{sJ}, X_0)||_{I^{J+1}} < ∞,   j = 1, 2, . . . , J,   (14)

for J = 1, 2, . . . , 7. Under the null hypothesis H_0, or any fixed or local alternatives, we have

    ρ_w [ L( (√T/2) Σ_{h=1}^{T−1} V*_{T,h}(τ1, τ2) Ψ_h(λ) | X_T ), L(G(λ, τ1, τ2)) ] → 0,   (15)

in probability as T → +∞, where L(·|X_T) denotes the distribution conditional on the sample X_T = {X_t}_{t=1}^T and ρ_w is any metric that metricizes weak convergence in L²(A). As a consequence, the bootstrap statistic T*_{T,CM} converges to the limiting distribution specified in (11) in probability.
Under the martingale difference assumption, i.e., E[X_s(τ) | X_{s′}(τ′), τ′ ∈ I] = 0 for s′ < s and τ ∈ I, the covariance between G(λ, τ1, τ2) and G(λ′, τ′1, τ′2) reduces to

    (1/4) Σ_{i,j=1}^{+∞} Ψ_i(λ) Ψ_j(λ′) cov(X_0(τ1) X_{−i}(τ2) + X_0(τ2) X_{−i}(τ1), X_0(τ′1) X_{−j}(τ′2) + X_0(τ′2) X_{−j}(τ′1)).   (16)

Following the arguments in the proof of Theorem 2.4, we can show that the bootstrap method is able to replicate the limiting distribution with b_T = 1; see Remark 6.1 and Escanciano and Velasco (2006).
In practice, the performance of the bootstrap procedure can depend on the choice of b_T. We adapt the minimal volatility method [see Chapter 9 of Politis et al. (1999)] to select the block size in our setting. The basic idea is to compute the bootstrap critical values for a range of block sizes b_T and then look for a region where the critical values do not change much. The procedure can be described as follows:

1. For b_T = 1 to b_T = √T, compute the bootstrap critical values C_{T,b_T}(1 − α) for the desired significance level α;

2. For each b_T, define the volatility index VI_{b_T} as the standard deviation of the values C_{T,b_T−k}(1 − α), C_{T,b_T−k+1}(1 − α), . . . , C_{T,b_T+k}(1 − α) for k = 3;

3. Pick b* corresponding to the smallest volatility index and use C_{T,b*}(1 − α) as the critical value of the test.

We shall investigate the finite sample performance of this procedure in Section 4.
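Given the bootstrap critical values computed for each candidate block size, the minimal volatility rule above reduces to a few lines. The following sketch is our own; `crit[b-1]` is assumed to hold C_{T,b}(1 − α).

```python
import numpy as np

def minimal_volatility_block_size(crit, k=3):
    """Minimal volatility rule for the block size b_T (steps 1-3 above).

    crit : 1-D array with crit[b-1] = C_{T,b}(1 - alpha) for b = 1, ..., B;
    k    : half-width of the smoothing window (k = 3 in the text).
    Returns the block size b* with the smallest local standard deviation."""
    B = len(crit)
    # Volatility index: std of the 2k+1 neighbouring critical values
    # (the window is truncated at the boundaries).
    vol = np.array([np.std(crit[max(0, i - k): i + k + 1]) for i in range(B)])
    return int(np.argmin(vol)) + 1
```

The rule favors block sizes lying in a flat region of the critical-value curve, where the bootstrap distribution is insensitive to the exact choice of b_T.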
Remark 2.1. Let W_th(τ1, τ2) = X_{t+h}(τ1) X_t(τ2) + X_{t+h}(τ2) X_t(τ1) − γ̂_h(τ1, τ2) − γ̂_h(τ2, τ1) for 1 ≤ t + h ≤ T and W_th(τ1, τ2) = 0 for t + h > T, where 1 ≤ t, h ≤ T − 1. In view of the proof of Theorem 2.4, the block bootstrap procedure based on the transformed data {W_th : 1 ≤ t, h ≤ T − 1} is also valid.
Remark 2.2. It might be possible to construct a pivotal test by replacing the periodogram function with its normalized counterpart. However, such a normalization step requires estimating the fourth order properties of the white noise sequence as well as an inverse operator, which can be quite challenging from a practical viewpoint. To avoid these complications, we suggest using the block bootstrap procedure to calibrate the sampling distribution of T_{T,CM}.
2.3 A general class of test statistics
In this section, we discuss a general class of test statistics. The Cramer-von Mises statistic is a weighted sum of the L² norms of {V_{T,h}}_{h=1}^{T−1}, where V_{T,h} = γ̂_h + γ̂*_h with γ̂*_h denoting the adjoint operator of γ̂_h. Because the Cramer-von Mises statistic puts rather heavy weights on low order sample autocovariance operators, it can lose power against strong dependence. Motivated by the form of T_{T,CM}, we consider a general class of statistics. Let {ζ_h(λ)}_{h=1}^{+∞} be a sequence of orthogonal basis functions in L²[0, 1] such that Σ_{h=1}^{+∞} ||ζ_h||² < ∞. Then a general class of test statistics can be defined as

    T_{T,G} = S( (1/√T) Σ_{h=1}^{T−1} Σ_{t=2}^T Y_th ζ_h ),   (17)
where S : L²(A) → R is a continuous functional. In view of the proofs of Theorems 2.1 and 2.4, we can show that under suitable assumptions, T_{T,G} converges to a non-pivotal limiting null distribution and the block bootstrap remains valid. If S is the L² norm and ||ζ_h|| = 1{h ≤ H} with 1 ≤ H ≤ T − 1, the general test statistic takes the form

    T_{T,G} = T Σ_{h=1}^H ||γ̂_h + γ̂*_h||²_S,   (18)

which can be viewed as a functional version of the Box and Pierce test. When H is allowed to grow slowly with the sample size T, Horvath et al. (2013) showed that

    (T Σ_{h=1}^H ||γ̂_h||²_{I²} − H |||γ̂_0|||²_1) / (2H ||γ̂_0||⁴_S)^{1/2} → N(0, 1),   (19)

where |||γ̂_0|||_1 = ∫_I γ̂_0(τ, τ) dτ and ||γ̂_0||_S = (∫_{I²} γ̂_0²(τ1, τ2) dτ1 dτ2)^{1/2}. This result is closely related to the findings in Hong (1996) for univariate time series [see also Hong and Lee (2003); Shao (2011b)].
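For instance, the truncated statistic (18) is straightforward to compute on a grid. The following is our own sketch, using the same discretization conventions as before; unlike T_{T,CM}, it requires choosing the truncation H.

```python
import numpy as np

def functional_box_pierce(X, H):
    """Functional Box-Pierce type statistic (18):
    T * sum_{h=1}^H ||gamma_hat_h + gamma_hat_h^*||_S^2,
    for curves X (a (T, n) array) on an equally spaced grid of [0, 1]."""
    T, n = X.shape
    d = 1.0 / n                                 # grid mesh
    stat = 0.0
    for h in range(1, H + 1):
        g = X[h:].T @ X[:-h] / T                # sample autocovariance kernel
        stat += np.sum((g + g.T)**2) * d * d    # squared Hilbert-Schmidt norm
    return T * stat
```

With H = T − 1 and the additional 1/(8π²h²) weights, this reduces to the Cramer-von Mises statistic (5).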
3 Model diagnostics
In this section, we apply the proposed white noise testing procedure to test the goodness
of fit for functional linear models and FAR(1) models. The basic idea here is to check the
assumption of white noise errors by examining the residuals from the fitted model. In the case
of FAR(1) models, a modified block bootstrap is introduced to capture the estimation effect.
3.1 Functional linear models
Testing the error correlation for functional linear models has been considered in Gabrys
et al. (2010). The authors proposed two time domain tests with different ways of defining
the finite-dimensional residuals. The GHK tests involve the choices of functional principal
components and lag truncation number for the autocovariance function. In practice, the time
domain tests can be sensitive to the choice of the tuning parameters especially when the first
few principal components of the errors are uncorrelated or the errors are only correlated at
large lags (also see the simulations in Section 4). Moreover, the GHK tests are not robust
to uncorrelated but dependent processes such as the FARCH process because the asymptotic
validity of their tests is established under the independence assumption. In contrast, our spectra-based test is an omnibus-type test, and it has nontrivial power against local alternatives that are within a √T-neighborhood of the null hypothesis. One important feature of our testing procedure is its robustness to dependence within the error sequence.
To fix ideas, we consider the functional linear model

    Y_t(τ1) = η(X_t)(τ1) + ǫ_t(τ1),   t = 1, 2, . . . ,   τ1 ∈ I,   (20)

where η(X)(τ1) = ∫_I K(τ1, τ2) X(τ2) dτ2 for K ∈ L²(I²) and X ∈ H, and {ǫ_t(τ)} is a sequence of white noise which is uncorrelated with the covariate X_t(τ) (with some abuse of notation, we use X_t to denote the covariate in this subsection). Suppose {ǫ_t} and {X_t} are both mean-zero sequences and are jointly stationary. Define the covariance operator of X_t as γ_X(·) = γ_{X,X}(·) = E[⟨X_t, ·⟩ X_t] and the cross-covariance operator between X_t and Y_t as γ_{Y,X}(·) = E[⟨X_t, ·⟩ Y_t]. For u, u′ ∈ H, note that

    ⟨γ_{Y,X}(u), u′⟩ = E⟨X_t, u⟩⟨Y_t, u′⟩ = E⟨X_t, u⟩⟨η(X_t), u′⟩ = ⟨η γ_X(u), u′⟩,
which implies that γ_{Y,X} = η γ_X. By Mercer's lemma [Riesz and Sz-Nagy (1955)], the covariance operator γ_X admits the spectral decomposition

    γ_X(·) = Σ_{j=1}^{+∞} λ_j ⟨φ_j, ·⟩ φ_j,   (21)

where the λ_j and φ_j are the eigenvalues and eigenfunctions respectively. Suppose that the eigenvalues are ordered so that λ_1 ≥ λ_2 ≥ · · · ≥ 0. Define the sample covariance operator γ̂_X(·) = Σ_{t=1}^T ⟨X_t, ·⟩ X_t/T and the sample cross-covariance operator γ̂_{Y,X}(·) = Σ_{t=1}^T ⟨X_t, ·⟩ Y_t/T. Let {λ̂_j}_{j=1}^T be the eigenvalues of γ̂_X with λ̂_1 ≥ λ̂_2 ≥ · · · ≥ λ̂_T ≥ 0, and let {φ̂_j}_{j=1}^T be the corresponding eigenfunctions. When λ_p > 0 and λ_{p+1} = 0 for some p ∈ N, X_t lies in a p-dimensional space spanned by {φ_j}_{j=1}^p, which is isomorphic to R^p. In this case, η can be estimated by γ̂_{Y,X} γ̂^{−1}_X, where γ̂^{−1}_X is the inverse of γ̂_X restricted to the space spanned by {φ̂_j}_{j=1}^p. In what follows, we consider the case where λ_j > 0 for all j ∈ N, which reflects more fully the infinite-dimensional nature of functional data. Estimating the inverse covariance operator γ^{−1}_X is challenging due to the so-called ill-posed problem. To tackle this problem, we consider the truncated estimator

    γ̂^{−1}_X(·) = Σ_{j=1}^{k_T} λ̂^{−1}_j ⟨φ̂_j, ·⟩ φ̂_j,   (22)

where k_T grows slowly with the sample size. To regularize the inverse covariance operator, one may alternatively consider the Tikhonov regularization, i.e., γ̂^{−1}_X(·) = Σ_{j=1}^T (λ̂_j/(λ̂²_j + a_T)) ⟨φ̂_j, ·⟩ φ̂_j, or the penalization, i.e., γ̂^{−1}_X(·) = Σ_{j=1}^T (1/(λ̂_j + a_T)) ⟨φ̂_j, ·⟩ φ̂_j, with a_T a positive sequence converging to zero. Readers are referred to Arsenin and Tikhonov (1977) and Groetsch (1993) for more details on the ill-posed problem. Based on the truncated regularization, a moment estimator for η is given by

    η̂_T(·) = γ̂_{Y,X} γ̂^{−1}_X(·) = (1/T) Σ_{t=1}^T Σ_{j=1}^{k_T} λ̂^{−1}_j ⟨X_t, φ̂_j⟩ ⟨φ̂_j, ·⟩ Y_t.   (23)
Because k_T is allowed to grow slowly with T in our asymptotic analysis, our approach differs significantly from some existing works [e.g. Gabrys et al. (2010)] where the functional objects are projected onto a finite-dimensional space whose dimension is held fixed in the asymptotics. In practice, k_T can be chosen by the simple cumulative variation rule, i.e.,

    k_T = arg min { 1 ≤ k ≤ T : Σ_{j=1}^k λ̂_j / Σ_{j=1}^T λ̂_j > α_0 },

where α_0 ∈ (0, 1), e.g. α_0 = 85%, 90% or 95%. A thorough investigation of the choice of k_T is, however, beyond the scope of the current paper.
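The truncated estimator (23), together with the cumulative variation rule for k_T, can be sketched as follows for curves on a grid. This is our own discretized implementation: inner products are Riemann sums, and the eigenvectors returned by the eigensolver are rescaled so that the eigenfunctions have unit L² norm.

```python
import numpy as np

def estimate_eta_kernel(X, Y, alpha0=0.90):
    """Truncated-regularization estimate of eta in model (20), cf. (23).

    X, Y : (T, n) arrays of centered curves on an equally spaced grid of
    [0, 1].  Returns the kernel K_hat with eta_hat(x) = (K_hat @ x) / n."""
    T, n = X.shape
    d = 1.0 / n                                # grid mesh
    C = X.T @ X / T * d                        # discretized operator gamma_hat_X
    lam, V = np.linalg.eigh(C)
    lam, V = lam[::-1], V[:, ::-1]             # eigenvalues in decreasing order
    phi = V / np.sqrt(d)                       # L2-normalized eigenfunctions
    # cumulative variation rule: smallest k with sum(lam[:k])/sum(lam) > alpha0
    kT = int(np.searchsorted(np.cumsum(lam) / lam.sum(), alpha0)) + 1
    S = X @ phi[:, :kT] * d                    # scores <X_t, phi_hat_j>
    B = Y.T @ S / T                            # (1/T) sum_t <X_t, phi_hat_j> Y_t
    return (B / lam[:kT]) @ phi[:, :kT].T      # kernel of eta_hat_T
```

The residuals ǫ̂_t = Y_t − η̂_T(X_t) then feed directly into the residual-based test statistic defined below.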
Denote by γ_{X,s} and γ_{ǫ,s} the autocovariance operators of X_t and ǫ_t at lag s, respectively. Let R^X_{s1,s2,s3} and R^ǫ_{s1,s2,s3} be the cumulant operators of X_t and ǫ_t. We make the following assumptions, under which we establish the consistency of η̂_T under the uniform norm.

Assumption 3.1. Suppose {X_t} and {ǫ_t} are jointly stationary. Assume that {X_t} is L⁴-m-approximable with Σ_{s=−∞}^{+∞} |||γ_{X,s}|||_1 < +∞ and Σ_{s1,s2,s3=−∞}^{+∞} |||R^X_{s1,s2,s3}|||_1 < +∞, and that the same assumption holds for {ǫ_t}. Define R^{ǫ,X}_{s+h,s,h}(τ1, τ2, τ3, τ4) = cum(ǫ_{s+h}(τ1), X_s(τ2), ǫ_h(τ3), X_0(τ4)), and suppose that sup_{h∈Z} Σ_{s=−∞}^{+∞} |||R^{ǫ,X}_{s+h,s,h}|||_1 < ∞.

Assumption 3.2. Assume that λ_j > 0 for all j.

Assumption 3.3. Let α_1 = 2√2 (λ_1 − λ_2)^{−1} and α_j = 2√2 max{(λ_{j−1} − λ_j)^{−1}, (λ_j − λ_{j+1})^{−1}} for j ≥ 2. Assume that T^{−1} λ^{−2}_{k_T} = o(1), T^{−1/4} Σ_{j=1}^{k_T} α_j/λ_j = o(1), λ^{−1}_{k_T} T^{−1/2} Σ_{j=1}^{k_T} 1/λ²_j = o(1) and Σ_{j>k_T} ||η(φ_j)||² = o(1/√T).
Lemma 3.1. Under Assumptions 3.1-3.3, ||η̂_T − η||_L = o_p(T^{−1/4}).
Assumption 3.3 imposes restrictions on the spacing and decay rate of the eigenvalues. In particular, it requires the distinct eigenvalue condition λ_1 > λ_2 > λ_3 > · · ·, which is common in the literature [see e.g. Horvath and Kokoszka (2012)]. The above assumptions allow us to establish the o_p(T^{−1/4}) rate, which further implies that the estimation effect caused by replacing η with its estimate η̂_T is asymptotically negligible.
Based on the consistent estimator η̂_T, the errors can be naturally estimated by ǫ̂_t = Y_t − η̂_T(X_t). Let V_{ǫ,T,h}(τ1, τ2) = γ̂_{ǫ,h}(τ1, τ2) + γ̂_{ǫ,h}(τ2, τ1) with γ̂_{ǫ,h}(τ1, τ2) = Σ_{t=h+1}^T ǫ̂_t(τ1) ǫ̂_{t−h}(τ2)/T. We define the test statistic as

    T_{ǫ,T,CM} = (T/(8π²)) Σ_{h=1}^{T−1} ||V_{ǫ,T,h}||²_{I²} / h².   (24)
Theorem 3.1. Suppose that {ǫ_t} satisfies the assumptions in Theorem 2.1. Under Assumptions 3.1-3.3 and the null hypothesis (i.e., the error sequence is uncorrelated), we have

    T_{ǫ,T,CM} →_d ∫_0^1 ∫_{I²} G²_ǫ(λ, τ1, τ2) dτ1 dτ2 dλ,   (25)

where G_ǫ is a mean-zero Gaussian process whose covariance structure is given by

    cov(G_ǫ(λ, τ1, τ2), G_ǫ(λ′, τ′1, τ′2)) = (1/4) Σ_{i,j=1}^{+∞} Σ_{s=−∞}^{+∞} Ψ_i(λ) Ψ_j(λ′) cov(ǫ_{s+i}(τ1) ǫ_s(τ2) + ǫ_{s+i}(τ2) ǫ_s(τ1), ǫ_j(τ′1) ǫ_0(τ′2) + ǫ_j(τ′2) ǫ_0(τ′1)).
Theorem 3.1 indicates that the effect caused by replacing the errors with their estimates is asymptotically negligible [see also (10)], which is essentially due to the exogeneity of X_t. To conduct inference based on T_{ǫ,T,CM}, we can employ the block bootstrap described in Section 2.2. Define Y_{ǫ,th}(τ1, τ2) = ǫ̂_t(τ1) ǫ̂_{t−h}(τ2) + ǫ̂_t(τ2) ǫ̂_{t−h}(τ1) − γ̂_{ǫ,h}(τ1, τ2) − γ̂_{ǫ,h}(τ2, τ1) for h + 1 ≤ t ≤ T and Y_{ǫ,th}(τ1, τ2) = 0 for t ≤ h, where 2 ≤ t ≤ T and 1 ≤ h ≤ T − 1. Replacing Y_th with Y_{ǫ,th} and following the steps in Section 2.2, we obtain the (approximate) critical values of T_{ǫ,T,CM}.
The consistency of the block bootstrap procedure is established in the following theorem.
Theorem 3.2. Suppose the assumptions in Theorem 2.4 hold for {ǫ_t}. Then under Assumptions 3.1-3.3 and the null hypothesis,

    ρ_w [ L(T*_{ǫ,T,CM} | Z_T), L( ∫_0^1 ∫_{I²} G²_ǫ(λ, τ1, τ2) dτ1 dτ2 dλ ) ] → 0,

in probability, where T*_{ǫ,T,CM} denotes the bootstrap statistic and Z_T = {(X_t, Y_t)}_{t=1}^T.
3.2 Functional autoregressive models
A sequence of H-valued random variables {X_t(τ)} is called an FAR(1) process if it is stationary and satisfies

    X_t(τ1) − µ(τ1) = ρ(X_{t−1} − µ)(τ1) + ε_t(τ1),   t = 1, 2, . . . ,   (26)

where ρ(X)(τ1) = ∫_I ψ(τ1, τ2) X(τ2) dτ2 for ψ ∈ L²(I²) and X ∈ H, and {ε_t(τ)} is a sequence of mean-zero H-valued (white noise) innovations in L²_H. When ||ρ^j||_L < 1 for some j ∈ N, there is a unique stationary solution X_t(τ) = µ(τ) + Σ_{j=0}^{+∞} ρ^j(ε_{t−j})(τ) of (26). For the sake of clarity, we assume that µ(τ) = 0 throughout the following discussion. With a slight abuse of notation, define the autocovariance operator of X_t at lag s as γ_{X,s}(·) = E[⟨X_{t−s}, ·⟩ X_t] with γ_X = γ_{X,0}. Let {λ̂_j}_{j=1}^T and {φ̂_j}_{j=1}^T be respectively the eigenvalues and eigenfunctions of the sample covariance operator γ̂_X = Σ_{t=1}^{T−1} ⟨X_t, ·⟩ X_t/T. By Theorem 3.2 of Bosq (2000), we have ρ γ_X = γ_{X,1}. This result motivates us to consider the moment estimator

    ρ̂_T(·) = γ̂_{X,1} γ̂^{−1}_X(·) = (1/T) Σ_{t=2}^T Σ_{j=1}^{k_T} λ̂^{−1}_j ⟨X_{t−1}, φ̂_j⟩ ⟨φ̂_j, ·⟩ X_t,   (27)

where γ̂_{X,1} = Σ_{t=2}^T ⟨X_{t−1}, ·⟩ X_t/T and γ̂^{−1}_X(·) = Σ_{j=1}^{k_T} λ̂^{−1}_j ⟨φ̂_j, ·⟩ φ̂_j is the truncated estimator. Denote by γ_{ε,0} and R^ε_{s1,s2,s3} the covariance operator and the cumulant operator of ε_t, respectively. To establish the consistency of ρ̂_T, we make the following assumptions.
Assumption 3.4. Assume that ||ρ||_S < 1 and Σ_{s1,s2,s3=1}^{+∞} ||R^ε_{s1,s2,s3}||_S < +∞.

Assumption 3.5. Assume that λ_j > 0 for all j.

Assumption 3.6. Let α_1 = 2√2 (λ_1 − λ_2)^{−1} and α_j = 2√2 max{(λ_{j−1} − λ_j)^{−1}, (λ_j − λ_{j+1})^{−1}} for j ≥ 2. Assume that T^{−1/4} Σ_{j=1}^{k_T} α_j/λ_j = o(1), λ^{−1}_{k_T} T^{−1/2} Σ_{j=1}^{k_T} 1/λ²_j = o(1) and Σ_{j>k_T} ||ρ(φ_j)||² = o(1/√T).
Lemma 3.2. Under Assumptions 3.4-3.6, $\|\hat\rho_T-\rho\|_{\mathcal{L}}=o_p(T^{-1/4})$.

Lemma 3.2 shows that $\hat\rho_T$ is consistent under the uniform norm (with the rate $o_p(T^{-1/4})$), which slightly generalizes the results in Chapter 8 of Bosq (2000) by allowing the errors to be dependent. The residual from the fitted model is thus given by $\hat\varepsilon_t=X_t-\hat\rho_T(X_{t-1})$ with $1\le t\le T$ and $X_0=0$. Let $\hat V_{\varepsilon,T,h}(\tau_1,\tau_2)=\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)+\hat\gamma_{\varepsilon,h}(\tau_2,\tau_1)$, where $\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)=\sum_{t=h+1}^{T}\hat\varepsilon_t(\tau_1)\hat\varepsilon_{t-h}(\tau_2)/T$. Our test statistic is defined to be
$$\mathcal{T}_{\varepsilon,T,CM}=\frac{T}{8\pi^2}\sum_{h=1}^{T-1}\|\hat V_{\varepsilon,T,h}\|^2_{\mathcal{I}^2}/h^2.\qquad(28)$$
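On a discrete grid, the estimator (27) and the statistic (28) can be approximated directly. Below is a minimal NumPy sketch (our own illustration, not the authors' code): curves are assumed observed on an equispaced grid of $\mathcal{I}=[0,1]$, inner products are Riemann sums, and the function names are ours.

```python
import numpy as np

def far1_residuals(X, k_T):
    """FPCA-based moment estimator of rho, cf. (27), and the fitted residuals.
    X: (T, n) array of curves on an equispaced grid of [0, 1]."""
    T, n = X.shape
    w = 1.0 / n                                   # quadrature weight
    C = X[:T - 1].T @ X[:T - 1] / T               # kernel of the sample covariance operator
    lam, V = np.linalg.eigh(C * w)                # operator eigenvalues (ascending)
    lam = lam[::-1][:k_T]                         # top-k_T eigenvalues
    V = V[:, ::-1][:, :k_T] / np.sqrt(w)          # corresponding eigenfunctions on the grid
    scores = (X @ V) * w                          # <X_t, phi_j>
    B = scores[:T - 1].T @ X[1:] / T              # (1/T) sum_t <X_{t-1}, phi_j> X_t
    eps = X.copy()                                # eps_1 = X_1 (convention X_0 = 0)
    eps[1:] -= (scores[:T - 1] / lam) @ B         # eps_t = X_t - rho_hat(X_{t-1})
    return eps

def cm_statistic(eps):
    """Cramer-von Mises statistic (28) built from residual curves."""
    T, n = eps.shape
    w2 = 1.0 / n ** 2                             # weight for integrals over I^2
    stat = 0.0
    for h in range(1, T):
        g = eps[h:].T @ eps[:T - h] / T           # gamma_hat_{eps,h}(tau1, tau2)
        V = g + g.T                               # symmetrized V_hat_{eps,T,h}
        stat += (V ** 2).sum() * w2 / h ** 2
    return stat * T / (8 * np.pi ** 2)
```

The eigenpairs of the sample covariance operator are read off the grid covariance matrix scaled by the quadrature weight, so the returned eigenvalues approximate the operator eigenvalues $\hat\lambda_j$.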
To study the asymptotic behavior of $\mathcal{T}_{\varepsilon,T,CM}$, we present the following result when $X_t$ is infinite-dimensional. Let $\pi_{k_T}(\cdot)=\sum_{j=1}^{k_T}\langle\phi_j,\cdot\rangle\phi_j$ be the projection operator and $\gamma_X^{-1/2}(\cdot)=\sum_{j=1}^{+\infty}\lambda_j^{-1/2}\langle\phi_j,\cdot\rangle\phi_j$. Define $V_{\varepsilon,T,h}(\tau_1,\tau_2)=\sum_{t=h+1}^{T}\{\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)+\varepsilon_t(\tau_2)\varepsilon_{t-h}(\tau_1)\}/T$, and denote by $\rho_*$ the adjoint operator of $\rho$.

Proposition 3.1. Consider the FAR(1) model [see (26)] with $\{\varepsilon_t\}$ being a sequence of i.i.d random variables in $L^2_H$. If
$$|||\gamma_X^{-1/2}\rho^{h-1}\gamma_{\varepsilon,0}^2\rho_*^{h-1}\gamma_X^{-1/2}|||_1=\sum_{j=1}^{+\infty}\frac{\|\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_j)\|^2}{\lambda_j}<\infty,\qquad(29)$$
uniformly for all $h\ge 1$, we have
$$\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\Big[V_{\varepsilon,T,h}(\tau_1,\tau_2)-\frac{1}{T}\sum_{t=2}^{T}\big\{\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)+\varepsilon_t(\tau_2)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_1)\big\}\Big]\Psi_h(\lambda)\Rightarrow\mathbb{G}_\varepsilon(\lambda,\tau_1,\tau_2),\qquad(30)$$
where $\mathbb{G}_\varepsilon$ is a mean-zero Gaussian process such that
$$\mathrm{cov}\big(\mathbb{G}_\varepsilon(\lambda,\tau_1,\tau_2),\mathbb{G}_\varepsilon(\lambda',\tau_1',\tau_2')\big)=\frac{1}{4}\sum_{i,j=1}^{+\infty}\Psi_i(\lambda)\Psi_j(\lambda')\,\mathrm{cov}\big(h_{0,i}(\tau_1,\tau_2)+h_{0,i}(\tau_2,\tau_1),\ h_{0,j}(\tau_1',\tau_2')+h_{0,j}(\tau_2',\tau_1')\big),$$
with $h_{s,i}(\tau_1,\tau_2)=\varepsilon_s(\tau_1)\varepsilon_{s-i}(\tau_2)-\varepsilon_s(\tau_1)\gamma_{\varepsilon,0}\rho_*^{i-1}\gamma_X^{-1}(X_{s-1})(\tau_2)$.
Condition (29) basically requires that $\gamma_{\varepsilon,0}\rho_*^{h-1}$ be at least as smooth as $\gamma_X^{1/2}$. In particular, if $\rho(\cdot)=\sum_{j=1}^{+\infty}\lambda_{\rho,j}\langle\phi_j,\cdot\rangle\phi_j$ with $0<\inf_j|\lambda_{\rho,j}|\le\sup_j|\lambda_{\rho,j}|<1$, and $\gamma_{\varepsilon,0}(\cdot)=\sum_{j=1}^{+\infty}\lambda_{\varepsilon,j}\langle\phi_j,\cdot\rangle\phi_j$ (i.e., $\rho$ and $\gamma_{\varepsilon,0}$ share the same set of eigenfunctions), we have $\gamma_X(\cdot)=\sum_{j=1}^{+\infty}\frac{\lambda_{\varepsilon,j}}{1-\lambda_{\rho,j}^2}\langle\phi_j,\cdot\rangle\phi_j$. Condition (29) is then fulfilled because $|||\gamma_{\varepsilon,0}|||_1=\sum_{j=1}^{+\infty}\lambda_{\varepsilon,j}<\infty$.

We are now in a position to present the main result of this section.
Theorem 3.3. Suppose Assumptions 3.4-3.6 are fulfilled. Under the assumptions in Proposition 3.1, we have
$$\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\hat V_{\varepsilon,T,h}(\tau_1,\tau_2)\Psi_h(\lambda)\Rightarrow\mathbb{G}_\varepsilon(\lambda,\tau_1,\tau_2),\qquad(31)$$
and
$$\mathcal{T}_{\varepsilon,T,CM}\to_d\int_0^1\int_{\mathcal{I}^2}\mathbb{G}_\varepsilon^2(\lambda,\tau_1,\tau_2)\,d\tau_1 d\tau_2 d\lambda.\qquad(32)$$
We prove Theorem 3.3 by showing the asymptotic equivalence between $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\hat V_{\varepsilon,T,h}(\tau_1,\tau_2)\Psi_h(\lambda)$ and the LHS of (30). Note that the covariance structure of the limiting process $\mathbb{G}_\varepsilon(\lambda,\tau_1,\tau_2)$ differs from that of the limiting process of $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V_{\varepsilon,T,h}(\tau_1,\tau_2)\Psi_h(\lambda)$, which is defined based on the unobserved innovations [see e.g. (10)]. As seen from the arguments in the appendix, the estimation effect caused by replacing the unobservable innovations with their estimates is not asymptotically negligible, owing to the autocorrelation structure of the FAR(1) model (in contrast, for functional linear models the estimation effect is negligible because the covariate is exogenous). The block bootstrap procedure in Section 2.2 cannot be applied directly in this case, as it does not capture the randomness from $\hat\rho_T$ in the asymptotics. To overcome this difficulty, we consider a modified block bootstrap procedure. Define
$$B_{t,h}(\tau_1,\tau_2)=\sum_{j=h+1}^{T}e_{t-1,j-1}\hat\varepsilon_t(\tau_1)\hat\varepsilon_{j-h}(\tau_2)/T,$$
where $e_{t,j}=\sum_{k=1}^{k_T}\langle X_t,\hat\phi_k\rangle\langle X_j,\hat\phi_k\rangle/\hat\lambda_k$. Here $B_{t,h}$ is introduced to capture the estimation effect. Let $\mathcal{Y}_{\varepsilon,th}(\tau_1,\tau_2)=\hat\varepsilon_t(\tau_1)\hat\varepsilon_{t-h}(\tau_2)+\hat\varepsilon_t(\tau_2)\hat\varepsilon_{t-h}(\tau_1)-\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)-\hat\gamma_{\varepsilon,h}(\tau_2,\tau_1)-B_{t,h}(\tau_1,\tau_2)-B_{t,h}(\tau_2,\tau_1)$ for $h+1\le t\le T$ and $\mathcal{Y}_{\varepsilon,th}(\tau_1,\tau_2)=-B_{t,h}(\tau_1,\tau_2)-B_{t,h}(\tau_2,\tau_1)$ for $2\le t\le h$, where $1\le h\le T-1$. The modified block bootstrap can be described in a similar way as before, replacing $\mathcal{Y}_{th}$ with $\mathcal{Y}_{\varepsilon,th}$ in the procedure presented in Section 2.2. Proposition 3.2 below provides theoretical justification for the modified block bootstrap under Assumption 6.1 in the technical appendix.
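For concreteness, the correction term $B_{t,h}$ can be sketched on a grid as follows (a hypothetical helper of our own; it assumes the FPCA scores $\langle X_t,\hat\phi_k\rangle$ and eigenvalues $\hat\lambda_k$ have already been computed, and follows the paper's 1-based time indexing):

```python
import numpy as np

def B_th(eps, scores, lam, t, h):
    """Correction term B_{t,h}(tau1, tau2) of the modified bootstrap:
    B_{t,h} = sum_{j=h+1}^T e_{t-1, j-1} eps_t (x) eps_{j-h} / T,
    where e_{a,b} = sum_k <X_a, phi_k><X_b, phi_k> / lam_k.
    eps: (T, n) residual curves; scores: (T, k_T) FPCA scores; t, h 1-based."""
    T, n = eps.shape
    e = (scores / lam) @ scores.T                 # e[a, b] corresponds to e_{a+1, b+1}
    acc = np.zeros(n)
    for j in range(h + 1, T + 1):
        acc += e[t - 2, j - 2] * eps[j - h - 1]   # e_{t-1, j-1} * eps_{j-h}(tau2)
    return np.outer(eps[t - 1], acc) / T          # eps_t(tau1) factored outside the sum
```

The term is a rank-one surface in $(\tau_1,\tau_2)$ for each $(t,h)$, which is why the residual curve $\hat\varepsilon_t$ can be factored out of the sum over $j$.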
Proposition 3.2. Assume $\limsup b_T/\sqrt{T}<\infty$. Under Assumptions 3.4-3.6 and Assumption 6.1, we have
$$\rho_w\Big[\mathcal{L}\big(\mathcal{T}^*_{\varepsilon,T,CM}\,\big|\,\mathcal{X}_T\big),\ \mathcal{L}\Big(\int_0^1\int_{\mathcal{I}^2}\mathbb{G}_\varepsilon^2(\lambda,\tau_1,\tau_2)\,d\tau_1 d\tau_2 d\lambda\Big)\Big]\to 0,$$
in probability, where $\mathcal{T}^*_{\varepsilon,T,CM}$ denotes the bootstrap statistic based on the modified block bootstrap and $\mathcal{X}_T=\{X_t\}_{t=1}^{T}$.
To conclude this section, we point out that the above results extend to FAR(p) models.
This is because if Xt is generated from FAR(p), then (Xt, . . . , Xt−p+1) is a FAR(1) process
in L2(Ip).
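The stacking argument can be made concrete on a grid: the pair $Z_t=(X_t,X_{t-1})$ evolves as a FAR(1) process driven by the companion operator $\begin{pmatrix}\rho_1&\rho_2\\ \mathrm{Id}&0\end{pmatrix}$. A small sketch for $p=2$ (our illustration; the kernel matrices and names are assumptions, not the authors' code):

```python
import numpy as np

def far2_step_stacked(Z, K1, K2, eps, w):
    """One FAR(2) step written as a FAR(1) step on the stacked state
    Z_t = (X_t, X_{t-1}). K1, K2 are grid kernel matrices of rho_1, rho_2,
    w is the quadrature weight of the grid (a sketch)."""
    n = len(eps)
    P = np.block([[K1 * w, K2 * w],
                  [np.eye(n), np.zeros((n, n))]])  # companion operator
    return P @ Z + np.concatenate([eps, np.zeros(n)])
```

Applying the companion operator to the stacked state reproduces the FAR(2) recursion in the first block and shifts $X_{t-1}$ into the second block, which is exactly why $(X_t,\dots,X_{t-p+1})$ is a FAR(1) process in $L^2(\mathcal{I}^p)$.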
4 Numerical results
In this section, we evaluate the finite sample performance of the proposed method through
simulations and an application to cumulative intradaily returns. Throughout the simulations,
the sample sizes considered are T = 101 and 257, and the number of Monte Carlo replications is set to 1000, except for the minimum volatility method, where the number of replications is 200 (due to the computational cost).
4.1 White noise testing
Under the null hypothesis, we simulate i.i.d standard Brownian motions (BMs), Brownian bridges (BBs) and the FARCH(1) process, which is defined as
$$X_t(\tau_1)=\varepsilon_t(\tau_1)\sqrt{\tau_1+\int_0^1 c_\psi\exp\Big(\frac{\tau_1^2+\tau_2^2}{2}\Big)X_{t-1}^2(\tau_2)\,d\tau_2},\quad t=1,2,\dots,\ \tau_1\in[0,1],\qquad(33)$$
where $\{\varepsilon_t\}$ is a sequence of i.i.d standard BMs and $c_\psi=0.3418$, so that the Hilbert-Schmidt norm of the operator associated with $c_\psi\exp((\tau_1^2+\tau_2^2)/2)$ is around 0.5. The data are generated on a grid of 1000 equispaced points in [0, 1] for each functional observation. The functional data are obtained by using B-splines with 20 basis functions. We also tried 100 basis functions and found that the number of basis functions does not affect our results much. Under the alternatives, we simulate data from FAR(1) models with i.i.d innovations (BM or BB), and Gaussian kernel $K_g(\tau_1,\tau_2)=c_g\exp((\tau_1^2+\tau_2^2)/2)$ or Wiener kernel $K_w(\tau_1,\tau_2)=c_w\min(\tau_1,\tau_2)$, where $c_g$ and $c_w$ are chosen so that the corresponding Hilbert-Schmidt norm is equal to 0.3. We present the empirical sizes and powers for the bootstrap-assisted spectra-based test (denoted by "Boot"), the portmanteau test in Gabrys and Kokoszka (2007) (the GK test), and the independence test in Horvath et al. (2013) (the HHR test) in Tables 1-2. We also implement the minimum volatility method described in Section 2.2.
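As an illustration of the null DGP, the FARCH(1) recursion (33) can be simulated on a grid as follows (a sketch of our own; the Brownian-motion discretization and the burn-in length are our choices, not specified by the paper):

```python
import numpy as np

def simulate_farch1(T, n=200, c_psi=0.3418, burn=50, seed=0):
    """Simulate the FARCH(1) process (33) driven by i.i.d. standard Brownian
    motions on an equispaced grid of [0, 1]."""
    rng = np.random.default_rng(seed)
    tau = np.linspace(0, 1, n)
    psi = c_psi * np.exp((tau[:, None] ** 2 + tau[None, :] ** 2) / 2)  # kernel
    w = 1.0 / n                                    # quadrature weight
    X = np.zeros(n)
    out = []
    for t in range(T + burn):
        bm = np.cumsum(rng.normal(scale=np.sqrt(w), size=n))  # approx. standard BM
        sigma2 = tau + (psi * X[None, :] ** 2).sum(axis=1) * w
        X = bm * np.sqrt(sigma2)                   # recursion (33)
        if t >= burn:
            out.append(X)
    return np.asarray(out)
```

A burn-in period is included so that the returned sample is (approximately) a draw from the stationary distribution.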
Denote by $d$ and $H$ the number of principal components and the lag truncation number involved in the GK test and the HHR test. Note that $H$ is fixed in the asymptotics for the GK test, while it is allowed to grow slowly in the HHR test. We choose $d$ such that the cumulative variation explained by the first $d$ principal components is above 90%, and set $H = 1, 3$ and 5 for the GK test and $H = 10, 30$ and 50 for the HHR test. It is worth noting that the GK test and the HHR test use the chi-square and standard normal distributions respectively, which makes them easier to implement than the bootstrap-assisted test. Bootstrap versions of the GK and HHR tests are also possible and would likely lead to more accurate approximations.
From Table 1, we observe that the GK test exhibits severe upward size distortion for the FARCH(1) process because it is not robust to dependence within the white noise. We also note that the GK test tends to be undersized when $H$ gets large. The HHR test generally provides accurate size, but we still observe some upward size distortion at the 1% level, since the convergence of the sampling distribution of the HHR test statistic to the standard normal distribution appears to be slow in the tails. This problem can be alleviated by a suitable power transformation [see Horvath et al. (2013)]. The spectra-based test provides accurate size with $b_T = 1$, which is due to the fact that the processes considered under the null are martingale difference sequences and the block bootstrap is valid with $b_T = 1$, as commented in Section 2.2. When $b_T$ becomes larger, the size is more distorted at all levels, but the size distortion decreases as $T$ increases. In terms of power, the proposed test outperforms the two alternative methods in most cases, and its power is robust to the choice of block size. The GK test and the HHR test are comparatively less powerful, and they tend to deliver less power as $H$ increases because of the correlation structure of the FAR(1) models. It is also worth noting that the minimum volatility method tends to perform well. Overall, our method has very good size and power properties.
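To make the procedure concrete, the following sketch computes the Cramér-von Mises statistic and a bootstrap p-value with block size $b_T = 1$ (our reading of the Section 2.2 procedure, not the authors' code: the centered summands $\mathcal{Y}_{th}$ are resampled independently over $t$, with $\mathcal{Y}_{th}=0$ for $t\le h$):

```python
import numpy as np

def cm_white_noise_test(X, n_boot=200, seed=0):
    """Bootstrap-assisted spectra-based white noise test on curves X (T x n),
    sketched with block size b_T = 1 and grid-based integrals."""
    rng = np.random.default_rng(seed)
    T, n = X.shape
    w2 = 1.0 / n ** 2                             # weight for integrals over I^2
    scale = T / (8 * np.pi ** 2)
    Y = np.zeros((T - 1, T, n, n))                # Y[h-1, t-1] = Y_th (0 for t <= h)
    stat = 0.0
    for h in range(1, T):
        g = X[h:].T @ X[:T - h] / T               # gamma_hat_h(tau1, tau2)
        V = g + g.T
        stat += (V ** 2).sum() * w2 / h ** 2
        out = np.einsum('ti,tj->tij', X[h:], X[:T - h])
        Y[h - 1, h:] = out + out.transpose(0, 2, 1) - V[None]
    stat *= scale
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(1, T, size=T - 1)      # resample t in {2,...,T}
        s = 0.0
        for h in range(1, T):
            Vb = Y[h - 1, idx].sum(axis=0) / T    # bootstrap V*_h
            s += (Vb ** 2).sum() * w2 / h ** 2
        boot[b] = scale * s
    return stat, (boot >= stat).mean()
```

The returned p-value is the fraction of bootstrap statistics exceeding the observed one; larger block sizes would require resampling contiguous blocks of indices instead of single time points.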
4.2 Model diagnostic checking
Consider the functional linear model in (20) with η being the linear operator associated with
the Gaussian or Wiener kernels with Hilbert-Schmidt norm 0.5. The covariate Xt is gener-
ated from the FAR(1) model with Gaussian kernel and BB innovations that are independent of
the errors ǫt. Under the null, ǫt are i.i.d BMs, BBs, or are generated from the FARCH(1)
process defined in (33). Under the alternatives, we simulate ǫt from the FAR(1) model with
i.i.d innovations (BM or BB), and Gaussian or Wiener kernels with the Hilbert-Schmidt norm
equal to 0.3. For the spectra-based test, $k_T$ is chosen so that $\sum_{j=1}^{k_T}\hat\lambda_j/\sum_{j=1}^{T}\hat\lambda_j>0.9$ (unreported simulation results show that the spectra-based test is in fact quite robust to the choice of $k_T$). Tables 3-4 summarize the empirical sizes and powers for the spectra-based test and the GHK tests. The GHK tests deliver accurate sizes for i.i.d errors, while their size distortion is severe for the FARCH(1) process. Note that as $H$ gets larger, the empirical size and power of the GHK tests both decrease. The spectra-based test provides accurate size with $b_T = 1$. When the block size increases, the size distortion becomes more obvious, but it improves for $T = 257$. As for the power, the spectra-based test generally outperforms the GHK tests with $H = 3, 5$, and its power is not very sensitive to the choice of block size.
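The truncation rule used here can be sketched as a small helper (the name is ours; eigenvalues are those of the grid-discretized sample covariance operator):

```python
import numpy as np

def choose_k_T(X, level=0.9):
    """Smallest k with cumulative explained variance above `level`,
    i.e. sum_{j<=k} lam_j / sum_j lam_j > 0.9 by default."""
    T, n = X.shape
    C = X.T @ X / (T * n)                        # covariance kernel * quadrature weight
    lam = np.linalg.eigvalsh(C)[::-1]            # operator eigenvalues, descending
    lam = np.clip(lam, 0, None)                  # guard against tiny negatives
    frac = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(frac, level) + 1)
```

With a strongly one-dimensional sample the rule returns $k_T = 1$, and it grows as the spectrum flattens.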
Next we consider the goodness of fit testing for the FAR(1) models. Under the null, we
simulate FAR(1) processes with BM, BB or FARCH(1) innovations, and Gaussian or Wiener
kernels with the Hilbert-Schmidt norm equal to 0.5. We employ the modified block bootstrap
procedure described in Section 3.2 to obtain the critical values for the spectra-based test. To
evaluate the power of the proposed method, we further consider the FAR(2) model which is
defined as
$$X_t(\tau_1)=\rho_1(X_{t-1})(\tau_1)+\rho_2(X_{t-2})(\tau_1)+\varepsilon_t(\tau_1),\quad t=1,2,\dots,\qquad(34)$$
where $\rho_1$ and $\rho_2$ are linear operators associated with the Gaussian or Wiener kernels, with $\|\rho_1\|_{\mathcal{S}}=0.5$ and $\|\rho_2\|_{\mathcal{S}}=0.3$. Table 5 presents the empirical sizes and powers of the spectra-based test. For $b_T = 1$, the spectra-based test delivers accurate size across all models. The size
becomes more distorted for larger bT , but it gets improved when the sample size T increases.
Moreover, as seen from Table 5, the spectra-based test delivers reasonable power, and it
becomes more powerful as the sample size increases. We also note that the minimum volatility
method performs well in this case. To summarize, the simulation results clearly demonstrate
the usefulness of the proposed method.
4.3 Application to cumulative intradaily returns
Intradaily price curves have been of interest to the investment community, and their properties are known to be quite different from those of daily or weekly closing prices; see e.g. Tsay (2005). To illustrate the usefulness of the proposed method, we apply the testing procedures developed in the previous sections to the cumulative intradaily returns (CIDRs) of the stocks of IBM and McDonald's Corporation (MCD) from 2005 to 2007. Suppose $P_t(s)$ is the stock price (or, more generally, any financial asset price) at time $s$ on day $t$. Following Gabrys et al. (2010) and Kokoszka and Reimherr (2013), the CIDRs are defined as
$$R_t(s)=100\,\{\log(P_t(s))-\log(P_t(s_0))\},\quad s\in[0,1],\qquad(35)$$
with $s_0$ being some reference time point (such as the market opening) and $[0,1]$ the rescaled trading period (in our example, 9:30 to 16:00 EST). We consider daily trading data recorded on a 5-minute grid. The CIDRs are computed on finite grids (78 points for each trading day) and are transformed into smooth curves using cubic B-splines with 40 basis functions and least squares fitting. The top panels of Figure 1 present the CIDRs
for IBM and MCD from 2005 to 2007. A natural question to ask here is whether the shape of
the CIDRs curves on previous days provides any hint to the shape on the following day. To
partly answer this question, we employ the three tests that have been examined in Section 4.1
to test the uncorrelatedness of the CIDRs curves. The bottom panels of Figure 1 depict the
p-values for the spectra-based test with bT = 1, the GHK test with d = 4 and H = 3, 5, and
the HHR test with H = 30, 50. The three tests generally deliver the same conclusion except
that the HHR test suggests possible departure from the white noise assumption for the CIDRs of IBM in 2006. Our empirical results are consistent with those in Kokoszka and Reimherr (2013), which suggest that the shapes of CIDR curves are generally not predictable. However, it is interesting that all three tests reject the null at the 5% level for the CIDRs of IBM in 2007, the year preceding the collapse of 2008. We further employ the test in Section 3.2 with
the modified block bootstrap to check the adequacy of a FAR(1) model. The testing result
does not provide strong support for the fit of FAR(1) model as the corresponding p-values
are 8.8%, 2.8% and 5.2% for bT = 1, 13, 19 respectively (note the sample size T = 248 in this
case). The test based on the minimum volatility method also rejects the null hypothesis at
the 10% level. As seen from Figure 1, the CIDRs of IBM appear to be more volatile and exhibit a certain degree of nonstationarity in 2007, the year before the financial crisis. More sophisticated modeling that takes these features into account might help explain the CIDR patterns.
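For reference, the CIDR transform (35) described above reduces to a two-line computation on the raw price grid (a sketch; the reference time $s_0$ defaulting to the first grid point is our choice):

```python
import numpy as np

def cidr(prices, s0=0):
    """Cumulative intradaily returns (35):
    R_t(s) = 100 * (log P_t(s) - log P_t(s_0)),
    for a (days x intraday grid) price array; s0 indexes the reference time."""
    logp = np.log(prices)
    return 100.0 * (logp - logp[:, [s0]])        # broadcast the reference column
```

Each resulting row is a curve starting at 0 at the reference time, which removes the day-to-day price level and leaves only the intradaily shape.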
5 Conclusion
We have proposed new spectra-based tests for white noise testing and model diagnostic
checking in the context of functional time series. Compared with existing methods, the pro-
posed method has a number of appealing features: (i) with the aid of block bootstrap, our
procedure is able to capture the dependence within the white noise or error sequence, and
thus is widely applicable to a large class of uncorrelated nonlinear functional processes; (ii)
our tests are constructed based on the $L^2$ norm of the spectral distribution function, which avoids the choices of functional principal components and lag truncation number that are usually required for existing tests. In particular, when the white noise or the error sequence
is a martingale difference sequence, the bootstrap method is able to replicate the limiting
distribution even with the block size being one. Based on the empirical results in Section 4,
we suggest the block size $b_T = 1$ in the bootstrap-assisted spectra-based test for testing the martingale difference null hypothesis; (iii) with $k_T$ growing slowly with the sample size, our procedure allows the complexity of the estimated model to grow slowly with the sample size [see also the recent work of Fremdt et al. (2013)]. In contrast, the complexity of the estimated model is typically fixed in the asymptotics for existing methods.
Although spectral methods have proven to be very powerful for analyzing time series and spatial processes, they are not yet well developed for functional data. It is hoped that our modest contribution can spawn interest among researchers working on the spectral analysis of functional time series.
6 Technical appendix
The appendix contains proofs of the main results in Sections 2-3. Proofs of some secondary
lemmas are provided in the online supplementary material.
6.1 Proofs of the main results in Section 2.1
Let $\{\alpha_i(\lambda),\lambda\in[0,1]\}_{i=1}^{+\infty}$ and $\{u_i(\tau),\tau\in\mathcal{I}\}_{i=1}^{+\infty}$ be two orthonormal bases of the Hilbert spaces $L^2([0,1])$ and $L^2(\mathcal{I})$ respectively. Then $\{\alpha_i(\lambda)u_j(\tau_1)u_l(\tau_2):\lambda\in[0,1],\ \tau_1,\tau_2\in\mathcal{I}\}_{i,j,l=1}^{+\infty}$ forms a basis of $L^2(\mathcal{A})$. Define $W_{h,i}=\langle\Psi_h,\alpha_i\rangle$ and $X_{t,j}=\langle X_t,u_j\rangle$. Denote by $L(\mathcal{A})^{\otimes k}$ the Cartesian product of $k$ copies of $L(\mathcal{A})$.
Lemma 6.1. Suppose $\{X_t\}$ is an $L^4$-$m$-approximable process. Under the assumption that $\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1<\infty$ and $\sum_{s_1,s_2,s_3=-\infty}^{+\infty}|||R_{s_1,s_2,s_3}|||_1<\infty$, we have
$$\big\{\sqrt{T}\,\tilde V_{T,h}(\tau_1,\tau_2)\Psi_h(\lambda):1\le h\le k\big\}\Rightarrow\big\{\mathcal{Y}_h(\lambda,\tau_1,\tau_2):1\le h\le k\big\},\quad k\in\mathbb{N},$$
where $\tilde V_{T,h}=V_{T,h}-EV_{T,h}$ and $\{\mathcal{Y}_h(\lambda,\tau_1,\tau_2):1\le h\le k\}$ is a mean-zero Gaussian process in $L(\mathcal{A})^{\otimes k}$ whose cross-covariance structure is given in (38).
Proof of Lemma 6.1. Under the assumption that $\{X_t\}$ is $L^4$-$m$-approximable, the sequence $\{W_{h,i}(X_{t,j}X_{t-h,l}+X_{t,l}X_{t-h,j}-EX_{t,j}X_{t-h,l}-EX_{t,l}X_{t-h,j}):1\le h\le k\}_{t=1}^{+\infty}$ is $L^2$-$m$-approximable [Hörmann and Kokoszka (2010)]. Thus $\{\frac{1}{\sqrt{T}}\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l}+X_{t,l}X_{t-h,j}-EX_{t,j}X_{t-h,l}-EX_{t,l}X_{t-h,j}):1\le h\le k\}$ converges to a multivariate normal distribution [Theorem A.1 of Aue et al. (2009)]. We show that for $k\in\mathbb{N}$ and $1\le h\le k$,
$$\lim_{\max(N_1,N_2,N_3)\to+\infty}\sup_T\frac{1}{T}E\sum_{i\ge N_1,j\ge N_2,l\ge N_3}\Big(\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l}-EX_{t,j}X_{t-h,l})\Big)^2=0,\qquad(36)$$
which implies the tightness of the sequence in $L(\mathcal{A})^{\otimes k}$ [see e.g., Theorem 2.7 of Bosq (2000)]. Note that
$$
\begin{aligned}
&\sup_T\frac{1}{T}E\sum_{i\ge N_1,j\ge N_2,l\ge N_3}\Big(\sum_{t=h+1}^{T}W_{h,i}(X_{t,j}X_{t-h,l}-EX_{t,j}X_{t-h,l})\Big)^2\\
&=\sup_T\frac{1}{T}\sum_{i\ge N_1,j\ge N_2,l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}W_{h,i}^2\,\mathrm{cov}(X_{t,j}X_{t-h,l},\ X_{t',j}X_{t'-h,l})\\
&=\sum_{i\ge N_1}W_{h,i}^2\sup_T\frac{1}{T}\sum_{j\ge N_2,l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}\big\{\mathrm{cum}(X_{t,j},X_{t-h,l},X_{t',j},X_{t'-h,l})\\
&\qquad+EX_{t,j}X_{t',j}\,EX_{t-h,l}X_{t'-h,l}+EX_{t,j}X_{t'-h,l}\,EX_{t-h,l}X_{t',j}\big\}\\
&=\sum_{i\ge N_1}W_{h,i}^2\sup_T\frac{1}{T}\sum_{j\ge N_2,l\ge N_3}\sum_{t=h+1}^{T}\sum_{t'=h+1}^{T}\big\{\langle R_{t-t'+h,t-t',h}(u_j\otimes u_l),u_j\otimes u_l\rangle\\
&\qquad+\langle\gamma_{t-t'}u_j,u_j\rangle\langle\gamma_{t-t'}u_l,u_l\rangle+\langle\gamma_{t-t'+h}u_l,u_j\rangle\langle\gamma_{t-t'-h}u_j,u_l\rangle\big\}\\
&\le\sum_{i\ge N_1}W_{h,i}^2\sum_{j\ge N_2,l\ge N_3}\sum_{s=-\infty}^{+\infty}\Big|\langle R_{s+h,s,h}(u_j\otimes u_l),u_j\otimes u_l\rangle\\
&\qquad+\langle\gamma_s u_j,u_j\rangle\langle\gamma_s u_l,u_l\rangle+\langle\gamma_{s+h}u_l,u_j\rangle\langle\gamma_{s-h}u_j,u_l\rangle\Big|.
\end{aligned}
$$
By the summability condition, we deduce that
$$\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle\gamma_s u_j,u_j\rangle\langle\gamma_s u_l,u_l\rangle|\le\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1^2\le\Big(\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1\Big)^2<\infty,$$
$$
\begin{aligned}
\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle\gamma_{s+h}u_l,u_j\rangle\langle\gamma_{s-h}u_j,u_l\rangle|
&\le\sum_{s=-\infty}^{+\infty}\Big(\sum_{j,l=1}^{+\infty}|\langle\gamma_{s+h}u_l,u_j\rangle|^2\Big)^{1/2}\Big(\sum_{j,l=1}^{+\infty}|\langle\gamma_{s-h}u_j,u_l\rangle|^2\Big)^{1/2}\\
&\le\sum_{s=-\infty}^{+\infty}\|\gamma_s\|_{\mathcal{S}}^2\le\Big(\sum_{s=-\infty}^{+\infty}|||\gamma_s|||_1\Big)^2<\infty,
\end{aligned}
$$
and
$$\sum_{s=-\infty}^{+\infty}\sum_{j,l=1}^{\infty}|\langle R_{s+h,s,h}(u_j\otimes u_l),u_j\otimes u_l\rangle|\le\sum_{s=-\infty}^{+\infty}|||R_{s+h,s,h}|||_1<\infty.\qquad(37)$$
Therefore (36) holds. From the above derivation, it is not hard to see that the cross-covariance operator between $\mathcal{Y}_i$ and $\mathcal{Y}_j$ is given by
$$
(\mathcal{R}_{i,j}\phi)(\lambda,\tau_1,\tau_2):=\sum_{s=-\infty}^{+\infty}\Psi_i(\lambda)\int_{[0,1]\times\mathcal{I}^2}\Psi_j(\lambda')\,\mathrm{cov}\big(X_{s+i}(\tau_1)X_s(\tau_2)+X_{s+i}(\tau_2)X_s(\tau_1),\ X_j(\tau_1')X_0(\tau_2')+X_j(\tau_2')X_0(\tau_1')\big)\phi(\lambda',\tau_1',\tau_2')\,d\lambda' d\tau_1' d\tau_2',\qquad(38)
$$
where $\phi(\lambda',\tau_1',\tau_2')\in L^2(\mathcal{A})$. ♦
Proof of Theorem 2.1. Write $G_T(\lambda,\tau_1,\tau_2)$ as
$$G_T(\lambda,\tau_1,\tau_2)=\frac{\sqrt{T}}{2}\sum_{h=1}^{k}V_{T,h}(\tau_1,\tau_2)\Psi_h(\lambda)+\frac{\sqrt{T}}{2}\sum_{h=k+1}^{T-1}V_{T,h}(\tau_1,\tau_2)\Psi_h(\lambda)=J_{1,k,T}(\lambda,\tau_1,\tau_2)+J_{2,k,T}(\lambda,\tau_1,\tau_2).$$
For any fixed $k$, Lemma 6.1 and the continuous mapping theorem imply that $J_{1,k,T}(\lambda,\tau_1,\tau_2)\Rightarrow\sum_{h=1}^{k}\mathcal{Y}_h(\lambda,\tau_1,\tau_2)$. We show that for any $\epsilon>0$ and $\delta>0$, there exists a large enough $k$ so that $\lim_{T\to+\infty}P(\|J_{2,k,T}\|_{\mathcal{A}}>\epsilon)<\delta$. A simple calculation yields $\|J_{2,k,T}\|^2_{\mathcal{A}}=\frac{T}{8\pi^2}\sum_{h=k+1}^{T-1}\|V_{T,h}\|^2_{\mathcal{I}^2}/h^2$, where we have used the fact that $\langle\Psi_h,\Psi_{h'}\rangle=\frac{1}{2\pi^2h^2}\mathbf{1}\{h=h'\}$. It thus follows that for large enough $k$ and $T\ge k+2$,
$$
\begin{aligned}
P(\|J_{2,k,T}\|_{\mathcal{A}}>\epsilon)&\le\frac{E\|J_{2,k,T}\|^2_{\mathcal{A}}}{\epsilon^2}\le\frac{T}{2\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}E\|\hat\gamma_h-E\hat\gamma_h\|^2_{\mathcal{I}^2}/h^2\\
&=\frac{1}{2T\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{i,j=1}^{+\infty}\sum_{t,t'=h+1}^{T}\mathrm{cov}(X_{t,i}X_{t-h,j},\ X_{t',i}X_{t'-h,j})/h^2\\
&=\frac{1}{2T\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{i,j=1}^{+\infty}\sum_{t,t'=h+1}^{T}\big\{\mathrm{cum}(X_{t,i},X_{t-h,j},X_{t',i},X_{t'-h,j})\\
&\qquad+EX_{t,i}X_{t',i}\,EX_{t-h,j}X_{t'-h,j}+EX_{t,i}X_{t'-h,j}\,EX_{t-h,j}X_{t',i}\big\}/h^2\\
&\le\frac{1}{2\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\sum_{s=-\infty}^{+\infty}\big(|||\gamma_s|||_1^2+|||R_{s+h,s,h}|||_1+\|\gamma_{s-h}\|_{\mathcal{S}}\|\gamma_{s+h}\|_{\mathcal{S}}\big)/h^2<\delta.
\end{aligned}\qquad(39)
$$
The tightness of $G_T$ follows from the above arguments. Finally, we show that $\sum_{h=1}^{k}\mathcal{Y}_h\Rightarrow\sum_{h=1}^{\infty}\mathcal{Y}_h$ as $k\to+\infty$, which implies the finite-dimensional convergence of $G_T$ [see Proposition 6.3.9 of Brockwell and Davis (1991)]. By (38), we have
$$
\begin{aligned}
E\Big\|\sum_{h=k+1}^{+\infty}\mathcal{Y}_h\Big\|^2_{\mathcal{A}}&\le\sum_{h,h'=k+1}^{+\infty}E\langle\mathcal{Y}_h,\mathcal{Y}_{h'}\rangle\\
&\le\frac{1}{2\pi^2}\sum_{h=k+1}^{+\infty}\sum_{s=-\infty}^{+\infty}\int_{\mathcal{I}^2}\mathrm{cov}\big(X_{s+h}(\tau_1)X_s(\tau_2)+X_{s+h}(\tau_2)X_s(\tau_1),\ X_h(\tau_1)X_0(\tau_2)+X_h(\tau_2)X_0(\tau_1)\big)\,d\tau_1 d\tau_2/h^2\\
&\le\frac{1}{\pi^2}\sum_{h=k+1}^{+\infty}\sum_{s=-\infty}^{+\infty}\big(|||\gamma_s|||_1^2+|||R_{s+h,s,h}|||_1+\|\gamma_{s-h}\|_{\mathcal{S}}\|\gamma_{s+h}\|_{\mathcal{S}}+|||R_{s,s-h,-h}|||_1+|||\gamma_{s-h}|||_1|||\gamma_{s+h}|||_1+\|\gamma_s\|_{\mathcal{S}}^2\big)/h^2\to 0,
\end{aligned}
$$
as $k\to+\infty$, which completes the proof. ♦
6.2 Proofs of the main results in Section 2.2
Let E∗ and var∗ be the expectation and variance in the bootstrap world (conditional on
the sample XtTt=1). Denote by C a generic positive constant which is different from line
to line. Throughout the following proofs, we shall use the unbiased estimator γh(τ1, τ2) =1
T−h∑T
t=h+1 Xt(τ1)Xt−h(τ2) for γh(τ1, τ2). It is straightforward to verify that the difference by
replacing 1T
∑Tt=h+1Xt(τ1)Xt−h(τ2) with 1
T−h∑T
t=h+1 Xt(τ1)Xt−h(τ2) is asymptotically negligible.
Let Γth,l = (Xt,l2Xt−h,l3 + Xt,l3Xt−h,l2− < γhul3 , ul2 > − < γhul2 , ul3 >)Wh,l1 and Γth,l =
(Xt,l2Xt−h,l3 + Xt,l3Xt−h,l2− < γhul3 , ul2 > − < γhul2 , ul3 >)Wh,l1 with t ≥ h + 1, l = (l1, l2, l3)
and l1, l2, l3 ∈ N. Set Γth,l = Γth,l = 0 with t ≤ h.
Lemma 6.2. Suppose the assumptions in Theorem 2.4 hold. Then for any $k\in\mathbb{N}$,
$$\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}\to_d\Big\langle\sum_{h=1}^{k}\mathcal{Y}_h/2,\ \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\Big\rangle,\qquad(40)$$
in probability.
Proof of Lemma 6.2. First note that
$$\mathrm{var}^*\Big(\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}\Big)=\frac{1}{4T}\sum_{s=1}^{M_T}\Big(\sum_{h=1}^{k}\sum_{t\in B_s}\tilde\Gamma_{th,l}\Big)^2-\frac{1}{4TM_T}\Big(\sum_{h=1}^{k}\sum_{t=h+1}^{T}\tilde\Gamma_{th,l}\Big)^2=I_{1,l,T}-I_{2,l,T}.$$
For $I_{1,l,T}$, we have
$$
\begin{aligned}
EI_{1,l,T}&=\frac{1}{4T}\sum_{s=1}^{M_T}\sum_{h,h'=1}^{k}\sum_{t,t'\in B_s}E\tilde\Gamma_{th,l}\tilde\Gamma_{t'h',l}\\
&=\frac{1}{4T}\sum_{s=1}^{M_T}\sum_{h,h'=1}^{k}\ \sum_{t\in B_s,\,t\ge h+1}\ \sum_{t'\in B_s,\,t'\ge h'+1}W_{h,l_1}W_{h',l_1}\,\mathrm{cov}\big(X_{t,l_2}X_{t-h,l_3}+X_{t,l_3}X_{t-h,l_2},\ X_{t',l_2}X_{t'-h',l_3}+X_{t',l_3}X_{t'-h',l_2}\big)\\
&\to\frac{1}{4}\sum_{h,h'=1}^{k}W_{h,l_1}W_{h',l_1}\sum_{s=-\infty}^{\infty}\mathrm{cov}\big(X_{s+h,l_2}X_{s,l_3}+X_{s+h,l_3}X_{s,l_2},\ X_{h',l_2}X_{0,l_3}+X_{h',l_3}X_{0,l_2}\big)\\
&=\frac{1}{4}\sum_{h,h'=1}^{k}\langle\mathcal{R}_{h,h'}(\alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}),\ \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\rangle.
\end{aligned}\qquad(41)
$$
Based on a similar calculation, it is not hard to show that $EI_{2,l,T}=O(1/M_T)$, which implies that $I_{2,l,T}=o_p(1)$. Next we show that $\mathrm{var}(I_{1,l,T})=o(1)$. Simple algebra yields that
$$
\begin{aligned}
\mathrm{var}(I_{1,l,T})&=\frac{1}{16T^2}\sum_{s,s'=1}^{M_T}\sum_{h_1,h_2=1}^{k}\sum_{h_1',h_2'=1}^{k}\sum_{t_1,t_2\in B_s}\ \sum_{t_1',t_2'\in B_{s'}}\mathrm{cov}\big(\Gamma_{t_1h_1,l}\Gamma_{t_2h_2,l},\ \Gamma_{t_1'h_1',l}\Gamma_{t_2'h_2',l}\big)\\
&=\frac{1}{16T^2}\sum_{s,s'=1}^{M_T}\sum_{h_1,h_2=1}^{k}\sum_{h_1',h_2'=1}^{k}\sum_{t_1,t_2\in B_s}\ \sum_{t_1',t_2'\in B_{s'}}\big\{\mathrm{cum}(\Gamma_{t_1h_1,l},\Gamma_{t_2h_2,l},\Gamma_{t_1'h_1',l},\Gamma_{t_2'h_2',l})\\
&\qquad+\mathrm{cov}(\Gamma_{t_1h_1,l},\Gamma_{t_1'h_1',l})\,\mathrm{cov}(\Gamma_{t_2h_2,l},\Gamma_{t_2'h_2',l})+\mathrm{cov}(\Gamma_{t_1h_1,l},\Gamma_{t_2'h_2',l})\,\mathrm{cov}(\Gamma_{t_2h_2,l},\Gamma_{t_1'h_1',l})\big\}\\
&=\frac{1}{16T^2}\sum_{s,s'=1}^{M_T}\sum_{h_1,h_2,h_1',h_2'=1}^{k}\sum_{t_1,t_2\in B_s}\ \sum_{t_1',t_2'\in B_{s'}}(J_{1,l,T}+J_{2,l,T}+J_{3,l,T}).
\end{aligned}
$$
We first deal with $J_{1,l,T}$. By Lemma F.9 in Panaretos and Tavakoli (2013a), Theorem II.2 in Rosenblatt (1985) and the triangle inequality, we deduce that
$$
\begin{aligned}
&|\mathrm{cum}(X_{t_1,l_2}X_{t_1-h_1,l_3},\ X_{t_2,l_2}X_{t_2-h_2,l_3},\ X_{t_1',l_2}X_{t_1'-h_1',l_3},\ X_{t_2',l_2}X_{t_2'-h_2',l_3})|\\
&\le\|\mathrm{cum}(X_{t_1}X_{t_1-h_1},\ X_{t_2}X_{t_2-h_2},\ X_{t_1'}X_{t_1'-h_1'},\ X_{t_2'}X_{t_2'-h_2'})\|_{\mathcal{I}^8}\\
&\le\sum_{v=v_1\cup\cdots\cup v_p}\|\mathrm{cum}(X_1:1\in v_1)\cdots\mathrm{cum}(X_p:p\in v_p)\|_{\mathcal{I}^8}\\
&\le\sum_{v=v_1\cup\cdots\cup v_p}\|\mathrm{cum}(X_1:1\in v_1)\|_{\mathcal{I}^{|v_1|}}\cdots\|\mathrm{cum}(X_p:p\in v_p)\|_{\mathcal{I}^{|v_p|}},
\end{aligned}
$$
where $|v_j|$ denotes the cardinality of the set $v_j$ and the summation is over all indecomposable partitions of the two-way table
$$\begin{array}{cc}t_1&t_1-h_1\\ t_2&t_2-h_2\\ t_1'&t_1'-h_1'\\ t_2'&t_2'-h_2'.\end{array}$$
By the linearity of joint cumulants, we have
$$J_{1,l,T}\le C|W_{h_1,l_1}W_{h_2,l_1}W_{h_1',l_1}W_{h_2',l_1}|\sum_{v=v_1\cup\cdots\cup v_p}\|\mathrm{cum}(X_1:1\in v_1)\|_{\mathcal{I}^{|v_1|}}\cdots\|\mathrm{cum}(X_p:p\in v_p)\|_{\mathcal{I}^{|v_p|}}.$$
Using some standard arguments [see e.g., Shao (2011a)], we are able to show that the contribution from $J_{1,l,T}$ is of order $o(1)$ under condition (14). The arguments for $J_{2,l,T}$ and $J_{3,l,T}$ are in fact the same after switching $t_1'$ with $t_2'$. $J_{2,l,T}$ contains a typical term,
$$
\begin{aligned}
&\mathrm{cov}(X_{t_1,l_2}X_{t_1-h_1,l_3},\ X_{t_1',l_2}X_{t_1'-h_1',l_3})\,\mathrm{cov}(X_{t_2,l_2}X_{t_2-h_2,l_3},\ X_{t_2',l_2}X_{t_2'-h_2',l_3})\\
&=\big\{\langle R_{t_1-t_1'+h_1',\,t_1-t_1'-h_1+h_1',\,h_1'}(u_{l_2}\otimes u_{l_3}),u_{l_2}\otimes u_{l_3}\rangle+\langle\gamma_{t_1-t_1'}u_{l_2},u_{l_2}\rangle\langle\gamma_{t_1-t_1'-h_1+h_1'}u_{l_3},u_{l_3}\rangle\\
&\qquad+\langle\gamma_{t_1-t_1'-h_1}u_{l_2},u_{l_3}\rangle\langle\gamma_{t_1-t_1'+h_1'}u_{l_3},u_{l_2}\rangle\big\}\\
&\quad\times\big\{\langle R_{t_2-t_2'+h_2',\,t_2-t_2'-h_2+h_2',\,h_2'}(u_{l_2}\otimes u_{l_3}),u_{l_2}\otimes u_{l_3}\rangle+\langle\gamma_{t_2-t_2'}u_{l_2},u_{l_2}\rangle\langle\gamma_{t_2-t_2'-h_2+h_2'}u_{l_3},u_{l_3}\rangle\\
&\qquad+\langle\gamma_{t_2-t_2'-h_2}u_{l_2},u_{l_3}\rangle\langle\gamma_{t_2-t_2'+h_2'}u_{l_3},u_{l_2}\rangle\big\}.
\end{aligned}
$$
When $s=s'$, the summation over the above terms is of order $O(1/M_T)$, and when $s\ne s'$, it is of order $O(M_T/T^2)$. Similar arguments apply to the other terms contained in $J_{2,l,T}$, and hence $J_{2,l,T}=o(1)$. The above arguments then imply that
$$\mathrm{var}^*\Big(\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}\Big)\to_p\frac{1}{4}\sum_{h,h'=1}^{k}\langle\mathcal{R}_{h,h'}(\alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}),\ \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\rangle.\qquad(42)$$
Conditional on the sample $\{X_t\}_{t=1}^{T}$, we note that
$$\Big(\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}\Big)\Big/(2\sqrt{T})=\sum_{s=1}^{M_T}D_{s,l}$$
is a sum of i.i.d mean-zero random variables, where $D_{s,l}=\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}/(2\sqrt{T})$.
Let $\bar\Gamma_{h,l}=\Gamma_{th,l}-\tilde\Gamma_{th,l}$ (which does not depend on $t$ for $t\ge h+1$) and note that
$$
\begin{aligned}
E\,\mathrm{var}^*\Big(\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\bar\Gamma_{h,l}\Big)&=\frac{1}{4T}E\sum_{s=1}^{M_T}\Big(\sum_{h=1}^{k}\sum_{t\in B_s}\bar\Gamma_{h,l}\Big)^2-\frac{1}{4TM_T}E\Big(\sum_{h=1}^{k}\sum_{t=h+1}^{T}\bar\Gamma_{h,l}\Big)^2\\
&\le Cb_T\,E\Big(\sum_{h=1}^{k}\bar\Gamma_{h,l}\Big)^2\to 0.
\end{aligned}\qquad(43)
$$
From (42) and (43), we have $\mathrm{var}^*\Big(\frac{1}{2\sqrt{T}}\sum_{s=1}^{M_T}\sum_{h=1}^{k}\sum_{t\in B^*_s}\Gamma_{th,l}\Big)\to_p\frac{1}{4}\sum_{h,h'=1}^{k}\langle\mathcal{R}_{h,h'}(\alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}),\ \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\rangle$. The conclusion follows from Lemma 6.3, which shows that the Lyapunov condition holds in probability. ♦
Lemma 6.3. Let $D_{s,l}=\big(\sum_{h=1}^{k}\sum_{t\in B^*_s}\tilde\Gamma_{th,l}\big)/(2\sqrt{T})$. Under the assumptions in Theorem 2.4, $E\sum_{s=1}^{M_T}E^*|D_{s,l}|^4\to 0$.
Consider $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\tilde V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)$, where $\tilde V^*_h(\tau_1,\tau_2)=\sum_{l=2}^{T}\tilde{\mathcal{Y}}^*_{lh}/T$ and $\tilde{\mathcal{Y}}_{lh}$ is defined by replacing $\hat\gamma_h$ with $\tilde\gamma_h$ in the definition of $\mathcal{Y}_{lh}$. Let $\bar V^*_h(\tau_1,\tau_2)=V^*_h(\tau_1,\tau_2)-\tilde V^*_h(\tau_1,\tau_2)$. Then we have $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)=\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\tilde V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)+\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}\bar V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)$. Denote $\tilde J^*_{2,k,T}=\frac{\sqrt{T}}{2}\sum_{h=k+1}^{T-1}\tilde V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)$ and $\bar J^*_{2,k,T}=\frac{\sqrt{T}}{2}\sum_{h=k+1}^{T-1}\bar V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)$. We have the following result.
Lemma 6.4. Under the assumptions in Theorem 2.4, $\lim_{k\to+\infty}\lim_{T\to+\infty}EE^*\|\tilde J^*_{2,k,T}\|^2_{\mathcal{A}}=0$ and $\lim_{k\to+\infty}\lim_{T\to+\infty}EE^*\|\bar J^*_{2,k,T}\|^2_{\mathcal{A}}=0$.
By Lemma 6.2, Lemma 6.4 and Proposition 6.3.9 of Brockwell and Davis (1991), we establish the finite-dimensional convergence of $\frac{\sqrt{T}}{2}\sum_{h=1}^{k}V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)$. Note that $\frac{\sqrt{T}}{2}\sum_{h=1}^{T-1}V^*_h(\tau_1,\tau_2)\Psi_h(\lambda)=\frac{1}{\sqrt{M_T}}\sum_{s=1}^{M_T}\sqrt{\frac{T-1}{T}}\Big\{\sum_{h=1}^{T-1}\sum_{t\in B^*_s}\mathcal{Y}_{th}(\tau_1,\tau_2)\Psi_h(\lambda)\Big/(2\sqrt{b_T})\Big\}$, where for different $s$ the random variables inside the braces are independent conditional on the data. Thus, to show tightness, we only need to verify that
$$
\begin{aligned}
EE^*\Big\|\sum_{h=1}^{T-1}\sum_{t\in B^*_s}\mathcal{Y}_{th}\Psi_h/(2\sqrt{b_T})\Big\|^2_{\mathcal{A}}&=\frac{1}{8\pi^2 b_T}E\sum_{h=1}^{T-1}E^*\Big\|\sum_{t\in B^*_s}\mathcal{Y}_{th}\Big\|^2_{\mathcal{I}^2}/h^2\\
&\le\frac{1}{4\pi^2(T-1)}E\sum_{h=1}^{T-1}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}\tilde{\mathcal{Y}}_{th}\Big\|^2_{\mathcal{I}^2}/h^2+\frac{1}{4\pi^2(T-1)}E\sum_{h=1}^{T-1}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}(\mathcal{Y}_{th}-\tilde{\mathcal{Y}}_{th})\Big\|^2_{\mathcal{I}^2}/h^2<\infty.
\end{aligned}
$$
The arguments are similar to those in the proof of Lemma 6.4; the details are omitted.
Remark 6.1. Under the martingale difference assumption, it can be shown that the bootstrap procedure with $b_T=1$ is still valid. The key is to note that $EI_{1,l,T}$ in (41) converges to
$$\frac{1}{4}\sum_{h,h'=1}^{k}W_{h,l_1}W_{h',l_1}\,\mathrm{cov}\big(X_{0,l_2}X_{-h,l_3}+X_{0,l_3}X_{-h,l_2},\ X_{0,l_2}X_{-h',l_3}+X_{0,l_3}X_{-h',l_2}\big).$$
As $k\to+\infty$, the above quantity further converges to $\langle\mathcal{R}(\alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}),\ \alpha_{l_1}\otimes u_{l_2}\otimes u_{l_3}\rangle$, where $\mathcal{R}$ is the linear operator associated with the kernel defined in (16). The proof is basically a repetition of the argument presented above.
6.3 Proofs of the main results in Section 3.1
Lemma 6.5. Assume that $\{X_t\}$ and $\{\epsilon_t\}$ are both mean-zero stationary with $\sum_{s=-\infty}^{+\infty}(|||R^X_{s,0,s}|||_1+\|\gamma_{X,s}\|^2_{\mathcal{S}})<+\infty$, $\sum_{s=-\infty}^{+\infty}|||R^\epsilon_{s,0,s}|||_1<+\infty$, and $\|\gamma_{\epsilon,0}\|_{\mathcal{S}}<+\infty$. Then $\frac{1}{T}\sum_{t=1}^{T}\|Y_t\|^2\to_p E\|Y_1\|^2$.
Proof of Theorem 3.1. We only need to show that
$$T\Big|\sum_{h=1}^{T-1}\|\hat V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}/h^2-\sum_{h=1}^{T-1}\|V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}/h^2\Big|=o_p(1).\qquad(44)$$
The conclusion then follows from Theorem 2.2. Let $\vartheta_T=\eta-\hat\eta_T$. Then $\hat\epsilon_t=Y_t-\hat\eta_T(X_t)=\epsilon_t+\vartheta_T(X_t)$. Note that $\hat V_{\epsilon,T,h}(\tau_1,\tau_2)=V_{\epsilon,T,h}(\tau_1,\tau_2)+g_{h,T}(\tau_1,\tau_2)+g^{ad}_{h,T}(\tau_1,\tau_2)$, where $g^{ad}_{h,T}(\tau_1,\tau_2)=g_{h,T}(\tau_2,\tau_1)$ and
$$g_{h,T}(\tau_1,\tau_2)=\frac{1}{T}\sum_{t=h+1}^{T}g_{t,h,T}(\tau_1,\tau_2)=\frac{1}{T}\sum_{t=h+1}^{T}\big\{\epsilon_t(\tau_1)\vartheta_T(X_{t-h})(\tau_2)+\vartheta_T(X_t)(\tau_1)\epsilon_{t-h}(\tau_2)+\vartheta_T(X_t)(\tau_1)\vartheta_T(X_{t-h})(\tau_2)\big\}.\qquad(45)$$
Then by simple algebra, we get
$$
\begin{aligned}
\Big|\sum_{h=1}^{T-1}\frac{1}{h^2}\big(\|\hat V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}-\|V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}\big)\Big|&\le\sum_{h=1}^{T-1}\frac{1}{h^2}\big(2\|g_{h,T}+g^{ad}_{h,T}\|_{\mathcal{I}^2}\cdot\|V_{\epsilon,T,h}\|_{\mathcal{I}^2}+\|g_{h,T}+g^{ad}_{h,T}\|^2_{\mathcal{I}^2}\big)\\
&\le 2\Big(\sum_{h=1}^{T-1}\frac{1}{h^2}\|g_{h,T}+g^{ad}_{h,T}\|^2_{\mathcal{I}^2}\Big)^{1/2}\Big(\sum_{h=1}^{T-1}\frac{1}{h^2}\|V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}\Big)^{1/2}+\sum_{h=1}^{T-1}\frac{1}{h^2}\|g_{h,T}+g^{ad}_{h,T}\|^2_{\mathcal{I}^2}\\
&\le 4\Big(\sum_{h=1}^{T-1}\frac{1}{h^2}\|g_{h,T}\|^2_{\mathcal{I}^2}\Big)^{1/2}\Big(\sum_{h=1}^{T-1}\frac{1}{h^2}\|V_{\epsilon,T,h}\|^2_{\mathcal{I}^2}\Big)^{1/2}+4\sum_{h=1}^{T-1}\frac{1}{h^2}\|g_{h,T}\|^2_{\mathcal{I}^2}.
\end{aligned}
$$
The triangle inequality yields that
$$T\sum_{h=1}^{T-1}\frac{\|g_{h,T}\|^2_{\mathcal{I}^2}}{h^2}\le C\Big\{\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\epsilon_t\vartheta_T(X_{t-h})\Big\|^2_{\mathcal{I}^2}+\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\vartheta_T(X_t)\epsilon_{t-h}\Big\|^2_{\mathcal{I}^2}+\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\vartheta_T(X_t)\vartheta_T(X_{t-h})\Big\|^2_{\mathcal{I}^2}\Big\}=C(L_{1,T}+L_{2,T}+L_{3,T}).$$
We first consider $L_{1,T}$. Note that $\big\|\sum_{t=h+1}^{T}\epsilon_t\vartheta_T(X_{t-h})\big\|_{\mathcal{I}^2}\le\|\vartheta_T\|_{\mathcal{L}}\big\|\sum_{t=h+1}^{T}\epsilon_t X_{t-h}\big\|_{\mathcal{I}^2}$ and
$$
\begin{aligned}
E\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\epsilon_t X_{t-h}\Big\|^2_{\mathcal{I}^2}&=\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{t,t'=h+1}^{T}E\langle\epsilon_t,\epsilon_{t'}\rangle\langle X_{t-h},X_{t'-h}\rangle\\
&=E\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{t,t'=h+1}^{T}\int_{\mathcal{I}^2}\epsilon_t(\tau_1)\epsilon_{t'}(\tau_1)X_{t-h}(\tau_2)X_{t'-h}(\tau_2)\,d\tau_1 d\tau_2\\
&\le\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1-T}^{T-1}\Big|\int_{\mathcal{I}^2}\big\{R^{\epsilon,X}_{s+h,s,h}(\tau_1,\tau_2,\tau_1,\tau_2)+\gamma_{\epsilon,s}(\tau_1,\tau_1)\gamma_{X,s}(\tau_2,\tau_2)\big\}\,d\tau_1 d\tau_2\Big|\\
&\le\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=-\infty}^{+\infty}\big(|||R^{\epsilon,X}_{s+h,s,h}|||_1+|||\gamma_{\epsilon,s}|||_1\cdot|||\gamma_{X,s}|||_1\big)<\infty,
\end{aligned}
$$
where $\gamma_{\epsilon,s}=0$ for $s\ne 0$. Similarly we have
$$E\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}X_t\epsilon_{t-h}\Big\|^2_{\mathcal{I}^2}\le\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=-\infty}^{+\infty}\big(|||R^{\epsilon,X}_{s-h,s,-h}|||_1+|||\gamma_{\epsilon,s}|||_1\cdot|||\gamma_{X,s}|||_1\big)<\infty.$$
Because $\|\vartheta_T\|_{\mathcal{L}}=o_p(1)$, we obtain $L_{1,T}=o_p(1)$ and $L_{2,T}=o_p(1)$. Finally, using the fact that $T\|\vartheta_T\|^4_{\mathcal{L}}=o_p(1)$ and
$$E\Big\|\sum_{t=h+1}^{T}X_t X_{t-h}\Big\|^2_{\mathcal{I}^2}=\sum_{t,t'=h+1}^{T}E\langle X_t,X_{t'}\rangle\langle X_{t-h},X_{t'-h}\rangle\le T^2 E\|X_1\|^4,$$
we deduce that
$$L_{3,T}\le T\|\vartheta_T\|^4_{\mathcal{L}}\sum_{h=1}^{T-1}\frac{1}{T^2h^2}\Big\|\sum_{t=h+1}^{T}X_t X_{t-h}\Big\|^2_{\mathcal{I}^2}=o_p(1),$$
which completes the proof. ♦
Proof of Theorem 3.2. We aim to show that
$$E^*\Big\|\sum_{h=1}^{T-1}\hat V^*_{\epsilon,T,h}\Psi_h-\sum_{h=1}^{T-1}V^*_{\epsilon,T,h}\Psi_h\Big\|^2_{\mathcal{A}}=o_p(1/T).\qquad(46)$$
The conclusion then follows from Theorem 2.4. Notice that
$$\hat{\mathcal{Y}}_{\epsilon,th}(\tau_1,\tau_2)=\mathcal{Y}_{\epsilon,th}(\tau_1,\tau_2)+g_{t,h,T}(\tau_1,\tau_2)+\gamma_{\epsilon,h}(\tau_1,\tau_2)-\hat\gamma_{\epsilon,h}(\tau_1,\tau_2)+g_{t,h,T}(\tau_2,\tau_1)+\gamma_{\epsilon,h}(\tau_2,\tau_1)-\hat\gamma_{\epsilon,h}(\tau_2,\tau_1).$$
Simple calculation then yields
$$
\begin{aligned}
E^*\Big\|\sum_{h=1}^{T-1}\hat V^*_{\epsilon,T,h}\Psi_h-\sum_{h=1}^{T-1}V^*_{\epsilon,T,h}\Psi_h\Big\|^2_{\mathcal{A}}&=\frac{1}{2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}E^*\|\hat V^*_{\epsilon,T,h}-V^*_{\epsilon,T,h}\|^2_{\mathcal{I}^2}\\
&=\frac{1}{2T^2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}E^*\Big\langle\sum_{s=1}^{M_T}\sum_{t\in B^*_s}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th}),\ \sum_{s=1}^{M_T}\sum_{t\in B^*_s}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th})\Big\rangle\\
&\le\frac{1}{2T^2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th})\Big\|^2_{\mathcal{I}^2}+\frac{1}{2T^2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\Big\|\sum_{t=h+1}^{T}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th})\Big\|^2_{\mathcal{I}^2}.
\end{aligned}\qquad(47)
$$
For the second term in (47), we have $\frac{1}{2T^2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\big\|\sum_{t=h+1}^{T}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th})\big\|^2_{\mathcal{I}^2}=o_p(1/T)$. Using similar arguments as in the proof of Theorem 3.1, it is not hard to show that
$$\frac{1}{2T^2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}(\hat{\mathcal{Y}}_{\epsilon,th}-\mathcal{Y}_{\epsilon,th})\Big\|^2_{\mathcal{I}^2}\le C\sum_{h=1}^{T-1}\frac{1}{T^2h^2}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}g_{t,h,T}\Big\|^2_{\mathcal{I}^2}+o_p(1/T)=o_p(1/T),$$
which completes the proof. ♦
6.4 Proofs of the main results in Section 3.2
Lemma 6.6. Assume that $\{X_t\}$ follows a FAR(1) process with innovation sequence $\{\varepsilon_t\}$ and $\|\rho\|_{\mathcal{S}}<1$. Assume that $\|\gamma_{\varepsilon,0}\|_{\mathcal{S}}<+\infty$ and $\sum_{s_1,s_2,s_3=1}^{+\infty}\|R^\varepsilon_{s_1,s_2,s_3}\|_{\mathcal{S}}<\infty$. Then $\frac{1}{T}\sum_{t=1}^{T}\|X_t\|^2\to_p E\|X_1\|^2$.
Proof of Proposition 3.1. To be clear, we only show the weak convergence of $\frac{1}{\sqrt{T}}\sum_{h=1}^{T-1}\big\{\sum_{t=h+1}^{T}\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\sum_{t=2}^{T}\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\big\}\Psi_h(\lambda)$. The result for the symmetrized version follows from essentially the same arguments. Recall the basis $\{\alpha_i(\lambda)u_j(\tau_1)u_l(\tau_2):\lambda\in[0,1],\ \tau_1,\tau_2\in\mathcal{I}\}_{i,j,l=1}^{+\infty}$ defined at the beginning of Section 6.1. Let $\beta_{t,j}=\langle X_t,\phi_j\rangle$, $\varepsilon_{t,j}=\langle\varepsilon_t,u_j\rangle$ with $\varepsilon_{t,j}=0$ for $t<0$, and $W_{h,i}=\langle\Psi_h,\alpha_i\rangle$. For any $1\le h\le T-1$, define the projection sequence
$$S_{T,ijl}(h):=\frac{1}{\sqrt{T}}\sum_{t=2}^{T}W_{h,i}\big(\varepsilon_{t,j}\varepsilon_{t-h,l}-\varepsilon_{t,j}\langle\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1}),u_l\rangle\big)=\frac{1}{\sqrt{T}}\sum_{t=2}^{T}W_{h,i}\Big(\varepsilon_{t,j}\varepsilon_{t-h,l}-\varepsilon_{t,j}\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big),$$
where we have used the fact that $\gamma_X^{-1}\pi_{k_T}(\cdot)=\sum_{k=1}^{k_T}\lambda_k^{-1}\langle\phi_k,\cdot\rangle\phi_k$ and
$$\langle\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1}),u_l\rangle=\langle\gamma_X^{-1}\pi_{k_T}(X_{t-1}),\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle=\Big\langle\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\phi_k,\ \rho^{h-1}\gamma_{\varepsilon,0}(u_l)\Big\rangle=\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle.$$
Notice that $ES_{T,ijl}(h)=0$ and
$$
\begin{aligned}
\mathrm{var}(S_{T,ijl}(h))&=\frac{1}{T}E\Big\{\sum_{t=2}^{T}W_{h,i}\Big(\varepsilon_{t,j}\varepsilon_{t-h,l}-\varepsilon_{t,j}\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big)\Big\}^2\\
&\le\frac{2}{T}E\Big(\sum_{t=2}^{T}W_{h,i}\varepsilon_{t,j}\varepsilon_{t-h,l}\Big)^2+\frac{2}{T}E\Big(\sum_{t=2}^{T}W_{h,i}\varepsilon_{t,j}\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big)^2\\
&\le 2W_{h,i}^2E\varepsilon_{1,j}^2E\varepsilon_{1,l}^2+\frac{2}{T}E\Big(\sum_{t,t'=2}^{T}W_{h,i}^2\varepsilon_{t,j}\varepsilon_{t',j}\sum_{k,k'=1}^{k_T}\lambda_k^{-1}\lambda_{k'}^{-1}\beta_{t-1,k}\beta_{t'-1,k'}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\langle\phi_{k'},\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big)\\
&=2W_{h,i}^2E\varepsilon_{1,j}^2E\varepsilon_{1,l}^2+\frac{2}{T}\sum_{t=2}^{T}W_{h,i}^2E\varepsilon_{t,j}^2\sum_{k,k'=1}^{k_T}\lambda_k^{-1}\lambda_{k'}^{-1}E\beta_{t-1,k}\beta_{t-1,k'}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\langle\phi_{k'},\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\\
&\le 2W_{h,i}^2E\varepsilon_{1,j}^2E\varepsilon_{1,l}^2+2W_{h,i}^2E\varepsilon_{1,j}^2\sum_{k=1}^{+\infty}\lambda_k^{-1}\langle\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_k),u_l\rangle^2<\infty,
\end{aligned}
$$
where we have used the fact that $E\varepsilon_{t,j}\varepsilon_{t',j}\beta_{t-1,k}\beta_{t'-1,k'}=0$ for $t\ne t'$ and $E\beta_{t,j}\beta_{t,j'}=\mathbf{1}\{j=j'\}\lambda_j$. Under the assumption that $\{\varepsilon_t\}$ is a sequence of i.i.d random variables in $L^2_H$, $\{X_t\}$ is $L^2$-$m$-approximable [see Example 2.1 of Hörmann and Kokoszka (2010)]. Thus, by the independence between $\varepsilon_t$ and $X_{t-1}$, and Lemma 2.1 of Hörmann and Kokoszka (2010), we know $\{\varepsilon_t X_{t-1}\}$ and the corresponding projection sequence are both $L^2$-$m$-approximable. The finite-dimensional convergence follows from similar arguments in the proof of Theorem A.1 of Aue et al. (2009). On the other hand,
29
under Condition (29), we have
$$\lim_{\max(N_1,N_2,N_3)\to+\infty}\sup_T\sum_{i\ge N_1,j\ge N_2,l\ge N_3}\mathrm{var}(S_{T,ijl}(h))\le\lim_{\max(N_1,N_2,N_3)\to+\infty}\sup_T\sum_{i\ge N_1,j\ge N_2,l\ge N_3}2W_{h,i}^2\Big\{E\varepsilon_{1,j}^2E\varepsilon_{1,l}^2+E\varepsilon_{1,j}^2\sum_{k=1}^{+\infty}\lambda_k^{-1}\langle\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_k),u_l\rangle^2\Big\}=0.$$
The above arguments imply that
$$\frac{1}{\sqrt T}\sum_{t=2}^{T}\big\{\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\big\}\Psi_h(\lambda)\Rightarrow N_h(\lambda,\tau_1,\tau_2),$$
jointly for $1\le h\le k$ with $k\in\mathbb N$ being fixed, where $\{N_h(\lambda,\tau_1,\tau_2):1\le h\le k\}$ is a mean-zero Gaussian process in $L^2(\mathcal A)^{\otimes k}$, and the cross-covariance operator between $N_i$ and $N_j$ is given by
$$\mathcal H_{ij}\phi(\lambda,\tau_1,\tau_2)=\lim_{k_T\to+\infty}\Psi_i(\lambda)\int_{[0,1]\times\mathcal I^2}\Psi_j(\lambda')\,\mathrm{cov}\big(\varepsilon_t(\tau_1)\varepsilon_{t-i}(\tau_2)-\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{i-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2),\ \varepsilon_t(\tau_1')\varepsilon_{t-j}(\tau_2')-\varepsilon_t(\tau_1')\gamma_{\varepsilon,0}\rho_*^{j-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2')\big)\phi(\lambda',\tau_1',\tau_2')\,d\lambda'\,d\tau_1'\,d\tau_2',$$
for $\phi\in L^2(\mathcal A)$. The rest of the arguments are similar to those in the proof of Theorem 2.1. Define
$$J_{k,T}=\frac{1}{\sqrt T}\sum_{h=k+1}^{T-1}\Big\{\sum_{t=h+1}^{T}\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\sum_{t=2}^{T}\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\Big\}\Psi_h(\lambda).$$
It suffices to show that: (1) for any $\epsilon,\delta>0$, there exists a large enough $k$ such that $P(\|J_{k,T}\|_{\mathcal A}>\epsilon)<\delta$; and (2) $E\|\sum_{h=k+1}^{+\infty}N_h\|_{\mathcal A}^2\to0$ as $k\to+\infty$. For the first goal, we note that
$$P(\|J_{k,T}\|_{\mathcal A}>\epsilon)\le\frac{E\|J_{k,T}\|_{\mathcal A}^2}{\epsilon^2}=\frac{1}{2\pi^2\epsilon^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}E\Big\|\sum_{t=h+1}^{T}\varepsilon_t\varepsilon_{t-h}-\sum_{t=2}^{T}\varepsilon_t\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})\Big\|_{\mathcal I^2}^2$$
$$=\frac{1}{2\pi^2\epsilon^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\sum_{j,l=1}^{+\infty}E\Big\{\sum_{t=2}^{T}\Big(\varepsilon_{t,j}\varepsilon_{t-h,l}-\varepsilon_{t,j}\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big)\Big\}^2\le I_1+I_2,$$
where
$$I_1=\frac{1}{\pi^2\epsilon^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\sum_{j,l=1}^{+\infty}E\Big\{\sum_{t=2}^{T}\varepsilon_{t,j}\varepsilon_{t-h,l}\Big\}^2\le\frac{1}{\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\frac{1}{h^2}|||\gamma_{\varepsilon,0}|||_1^2<\delta/2,$$
and
$$I_2=\frac{1}{\pi^2\epsilon^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\sum_{j,l=1}^{+\infty}E\Big\{\sum_{t=2}^{T}\varepsilon_{t,j}\sum_{k=1}^{k_T}\lambda_k^{-1}\beta_{t-1,k}\langle\phi_k,\rho^{h-1}\gamma_{\varepsilon,0}(u_l)\rangle\Big\}^2\le\frac{1}{\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\sum_{j,l=1}^{+\infty}E\varepsilon_{t,j}^2\sum_{k=1}^{+\infty}\lambda_k^{-1}\langle\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_k),u_l\rangle^2=\frac{1}{\pi^2\epsilon^2}\sum_{h=k+1}^{T-1}\frac{1}{h^2}|||\gamma_{\varepsilon,0}|||_1\sum_{k=1}^{+\infty}\lambda_k^{-1}\|\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_k)\|^2<\delta/2,$$
for large enough $k$. As for the second goal, we have
$$E\Big\|\sum_{h=k+1}^{+\infty}N_h\Big\|_{\mathcal A}^2=\sum_{h,h'=k+1}^{+\infty}E\langle N_h,N_{h'}\rangle=\frac{1}{2\pi^2}\sum_{h=k+1}^{+\infty}\frac{1}{h^2}\int_{\mathcal I^2}\mathrm{var}\big(\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\big)\,d\tau_1d\tau_2$$
$$\le\frac{1}{\pi^2}\sum_{h=k+1}^{+\infty}\frac{1}{h^2}\int_{\mathcal I^2}\big\{\mathrm{var}\big(\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)\big)+\mathrm{var}\big(\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\big)\big\}\,d\tau_1d\tau_2=\frac{1}{\pi^2}\sum_{h=k+1}^{+\infty}\frac{1}{h^2}\Big(|||\gamma_{\varepsilon,0}|||_1^2+|||\gamma_{\varepsilon,0}|||_1\sum_{k=1}^{+\infty}\lambda_k^{-1}\|\gamma_{\varepsilon,0}\rho_*^{h-1}(\phi_k)\|^2\Big)\to0,$$
as $k\to+\infty$. The proof is thus completed. ♦
Proof of Theorem 3.3. Let $D_T=\rho-\hat\rho_T$. Then $\hat\varepsilon_t=X_t-\hat\rho_T(X_{t-1})=\varepsilon_t+D_T(X_{t-1})$. Note that
$$V_{\hat\varepsilon,T,h}(\tau_1,\tau_2)=V_{\varepsilon,T,h}(\tau_1,\tau_2)+\frac{1}{T}\sum_{t=h+1}^{T}\big\{D_T(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)+D_T(X_{t-1})(\tau_2)\varepsilon_{t-h}(\tau_1)\big\}+f_{h,T}(\tau_1,\tau_2)+f_{h,T}(\tau_2,\tau_1),$$
where
$$f_{h,T}(\tau_1,\tau_2)=\frac{1}{T}\sum_{t=h+1}^{T}\big\{\varepsilon_t(\tau_1)D_T(X_{t-h-1})(\tau_2)+D_T(X_{t-1})(\tau_1)D_T(X_{t-h-1})(\tau_2)\big\}. \tag{48}$$
We first show that $T\sum_{h=1}^{T-1}\|f_{h,T}\|_{\mathcal I^2}^2/h^2=o_p(1)$. To this end, notice that
$$E\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\varepsilon_tX_{t-h-1}\Big\|_{\mathcal I^2}^2=\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{t,t'=h+1}^{T}\sum_{j,j'=0}^{+\infty}E\langle\rho^j(\varepsilon_{t-h-1-j}),\rho^{j'}(\varepsilon_{t'-h-1-j'})\rangle\langle\varepsilon_t,\varepsilon_{t'}\rangle$$
$$=E\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{t,t'=h+1}^{T}\sum_{j,j'=0}^{+\infty}\int_{\mathcal I^3}P_{j,j'}(\tau_3,\tau_3')\varepsilon_t(\tau_1)\varepsilon_{t-h-1-j}(\tau_3)\varepsilon_{t'}(\tau_1)\varepsilon_{t'-h-1-j'}(\tau_3')\,d\tau_1d\tau_3d\tau_3'$$
$$\le\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1-T}^{T-1}\Big|\sum_{j,j'=0}^{+\infty}\int_{\mathcal I^3}P_{j,j'}(\tau_3,\tau_3')\big\{R^{\varepsilon}_{s+h+1+j',s-j+j',h+1+j'}(\tau_1,\tau_3,\tau_1,\tau_3')+\gamma_{\varepsilon,s}(\tau_1,\tau_1)\gamma_{\varepsilon,s+j'-j}(\tau_3,\tau_3')+\gamma_{\varepsilon,s+h+1+j'}(\tau_1,\tau_3')\gamma_{\varepsilon,s-h-1-j}(\tau_3,\tau_1)\big\}\,d\tau_1d\tau_3d\tau_3'\Big|,$$
where $P_{j,j'}(\tau_3,\tau_3')=\int\rho^j(\tau_2,\tau_3)\rho^{j'}(\tau_2,\tau_3')\,d\tau_2$ and $\gamma_{\varepsilon,s}=0$ for $s\neq0$. Because $\|P_{j,j'}\|_{\mathcal S}\le\|\rho\|_{\mathcal S}^{j+j'}$, we obtain
$$\int_{\mathcal I^3}P_{j,j'}(\tau_3,\tau_3')R^{\varepsilon}_{s+h+1+j',s-j+j',h+1+j'}(\tau_1,\tau_3,\tau_1,\tau_3')\,d\tau_1d\tau_3d\tau_3'\le\|P_{j,j'}\|_{\mathcal S}\Big\{\int_{\mathcal I^2}\Big(\int_{\mathcal I}R^{\varepsilon}_{s+h+1+j',s-j+j',h+1+j'}(\tau_1,\tau_3,\tau_1,\tau_3')\,d\tau_1\Big)^2d\tau_3d\tau_3'\Big\}^{1/2}\le\|\rho\|_{\mathcal S}^{j+j'}|||R^{\varepsilon}_{s+h+1+j',s-j+j',h+1+j'}|||_1.$$
It is also not hard to show that
$$\int_{\mathcal I^3}P_{j,j'}(\tau_3,\tau_3')\gamma_{\varepsilon,s}(\tau_1,\tau_1)\gamma_{\varepsilon,s+j'-j}(\tau_3,\tau_3')\,d\tau_1d\tau_3d\tau_3'\le|||\gamma_{\varepsilon,s}|||_1\,\|\rho\|_{\mathcal S}^{j+j'}\|\gamma_{\varepsilon,s+j'-j}\|_{\mathcal S},$$
and
$$\int_{\mathcal I^3}P_{j,j'}(\tau_3,\tau_3')\gamma_{\varepsilon,s+h+1+j'}(\tau_1,\tau_3')\gamma_{\varepsilon,s-h-1-j}(\tau_3,\tau_1)\,d\tau_1d\tau_3d\tau_3'\le\|P_{j,j'}\|_{\mathcal S}\Big\{\int_{\mathcal I^2}\Big(\int_{\mathcal I}\gamma_{\varepsilon,s+h+1+j'}(\tau_1,\tau_3')\gamma_{\varepsilon,s-h-1-j}(\tau_3,\tau_1)\,d\tau_1\Big)^2d\tau_3d\tau_3'\Big\}^{1/2}\le\|\rho\|_{\mathcal S}^{j+j'}\|\gamma_{\varepsilon,s+h+1+j'}\|_{\mathcal S}\|\gamma_{\varepsilon,s-h-1-j}\|_{\mathcal S}.$$
Combining the above derivations with the summability conditions, we deduce that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\|\sum_{t=h+1}^{T}\varepsilon_tX_{t-h-1}\|_{\mathcal I^2}^2=O_p(1)$. Using the fact that $\|\sum_{t=h+1}^{T}\varepsilon_tD_T(X_{t-h-1})\|_{\mathcal I^2}\le\|D_T\|_{\mathcal L}\|\sum_{t=h+1}^{T}\varepsilon_tX_{t-h-1}\|_{\mathcal I^2}$ and $\|D_T\|_{\mathcal L}=o_p(1)$, we obtain that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\|\sum_{t=h+1}^{T}\varepsilon_tD_T(X_{t-h-1})\|_{\mathcal I^2}^2=o_p(1)$. By Lemma 3.2 and the fact that
$$E\Big\|\sum_{t=h+1}^{T}X_{t-1}X_{t-h-1}\Big\|_{\mathcal I^2}^2=\sum_{t,t'=h+1}^{T}E\langle X_{t-1},X_{t'-1}\rangle\langle X_{t-h-1},X_{t'-h-1}\rangle\le T^2E\|X_1\|^4,$$
we get
$$\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}D_T(X_{t-1})D_T(X_{t-h-1})\Big\|_{\mathcal I^2}^2\le T\|D_T\|_{\mathcal L}^4\sum_{h=1}^{T-1}\frac{1}{T^2h^2}\Big\|\sum_{t=h+1}^{T}X_{t-1}X_{t-h-1}\Big\|_{\mathcal I^2}^2=o_p(1),$$
which implies that $T\sum_{h=1}^{T-1}\|f_{h,T}\|_{\mathcal I^2}^2/h^2=o_p(1)$. Therefore, we only need to show that
$$\frac{\sqrt T}{2}\sum_{h=1}^{T-1}\Big[V_{\varepsilon,T,h}(\tau_1,\tau_2)+\frac{1}{T}\sum_{t=h+1}^{T}\big\{D_T(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)+D_T(X_{t-1})(\tau_2)\varepsilon_{t-h}(\tau_1)\big\}\Big]\Psi_h(\lambda)$$
converges to $G_\varepsilon(\lambda,\tau_1,\tau_2)$ in the corresponding functional space. Consider
$$-D_T(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)=(\hat\rho_T-\rho)(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)=\big\{\hat\gamma_{X,1}\hat\gamma_X^{-1}(X_{t-1})-\rho\hat\gamma_X\hat\gamma_X^{-1}(X_{t-1})\big\}(\tau_1)\varepsilon_{t-h}(\tau_2)+\big\{\rho\hat\gamma_X\hat\gamma_X^{-1}(X_{t-1})-\rho(X_{t-1})\big\}(\tau_1)\varepsilon_{t-h}(\tau_2)$$
$$=\frac{1}{T}\sum_{j=2}^{T}\langle X_{j-1},\hat\gamma_X^{-1}(X_{t-1})\rangle\varepsilon_j(\tau_1)\varepsilon_{t-h}(\tau_2)+\big\{\rho\hat\gamma_X\hat\gamma_X^{-1}(X_{t-1})-\rho(X_{t-1})\big\}(\tau_1)\varepsilon_{t-h}(\tau_2).$$
By Lemma 6.7, we know that the contribution from $\{\rho\hat\gamma_X\hat\gamma_X^{-1}(X_{t-1})-\rho(X_{t-1})\}(\tau_1)\varepsilon_{t-h}(\tau_2)$ is asymptotically negligible. Switching the indices $t$ and $j$, we get
$$\sum_{t=h+1}^{T}\sum_{j=2}^{T}\langle X_{j-1},\hat\gamma_X^{-1}(X_{t-1})\rangle\varepsilon_j(\tau_1)\varepsilon_{t-h}(\tau_2)=\sum_{t=2}^{T}\sum_{j=h+1}^{T}\langle X_{t-1},\hat\gamma_X^{-1}(X_{j-1})\rangle\varepsilon_t(\tau_1)\varepsilon_{j-h}(\tau_2).$$
By Lemma 6.8 and (30), we have $\frac{\sqrt T}{2}\sum_{h=1}^{T-1}V_{\hat\varepsilon,T,h}(\tau_1,\tau_2)\Psi_h(\lambda)\Rightarrow G_\varepsilon(\lambda,\tau_1,\tau_2)$. Finally, by the continuous mapping theorem, we obtain $T_{\hat\varepsilon,T,CM}\to_d\int_0^1\int_{\mathcal I^2}G_\varepsilon^2(\lambda,\tau_1,\tau_2)\,d\tau_1d\tau_2\,d\lambda$, which completes the proof. ♦
Lemma 6.7. Under the assumptions in Theorem 3.3,
$$\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^{T}\big\{\rho\hat\gamma_X\hat\gamma_X^{-1}-\rho\big\}(X_{t-1})\varepsilon_{t-h}\Big\|_{\mathcal I^2}^2=o_p(1). \tag{49}$$
Lemma 6.8. Under the assumptions in Theorem 3.3,
$$\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\frac{1}{T}\sum_{t=2}^{T}\sum_{j=h+1}^{T}\langle X_{t-1},\hat\gamma_X^{-1}(X_{j-1})\rangle\varepsilon_t(\tau_1)\varepsilon_{j-h}(\tau_2)-\sum_{t=2}^{T}\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\varepsilon_t(\tau_1)\Big\|_{\mathcal I^2}^2=o_p(1). \tag{50}$$
6.5 Theoretical results for the modified block bootstrap
For $b_TM_T=T-1$ with $b_T,M_T\in\mathbb Z$, define the non-overlapping blocks of indices $B_s=\{b_T(s-1)+2,\dots,b_Ts+1\}$ with $1\le s\le M_T$. Pick $M_T$ blocks with replacement from $\{B_s\}_{s=1}^{M_T}$. Join the $M_T$ blocks and let $(t_2^*,\dots,t_T^*)$ be the corresponding indices.
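For concreteness, the resampling scheme above can be sketched in a few lines. This is an illustrative implementation (the function name and interface are ours, not from the paper), assuming $b_T$ divides $T-1$ exactly:

```python
import numpy as np

def block_bootstrap_indices(T, b_T, seed=None):
    """Non-overlapping block bootstrap over the index set {2, ..., T}.

    The M_T blocks B_s = {b_T(s-1)+2, ..., b_T s + 1}, 1 <= s <= M_T,
    partition {2, ..., T}; we draw M_T of them with replacement and
    join them to obtain the resampled indices (t_2^*, ..., t_T^*).
    """
    if (T - 1) % b_T != 0:
        raise ValueError("the scheme assumes b_T * M_T = T - 1")
    rng = np.random.default_rng(seed)
    M_T = (T - 1) // b_T
    blocks = [np.arange(b_T * (s - 1) + 2, b_T * s + 2) for s in range(1, M_T + 1)]
    picks = rng.integers(0, M_T, size=M_T)  # blocks drawn with replacement
    return np.concatenate([blocks[s] for s in picks])

idx = block_bootstrap_indices(T=101, b_T=5, seed=0)  # 100 resampled indices in {2, ..., 101}
```

The bootstrap statistic is then evaluated along these indices, as in Assumption 6.1 below.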
Assumption 6.1. Let $Y_{\varepsilon,th}(\tau_1,\tau_2)=\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)+\varepsilon_t(\tau_2)\varepsilon_{t-h}(\tau_1)-\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)-\hat\gamma_{\varepsilon,h}(\tau_2,\tau_1)-B_{t,h}(\tau_1,\tau_2)-B_{t,h}(\tau_2,\tau_1)$ with $B_{t,h}(\tau_1,\tau_2)=\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)-\sum_{s=2}^{T}\varepsilon_s(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{s-1})(\tau_2)/T$. Define $V^*_{\varepsilon,T,h}=\sum_{l=2}^{T}Y^*_{\varepsilon,lh}/T$, where $Y^*_{\varepsilon,lh}=Y_{\varepsilon,t_l^*h}$. Assume that
$$\rho_w\Big[\mathcal L\Big(\frac{\sqrt T}{2}\sum_{h=1}^{T-1}V^*_{\varepsilon,T,h}(\tau_1,\tau_2)\Psi_h(\lambda)\Big|\mathcal X_T\Big),\ \mathcal L\big(G_\varepsilon(\lambda,\tau_1,\tau_2)\big)\Big]\to0, \tag{51}$$
and
$$\sup_h\frac{1}{T}E\Big\|\sum_{t=2}^{T}\varepsilon_t\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})\Big\|_{\mathcal I^2}^2<\infty. \tag{52}$$
When $X_t$ is finite-dimensional, (51) can be verified using arguments similar to those in Section 6.2. In view of Proposition 3.1, we expect that (51) still holds when $X_t$ is infinite-dimensional and the errors are i.i.d. It is worth noting that when the $\varepsilon_t$ are i.i.d.,
$$\frac{1}{T}E\Big\|\sum_{t=2}^{T}\varepsilon_t\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})\Big\|_{\mathcal I^2}^2\le|||\gamma_{\varepsilon,0}|||_1\,|||\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\gamma_X\pi_{k_T}\gamma_X^{-1}\rho^{h-1}\gamma_{\varepsilon,0}|||_1.$$
For $h\ge2$, $|||\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\gamma_X\pi_{k_T}\gamma_X^{-1}\rho^{h-1}\gamma_{\varepsilon,0}|||_1\le\|\gamma_{\varepsilon,0}\|_{\mathcal S}^2\|\gamma_X^{-1/2}\rho^{h-1}\|_{\mathcal L}^2<\infty$, provided that $\|\gamma_X^{-1/2}\rho\|_{\mathcal L}<\infty$ [see also Assumption A1 in Mas (2007)]. The same conclusion holds when $h=1$, because $\gamma_{\varepsilon,0}=\gamma_X-\rho\gamma_X\rho_*$ [Theorem 3.2 of Bosq (2000)].
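The identity $\gamma_{\varepsilon,0}=\gamma_X-\rho\gamma_X\rho_*$ invoked above has an elementary finite-dimensional analogue that is easy to check numerically. The sketch below is our own toy example (a VAR(1) with a matrix `A` standing in for $\rho$, not the paper's setting): it solves the Lyapunov equation for the stationary covariance by fixed-point iteration and recovers the innovation covariance.

```python
import numpy as np

# VAR(1) analogue of X_t = rho(X_{t-1}) + eps_t, namely X_t = A X_{t-1} + eps_t.
A = np.array([[0.5, 0.1, 0.0, 0.0],
              [0.0, 0.3, 0.2, 0.0],
              [0.0, 0.0, -0.4, 0.1],
              [0.1, 0.0, 0.0, 0.2]])   # spectral radius < 1, so X_t is stationary
Sigma_eps = np.eye(4)

# The stationary covariance solves Sigma_X = A Sigma_X A^T + Sigma_eps;
# iterate the recursion until the geometric tail is negligible.
Sigma_X = np.zeros((4, 4))
for _ in range(500):
    Sigma_X = A @ Sigma_X @ A.T + Sigma_eps

# Finite-dimensional version of gamma_{eps,0} = gamma_X - rho gamma_X rho_*.
recovered = Sigma_X - A @ Sigma_X @ A.T
```

Up to the truncation error of the iteration, `recovered` equals `Sigma_eps`, mirroring Theorem 3.2 of Bosq (2000) in finite dimensions.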
The key idea here is to show the asymptotic equivalence of the modified block bootstrap to the procedure described in Assumption 6.1, which is based on the unobserved innovation sequence. Notice that $Y_{\hat\varepsilon,th}(\tau_1,\tau_2)=\hat U_{\hat\varepsilon,th}(\tau_1,\tau_2)+\hat U_{\hat\varepsilon,th}(\tau_2,\tau_1)$ with $\hat U_{\hat\varepsilon,th}(\tau_1,\tau_2)=\hat\varepsilon_t(\tau_1)\hat\varepsilon_{t-h}(\tau_2)-\hat\gamma_{\hat\varepsilon,h}(\tau_1,\tau_2)-\hat B_{t,h}(\tau_1,\tau_2)$, where hats indicate quantities computed from the residuals $\hat\varepsilon_t$. We have the decomposition,
$$\hat U_{\hat\varepsilon,th}(\tau_1,\tau_2)=\Big\{\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)-\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)+\frac{1}{T}\sum_{s=2}^{T}\varepsilon_s(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{s-1})(\tau_2)\Big\}$$
$$+\Big\{\varepsilon_t(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)-\varepsilon_t(\tau_1)\hat I_{T,h}(X_{t-1})(\tau_2)\Big\}+\Big\{\varepsilon_t(\tau_1)D_T(X_{t-h-1})(\tau_2)+D_T(X_{t-1})(\tau_1)D_T(X_{t-h-1})(\tau_2)\Big\}$$
$$+\Big\{\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)-\hat\gamma_{\hat\varepsilon,h}(\tau_1,\tau_2)\Big\}+\Big\{D_T(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)-D_T(X_{t-1})(\tau_1)\hat I_{T,h}(X_{t-1})(\tau_2)\Big\}+\Big\{\varepsilon_t(\tau_1)\hat I_{T,h}(X_{t-1})(\tau_2)-\hat B_{t,h}(\tau_1,\tau_2)\Big\}$$
$$-\Big\{\frac{1}{T}\sum_{s=2}^{T}\varepsilon_s(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{s-1})(\tau_2)\Big\}=U_{\varepsilon,th}(\tau_1,\tau_2)+L_{1,t,h}+L_{2,t,h}+L_{3,t,h}+L_{4,t,h}+L_{5,t,h}-L_{6,t,h},$$
where $U_{\varepsilon,th}(\tau_1,\tau_2)=\varepsilon_t(\tau_1)\varepsilon_{t-h}(\tau_2)-\hat\gamma_{\varepsilon,h}(\tau_1,\tau_2)-B_{t,h}(\tau_1,\tau_2)$ and the $L_{j,t,h}$'s are defined implicitly as the random variables inside the brackets.
Lemma 6.9. Under the assumptions in Proposition 3.2, $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L_{j,t,h}\|_{\mathcal I^2}^2=o_p(1)$ for $j=1,2,3,4,5,6$.
Proof of Proposition 3.2. We aim to show that
$$TE^*\Big\|\sum_{h=1}^{T-1}V^*_{\hat\varepsilon,T,h}\Psi_h-\sum_{h=1}^{T-1}V^*_{\varepsilon,T,h}\Psi_h\Big\|_{\mathcal A}^2=o_p(1). \tag{53}$$
The conclusion then follows from Assumption 6.1. A simple calculation yields that
$$TE^*\Big\|\sum_{h=1}^{T-1}V^*_{\hat\varepsilon,T,h}\Psi_h-\sum_{h=1}^{T-1}V^*_{\varepsilon,T,h}\Psi_h\Big\|_{\mathcal A}^2=\frac{T}{2\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}E^*\|V^*_{\hat\varepsilon,T,h}-V^*_{\varepsilon,T,h}\|_{\mathcal I^2}^2$$
$$=\frac{1}{2T\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}E^*\Big\langle\sum_{s=1}^{M_T}\sum_{t\in B_s^*}(Y_{\hat\varepsilon,th}-Y_{\varepsilon,th}),\ \sum_{s=1}^{M_T}\sum_{t\in B_s^*}(Y_{\hat\varepsilon,th}-Y_{\varepsilon,th})\Big\rangle$$
$$\le\frac{2}{T\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}(\hat U_{\hat\varepsilon,th}-U_{\varepsilon,th})\Big\|_{\mathcal I^2}^2+\frac{1}{T\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\Big\{\Big\|\sum_{t=h+1}^{T}Y_{\hat\varepsilon,th}\Big\|_{\mathcal I^2}^2+\Big\|\sum_{t=h+1}^{T}Y_{\varepsilon,th}\Big\|_{\mathcal I^2}^2\Big\}. \tag{54}$$
By Lemma 6.9, we get $\frac{1}{T\pi^2}\sum_{h=1}^{T-1}\frac{1}{h^2}\sum_{s=1}^{M_T}\big\|\sum_{t\in B_s}(\hat U_{\hat\varepsilon,th}-U_{\varepsilon,th})\big\|_{\mathcal I^2}^2=o_p(1)$. To deal with the second term in (54), we note that
$$\frac{1}{T}\sum_{t=2}^{T}\langle\hat\gamma_X^{-1}(X_{t-1}),X_{j-1}\rangle\hat\varepsilon_t(\tau_1)=\frac{1}{T}\sum_{t=2}^{T}\langle\hat\gamma_X^{-1}(X_{t-1}),X_{j-1}\rangle X_t(\tau_1)-\frac{1}{T}\sum_{t=2}^{T}\langle\hat\gamma_X^{-1}(X_{t-1}),X_{j-1}\rangle\hat\rho_T(X_{t-1})(\tau_1)$$
$$=\frac{1}{T}\sum_{t=2}^{T}\langle\hat\gamma_X^{-1}(X_{t-1}),X_{j-1}\rangle X_t(\tau_1)-\frac{1}{T^2}\sum_{t,l=2}^{T}\langle X_{t-1},\hat\gamma_X^{-1}(X_{j-1})\rangle\langle X_{t-1},\hat\gamma_X^{-1}(X_{l-1})\rangle X_l(\tau_1)$$
$$=\frac{1}{T}\sum_{t=2}^{T}\langle X_{t-1},\hat\gamma_X^{-1}(X_{j-1})\rangle X_t(\tau_1)-\frac{1}{T}\sum_{l=2}^{T}\langle\hat\gamma_X\hat\gamma_X^{-1}(X_{j-1}),\hat\gamma_X^{-1}(X_{l-1})\rangle X_l(\tau_1)$$
$$=\frac{1}{T}\sum_{t=2}^{T}\langle X_{t-1}-\hat\gamma_X\hat\gamma_X^{-1}(X_{t-1}),\hat\gamma_X^{-1}(X_{j-1})\rangle X_t(\tau_1)=0,$$
which implies that $\sum_{t=2}^{T}\hat B_{t,h}(\tau_1,\tau_2)=0$ for $1\le h\le T-1$. Thus it is straightforward to verify that the second term in (54) is of order $o_p(1)$. ♦
References
[1] Antoniadis, A., and Sapatinas, T. (2003), “Wavelet Methods for Continuous-time Prediction
Using Hilbert-valued Autoregressive Processes,” Journal of Multivariate Analysis, 87, 133-158.
[2] Arsenin, V., and Tikhonov, A. (1977), Solutions of Ill-posed Problems, Washington D.C.: Winston
and Sons.
[3] Aue, A., Hormann, S., Horvath, L., and Reimherr, M. (2009), “Detecting Changes in the Covariance Structure of Multivariate Time Series Models,” Annals of Statistics, 37, 4046-4087.
[4] Besse, P., Cardot, H., and Stephenson, D. (2000), “Autoregressive Forecasting of Some Functional
Climatic Variations,” Scandinavian Journal of Statistics, 27, 673-687.
[5] Bosq, D. (2000), Linear Processes in Function Spaces: Theory and Applications, New York: Springer.
[6] Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, San Francisco: Holden-Day.
[7] Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods, Springer.
[8] Chiou, J.-M., Muller, H.-G., and Wang, J.-L. (2004), “Functional Response Models,” Statistica
Sinica, 14, 675-693.
[9] Dahlhaus, R. (1985), “On the Asymptotic Distribution of Bartlett’s Up-statistic,” Journal of
Time Series Analysis, 6, 213-227.
[10] Deo, R. S. (2000), “Spectral Tests for the Martingale Hypothesis under Conditional Het-
eroscedasticity,” Journal of Econometrics, 99, 291-315.
[11] Durlauf, S. (1991), “Spectral-based Test for the Martingale Hypothesis,” Journal of Economet-
rics, 50, 1-19.
[12] Escanciano, J., and Velasco, C. (2006), “Generalized Spectral Tests for the Martingale Difference
Hypothesis,” Journal of Econometrics, 134, 151-185.
[13] Fernandez de Castro, B., Guillas, S., and Gonzalez Manteiga, W. (2005), “Functional Samples and Bootstrap for Predicting Sulfur Dioxide Levels,” Technometrics, 47, 212-222.
[14] Francq, C., Roy, R., and Zakoian, J. M. (2005), “Diagnostic Checking in ARMA Models with Uncorrelated Errors,” Journal of the American Statistical Association, 100, 532-544.
[15] Fremdt, S., Horvath, L., Kokoszka, P., and Steinebach, J. G. (2014), “Functional Data Analysis with Increasing Number of Projections,” Journal of Multivariate Analysis, 124, 313-332.
[16] Gabrys, R., Horvath, L., and Kokoszka, P. (2010), “Tests for Error Correlation in the Functional
Linear Model,” Journal of the American Statistical Association, 105, 1113-1125.
[17] Gabrys, R., and Kokoszka, P. (2007), “Portmanteau Test of Independence for Functional Observations,” Journal of the American Statistical Association, 102, 1338-1348.
[18] Groetsch, C. (1993), Inverse Problems in the Mathematical Sciences, Wiesbaden: Vieweg.
[19] Hardle, W., Horowitz, J., and Kreiss, J-P. (2003), “Bootstrap Methods for Time Series,” Inter-
national Statistical Review, 71, 435-459.
[20] Hong, Y. (1996), “Consistent Testing for Serial Correlation of Unknown Form,” Econometrica,
64, 837-864.
[21] Hong, Y., and Lee, Y. J. (2003), “Consistent Testing for Serial Uncorrelation of Unknown
Form under General Conditional Heteroscedasticity,” Preprint, Cornell University, Department
of Economics.
[22] Hormann, S., Horvath, L., and Reeder, R. (2013), “A Functional Version of the ARCH Model,”
Econometric Theory, 29, 267-288.
[23] Hormann, S., Kidzinski, L., and Hallin, M. (2015), “Dynamic Functional Principal Components,”
Journal of the Royal Statistical Society, Ser. B, 77, 319-348.
[24] Hormann, S., and Kokoszka, P. (2010), “Weakly Dependent Functional Data,” Annals of Statis-
tistics, 38, 1845-1884.
[25] Horowitz, J. L., Lobato, I. N., Nankervis, J. C., and Savin, N. E. (2006), “Bootstrapping the
Box-Pierce Q test: A Robust Test of Uncorrelatedness,” Journal of Econometrics, 133, 841-862.
[26] Horvath, L., Huskova, M., and Rice, G. (2013), “Test of Independence for Functional Data,” Journal of Multivariate Analysis, 117, 100-119.
[27] Horvath, L., and Kokoszka, P. (2012), Inference for Functional Data with Applications, New
York: Springer.
[28] Kargin, V., and Onatski, A. (2008), “Curve Forecasting by Functional Autoregression,” Journal
of Multivariate Analysis, 99, 2508-2526.
[29] Kokoszka, P., and Reimherr, M. (2013), “Predictability of Shapes of Intraday Price Curves,”
Econometrics Journal, 16, 285-308.
[30] Laukaitis, A., and Rackauskas, A. (2002), “Functional Data Analysis of Payment Systems,”
Nonlinear Analysis: Modeling and Control, 7, 53-68.
[31] Lobato, I. N., Nankervis, J. C., and Savin, N. E. (2002), “Testing for Zero Autocorrelation in
the Presence of Statistical Dependence,” Econometric Theory, 18, 730-743.
[32] Mas, A. (2007), “Weak Convergence in the Functional Autoregressive Model,” Journal of Multivariate Analysis, 98, 1231-1261.
[33] Panaretos, V. M., and Tavakoli, S. (2013a), “Fourier Analysis of Stationary Time Series in
Function Space,” Annals of Statistics, 41, 568-603.
[34] Panaretos, V. M., and Tavakoli, S. (2013b), “Cramer-Karhunen-Loeve Representation and Har-
monic Principal Component Analysis of Functional Time Series,” Stochastic Processes and their
Applications, 123, 2779-2807.
[35] Politis, D. N., and Romano, J. (1994), “Limit Theorems for Weakly Dependent Hilbert Space
Valued Random Variables with Application to the Stationary Bootstrap,” Statistica Sinica, 4,
461-476.
[36] Politis, D. N., Romano, J., and Wolf, M. (1999), Subsampling, New York: Springer-Verlag.
[37] Riesz, F., and Sz-Nagy, B. (1955), Functional Analysis, New York: Ungar.
[38] Romano, J. L., and Thombs, L. A. (1996), “Inference for Autocorrelations under Weak Assump-
tions,” Journal of the American Statistical Association, 91, 590-600.
[39] Shao, X. (2011a), “A Bootstrap-assisted Spectral Test of White Noise under Unknown Depen-
dence,” Journal of Econometrics, 162, 213-224.
[40] Shao, X. (2011b), “Testing for White Noise under Unknown Dependence and Its Applications
to Goodness-of-fit for Time Series Models,” Econometric Theory, 27, 312-343.
[41] Tsay, R. S. (2005), Analysis of Financial Time Series, New York: Wiley.
[42] Zhu, K. (2007), Operator Theory in Function Spaces, Vol. 138 of Mathematical surveys and
monographs, American Mathematical Society, Providence, RI.
[43] Zhu, K., and Li, W. (2015), “A Bootstrapped Spectral Test for Adequacy in Weak ARMA
Models,” Journal of Econometrics, 187, 113-130.
Table 1: Empirical sizes in percentage for the GK test, the HHR test, and the spectra-based test. The
sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications
is 1000. The bootstrap critical values are computed based on B = 299 resamples.
BM BB FARCH
10% 5% 1% 10% 5% 1% 10% 5% 1%
GK, H = 1 8.6 4.0 0.7 8.4 3.9 0.6 19.2 12.2 4.1
GK, H = 3 10.0 4.5 0.7 7.7 4.2 0.4 16.9 9.9 3.4
GK, H = 5 7.5 3.5 0.3 6.7 3.2 0.8 13.1 8.1 2.2
HHR, H = 10 9.5 5.3 1.8 10.7 7.2 3.3 9.8 7.4 3.2
HHR, H = 30 8.9 5.3 2.3 11.1 6.9 2.7 10.7 7.1 3.3
HHR, H = 50 8.3 5.5 1.9 11.2 7.1 2.9 9.7 6.2 2.4
Boot, bT = 1 8.6 4.4 1.0 10.6 5.0 1.4 9.7 4.7 0.5
Boot, bT = 5 9.2 5.6 1.5 10.1 5.2 0.9 12.9 5.7 1.4
Boot, bT = 10 15.0 8.6 2.5 10.9 4.9 1.8 15.0 7.4 1.3
Boot, bT = b∗ 11.5 7.0 1.5 10.5 5.5 1.0 9.5 5.5 1.0
GK, H = 1 10.4 5.0 1.6 9.1 4.9 0.7 23.5 14.1 4.4
GK, H = 3 9.8 4.8 1.2 10.1 4.4 1.3 17.2 11.1 3.4
GK, H = 5 10.0 4.3 0.6 8.7 4.6 0.8 14.6 9.3 2.0
HHR, H = 10 9.0 5.8 2.7 10.9 7.3 3.6 12.2 8.6 4.5
HHR, H = 30 10.0 6.9 2.8 11.2 6.8 2.0 11.4 7.8 3.5
HHR, H = 50 11.7 7.4 2.6 9.9 6.2 3.7 11.7 7.7 3.1
Boot, bT = 1 11.1 5.8 1.1 11.0 5.5 1.6 11.2 5.4 1.6
Boot, bT = 8 10.5 5.4 1.8 10.4 5.5 0.7 10.8 5.3 2.1
Boot, bT = 16 10.8 6.3 2.2 10.3 4.8 0.9 11.8 6.0 1.7
Boot, bT = b∗ 9.5 4.5 2.0 11.0 6.0 1.5 13.0 6.0 1.5
Note: b∗ denotes the block size chosen by the minimum volatility method.
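The minimum volatility method used to pick $b^*$ goes back to Politis, Romano, and Wolf (1999, Subsampling): compute the bootstrap critical value on a grid of candidate block sizes and select the size around which the critical values are most stable. A minimal sketch (the function name and the window width are our own choices, not prescribed by the paper):

```python
import numpy as np

def min_volatility_block_size(crit_vals, half_window=1):
    """Minimum volatility rule, sketched.

    crit_vals[i] is the bootstrap critical value computed with the i-th
    candidate block size on an increasing grid; we return the index whose
    neighborhood of critical values has the smallest standard deviation.
    """
    n = len(crit_vals)
    sds = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        sds.append(np.std(crit_vals[lo:hi]))
    return int(np.argmin(sds))

# Toy grid of critical values that flattens out in the middle.
cv = np.array([3.1, 2.4, 2.05, 2.0, 2.02, 2.6])
i = min_volatility_block_size(cv)
```

In practice one would map the returned index back to the corresponding block size on the candidate grid.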
Table 2: Empirical powers in percentage for the GK test, the HHR test, and the spectra-based
test. The sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo
replications is 1000. The bootstrap critical values are computed based on B = 299 resamples.
FAR(1)-G-BM FAR(1)-W-BM FAR(1)-G-BB FAR(1)-W-BB
10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%
GK, H = 1 77.1 66.1 39.3 69.4 57.1 31.8 76.9 63.6 35.5 69.1 53.3 25.2
GK, H = 3 55.4 42.2 23.7 49.1 35.5 19.8 51.2 38.4 15.6 43.5 31.8 12.0
GK, H = 5 45.2 34.3 17.4 39.0 29.1 15.1 38.0 25.6 10.5 32.0 20.7 8.5
HHR, H = 10 54.2 45.8 32.0 57.7 49.5 35.5 49.1 38.8 25.6 46.1 37.4 25.1
HHR, H = 30 42.7 34.2 23.3 45.7 37.8 26.5 37.3 28.0 16.4 37.0 28.6 15.9
HHR, H = 50 39.9 32.0 20.4 42.7 35.8 24.1 32.1 24.0 13.7 31.8 24.3 13.7
Boot, bT = 1 87.0 78.5 55.6 89.5 82.9 62.8 80.8 71.9 47.9 80.9 69.9 45.2
Boot, bT = 5 84.9 73.6 43.0 87.9 78.0 48.4 77.7 65.4 32.2 76.9 64.4 32.1
Boot, bT = 10 85.2 74.7 45.5 87.8 78.0 51.1 79.5 65.8 36.2 79.0 64.7 35.3
Boot, bT = b∗ 85.5 81.0 49.5 89.0 83.5 55.0 78.5 66.5 37.5 76.5 63.5 36.5
GK, H = 1 99.5 99.2 95.8 99.0 97.9 93.0 100.0 99.8 98.2 99.3 98.0 91.7
GK, H = 3 97.7 94.4 83.3 95.2 90.7 77.7 97.7 94.4 83.8 93.3 86.5 71.6
GK, H = 5 93.3 88.0 73.1 89.9 83.0 66.3 92.3 85.8 65.8 83.5 74.5 52.0
HHR, H = 10 95.9 93.6 86.0 96.3 94.6 88.9 93.3 89.7 78.6 92.4 87.7 76.5
HHR, H = 30 84.4 78.5 64.3 87.1 82.0 69.3 77.2 68.8 51.8 76.3 67.8 51.1
HHR, H = 50 78.0 70.5 54.3 82.2 75.9 60.6 68.3 59.5 42.6 67.8 58.5 42.2
Boot, bT = 1 99.8 99.3 97.1 99.7 99.5 98.6 99.9 99.0 94.5 99.5 99.1 93.7
Boot, bT = 8 99.5 99.1 94.3 99.8 99.1 96.0 99.4 97.7 90.2 99.4 97.7 89.3
Boot, bT = 16 99.8 99.4 93.5 99.9 99.6 94.9 99.4 97.5 87.8 99.0 97.3 87.3
Boot, bT = b∗ 99.5 99.5 97.0 99.5 99.5 97.5 100.0 99.5 94.0 100.0 100.0 94.0
Note: b∗ denotes the block size chosen by the minimum volatility method.
Table 3: Empirical sizes in percentage for the GHK tests and the spectra-based test for testing the error correlation in functional linear models. Note that K(·, ·) is the Gaussian kernel (i) or the Wiener kernel (ii). The sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications is 1000. The bootstrap critical values are computed based on B = 299 resamples.
G-BM G-BB G-FARCH
10% 5% 1% 10% 5% 1% 10% 5% 1%
GHK1, H = 1 (i) 8.6 4.3 1.0 8.5 3.9 1.1 22.7 14.3 5.4
(ii) 8.7 4.2 1.0 8.6 4.2 1.1 22.7 14.4 5.4
GHK1, H = 3 (i) 8.6 3.5 1.0 6.6 2.9 0.6 17.0 9.9 3.6
(ii) 8.6 3.5 1.0 6.9 2.9 0.7 17.1 9.9 3.6
GHK1, H = 5 (i) 6.1 3.1 0.8 6.7 2.5 0.7 14.0 8.2 1.8
(ii) 6.0 3.1 0.8 6.8 2.5 0.6 14.1 8.4 1.8
GHK2, H = 1 (i) 8.9 3.5 0.8 9.2 4.0 1.1 19.9 11.9 3.6
(ii) 9.0 3.4 0.7 9.1 3.5 1.3 19.9 11.9 3.8
GHK2, H = 3 (i) 7.7 3.7 0.9 7.0 2.9 0.4 15.8 9.5 2.5
(ii) 8.1 3.9 0.9 7.4 3.3 0.5 15.7 9.2 2.4
GHK2, H = 5 (i) 6.3 3.1 0.5 7.2 2.9 0.4 13.2 8.1 2.0
(ii) 6.7 3.1 0.6 7.0 2.6 0.4 13.1 7.7 1.9
Boot, bT = 1 (i) 11.1 5.3 1.3 11.3 5.5 1.0 11.5 5.5 0.6
(ii) 11.2 5.3 1.2 11.3 5.3 1.1 11.6 5.5 0.6
Boot, bT = 5 (i) 12.0 5.4 1.5 10.1 4.7 0.8 11.9 5.2 1.1
(ii) 12.1 5.4 1.4 9.9 4.9 0.8 11.8 5.2 1.1
Boot, bT = 10 (i) 14.6 6.7 1.3 11.8 5.9 1.2 14.1 7.3 2.4
(ii) 14.5 6.8 1.1 11.5 6.0 1.2 14.4 7.4 2.4
Boot, bT = b∗ (i) 13.5 8.0 2.5 11.5 7.0 1.0 14.0 4.0 0.5
(ii) 13.5 8.0 2.5 11.5 6.5 1.0 14.5 3.5 0.5
GHK1, H = 1 (i) 10.9 5.3 1.4 9.2 5.2 1.0 29.7 17.5 7.5
(ii) 10.9 5.3 1.4 9.4 5.1 1.0 29.5 17.5 7.5
GHK1, H = 3 (i) 10.2 5.2 1.3 8.9 4.3 1.2 23.3 14.1 4.9
(ii) 9.8 5.3 1.3 8.9 4.4 1.2 23.1 13.9 5.0
GHK1, H = 5 (i) 9.6 4.4 0.5 8.7 4.2 0.5 20.3 11.7 2.9
(ii) 9.6 4.5 0.5 8.7 4.1 0.5 20.4 12.2 2.9
GHK2, H = 1 (i) 11.1 6.0 1.2 10.0 5.3 1.0 22.1 13.8 5.5
(ii) 11.1 6.1 1.1 9.8 5.8 1.0 22.2 13.8 5.3
GHK2, H = 3 (i) 9.3 4.6 0.4 9.2 4.7 0.7 18.0 11.1 3.3
(ii) 9.5 4.6 0.4 9.4 4.1 0.8 17.9 10.7 3.1
GHK2, H = 5 (i) 9.4 4.0 0.5 7.6 3.7 0.8 15.5 9.2 2.3
(ii) 9.4 4.2 0.5 7.9 4.1 0.6 15.6 9.3 2.1
Boot, bT = 1 (i) 10.1 5.0 1.1 10.6 4.3 1.0 10.9 4.8 1.2
(ii) 10.0 4.9 1.0 10.6 4.2 1.0 10.8 4.9 1.2
Boot, bT = 8 (i) 10.0 4.7 1.6 10.5 5.1 1.1 12.9 6.5 1.1
(ii) 10.1 4.8 1.7 10.5 5.1 1.1 12.9 6.5 1.1
Boot, bT = 16 (i) 13.6 7.6 2.0 10.1 5.1 1.7 11.7 6.1 1.8
(ii) 13.4 7.6 1.9 10.3 5.0 1.7 11.6 6.2 1.8
Boot, bT = b∗ (i) 10.0 5.0 1.5 12.0 5.5 2.0 8.0 3.0 0.0
(ii) 9.0 5.0 1.5 12.0 6.0 2.0 8.5 3.0 0.0
Note: b∗ denotes the block size chosen by the minimum volatility method.
Table 4: Empirical powers in percentage for the GHK tests and the spectra-based test for testing the error correlation in functional linear models. Note that K(·, ·) is the Gaussian kernel (i) or the Wiener kernel (ii). The sample size T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications is 1000. The bootstrap critical values are computed based on B = 299 resamples.
FAR(1)-G-BM FAR(1)-W-BM FAR(1)-G-BB FAR(1)-W-BB
10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%
GHK1, H = 1 (i) 84.0 72.1 45.5 37.6 26.0 8.9 78.6 67.1 40.3 67.5 52.6 27.0
(ii) 83.8 72.3 45.6 37.4 25.6 8.8 78.2 67.4 40.5 67.4 52.7 26.8
GHK1, H = 3 (i) 58.8 44.7 20.3 23.2 13.8 3.8 51.2 38.4 18.2 42.0 31.9 12.0
(ii) 59.0 44.8 20.4 23.2 14.0 3.6 51.3 38.7 17.9 41.6 31.7 11.9
GHK1, H = 5 (i) 45.7 31.0 10.9 18.4 10.5 3.5 38.4 27.2 11.2 31.7 21.0 7.3
(ii) 45.6 31.2 10.8 18.4 10.5 3.4 38.4 26.9 11.2 32.1 20.7 7.2
GHK2, H = 1 (i) 73.7 62.2 36.7 63.7 51.5 28.3 82.6 72.6 48.7 71.4 58.7 31.7
(ii) 70.7 59.3 34.0 64.0 52.1 28.6 76.1 65.9 38.0 72.0 58.9 31.3
GHK2, H = 3 (i) 53.3 41.3 19.5 45.8 33.0 14.7 58.6 44.5 21.7 46.1 33.4 14.6
(ii) 51.5 38.7 18.7 46.0 33.4 14.7 51.1 36.3 16.7 45.8 33.5 14.6
GHK2, H = 5 (i) 44.0 32.1 14.5 38.2 27.0 12.0 43.6 30.3 11.6 35.1 23.1 7.9
(ii) 42.0 30.7 13.7 38.5 27.3 11.9 38.1 25.1 8.9 34.8 23.7 8.7
Boot, bT = 1 (i) 76.2 67.3 44.1 79.6 72.0 49.1 71.5 59.2 35.7 70.1 58.4 34.6
(ii) 76.3 67.3 44.3 79.6 71.9 49.2 71.7 59.1 35.4 70.2 58.5 34.5
Boot, bT = 5 (i) 80.2 68.4 37.9 84.1 72.5 42.3 71.2 56.1 27.0 70.3 54.5 26.1
(ii) 80.2 67.9 38.2 83.9 72.6 42.2 71.0 56.0 27.0 70.4 54.2 25.9
Boot, bT = 10 (i) 80.9 65.3 40.1 83.8 71.0 45.7 70.9 53.5 26.5 69.0 52.2 24.9
(ii) 80.9 65.2 40.6 83.7 70.9 45.7 71.0 53.4 26.2 68.8 52.6 25.2
Boot, bT = b∗ (i) 79.0 67.5 38.0 82.5 72.0 42.5 67.5 57.0 27.5 68.5 57.5 25.0
(ii) 79.0 67.5 38.5 82.5 71.5 42.5 68.5 55.0 28.5 68.5 57.0 25.0
GHK1, H = 1 (i) 100.0 100.0 99.4 86.6 78.0 56.8 99.9 99.9 99.4 99.8 99.4 97.0
(ii) 100.0 100.0 99.4 86.5 77.9 57.0 99.9 99.9 99.4 99.8 99.4 96.9
GHK1, H = 3 (i) 99.2 97.4 91.7 65.5 54.2 31.3 99.3 98.1 92.5 96.9 92.4 79.0
(ii) 99.0 97.5 92.1 65.8 54.3 31.3 99.3 98.1 92.6 96.9 92.3 78.6
GHK1, H = 5 (i) 95.3 92.1 79.5 55.2 42.6 22.9 97.2 91.6 78.6 89.2 82.1 59.8
(ii) 95.3 92.0 79.5 55.5 42.3 22.9 97.1 91.7 78.4 89.2 82.2 60.1
GHK2, H = 1 (i) 99.4 98.6 93.0 98.6 97.7 89.3 100.0 100.0 99.9 100.0 99.6 97.4
(ii) 99.3 98.0 90.8 98.8 97.8 89.5 100.0 99.8 98.4 99.9 99.7 97.6
GHK2, H = 3 (i) 95.1 91.0 79.2 93.6 86.7 72.3 99.3 98.5 92.1 97.1 94.0 81.8
(ii) 94.0 89.9 77.0 93.5 86.9 72.2 98.4 95.4 86.2 97.1 93.5 81.7
GHK2, H = 5 (i) 90.9 84.6 68.5 86.7 80.0 61.6 96.7 93.0 79.2 91.2 84.1 63.5
(ii) 89.5 82.6 66.8 86.8 80.2 61.7 93.3 88.2 67.7 92.4 84.1 64.2
Boot, bT = 1 (i) 99.7 99.1 96.0 99.8 99.5 97.6 99.4 98.1 91.6 99.0 97.9 90.6
(ii) 99.7 99.1 95.9 99.8 99.5 97.7 99.4 98.1 91.7 99.1 97.9 90.8
Boot, bT = 8 (i) 99.5 99.0 92.1 99.7 99.4 94.5 99.3 96.9 84.9 98.4 96.9 84.1
(ii) 99.5 99.0 92.1 99.7 99.3 94.5 99.3 97.0 84.9 98.5 96.8 84.9
Boot, bT = 16 (i) 99.7 99.1 92.0 99.9 99.6 95.0 99.3 97.1 85.0 99.0 96.4 84.1
(ii) 99.7 99.1 92.1 99.9 99.6 95.0 99.3 97.0 84.6 98.9 96.5 84.0
Boot, bT = b∗ (i) 100.0 99.0 95.0 100.0 99.5 96.0 99.0 96.5 90.0 99.0 96.0 89.5
(ii) 100.0 99.0 95.0 100.0 99.5 96.0 99.0 96.5 90.0 99.0 96.0 90.0
Note: b∗ denotes the block size chosen by the minimum volatility method.
Table 5: Empirical sizes and powers in percentage for the spectra-based test for testing the goodness
of fit for FAR(1) models with BM (i), BB (ii) or FARCH(1) (iii) innovations. Under the alternatives,
the data are generated from FAR(2) models with the corresponding innovations. The sample size
T = 101 (upper panel), 257 (lower panel), and the number of Monte Carlo replications is 1000. The
bootstrap critical values are computed based on B = 299 resamples.
FAR(1)-G FAR(1)-W FAR(2)-G FAR(2)-W
10% 5% 1% 10% 5% 1% 10% 5% 1% 10% 5% 1%
Boot, bT = 1 (i) 9.5 5.5 1.2 9.5 5.7 1.3 75.0 63.0 39.5 81.7 72.8 49.8
(ii) 9.6 4.6 1.3 9.1 4.7 1.1 49.8 37.0 15.8 57.3 43.3 20.2
(iii) 10.9 4.8 1.1 11.0 5.9 1.2 56.3 42.4 18.7 73.0 61.2 33.8
Boot, bT = 5 (i) 12.9 6.4 1.4 12.4 7.6 1.6 72.4 59.8 30.2 81.7 69.3 38.5
(ii) 10.1 5.2 0.8 11.1 5.3 0.8 46.0 30.6 10.8 51.9 35.7 14.6
(iii) 10.3 4.1 1.0 11.4 5.3 0.9 55.0 39.3 16.0 70.8 55.8 26.8
Boot, bT = 10 (i) 11.5 6.0 0.7 11.4 5.9 1.8 75.9 61.0 30.8 81.8 67.5 38.6
(ii) 10.4 4.3 0.7 10.5 4.5 0.6 41.9 27.9 9.1 48.0 32.1 12.2
(iii) 12.8 6.0 0.7 13.7 7.7 1.8 56.5 39.6 18.1 73.2 57.0 30.0
Boot, bT = b∗ (i) 10.0 4.0 1.0 9.5 6.0 1.0 83.0 71.5 47.0 84.5 77.5 52.0
(ii) 9.0 2.5 0.5 10.0 4.0 0.5 47.5 32.0 17.0 55.5 38.0 16.0
(iii) 9.0 4.5 1.5 9.0 5.0 1.5 61.0 45.5 20.5 72.0 60.5 31.5
Boot, bT = 1 (i) 9.1 5.5 1.4 9.6 5.2 1.2 99.6 98.5 93.3 99.8 99.6 96.6
(ii) 9.9 5.0 1.1 10.0 4.3 1.0 93.2 86.3 70.1 94.9 91.4 75.8
(iii) 11.1 5.2 1.3 11.7 6.8 1.3 92.8 87.7 70.2 98.2 96.6 90.2
Boot, bT = 8 (i) 10.3 5.5 1.2 10.5 5.7 1.0 99.1 97.8 88.5 99.6 99.1 94.1
(ii) 9.0 4.0 0.7 9.1 3.9 0.7 89.9 80.5 57.7 93.5 87.3 65.7
(iii) 11.5 5.1 0.9 11.8 5.6 1.3 91.5 82.4 58.4 99.3 97.4 82.4
Boot, bT = 16 (i) 11.3 6.1 1.2 10.9 6.2 1.7 99.0 97.0 86.3 99.7 98.9 92.4
(ii) 8.8 3.4 0.5 8.7 3.1 0.6 89.4 80.9 53.7 93.2 86.5 60.8
(iii) 11.2 5.9 1.6 11.9 6.5 2.3 92.1 84.2 60.4 98.6 96.5 83.8
Boot, bT = b∗ (i) 9.5 6.5 1.5 12.0 6.0 2.5 98.0 97.0 90.5 99.5 98.5 95.0
(ii) 13.0 5.5 1.0 12.0 5.5 1.0 89.0 81.5 57.5 93.0 87.5 67.0
(iii) 9.5 3.5 1.0 9.0 3.5 1.0 96.5 90.0 73.5 99.5 99.5 91.0
Note: b∗ denotes the block size chosen by the minimum volatility method.
[Figure 1: six panels. Top: intradaily cumulative returns for IBM and MCD, 2005-2007. Middle: the first three PCs of the CIDRs in 2007 (IBM: PC1 84.0%, PC2 7.4%, PC3 2.6%; MCD: PC1 86.1%, PC2 6.5%, PC3 2.4%). Bottom: yearly p-values for boot (bN = 1), GK (H = 3, 5), and HHR (H = 30, 50), for IBM and MCD.]
Figure 1: Top panels: CIDRs for IBM and MCD from 2005 to 2007; Middle panels: the first three PCs of the CIDRs for IBM and MCD in 2007; Bottom panels: p-values for testing the uncorrelatedness of the CIDRs for IBM and MCD from 2005 to 2007.
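The CIDR curves in Figure 1 can be formed directly from intraday price records. Assuming the usual definition (see, e.g., Kokoszka and Reimherr, 2013), with $P_t(u_j)$ the price of day $t$ at intraday time $u_j$ on a common grid, a minimal sketch is:

```python
import numpy as np

def cidr(prices):
    """Cumulative intradaily returns from an array of intraday prices.

    r_t(u) = 100 * (log P_t(u) - log P_t(u_0)), where u_0 is the first
    tick of the day. `prices` has shape (days, ticks); each row gives
    one day's price curve on a common intraday grid.
    """
    logp = np.log(prices)
    return 100.0 * (logp - logp[:, :1])

# Two toy trading days on a three-tick grid (hypothetical prices).
curves = cidr(np.array([[100.0, 101.0, 99.5],
                        [50.0, 50.5, 50.0]]))
```

By construction each curve starts at zero, so day-to-day price levels drop out and only the intraday shape remains.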
Supplement to “White Noise Testing and Model Diagnostic Checking
for Functional Time Series”
Xianyang Zhang∗
Texas A&M University
December 27, 2015
This supplement contains the proofs of Lemmas 3.1-3.2 and 6.3-6.9 in the main paper.
1 Proofs of Lemmas
Proof of Lemma 6.3. Write $D_{s,l}=\big(\sum_{h=1}^{k}\sum_{t\in B_s^*}\Gamma_{th,l}\big)/(2\sqrt T)$ and $\bar D_{s,l}=\big(\sum_{h=1}^{k}\sum_{t\in B_s^*}\bar\Gamma_{h,l}\big)/(2\sqrt T)$. By the $C_r$-inequality, we have $E^*|D_{s,l}-\bar D_{s,l}|^4\le C(E^*|D_{s,l}|^4+E^*|\bar D_{s,l}|^4)$. For the first term, we have
$$E\sum_{s=1}^{M_T}E^*|D_{s,l}|^4=\frac{1}{16T^2}\sum_{s=1}^{M_T}\sum_{h_1,h_2,h_3,h_4=1}^{k}\sum_{t_1,t_2,t_3,t_4\in B_s}E\Gamma_{t_1h_1,l}\Gamma_{t_2h_2,l}\Gamma_{t_3h_3,l}\Gamma_{t_4h_4,l}$$
$$=\frac{1}{16T^2}\sum_{s=1}^{M_T}\sum_{h_1,h_2,h_3,h_4=1}^{k}\sum_{t_1,t_2,t_3,t_4\in B_s}\big\{\mathrm{cum}(\Gamma_{t_1h_1,l},\Gamma_{t_2h_2,l},\Gamma_{t_3h_3,l},\Gamma_{t_4h_4,l})+\mathrm{cov}(\Gamma_{t_1h_1,l},\Gamma_{t_2h_2,l})\mathrm{cov}(\Gamma_{t_3h_3,l},\Gamma_{t_4h_4,l})+\mathrm{cov}(\Gamma_{t_1h_1,l},\Gamma_{t_3h_3,l})\mathrm{cov}(\Gamma_{t_2h_2,l},\Gamma_{t_4h_4,l})+\mathrm{cov}(\Gamma_{t_1h_1,l},\Gamma_{t_4h_4,l})\mathrm{cov}(\Gamma_{t_2h_2,l},\Gamma_{t_3h_3,l})\big\}\to0,$$
by arguments similar to those in Section 6.2. For the second term, it is not hard to show that
$$E\sum_{s=1}^{M_T}E^*|\bar D_{s,l}|^4\le\frac{M_Tb_T^4}{16T^2}\sum_{h_1,h_2,h_3,h_4=1}^{k}E\bar\Gamma_{h_1,l}\bar\Gamma_{h_2,l}\bar\Gamma_{h_3,l}\bar\Gamma_{h_4,l}$$
$$\le\frac{CM_Tb_T^4}{T^6}\sum_{h_1,h_2,h_3,h_4=1}^{k}W_{h_1,l_1}W_{h_2,l_1}W_{h_3,l_1}W_{h_4,l_1}\sum_{t_1=h_1+1}^{T}\sum_{t_2=h_2+1}^{T}\sum_{t_3=h_3+1}^{T}\sum_{t_4=h_4+1}^{T}E\prod_{i=1}^{4}\big(X_{t_i,l_2}X_{t_i-h_i,l_3}+X_{t_i,l_3}X_{t_i-h_i,l_2}-EX_{t_i,l_2}X_{t_i-h_i,l_3}-EX_{t_i,l_3}X_{t_i-h_i,l_2}\big)\to0.$$
Therefore, $E\sum_{s=1}^{M_T}E^*|D_{s,l}-\bar D_{s,l}|^4\le CE\sum_{s=1}^{M_T}\big(E^*|D_{s,l}|^4+E^*|\bar D_{s,l}|^4\big)\to0$. ♦
∗Department of Statistics, Texas A&M University, College Station, TX 77843, USA. E-mail: zhangxiany@stat.tamu.edu.
Proof of Lemma 6.4. We only prove the result for $J^*_{2,k,T}$; the argument for $\hat J^*_{2,k,T}$ is similar. Note that
$$EE^*\|J^*_{2,k,T}\|_{\mathcal A}^2=E\frac{1}{8\pi^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}E^*\Big\langle\sum_{s=1}^{M_T}\sum_{t\in B_s^*}Y_{th},\ \sum_{s'=1}^{M_T}\sum_{t'\in B_{s'}^*}Y_{t'h}\Big\rangle$$
$$=E\frac{1}{8\pi^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\Big\{\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}Y_{th}\Big\|_{\mathcal I^2}^2+(1-1/M_T)\Big\|\sum_{s=1}^{M_T}\sum_{t\in B_s}Y_{th}\Big\|_{\mathcal I^2}^2\Big\}$$
$$\le E\frac{1}{2\pi^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}\sum_{i,j=1}^{+\infty}\Big[\sum_{s=1}^{M_T}\Big(\sum_{t\in B_s}(X_{t,i}X_{t-h,j}-EX_{t,i}X_{t-h,j})\Big)^2+(1-1/M_T)\Big(\sum_{s=1}^{M_T}\sum_{t\in B_s}(X_{t,i}X_{t-h,j}-EX_{t,i}X_{t-h,j})\Big)^2\Big]$$
$$=E\frac{1}{2\pi^2T}\sum_{h=k+1}^{T-1}\frac{1}{h^2}(J_{1,h,T}+J_{2,h,T}).$$
For the first term, we have $EJ_{1,h,T}/T=\frac{1}{T}\sum_{i,j=1}^{+\infty}\sum_{s=1}^{M_T}\sum_{t,t'\in B_s}\mathrm{cov}(X_{t,i}X_{t-h,j},X_{t',i}X_{t'-h,j})$. Following the arguments in the proof of Theorem 2.1, we can show that the contribution from $J_{1,h,T}$ is of order $O(1)$ uniformly in $h$. On the other hand, we note that
$$EJ_{2,h,T}/T\le\frac{1}{T}\sum_{i,j=1}^{+\infty}\sum_{s,s'=1}^{M_T}\sum_{t\in B_s}\sum_{t'\in B_{s'}}\mathrm{cov}(X_{t,i}X_{t-h,j},X_{t',i}X_{t'-h,j})=\frac{1}{T}\sum_{i,j=1}^{+\infty}\sum_{s\neq s'}\sum_{t\in B_s}\sum_{t'\in B_{s'}}\mathrm{cov}(X_{t,i}X_{t-h,j},X_{t',i}X_{t'-h,j})+\frac{1}{T}\sum_{i,j=1}^{+\infty}\sum_{s=1}^{M_T}\sum_{t,t'\in B_s}\mathrm{cov}(X_{t,i}X_{t-h,j},X_{t',i}X_{t'-h,j}). \tag{S.1}$$
The second term in (S.1) can be handled using arguments similar to those in Section 6.2. For the first term, we have
$$\frac{1}{T}\sum_{i,j=1}^{+\infty}\sum_{s\neq s'}\sum_{t\in B_s}\sum_{t'\in B_{s'}}\mathrm{cov}(X_{t,i}X_{t-h,j},X_{t',i}X_{t'-h,j})\le\frac{CM_T}{T}\sum_{s=-\infty}^{+\infty}|s|\big(|||\gamma_s|||_1^2+|||R_{s+h,s,h}|||_1+\|\gamma_{s-h}\|_{\mathcal S}\|\gamma_{s+h}\|_{\mathcal S}\big).$$
Therefore, $\lim_{k\to+\infty}\limsup_{T\to+\infty}EE^*\|J^*_{2,k,T}\|^2=0$. ♦
Proof of Lemma 6.5. We aim to show that $\mathrm{var}\big(\sum_{t=1}^{T}\|Y_t\|^2/T\big)=o(1)$. Using the fact that $Y_t=\eta(X_t)+\epsilon_t$ and $\eta$ is a bounded linear operator, we deduce that
$$\frac{1}{T^2}\mathrm{var}\Big(\sum_{t=1}^{T}\|Y_t\|^2\Big)\le\frac{C}{T^2}\Big\{\mathrm{var}\Big(\sum_{t=1}^{T}\|X_t\|^2\Big)+\mathrm{var}\Big(\sum_{t=1}^{T}\|\epsilon_t\|^2\Big)\Big\}.$$
Direct calculation yields that
$$\frac{1}{T^2}\mathrm{var}\Big(\sum_{t=1}^{T}\|X_t\|^2\Big)=\frac{1}{T^2}\sum_{t,t'=1}^{T}\mathrm{cov}\big(\|X_t\|^2,\|X_{t'}\|^2\big)=\frac{1}{T^2}\sum_{t,t'=1}^{T}\int_{\mathcal I^2}\mathrm{cov}\big(X_t^2(\tau_1),X_{t'}^2(\tau_2)\big)\,d\tau_1d\tau_2$$
$$\le\frac{1}{T}\sum_{s=-\infty}^{+\infty}\Big|\int_{\mathcal I^2}\big\{R^X_{s,0,s}(\tau_1,\tau_2,\tau_1,\tau_2)+2\gamma_{X,s}^2(\tau_1,\tau_2)\big\}\,d\tau_1d\tau_2\Big|\le\frac{1}{T}\sum_{s=-\infty}^{+\infty}\big(|||R^X_{s,0,s}|||_1+2\|\gamma_{X,s}\|_{\mathcal S}^2\big)=O(1/T).$$
The same argument applies to the term associated with $\epsilon_t$. ♦
Proof of Lemma 3.1. Under the assumption that Xt and ǫt are both L4-m-approximable, it isstraightforward to show that ||CY,X − CY,X ||S = Op(1/
√T ) and ||CX − CX ||S = Op(1/
√T ) [see
Theorem 3.1 of Hormann and Kokoszka (2010)]. Define the projection operator πkT (·) =∑kT
j=1 <
φj , · > φj and πkT (·) =∑kT
j=1 < φj , · > φj . For any x ∈ L2(I) with ||x|| ≤ 1, we have thedecomposition ηT (x) − η(x) = ηT πkT (x) − ηπkT (x) + ηπkT (x) − η(x) = A1,T (x) + A2,T (x). Wefurther decompose A1,T (x) as follows,
A1,T (x) =ηT πkT (x)− ηπkT (x) = CY,X
kT∑
j=1
(λ−1j − λ−1
j ) < φj , x > φj
+ CY,X
kT∑
j=1
λ−1j < φj − φj , x > φj
+ CY,X
kT∑
j=1
λ−1j < φj , x > (φj − φj)
+ (CY,X − CY,X)
kT∑
j=1
λ−1j < φj, x > φj
=A1,1,T (x) +A1,2,T (x) +A1,3,T (x) +A1,4,T (x).
By the Cauchy Schwartz inequality, we can show that ||CY,X(φj)||2 ≤ λj∑T
t=1 ||Yt||2/T . Because
|λj − λj| ≤ ||CX − CX ||L, we deduce that
\begin{align*}
\|A_{1,1,T}(x)\|&\leq\sum_{j=1}^{k_T}\frac{|\hat\lambda_j-\lambda_j|}{\hat\lambda_j\lambda_j}|\langle\hat\phi_j,x\rangle|\cdot\|\hat{C}_{Y,X}(\hat\phi_j)\|\\
&\leq\Big(\sum_{t=1}^T\|Y_t\|^2/T\Big)^{1/2}\|\hat{C}_X-C_X\|_{\mathcal{L}}\sum_{j=1}^{k_T}\frac{1}{\lambda_j\hat\lambda_j^{1/2}}|\langle\hat\phi_j,x\rangle|\\
&\leq\Big(\sum_{t=1}^T\|Y_t\|^2/T\Big)^{1/2}\hat\lambda_{k_T}^{-1/2}\Big(\sum_{j=1}^{k_T}\frac{1}{\lambda_j^2}\Big)^{1/2}\|\hat{C}_X-C_X\|_{\mathcal{L}}.
\end{align*}
As $T\lambda_{k_T}^2\to+\infty$, we have $P(\|\hat{C}_X-C_X\|_{\mathcal{L}}\leq\lambda_{k_T}/2)\to1$, which implies that $P(\hat\lambda_{k_T}>\lambda_{k_T}/2)\to1$. On the event $\{\hat\lambda_{k_T}>\lambda_{k_T}/2\}$, we have
\begin{align*}
\|A_{1,1,T}(x)\|\leq\sqrt{2}\Big(\sum_{t=1}^T\|Y_t\|^2/T\Big)^{1/2}\lambda_{k_T}^{-1/2}\Big(\sum_{j=1}^{k_T}\frac{1}{\lambda_j^2}\Big)^{1/2}\|\hat{C}_X-C_X\|_{\mathcal{L}}.
\end{align*}
By Lemma 6.5 and the assumption that $(\sum_{j=1}^{k_T}1/\lambda_j^2)/(\lambda_{k_T}\sqrt{T})=o(1)$, we have $\sup_{\|x\|\leq1}\|A_{1,1,T}(x)\|=o_p(1/T^{1/4})$. Further, by Lemma 4.3 of Bosq (2000), we have $\|\hat\phi_j-\phi_j\|\leq\alpha_j\|\hat{C}_X-C_X\|_{\mathcal{L}}$. Thus we obtain
\begin{align*}
\|A_{1,2,T}(x)\|&\leq\Big(\sum_{t=1}^T\|Y_t\|^2/T\Big)^{1/2}\|\hat{C}_X-C_X\|_{\mathcal{L}}\sum_{j=1}^{k_T}\hat\lambda_j^{1/2}\lambda_j^{-1}\alpha_j\\
&\leq C\Big(\sum_{t=1}^T\|Y_t\|^2/T\Big)^{1/2}\|\hat{C}_X-C_X\|_{\mathcal{L}}\sum_{j=1}^{k_T}\lambda_j^{-1}\alpha_j=o_p(1/T^{1/4}),
\end{align*}
by the assumption that $\sum_{j=1}^{k_T}\lambda_j^{-1}\alpha_j/\sqrt{T}=o(1/T^{1/4})$. Turning to $A_{1,3,T}(x)$ and $A_{1,4,T}(x)$, we have
\begin{align*}
\|A_{1,3,T}(x)\|\leq\|C_{Y,X}\|_{\mathcal{L}}\|\hat{C}_X-C_X\|_{\mathcal{L}}\sum_{j=1}^{k_T}\lambda_j^{-1}\alpha_j=o_p(1/T^{1/4}),
\end{align*}
and
\begin{align*}
\|A_{1,4,T}(x)\|\leq\|\hat{C}_{Y,X}-C_{Y,X}\|_{\mathcal{L}}\sum_{j=1}^{k_T}\lambda_j^{-1}=o_p(1/T^{1/4}),
\end{align*}
which implies that $\sup_{\|x\|\leq1}\|A_{1,T}(x)\|=o_p(1/T^{1/4})$. Finally, we note that
\begin{align*}
\|A_{2,T}(x)\|\leq\sum_{j>k_T}|\langle x,\hat\phi_j\rangle|\cdot\|\eta(\hat\phi_j)\|\leq\Big(\sum_{j>k_T}\|\eta(\hat\phi_j)\|^2\Big)^{1/2}=o(1/T^{1/4}).
\end{align*}
Summarizing the above results, we have $\|\hat\eta_T-\eta\|_{\mathcal{L}}=\sup_{\|x\|\leq1}\|\hat\eta_T(x)-\eta(x)\|=o_p(1/T^{1/4})$. ♦
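For intuition, the estimator analyzed above, $\hat\eta_T=\hat C_{Y,X}\sum_{j\leq k_T}\hat\lambda_j^{-1}\langle\hat\phi_j,\cdot\rangle\hat\phi_j$, can be sketched on discretized curves. Everything here (the grid, the decaying score variances, the kernel playing the role of $\eta$, and the choice $k_T=3$) is a hypothetical setup for illustration, with grid-weighting constants suppressed.

```python
import numpy as np

rng = np.random.default_rng(2)
T, p, k_T = 500, 30, 3             # illustrative sample size, grid, truncation level

# Hypothetical model Y_t = eta(X_t) + eps_t; X_t has decaying score variances
grid = np.linspace(0.0, 1.0, p)
eta_kernel = np.exp(-(grid[:, None] - grid[None, :]) ** 2)   # assumed kernel of eta
X = rng.standard_normal((T, p)) / np.arange(1, p + 1)        # lambda_j ~ j^{-2}
Y = X @ eta_kernel.T + 0.1 * rng.standard_normal((T, p))

# Empirical covariance operators as p x p matrices on the grid
C_X = X.T @ X / T                  # hat C_X
C_YX = Y.T @ X / T                 # hat C_{Y,X}

# Eigendecomposition of hat C_X gives (hat lambda_j, hat phi_j)
lam, phi = np.linalg.eigh(C_X)
lam, phi = lam[::-1], phi[:, ::-1] # reorder to descending eigenvalues

# Truncated inverse: hat eta_T = hat C_{Y,X} sum_{j<=k_T} lam_j^{-1} phi_j phi_j^T
eta_hat = C_YX @ (phi[:, :k_T] / lam[:k_T]) @ phi[:, :k_T].T
```

Here `phi[:, :k_T] / lam[:k_T]` implements $\sum_{j\leq k_T}\hat\lambda_j^{-1}\langle\hat\phi_j,\cdot\rangle\hat\phi_j$, and the operator-norm error $\|\hat\eta_T-\eta\|_{\mathcal{L}}$ is driven by the three sources the proof separates: estimated eigenvalues, estimated eigenfunctions, and the truncation bias $A_{2,T}$.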
Proof of Lemma 6.6. By stationarity, we only need to show that $\mathrm{var}(\sum_{t=1}^T\|X_t\|^2)/T^2=o(1)$. Let $\rho_j(\cdot,\cdot)$ be the kernel associated with the operator $\rho^j$. Define $P_{j,j'}(\tau'_1,\tau'_2)=\int_I\rho_j(\tau_1,\tau'_1)\rho_{j'}(\tau_1,\tau'_2)\,d\tau_1$, which is the kernel associated with the operator $\rho^{j*}\rho^{j'}$, where $\rho^{j*}$ denotes the adjoint operator of $\rho^j$. Using the fact that $X_t=\sum_{j=0}^{+\infty}\rho^j(\varepsilon_{t-j})$ [see Theorem 3.1 of Bosq (2000)], we have
\begin{align*}
\frac{1}{T^2}\mathrm{var}\Big(\sum_{t=1}^T\|X_t\|^2\Big)
&=\frac{1}{T^2}\sum_{t,t'=1}^T\mathrm{cov}\Bigg(\Big\|\sum_{j_1=0}^{+\infty}\rho^{j_1}(\varepsilon_{t-j_1})\Big\|^2,\Big\|\sum_{j_2=0}^{+\infty}\rho^{j_2}(\varepsilon_{t'-j_2})\Big\|^2\Bigg)\\
&=\frac{1}{T^2}\sum_{t,t'=1}^T\int_{I^2}\mathrm{cov}\Bigg(\Big\{\sum_{j_1=0}^{+\infty}\rho^{j_1}(\varepsilon_{t-j_1})(\tau_1)\Big\}^2,\Big\{\sum_{j_2=0}^{+\infty}\rho^{j_2}(\varepsilon_{t'-j_2})(\tau_2)\Big\}^2\Bigg)\,d\tau_1d\tau_2\\
&=\frac{1}{T^2}\sum_{t,t'=1}^T\sum_{j_1,j_2,j_3,j_4=0}^{+\infty}\int_{I^4}P_{j_1,j_2}(\tau'_1,\tau'_2)P_{j_3,j_4}(\tau'_3,\tau'_4)\,\mathrm{cov}\big(\varepsilon_{t-j_1}(\tau'_1)\varepsilon_{t-j_2}(\tau'_2),\varepsilon_{t'-j_3}(\tau'_3)\varepsilon_{t'-j_4}(\tau'_4)\big)\,d\tau'_1d\tau'_2d\tau'_3d\tau'_4\\
&=\frac{1}{T^2}\sum_{s=1-T}^{T-1}(T-|s|)\sum_{j_1,j_2,j_3,j_4=0}^{+\infty}\int_{I^4}P_{j_1,j_2}(\tau'_1,\tau'_2)P_{j_3,j_4}(\tau'_3,\tau'_4)\Big\{R^{\varepsilon}_{s+j_4-j_1,s+j_4-j_2,j_4-j_3}(\tau'_1,\tau'_2,\tau'_3,\tau'_4)\\
&\qquad+\gamma_{\varepsilon,s+j_3-j_1}(\tau'_1,\tau'_3)\gamma_{\varepsilon,s+j_4-j_2}(\tau'_2,\tau'_4)+\gamma_{\varepsilon,s+j_4-j_1}(\tau'_1,\tau'_4)\gamma_{\varepsilon,s+j_3-j_2}(\tau'_2,\tau'_3)\Big\}\,d\tau'_1d\tau'_2d\tau'_3d\tau'_4,
\end{align*}
where $\gamma_{\varepsilon,s}=0$ for $s\neq0$. Because $\|P_{j_1,j_2}\|_{\mathcal{S}}\leq\|\rho\|_{\mathcal{S}}^{j_1+j_2}$, by the Cauchy–Schwarz inequality, we deduce that
\begin{align*}
&\int_{I^4}P_{j_1,j_2}(\tau'_1,\tau'_2)P_{j_3,j_4}(\tau'_3,\tau'_4)R^{\varepsilon}_{s+j_4-j_1,s+j_4-j_2,j_4-j_3}(\tau'_1,\tau'_2,\tau'_3,\tau'_4)\,d\tau'_1d\tau'_2d\tau'_3d\tau'_4\\
&\leq\|\rho\|_{\mathcal{S}}^{j_1+j_2+j_3+j_4}\,\|R^{\varepsilon}_{s+j_4-j_1,s+j_4-j_2,j_4-j_3}\|_{\mathcal{S}}.
\end{align*}
Similarly, we have
\begin{align*}
&\int_{I^4}P_{j_1,j_2}(\tau'_1,\tau'_2)P_{j_3,j_4}(\tau'_3,\tau'_4)\gamma_{\varepsilon,s+j_3-j_1}(\tau'_1,\tau'_3)\gamma_{\varepsilon,s+j_4-j_2}(\tau'_2,\tau'_4)\,d\tau'_1d\tau'_2d\tau'_3d\tau'_4\\
&\leq\|\rho\|_{\mathcal{S}}^{j_1+j_2+j_3+j_4}\,\|\gamma_{\varepsilon,s+j_3-j_1}\|_{\mathcal{S}}\|\gamma_{\varepsilon,s+j_4-j_2}\|_{\mathcal{S}},
\end{align*}
and
\begin{align*}
&\int_{I^4}P_{j_1,j_2}(\tau'_1,\tau'_2)P_{j_3,j_4}(\tau'_3,\tau'_4)\gamma_{\varepsilon,s+j_4-j_1}(\tau'_1,\tau'_4)\gamma_{\varepsilon,s+j_3-j_2}(\tau'_2,\tau'_3)\,d\tau'_1d\tau'_2d\tau'_3d\tau'_4\\
&\leq\|\rho\|_{\mathcal{S}}^{j_1+j_2+j_3+j_4}\,\|\gamma_{\varepsilon,s+j_4-j_1}\|_{\mathcal{S}}\|\gamma_{\varepsilon,s+j_3-j_2}\|_{\mathcal{S}}.
\end{align*}
The conclusion follows from the summability conditions. ♦
Proof of Lemma 3.2. Under the assumption that $E\|\varepsilon_1\|^4<\infty$, we know that $X_t$ is $L^4$-$m$-approximable [see Example 2.1 of Hörmann and Kokoszka (2010)]. It is straightforward to show that $\|\hat\gamma_{X,i}-\gamma_{X,i}\|_{\mathcal{S}}=O_p(1/\sqrt{T})$ for $i=0,1$. The rest of the proof is essentially similar to the arguments in Section 8.6 of Bosq (2000) and the proof of Lemma 3.1; we sketch the steps below. Define the projection operators $\pi_{k_T}(\cdot)=\sum_{j=1}^{k_T}\langle\phi_j,\cdot\rangle\phi_j$ and $\hat\pi_{k_T}(\cdot)=\sum_{j=1}^{k_T}\langle\hat\phi_j,\cdot\rangle\hat\phi_j$. For $x\in L^2(I)$ with $\|x\|\leq1$, we have the decomposition $\hat\rho_T(x)-\rho(x)=\hat\rho_T\hat\pi_{k_T}(x)-\rho\hat\pi_{k_T}(x)+\rho\hat\pi_{k_T}(x)-\rho(x)=A_{1,T}(x)+A_{2,T}(x)$. Further decompose $A_{1,T}(x)$ as follows,
\begin{align*}
A_{1,T}(x)=&\,\hat\rho_T\hat\pi_{k_T}(x)-\rho\hat\pi_{k_T}(x)\\
=&\,\hat\gamma_{X,1}\sum_{j=1}^{k_T}(\hat\lambda_j^{-1}-\lambda_j^{-1})\langle\hat\phi_j,x\rangle\hat\phi_j
+\hat\gamma_{X,1}\sum_{j=1}^{k_T}\lambda_j^{-1}\langle\hat\phi_j-\phi_j,x\rangle\hat\phi_j\\
&+\hat\gamma_{X,1}\sum_{j=1}^{k_T}\lambda_j^{-1}\langle\phi_j,x\rangle(\hat\phi_j-\phi_j)
+(\hat\gamma_{X,1}-\gamma_{X,1})\sum_{j=1}^{k_T}\lambda_j^{-1}\langle\phi_j,x\rangle\phi_j\\
=&\,A_{1,1,T}(x)+A_{1,2,T}(x)+A_{1,3,T}(x)+A_{1,4,T}(x).
\end{align*}
By arguments similar to those in the proof of Lemma 3.1 [also see equations (8.83), (8.85), (8.87) and (8.89) of Bosq (2000)], we have
\begin{align*}
\|A_{1,1,T}(x)\|&\leq2\sqrt{2}\,\|\hat\gamma_X-\gamma_X\|_{\mathcal{L}}\Big(\frac{1}{T}\sum_{t=1}^T\|X_t\|^2\Big)^{1/2}\lambda_{k_T}^{-1/2}\Big(\sum_{j=1}^{k_T}\lambda_j^{-2}\Big)^{1/2}=o_p(1/T^{1/4}),\\
\|A_{1,2,T}(x)\|&\leq C\Big(\frac{1}{T}\sum_{t=1}^T\|X_t\|^2\Big)^{1/2}\Big(\sum_{j=1}^{k_T}\lambda_j^{-1}\alpha_j\Big)\|\hat\gamma_X-\gamma_X\|_{\mathcal{L}}=o_p(1/T^{1/4}),\\
\|A_{1,3,T}(x)\|&\leq\|\gamma_{X,1}\|_{\mathcal{L}}\Big(\sum_{j=1}^{k_T}\lambda_j^{-1}\alpha_j\Big)\|\hat\gamma_X-\gamma_X\|_{\mathcal{L}}=o_p(1/T^{1/4}),\\
\|A_{1,4,T}(x)\|&\leq\|\hat\gamma_{X,1}-\gamma_{X,1}\|_{\mathcal{L}}\Big(\sum_{j=1}^{k_T}\lambda_j^{-1}\Big)=o_p(1/T^{1/4}).
\end{align*}
Here we have used Lemma 6.6 and Assumption 3.6. Thus we obtain $\sup_{\|x\|\leq1}\|A_{1,T}(x)\|=o_p(1/T^{1/4})$. The conclusion follows by noting that $\|A_{2,T}(x)\|\leq\sum_{j>k_T}|\langle\hat\phi_j,x\rangle|\cdot\|\rho(\hat\phi_j)\|\leq(\sum_{j>k_T}\|\rho(\hat\phi_j)\|^2)^{1/2}=o(1/T^{1/4})$. ♦
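Analogously to Lemma 3.1, the FAR(1) estimator $\hat\rho_T=\hat\gamma_{X,1}\sum_{j\leq k_T}\hat\lambda_j^{-1}\langle\hat\phi_j,\cdot\rangle\hat\phi_j$ admits a simple discretized sketch. The autoregression kernel, burn-in length, and choice of $k_T$ below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(3)
T, p, k_T, burn = 600, 25, 4, 50   # illustrative sizes

# Hypothetical FAR(1): X_t = rho(X_{t-1}) + eps_t with a smooth, contractive kernel
grid = np.linspace(0.0, 1.0, p)
rho = 0.4 * np.exp(-3.0 * (grid[:, None] - grid[None, :]) ** 2) / p
X = np.zeros((T + burn, p))
for t in range(1, T + burn):
    X[t] = rho @ X[t - 1] + rng.standard_normal(p)
X = X[burn:]                       # discard burn-in to approximate stationarity

# Lag-0 and lag-1 empirical autocovariance operators
gamma0 = X[:-1].T @ X[:-1] / T     # hat gamma_X
gamma1 = X[1:].T @ X[:-1] / T      # hat gamma_{X,1}

# Spectral truncation at k_T components:
# hat rho_T = hat gamma_{X,1} sum_{j<=k_T} lam_j^{-1} phi_j phi_j^T
lam, phi = np.linalg.eigh(gamma0)
lam, phi = lam[::-1], phi[:, ::-1] # reorder to descending eigenvalues
rho_hat = gamma1 @ (phi[:, :k_T] / lam[:k_T]) @ phi[:, :k_T].T
```

The truncation term $A_{2,T}$ in the proof corresponds to the part of $\rho$ acting on the discarded eigendirections $j>k_T$, which is exactly what `rho_hat` omits.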
Proof of Lemma 6.7. We first note that
\begin{align*}
\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^T\big\{\rho\hat\gamma_X\hat\gamma_X^{-1}-\rho\big\}(X_{t-1})\varepsilon_{t-h}\Big\|_{I^2}^2
\leq\big\|\rho\hat\gamma_X\hat\gamma_X^{-1}-\rho\big\|_{\mathcal{L}}^2\sum_{h=1}^{T-1}\frac{1}{Th^2}\Big\|\sum_{t=h+1}^T X_{t-1}\varepsilon_{t-h}\Big\|_{I^2}^2.
\end{align*}
Thus we only need to show that $\|\rho\hat\gamma_X\hat\gamma_X^{-1}-\rho\|_{\mathcal{L}}=o_p(1)$. For any $x\in L^2(I)$ with $\|x\|\leq1$, simple algebra yields that $\hat\gamma_X\hat\gamma_X^{-1}(x)-x=\frac{1}{T}\sum_{t=2}^T\sum_{j=1}^{k_T}\frac{1}{\hat\lambda_j}\langle X_{t-1},\hat\phi_j\rangle\langle x,\hat\phi_j\rangle X_{t-1}-x=-\sum_{j>k_T}\langle x,\hat\phi_j\rangle\hat\phi_j$. It thus implies $\|\rho\hat\gamma_X\hat\gamma_X^{-1}(x)-\rho(x)\|^2=\|\sum_{j>k_T}\langle x,\hat\phi_j\rangle\rho(\hat\phi_j)\|^2\leq(\sum_{j>k_T}|\langle x,\hat\phi_j\rangle|\cdot\|\rho(\hat\phi_j)\|)^2\leq\sum_{j>k_T}\langle x,\hat\phi_j\rangle^2\sum_{j>k_T}\|\rho(\hat\phi_j)\|^2\leq\sum_{j>k_T}\|\rho(\hat\phi_j)\|^2$. Under the assumption that $\sum_{j=1}^{k_T}\alpha_j/\sqrt{T}=o(1)$, we can show $\sum_{j>k_T}\|\rho(\hat\phi_j)\|^2=o_p(1)$ by arguments similar to those in the proof of Lemma 8.2 of Bosq (2000). It thus implies that $\|\rho\hat\gamma_X\hat\gamma_X^{-1}-\rho\|_{\mathcal{L}}^2\leq\sum_{j>k_T}\|\rho(\hat\phi_j)\|^2=o_p(1)$. The conclusion follows by noting that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\|\sum_{t=h+1}^T X_{t-1}\varepsilon_{t-h}\|_{I^2}^2=O_p(1)$. ♦
Proof of Lemma 6.8. Define the operator $I_{T,h}(\cdot)=\frac{1}{T}\sum_{j=h+1}^T\langle X_{j-1},\hat\gamma_X^{-1}(\cdot)\rangle\varepsilon_{j-h}=\hat\gamma_{\varepsilon_1,X_h}\hat\gamma_X^{-1}(\cdot)$ with $\hat\gamma_{\varepsilon_1,X_h}=\frac{1}{T}\sum_{j=h+1}^T\langle X_{j-1},\cdot\rangle\varepsilon_{j-h}$. We note that the RHS of (50) is bounded by $\frac{1}{T}\|\sum_{t=2}^T X_{t-1}\varepsilon_t\|_{I^2}^2\sum_{h=1}^{T-1}\frac{1}{h^2}\|I_{T,h}-\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}^2$. Because $\frac{1}{T}\|\sum_{t=2}^T X_{t-1}\varepsilon_t\|_{I^2}^2=O_p(1)$, we only need to show that $\sum_{h=1}^{T-1}\frac{1}{h^2}\|I_{T,h}-\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}^2=o_p(1)$. For any $x\in L^2(I)$ with $\|x\|\leq1$, the triangle inequality yields that
\begin{align*}
\|I_{T,h}(x)-\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(x)\|
\leq&\,\|\hat\gamma_{\varepsilon_1,X_h}-\gamma_{\varepsilon,0}\rho_*^{h-1}\|_{\mathcal{L}}\|\hat\gamma_X^{-1}(x)\|\\
&+\|\gamma_{\varepsilon,0}\|_{\mathcal{L}}\|\rho_*\|_{\mathcal{L}}^{h-1}\|\hat\gamma_X^{-1}(x)-\gamma_X^{-1}\pi_{k_T}(x)\|.
\end{align*}
Using arguments similar to those in the proof of Lemma 3.2, we can show that $\|\hat\gamma_X^{-1}-\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}=o_p(1)$. Moreover, under the assumption that $|||\gamma_{\varepsilon,0}|||_1<\infty$ and $\sum_{s_1,s_2,s_3=1}^{+\infty}|||R^{\varepsilon}_{s_1,s_2,s_3}|||_1<\infty$, it can be shown that $\sum_{h=1}^{T-1}\frac{1}{h^2}E\|\hat\gamma_{\varepsilon_1,X_h}-\gamma_{\varepsilon,0}\rho_*^{h-1}\|_{\mathcal{S}}^2=O(1/T)$ (see the arguments in the proof of Theorem 3.3). Notice that $\|\hat\gamma_X^{-1}(x)\|\leq\|\gamma_X^{-1}\pi_{k_T}(x)\|+o_p(1)\leq\lambda_{k_T}^{-1}\|x\|+o_p(1)$. The conclusion follows provided that $T\lambda_{k_T}^2\to+\infty$. ♦
Proof of Lemma 6.9. Using the arguments in the proof of Lemma 6.8 and the proof of Theorem 3.3, we can show that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L_{j,t,h}\|_{I^2}^2=o_p(1)$ with $j=1,2,3$, provided that $\limsup b_T/\sqrt{T}<\infty$. Below we consider $L_{4,t,h}$. Decompose $L_{4,t,h}$ as
\begin{align*}
L_{4,t,h}=&\,D_T\rho^{h-1}\gamma_{\varepsilon,0}(\tau_1,\tau_2)-D_T\gamma_X\pi_{k_T}\gamma_X^{-1}\rho^{h-1}\gamma_{\varepsilon,0}(\tau_1,\tau_2)\\
&+D_T(X_{t-1})(\tau_1)\varepsilon_{t-h}(\tau_2)-D_T\rho^{h-1}\gamma_{\varepsilon,0}(\tau_1,\tau_2)\\
&+D_T\gamma_X\pi_{k_T}\gamma_X^{-1}\rho^{h-1}\gamma_{\varepsilon,0}(\tau_1,\tau_2)-D_T(X_{t-1})(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\\
&+D_T(X_{t-1})(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)-D_T(X_{t-1})(\tau_1)I_{T,h}(X_{t-1})(\tau_2)\\
=&\,L^{(1)}_{4,t,h}+L^{(2)}_{4,t,h}+L^{(3)}_{4,t,h}+L^{(4)}_{4,t,h}. \tag{S.2}
\end{align*}
We note that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L^{(1)}_{4,t,h}\|_{I^2}^2\leq\sum_{h=1}^{T-1}\frac{b_T}{h^2}\|D_T\|_{\mathcal{L}}^2\|\rho^{h-1}\gamma_{\varepsilon,0}\|_{\mathcal{S}}^2=o_p(1)$ because $\limsup b_T/\sqrt{T}<\infty$. It is straightforward to show that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L^{(2)}_{4,t,h}\|_{I^2}^2=o_p(1)$ by similar arguments. For the third term in (S.2), notice that
\begin{align*}
&\Big\|\sum_{t\in B_s}\big\{D_T\gamma_X\pi_{k_T}\gamma_X^{-1}\rho^{h-1}\gamma_{\varepsilon,0}(\tau_1,\tau_2)-D_T(X_{t-1})(\tau_1)\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})(\tau_2)\big\}\Big\|_{I^2}^2\\
&\leq\|D_T\|_{\mathcal{L}}^2\|\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}^2\Big\|\sum_{t\in B_s}\big\{X_{t-1}(\tau_1)X_{t-1}(\tau_2)-\gamma_X(\tau_1,\tau_2)\big\}\Big\|_{I^2}^2.
\end{align*}
Using the fact that $\|\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}^2\leq C/\lambda_{k_T}^2$ and the assumption that $1/(T\lambda_{k_T}^4)=o(1)$ (as $T^{-1/4}\sum_{j=1}^{k_T}\alpha_j/\lambda_j=o(1)$), we can show $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L^{(3)}_{4,t,h}\|_{I^2}^2=o_p(1)$. Following the proof of Lemma 6.8, we have $\sum_{h=1}^{T-1}\frac{1}{h^2}\|I_{T,h}-\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}\|_{\mathcal{L}}=o_p(1)$. It is then not hard to show that $\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}L^{(4)}_{4,t,h}\|_{I^2}^2=o_p(1)$. The contribution from $L_{6,t,h}$ is asymptotically negligible provided that $\frac{1}{T}E\|\sum_{t=2}^T\varepsilon_t\gamma_{\varepsilon,0}\rho_*^{h-1}\gamma_X^{-1}\pi_{k_T}(X_{t-1})\|_{I^2}=O(1)$. Finally, let $\hat\gamma_{X,-h}(\cdot)=\sum_{j=h+1}^T\langle X_{j-1},\cdot\rangle X_{j-h-1}$ and note that
\begin{align*}
\sum_{h=1}^{T-1}\frac{1}{Th^2}\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}L_{5,t,h}\Big\|_{I^2}^2
&\leq\sum_{h=1}^{T-1}\frac{1}{Th^2}\|D_T\hat\gamma_{X,-h}\hat\gamma_X^{-1}\|_{\mathcal{L}}^2\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}\varepsilon_tX_{t-1}\Big\|_{I^2}^2\\
&\leq\|D_T\|_{\mathcal{L}}^2\|\hat\gamma_X^{-1}\|_{\mathcal{L}}^2\sum_{h=1}^{T-1}\frac{1}{Th^2}\|\hat\gamma_{X,-h}\|_{\mathcal{S}}^2\sum_{s=1}^{M_T}\Big\|\sum_{t\in B_s}\varepsilon_tX_{t-1}\Big\|_{I^2}^2=o_p(1),
\end{align*}
where we have used the facts that $\|D_T\|_{\mathcal{L}}^2/\lambda_{k_T}^2\to0$, $\frac{1}{T}\sum_{s=1}^{M_T}\|\sum_{t\in B_s}\varepsilon_tX_{t-1}\|_{I^2}^2=O_p(1)$, and $\sum_{h=1}^{T-1}\frac{1}{h^2}\|\hat\gamma_{X,-h}\|_{\mathcal{S}}^2=O_p(1)$. ♦
References
[1] Bosq, D. (2000), Linear Processes in Function Spaces: Theory and Applications, New York: Springer.
[2] Hörmann, S., and Kokoszka, P. (2010), "Weakly Dependent Functional Data," Annals of Statistics, 38, 1845-1884.