

COVARIANCE MATRIX ESTIMATION AND LINEAR PROCESS

BOOTSTRAP FOR MULTIVARIATE TIME SERIES OF POSSIBLY

INCREASING DIMENSION

CARSTEN JENTSCH AND DIMITRIS N. POLITIS

Abstract. Multivariate time series present many challenges, especially when they are high dimensional. The paper's focus is twofold. First, we address the subject of consistently estimating the autocovariance sequence; this is a sequence of matrices that we conveniently stack into one huge matrix. We are then able to show consistency of an estimator based on the so-called flat-top tapers; most importantly, the consistency holds true even when the time series dimension is allowed to increase with the sample size. Secondly, we revisit the linear process bootstrap (LPB) procedure proposed by McMurry and Politis (Journal of Time Series Analysis, 2010) for univariate time series. Based on the aforementioned stacked autocovariance matrix estimator, we are able to define a version of the LPB valid for multivariate time series. Under rather general assumptions, we show that our multivariate linear process bootstrap (MLPB) has asymptotic validity for the sample mean in two important cases: (a) when the time series dimension is fixed, and (b) when it is allowed to increase with sample size. As an aside, in case (a) we show that the MLPB works also for spectral density estimators, which is a novel result even in the univariate case. We conclude with a simulation study that demonstrates the superiority of the MLPB in some important cases.

1. Introduction

Resampling methods for dependent data such as time series have been studied extensively over the last decades. For an overview of existing bootstrap methods see the monograph of Lahiri (2003), and the review papers by Bühlmann (2002), Paparoditis (2002), Härdle, Horowitz and Kreiss (2003), Politis (2003a) or the recent review paper by Kreiss and Paparoditis (2011). Among the most popular bootstrap procedures in time series analysis we mention the autoregressive (AR) sieve bootstrap [cf. Kreiss (1992, 1999), Bühlmann (1997), Kreiss, Paparoditis and Politis (2011)], and the block bootstrap and its variations [cf. Künsch (1989), Liu and Singh (1992), Politis and Romano (1992, 1994), etc.]. A recent addition to the available time series bootstrap methods was the linear process bootstrap (LPB) introduced by McMurry and Politis (2010), who showed its validity for the sample mean for univariate stationary processes without actually assuming linearity of the underlying process.

The main idea of the LPB is to consider the time series data of length n as one large n-dimensional vector and to estimate appropriately the entire covariance structure of this vector. This is executed by using tapered covariance matrix estimators based on flat-top kernels that were defined in Politis (2001). The resulting covariance matrix is used to whiten the data by pre-multiplying the original (centered) data with its inverse Cholesky matrix; a modification of the eigenvalues, if necessary, ensures positive definiteness. This decorrelation property is illustrated in Figures 5 and 6 in Jentsch and Politis (2013). After suitable centering and standardizing, the whitened vector is treated as having independent and identically distributed (i.i.d.) components with zero mean and unit variance. Finally, i.i.d. resampling from this vector and pre-multiplying the corresponding bootstrap vector of residuals with the Cholesky matrix itself results in a bootstrap sample that has (approximately) the same covariance structure as the original time series.

Date: February 23, 2014.
2000 Mathematics Subject Classification. 62G09; 62M10.
Key words and phrases. Asymptotics, bootstrap, covariance matrix, high dimensional data, multivariate time series, sample mean, spectral density.

Due to the use of flat-top kernels with compact support, an abruptly dying-out autocovariance structure is induced in the bootstrap residuals. Therefore, the LPB is particularly suitable for, but not limited to, time series of moving average (MA) type. In a sense, the LPB could be considered the closest analog to an MA-sieve bootstrap, which is not practically feasible due to nonlinearities in the estimation of the MA parameters. A further similarity of the LPB to MA fitting, at least in the univariate case, is the equivalence of computing the Cholesky decomposition of the covariance matrix to the innovations algorithm; cf. Rissanen and Barbosa (1969), Brockwell and Davis (1988), and Brockwell and Mitchell (1997), the latter addressing the multivariate case.

Typically, bootstrap methods extend easily from the univariate to the multivariate case, and the same is true for time series bootstrap procedures such as the aforementioned AR-sieve bootstrap and the block bootstrap. By contrast, it has not been clear to date whether the LPB could be applied in the context of multivariate time series data. Here we attempt to fill this gap: we show how to implement the LPB in a multivariate context, and prove its validity for the sample mean and for spectral density estimators, the latter being a new result even in the univariate case. Furthermore, in the spirit of the times, we consider the possibility that the time series dimension is increasing with sample size, and identify conditions under which the multivariate linear process bootstrap (MLPB) maintains its asymptotic validity even in this case. The key here is to address the subject of consistently estimating the autocovariance sequence; this is a sequence of matrices that we conveniently stack into one huge matrix. We are then able to show consistency of an estimator based on the aforementioned flat-top tapers; most importantly, the consistency holds true even when the time series dimension is allowed to increase with the sample size.

The paper is organized as follows. In Section 2, we introduce the notation of this paper, discuss tapered covariance matrix estimation for multivariate stationary time series and state assumptions used throughout the paper; we then present our results on convergence with respect to operator norm of tapered covariance matrix estimators. The MLPB bootstrap algorithm and some remarks can be found in Section 3, and results concerned with validity of the MLPB for the sample mean and kernel spectral density estimates are summarized in Section 4. Asymptotic results established for the case of increasing time series dimension are stated in Section 5, where operator norm consistency of tapered covariance matrix estimates and a validity result for the sample mean are discussed. A finite-sample simulation study is presented in Section 6. Finally, all proofs are deferred to Section 7.

2. Preliminaries

Suppose we consider an $\mathbb{R}^d$-valued time series process $\{X_t,\ t \in \mathbb{Z}\}$ with $X_t = (X_{1,t}, \ldots, X_{d,t})^T$ and we have data $X_1, \ldots, X_n$ at hand. The process $\{X_t,\ t \in \mathbb{Z}\}$ is assumed to be strictly stationary and its $(d \times d)$ autocovariance matrix $C(h) = (C_{ij}(h))_{i,j=1,\ldots,d}$ at lag $h \in \mathbb{Z}$ is
$$C(h) = E\big((X_{t+h} - \mu)(X_t - \mu)^T\big), \qquad (2.1)$$
where $\mu = E(X_t)$ and the sample autocovariance $\widehat{C}(h) = (\widehat{C}_{ij}(h))_{i,j=1,\ldots,d}$ at lag $|h| < n$ is defined by
$$\widehat{C}(h) = \frac{1}{n} \sum_{t=\max(1,1-h)}^{\min(n,n-h)} (X_{t+h} - \bar{X})(X_t - \bar{X})^T, \qquad (2.2)$$


where $\bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t$ is the $d$-variate sample mean vector. Here and throughout the paper, all matrix-valued quantities are written as bold letters, all vector-valued quantities are underlined, $A^T$ indicates the transpose of a matrix $A$, $\bar{A}$ the complex conjugate of $A$ and $A^H = \bar{A}^T$ denotes the transposed conjugate of $A$. Note that it is also possible to use unbiased sample autocovariances, i.e., having $n - |h|$ instead of $n$ in the denominator of (2.2). Usually the biased version as defined in (2.2) is preferred because it guarantees a positive semi-definite estimated autocovariance function, but our tapered covariance matrix estimator discussed in Section 2.2 is adjusted in order to become positive definite in any case.
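For concreteness, the following minimal R sketch computes the biased sample autocovariance matrix $\widehat{C}(h)$ of (2.2) from a $(d \times n)$ data matrix; the function name sample_ccov and the storage convention (one column per time point) are our own illustrative choices, not part of the paper.

  # Biased sample autocovariance matrix C_hat(h) as in (2.2); X is a d x n data matrix
  sample_ccov <- function(X, h) {
    d <- nrow(X); n <- ncol(X)
    Xc <- X - rowMeans(X)                      # subtract the d-variate sample mean
    C <- matrix(0, d, d)
    for (t in max(1, 1 - h):min(n, n - h))     # summation range of (2.2)
      C <- C + tcrossprod(Xc[, t + h], Xc[, t])
    C / n                                      # biased version: divisor n, not n - |h|
  }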

Now, let $\underline{X} = \mathrm{vec}(\mathbf{X}) = (X_1, \ldots, X_{dn})^T$ be the $dn$-dimensional vectorized version of the $(d \times n)$ data matrix $\mathbf{X} = [X_1 : X_2 : \cdots : X_n]$ and denote the covariance matrix of $\underline{X}$, which is symmetric block Toeplitz, by $\Gamma_{dn}$, that is,
$$\Gamma_{dn} = \begin{pmatrix} C(0) & C(-1) & \cdots & C(-(n-1)) \\ C(1) & C(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & C(-1) \\ C(n-1) & \cdots & C(1) & C(0) \end{pmatrix} = \big(\Gamma_{dn}(i,j)\big)_{i,j=1,\ldots,dn}, \qquad (2.3)$$
where $\Gamma_{dn}(i,j) = \mathrm{Cov}(X_i, X_j)$ is the covariance between the $i$th and $j$th entry of $\underline{X}$. Note that the second-order stationarity of $\{X_t,\ t \in \mathbb{Z}\}$ does not imply second-order stationary behavior of the vectorized $dn$-dimensional data sequence $\underline{X}$. This means that the covariances $\Gamma_{dn}(i,j)$ truly depend on both $i$ and $j$ and not only on the difference $i - j$. However, the following one-to-one correspondence between $\{C_{ij}(h),\ h \in \mathbb{Z},\ i,j = 1,\ldots,d\}$ and $\{\Gamma_{dn}(i,j),\ i,j \in \mathbb{Z}\}$ holds true. Precisely, we have
$$\Gamma_{dn}(i,j) = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_{m_1(i),m_2(i)}, X_{m_1(j),m_2(j)}) = C_{m_1(i,j)}(m_2(i,j)), \qquad (2.4)$$
where $m_1(i,j) = (m_1(i), m_1(j))$ and $m_2(i,j) = m_2(i) - m_2(j)$ with $m_1(k) = (k-1) \bmod d + 1$ and $m_2(k) = \lceil k/d \rceil$, and $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x \in \mathbb{R}$.
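A minimal R sketch of the index maps $m_1$, $m_2$ and of assembling a $(dn \times dn)$ block Toeplitz matrix from lag-wise $(d \times d)$ matrices; build_Gamma and the list layout Clist are illustrative conventions of ours, not notation from the paper.

  m1 <- function(k, d) (k - 1) %% d + 1    # component index, as in (2.4)
  m2 <- function(k, d) ceiling(k / d)      # time index, as in (2.4)
  # Clist[[h + n]] is assumed to hold the (d x d) matrix at lag h, h = -(n-1), ..., n-1
  build_Gamma <- function(Clist, d, n) {
    G <- matrix(0, d * n, d * n)
    for (i in 1:(d * n)) for (j in 1:(d * n))
      G[i, j] <- Clist[[m2(i, d) - m2(j, d) + n]][m1(i, d), m1(j, d)]
    G
  }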

If one is interested in estimating the quantity $\Gamma_{dn}$, it seems natural to plug in the sample covariances $\widehat{C}(i-j)$ and $\widehat{\Gamma}_{dn}(i,j) = \widehat{C}_{m_1(i,j)}(m_2(i,j))$ in $\Gamma_{dn}$ and to use
$$\widehat{\Gamma}_{dn} = \big(\widehat{C}(i-j)\big)_{i,j=1,\ldots,n} = \big(\widehat{\Gamma}_{dn}(i,j)\big)_{i,j=1,\ldots,dn}.$$
Unfortunately, this is not a consistent estimator for $\Gamma_{dn}$ in the sense that the operator norm of $\widehat{\Gamma}_{dn} - \Gamma_{dn}$ does not converge to zero. This was shown by Wu and Pourahmadi (2009); to resolve this problem in the univariate case, they proposed a banded estimator of the sample covariance matrix to achieve consistency, which has been generalized by McMurry and Politis (2010), who considered general flat-top kernels as weight functions.

In Section 2.2, we follow McMurry and Politis (2010), propose a tapered estimator of $\Gamma_{dn}$ and show its consistency in Theorem 2.1 for the case of multivariate processes. Moreover, we state a modified estimator that is guaranteed to be positive definite for any finite sample size and show its consistency in Theorem 2.2, and that of related quantities in Corollary 2.1. Prior to that, we state the assumptions that are used throughout this paper.

2.1. Assumptions.

(A1) $\{X_t,\ t \in \mathbb{Z}\}$ is an $\mathbb{R}^d$-valued strictly stationary time series process with mean $E(X_t) = \mu$ and autocovariances $C(h)$ defined in (2.1) such that $\sum_{h=-\infty}^{\infty} |h|^g |C(h)|_1 < \infty$ for some $g \geq 0$ to be further specified, where $|A|_p = (\sum_{i,j} |a_{ij}|^p)^{1/p}$ for a matrix $A = (a_{ij})$.

(A2) There exists a constant $M < \infty$ such that for all $n \in \mathbb{N}$, all $h$ with $|h| < n$ and all $i,j = 1,\ldots,d$, we have
$$\Big\| \sum_{t=1}^{n} (X_{i,t+h} - \bar{X}_i)(X_{j,t} - \bar{X}_j) - n\,C_{ij}(h) \Big\|_2 \leq M \sqrt{n},$$
where $\|A\|_p = (E(|A|_p^p))^{1/p}$.

(A3) There exists an $n_0 \in \mathbb{N}$ large enough such that for all $n \geq n_0$ the eigenvalues $\lambda_1, \ldots, \lambda_{dn}$ of the $(dn \times dn)$ covariance matrix $\Gamma_{dn}$ are bounded uniformly away from zero.

(A4) Define the projection operator $P_k(X) = E(X \mid \mathcal{F}_k) - E(X \mid \mathcal{F}_{k-1})$ for $\mathcal{F}_k = \sigma(X_t,\ t \leq k)$ and suppose that for all $i = 1,\ldots,d$, we have $\sum_{m=0}^{\infty} \|P_0 X_{i,m}\|_q < \infty$ and $\|\bar{X}_i - \mu_i\|_q = O(\frac{1}{\sqrt{n}})$, respectively, for some $q \geq 2$ to be further specified.

(A5) For the sample mean, a CLT holds true. That is, we have
$$\sqrt{n}(\bar{X} - \mu) \overset{D}{\longrightarrow} \mathcal{N}(0, V),$$
where $\overset{D}{\longrightarrow}$ denotes convergence in distribution and $\mathcal{N}(0, V)$ is a normal distribution with zero mean vector and covariance matrix $V = \sum_{h=-\infty}^{\infty} C(h)$, with $V$ positive definite.

(A6) For kernel spectral density estimates $\hat{f}_{jk}(\omega)$ as defined in (4.2) in Section 4, a CLT holds true. That is, for arbitrary frequencies $0 \leq \omega_1, \ldots, \omega_s \leq \pi$, we have that
$$\sqrt{nb}\,\big(\hat{f}_{pq}(\omega_l) - f_{pq}(\omega_l) : p,q = 1,\ldots,d;\ l = 1,\ldots,s\big)$$
converges to an $sd^2$-dimensional normal distribution for $b \to 0$ and $nb \to \infty$ such that $nb^5 = O(1)$ as $n \to \infty$, where the limiting covariance matrix is obtained from
$$nb\,\mathrm{Cov}\big(\hat{f}_{pq}(\omega), \hat{f}_{rs}(\lambda)\big) = \big(f_{pr}(\omega)f_{qs}(\omega)\,\delta_{\omega,\lambda} + f_{ps}(\omega)f_{qr}(\omega)\,\tau_{0,\pi}\big)\,\frac{1}{2\pi}\int K^2(u)\,du + o(1)$$
and the limiting bias from
$$E\big(\hat{f}_{pq}(\omega)\big) - f_{pq}(\omega) = b^2 f''_{pq}(\omega)\,\frac{1}{4\pi}\int K(u)u^2\,du + o(b^2)$$
for all $p,q,r,s = 1,\ldots,d$, where $\delta_{\omega,\lambda} = 1$ if $\omega = \lambda$ and $\tau_{0,\pi} = 1$ if $\omega = \lambda \in \{0,\pi\}$, and both are zero otherwise. Here, $f(\omega)$ is assumed to be component-wise twice differentiable with Lipschitz-continuous second derivatives.

Assumption (A1) is quite standard, and the uniform convergence of sample autocovariances in (A2) is satisfied under different types of conditions [cf. Remark 2.1 below] and appears to be a crucial condition here. The uniform boundedness of all eigenvalues away from zero in (A3) is implied by a non-singular spectral density matrix $f$ of $\{X_t,\ t \in \mathbb{Z}\}$. This follows with (2.3) and the inversion formula from
$$c^T \Gamma_{dn} c = c^T \Big( \int_{-\pi}^{\pi} \bar{J}_\omega^T f(\omega) J_\omega\, d\omega \Big) c \geq 2\pi |c|_2^2 \inf_\omega \lambda_{\min}(f(\omega))$$
for all $c \in \mathbb{R}^{dn}$, where $J_\omega = (e^{-i1\omega}, \ldots, e^{-in\omega}) \otimes I_d$ and $\otimes$ denotes the Kronecker product.

The requirement of condition (A3) fits into the theory for the univariate autoregressive sieve bootstrap as obtained in Kreiss, Paparoditis and Politis (2011). Similarly, a non-singular spectral density matrix $f$ implies positive definiteness of the long-run variance $V = 2\pi f(0)$ defined in (A5). Assumption (A4) is for instance fulfilled if the underlying process is linear, or $\alpha$-mixing with summable mixing coefficients, by Ibragimov's inequality [cf. e.g. Davidson (1994), Theorem 14.2]. To achieve validity of the MLPB for the sample mean and for kernel spectral density estimates in Section 4, we have to assume unconditional CLTs in (A5) and (A6), which are satisfied also under certain mixing conditions [cf. Doukhan (1994), Brillinger (1981)], linearity [cf. Brockwell and Davis (1989), Hannan (1970)] or weak dependence [cf. Dedecker et al. (2007)]. Note also that the condition $nb^5 = O(1)$ includes the optimal bandwidth choice $nb^5 \to C^2$, $C > 0$, for second-order kernels, which leads to a non-vanishing bias in the limiting normal distribution.

Remark 2.1. Assumption (A2) is implied by different types of conditions imposed on the underlying process $\{X_t,\ t \in \mathbb{Z}\}$. We present sufficient conditions for (A2) under assumed linearity and under mixing- and weak dependence-type conditions. More precisely, (A2) is satisfied if the process $\{X_t,\ t \in \mathbb{Z}\}$ fulfills one of the following conditions:

(i) Linearity: Suppose the underlying process is linear, i.e. $X_t = \sum_{k=-\infty}^{\infty} B_k e_{t-k}$, $t \in \mathbb{Z}$, where $\{e_t,\ t \in \mathbb{Z}\}$ is an i.i.d. white noise with finite fourth moments $E(e_{i,t}e_{j,t}e_{k,t}e_{l,t}) < \infty$ for all $i,j,k,l = 1,\ldots,d$, and the sequence of $(d \times d)$ coefficient matrices $\{B_k,\ k \in \mathbb{Z}\}$ is component-wise absolutely summable.

(ii) Mixing-type condition: Let $\mathrm{cum}_{a_1,\ldots,a_k}(u_1,\ldots,u_{k-1}) = \mathrm{cum}(X_{a_1,u_1}, \ldots, X_{a_{k-1},u_{k-1}}, X_{a_k,0})$ denote the $k$th order joint cumulant of $X_{a_1,u_1}, \ldots, X_{a_{k-1},u_{k-1}}, X_{a_k,0}$ [cf. Brillinger (1981)] and suppose $\sum_{s,h=-\infty}^{\infty} |\mathrm{cum}_{i,j,i,j}(s+h, s, h)| < \infty$ for all $i,j = 1,\ldots,d$. Note that this is satisfied if $\{X_t,\ t \in \mathbb{Z}\}$ is $\alpha$-mixing such that $E(|X_1|_2^{4+\delta}) < \infty$ and $\sum_{j=1}^{\infty} j^2 \alpha(j)^{\delta/(4+\delta)} < \infty$ for some $\delta > 0$ [cf. Shao (2010), p. 221].

(iii) Weak dependence-type condition: Suppose for all $i,j = 1,\ldots,d$, we have
$$\big|\mathrm{Cov}\big((X_{i,t+h} - \mu_i)(X_{j,t} - \mu_j),\ (X_{i,t+s+h} - \mu_i)(X_{j,t+s} - \mu_j)\big)\big| \leq \mathrm{const} \cdot \nu_{s,h},$$
where $(\nu_{s,h},\ s,h \in \mathbb{Z})$ is an absolutely summable sequence in both arguments, that is, $\sum_{s,h=-\infty}^{\infty} |\nu_{s,h}| < \infty$ [cf. Dedecker et al. (2007)].

2.2. Tapered covariance matrix estimation of multiple time series data.
To adopt the technique of McMurry and Politis (2010), let
$$\kappa(x) = \begin{cases} 1, & |x| \leq 1, \\ g(|x|), & 1 < |x| \leq c_\kappa, \\ 0, & |x| > c_\kappa, \end{cases} \qquad (2.5)$$
be a so-called flat-top taper [cf. Politis (2001)], where $|g(x)| < 1$ and $c_\kappa \geq 1$. The $l$-scaled version of $\kappa(\cdot)$ is defined by $\kappa_l(x) = \kappa(\frac{x}{l})$. As Politis (2011) argues, it is advantageous to have a smooth taper $\kappa(x)$, so the truncated kernel that corresponds to $g(x) = 0$ for all $x$ is not recommended. The simplest example of a continuous taper function $\kappa$ with $c_\kappa > 1$ is the trapezoid
$$\kappa(x) = \begin{cases} 1, & |x| \leq 1, \\ 2 - |x|, & 1 < |x| \leq 2, \\ 0, & |x| > 2, \end{cases} \qquad (2.6)$$
which is used in Section 6 for the simulation study; the trapezoidal taper was first proposed by Politis and Romano (1995) in a spectral estimation setup. Observe also that the banding parameter $l > 0$ does not need to be an integer. The tapered estimator $\widehat{\Gamma}_{\kappa,l}$ of $\Gamma_{dn}$ is given by
$$\widehat{\Gamma}_{\kappa,l} = \big(\kappa_l(i-j)\,\widehat{C}(i-j)\big)_{i,j=1,\ldots,n} = \big(\widehat{\Gamma}_{\kappa,l}(i,j)\big)_{i,j=1,\ldots,dn}, \qquad (2.7)$$
where $\widehat{\Gamma}_{\kappa,l}(i,j) = \widehat{C}^{\kappa,l}_{m_1(i,j)}(m_2(i,j))$ and $\widehat{C}^{\kappa,l}_{i,j}(h) = \kappa_l(h)\,\widehat{C}_{i,j}(h)$.
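As an illustration, here is a minimal R sketch of the trapezoidal taper (2.6) and of the tapered estimator (2.7); it reuses the hypothetical helpers sample_ccov() and build_Gamma() sketched above and is not the authors' implementation.

  kappa_trap <- function(x) pmin(1, pmax(0, 2 - abs(x)))   # trapezoid (2.6), c_kappa = 2
  tapered_Gamma <- function(X, l) {
    d <- nrow(X); n <- ncol(X)
    Ctap <- lapply(-(n - 1):(n - 1),
                   function(h) kappa_trap(h / l) * sample_ccov(X, h))  # kappa_l(h) * C_hat(h)
    build_Gamma(Ctap, d, n)    # assemble the dn x dn block Toeplitz matrix of (2.7)
  }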

The following Theorem 2.1 deals with consistency of the tapered estimator $\widehat{\Gamma}_{\kappa,l}$ with respect to operator norm convergence. It extends Theorem 1 in McMurry and Politis (2010) to the multivariate case and does not rely on the concept of physical dependence only. The operator norm of a complex-valued $(d \times d)$ matrix $A$ is defined by
$$\rho(A) = \max_{x \in \mathbb{C}^d : |x|_2 = 1} |Ax|_2,$$
and it is well known that $\rho^2(A) = \lambda_{\max}(A^H A) = \lambda_{\max}(A A^H)$, where $\lambda_{\max}(B)$ denotes the largest eigenvalue of a matrix $B$ [cf. Horn and Johnson (1990), p. 296].

Theorem 2.1. Suppose that assumptions (A1) with $g = 0$ and (A2) are satisfied. Then, it holds
$$\big\|\rho(\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn})\big\|_2 \leq \frac{4Md^2(\lfloor c_\kappa l \rfloor + 1)}{\sqrt{n}} + 2\sum_{h=0}^{\lfloor c_\kappa l \rfloor} \frac{|h|}{n}|C(h)|_1 + 2\sum_{h=l+1}^{n-1} |C(h)|_1. \qquad (2.8)$$

The second term on the right-hand side of (2.8) can be written as $2\frac{\lfloor c_\kappa l \rfloor}{n} \sum_{h=0}^{\lfloor c_\kappa l \rfloor} \frac{|h|}{\lfloor c_\kappa l \rfloor}|C(h)|_1$; by the Kronecker lemma it vanishes asymptotically for any choice of $l$ and is of order $o(l/n)$. The third term converges to zero for $l = l(n) \to \infty$ as $n \to \infty$, and the leading first term does so for $l = o(\sqrt{n})$. Hence, for $1/l + l/\sqrt{n} = o(1)$ the right-hand side of (2.8) vanishes asymptotically. However, if $|C(h)|_1 = 0$ for $|h| > h_0$ for some $h_0 \in \mathbb{N}$, setting $l = h_0$ fixed suffices. In this case, the expression on the right-hand side of (2.8) is of the faster order $O(n^{-1/2})$.

As already pointed out by McMurry and Politis (2010), the tapered estimator $\widehat{\Gamma}_{\kappa,l}$ is not guaranteed to be positive semi-definite, let alone positive definite, for finite sample sizes. However, $\widehat{\Gamma}_{\kappa,l}$ is at least "asymptotically positive definite" under assumption (A3) and due to (2.8) if $1/l + l/\sqrt{n} = o(1)$ holds. In the following, we require a consistent estimator for $\Gamma_{dn}$ which is positive definite for all finite sample sizes, so that we can compute its Cholesky decomposition for the linear process bootstrap scheme that will be introduced in Section 3 below.

To obtain an estimator of $\Gamma_{dn}$ related to $\widehat{\Gamma}_{\kappa,l}$ that is assured to be positive definite for all sample sizes, we construct a modified estimator $\widehat{\Gamma}^\epsilon_{\kappa,l}$ in the following. Let $V = \mathrm{diag}(\widehat{\Gamma}_{dn})$ be the diagonal matrix of sample variances and define $\widehat{R}_{\kappa,l} = V^{-1/2}\widehat{\Gamma}_{\kappa,l}V^{-1/2}$. Now, we consider the spectral factorization $\widehat{R}_{\kappa,l} = S D S^T$, where $S$ is a $(dn \times dn)$ orthogonal matrix and $D = \mathrm{diag}(r_1, \ldots, r_{dn})$ is the diagonal matrix containing the eigenvalues of $\widehat{R}_{\kappa,l}$ such that $r_1 \geq r_2 \geq \cdots \geq r_{dn}$. It is worth noting that this factorization always exists due to the symmetry of $\widehat{R}_{\kappa,l}$, but that the eigenvalues can be positive, zero or even negative. Now, define
$$\widehat{\Gamma}^\epsilon_{\kappa,l} = V^{1/2}\widehat{R}^\epsilon_{\kappa,l}V^{1/2} = V^{1/2} S D^\epsilon S^T V^{1/2}, \qquad (2.9)$$
where $D^\epsilon = \mathrm{diag}(r^\epsilon_1, \ldots, r^\epsilon_{dn})$ and $r^\epsilon_i = \max(r_i, \epsilon n^{-\beta})$. Here, $\beta > 1/2$ and $\epsilon > 0$ are user-defined constants that ensure the positive definiteness of $\widehat{\Gamma}^\epsilon_{\kappa,l}$. Contrary to the univariate case discussed in McMurry and Politis (2010), we propose to adjust the eigenvalues of the (equivariant) correlation matrix $\widehat{R}_{\kappa,l}$ instead of $\widehat{\Gamma}_{\kappa,l}$, which dispenses with a scaling factor in the definition of $r^\epsilon_i$. Further, note that setting $\epsilon = 0$ leads only to a positive semi-definite estimate if $\widehat{\Gamma}_{\kappa,l}$ is indefinite, which does not suffice for computing the Cholesky decomposition, and also that $\widehat{\Gamma}^\epsilon_{\kappa,l}$ generally loses the banded shape of $\widehat{\Gamma}_{\kappa,l}$. Theorem 2.2 below, which extends Theorem 3 in McMurry and Politis (2010), shows that the modification of the eigenvalues affects the convergence results obtained in Theorem 2.1 only slightly.
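A minimal R sketch of the eigenvalue correction (2.9); adjust_pd is our own illustrative name, and the defaults eps = 1 and beta = 1 are the choices quoted later in Section 6, not a recommendation of ours.

  adjust_pd <- function(Gamma_tap, n, eps = 1, beta = 1) {
    s <- sqrt(diag(Gamma_tap))                  # sample standard deviations, i.e. V^{1/2}
    R <- Gamma_tap / (s %o% s)                  # tapered correlation matrix R_hat
    ed <- eigen(R, symmetric = TRUE)            # spectral factorization R = S D S^T
    r_eps <- pmax(ed$values, eps * n^(-beta))   # r_i^eps = max(r_i, eps * n^(-beta))
    (s %o% s) * (ed$vectors %*% diag(r_eps) %*% t(ed$vectors))   # V^{1/2} R^eps V^{1/2}
  }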

Theorem 2.2. Under the assumptions of Theorem 2.1, it holds
$$\big\|\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \Gamma_{dn})\big\|_2 \leq \frac{8Md^2(\lfloor c_\kappa l \rfloor + 1)}{\sqrt{n}} + 4\sum_{h=0}^{\lfloor c_\kappa l \rfloor}\frac{|h|}{n}|C(h)|_1 + 4\sum_{h=l+1}^{n-1}|C(h)|_1 + \epsilon \max_i C_{ii}(0)\,n^{-\beta} + O\Big(\frac{1}{n^{1/2+\beta}}\Big). \qquad (2.10)$$

In comparison to the upper bound established in Theorem 2.1, two more terms appear on the right-hand side of (2.10), which also converge to zero as $n$ tends to infinity. Note that the first three summands, which the right-hand sides of (2.8) and (2.10) have in common, remain the leading terms if $\beta > \frac{1}{2}$.

We also need convergence and boundedness in operator norm of quantities related to $\widehat{\Gamma}^\epsilon_{\kappa,l}$. The required results are summarized in the following corollary.

Corollary 2.1. Under assumptions (A1) with $g = 0$, (A2) and (A3), we have

(i) $\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \Gamma_{dn})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1} - \Gamma_{dn}^{-1})$ are terms of order $O_P(r_{l,n})$, where
$$r_{l,n} = \frac{l}{\sqrt{n}} + \sum_{h=l+1}^{\infty} |C(h)|_1, \qquad (2.11)$$
and $r_{l,n} = o(1)$ if $1/l + l/\sqrt{n} = o(1)$.

(ii) $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2})$ are of order $O_P(\log_2(n)\,r_{l,n})$, and $\log_2(n)\,r_{l,n} = o(1)$ if $1/l + \log_2(n)\,l/\sqrt{n} = o(1)$ and (A1) holds for some $g > 0$.

(iii) $\rho(\Gamma_{dn})$, $\rho(\Gamma_{dn}^{-1})$, $\rho(\Gamma_{dn}^{-1/2})$, $\rho(\Gamma_{dn}^{1/2})$ are bounded from above and below. $\rho(\widehat{\Gamma}^\epsilon_{\kappa,l})$, $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1/2})$, $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2})$ are bounded from above and below (in probability) if $r_{l,n} = o(1)$ and $\log_2(n)\,r_{l,n} = o(1)$, respectively.

Remark 2.2. In Section 2.2, we propose to use a global banding parameter $l$ that down-weights the autocovariance matrices for increasing lag, i.e. the entire matrix $\widehat{C}(h)$ is multiplied with the same $\kappa_l(h)$ in (2.7). However, it is possible to use individual banding parameters $l_{pq}$ for each sequence of entries $\{\widehat{C}_{pq}(h),\ h \in \mathbb{Z}\}$, $p,q = 1,\ldots,d$, as proposed in Politis (2011).

2.3. Selection of tuning parameters.

To get a tapered estimate $\widehat{\Gamma}_{\kappa,l}$ of the covariance matrix $\Gamma_{dn}$, some parameters have to be chosen by the practitioner. These are the flat-top taper $\kappa$ and the banding parameter $l$, which are both responsible for the down-weighting of the empirical autocovariances $\widehat{C}(h)$ with increasing lag $h$. To select a suitable taper $\kappa$ from the class of functions (2.5), we have to select $c_\kappa \geq 1$ and the function $g$, which determine the range of the decay of $\kappa$ to zero for $|x| > 1$ and its form over this range, respectively. For some examples of flat-top tapers, compare Politis (2003b, 2011). However, the selection of the banding parameter $l$ appears to be more crucial than choosing the tapering function $\kappa$ among the family of well-behaved flat-top kernels as discussed in Politis (2011). This is comparable to nonparametric kernel estimation, where usually the bandwidth plays a more important role than the shape of the kernel.

We focus on providing an empirical rule for banding parameter selection that has already been used in McMurry and Politis (2010) for the univariate LPB and which has been generalized to the multivariate case in Politis (2011). In the following, we adopt this technique based on the correlogram/cross-correlogram [cf. Politis (2011, Section 6)] for our purposes. Let
$$\widehat{R}_{jk}(h) = \frac{\widehat{C}_{jk}(h)}{\sqrt{\widehat{C}_{jj}(0)\widehat{C}_{kk}(0)}}, \qquad j,k = 1,\ldots,d, \qquad (2.12)$$
be the sample (cross-)correlation between the two univariate time series $(X_{j,t},\ t \in \mathbb{Z})$ and $(X_{k,t},\ t \in \mathbb{Z})$ at lag $h \in \mathbb{Z}$. Now, define $\hat{q}_{jk}$ as the smallest nonnegative integer such that
$$|\widehat{R}_{jk}(\hat{q}_{jk} + h)| < M_0 \sqrt{\log_{10}(n)/n}$$
for $h = 1,\ldots,K_n$, where $M_0 > 0$ is a fixed constant, and $K_n$ is a positive, nondecreasing integer-valued function of $n$ such that $K_n = o(\log(n))$. Note that the constant $M_0$ and the form of $K_n$ are the practitioner's choice. As a rule of thumb, we refer to Politis (2003b, 2011), who makes the concrete recommendation $M_0 \simeq 2$ and $K_n = \max(5, \sqrt{\log_{10}(n)})$. After having computed $\hat{q}_{jk}$ for all $j,k = 1,\ldots,d$, we take
$$\hat{l} = \max_{j,k=1,\ldots,d} \hat{q}_{jk} \qquad (2.13)$$
as a data-driven global choice of the banding parameter $l$. By setting $\hat{l}_{jk} = \hat{q}_{jk}$, we get data-driven individual banding parameter choices as discussed in Remark 2.2. For theoretical justification of this empirical selection of a global cut-off point as the maximum over individual choices, and for assumptions that lead to successful adaptation, we refer to Theorem 6.1 in Politis (2011).

Note also that for positive definite covariance matrix estimation, i.e. for computing $\widehat{\Gamma}^\epsilon_{\kappa,l}$, one has to select two more parameters $\epsilon$ and $\beta$, which have to be nonnegative and might be set equal to one as suggested in McMurry and Politis (2010).
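The following minimal R sketch implements the empirical rule (2.12)-(2.13) under the quoted recommendations $M_0 = 2$ and $K_n = \max(5, \sqrt{\log_{10} n})$ (rounded up to an integer); select_l and its internals are illustrative and reuse the hypothetical sample_ccov() from above.

  select_l <- function(X, M0 = 2) {
    d <- nrow(X); n <- ncol(X)
    Kn <- max(5, ceiling(sqrt(log10(n))))
    thresh <- M0 * sqrt(log10(n) / n)
    Chat <- lapply(0:(n - 1), function(h) sample_ccov(X, h))   # C_hat(h), h = 0, ..., n-1
    q <- matrix(NA, d, d)
    for (j in 1:d) for (k in 1:d) {
      R <- sapply(Chat, function(C) C[j, k]) /
           sqrt(Chat[[1]][j, j] * Chat[[1]][k, k])              # cross-correlogram (2.12)
      for (qq in 0:(n - 1 - Kn))                                # smallest q with |R(q+h)| below threshold
        if (all(abs(R[qq + 1 + (1:Kn)]) < thresh)) { q[j, k] <- qq; break }
    }
    list(global = max(q), individual = q)                       # (2.13) and the individual choices
  }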

3. The multivariate linear process bootstrap procedure

In this section, we describe the multivariate linear process bootstrap (MLPB) in detail, discuss some modifications and comment on the special case where the tapered covariance estimator becomes diagonal.

Step 1. Let $\mathbf{X}$ be the $(d \times n)$ data matrix consisting of $\mathbb{R}^d$-valued time series data $X_1, \ldots, X_n$ of sample size $n$. Compute the centered observations $Y_t = X_t - \bar{X}$, where $\bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t$, let $\mathbf{Y}$ be the corresponding $(d \times n)$ matrix of centered observations and define $\underline{Y} = \mathrm{vec}(\mathbf{Y})$ to be the $dn$-dimensional vectorized version of $\mathbf{Y}$.

Step 2. Compute $\underline{W} = (\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1/2}\underline{Y}$, where $(\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2}$ denotes the lower left triangular matrix $\mathbf{L}$ of the Cholesky decomposition $\widehat{\Gamma}^\epsilon_{\kappa,l} = \mathbf{L}\mathbf{L}^T$.

Step 3. Let $\underline{Z}$ be the standardized version of $\underline{W}$, that is, $Z_i = \frac{W_i - \bar{W}}{\sigma_W}$, $i = 1,\ldots,dn$, where $\bar{W} = \frac{1}{dn}\sum_{t=1}^{dn} W_t$ and $\sigma^2_W = \frac{1}{dn}\sum_{t=1}^{dn}(W_t - \bar{W})^2$.

Step 4. Generate $\underline{Z}^* = (Z^*_1, \ldots, Z^*_{dn})^T$ by i.i.d. resampling from $\{Z_1, \ldots, Z_{dn}\}$.

Step 5. Compute $\underline{Y}^* = (\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2}\underline{Z}^*$, let $\mathbf{Y}^*$ be the matrix obtained from $\underline{Y}^*$ by putting this vector column-wise into a $(d \times n)$ matrix, and denote its columns by $Y^*_1, \ldots, Y^*_n$.

Regarding Steps 3 and 4 above, and due to the multivariate nature of the data, it appears even more natural to split the $dn$-dimensional vector $\underline{Z}$ in Step 3 into $n$ sub-vectors, to center and standardize them, and to apply i.i.d. resampling to these vectors to get $\underline{Z}^*$. More precisely, Steps 3 and 4 can be replaced by

Step 3'. Let $\underline{Z} = (Z_1^T, \ldots, Z_n^T)^T$ be the standardized version of $\underline{W} = (W_1^T, \ldots, W_n^T)^T$, that is, $Z_i = \widehat{\Sigma}_W^{-1/2}(W_i - \bar{W})$, where $\bar{W} = \frac{1}{n}\sum_{t=1}^{n} W_t$ and $\widehat{\Sigma}_W = \frac{1}{n}\sum_{t=1}^{n} (W_t - \bar{W})(W_t - \bar{W})^T$.

Step 4'. Generate $\underline{Z}^* = (Z^{*T}_1, \ldots, Z^{*T}_n)^T$ by i.i.d. resampling from $\{Z_1, \ldots, Z_n\}$.

This might preserve more higher-order features of the data that are not captured by $\widehat{\Gamma}^\epsilon_{\kappa,l}$. However, comparative simulations (not reported in the paper) indicate that the finite-sample performance is only slightly affected by this sub-vector resampling.
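To make the algorithm concrete, here is a minimal R sketch of Steps 1-5 with the simpler scalar standardization of Steps 3 and 4 (the sub-vector variant of Steps 3' and 4' is analogous). It reuses the hypothetical helpers tapered_Gamma() and adjust_pd() sketched in Section 2 and is not the authors' code from function_MLPB.R.

  mlpb_sample <- function(X, l, eps = 1, beta = 1) {
    d <- nrow(X); n <- ncol(X)
    Y <- as.vector(X - rowMeans(X))               # Step 1: center and vectorize (columns stacked)
    G <- adjust_pd(tapered_Gamma(X, l), n, eps, beta)
    L <- t(chol(G))                               # lower triangular Cholesky factor of G
    W <- solve(L, Y)                              # Step 2: whiten the data
    Z <- (W - mean(W)) / sd(W)                    # Step 3: center and standardize
    Zstar <- sample(Z, d * n, replace = TRUE)     # Step 4: i.i.d. resampling
    matrix(L %*% Zstar, nrow = d, ncol = n)       # Step 5: re-color and reshape to d x n
  }

Repeating mlpb_sample() B times and recomputing the statistic of interest on each pseudo-sample yields the bootstrap approximation used in Sections 4 and 6; note that sd() uses the divisor dn - 1 rather than dn, an immaterial difference for a sketch.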

Remark 3.1. If $0 < l < \frac{1}{c_\kappa}$, the banded covariance matrix estimator $\widehat{\Gamma}_{\kappa,l}$ (and $\widehat{\Gamma}^\epsilon_{\kappa,l}$ as well) becomes block-diagonal. In this case, and if Steps 3' and 4' are used, the LPB as described above is equivalent to the classical i.i.d. bootstrap. Here, note the similarity to the autoregressive sieve bootstrap, which boils down to an i.i.d. bootstrap if the autoregressive order is $p = 0$.

4. Bootstrap consistency for fixed time series dimension

4.1. Sample mean.
In this section, we establish validity of the MLPB for the sample mean. The following theorem generalizes Theorem 5 of McMurry and Politis (2010) to the multivariate case under somewhat more general conditions.

Theorem 4.1. Under assumptions (A1) for some $g > 0$, (A2), (A3), (A4) for $q = 4$, (A5) and $1/l + \log_2(n)\,l/\sqrt{n} = o(1)$, the MLPB is asymptotically valid for the sample mean $\bar{X}$, that is,
$$\sup_{x \in \mathbb{R}^d} \Big| P\big\{\sqrt{n}(\bar{X} - \mu) \leq x\big\} - P^*\big\{\sqrt{n}\,\bar{Y}^* \leq x\big\} \Big| = o_P(1)$$
and $\mathrm{Var}^*\big(\sqrt{n}\,\bar{Y}^*\big) = \sum_{h=-\infty}^{\infty} C(h) + o_P(1)$, where $\bar{Y}^* = \frac{1}{n}\sum_{t=1}^{n} Y^*_t$. The shorthand $x \leq y$ for $x, y \in \mathbb{R}^d$ is used to denote $x_i \leq y_i$ for all $i = 1,\ldots,d$.

4.2. Kernel spectral density estimates.
Here we prove consistency of the MLPB for kernel spectral density matrix estimators; this result is novel even in the univariate case. Let $I_n(\omega) = J_n(\omega)J_n^H(\omega)$ be the periodogram matrix, where
$$J_n(\omega) = \frac{1}{\sqrt{2\pi n}} \sum_{t=1}^{n} Y_t e^{-it\omega} \qquad (4.1)$$
is the discrete Fourier transform (DFT) of $Y_1, \ldots, Y_n$, $Y_t = X_t - \bar{X}$. We define the estimator
$$\hat{f}(\omega) = \frac{1}{n} \sum_{k=-\lfloor (n-1)/2 \rfloor}^{\lfloor n/2 \rfloor} K_b(\omega - \omega_k)\, I_n(\omega_k) \qquad (4.2)$$
for the spectral density matrix $f(\omega)$, where $\lfloor x \rfloor$ is the integer part of $x \in \mathbb{R}$, $\omega_k = 2\pi\frac{k}{n}$, $k = -\lfloor (n-1)/2 \rfloor, \ldots, \lfloor n/2 \rfloor$, are the Fourier frequencies, $b$ is the bandwidth and $K$ is a symmetric and square integrable kernel function $K(\cdot)$ that satisfies $\int K(x)\,dx = 2\pi$ and $\int K(u)u^2\,du < \infty$; we set $K_b(\cdot) = \frac{1}{b}K(\frac{\cdot}{b})$. Let $I^*_n(\omega)$ be the bootstrap analogue of $I_n(\omega)$ based on $Y^*_1, \ldots, Y^*_n$ generated from the MLPB scheme and let $\hat{f}^*(\omega)$ be the bootstrap analogue of $\hat{f}(\omega)$.
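A minimal R sketch of (4.1)-(4.2) at a single frequency; the box kernel $K(u) = \pi\,\mathbf{1}(|u| \leq 1)$ used here is our own choice (it satisfies $\int K = 2\pi$), not the kernel used by the authors.

  spec_est <- function(X, omega, b) {
    d <- nrow(X); n <- ncol(X)
    Y <- X - rowMeans(X)
    freqs <- 2 * pi * (-floor((n - 1) / 2):floor(n / 2)) / n     # Fourier frequencies omega_k
    f <- matrix(0 + 0i, d, d)
    for (k in seq_along(freqs)) {
      J <- (Y %*% exp(-1i * freqs[k] * (1:n))) / sqrt(2 * pi * n)  # DFT J_n(omega_k), cf. (4.1)
      Kb <- (pi / b) * (abs((omega - freqs[k]) / b) <= 1)          # K_b(omega - omega_k)
      f <- f + Kb * (J %*% Conj(t(J)))                             # smoothed periodogram, cf. (4.2)
    }
    f / n
  }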

Theorem 4.2. Suppose assumptions (A1) with $g \geq 0$ specified below, (A2), (A3), (A4) for $q = 8$ and (A6) are satisfied. If $b \to 0$ and $nb \to \infty$ such that $nb^5 = O(1)$, as well as $1/l + \sqrt{b}\,l\log_2(n) + \sqrt{nb}\,\log_2(n)/l^g = o(1)$ and $1/k + bk^4 + \sqrt{nb}\,\log_2(n)/k^g = o(1)$ for some sequence $k = k(n)$, the MLPB is asymptotically valid for kernel spectral density estimates $\hat{f}(\omega)$. That is, for all $s \in \mathbb{N}$ and arbitrary frequencies $0 \leq \omega_1, \ldots, \omega_s \leq \pi$ (not necessarily Fourier frequencies), it holds
$$\sup_{x \in \mathbb{R}^{d^2 s}} \Big| P\big\{\big(\sqrt{nb}(\hat{f}_{pq}(\omega_l) - f_{pq}(\omega_l)) : p,q = 1,\ldots,d;\ l = 1,\ldots,s\big) \leq x\big\} - P^*\big\{\big(\sqrt{nb}(\hat{f}^*_{pq}(\omega_l) - \tilde{f}_{pq}(\omega_l)) : p,q = 1,\ldots,d;\ l = 1,\ldots,s\big) \leq x\big\} \Big| = o_P(1),$$
where $\tilde{f}_{pq}(\omega) = \frac{1}{2\pi}\sum_{h=-(n-1)}^{n-1} \kappa_l(h)\widehat{C}_{pq}(h)e^{-ih\omega}$ and, in particular,
$$nb\,\mathrm{Cov}^*\big(\hat{f}^*_{pq}(\omega), \hat{f}^*_{rs}(\lambda)\big) = \big(f_{pr}(\omega)f_{qs}(\omega)\,\delta_{\omega,\lambda} + f_{ps}(\omega)f_{qr}(\omega)\,\tau_{0,\pi}\big)\,\frac{1}{2\pi}\int K^2(u)\,du + o_P(1)$$
and $E^*(\hat{f}^*_{pq}(\omega)) - \tilde{f}_{pq}(\omega) = b^2 f''_{pq}(\omega)\,\frac{1}{4\pi}\int K(u)u^2\,du + o_P(b^2)$, for all $p,q,r,s = 1,\ldots,d$ and all $\omega, \lambda \in [0, \pi]$, respectively.

4.3. Other statistics and LPB-of-blocks bootstrap.
For statistics $T_n$ contained in the broad class of functions of generalized means, Jentsch and Politis (2013) discussed how, by using a preliminary blocking scheme tailor-made for a specific statistic of interest, the MLPB can be shown to be consistent. This class of statistics contains estimates $T_n$ of $w(\vartheta)$ with $\vartheta = E(g(X_t, \ldots, X_{t+m-1}))$ such that
$$T_n = w\Big\{\frac{1}{n-m+1}\sum_{t=1}^{n-m+1} g(X_t, \ldots, X_{t+m-1})\Big\},$$
for some sufficiently smooth functions $g : \mathbb{R}^{d \times m} \to \mathbb{R}^k$, $w : \mathbb{R}^k \to \mathbb{R}$ and fixed $m \in \mathbb{N}$. They propose to block the data first according to the known function $g$ and then to apply the (M)LPB to the blocked data. More precisely, the multivariate LPB-of-blocks bootstrap is as follows:

Step 1. Define $\tilde{X}_t := g(X_t, \ldots, X_{t+m-1})$ and let $\tilde{X}_1, \ldots, \tilde{X}_{n-m+1}$ be the set of blocked data.

Step 2. Apply the MLPB scheme of Section 3 to the $k$-dimensional blocked data $\tilde{X}_1, \ldots, \tilde{X}_{n-m+1}$ to get bootstrap observations $\tilde{X}^*_1, \ldots, \tilde{X}^*_{n-m+1}$.

Step 3. Compute $T^*_n = w\{(n-m+1)^{-1}\sum_{t=1}^{n-m+1} \tilde{X}^*_t\}$.

Step 4. Repeat Steps 2 and 3 $B$ times, where $B$ is large, and approximate the unknown distribution of $\sqrt{n}\{T_n - w(\vartheta)\}$ by the empirical distribution of $\sqrt{n}\{T^*_{n,1} - T_n\}, \ldots, \sqrt{n}\{T^*_{n,B} - T_n\}$.

The validity of the multivariate LPB-of-blocks bootstrap for some statistic $T_n$ can be verified by checking the assumptions of Theorem 4.1 for the sample mean of the new process $\{\tilde{X}_t,\ t \in \mathbb{Z}\}$.
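A minimal R sketch of this blocking device; the placeholders g and w, the assumption that g returns a vector of length k of at least two, and the way the sample mean of the blocked data is re-attached to the centered MLPB output are our own illustrative choices, not prescribed by the paper.

  lpb_of_blocks <- function(X, g, w, m, l, B = 500) {
    n <- ncol(X)
    Xblk <- sapply(1:(n - m + 1), function(t) g(X[, t:(t + m - 1), drop = FALSE]))  # Step 1
    Tn <- w(rowMeans(Xblk))
    Tstar <- replicate(B, {                                  # Steps 2 and 3, repeated B times
      Ystar <- mlpb_sample(Xblk, l)                          # MLPB on the k x (n-m+1) blocked data
      w(rowMeans(Ystar) + rowMeans(Xblk))                    # one reading: add back the removed mean
    })
    sqrt(n) * (Tstar - Tn)                                   # Step 4: approximates sqrt(n)(Tn - w(theta))
  }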

5. Asymptotic results for increasing time series dimension

In this section, we consider the case when the time series dimension $d$ is allowed to increase with the sample size $n$, i.e. $d = d(n) \to \infty$ as $n \to \infty$. In particular, we show consistency of tapered covariance matrix estimates and derive rates that allow for an asymptotic validity result of the MLPB for the sample mean in this case.

The recent paper by Cai, Ren and Zhou (2013) gives a thorough discussion of the estimation of Toeplitz covariance matrices for univariate time series. In their setup, which also covers the possibility of having multiple datasets from the same data generating process, Cai, Ren and Zhou (2013) establish the optimal rates of convergence using the two simple flat-top kernels discussed in Section 2.2, namely the truncated kernel (i.e., the case of pure banding with no tapering) and the trapezoid taper. When the strength of dependence is quantified via a smoothness condition on the spectral density, they show that the trapezoid is superior to the truncated taper, thus confirming the intuitive recommendations of Politis (2011). The asymptotic theory of Cai, Ren and Zhou (2013) allows for an increasing number of time series and increasing sample size, but their framework does not cover the multivariate time series case, neither for fixed nor for increasing time series dimension, which will be discussed in this section.

Note that Theorem 1 in McMurry and Politis (2010) for the univariate case, as well as our Theorem 2.1 for the multivariate case of fixed time series dimension, give upper bounds that are quite sharp, coming within a log-term of the (Gaussian) optimal rate found in Theorem 2 of Cai, Ren and Zhou (2013).

Instead of assumptions (A1)-(A5), which were introduced in Section 2.1 and used in Theorem 4.1 to obtain bootstrap consistency for the sample mean for fixed dimension $d$, we impose the following conditions on the sequence of time series processes $(\{X^{(n)}_t,\ t \in \mathbb{Z}\})_{n \in \mathbb{N}}$ of now increasing dimension.

5.1. Assumptions.

(A1') $(\{X_t = (X_{1,t}, \ldots, X_{d(n),t})^T,\ t \in \mathbb{Z}\})_{n \in \mathbb{N}}$ is a sequence of $\mathbb{R}^{d(n)}$-valued strictly stationary time series processes with mean vectors $E(X_t) = \mu = (\mu_1, \ldots, \mu_{d(n)})^T$ and autocovariances $C(h) = (C_{ij}(h))_{i,j=1,\ldots,d(n)}$ defined as in (2.1). Here, $(d(n))_{n \in \mathbb{N}}$ is a non-decreasing sequence of positive integers such that $d(n) \to \infty$ as $n \to \infty$ and, further, suppose
$$\sum_{h=-\infty}^{\infty} \Big\{ \sup_{n \in \mathbb{N}} \sup_{i,j=1,\ldots,d(n)} |h|^g |C_{ij}(h)| \Big\} < \infty$$
for some $g \geq 0$ to be further specified.

(A2') There exists a constant $M' < \infty$ such that for all $n \in \mathbb{N}$ and all $h$ with $|h| < n$, we have
$$\sup_{i,j=1,\ldots,d(n)} \Big\| \sum_{t=1}^{n} (X_{i,t+h} - \bar{X}_i)(X_{j,t} - \bar{X}_j) - n\,C_{ij}(h) \Big\|_2 \leq M'\sqrt{n}.$$

(A3') There exists an $n_0 \in \mathbb{N}$ large enough such that for all $n \geq n_0$ and all $d \geq d_0 = d(n_0)$ the eigenvalues $\lambda_1, \ldots, \lambda_{dn}$ of the $(dn \times dn)$ covariance matrix $\Gamma_{dn}$ are bounded uniformly away from zero and from above.

(A4') Define the sequence of projection operators $P^{(n)}_k(X) = E(X \mid \mathcal{F}^{(n)}_k) - E(X \mid \mathcal{F}^{(n)}_{k-1})$ for $\mathcal{F}^{(n)}_k = \sigma(X_t,\ t \leq k)$ and suppose
$$\sum_{m=0}^{\infty} \Big\{ \sup_{n \in \mathbb{N}} \sup_{i=1,\ldots,d(n)} \|P^{(n)}_0 X_{i,m}\|_4 \Big\} < \infty \quad\text{and}\quad \sup_{n \in \mathbb{N}} \sup_{i=1,\ldots,d(n)} \|\bar{X}_i - \mu_i\|_4 = O(n^{-1/2}).$$

(A5') For the sample mean, a Cramér-Wold-type CLT holds true. That is, for any real-valued sequence $b = b(d(n))$ of $d(n)$-dimensional vectors with $0 < M_1 \leq |b(d(n))|_2^2 \leq M_2 < \infty$ for all $n \in \mathbb{N}$ and $v^2 = v^2_{d(n)} = \mathrm{Var}\big(\sqrt{n}\,b^T(\bar{X} - \mu)\big)$, we have
$$\sqrt{n}\,b^T(\bar{X} - \mu)/v \overset{D}{\longrightarrow} \mathcal{N}(0, 1).$$

The assumptions (A1')-(A4') are uniform analogues of (A1)-(A4), which are required here to tackle the increasing time series dimension $d$. In particular, (A1') implies
$$\sum_{h=-\infty}^{\infty} |C(h)|_1 = O(d^2). \qquad (5.1)$$
Observe also that the autocovariances $C_{ij}(h)$ are assumed to decay with increasing lag $h$, i.e. in the time direction, but they are not assumed to decay with increasing $|i - j|$, i.e., with respect to increasing time series dimension. Therefore, we have to make use of square summable sequences in (A5') to get a CLT result. This technique has been used e.g. by Lewis and Reinsel (1985) and Goncalves and Kilian (2007) to establish central limit results for the estimation of an increasing number of autoregressive coefficients.

5.2. Operator norm convergence for increasing time series dimension.
The following theorem generalizes the results of Theorems 2.1 and 2.2 and of Corollary 2.1 to the case where $d = d(n)$ is allowed to increase with the sample size. In contrast to the case of a stationary spatial process on the plane $\mathbb{Z}^2$ (where a data matrix is observed that grows in both directions asymptotically, as in our setting), we do not assume that the autocovariance matrix decays in all directions. Therefore, to be able to establish a meaningful theory, we have to replace (A1)-(A5) by the uniform analogues (A1')-(A5') and, due to (5.1), an additional factor $d^2$ turns up in the convergence rate and has to be taken into account.

Theorem 5.1. Under assumptions (A1') with $g \geq 0$ specified below, (A2') and (A3'), we have

(i) $\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \Gamma_{dn})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1} - \Gamma_{dn}^{-1})$ are terms of order $O_P(d^2 r_{l,n})$, where
$$r_{l,n} = \frac{l}{\sqrt{n}} + \sum_{h=l+1}^{\infty} \Big\{ \sup_{n \in \mathbb{N}} \sup_{i,j=1,\ldots,d(n)} |C_{ij}(h)| \Big\}, \qquad (5.2)$$
and $d^2 r_{l,n} = o(1)$ if $1/l + d^2 l/\sqrt{n} + d^2/l^g = o(1)$.

(ii) $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2})$ are both of order $O_P(\log_2(dn)\,d^2 r_{l,n})$, and $\log_2(dn)\,d^2 r_{l,n} = o(1)$ if $1/l + \log_2(dn)\,d^2 l/\sqrt{n} + \log_2(dn)\,d^2/l^g = o(1)$.

(iii) $\rho(\Gamma_{dn})$, $\rho(\Gamma_{dn}^{-1})$, $\rho(\Gamma_{dn}^{-1/2})$ and $\rho(\Gamma_{dn}^{1/2})$ are bounded from above and below. $\rho(\widehat{\Gamma}^\epsilon_{\kappa,l})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1})$ as well as $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{-1/2})$ and $\rho((\widehat{\Gamma}^\epsilon_{\kappa,l})^{1/2})$ are bounded from above and below in probability if $d^2 r_{l,n} = o(1)$ and $\log_2(dn)\,d^2 r_{l,n} = o(1)$, respectively.

The required rates for the banding parameter $l$ and the time series dimension $d$ to get operator norm consistency $\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \Gamma_{dn}) = o_P(1)$ can be interpreted nicely. If $g$ is chosen to be large enough, $d^2 l/\sqrt{n}$ becomes the leading term and there is a trade-off between capturing more dependence of the time series in the time direction (large $l$) and the growing dimension of the time series in the cross-sectional direction (large $d$).

5.3. Bootstrap validity for increasing time series dimension.
The subsequent theorem is a Cramér-Wold-type generalization of Theorem 4.1 to the case where $d = d(n)$ is allowed to grow at an appropriate rate with the sample size. To tackle the increasing time series dimension and to prove such a CLT result, we have to make use of appropriate sequences of square summable vectors $b = b(d(n))$ as described in (A5') above.

Theorem 5.2. Under assumptions (A1') with $g \geq 0$ specified below, (A2'), (A3'), (A4') for $q = 4$, (A5'), as well as $1/l + \log_2(dn)\,d^2 l/\sqrt{n} + \log_2(dn)\,d^2/l^g = o(1)$ and $1/k + d^5 k^4/n + \log_2(dn)\,d^2/k^g = o(1)$ for some sequence $k = k(n)$, the MLPB is asymptotically valid for the sample mean $\bar{X}$. That is, for any real-valued sequence $b = b(d(n))$ of $d(n)$-dimensional vectors with $0 < M_1 \leq |b(d(n))|_2^2 \leq M_2 < \infty$ for all $n \in \mathbb{N}$ and $\hat{v}^2 = \hat{v}^2_{d(n)} = \mathrm{Var}^*\big(\sqrt{n}\,b^T\bar{Y}^*\big)$, we have
$$\sup_{x \in \mathbb{R}} \Big| P\big\{\sqrt{n}\,b^T(\bar{X} - \mu)/v \leq x\big\} - P^*\big\{\sqrt{n}\,b^T\bar{Y}^*/\hat{v} \leq x\big\} \Big| = o_P(1)$$
and $|\hat{v}^2 - v^2| = o_P(1)$.

In practice, the computational requirements can become very demanding for large $d$ and $n$. In this case, we suggest splitting the data vector $\underline{X}$ into a few subsamples $\underline{X}^{(1)}, \ldots, \underline{X}^{(S)}$, say, with $\underline{X}^{(i)} = (X^T_{n_{i-1}+1}, \ldots, X^T_{n_{i-1}+n_i})^T$ such that $\sum_{i=1}^{S} n_i = n$, and applying the MLPB scheme to each subsample separately. Here, averages over the matrices obtained from the (lower-dimensional) subsamples are used to regain efficiency and to improve computation time. This approach can be justified by the fact that the dependence structure is distorted only very few times.

6. Simulations

In this section we compare systematically the performance of the multivariate linear process bootstrap (MLPB) to that of the vector-autoregressive sieve bootstrap (ARsieve), the moving block bootstrap (MBB) and the tapered block bootstrap (TBB) by means of simulation. In order to make such a comparison, we have chosen a statistic for which all methods lead to asymptotically correct approximations. In particular, we are interested in the unknown distribution of the sample mean of bivariate time series data. We compare the aforementioned bootstrap methods by plotting

a) root mean squared errors (RMSE) for estimating the covariance matrix of $\sqrt{n}(\bar{X} - \mu)$,
b) coverage rates (CR) of bootstrap confidence intervals for both components of $\mu$,

for three models and three sample sizes for different choices of the corresponding tuning parameters. These are the banding parameter $l$ (MLPB), the autoregressive order $p$ (ARsieve) and the block length $s$ (MBB, TBB). For the MLPB, we also report RMSE and CR for data-adaptively chosen global and individual banding parameters as discussed in Section 2.3. The R code is available at http://www.math.ucsd.edu/~politis/SOFT/function_MLPB.R.

For each case, we have generated T = 500 time series, and B = 500 bootstrap replications have been used in each step. For a), the exact covariance matrix of $\sqrt{n}(\bar{X} - \mu)$ is estimated by 20,000 Monte Carlo replications. Further, we use the trapezoidal kernel defined in (2.6) to taper the sample covariance matrix for the MLPB and the blocks for the TBB. To correct the covariance matrix estimator $\widehat{\Gamma}_{\kappa,l}$ to be positive definite, if necessary, we set $\epsilon = 1$ and $\beta = 1$ to get $\widehat{\Gamma}^\epsilon_{\kappa,l}$. This choice has already been used by McMurry and Politis (2010), and simulation results (not reported in this paper) indicate that the performance of the MLPB reacts only slightly to this choice. We have used the sub-vector resampling scheme, i.e., Steps 3' and 4'.

6.1. The data generating processes (DGP).
We consider realizations $X_1, \ldots, X_n$ of length $n = 100, 200, 500$ from three bivariate models. Precisely, we study an i.i.d. white noise process
$$\text{WN model:}\quad X_t = e_t,$$
a first order vector moving average process
$$\text{VMA(1) model:}\quad X_t = A e_{t-1} + e_t,$$
and a first order vector autoregressive process
$$\text{VAR(1) model:}\quad X_t = A X_{t-1} + e_t,$$
where $e_t \sim \mathcal{N}(0, \Sigma)$ is a normally distributed i.i.d. white noise process and
$$\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} 0.9 & -0.4 \\ 0 & 0.5 \end{pmatrix}$$
have been used in all cases. It is worth noting that (asymptotically) all bootstrap procedures under consideration yield valid approximations for all three models above. In contrast to the i.i.d. case, where all proposed techniques remain valid for all choices of sufficiently small tuning parameters, this is not true for the other DGPs. For the VMA(1) model, the MLPB is valid for all (sufficiently small) choices of banding parameters $l \geq 1$, but the ARsieve is valid only asymptotically for $p = p(n)$ tending to infinity at an appropriate rate with increasing sample size $n$. This relationship of MLPB and ARsieve is reversed for the VAR(1) model. For the MBB and the TBB, the block length has to increase with the sample size for the VMA(1) and VAR(1) models.
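A minimal R sketch of these three data generating processes, assuming (as the definition of $\Sigma$ suggests) that the innovations have covariance $\Sigma$; simulate_dgp is an illustrative name and no burn-in is used for the VAR(1) recursion.

  Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
  A     <- matrix(c(0.9, 0, -0.4, 0.5), 2, 2)        # filled column-wise: A = [0.9 -0.4; 0 0.5]
  simulate_dgp <- function(n, model = c("WN", "VMA1", "VAR1")) {
    model <- match.arg(model)
    e <- t(chol(Sigma)) %*% matrix(rnorm(2 * (n + 1)), nrow = 2)   # i.i.d. N(0, Sigma) innovations
    X <- matrix(0, 2, n)
    for (t in 1:n) X[, t] <- switch(model,
      WN   = e[, t + 1],
      VMA1 = A %*% e[, t] + e[, t + 1],
      VAR1 = if (t == 1) e[, t + 1] else A %*% X[, t - 1] + e[, t + 1])
    X
  }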

6.2. Simulation results.
In Figures 1-3, by plotting RMSE and CR for different choices of tuning parameters, the MLPB is compared to ARsieve, MBB and TBB for the three models under investigation, respectively. In each figure, the upper half of panels corresponds to the estimation of $\mathrm{Var}(\sqrt{n}(\bar{X}_1 - \mu_1))$ (first row, RMSE) and $\mu_1$ (second row, CR), and the lower half of panels to $\mathrm{Var}(\sqrt{n}(\bar{X}_2 - \mu_2))$ (third row, RMSE) and $\mu_2$ (fourth row, CR).

6.2.1. WN model.
In Figure 1, the data are i.i.d. and, in fact, a bootstrap for dependent data is redundant since there is no dependence structure to capture. However, we compare MLPB, ARsieve, MBB and TBB for $l, p, s = 1, 2, \ldots, 20$, i.e. the data generating process is over-fitted and hence slightly misspecified by MLPB and ARsieve. Observe that $\widehat{\Gamma}^\epsilon_{\kappa,l}$ computed for the MLPB becomes block-diagonal for $l < 0.5$ and the scheme degenerates to an i.i.d. bootstrap, as is true for the ARsieve if $p = 0$ [compare Remark 3.1]; these choices would of course be most appropriate in this case, but are not reported here. This is in contrast to MBB and TBB, which simplify to Efron's bootstrap for $s = 1$. However, the MLPB with data-adaptively selected banding parameters performs very well. In Figure 1, it can be seen that with respect to RMSE and CR, all bootstrap procedures lose in terms of efficiency with redundantly increasing tuning parameters.

6.2.2. VMA(1) model.
For data generated by the VMA(1) model, Figure 2 shows that the MLPB outperforms ARsieve, MBB and TBB for an adequate tuning parameter choice, that is, $l \approx 1$. In this case, the MLPB is generally superior with respect to RMSE and CR to the other bootstrap methods for all tuning parameter choices of $p$ and $s$. This was not unexpected since, by design, the MLPB can approximate very efficiently the covariance structure of moving average processes. Nevertheless, due to the fact that all proposed bootstrap schemes are valid at least asymptotically, the ARsieve gets rid of its bias with increasing order $p$, but at the expense of increasing variability and consequently also increasing RMSE. However, its performance with respect to CR is comparable to the MLPB already for small AR orders $p$. The MLPB with data-adaptively chosen $l$ performs quite well, where the individual choice tends to perform better than the global choice in most cases. In comparison, MBB and TBB perform quite well for adequate block lengths and, actually, appear to be more robust to this choice. In particular, MBB and TBB are superior to MLPB and ARsieve for a large range of tuning parameters with respect to RMSE and CR.

6.2.3. VAR(1) model.
The data from the VAR(1) model are highly persistent due to the coefficient $A_{11} = 0.9$ near unity. This leads to autocovariances that decrease rather slowly with increasing lag and, consequently, to large variances of $\sqrt{n}(\bar{X} - \mu)$. Figure 3 shows that the ARsieve outperforms MLPB, MBB and TBB with respect to CR for small AR orders $p \approx 1$. This was to be expected since the underlying VAR(1) model is captured well by the ARsieve even with finite sample size. But the picture appears to be different with respect to RMSE. Here, the MLPB seems to perform better for an adequate tuning parameter choice, but this effect can be explained by the very small variance that compensates its large bias in comparison to the AR sieve [not reported here], leading to a smaller RMSE. However, this is also illustrated by the poor performance of the MLPB with respect to CR for small choices of $l$. Nevertheless, the MLPB performs comparably to the ARsieve with respect to CR for a whole range of banding parameters. Furthermore, the plots indicate that the MLPB with data-adaptively chosen $l$ can achieve good CR, where the individual choice generally outperforms the global choice, particularly with respect to RMSE. Similar to the considerations of the MLPB for the VMA(1) model in Figure 2, it can be seen that the performance of the AR sieve worsens with increasing $p$ at the expense of increasing variability. The block bootstraps MBB and TBB appear to be clearly inferior to MLPB and AR sieve, but for the upper half of Figure 3, this might be explained by the too small block lengths reported here. However, for the lower half of plots, MBB and TBB are not capable of achieving CR as good as MLPB and AR sieve, particularly for small sample sizes.

Our simulation experience may be summarized as follows:

• In case of an i.i.d. process, e.g. the WN model, MLPB and ARsieve perform comparably well [cf. Figure 1], but are both outperformed by the two block bootstrap methods. With respect to RMSE, the tapered version of the MBB tends to be slightly better than the MBB, but the picture is not so clear with respect to CR, where each is superior in some cases.

• If the underlying process is a moving average process, e.g. the VMA(1) model, the MLPB is comparable to the ARsieve as shown in Figure 2, but tends to be slightly superior.

• In case of an autoregressive process, e.g. the VAR(1) model, the ARsieve performs better than the MLPB with respect to CR, but not with respect to RMSE; see Figure 3. This is due to the higher variability of ARsieve estimates in comparison to MLPB estimates.

• For MLPB and ARsieve, all simulations show that larger tuning parameters $l$ and $p$ introduce more and more variability into the bootstrap estimation. This is due to involving sample autocovariances of lags up to $l$ and $p$, respectively; each of those sample autocovariances carries its own standard error, which is of order $1/\sqrt{n}$. Consequently, both bootstrap procedures perform best in situations where $l$ and $p$, respectively, can be chosen small.

• MLPB with data-adaptively chosen banding parameters performs well, where the individual choice tends to be superior to the global choice in many cases.

7. Proofs

Proof of Theorem 2.1.
Symmetry of $\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn}$, together with Problem 21, p. 313 in Horn and Johnson (1990) and plugging in for $\widehat{\Gamma}_{\kappa,l}(i,j)$ and $\Gamma_{dn}(i,j)$, yields
$$\rho(\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn}) \leq \max_{1 \leq j \leq dn} \sum_{i=1}^{dn} \big|\kappa_l(m_2(i,j))\widehat{C}_{m_1(i,j)}(m_2(i,j)) - C_{m_1(i,j)}(m_2(i,j))\big| \leq 2\sum_{i,j=1}^{d} \sum_{h=0}^{n-1} \big|\kappa_l(h)\widehat{C}_{ij}(h) - C_{ij}(h)\big|,$$
where the second inequality is implied by the definitions of $m_1(\cdot,\cdot)$ and $m_2(\cdot,\cdot)$ in (2.4). By splitting up the expression on the last right-hand side above and plugging in for $\kappa_l(h)$, we get
$$\rho(\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn}) \leq 2\sum_{i,j=1}^{d} \Big[ \sum_{h=0}^{l} \big|\widehat{C}_{ij}(h) - C_{ij}(h)\big| + \sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} \big|\kappa_l(h)\widehat{C}_{ij}(h) - C_{ij}(h)\big| + \sum_{h=\lfloor c_\kappa l \rfloor + 1}^{n-1} \big|C_{ij}(h)\big| \Big] =: A_1 + A_2 + A_3.$$
Now, before considering $A_1$ and $A_2$ separately, note that
$$\big\|\widehat{C}_{ij}(h) - C_{ij}(h)\big\|_2 \leq \frac{1}{n}\Big\| \sum_{t=\max(1,1-h)}^{\min(n,n-h)} (X_{i,t+h} - \bar{X}_i)(X_{j,t} - \bar{X}_j) - (n - |h|)C_{ij}(h) \Big\|_2 + \frac{|h|}{n}\big|C_{ij}(h)\big| \leq \frac{M}{\sqrt{n}} + \frac{|h|}{n}\big|C_{ij}(h)\big|, \qquad (7.1)$$
where the second inequality follows from assumption (A2). Using this result, we obtain
$$\|A_1\|_2 \leq 2\sum_{i,j=1}^{d} \sum_{h=0}^{l} \Big(\frac{M}{\sqrt{n}} + \frac{|h|}{n}\big|C_{ij}(h)\big|\Big) \leq \frac{2Md^2(l+1)}{\sqrt{n}} + 2\sum_{h=0}^{l} \frac{|h|}{n}|C(h)|_1.$$
Now, consider $A_2$. First, it holds
$$A_2 \leq 2\sum_{i,j=1}^{d} \sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} |\kappa_l(h)|\,\big|\widehat{C}_{ij}(h) - C_{ij}(h)\big| + 2\sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} |\kappa_l(h) - 1|\,|C(h)|_1 \leq 2\sum_{i,j=1}^{d} \sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} \big|\widehat{C}_{ij}(h) - C_{ij}(h)\big| + 2\sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} |C(h)|_1,$$
and a straightforward application of (7.1) results in
$$\|A_2\|_2 \leq 2\sum_{i,j=1}^{d} \sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} \Big(\frac{M}{\sqrt{n}} + \frac{|h|}{n}\big|C_{ij}(h)\big|\Big) + 2\sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} |C(h)|_1 \leq \frac{2Md^2(\lfloor c_\kappa l \rfloor - l)}{\sqrt{n}} + 2\sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} \frac{|h|}{n}|C(h)|_1 + 2\sum_{h=l+1}^{\lfloor c_\kappa l \rfloor} |C(h)|_1.$$
Since $A_3 \leq 2\sum_{h=\lfloor c_\kappa l \rfloor + 1}^{n-1} |C(h)|_1$ deterministically, combining the bounds on $\|A_1\|_2$, $\|A_2\|_2$ and $A_3$ and collecting terms yields (2.8). □

Proof of Theorem 2.2.
Without loss of generality, the eigenvalues of $\widehat{R}_{\kappa,l}$ can be ordered such that $r_1 \geq r_2 \geq \cdots \geq r_{dn}$, and let $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the largest and the smallest eigenvalue of a matrix $A$, respectively. Then, by Corollary 4.3.3 in Horn and Johnson (1990), it holds
$$-r_{dn} = -\lambda_{\min}(\widehat{R}_{\kappa,l}) = \lambda_{\max}(-\widehat{R}_{\kappa,l}) \leq \lambda_{\max}(R_{dn} - \widehat{R}_{\kappa,l}) \leq \rho(R_{dn} - \widehat{R}_{\kappa,l}), \qquad (7.2)$$
where $R_{dn} = V^{-1/2}\Gamma_{dn}V^{-1/2}$ and the last inequality follows from the definition of the operator norm and the spectral factorization of a symmetric matrix. Further, it holds
$$\widehat{\Gamma}^\epsilon_{\kappa,l} - \widehat{\Gamma}_{\kappa,l} = V^{1/2}S(D^\epsilon - D)S^T V^{1/2},$$
where
$$D^\epsilon - D = \mathrm{diag}\big(\max(r_i, \epsilon n^{-\beta}) - r_i,\ i = 1,\ldots,dn\big) = \mathrm{diag}\big(\max(0, \epsilon n^{-\beta} - r_i),\ i = 1,\ldots,dn\big).$$
Together with (7.2) and
$$\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \widehat{\Gamma}_{\kappa,l}) \leq \rho^2(V^{1/2})\,\rho\big(S(D^\epsilon - D)S^T\big) = \max_i \widehat{C}_{ii}(0) \cdot \max(0, \epsilon n^{-\beta} - r_{dn}),$$
this leads to
$$\rho(\widehat{\Gamma}^\epsilon_{\kappa,l} - \Gamma_{dn}) \leq \max_i \widehat{C}_{ii}(0)\,\max(0, \epsilon n^{-\beta} - r_{dn}) + \rho(\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn}) \leq \max_i \widehat{C}_{ii}(0)\Big(\epsilon n^{-\beta} + \frac{1}{\max_i \widehat{C}_{ii}(0)}\,\rho(\Gamma_{dn} - \widehat{\Gamma}_{\kappa,l})\Big) + \rho(\Gamma_{dn} - \widehat{\Gamma}_{\kappa,l}) = \epsilon \max_i \widehat{C}_{ii}(0)\,n^{-\beta} + 2\rho(\widehat{\Gamma}_{\kappa,l} - \Gamma_{dn}).$$
From $\|\widehat{C}_{ii}(0)\|_2 = C_{ii}(0) + O(\frac{1}{\sqrt{n}})$, $i = 1,\ldots,d$, and Theorem 2.1, we get the desired result. □

Proof of Corollary 2.1.
By (A1), Gerschgorin's theorem and (A3), we have that $\rho(\Gamma_{dn})$, $\rho(\Gamma_{dn}^{-1})$, $\rho(\Gamma_{dn}^{1/2})$ and $\rho(\Gamma_{dn}^{-1/2})$ are bounded from above and from below. The claimed result for $\rho(\hat\Gamma^{\epsilon}_{\kappa,l} - \Gamma_{dn})$ follows directly from Theorem 2.2, and the corresponding result for the inverses of $\hat\Gamma^{\epsilon}_{\kappa,l}$ and $\Gamma_{dn}$ follows from the proof of Theorem 2 in McMurry and Politis (2010). The claimed convergence of the Cholesky matrices is established through Theorem 2.1 of Drmac, Omladic and Veselic (1994), which provides the bound
\[
\rho\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)
\le \frac{2 c_n\, \rho(\Gamma_{dn}^{1/2})\, \rho\bigl(\Gamma_{dn}^{-1/2}(\hat\Gamma^{\epsilon}_{\kappa,l} - \Gamma_{dn})(\Gamma_{dn}^{-1/2})^T\bigr)}
{\sqrt{1 - 4 c_n^2\, \rho\bigl(\Gamma_{dn}^{-1/2}(\hat\Gamma^{\epsilon}_{\kappa,l} - \Gamma_{dn})(\Gamma_{dn}^{-1/2})^T\bigr)}}, \qquad (7.3)
\]
where $c_n = \tfrac{1}{2} + \lfloor\log_2(dn)\rfloor$, provided the radicand in the denominator above is strictly positive. Since $\rho(\Gamma_{dn}^{1/2})$ and $\rho(\Gamma_{dn}^{-1/2})$ are bounded from above, the desired results hold if $\log_2(n)\cdot r_{l,n} = o(1)$.

The corresponding result for their inverses follows from
\[
\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} = (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)\Gamma_{dn}^{-1/2},
\]
which also implies boundedness of $\rho((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2})$ and $\rho((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2})$ from above and from below, which concludes this proof. $\Box$

Proof of Theorem 4.1.
Let $\overline Z^{*}$ be the bootstrap sample that is defined analogously to $Z^*$ in Step 4 of Section 3, except that the resample is drawn from the standardized values of $\overline W = \Gamma_{dn}^{-1/2} Y$ instead of $W = (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} Y$. Now, the proof of Theorem 4.1 proceeds through a sequence of lemmas. Lemma 7.1(i) gives the justification for using
\[
\overline Y^{*} = \Gamma_{dn}^{1/2}\,\overline Z^{*} \qquad (7.4)
\]
in all subsequent computations instead of $Y^*$. Furthermore, we define $C^{(k)}(h) = C(h)\,\mathbf 1(|h|\le k)$ and let $\Gamma_{dn,k} = (C^{(k)}(i-j),\; i,j=1,\dots,n)$ be the $k$-banded version of $\Gamma_{dn}$. The matrix $\Gamma_{dn,k}$ is banded in the sense that only the $(2k+1)d$ main diagonals are not equal to zero, and for all sequences $k = k(n) \to \infty$ it holds that $\rho(\Gamma_{dn,k} - \Gamma_{dn}) \to 0$ as $n \to \infty$, due to $\rho(\Gamma_{dn,k} - \Gamma_{dn}) \le 2\sum_{h=k+1}^{\infty}|C(h)|_1$, which is obtained analogously to the proof of Theorem 2.1. Let $\Gamma_{dn,k}^{1/2}$ be the Cholesky decomposition of $\Gamma_{dn,k}$, which exists for sufficiently large $k$ by (A3), and note that only its main diagonal and the $d(k+1)$ secondary diagonals below the main diagonal contain non-zero elements, which are all bounded by $\max_i C_{ii}^{1/2}(0)$. The second part of Lemma 7.1 allows us also to replace $\Gamma_{dn}^{1/2}$ in (7.4) by $\Gamma_{dn,k}^{1/2}$ and to get asymptotically the same results. Lemma 7.2 gives the proper limiting bootstrap variance, while Lemma 7.3 proves boundedness in probability of $E^*(\overline Z^{*4}_i)$ and Lemma 7.4 deals with asymptotic normality of $\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t$.

Lemma 7.1. Under the assumptions of Theorem 4.1, it holds that
\[
\text{(i)}\;\; \frac{1}{\sqrt n}\sum_{t=1}^{n} Y^{*}_t - \frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t = o_{P^*}(1)
\qquad\text{and}\qquad
\text{(ii)}\;\; \frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t - \frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*,k}_t = o_{P^*}(1),
\]
where $\overline Y^{*,k} = \Gamma_{dn,k}^{1/2}\,\overline Z^{*}$.

Proof. (i) Let $J = [I_d : \cdots : I_d]$ be the $(d\times dn)$ matrix that consists of $n$ $(d\times d)$ unit matrices. In the following, we show that $\frac{1}{\sqrt n}b^T\sum_{t=1}^n Y^*_t = \frac{1}{\sqrt n}b^T\sum_{t=1}^n \overline Y^*_t + O_{P^*}(\log_2(n)\cdot r_{l,n})$ for any $\mathbb R^d$-valued vector $b$. We have
\[
\frac{1}{\sqrt n}b^T\sum_{t=1}^n Y^*_t
= \frac{1}{\sqrt n}b^T J\,\Gamma_{dn}^{1/2}\,\overline Z^{*}
+ \frac{1}{\sqrt n}b^T J\,\Gamma_{dn}^{1/2}\,(Z^* - \overline Z^{*})
+ \frac{1}{\sqrt n}b^T J\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)Z^*
= \frac{1}{\sqrt n}b^T\sum_{t=1}^n \overline Y^*_t + R^*_1 + R^*_2
\]
and it remains to show that $R^*_1$ and $R^*_2$ are asymptotically negligible. Considering $R^*_2$, we get
\[
E^*(R^{*2}_2)
= \frac{1}{n}b^T J\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)^T J^T b
\le \frac{1}{n}b^T J J^T b\;\lambda_{\max}\Bigl(\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)^T\Bigr)
= |b|_2^2\;\rho^2\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)
\]
and this leads to $R^*_2 = O_{P^*}(\log_2(n)\cdot r_{l,n})$ by Corollary 2.1. Now, we turn to $R^*_1$. First of all, observe that $Z^*$ and $\overline Z^{*}$ may be represented as
\[
Z^* = M^*\,\frac{1}{\hat\sigma_W}\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)(\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}Y,
\qquad
\overline Z^{*} = M^*\,\frac{1}{\hat\sigma_{\overline W}}\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\Gamma_{dn}^{-1/2}Y,
\]
where $I_{dn}$ is the $(dn\times dn)$ unit matrix, $\mathbf 1_{dn\times dn}$ is the $(dn\times dn)$ matrix of ones and each row of the $(dn\times dn)$ matrix $M^*$ is independently and uniformly selected from the standard basis vectors $e_1,\dots,e_{dn}$. This yields
\[
R^*_1 = \frac{1}{\hat\sigma_{\overline W}}\frac{1}{\sqrt n}b^T J\,\Gamma_{dn}^{1/2} M^*\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)Y
+ \Bigl(\frac{1}{\hat\sigma_W} - \frac{1}{\hat\sigma_{\overline W}}\Bigr)\frac{1}{\sqrt n}b^T J\,\Gamma_{dn}^{1/2} M^*\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)(\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}Y
= R^*_3 + R^*_4
\]
and for $R^*_3$, we get
\[
E^*(R^{*2}_3)
= E^*\Bigl(\frac{1}{\hat\sigma^2_{\overline W}}\frac{1}{n}b^T J\,\Gamma_{dn}^{1/2} M^*\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)Y
\;Y^T\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^T M^{*T}(\Gamma_{dn}^{1/2})^T J^T b\Bigr)
= \frac{1}{\hat\sigma^2_{\overline W}}\frac{1}{n}b^T J\,\Gamma_{dn}^{1/2}\,E^*(V^* V^{*T})\,(\Gamma_{dn}^{1/2})^T J^T b,
\]
where $V^*$ is a bootstrap vector drawn from $\bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)Y$. Due to independent resampling, it holds that $E^*(V^* V^{*T}) = \hat\sigma^2_V I_{dn}$ with
\[
\hat\sigma^2_V
= Y^T\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^T E^*\bigl(M^{*T}_{j\bullet}M^*_{j\bullet}\bigr)\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)Y
\]
\[
= \frac{1}{dn}Y^T\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)Y
\le \frac{1}{dn}Y^T Y\;\lambda_{\max}\Bigl(\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)\Bigr)
\le \frac{1}{dn}Y^T Y\;\rho^2\Bigl(\bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2}\bigr)^T\Bigr)\rho^2\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr),
\]
where $M^*_{j\bullet}$ denotes the $j$-th row of $M^*$ and $E^*(M^{*T}_{j\bullet}M^*_{j\bullet}) = \frac{1}{dn}I_{dn}$. Thanks to $Y^T Y = O_P(dn)$, $\rho\bigl(((\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2} - \Gamma_{dn}^{-1/2})^T\bigr) = O_P(\log_2(n)\cdot r_{l,n})$ and $\rho\bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\bigr) = O(1)$, we obtain
\[
E^*(R^{*2}_3) \le \frac{\hat\sigma^2_V}{\hat\sigma^2_{\overline W}}\frac{1}{n}b^T J J^T b\;\lambda_{\max}(\Gamma_{dn}) = \frac{\hat\sigma^2_V}{\hat\sigma^2_{\overline W}}|b|_2^2\;\rho^2(\Gamma_{dn}^{1/2}),
\]
resulting in $R^*_3 = O_{P^*}(\log_2(n)\cdot r_{l,n})$, because $\hat\sigma^2_{\overline W}$ is bounded away from zero and from above with probability tending to one. Finally, since the same holds true for $\hat\sigma^2_W$, to handle $R^*_4$ it is sufficient to show $|\hat\sigma^2_W - \hat\sigma^2_{\overline W}| = O_P(\log_2(n)\cdot r_{l,n})$. By plugging in, we have
\[
|\hat\sigma^2_W - \hat\sigma^2_{\overline W}|
\le \Bigl|\frac{1}{dn}Y^T(\Gamma_{dn}^{-1/2})^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2\bigl(\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigr)Y\Bigr|
+ \Bigl|\frac{1}{dn}Y^T\bigl(\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2(\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}Y\Bigr|
\]
and by the Cauchy–Schwarz inequality, the first term above is bounded by
\[
\Bigl(\frac{1}{dn}Y^T(\Gamma_{dn}^{-1/2})^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2\Gamma_{dn}^{-1/2}Y\Bigr)^{1/2}
\Bigl(\frac{1}{dn}Y^T\bigl(\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)^2\bigl(\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigr)Y\Bigr)^{1/2}
\le \frac{1}{dn}Y^T Y\;\rho\Bigl((\Gamma_{dn}^{-1/2})^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\Bigr)\rho\Bigl(\bigl(\Gamma_{dn}^{-1/2} - (\hat\Gamma^{\epsilon}_{\kappa,l})^{-1/2}\bigr)^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\Bigr)
= O_P(\log_2(n)\cdot r_{l,n}).
\]
Analogous computations for the second term lead to $R^*_4 = O_{P^*}(\log_2(n)\cdot r_{l,n})$, which concludes the proof of the first assertion. The claimed equality in (ii) is obtained from a similar calculation as executed for $R^*_2$, where $\rho(\Gamma_{dn,k}^{1/2} - \Gamma_{dn}^{1/2}) = O(\log_2(n)\cdot s_k)$ is used. The last result follows, as in the proof of Corollary 2.1, from (7.3) with $(\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2}$ replaced by $\Gamma_{dn,k}^{1/2}$ and $\rho(\Gamma_{dn,k} - \Gamma_{dn}) = O(s_k)$, where $s_k = \sum_{h=k+1}^{\infty}|C(h)|_1$. $\Box$
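The selection-matrix representation of $Z^*$ above is just a compact way of writing the whitening and i.i.d.-resampling mechanics of the MLPB. The following Python sketch mirrors those mechanics for a stacked data vector; it is a minimal illustration, assuming gamma_eps stands for the regularized tapered estimator $\hat\Gamma^{\epsilon}_{\kappa,l}$, and it is not the authors' implementation.

```python
import numpy as np

def mlpb_draw(y, gamma_eps, rng):
    """One bootstrap draw Y* = Gamma^{1/2} Z*, following the whitening / i.i.d.-resampling
    mechanics used in the proof (a sketch under stated assumptions, not the paper's code)."""
    dn = y.shape[0]
    l_chol = np.linalg.cholesky(gamma_eps)             # lower Cholesky factor Gamma^{1/2}
    w = np.linalg.solve(l_chol, y)                      # whitened vector W = Gamma^{-1/2} y
    w_centered = w - w.mean()                           # centering (the I - (1/dn) 1 1^T step)
    z = w_centered / w_centered.std()                   # standardized entries
    z_star = rng.choice(z, size=dn, replace=True)       # each row of M* picks one entry uniformly
    return l_chol @ z_star                              # re-impose the covariance structure

rng = np.random.default_rng(3)
dn = 50
y = rng.standard_normal(dn)
gamma_eps = np.eye(dn)                                  # placeholder for the regularized estimator
print(mlpb_draw(y, gamma_eps, rng).shape)
```

Drawing each entry of $Z^*$ by uniformly picking one coordinate of the standardized whitened vector is exactly what multiplication by a row of $M^*$ does.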

Lemma 7.2. Under the assumptions of Theorem 4.1, $\mathrm{Var}^*\bigl(\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t\bigr) = \sum_{h\in\mathbb Z}C(h) + o_P(1)$.

Proof. For any $\mathbb R^d$-valued vector $b$, we get by standard arguments
\[
\mathrm{Var}^*\Bigl(b^T\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t\Bigr)
= \frac{1}{n}b^T J\,\Gamma_{dn}\,J^T b
= b^T\Bigl(\frac{1}{n}\sum_{i,j=1}^{n}C(i-j)\Bigr)b
= b^T\Bigl(\sum_{h=-\infty}^{\infty}C(h)\Bigr)b + o(1). \qquad\Box
\]
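For completeness, the "standard arguments" referred to above amount to the elementary Cesàro-type identity
\[
\frac{1}{n}\sum_{i,j=1}^{n}C(i-j) \;=\; \sum_{h=-(n-1)}^{n-1}\Bigl(1-\frac{|h|}{n}\Bigr)C(h)\;\longrightarrow\;\sum_{h\in\mathbb Z}C(h),
\]
where the convergence holds entrywise by dominated convergence, using the summability of the autocovariances implied by (A1); we record this here only as a reading aid, since the precise summability condition is stated with the assumptions in the main text.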

Lemma 7.3. Under (A1)–(A4) with some $q \ge 2$, we have $E^*(|\overline Z^{*}_1|^q) = O_P(1)$. Under (A1′)–(A4′), we have $E^*(|\overline Z^{*}_1|^q) = O_P(d^{q/2})$.

Proof. Due to $E^*(|\overline Z^{*}_1|^q) = \frac{1}{dn}\sum_{t=1}^{dn}|Z_t|^q$, we show $E(|Z_t|^q) = \|Z_t\|_q^q = O(1)$ uniformly in $t$. By defining $\bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\bigr)\Gamma_{dn}^{-1/2} = (a_{ij})_{i,j=1,\dots,dn}$ and $A_t(j) = (a_{t,(j-1)d+1},\dots,a_{t,jd})^T$, we get
\[
Z_t = \frac{1}{\hat\sigma_{\overline W}}\,e_t^T\Bigl(I_{dn} - \frac{1}{dn}\mathbf 1_{dn\times dn}\Bigr)\Gamma_{dn}^{-1/2}Y
= \frac{1}{\hat\sigma_{\overline W}}\Bigl(\sum_{j=1}^{n}A_t^T(j)(X_j-\mu) - \sum_{j=1}^{n}A_t^T(j)\bigl(\overline X-\mu\bigr)\Bigr)
\]
and thanks to the boundedness of $1/\hat\sigma_{\overline W}$, it suffices to consider the two terms in brackets above in the following. For the second one, we get from the triangle inequality and (A4) that
\[
\Bigl\|\sum_{j=1}^{n}A_t^T(j)\bigl(\overline X-\mu\bigr)\Bigr\|_q^q
= \Bigl\|\sum_{j=1}^{n}\sum_{s=1}^{d}a_{t,(j-1)d+s}\bigl(\overline X_s-\mu_s\bigr)\Bigr\|_q^q
\le \Bigl(\sum_{j=1}^{n}\sum_{s=1}^{d}|a_{t,(j-1)d+s}|\Bigr)^q O(n^{-q/2})
\]
holds, and together with
\[
\sum_{j=1}^{n}\sum_{s=1}^{d}|a_{t,(j-1)d+s}| = \sum_{j=1}^{dn}|a_{t,j}|
\le \Bigl(dn\sum_{j=1}^{dn}a_{t,j}^2\Bigr)^{1/2}
\le \sqrt{dn}\,\rho(\Gamma_{dn}^{-1/2}) = O(\sqrt{dn}) \qquad (7.5)
\]
by Cauchy–Schwarz, this term is of order $O(d^{q/2})$ and bounded for fixed $d$. Now, we turn to the first term. We define $P_{j-m}X_j = E(X_j-\mu\,|\,\mathcal F_{j-m}) - E(X_j-\mu\,|\,\mathcal F_{j-m-1})$, where $\mathcal F_t = \sigma(X_s-\mu,\; s\le t)$, and set $M_{m,n,t} = \sum_{j=1}^{n}A_t^T(j)\bigl\{E(X_j-\mu\,|\,\mathcal F_{j-m}) - E(X_j-\mu\,|\,\mathcal F_{j-m-1})\bigr\}$. This yields $\sum_{j=1}^{n}A_t^T(j)(X_j-\mu) = \sum_{m=0}^{\infty}M_{m,n,t}$ almost surely, and now the desired result follows from $\|\sum_{m=0}^{\infty}M_{m,n,t}\|_q^q \le (\sum_{m=0}^{\infty}\|M_{m,n,t}\|_q)^q < \infty$ uniformly in $t$. By Proposition 4 in Dedecker and Doukhan (2003), we get $\|M_{m,n,t}\|_q \le (2q\sum_{i=1}^{n}b_{i,m,n,t})^{1/2}$, where
\[
b_{i,m,n,t} = \max_{i\le l\le n}\Bigl\|A_t^T(i)P_{i-m}X_i\sum_{k=i}^{l}E\bigl(A_t^T(k)P_{k-m}X_k\,|\,\mathcal M_i\bigr)\Bigr\|_{q/2}
\]
with $\mathcal M_i = \sigma\bigl(A_t^T(k)P_{k-m}X_k,\; 0\le k\le i\bigr)$. For the conditional expectations above, we obtain $E(A_t^T(i)P_{i-m}X_i\,|\,\mathcal M_i) = A_t^T(i)P_{i-m}X_i$ for $k=i$ and
\[
E\bigl(A_t^T(k)P_{k-m}X_k\,|\,\mathcal M_i\bigr) = A_t^T(k)E\bigl(E(X_k\,|\,\mathcal F_{k-m})\,|\,\mathcal M_i\bigr) - A_t^T(k)E\bigl(E(X_k\,|\,\mathcal F_{k-m-1})\,|\,\mathcal M_i\bigr) = 0
\]
for all $k>i$, due to $\mathcal M_i \subset \sigma(\mathcal F_{k-m-1})$. This leads to
\[
b_{i,m,n,t} = \bigl\|(A_t^T(i)P_{i-m}X_i)^2\bigr\|_{q/2} \le A_t^T(i)A_t(i)\,\bigl\|(P_{i-m}X_i)^T P_{i-m}X_i\bigr\|_{q/2}
\]
by Cauchy–Schwarz, where the first factor equals $\sum_{s=(i-1)d+1}^{id}a_{ts}^2$ and the second is bounded by
\[
\sum_{p=1}^{d}\bigl(E(P_{i-m}X_{p,i})^q\bigr)^{2/q}
\le d\max_{p=1,\dots,d}\bigl(E(P_{i-m}X_{p,i})^q\bigr)^{2/q}
= d\max_{p=1,\dots,d}\bigl(E(P_0 X_{p,m})^q\bigr)^{2/q}.
\]
This results in
\[
\|M_{m,n,t}\|_q^2
\le 2q\sum_{i=1}^{n}\sum_{s=(i-1)d+1}^{id}a_{ts}^2\;d\max_{p=1,\dots,d}\bigl(E(P_0 X_{p,m})^q\bigr)^{2/q}
= \Bigl(2qd\sum_{i=1}^{dn}a_{ti}^2\Bigr)\max_{p=1,\dots,d}\|P_0 X_{p,m}\|_q^2 \qquad (7.6)
\]
and $(\sum_{m=0}^{\infty}\|M_{m,n,t}\|_q)^q = O(d^{q/2})$ by arguments similar to those used to obtain (7.5). In particular, the last term is bounded for fixed $d$, which concludes this proof. $\Box$
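The almost-sure expansion $\sum_{j=1}^{n}A_t^T(j)(X_j-\mu) = \sum_{m=0}^{\infty}M_{m,n,t}$ used above rests on the telescoping property of the projection operators: summing $P_{j-m}X_j$ over $m = 0,\dots,M$ gives
\[
\sum_{m=0}^{M}P_{j-m}X_j = E(X_j-\mu\,|\,\mathcal F_j) - E(X_j-\mu\,|\,\mathcal F_{j-M-1}) = (X_j-\mu) - E(X_j-\mu\,|\,\mathcal F_{j-M-1}),
\]
and the last conditional expectation vanishes as $M\to\infty$ under the weak-dependence conditions imposed in the theorem; we note this limit here only as a reading aid, as the precise condition is part of the assumptions in the main text.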

Lemma 7.4. Under the assumptions of Theorem 4.1, $\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*}_t$ converges in distribution in probability to a centered normal distribution with the variance obtained in Lemma 7.2.

Proof. By Lemma 7.1(ii), we may consider $b^T\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*,k}_t$ for some $\mathbb R^d$-valued vector $b$ and prove its asymptotic normality by using the Cramér–Wold device. We have
\[
b^T\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*,k}_t
= \frac{1}{\sqrt n}b^T J\,\Gamma_{dn,k}^{1/2}\,\overline Z^{*}
= \sum_{i=1}^{dn}n^{-1/2}\,b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\,\overline Z^{*}_i
= \sum_{i=1}^{dn}U^*_i \qquad (7.7)
\]
with an obvious notation for $U^*_i$, where $\Gamma_{dn,k}^{1/2}(\bullet,i)$ denotes the $i$-th column of $\Gamma_{dn,k}^{1/2}$. Then, the desired result follows from a CLT for triangular arrays [cf. Billingsley (1995, p. 362)], which applies if the Lyapunov condition (for $\delta = 2$), i.e.
\[
\frac{1}{\bigl(\mathrm{Var}^*\bigl(\sum_{i=1}^{dn}U^*_i\bigr)\bigr)^2}\sum_{i=1}^{dn}E^*(U^{*4}_i) \to 0 \qquad (7.8)
\]
in probability as $n\to\infty$, is satisfied. Considering the denominator, we get that
\[
\Bigl(\sum_{i=1}^{dn}E^*(U^{*2}_i)\Bigr)^2
= \Bigl(\sum_{i=1}^{dn}\frac{1}{n}b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\,b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\Bigr)^2
= \Bigl(\frac{1}{n}b^T J\,\Gamma_{dn,k}\,J^T b\Bigr)^2
\]
is bounded from below and from above, due to $\rho(\Gamma_{dn,k}-\Gamma_{dn})\to 0$ for any $k\to\infty$ and (A3). With Lemma 7.3, we get for the numerator
\[
\sum_{i=1}^{dn}E^*(U^{*4}_i)
= \sum_{i=1}^{dn}\frac{1}{n^2}\bigl(b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\bigr)^4 E^*(\overline Z^{*4}_i)
\le \sum_{i=1}^{dn}\frac{1}{n^2}|b|_2^4\,\bigl|J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\bigr|_2^4\,O_P(d^2)
\]
and with $|J\,\Gamma_{dn,k}^{1/2}(\bullet,i)|_2^4 = O(k^4 d^2)$ uniformly in $i$, we obtain $\sum_{i=1}^{dn}E^*(U^{*4}_i) = O_P(d^5 k^4/n)$. Altogether, this leads to
\[
\frac{1}{\bigl(\mathrm{Var}^*\bigl(\sum_{j=1}^{dn}U^*_j\bigr)\bigr)^2}\sum_{j=1}^{dn}E^*(U^{*4}_j) = O_P\Bigl(\frac{k^4}{n}\Bigr) = o_P(1)
\]
for some appropriate sequence $k = k(n)$ that satisfies $\log_2(n)\,s_k = o(1)$ and $k^4 = o(n)$, which is assured to exist by (A1) with some $g>0$. $\Box$

Proof of Theorem 4.2.
Let $\overline J^{*}_n(\omega) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\overline Y^{*}_t e^{-it\omega}$ and $\overline J^{*,k}_n(\omega) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\overline Y^{*,k}_t e^{-it\omega}$ be the discrete Fourier transforms based on $\overline Y^{*}_1,\dots,\overline Y^{*}_n$ and $\overline Y^{*,k}_1,\dots,\overline Y^{*,k}_n$ as defined in (7.4) and in Lemma 7.1, respectively. The corresponding periodograms are denoted by $\overline I^{*}_n(\omega) = \overline J^{*}_n(\omega)\overline J^{*H}_n(\omega)$ and $\overline I^{*,k}_n(\omega) = \overline J^{*,k}_n(\omega)\overline J^{*,kH}_n(\omega)$, such that
\[
\overline f^{*}(\omega) = \frac{1}{n}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)\,\overline I^{*}_n(\omega_j)
\]
and $\overline f^{*,k}(\omega)$, defined analogously with $\overline I^{*}_n(\omega_j)$ replaced by $\overline I^{*,k}_n(\omega_j)$, are approximations to the bootstrap kernel spectral density estimator $\hat f^{*}(\omega)$. The proof proceeds through a sequence of lemmas. Lemma 7.5 gives the justification for considering
\[
\sqrt{nb}\bigl(\overline f^{*}_{pq}(\omega) - \tilde f_{pq}(\omega)\bigr)
= \sqrt{nb}\bigl(\overline f^{*}_{pq}(\omega) - E^*(\overline f^{*}_{pq}(\omega))\bigr)
+ \sqrt{nb}\bigl(E^*(\overline f^{*}_{pq}(\omega)) - \tilde f_{pq}(\omega)\bigr)
\]
in the following and to prove the CLT for these expressions, where $\tilde f$ is defined in Lemma 7.5 below. Lemma 7.6 gives the covariance structure of the stochastic leading term above, Lemma 7.7 deals with the asymptotics of the bias term, and Lemma 7.8 provides asymptotic normality.

Lemma 7.5. Under the assumptions of Theorem 4.2, it holds that
\[
\text{(i)}\;\; J^{*}_n(\omega) - \overline J^{*}_n(\omega) = o_{P^*}(1)
\qquad\text{and}\qquad
\text{(ii)}\;\; \overline J^{*}_n(\omega) - \overline J^{*,k}_n(\omega) = o_{P^*}(1)
\]
uniformly in $\omega$, respectively. Further, it holds that
\[
\text{(iii)}\;\; \sqrt{nb}\bigl(\hat f^{*}(\omega) - \overline f^{*}(\omega)\bigr) = o_{P^*}(1)
\qquad\text{and}\qquad
\text{(iv)}\;\; \sqrt{nb}\bigl(\overline f^{*}(\omega) - \overline f^{*,k}(\omega)\bigr) = o_{P^*}(1)
\]
and, for $\tilde f(\omega) = \frac{1}{2\pi}\sum_{h=-(n-1)}^{n-1}C(h)e^{-ih\omega}$, we have (v) $\sqrt{nb}\bigl(\hat f(\omega) - \tilde f(\omega)\bigr) = o_P(1)$ for all $\omega$.

Proof. (i) Let $J_\omega = (e^{-i1\omega},\dots,e^{-in\omega})\otimes I_d$ and $b\in\mathbb C^d$. Then, we have
\[
b^T\bigl(J^{*}_n(\omega) - \overline J^{*}_n(\omega)\bigr)
= \frac{1}{\sqrt{2\pi n}}\,b^T J_\omega\Bigl(\Gamma_{dn}^{1/2}(Z^* - \overline Z^{*}) + \bigl((\hat\Gamma^{\epsilon}_{\kappa,l})^{1/2} - \Gamma_{dn}^{1/2}\bigr)Z^*\Bigr)
\]
and analogously to the proof of Lemma 7.1 we obtain the claimed result, where uniformity follows from the fact that $b^T J_\omega J_\omega^{H}\overline b$ does not depend on $\omega$. Part (ii) follows similarly. (iii) Plugging in for $\hat f^{*}(\omega)$ and $\overline f^{*}(\omega)$ yields
\[
\rho\bigl(\sqrt{nb}\,(\hat f^{*}(\omega) - \overline f^{*}(\omega))\bigr)
\le \sqrt{\frac{b}{n}}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)\,\rho\Bigl(J^{*}_n(\omega_j)\bigl(J^{*}_n(\omega_j) - \overline J^{*}_n(\omega_j)\bigr)^H\Bigr)
+ \sqrt{\frac{b}{n}}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)\,\rho\Bigl(\bigl(J^{*}_n(\omega_j) - \overline J^{*}_n(\omega_j)\bigr)\bigl(\overline J^{*}_n(\omega_j)\bigr)^H\Bigr)
= A_1 + A_2.
\]
By using part (i) of this lemma, we get
\[
A_1 \le \sqrt{\frac{b}{n}}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)\,\rho\bigl(J^{*}_n(\omega_j)\bigr)\,\rho\bigl(J^{*H}_n(\omega_j) - \overline J^{*H}_n(\omega_j)\bigr)
= O_{P^*}\bigl(\sqrt{nb}\,\log_2(n)\,r_{l,n}\bigr)
\]
and an analogous result for $A_2$. Part (iv) follows in the same way and is omitted. (v) By plugging in for $\hat f(\omega)$ and $\tilde f(\omega)$, we get
\[
\bigl|\sqrt{nb}\,(\hat f(\omega) - \tilde f(\omega))\bigr|_1
\le \sqrt{nb}\sum_{h=-(n-1)}^{n-1}(1-\kappa_l(h))\,|C(h)|_1
+ \sqrt{nb}\sum_{h=-(n-1)}^{n-1}\kappa_l(h)\,\bigl|\hat C(h) - C(h)\bigr|_1
\]
and the first summand above is of order $O_P(\sqrt{nb}\cdot s_l)$ and the second one is $O_P(\sqrt{b}\,l)$. $\Box$

Lemma 7.6. Under the assumptions of Theorem 4.2, it holds that
\[
nb\,\mathrm{Cov}^*\bigl(\overline f^{*}_{pq}(\omega), \overline f^{*}_{rs}(\lambda)\bigr)
= \bigl(f_{pr}(\omega)f_{qs}(\omega)\,\delta_{\omega\lambda} + f_{ps}(\omega)f_{qr}(\omega)\,\tau_{0,\pi}\bigr)\frac{1}{2\pi}\int K^2(v)\,dv + o_P(1)
\]
for all $p,q,r,s = 1,\dots,d$ and all $\omega,\lambda\in[0,\pi]$.

Proof. We have
\[
nb\,\mathrm{Cov}^*\bigl(\overline f^{*}_{pq}(\omega), \overline f^{*}_{rs}(\lambda)\bigr)
= \frac{b}{n}\sum_{j_1,j_2=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_1})K_b(\lambda-\omega_{j_2})\,\mathrm{Cov}^*\bigl(\overline I^{*}_{n,pq}(\omega_{j_1}), \overline I^{*}_{n,rs}(\omega_{j_2})\bigr) \qquad (7.9)
\]
and the conditional covariance on the last right-hand side above becomes
\[
\frac{1}{4\pi^2 n^2}\sum_{t_1,t_2,t_3,t_4=1}^{n}\sum_{i_1,i_2,i_3,i_4=1}^{dn}
\Gamma_{dn}^{1/2}((t_1-1)d+p, i_1)\,\Gamma_{dn}^{1/2}((t_2-1)d+q, i_2)\,\Gamma_{dn}^{1/2}((t_3-1)d+r, i_3)\,\Gamma_{dn}^{1/2}((t_4-1)d+s, i_4)
\]
\[
\times\,\mathrm{Cov}^*\bigl(\overline Z^{*}_{i_1}\overline Z^{*}_{i_2}, \overline Z^{*}_{i_3}\overline Z^{*}_{i_4}\bigr)\,e^{-i(t_1-t_2)\omega_{j_1}}e^{i(t_3-t_4)\omega_{j_2}} \qquad (7.10)
\]
and, due to i.i.d. resampling, we have
\[
\mathrm{Cov}^*\bigl(\overline Z^{*}_{i_1}\overline Z^{*}_{i_2}, \overline Z^{*}_{i_3}\overline Z^{*}_{i_4}\bigr)
= \begin{cases}
1, & i_1=i_3,\; i_2=i_4 \ \text{ or } \ i_1=i_4,\; i_2=i_3,\\
E^*(\overline Z^{*4}_1) - 3, & i_1=i_2=i_3=i_4,\\
0, & \text{otherwise}.
\end{cases} \qquad (7.11)
\]
Both combinations of the first case in (7.11) together with (7.10) lead to
\[
\Bigl(\frac{1}{2\pi n}\sum_{t_1,t_3=1}^{n}C_{pr}(t_1-t_3)e^{-it_1\omega_{j_1}}e^{it_3\omega_{j_2}}\Bigr)\Bigl(\frac{1}{2\pi n}\sum_{t_2,t_4=1}^{n}C_{qs}(t_2-t_4)e^{it_2\omega_{j_1}}e^{-it_4\omega_{j_2}}\Bigr)
\]
\[
+ \Bigl(\frac{1}{2\pi n}\sum_{t_1,t_4=1}^{n}C_{ps}(t_1-t_4)e^{-it_1\omega_{j_1}}e^{-it_4\omega_{j_2}}\Bigr)\Bigl(\frac{1}{2\pi n}\sum_{t_2,t_3=1}^{n}C_{qr}(t_2-t_3)e^{it_2\omega_{j_1}}e^{it_3\omega_{j_2}}\Bigr)
\]
\[
= f_{pr}(\omega_{j_1})f_{qs}(\omega_{j_1})\,\mathbf 1(j_1=j_2) + f_{ps}(\omega_{j_1})f_{qr}(\omega_{j_1})\,\mathbf 1(j_1=-j_2) + o(1) \qquad (7.12)
\]
uniformly in $\omega_{j_1}$ and $\omega_{j_2}$, respectively. To handle the second case in (7.11), observe that we may replace all entries of $\Gamma_{dn}^{1/2}$ by the corresponding entries of its $k$-banded version $\Gamma_{dn,k}^{1/2}$ in (7.10) by Lemma 7.5. Therefore, we obtain
\[
\frac{1}{4\pi^2 n^2}\sum_{i=1}^{dn}\Bigl(\sum_{t_1=1}^{n}\Gamma_{dn,k}^{1/2}((t_1-1)d+p, i)e^{-it_1\omega_{j_1}}\Bigr)\Bigl(\sum_{t_2=1}^{n}\Gamma_{dn,k}^{1/2}((t_2-1)d+q, i)e^{it_2\omega_{j_1}}\Bigr)
\]
\[
\times\Bigl(\sum_{t_3=1}^{n}\Gamma_{dn,k}^{1/2}((t_3-1)d+r, i)e^{it_3\omega_{j_2}}\Bigr)\Bigl(\sum_{t_4=1}^{n}\Gamma_{dn,k}^{1/2}((t_4-1)d+s, i)e^{-it_4\omega_{j_2}}\Bigr)\bigl(E^*(\overline Z^{*4}_1) - 3\bigr)
= O_P\Bigl(\frac{k^4}{n}\Bigr)
\]
uniformly in $\omega_{j_1}$ and $\omega_{j_2}$, due to the banded shape of $\Gamma_{dn,k}^{1/2}$ and Lemma 7.3. Together with (7.9), the second case in (7.11) becomes $O_P(bk^4) = o_P(1)$ under the assumptions. $\Box$

Lemma 7.7. Under the assumptions of Theorem 4.2, it holds that
\[
E^*\bigl(\overline f^{*}_{pq}(\omega)\bigr) - \tilde f_{pq}(\omega)
= b^2 f''_{pq}(\omega)\,\frac{1}{4\pi}\int_{-\pi}^{\pi}v^2 K(v)\,dv + o_P(b^2)
\]
for all $p,q = 1,\dots,d$ and all $\omega$.

Proof. Straightforward calculations yield
\[
E^*\bigl(\overline f^{*}(\omega)\bigr) - \tilde f(\omega)
= \frac{1}{n}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)\,\frac{1}{2\pi}\sum_{h=-(n-1)}^{n-1}\Bigl(1-\frac{|h|}{n}\Bigr)C(h)\bigl(e^{-ih\omega_j} - e^{-ih\omega}\bigr)
+ \Bigl(\frac{1}{n}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j) - 1\Bigr)\frac{1}{2\pi}\sum_{h=-(n-1)}^{n-1}C(h)e^{-ih\omega},
\]
where the second term is negligible thanks to $\bigl|\frac{1}{n}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j) - 1\bigr| = O(\frac{1}{nb})$. The first term can be treated as in Lemma 7.4 in Jentsch and Kreiss (2010), which concludes this proof. $\Box$

Lemma 7.8. Under the assumptions of Theorem 4.2, $\sqrt{nb}\bigl(\overline f^{*}_{pq}(\omega_l) - E^*(\overline f^{*}_{pq}(\omega_l))\,:\, p,q=1,\dots,d;\; l=1,\dots,s\bigr)$ converges in distribution in probability to a centered $d^2 s$-dimensional normal distribution with covariance matrix as obtained in Lemma 7.6.

Proof. We prove asymptotic normality only for $\sqrt{nb}\bigl(\overline f^{*}_{pq}(\omega) - E^*(\overline f^{*}_{pq}(\omega))\bigr)$ in the following; the more general result follows from the Cramér–Wold device. First note that the quantity of interest may be expressed as a generalized quadratic form, i.e.
\[
\sqrt{nb}\bigl(\overline f^{*}_{pq}(\omega) - E^*(\overline f^{*}_{pq}(\omega))\bigr)
= \sum_{i_1,i_2=1}^{dn}\frac{\sqrt b}{2\pi\sqrt n}\sum_{t_1,t_2=1}^{n}\frac{1}{n}\sum_{j=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_j)e^{-i(t_1-t_2)\omega_j}
\,\Gamma_{dn}^{1/2}((t_1-1)d+p, i_1)\,\Gamma_{dn}^{1/2}((t_2-1)d+q, i_2)\bigl(\overline Z^{*}_{i_1}\overline Z^{*}_{i_2} - E^*(\overline Z^{*}_{i_1}\overline Z^{*}_{i_2})\bigr)
\]
\[
= \sum_{i_1,i_2=1}^{dn}w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2})
= \sum_{1\le i_1<i_2\le dn}W_{i_1 i_2} + \sum_{i=1}^{dn}w_{ii}(\overline Z^{*}_i, \overline Z^{*}_i)
\]
with an obvious notation for $w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2})$ and with $W_{i_1 i_2} := w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2}) + w_{i_2 i_1}(\overline Z^{*}_{i_2}, \overline Z^{*}_{i_1})$. Due to i.i.d. resampling, we can apply Theorem 2.1 of de Jong (1987) to the quadratic form. It is clean by an easy computation and its variance is bounded, so that it remains to show that
\[
\text{(i)}\;\; \max_{1\le i_1\le dn}\sum_{i_2=1}^{dn}E^*\bigl(|W_{i_1 i_2}|^2\bigr) \to 0
\qquad\text{and}\qquad
\text{(ii)}\;\; \frac{E^*\bigl|\sum_{1\le i_1<i_2\le dn}W_{i_1 i_2}\bigr|^4}{\bigl(E^*\bigl|\sum_{1\le i_1<i_2\le dn}W_{i_1 i_2}\bigr|^2\bigr)^2} \to 3
\]
hold in probability, respectively. To show (i), exemplarily, we consider only the first term of $|W_{i_1 i_2}|^2$ in more detail, that is, $|w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2})|^2$, and we get
\[
E^*\bigl(|w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2})|^2\bigr)
= \frac{b}{4\pi^2 n}\sum_{t_1,t_2,t_3,t_4=1}^{n}\frac{1}{n}\sum_{j_1=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_1})e^{-i(t_1-t_2)\omega_{j_1}}\,\frac{1}{n}\sum_{j_2=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_2})e^{i(t_3-t_4)\omega_{j_2}}
\]
\[
\times\Gamma_{dn}^{1/2}((t_1-1)d+p, i_1)\,\Gamma_{dn}^{1/2}((t_2-1)d+q, i_2)\,\Gamma_{dn}^{1/2}((t_3-1)d+p, i_1)\,\Gamma_{dn}^{1/2}((t_4-1)d+q, i_2)\,\mathrm{Var}^*\bigl(\overline Z^{*}_{i_1}\overline Z^{*}_{i_2}\bigr). \qquad (7.13)
\]
The conditional variance above equals $1 + (E^*(\overline Z^{*4}_1) - 2)\mathbf 1(i_1=i_2)$ and, by Lemma 7.5, we may replace all entries of $\Gamma_{dn}^{1/2}$ by the corresponding entries of its $k$-banded version $\Gamma_{dn,k}^{1/2}$ in the following. For any fixed $i_1$, we obtain by summing over $i_2$ for the first case
\[
\frac{b}{4\pi^2 n}\sum_{t_1,t_2,t_3,t_4=1}^{n}\frac{1}{n}\sum_{j_1=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_1})e^{-i(t_1-t_2)\omega_{j_1}}\,\frac{1}{n}\sum_{j_2=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_2})e^{i(t_3-t_4)\omega_{j_2}}
\]
\[
\times\Gamma_{dn,k}^{1/2}((t_1-1)d+p, i_1)\,\Gamma_{dn,k}((t_2-1)d+q, (t_4-1)d+q)\,\Gamma_{dn,k}^{1/2}((t_3-1)d+p, i_1)
= O\Bigl(\frac{k^2}{n}\Bigr) + o(bk^2)
\]
and, using Lemma 7.3, we get for the second case
\[
\frac{b}{4\pi^2 n}\sum_{t_1,t_2,t_3,t_4=1}^{n}\frac{1}{n}\sum_{j_1=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_1})e^{-i(t_1-t_2)\omega_{j_1}}\,\frac{1}{n}\sum_{j_2=-\lfloor\frac{n-1}{2}\rfloor}^{\lfloor\frac{n}{2}\rfloor}K_b(\omega-\omega_{j_2})e^{i(t_3-t_4)\omega_{j_2}}
\]
\[
\times\Gamma_{dn,k}^{1/2}((t_1-1)d+p, i_1)\,\Gamma_{dn,k}^{1/2}((t_2-1)d+q, i_2)\,\Gamma_{dn,k}^{1/2}((t_3-1)d+p, i_1)\,\Gamma_{dn,k}^{1/2}((t_4-1)d+q, i_2)
= O_P\Bigl(\frac{bk^4}{n}\Bigr)
\]
uniformly in $i_1$, respectively, and both terms above vanish asymptotically for some suitably chosen sequence $k$, which is possible by the imposed conditions. Being concerned with (ii), we consider first the numerator and get
\[
E^*\Bigl|\sum_{1\le i_1<i_2\le dn}W_{i_1 i_2}\Bigr|^4
= E^*\Bigl|\sum_{\substack{i_1,i_2=1\\ i_1\ne i_2}}^{dn}w_{i_1 i_2}(\overline Z^{*}_{i_1}, \overline Z^{*}_{i_2})\Bigr|^4
\]
and, by expanding the above expression, we see that the most contributing case is the one with four twins of equal indices. All other cases are of lower order. If we take all summands above into account, only three combinations of twins do not vanish, and each of them yields $\mathrm{const}^2\cdot f^4_{pq}(\omega)$. For the denominator, we get $(\mathrm{const}\cdot f^2_{pq}(\omega))^2$, which concludes this proof. For details, compare for instance the proof of Theorem 2 in Jentsch (2012). $\Box$

Proof of Theorem 5.1.
Under assumptions (A1′) and (A2′), we get the same bounds as obtained in Theorems 2.1 and 2.2, respectively, and $|C(h)|_1 \le d^2\sup_{n\in\mathbb N}\sup_{i,j=1,\dots,d(n)}|C_{ij}(h)|$ leads to the first part of (i). By similar arguments as employed in the proof of Corollary 2.1, we also get the second part under (A3′), as well as convergence to zero in probability under the imposed conditions. Analogously to the proof of Corollary 2.1, we get (ii) and (iii) by exploiting the bound (7.3) for the Cholesky factorization. $\Box$

Proof of Theorem 5.2.
We follow the proof of Theorem 4.1 and adopt the notation therein. First, analogously to the proof of Lemma 7.1, we get for any real-valued sequence $b = b(d(n))$ of $d(n)$-dimensional vectors with
\[
0 < M_1 \le |b(d(n))|_2^2 \le M_2 < \infty \quad\text{for all } n\in\mathbb N \qquad (7.14)
\]
that the following holds:
\[
\text{(i)}\;\; b^T\Bigl(\frac{1}{\sqrt n}\sum_{t=1}^{n}Y^*_t\Bigr) = b^T\Bigl(\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^*_t\Bigr) + O_{P^*}\bigl(\log_2(dn)\,d^2\,r_{l,n}\bigr),
\]
\[
\text{(ii)}\;\; b^T\Bigl(\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^*_t\Bigr) = b^T\Bigl(\frac{1}{\sqrt n}\sum_{t=1}^{n}\overline Y^{*,k}_t\Bigr) + O_{P^*}\bigl(\log_2(dn)\,d^2\,s_k\bigr),
\]
where $r_{l,n}$ is defined in Theorem 5.1 and $s_k = \sum_{h=k+1}^{\infty}\bigl\{\sup_{n\in\mathbb N}\sup_{i,j=1,\dots,d(n)}|C_{ij}(h)|\bigr\}$. Both $O_{P^*}$-terms above vanish under the imposed conditions. Similarly to the proof of Lemma 7.4, we want to show asymptotic normality for (7.7) and, therefore, we have to check the Lyapunov condition (7.8). For the denominator, we get that
\[
\Bigl(\sum_{i=1}^{dn}E^*(U^{*2}_i)\Bigr)^2
= \Bigl(\sum_{i=1}^{dn}\frac{1}{n}b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\,b^T J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\Bigr)^2
= \Bigl(\frac{1}{n}b^T J\,\Gamma_{dn,k}\,J^T b\Bigr)^2
\]
is bounded from below and from above, due to (7.14), (A3′) and $\rho(\Gamma_{dn,k}-\Gamma_{dn})\to 0$ for a suitable sequence $k\to\infty$, assured to exist by the imposed assumptions. For the numerator, we get
\[
\sum_{i=1}^{dn}E^*(U^{*4}_i)
\le \sum_{i=1}^{dn}\frac{1}{n^2}|b|_2^4\,\bigl|J\,\Gamma_{dn,k}^{1/2}(\bullet,i)\bigr|_2^4\,O_P(d^2) = O_P(d^5 k^4/n),
\]
where Lemma 7.3 for increasing time series dimension has been used. This gives
\[
\frac{1}{\bigl(\mathrm{Var}^*\bigl(\sum_{j=1}^{dn}U^*_j\bigr)\bigr)^2}\sum_{j=1}^{dn}E^*(U^{*4}_j) = O_P\Bigl(\frac{d^5 k^4}{n}\Bigr) = o_P(1)
\]
for some appropriate sequence $k = k(n)$ that satisfies $\log_2(dn)\,d^2\,s_k = o(1)$ and $d^5 k^4 = o(n)$, which is assured to exist by (A1′) with some sufficiently large $g>0$. Further, comparing the bootstrap variance $\hat v^2$ with its population counterpart $v^2$, we get
\[
|\hat v^2 - v^2|
= \frac{1}{n}\,\Bigl|b^T J\bigl\{E(YY^T) - E^*(Y^*Y^{*T})\bigr\}J^T b\Bigr|
= \frac{1}{n}\,\Bigl|b^T J\bigl\{\Gamma_{dn} - \hat\Gamma^{\epsilon}_{\kappa,l}\bigr\}J^T b\Bigr|
\le \Bigl(\frac{1}{n}b^T J J^T b\Bigr)^{1/2}\Bigl(\frac{1}{n}b^T J\bigl\{\Gamma_{dn} - \hat\Gamma^{\epsilon}_{\kappa,l}\bigr\}\bigl\{\Gamma_{dn} - \hat\Gamma^{\epsilon}_{\kappa,l}\bigr\}^T J^T b\Bigr)^{1/2}
\le |b|_2^2\,\rho\bigl(\Gamma_{dn} - \hat\Gamma^{\epsilon}_{\kappa,l}\bigr),
\]
which concludes the proof, as $\rho(\Gamma_{dn} - \hat\Gamma^{\epsilon}_{\kappa,l}) = o_P(1)$ and $|b|_2^2 = O(1)$ by assumption. $\Box$

Acknowledgement. The research of the first author was supported by a fellowship within the PhD program of the German Academic Exchange Service (DAAD) while Carsten Jentsch was visiting the University of California, San Diego, USA. The research of the second author was partially supported by NSF grant DMS 13-08319. The authors thank Timothy McMurry for his helpful advice on the univariate case, as well as three anonymous referees and the editor, who helped to significantly improve the presentation of the paper.

References

[1] Brillinger, D. (1981). Time Series: Data Analysis and Theory. Holden-Day, San Francisco.
[2] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley, Chichester.
[3] Brockwell, P.J. and Davis, R.A. (1988). Simple consistent estimation of the coefficients of a linear filter. Stochastic Processes and Their Applications 22, 47-59.
[4] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods, 2nd ed. Springer-Verlag, Berlin.
[5] Brockwell, P.J. and Mitchell, H. (1997). Estimation of the coefficients of a multivariate linear filter using the innovations algorithm. J. Time Ser. Anal. 18, No. 2, 157-179.
[6] Bühlmann, P. (1997). Sieve bootstrap for time series. Bernoulli 3, 123-148.
[7] Bühlmann, P. (2002). Bootstraps for time series. Statist. Sci. 17, 52-72.
[8] Cai, T., Ren, Z. and Zhou, H. (2013). Optimal rates of convergence for estimating Toeplitz covariance matrices. Probab. Theory Relat. Fields 156, 101-143.
[9] Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press, New York.
[10] Dedecker, J. and Doukhan, P. (2003). A new covariance inequality and applications. Stochastic Process. Appl. 106, No. 1, 63-80.
[11] Dedecker, J., Doukhan, P., Lang, G., León, J.R.R., Louhichi, S. and Prieur, C. (2007). Weak Dependence. With Examples and Applications. Lecture Notes in Statistics 190. Springer, New York.
[12] De Jong, P. (1987). A central limit theorem for generalized quadratic forms. Probab. Theory Relat. Fields 75, 261-277.
[13] Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer, New York.
[14] Drmac, Z., Omladic, M. and Veselic, K. (1994). On the perturbation of the Cholesky factorization. SIAM Journal on Matrix Analysis and Applications 15, 1319-1332.
[15] Goncalves, S. and Kilian, L. (2007). Asymptotic and bootstrap inference for AR(∞) processes with conditional heteroskedasticity. Econometric Reviews 26, 609-641.
[16] Hannan, E.J. (1970). Multiple Time Series. Wiley, New York.
[17] Härdle, W., Horowitz, J. and Kreiss, J.-P. (2003). Bootstrap methods for time series. Int. Statist. Rev. 71, No. 2, 435-459.
[18] Horn, R.A. and Johnson, C.R. (1990). Matrix Analysis. Cambridge University Press, New York.
[19] Jentsch, C. (2012). A new frequency domain approach of testing for covariance stationarity and for periodic stationarity in multivariate linear processes. J. Time Ser. Anal. 33, 177-192.
[20] Jentsch, C. and Kreiss, J.-P. (2010). The multiple hybrid bootstrap - resampling multivariate linear processes. J. Multivariate Anal. 101, 2320-2345.
[21] Jentsch, C. and Politis, D.N. (2013). Valid resampling of higher order statistics using linear process bootstrap and autoregressive sieve bootstrap. Commun. Stat., Theory Methods 42, 1277-1293.
[22] Kreiss, J.-P. (1992). Bootstrap procedures for AR(∞) processes. In: Bootstrapping and Related Techniques (Jöckel, K.H., Rothe, G., Sendler, W., eds.), Lecture Notes in Economics and Mathematical Systems 376, Springer-Verlag, Heidelberg, pp. 107-113.
[23] Kreiss, J.-P. and Paparoditis, E. (2011). Bootstrap for dependent data: a review. J. Korean Statist. Soc. 40, 357-378.
[24] Kreiss, J.-P., Paparoditis, E. and Politis, D.N. (2011). On the range of validity of the autoregressive sieve bootstrap. Ann. Statist. 39, 2103-2130.
[25] Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, No. 3, 1217-1241.
[26] Lahiri, S.N. (2003). Resampling Methods for Dependent Data. Springer, New York.
[27] Lewis, R. and Reinsel, G. (1985). Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16, 393-411.
[28] Liu, R.Y. and Singh, K. (1992). Moving blocks jackknife and bootstrap capture weak dependence. In: Exploring the Limits of Bootstrap (LePage, R. and Billard, L., eds.), Wiley, New York, 225-248.
[29] McMurry, T.L. and Politis, D.N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J. Time Ser. Anal. 31, 471-482. Corrigendum: J. Time Ser. Anal. 33, 2012.
[30] Paparoditis, E. (2002). Frequency domain bootstrap for time series. In: Empirical Process Techniques for Dependent Data (Dehling, H., Mikosch, T. and Sørensen, M., eds.), Birkhäuser, Boston, 365-381.
[31] Politis, D.N. (2001). On nonparametric function estimation with infinite-order flat-top kernels. In: Probability and Statistical Models with Applications (Charalambides, Ch. et al., eds.), Chapman and Hall/CRC, Boca Raton, 469-483.
[32] Politis, D.N. (2003a). The impact of bootstrap methods on time series analysis. Statistical Science 18, No. 2, 219-230.
[33] Politis, D.N. (2003b). Adaptive bandwidth choice. J. Nonparametric Statist. 15, No. 4-5, 517-533.
[34] Politis, D.N. (2011). Higher-order accurate, positive semi-definite estimation of large-sample covariance and spectral density matrices. Econometric Theory 27, No. 4, 703-744.
[35] Politis, D.N. and Romano, J.P. (1992). A general resampling scheme for triangular arrays of alpha-mixing random variables with application to the problem of spectral density estimation. Ann. Statist. 20, No. 4, 1985-2007.
[36] Politis, D.N. and Romano, J.P. (1994). Limit theorems for weakly dependent Hilbert space valued random variables with applications to the stationary bootstrap. Statistica Sinica 4, No. 2, 461-476.
[37] Politis, D.N. and Romano, J.P. (1995). Bias-corrected nonparametric spectral estimation. J. Time Ser. Anal. 16, No. 1, 67-104.
[38] Rissanen, J. and Barbosa, L. (1969). Properties of infinite covariance matrices and stability of optimum predictors. Information Sci. 1, 221-236.
[39] Shao, X. (2010). The dependent wild bootstrap. J. Am. Stat. Assoc. 105, No. 489, 218-235.
[40] Wu, W.B. (2005). Nonlinear system theory: another look at dependence. Proceedings of the National Academy of Sciences 102, 14150-14154.
[41] Wu, W.B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Statistica Sinica 19, No. 4, 1755-1768.

Department of Economics, University of Mannheim, L7, 3-5, 68131 Mannheim, Germany

E-mail address: [email protected]

Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112, USA

E-mail address: [email protected]


[Figure 1 here: twelve panels showing RMSE and coverage rate against the tuning parameter (0-20) for n = 100, 200, 500; see the caption below.]

Figure 1. Comparison of bootstrap performance: RMSE for estimating the covariance matrix of $\sqrt n(\overline X - \mu)$ and CR of bootstrap confidence intervals for $\mu$ by MLPB (solid), ARsieve (dashed), MBB (dotted) and TBB (dash-dotted) are reported for the WN model with sample sizes $n \in \{100, 200, 500\}$ and tuning parameters $l, p, s \in \{1, \dots, 20\}$. The upper half of the panels corresponds to $\mathrm{Var}(\sqrt n(\overline X_1 - \mu_1))$ and $\mu_1$ and the lower half to $\mathrm{Var}(\sqrt n(\overline X_2 - \mu_2))$ and $\mu_2$. Line segments indicate the results for MLPB with data-adaptive individual (grey) and global (black) banding parameter choices.


[Figure 2 here: twelve panels showing RMSE and coverage rate against the tuning parameter (0-20) for n = 100, 200, 500; see the caption below.]

Figure 2. Comparison of bootstrap performance: RMSE for estimating the covariance matrix of $\sqrt n(\overline X - \mu)$ and CR of bootstrap confidence intervals for $\mu$ by MLPB (solid), ARsieve (dashed), MBB (dotted) and TBB (dash-dotted) are reported for the VMA(1) model with sample sizes $n \in \{100, 200, 500\}$ and tuning parameters $l, p, s \in \{1, \dots, 20\}$. The upper half of the panels corresponds to $\mathrm{Var}(\sqrt n(\overline X_1 - \mu_1))$ and $\mu_1$ and the lower half to $\mathrm{Var}(\sqrt n(\overline X_2 - \mu_2))$ and $\mu_2$. Line segments indicate the results for MLPB with data-adaptive individual (grey) and global (black) banding parameter choices.


[Figure 3 here: twelve panels showing RMSE and coverage rate against the tuning parameter (0-20) for n = 100, 200, 500; see the caption below.]

Figure 3. Comparison of bootstrap performance: RMSE for estimating the covariance matrix of $\sqrt n(\overline X - \mu)$ and CR of bootstrap confidence intervals for $\mu$ by MLPB (solid), ARsieve (dashed), MBB (dotted) and TBB (dash-dotted) are reported for the VAR(1) model with sample sizes $n \in \{100, 200, 500\}$ and tuning parameters $l, p, s \in \{1, \dots, 20\}$. The upper half of the panels corresponds to $\mathrm{Var}(\sqrt n(\overline X_1 - \mu_1))$ and $\mu_1$ and the lower half to $\mathrm{Var}(\sqrt n(\overline X_2 - \mu_2))$ and $\mu_2$. Line segments indicate the results for MLPB with data-adaptive individual (grey) and global (black) banding parameter choices.