degenerate u- and v-statistics under weak …leucht/u_stat_dep.pdfu- and v-statistics of dependent...

29
Degenerate U - and V -statistics under weak dependence: Asymptotic theory and bootstrap consistency Anne Leucht* Friedrich-Schiller-Universit¨ at Jena Institut f¨ ur Stochastik Ernst-Abbe-Platz 2 D-07743 Jena Germany E-mail: [email protected] Abstract We devise a general result on the consistency of model-based bootstrap methods for U - and V -statistics under easily verifiable conditions. For that purpose we derive the limit distribution of degree-2 degenerate U - and V -statistics for weakly dependent R d -valued random variables first. To this end, only some moment conditions and smoothness assump- tions concerning the kernel are required. Based on this result, we verify that the bootstrap counterparts of these statistics converge to the same limit distributions. Finally, some applications to hypothesis testing are presented. 2000 Mathematics Subject Classification. 62E20, 62G09. Keywords and Phrases. Bootstrap, consistency, U -statistics, V -statistics, weak de- pendence. Short title. Bootstrap for U -statistics under weak dependence.

Upload: others

Post on 12-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

Degenerate U- and V -statistics under weak dependence:Asymptotic theory and bootstrap consistency

Anne Leucht*Friedrich-Schiller-Universitat Jena

Institut fur StochastikErnst-Abbe-Platz 2

D-07743 JenaGermany

E-mail: [email protected]

Abstract

We devise a general result on the consistency of model-based bootstrap methods for U -and V -statistics under easily verifiable conditions. For that purpose we derive the limitdistribution of degree-2 degenerate U - and V -statistics for weakly dependent Rd-valuedrandom variables first. To this end, only some moment conditions and smoothness assump-tions concerning the kernel are required. Based on this result, we verify that the bootstrapcounterparts of these statistics converge to the same limit distributions. Finally, someapplications to hypothesis testing are presented.

2000 Mathematics Subject Classification. 62E20, 62G09.Keywords and Phrases. Bootstrap, consistency, U -statistics, V -statistics, weak de-pendence.Short title. Bootstrap for U -statistics under weak dependence.

Page 2: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

1

1. Introduction

Numerous test statistics can be formulated or approximated in terms of degen-erate U - or V -type statistics multiplied with the sample size. Examples include theCramer-von Mises statistic, the Anderson-Darling statistic or the χ2-statistic. Fori.i.d. random variables the limit distributions of U - and V -statistics can be derivedvia a spectral decomposition of its kernel if the latter is squared integrable. To use thesame method for dependent data, often restrictive assumptions are required whosevalidity is quite complicated or even impossible to check in many cases. The firstof our two main results is the derivation of the asymptotic distributions of U - andV -statistics under assumptions that are fairly easy to check. This approach is basedon a wavelet decomposition instead of a spectral decomposition of the kernel.

These limit distributions for both independent and dependent observations de-pend on certain parameters which in turn depend on the underlying situation in acomplicated way. Therefore, problems arise as soon as critical values for test statis-tics of U - and V -type have to be determined. The bootstrap offers a convenient wayto circumvent these problems, see Arcones and Gine [1], Dehling and Mikosch [8] orLeucht and Neumann [23] for the i.i.d. case. To our knowledge, there are no resultsconcerning bootstrapping general degenerate U -statistics of non-independent obser-vations. As a second main result of the paper, we establish consistency of model-basedbootstrap methods for statistics of weakly dependent data.

In order to describe the dependence structure of the sample we do not use theconcept of mixing. It is inappropriate in some sense since not only the asymptoticbehaviour of U - and V -type statistics but also bootstrap consistency is focussed inthis paper. Unfortunately, model-based bootstrap methods can yield samples thatare no longer mixing even though the original sample satisfies some mixing condition.For further discussion of this issue, see Doukhan and Neumann [14]. Instead of mixingwe suppose the sample to be τ -dependent in the sense of Dedecker and Prieur [7].

Our paper is organized as follows. We start with an overview of asymptotic resultson degenerate U - and V -statistics of dependent random variables. In Subsection 2.2we introduce the underlying concept of weak dependence and derive the asymptoticdistributions of U - and V -statistics. On the basis of these results, we deduce con-sistency of general bootstrap methods in Section 3. Some applications of the theoryto hypotheses testing are presented in Section 4. All proofs are deferred to a finalSection 5.

2. Asymptotic distributions of U- and V -statistics

2.1. Survey of literature. Let (Xn)n∈N be a sequence of Rd-valued random vari-ables on some probability space. In the case of i.i.d. random variables, the limitdistribution of degenerate U - and V -type statistics, i.e.

nUn =1

n

n∑j=1

∑k 6=j

h(Xj, Xk) and nVn =1

n

n∑j,k=1

h(Xj, Xk),

with h : Rd × Rd → R symmetric and∫

Rd h(x, y)dF (x) = 0, ∀ y ∈ Rd, can bederived by using a spectral decomposition of h(x, y) =

∑∞k=1 λkΦk(x)Φk(y) which

holds true in the L2-sense. Here, (Φk)k denote orthonormal eigenfunctions and (λk)k

Page 3: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

2

the corresponding eigenvalues of the integral equation∫Rd

h(x, y)g(y)dF (y) = λg(y), (2.1)

where F denotes the distribution function of X1. Approximate nUn by nU(K)n :=∑K

k=1 λk

(1√n

∑ni=1 Φk(Xi)

)2

− 1n

∑ni=1 Φ2

k(Xi)

. Then the sum under the round

brackets is asymptotically normal while the latter sum converges in probability to 1.Finally, one obtains

nUnd−→

∞∑k=1

λk(Z2k − 1), (2.2)

where (Zk)k is a sequence of i.i.d. standard normal random variables, cf. Serfling [25].

If additionally E|h(X1, X1)| < ∞, Slutsky’s theorem implies Vnd−→∑∞

k=1 λk(Z2k −

1) + Eh(X1, X1). (Here,d−→ denotes convergence in distribution.)

So far, most previous attempts to derive the limit distribution of degenerateU - and V -statistics of dependent random variables are based on the adoption ofthis method of proof. Eagleson [15] developed the asymptotic theory in the caseof a strictly stationary sequence of φ-mixing, real-valued random variables underthe assumption of absolutely summable eigenvalues. This condition is satisfied ifthe kernel function is of the form h(x, y) =

∫R h1(x, z)h2(z, y)dF (z). Using general

heavy-tailed weight functions instead of F , the eigenvalues are not necessarily abso-lutely summable. For an example see de Wet [11]. Carlstein [4] analysed U -statisticsof α-mixing, real-valued random variables in the case of finitely many eigenfunc-tions. He derived a limit distribution of the form (2.2), where (Zk)k∈N is a sequenceof centered normal random variables. Denker [9] considered stationary functionals(Xn = f(Zn, Zn+1, . . . ))n of β-mixing random variables (Zn)n. He assumed f andthe cumulative distribution function of X0 to be Holder continuous. Imposing somesmoothness condition on h the limit distribution of nUn is derived under the addi-tional assumption ‖Φk‖∞ < ∞, ∀ k ∈ N. The condition on (Φk)k is difficult or evenimpossible to check in a multitude of cases since this requires to solve the associatedintegral equation (2.1). Similar difficulties occur if one wants to apply the results ofDewan and Prakasa Rao [10] or Huang and Zhang [21]. They studied U -statisticsof associated, real-valued random variables. Besides the absolute summability of theeigenvalues some regularity conditions have to be satisfied uniformly by the eigen-functions, in order to obtain the asymptotic distribution of nUn.

A different approach was used by Babbel [2] to determine the limit distribution ofU -statistics of φ- and β-mixing random variables. She deduced the limit distributionvia a Haar-wavelet decomposition of the kernel and empirical process theory with-out imposing the critical conditions mentioned above. However, this approach is notsuitable when dealing with U -statistics of τ -dependent random variables since Lip-schitz continuity will be the crucial property of the (approximating) kernel in orderto exploit the underlying dependence structure.

2.2. Main results. Let (Xn)n∈N be a sequence of Rd-valued random variables onsome probability space (Ω,A, P ). In this subsection we derive the limit distributions

Page 4: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

3

of

nUn =1

n

n∑j=1

∑k 6=j

h(Xj, Xk) and nVn =1

n

n∑j,k=1

h(Xj, Xk),

where h : Rd × Rd → R is a symmetric function with∫

Rd h(x, y)dF (x) = 0 ∀ y ∈ Rd.In order to describe the dependence structure of (Xn)n∈N, we recall the definition ofthe τ -dependence coefficient for Rd-valued random variables of Dedecker and Prieur[7]:

Definition 2.1. Let (Ω,A, P ) be a probability space, M a σ-algebra of A and Xan Rd-valued random variable (d ∈ N). Assume that E‖X‖l1 < ∞, where ‖x‖l1 =∑d

i=1 |xi|, and define

τ(M, X) = E

(sup

f∈Λ1(Rd)

∣∣∣∣∫Rd

f(x)dFX|M(x)−∫

Rd

f(x)dF (x)

∣∣∣∣).

Here, FX|M denotes the conditional distribution function of the distribution of Xgiven M, and Λ1(Rd) denotes the set of 1-Lipschitz functions from Rd to R.

We assume

(A1) (i) (Xn)n∈N is a (strictly) stationary sequence of Rd-valued random variables(d ∈ N) on some probability space (Ω,A, P ) with E‖X0‖l1 <∞.

(ii) The sequence (τr)r∈N defined by

τr := supτ(σ(Xs1 , . . . , Xsu), (X ′t1, X ′

t2, X ′

t3)′) |

u ∈ N, s1 ≤ · · · ≤ su < su + r ≤ t1 ≤ t2 ≤ t3 ∈ N

satisfies∑∞

r=1 rτδr < ∞ for some δ ∈ (0, 1). (Here, prime denotes the

transposition.)

Remark 1. If Ω is rich enough, due to Dedecker and Prieur [6] the validity of (A1)

allows for the construction of a random vector(X ′

t1, X ′

t2, X ′

t3

)′ d=(X ′

t1, X ′

t2, X ′

t3

)′,

independent of Xs1 , ..., Xsu with

τr =3∑

i=1

E‖Xti −Xti‖l1 . (2.3)

We obtain an upper bound for the dependence coefficient τr ≤ 6∫ β(r)

0Q|X0|(u)du,

where Q|X0|(u) = inft ∈ R |P (‖X0‖l1 > t) ≤ u, u ∈ [0, 1] and β(r) denotesthe ordinary β-mixing coefficient β(r) := E supB∈σ(Xs,s>t+r),t∈Z |P (B|σ(Xs, s ≤ t)) −P (B)|). This is a consequence of Remark 2 of Dedecker and Prieur [6]. Moreover,(A1) is related to the concept of weak dependence which was introduced by Doukhanand Louhichi [13]. Remark 1 immediately implies

|cov (h(Xs1 , . . . , Xsu), k(Xt1 , . . . , Xtv))| ≤ 2‖h‖∞Lip(k)⌈v3

⌉τr (2.4)

for s1 ≤ · · · ≤ su < su+r ≤ t1 ≤ · · · ≤ tv ∈ N and for all functions h : Ru → R and k :Rv → R in L := f : Rp → R for some p ∈ N | Lipschitz continuous and bounded.Therefore, a sequence of random variables, that satisfies (A1), is ((τr)r,L, ψ)-weakdependent with ψ(h, k, u, v) = 2‖h‖∞Lip(k)

⌈v3

⌉. (Here and in the sequel, Lip(g)

denotes the Lipschitz constant of a generic function g.)

Page 5: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

4

A list of examples for τ -dependent processes including causal linear and functionalautoregressive processes is provided by Dedecker and Prieur [7].

Besides the conditions on the dependence structure of (Xn)n∈N we make the followingassumptions concerning the kernel:

(A2) (i) The kernel h : Rd × Rd → R is a symmetric, measurable function anddegenerate under F , i.e.

∫Rd h(x, y)dF (x) = 0, ∀ y ∈ Rd.

(ii) For a δ satisfying (A1)(ii) the following moment constraints hold truewith some ν > (2− δ)/(1− δ):

supk∈N

E|h(X1, Xk)|ν <∞ and E|h(X1, X1)|ν <∞,

where X1 is an independent copy of X1.

(A3) The kernel h is Lipschitz continuous.

Using an appropriate kernel truncation it is possible to reduce the problem of derivingthe asymptotic distribution of nVn to statistics with bounded kernel functions.

Lemma 2.1. Suppose that (A1), (A2), and (A3) are fulfilled. Then there exists afamily of bounded functions (hc)c∈R+ satisfying (A2) and (A3) uniformly and suchthat

limc→∞

lim supn→∞

n2 E(Vn − Vn,c)2 = 0, (2.5)

where Vn,c = 1n2

∑nj,k=1 hc(Xj, Xk).

After this simplification of the problem, we intend to develop a decomposition ofthe kernel that allows for the application of a central limit theorem (CLT) for weaklydependent random variables. One could try to imitate the proof of the i.i.d. case.According to the discussion in the previous subsection this leads to prerequisites thatcan hardly be checked in numerous cases. Therefore, we do not use a spectral de-composition of the kernel in order to apply the CLT but a wavelet decompositioninstead. It turns out that Lipschitz continuity is the central property, the kernelfunction should satisfy, in order to exploit (2.3). For this reason the choice of Haarwavelets, as they were employed by Babbel [2], is inappropriate. Instead, the appli-cation of Lipschitz continuous scale and wavelet functions is more suitable here.In the sequel, let φ and ψ denote scale and wavelet functions associated with an one-dimensional multiresolution analysis. As illustrated by Daubechies [5], Sect. 8, thesefunctions can be selected in such a manner that they possess the following properties:

(1) φ and ψ are Lipschitz continuous,(2) φ and ψ have compact support,(3)

∫∞−∞ φ(x)dx = 1 and

∫∞−∞ ψ(x)dx = 0.

It is well-known that an orthonormal basis of L2(Rd) can be constructed on thebase of φ and ψ. For this purpose define E := 0, 1d\0d, where 0d denotes thed-dimensional null vector. In addition, set

ϕ(i) :=

φ for i = 0,

ψ for i = 1

Page 6: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

5

and define the functions Ψ(e)j,k : Rd → R for j ∈ Z, k = (k1, . . . , kd)

′ ∈ Zd by

Ψ(e)j,k(x) := 2jd/2

d∏i=1

ϕ(ei)(2jxi−ki), ∀ e = (e1, . . . , ed)′ ∈ E, ∀x = (x1, . . . , xd)

′ ∈ Rd.

The system(Ψ

(e)j,k

)e∈E,j∈Z,k∈Zd

is an orthonormal basis of L2(Rd), see Wojtaszczyk

[26], Sect. 5. The same holds true for

(Φj0,k)k∈Zd

⋃(Ψ

(e)j,k

)e∈E,j≥j0,k∈Zd

, j0 ∈ Z,

where Φj,k : Rd → R is given by Φj,k(x) := 2jd/2∏d

i=1 φ(2jxi − ki) for j ∈ Z, k ∈ Zd.Thus, an L2-approximation of Vn,c by a statistic based on a wavelet approximation

of hc can be established. To this end we introduce h(K,L)c with

h(K,L)c (x, y) :=

∑k1,k2∈−L,...,Ld

α(c)j0;k1,k2

Φj0,k1(x)Φj0,k2(y)

+

J(K)−1∑j=j0

∑k1,k2∈−L,...,Ld

∑e∈E

β(c,e)j;k1,k2

Ψ(e)j;k1,k2

(x, y),

where E := (E × E) ∪ (E × 0d) ∪ (0d × E),

Ψ(e)j;k1,k2

:=

Ψ

(e1)j,k1

Ψ(e2)j,k2

for (e′1, e′2)′ ∈ E × E,

Ψ(e1)j,k1

Φj,k2 for (e′1, e′2)′ ∈ E × 0d,

Φj,k1Ψ(e2)j,k2

for (e′1, e′2)′ ∈ 0d × E

α(c)j0;k1,k2

=∫∫

Rd×Rd hc(x, y)Φj0,k1(x)Φj0,k2(y)dx dy and

β(c,e)j;k1,k2

=∫∫

Rd×Rd hc(x, y)Ψ(e)j;k1,k2

(x, y)dx dy. We refer to the degenerate version of

h(K,L)c as h

(K,L)c , given by

h(K,L)c (x, y) = h(K,L)

c (x, y)−∫

Rd

h(K,L)c (x, y)dF (x)−

∫Rd

h(K,L)c (x, y)dF (y)

+

∫∫Rd×Rd

h(K,L)c (x, y)dF (x)dF (y).

The associated V -type statistic will be denoted by V(K,L)n,c .

Lemma 2.2. Assume that (A1), (A2), and (A3) are fulfilled. Then (J(K))K∈N ⊆ Nwith J(K) −→

K→∞∞ can be chosen such that

limK→∞

limL→∞

lim supn→∞

n2 E(Vn,c − V (K,L)

n,c

)2= 0.

Employing the CLT of Neumann and Paparoditis [24] and the continuous mappingtheorem we obtain the limit distribution of nV K,L

n,c . Finally, based on this result,the limit distribution of the V -type statistic nVn can be derived. Moreover, thelaw of large numbers allows for deducing the asymptotic distribution of nUn sincenUn = n

n−1

[nVn − 1

n

∑nk=1 h(Xk, Xk)

].

Page 7: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

6

Theorem 2.1. Suppose that the assumptions (A1), (A2), and (A3) are fulfilled.Then, as n→∞,

nVnd−→ Z and nUn

d−→ Z − Eh(X1, X1)

with

Z := limc→∞

∑k1,k2∈Zd

α(c)j0;k1,k2

Zk1Zk2 +∞∑

j=j0

∑k1,k2∈Zd

∑e=(e′1,e′2)′∈E

β(c,e)j;k1,k2

Z(e1)j;k1

Z(e2)j;k2

.

Here, (Zk)k∈Zd and (Z(e)j;k)j≥j0, k∈Zd, e∈0,1d are centered normally distributed random

variables and the r.h.s. converges in the L2-sense.

As in the case of i.i.d. random variables, the limit distribution of nVn is a weightedsum of products of centered normal random variables. In contrast to many other re-sults in the literature, the prerequisites of this theorem, namely moment constraintsand Lipschitz continuity of the kernel, can be checked fairly easily in many cases.Unfortunately, the asymptotic distribution still has a complicated structure. Hence,quantiles can hardly be determined on the basis of the previous result. However, thisproblem plays a minor role since we show in the following section that the condi-tional distribution of the bootstrap counterparts of nUn and nVn, given X1, . . . , Xn,converge to the same limits in probability.

Of course, the assumption of Lipschitz continuous kernels is rather restrictive. Thus,we intend to extend our theory to a more general class of kernel functions. The costsfor enlarging the class of feasible kernels are additional moment constraints.Besides (A1) and (A2) we assume

(A4) (i) The kernel function satisfies

|h(x, y)− h(x, y)| ≤ f(x, x, y, y) [‖x− x‖l1 + ‖y − y‖l1 ] ,∀x, x, y, y ∈ Rd

where f : R4d → R is continuous. Moreover, for η := 1/(1− δ) and somea > 0

supk1,...,k5∈N

E(

maxc1,c2∈[−a,a]d

f(Yk1 , Yk2 + c1, Yk3 , Yk4 + c2)η(1 + ‖Yk5‖l1)

)<∞,

for any sequence (Yk)k∈N with Yk = Xk or Yk = Xk, where Xk denotes anindependent copy of Xk.

(ii)∑∞

r=1 r(τr)δ2<∞.

Even though the assumption (A4)(i) has a rather technical structure, it is satisfied e.g.by polynomial kernel functions as long as the sample variables have sufficiently manyfinite moments. Analogously to Lemma 2.1 and Lemma 2.2 the following assertionholds.

Lemma 2.3. Suppose that (A1), (A2), and (A4) are fulfilled. Then (J(K))K∈N ⊂ Nwith J(K) −→

K→∞∞ can be chosen such that

limc→∞

limK→∞

limL→∞

lim supn→∞

E(Vn − V (K,L)n,c )2 = 0.

This auxiliary result implies the analogue to Theorem 2.1 for non-Lipschitz kernels.

Page 8: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

7

Theorem 2.2. Assume that (A1), (A2), and (A4) are satisfied. Then, as n→∞,

nVnd−→ Z and nUn

d−→ Z − Eh(X1, X1),

where Z is defined as in Theorem 2.1.

3. Consistency of general bootstrap methods

As we have seen in the previous section, the limit distributions of degenerate U -and V -statistics have a rather complicated structure. Therefore, in the majority ofcases it is quite difficult to determine quantiles which are required in order to deriveasymptotic critical values of U - and V -type test statistics. The bootstrap offers asuitable way of determining these quantities.

Given X1, . . . , Xn, let X∗ and Y ∗ denote vectors of bootstrap random variableswith values in Rd1 and Rd2 . In order to describe the dependence structure of thebootstrap sample, we introduce Xn := (X ′

1, . . . , X′n)′,

τ ∗(Y ∗, X∗, ω) := E

(sup

f∈Λ1(Rd1 )

∣∣∣∣∫Rd1

f(x)dFX∗|Y ∗(x)−

∫Rd1

f(x)dFX∗(x)

∣∣∣∣ ∣∣∣Xn = ω

)and make the following assumptions:

(A1∗) (i) The sequence of bootstrap variables is stationary with probability tending

to one. Additionally, (X∗′t1, X∗′

t2)′

d−→ (X ′t1, X ′

t2)′, ∀ t1, t2 ∈ N, holds true

in probability.(ii) Conditionally on X1, . . . , Xn, the random variables (X∗

k)k∈Z are τ -weaklydependent, i.e. there exist a sequence of coefficients (τr)r∈N and a sequence

of sets (Ω(1)n )n∈N with P (Xn ∈ Ω

(1)n ) −→

n→∞1 and the following property: For

any sequence (ωn)n∈N with ωn ∈ Ω(1)n ,

τ ∗r (ωn) := supτ ∗((X∗′s1, . . . , X∗′

su)′, (X∗′

t1, X∗′

t2, X∗′

t3)′, ωn) |

u ∈ N, s1 ≤ · · · ≤ su < su + r ≤ t1 ≤ t2 ≤ t3 ∈ N

can be bounded by some τr such that the sequence (τr)r∈N satisfies∑∞r=1 r(τr)

δ <∞.

Remark 2. Neumann and Paparoditis [24] proved that in case of stationary Markovchains of finite order, the key for convergence of the finite-dimensional distributionsis convergence of the conditional distributions, cf. their Lemma 4.2. In particular,they showed that AR(p)-bootstrap and ARCH(p)-bootstrap yield samples that satisfy(A1∗)(i).

Lemma 3.1. Suppose that (A1) and (A1∗) hold true. Further let h : Rd × Rd → Rbe a bounded, symmetric, Lipschitz continuous function such thatEh(X1, y) = E(h(X∗

1 , y)|X1, . . . , Xn) = 0, ∀ y ∈ Rd. Then,

1

n

n∑j,k=1

h(X∗j , X

∗k)

d−→ Z and1

n

n∑j=1

∑k 6=j

h(X∗j , X

∗k)

d−→ Z − Eh(X1, X1)

hold in probability. Here, Z is defined as in Theorem 2.1.

Page 9: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

8

In order to deduce bootstrap consistency, additionally, convergence in a certain

metric ρ is required, i.e. ρ(P ( 1

n

∑nj,k=1 h(X

∗j , X

∗k) ≤ x|X1, ..., Xn), P (nVn ≤ x)

)P−→

0. (Here,P−→ denotes convergence in probability.) Convergence in the uniform

metric follows from Lemma 3.1 if the limit distribution has a continuous cumulativedistribution function. The next assertion gives a necessary and sufficient conditionfor this.

Lemma 3.2. The limit variable Z, derived in Theorem 2.1 / Theorem 2.2 under(A1), (A2), and (A3)/(A4), has a continuous cumulative distribution function ifvar(Z) > 0.

Kernels of statistics emerging from goodness-of-fit tests for composite hypothesesoften depend on an unknown parameter. We intend to employ bootstrap consistencyfor this setting, i.e. when parameters have to be estimated. Moreover, we enlarge theclass of feasible kernels. For this purpose we additionally assume

(A2∗) (i) θnP−→ θ ∈ Rp.

(ii) E(h(X∗

1 , y, θn)|X1, . . . , Xn

)= 0, ∀ y ∈ Rd.

(iii) For some δ satisfying (A1∗)(ii), ν > (2−δ)/(1−δ) and a constant C1 <∞there exists a sequence of sets (Ω

(2)n )n∈N such that P (Xn ∈ Ω

(2)n ) −→

n→∞1

and ∀ (ωn)n∈N with ωn ∈ Ω(2)n the following moment constraint holds true

sup1≤k≤n

E(|h(X∗

1 , X∗k , θn)|ν + |h(X∗

1 , X∗1 , θn)|ν |Xn = ωn

)≤ C1,

where (conditionally on X1, . . . , Xn) X∗1 denotes an independent copy of

X∗1 .

and

(A3∗) (i) The kernel satisfies

|h(x, y, θn)− h(x, y, θn)| ≤ f(x, x, y, y, θn) [‖x− x‖l1 + ‖y − y‖l1 ] ,∀x, x, y, y ∈ Rd,

where f : R4d × Rp → R is continuous on R4d × U(θ) for some compactneighborhood U(θ) of θ. Moreover, for η := 1/(1 − δ), some a > 0

and some C2 < ∞ there exists a sequence of sets (Ω(3)n )n∈N such that

P (Xn ∈ Ω(3)n ) −→

n→∞1 and ∀ (ωn)n∈N with ωn ∈ Ω

(3)n the following moment

constraint holds true

sup1≤k1,...,k5≤n

E[ maxc1,c2∈[−a,a]d

f(Y ∗k1, Y ∗k2

+ c1, Y∗k3, Y ∗k4

+ c2, θn)η(1 + ‖Y ∗k5‖l1)|Xn = ωn] ≤ C2,

for any sequence (Y ∗k )k∈Z with Y ∗k = X∗k or Y ∗k = X∗

k , where X∗k denotes

an independent copy of X∗k , conditionally on X1, . . . , Xn.

(ii)∑∞

r=1 r(τr)δ2<∞ .

Under these assumptions we derive a result concerning the asymptotic distribu-

tions of nV ∗n = n−1

∑nj,k=1 h(X

∗j , X

∗k , θn) and nU∗n = n−1

∑nj=1

∑k 6=j h(X

∗j , X

∗k , θn).

To this end we denote the U - and V -statistics with kernel h(·, ·, θ) by Un and Vn,respectively.

Page 10: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

9

Theorem 3.1. Suppose that (A1), (A2), and (A4), as well as (A1∗), (A2∗), and(A3∗) are fulfilled. Then, as n→∞,

nV ∗n

d−→ Z, and nU∗nd−→ Z − Eh(X1, X1, θ), in probability,

where Z is defined as in Theorem 2.1. Moreover, if var(Z) > 0,

sup−∞<x<∞

|P (nV ∗n ≤ x|X1, ..., Xn)− P (nVn ≤ x)| P−→ 0

and

sup−∞<x<∞

|P (nU∗n ≤ x|X1, ..., Xn)− P (nUn ≤ x)| P−→ 0.

This implies that bootstrap-based tests of U - or V -type have asymptotically aprescribed size α, i.e. P (nUn > t∗u,α) −→

n→∞α and P (nVn > t∗v,α) −→

n→∞α, where t∗u,α and

t∗v,α denote the (1− α)-quantiles of nU∗n and nV ∗n , respectively, given X1, . . . , Xn.

4. L2-tests for weakly dependent observations

This section is dedicated to two applications in the field of hypothesis testing.For sake of simplicity we restrict ourselves to real-valued random variables and con-sider only a simple null hypothesis. The test for symmetry as well as the model-specification test can be extended to problems with composite hypotheses, cf. Leucht[22].

4.1. Test for symmetry. Answering the question whether a distribution is sym-metric or not is interesting for several reasons. There is a pure statistical interestsince symmetry characterizes the relation of location parameters. Moreover, symme-try plays a central role in analyzing and modelling real-life phenomena. For instance,it is often presumed that an observed process can be described by an AR(p)-processwith Gaussian innovations, which in turn implies a Gaussian marginal distribution.Rejecting the hypothesis of symmetry contradicts this type of marginal distribution.Furthermore, this result of the test excludes any kind of symmetric innovations inthis context.Suppose that (Xn)n∈N is a sequence of real-valued random variables with distributionPX and satisfying (A1). For some µ ∈ R we are given the problem

H0 : PX−µ = P µ−X vs. H1 : PX−µ 6= P µ−X .

Similar to Feuerverger and Mureika [17], who studied the problem for i.i.d. randomvariables, we propose the following test statistic

Tn = n

∫R[=(cn(t)e−iµt)]2w(t)dt =

1

n

n∑j,k=1

∫R

sin(t(Xj − µ)) sin(t(Xk − µ))w(t)dt

which makes use of the fact that symmetry of a distribution is equivalent to a vanish-ing imaginary part of the associated characteristic function. Here, =(z) denotes theimaginary part of z ∈ C, cn denotes the empirical characteristic function and w issome positive, measurable weight function with

∫R(1+ |t|)w(t)dt <∞. Obviously, Tn

is a V -type statistic whose kernel satisfies (A2) and (A3). Thus, its limit distributioncan be determined by Theorem 2.1. Assuming that the observations come from astationary AR(p)- or ARCH(p)-process, the validity of (A1∗) is assured by using theAR(p)- or ARCH(p)-bootstrap methods, given by Neumann and Paparoditis [24], in

Page 11: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

10

order to generate the bootstrap counterpart of the sample. Hence, in these cases theprerequisites of Lemma 3.1 are satisfied excluding degeneracy. Inspired by Dehlingand Mikosch [8] who discussed this problem for Efron’s Bootstrap in the i.i.d. case,we propose a bootstrap statistic with the following kernel

h∗n(x, y) = h(x, y)−∫

Rh(x, y)dF ∗n(x)−

∫Rh(x, y)dF ∗n(y) +

∫R2

h(x, y)dF ∗n(x)dF ∗n(y).

Here, h denotes the kernel function of Tn and F ∗n the distribution function of X∗1 con-

ditionally on X1, . . . , Xn. Similar to the proof of Theorem 3.1 the desired convergenceproperty of T ∗n can be verified.

4.2. Model-specification test. Let (Xk)k∈Z be a stationary real-valued nonlinearautoregressive process with centered i.i.d. innovations (εk)k∈Z, i.e. Xk = g(Xk−1)+εk.Suppose that E|ε0|4+δ <∞ for some δ > 0 and thatg ∈ G := f : R → R | f Lipschitz-continuous with Lip(f) < 1. Thus, the process(Xk)k∈Z is τ -dependent with exponential rate, see Dedecker and Prieur [7], Example4.2.We will present a test for the problem

H0 : P (E(X1|X0) = g0(X0)) = 1 vs. H1 : P (E(X1|X0) = g0(X0)) < 1

with g0 ∈ G. For sake of simplicity we stick to this small classes of functions G andprocesses (Xk)k∈Z. An extension to a more comprehensive variety of model specifica-tion tests is investigated in a forthcoming paper, cf. Leucht [22].

Similar to Fan and Li [16] we propose the following test statistic

Tn =1

n√h

n∑j=1

∑k 6=j

(Xj − g0(Xj−1))(Xk − g0(Xk−1))K

(Xj−1 −Xk−1

h

)

=:1

n

n∑j=1

∑k 6=j

H(Zj, Zk),

i.e. a kernel estimator of E([X1 − g(X0)]E(X1 − g(X0)|X0)f(X0)) = 0 multiplied

with n√h. Here, Zk := (Xk, Xk−1)

′, k ∈ Z, and f denotes the density of thedistribution of X0. Fan and Li [16], who considered β-mixing processes, used asimilar test statistic with a vanishing bandwidth. In contrast, we consider the case ofa fixed bandwidth. These tests a more powerful against Pitman alternatives g1,n(x) =g0(x) + n−βw(x) + o(n−β), β > 0, w ∈ G. For a detailed discussion of this topic seeGhosh and Huang [19].

Obviously, Tn is degenerate. If we assume K to be a bounded, even, and Lip-schitz continuous function, then there exists a function f : R8 → R with |H(z1, z2)−H(z1, z2)| ≤ f(z1, z1, z2, z2)(‖z1 − z2‖l1 + ‖z1 − z2‖l1) and such that (A4) is valid.Moreover, under these conditions H satisfies (A2). Hence, the assertion of Theo-rem 2.2 holds true. In order to determine critical values of the test we propose thebootstrap procedure given by Franke and Wendel [18] (without estimating the regres-sion function). The bootstrap innovations (ε∗t )t are drawn with replacement of the setεt = εt − n−1

∑nk=1 εkn

t=1, where εt = Xt − g0(Xt−1), t = 1, . . . , n. After choosinga starting value independently of (ε∗t )t≥1 the bootstrap sample X∗

t = g(X∗t−1) + ε∗t as

well as the bootstrap counterpart of the test statistic T ∗n = 1n

∑nj=1

∑k 6=j H(Z∗j , Z

∗k)

Page 12: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

11

with Z∗k = (X∗k , X

∗k−1)

′, k = 1, . . . , n, can be computed. In contrast to the previ-ous subsection the proposed bootstrap method leads to a degenerate kernel func-tion. Obviously, the bootstrap sample is τ -dependent in the sense of (A1∗) andsatisfies E(|X∗

k | |Z1, . . . , Zn) < C for some C < ∞ with probability tending toone. By Theorem 1 of Diaconis and Freedman [12] we obtain stationarity of thebootstrap sample. In order to verify convergence of the finite dimensional dis-tributions, we apply Lemma 4.2 of Neumann and Paparoditis [24]. The appli-cation of this result requires the convergence of the conditional distribution, i.e.

supx∈K d(PX∗t |X∗

t−1=x, PXt|Xt−1=x)P−→ 0 for every compact K ⊂ R and d(P,Q) =

infX∼P,Y∼Q E(|X − Y | ∧ 1). In the present context this can be confirmed similarly tothe proof of Lemma 4.1 by Neumann and Paparoditis [24]. Summing up, all prereq-uisites of Theorem 3.1 are satisfied. Hence, the asymptotic critical value of the abovetest can be determined using the proposed model-based bootstrap procedure.

5. Proofs

5.1. Proofs of the main theorems. Throughout this section C denotes a positive,finite generic constant.

Proof of Theorem 2.1. First the limit distribution of nV(K,L)n , defined before Lemma 2.2,

is derived. Afterwards the asymptotic distribution of nVn and nUn are deduced bymeans of Lemma 2.1, Lemma 2.2, and the weak law of large numbers.

The following modified representation of h(K,L)c will be useful in the sequel

h(K,L)c (x, y) =

M(K,L)∑k,l=1

γ(c)k,l qk(x)ql(y),

where (qkql)M(K,L)k,l=1 is an ordering of

⋃k1,k2∈−L,...,Ld

(Φj0,k1Φj0,k2) ∪ (Ψ

(e)j;k1,k2

)e∈E,j∈j0,...,J(K)−1

and γ

(c)k,l = γ

(c)l,k , k, l ∈ 1, . . . ,M(K,L), are the associated coefficients. Moreover, the

introduction of qk(Xi) := qk(Xi) − Eqk(Xi), k ∈ 1, . . . ,M(K,L), i ∈ 1, . . . , n,allows for the compact notation of nV

(K,L)n,c ,

nV (K,L)n,c =

M(K,L)∑k,l=1

γ(c)k,l

(1√n

n∑i=1

qk(Xi)

)(1√n

n∑j=1

ql(Xj)

).

In order to derive its limit distribution, we consider 1√n

∑ni=1

(q1(Xi), ..., qM(K,L)(Xi)

)′first. Due to the Cramer-Wold-device it suffices to investigate

∑M(K,L)k=1 tk

1√n

∑ni=1 qk(Xi)

∀ (t1, ..., tM(K,L))′ ∈ RM(K,L). Asymptotic normality can be established by applying

the CLT of Neumann and Paparoditis [24] to Qi :=∑M(K,L)

k=1 tkqk(Xi), i = 1, . . . , n.To this end the prerequisites of this tool have to be checked. Obviously, we aregiven a strictly stationary sequence of centered and bounded random variables. Thisimplies in conjunction with the dominated convergence theorem that the Lindebergcondition is fulfilled. In order to show 1

nvar(Q1 + · · · + Qn) −→

n→∞σ2 := var(Q1) +

2∑∞

k=2 cov(Q1, Qk), the validity of (A1) can be employed which moreover assures the

Page 13: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

12

existence of σ2. Then,∣∣∣∣ 1nvar(Q1 + ...+Qn)− σ2

∣∣∣∣ =

∣∣∣∣∣ 2nn∑

r=2

(n− [r − 1])cov(Q1, Qr)− 2∞∑

k=2

cov(Q1, Qk)

∣∣∣∣∣≤ 2

∞∑r=2

min

r − 1

n, 1

|cov(Q1, Qr)|

≤ 4‖Q1‖∞Lip(Q1)∞∑

r=2

min

r − 1

n, 1

τr−1,

where the latter inequality follows from (2.4). The summability condition of the de-pendence coefficients in connection with Lebesgue’s dominated convergence theoremyields the desired result. Since Qt1Qt2 forms a Lipschitz continuous function, inequal-ity (6.4) of Neumann and Paparoditis [24] holds true with θr = Lip(Qt1Qt2)τr. It iseasy to convince oneself that their condition (6.3) is not needed to be checked if theinvolved random variables are uniformly bounded. Finally, we obtain

n−1/2(Q1 + ...+Qn)d−→ N(0, σ2)

and hence,

nV (K,L)n,c

d−→ Z(K,L)c :=

∑k1,k2∈−L,...,Ld

α(c)j0;k1,k2

Zk1Zk2

+

J(K)−1∑j=j0

∑k1,k2∈−L,...,Ld

∑e=(e′1,e′2)′∈E

β(c,e)j;k1,k2

Z(e1)j;k1

Z(e2)j;k2

.

Here, (Zk)k∈−L,...,Ld and (Z(e)j;k)j∈j=,...,J(K),e∈0,1d,k∈−L,...,Ld , respectively, are cen-

tered normally distributed random variables. Note that also n(V

(K,L1)n,c − V

(K,L2)n,c

)d−→

Z(K,L1)c − Z

(K,L2)c . By Lemma 2.1 and Lemma 2.2 we have

limc→∞

limK→∞

limL→∞

lim supn→∞

n2 E(V (K,L)n,c − Vn)2 = 0.

Hence, it remains to show that limc→∞ limK→∞ limL→∞ E(Z(K,L)c − Z)2 = 0. To this

end we first show that (ZK,Lc )L is a Cauchy sequence in L2(Ω,A, P ) which, due to com-

pleteness of L2, implies limL→∞ nV(K,L)n,c

d−→ Z(K)c := L2− limL→∞ Z

(K,L)c . According

to Theorem 5.3 of Billingsley [3] we obtain E(Z(K,L1)c −Z(K,L2)

c )2 ≤ lim infn→∞ n2E(V

(K,L1)n,c −

V(K,L2)n,c )2. The second part of the proof of Lemma 2.2 implies

lim infn→∞ n2 E(V

(K,L1)n,c − V

(K,L2)n,c

)2

−→L2→∞

0 for L1 > L2. Iterating this method in

conjunction with Lemma 2.1 yields limc→∞ limK→∞ E(Z − Z(K)c )2 = 0 and hence

limc→∞ limK→∞ limL→∞ E(Z(K,L)c − Z)2 = 0 which in turn leads to the desired limit

distribution of nVn.Based on the result concerning V -type statistics, the limit distribution of nUn can

be established. Since Un = nn−1

Vn − 1n(n−1)

∑ni=1 h(Xi, Xi), it remains to verify that

1n−1

∑ni=1 h(Xi, Xi)

P−→ Eh(X1, X1). According to Markov’s inequality we prove that

Page 14: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

13

var(

1n−1

∑ni=1 h(Xi, Xi)

)−→n→∞

0. For this purpose we investigate cov(h(Xj, Xj), h(Xk, Xk))

for j < k. Let Xkd= Xk be independent of Xj with E‖Xk − Xk‖l1 = τk−j. The con-

dition (A2) implies

cov(h(Xj, Xj), h(Xk, Xk)) = E(h(Xj, Xj)[h(Xk, Xk)− h(Xk, Xk)]

≤ Cτ δk−j

(E|h(Xj, Xj)|1/(1−δ)|h(Xk, Xk)− h(Xk, Xk)|

)1−δ

≤ Cτ δk−j.

Therefore, by (A1) we get with τ0 = [E(h(X1, X1))2]1/δ

var

(1

n− 1

n∑i=1

h(Xi, Xi)

)≤ C

(n− 1)2

n−1∑r=0

(n− r)τ δr −→

n→∞0.

Proof of Theorem 2.2. On the basis of Lemma 2.3 similar arguments as in the proof

of Theorem 2.1 yield nVnd−→ Z. In order to obtain nUn

d−→ Z − Eh(X1, X1),var( 1

n−1

∑nk=1 h(Xk, Xk)) −→

n→∞0 has to be verified. To this end the approximation of

cov(h(Xj, Xj), h(Xk, Xk)) has to be modified:

cov(h(Xj, Xj), h(Xk, Xk)) ≤ τ δ2

k−j(2 Ef(Xk, Xk, Xk, Xk)1−δ[‖Xk‖l1 + ‖Xk‖l1 ])

δ(1−δ)

× (E|h(Xj, Xj)|1/(1−δ)|h(Xk, Xk)− h(Xk, Xk)|)1−δ

=O(τ δ2

k−j).

The summability assumption concerning the dependence coefficients in (A4) impliesthe desired convergence property.

Proof of Theorem 3.1. Due to Lemma 3.2 it suffices to verify distributional conver-gence. To this end we introduce

Ωθn ⊆ Ω(1)

n ∩ Ω(2)n ∩ Ω(3)

n ∩Xn| ‖θn − θ‖l1 < δn

such that for any sequence (ωn)n∈N with ωn ∈ Ωθ

n

L((X∗′t1, X∗′

t2)′|Xn = ωn) = L((X∗′

t1+k, X∗′t2+k)

′|Xn = ωn), k ∈ N,L((X∗′

t1, X∗′

t2)′|Xn = ωn) =⇒ L((X ′

t1, X ′

t2)′).

Moreover, the null sequence (δn)n∈N can be chosen such that on Ωθn, θn ∈ U(θ) and

P (Xn ∈ Ωθn) −→

n→∞1 holds. Hence, to prove V ∗

nd−→ Z, in probability, it suffices to

verify V ∗n

d−→ Z conditionally on Xn = ωn for any sequence (ωn)n with ωn ∈ Ωθn.

Now, we take an arbitrary sequence (ωn)n with ωn ∈ Ωθn.

In order to show that it suffices to investigate statistics with bounded kernels, weconsider the degenerate version h∗(c) of

h∗(c)(x, y, θn) :=

h(x, y, θn), for x, y ∈ [−c, c]d or x, y /∈ [−c, c] with |h(x, y, θn)| ≤ ch,

−ch, for x, y /∈ [−c, c]d with h(x, y, θn) < −ch,ch, for x, y /∈ [−c, c]d with h(x, y, θn) > ch,

Page 15: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

14

with ch := maxc,maxx,y∈[−c,c]d,θ∈U(θ) |h(x, y, θ)|. The associated V -statistic is de-noted by V ∗

n,c. Now, imitating the proof of Lemma 2.1 results in

lim supn→∞

n2E[(V ∗n − V ∗

n,c)2|Xn = ωn] ≤ εc

with εc −→c→∞

0. Next we approximate the bounded kernel by the degenerate version of

h∗(K,L)c :=

∑k1,k2∈−L,...,Ld

α(c)j0;k1,k2

Φj0,k1Φj0,k2

+

J(K)−1∑j=j0

∑k1,k2∈−L,...,Ld

∑e∈E

β(c,e)j;k1,k2

Ψ(e)j;k1,k2

,

where α(c)j0;k1,k2

=∫∫

Rd×Rd h∗(c)(x, y, θn)Φj0,k1(x)Φj0,k2(y)dx dy and

β(c,e)j;k1,k2

=∫∫

Rd×Rd h∗(c)(x, y, θn)Ψ

(e)j;k1,k2

(x, y)dxdy. Denoting the associated V -statistic

by V∗(K,L)n,c leads to

limK→∞

limL→∞

lim supn→∞

n2 E[(V ∗n,c − V ∗(K,L)

n,c )2|Xn = ωn] = 0.

This conjecture can be proven by following the lines of the proof of Lemma 2.3. Here,J(K) is chosen as follows: Since the Portmanteau theorem implies lim supn→∞ P (X∗

1 /∈[−b, b]d|X ′

1, . . . , X′n)′ = ωn) ≤ P (X1 /∈ (−b, b)d), we first select some b = b(K) <

∞ such that P (X1 /∈ (−b, b)d) ≤ 1/K. Afterwards we choose J(K) such that

maxx,y∈[−b,b]d |h∗(c)(x, y, θn) − h∗(K)c (x, y, θn)| ≤ 1/K and Sφ/2

J(K) < a, where Sφ

denotes the length of the support of the scale function φ. Because of the continuityassumptions on f the index J(K) can be determined independently of n on (Ωθ

n)n,cf. proof of Lemma 5.2. Obviously,

α(c)j0;k1,k2

−→n→∞

α(c)j0;k1,k2

:=

∫∫Rd×Rd

h∗(c)(x, y, θ)Φj0,k1(x)Φj0,k2(y)dx dy,

β(c,e)j;k1,k2

−→n→∞

β(c,e)j;k1,k2

:=

∫∫Rd×Rd

h∗(c)(x, y, θ)Ψ(e)j;k1,k2

(x, y)dx dy

on (Ωθn)n. Hence,

n2 E[(V ∗(K,L)n,c − V ∗(K,L)

n,c )2|Xn = ωn] −→n→∞

0,

where the kernel of V∗(K,L)n,c is obtained by substituting α

(c)j0;k1,k2

and β(c,e)j;k1,k2

in the

kernel of V∗(K,L)n,c through α

(c)j0;k1,k2

and β(c,e)j;k1,k2

.Thus, the next step is the application of the CLT of Neumann and Paparoditis [24]

to nV∗(K,L)n,c . For this purpose we introduce Q∗i :=

∑M(K,L)k=1 tkqk(X

∗i ), t1, ..., tM(K,L) ∈

R, where qk is the centered version (w.r.t. PX∗1 |Xn=ωn) of qk (qk)k is defined as

in the proof of Theorem 2.1. Obviously, given X1, . . . , Xn, the sequence (Q∗i )i iscentered and has uniformly bounded second moments. Due to (A1∗)(i) the Lin-deberg condition is satisfied. In order to show that for arbitrary ε > 0 the in-equality | 1

nvar(Q∗1 + · · · + Q∗n|Xn = ωn) − σ2| < ε,∀n ≥ n0(ε), holds true with

σ2 = var(Q1) + 2∑∞

r=2 cov(Q1, Qr) and (Qk)k as in the proof of Theorem 2.1, the

Page 16: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

15

abbreviations var∗(·) = var(·|Xn = ωn) and cov∗(·) = cov(·|Xn = ωn) are used.Hence,∣∣∣∣ 1nvar∗[Q∗1 + · · ·+Q∗n]− σ2

∣∣∣∣≤ 2

∞∑r=2

min

r − 1

n, 1

|cov∗(Q∗1, Q∗r)|+

∣∣∣∣∣var∗(Q∗1) + 2∞∑

r=2

cov∗(Q∗1, Q∗r)− σ2

∣∣∣∣∣≤ 2

∞∑r=2

min

r − 1

n, 1

|cov∗(Q∗1, Q∗r)|+ 2

∣∣∣∣∣R−1∑r=1

[cov∗(Q∗1, Q∗r)− cov(Q1, Qr)]

∣∣∣∣∣+ 2

∣∣∣∣∣∑r≥R

cov∗(Q∗1, Q∗r)

∣∣∣∣∣+ 2

∣∣∣∣∣∑r≥R

cov(Q1, Qr)

∣∣∣∣∣ .By (A1) and (A1∗), R can be chosen such that

∣∣∑r≥R cov(Q1, Qr)

∣∣+∣∣∑r≥R cov∗(Q∗1, Q

∗r)∣∣ ≤

ε4. Moreover, (A1∗) implies that the first summand can be bounded from above by

ε4

as well, if n ≥ n0(ε) for some n0(ε) ∈ N. According to the convergence of thetwo-dimensional distributions and the uniform boundedness of (Q∗k)k∈Z it is possi-ble to pick n0(ε) such that additionally the two remaining summands are boundedby ε

8. For the validity of the CLT of Neumann and Paparoditis [24] in probability,

it remains to verify their inequality (6.4). By Lipschitz continuity of Q∗t1Q∗t2

thisholds with θr = Lip(Q∗t1Q

∗t2)τr = Lip(Qt1Qt2)τr. The application of the continuous

mapping theorem results in V∗(K,L)n,c

d−→ Z(K,L)c , in probability, which in turn implies

V ∗n

d−→ Z.In order to obtain the analogous result of convergence for nU∗n, additionally,

P

(∣∣∣∣∣ 1

n− 1

n∑i=1

h(X∗i , X

∗i , θn)− Eh(X1, X1, θ)

∣∣∣∣∣ > ε

∣∣∣∣∣Xn = ωn

)−→n→∞

0

has to be proven for arbitrary ε > 0. Due to the continuity of h and according to(A1∗)(i)and (A2∗)(i) it suffices to verify

P

(∣∣∣∣∣ 1

n− 1

n∑k=1

[h(X∗

k , X∗k , θn)− E(h(X∗

1 , X∗1 , θn)|Xn = ωn)

]∣∣∣∣∣ > ε

2

∣∣∣∣∣Xn = ωn

)−→n→∞

0.

The l.h.s. tends to zero if

E

[ 1

n− 1

∑k=1

h(X∗k , X

∗k , θn)− E(h(X∗

1 , X∗1 , θn)|Xn = ωn)

]2 ∣∣∣Xn = ωn

vanishes asymptotically. This can be proven using the same arguments as in the proofof Theorem 2.2 for verifying the convergence of U -statistics. Bootstrap consistencyfollows from Lemma 3.2.

5.2. Proofs of auxiliary results. In order to prove Lemma 2.1, Lemma 2.2, andLemma 2.3 an approximation of terms of the structure

Zn :=1

n2

n∑i,j,k,l=1

EH(Xi, Xj)H(Xk, Xl)

Page 17: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

16

will be required. Here, H denotes a symmetric, degenerate kernel function. Assumingthat (Xn)n∈N satisfies (A1) we obtain

Zn ≤8

n2

n∑i≤j;k≤l;i≤k

|EH(Xi, Xj)H(Xk, Xl)| ≤ 8 sup1≤k≤n

E|H(X1, Xk)|2 +8

n2

n−1∑r=1

4∑t=1

Z(t)n,r

with

Z(1)n,r =

n∑i≤j,k≤l

r:=minj,k−i≥l−maxj,k

|EH(Xi, Xj)H(Xk, Xl)− EH(Xi, Xj)H(Xk, Xl)|,

Z(2)n,r =

n∑i≤j,k≤l

r:=l−maxj,k>minj,k−i

|EH(Xi, Xj)H(Xk, Xl)− EH(Xi, Xj)H(Xk, Xl)|,

Z(3)n,r =

n∑i≤k≤l<j

r:=i−k≥j−l

|EH(Xi, Xj)H(Xk, Xl)− EH(Xi, Xj)H(Xk, Xl)|,

Z(4)n,r =

n∑i≤k≤l<j

r:=j−l>i−k

|EH(Xi, Xj)H(Xk, Xl)− EH(Xi, Xj)H(Xk, Xl)|.

Here, in each summand of Z(1)n,r and Z

(3)n,r the vector (X ′

j, X′k, X

′l)′ is chosen such that

it is independent of Xi, (X ′j, X

′k, X

′l)′ d= (X ′

j, X′k, X

′l)′ and (2.3) holds. Within Z

(2)n,r

(respectively Z(4)n,r) the random variable X

(r)l (respectively X

(r)j ) is chosen to be inde-

pendent of (X ′i, X

′j, X

′k)′ (respectively (X ′

i, X′k, X

′l)′) such that X

(r)l

d= Xl (respectively

X(r)j

d= Xj) and (2.3) holds. (This may possibly require an enlarging of the underlying

probability space.) Thus, by degeneracy the subtrahends of these expressions vanish.

Moreover, note that the number of summands of Z(t)n,r, t = 1, . . . , 4, is bounded by

2(r + 1)n2.

Proof of Lemma 2.1. For c > 0 we define ch := maxc,maxx,y∈[−c,c]d |h(x, y)|,

h(c)(x, y) :=

h(x, y) for (x′, y′)′ ∈ [−c, c]2d or (x′, y′)′ /∈ [−c, c]2d with |h(x, y)| ≤ ch,

−ch for (x′, y′)′ /∈ [−c, c]2d with h(x, y) < −ch,ch for (x′, y′)′ /∈ [−c, c]2d with h(x, y) > ch

and its degenerate version

hc(x, y) := h(c)(x, y)−∫

Rd

h(c)(x, y)dF (x)−∫

Rd

h(c)(x, y)dF (y)

+

∫∫Rd×Rd

h(c)(x, y) dF (x) dF (y).

The approximation error n2 E(Vn − Vn,c)2 can be reformulated in terms of Zn with

kernel H = H(c) := h−h(c). Hence, it remains to verify that supk∈N E|H(c)(H1, Xk)|2

and lim supn→∞1n2

∑n−1r=1

∑4t=1 Z

(t)n,r tend to zero as c → ∞. First, we investigate

Page 18: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

17

lim supn→∞1n2

∑n−1r=1 Z

(1)n,r, the remaining summands can be treated similarly. The

summands of Z(1)n,r can be bounded as follows

∣∣∣EH(c)(Xi, Xj)H(c)(Xk, Xl)− EH(c)(Xi, Xj)H

(c)(Xk, Xl)∣∣∣

≤ E∣∣∣H(c)(Xk, Xl)

[H(c)(Xi, Xj)−H(c)(Xi, Xj)

]1(X′

k,X′l)′∈[−c,c]2d

∣∣∣+ E

∣∣∣H(c)(Xk, Xl)[H(c)(Xi, Xj)−H(c)(Xi, Xj)

]1(X′

k,X′l)′ /∈[−c,c]2d

∣∣∣+ E

∣∣∣H(c)(Xi, Xj)[H(c)(Xk, Xl)−H(c)(Xk, Xl)

]1(X′

i,X′j)′∈[−c,c]2d

∣∣∣+ E

∣∣∣H(c)(Xi, Xj)[H(c)(Xk, Xl)−H(c)(Xk, Xl)

]1(X′

i,X′j)′ /∈[−c,c]2d

∣∣∣= E1 + E2 + E3 + E4.

(5.1)

The Lipschitz constant of H(c) is obviously independent of c. Therefore, theiterative application of Holder’s inequality to E2 yields

E2 ≤(E∣∣∣H(c)(Xi, Xj)−H(c)(Xi, Xj)

∣∣∣)δ

×(E|H(c)(Xk, Xl)|1/(1−δ)|H(c)(Xi, Xj)−H(c)(Xi, Xj)|1(X′

k,X′l)′ /∈[−c,c]2d

)1−δ

≤Cτ δr

(E|H(c)(Xk, Xl)|(2−δ)/(1−δ)

1(X′k,X′

l)′ /∈[−c,c]2d)1/(2−δ)

[(E|H(c)(Xi, Xj)|(2−δ)/(1−δ))(1−δ)/(2−δ) + (E|H(c)(Xi, Xj)|(2−δ)/(1−δ))(1−δ)/(2−δ)

]1−δ

≤Cτ δr (E|H(c)(Xk, Xl)|(2−δ)/(1−δ)

1(X′k,X′

l)′ /∈[−c,c]2d)(1−δ)/(2−δ).

(5.2)As supk∈N E|h(X1, Xk)|ν < ∞ for ν > (2 − δ)/(1 − δ), we obtain E2 ≤ τ δ

r ε1(c) withε1(c)−→

c→∞0 after employing Holder’s inequality once again. Analogous calculations

yield E4 ≤ τ δr ε2(c) with ε2(c)−→

c→∞0. Likewise, the approximation methods for E1 and

E3 are equal. Therefore, only E1 is investigated:

E1 ≤ E∣∣∣∣∫

Rd

h(c)(Xk, y)dF (y)[H(c)(Xi, Xj)−H(c)(Xi, Xj)

]1Xk∈[−c,c]d

∣∣∣∣+ E

∣∣∣∣∫Rd

h(c)(y,Xl)dF (y)[H(c)(Xi, Xj)−H(c)(Xi, Xj)

]1Xl∈[−c,c]d

∣∣∣∣+ E

∣∣∣∣∫∫Rd×Rd

h(c)(x, y)dF (x)dF (y)[H(c)(Xi, Xj)−H(c)(Xi, Xj)

]∣∣∣∣= E1,1 + E1,2 + E1,3.

Page 19: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

18

Analogously to (5.2) we obtain

E1,1 ≤ Cτ δr

(E∣∣∣∣∫

Rd

h(Xk, y)− h(c)(Xk, y)dF (y)

∣∣∣∣(2−δ)/(1−δ)

1Xk∈[−c,c]d

)1/(2−δ)

×[2(sup

k∈NE|H(c)(X0, Xk)|(2−δ)/(1−δ) + E|H(c)(X,Xj)|(2−δ)/(1−δ))(1−δ)/(2−δ)

1−δ

≤ Cτ δr

(∫∫Rd×Rd

|h(x, y)− h(c)(x, y)|(2−δ)/(1−δ)dF (y)1x∈[−c,c]ddF (x)

)(1−δ)/(2−δ)

≤ τ δr ε3(c)

with ε3(c)−→c→∞

0. The investigation of E1,2 coincides with the previous one. The

expression E1,3 can be approximated as follows

E1,3 ≤ Cτr

∫∫Rd×Rd

|h(x, y)− h(c)(x, y)|dF (x)dF (y)

≤ Cτr

∫∫Rd×Rd

|h(x, y)|1(x′,y′)′ /∈[−c,c]2ddF (x)dF (y)

≤ τrε4(c)−→c→∞

0.

To sum up, we have E1 + E2 + E3 + E4 ≤ ε5(c)τδr , where ε5(c)−→

c→∞0 uniformly in n.

This leads to

limc→∞

lim supn→∞

1

n2

n−1∑r=1

Z(1)n,r ≤

1

n2

n−1∑r=1

[2(r + 1)n2τ δ

r ε5(c)]

= 0.

It remains to examine

supk∈N

E[H(c)(X1, Xk)]2 ≤ C( sup

k∈NE[h(X1, Xk)− h(c)(X1, Xk)]

2

+ E[h(X1, X1)− h(c)(X1, X1)]2).

Here, X1 denotes an independent copy of X1. Similar arguments as before yieldlimc→∞ supk∈N E[H(c)(X1, Xk)]

2 = 0.

The characteristics, stated in the following two lemmas, will be essential for awavelet approximation of the kernel function h.

Lemma 5.1. Given a Lipschitz continuous function g : Rd → R, define a functiongj by gj(x) :=

∑k∈Zd αj,kΦj,k(x), j ∈ Z, where αj,k =

∫Rd g(x)Φj,k(x)dx. Then gj is

Lipschitz continuous with a constant that is independent of j.

Proof of Lemma 5.1. In order to establish Lipschitz continuity, gj is decomposed intotwo parts

gj(x) =∑k∈Zd

[∫Rd

Φj,k(u)g(x)du

]Φj,k(x) +

∑k∈Zd

[∫Rd

Φj,k(u)[g(u)− g(x)]du

]Φj,k(x)

= H1(x) +H2(x).

According to the above choice of the scale function (with characteristics (1) - (3) ofSubsection 2.2) the prerequisites of Theorem 8.2 of Hardle et al. [20] are fulfilled for

Page 20: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

19

N = 1. This implies that∫∞−∞∑

l∈Z φ(y − l)φ(z − l)dz = 1 ∀y ∈ R. Based on thisresult we obtain

H1(x) = g(x)2jd

d∏i=1

∫ ∞

−∞

∑l∈Z

φ(2jui − l)φ(2jxi − l)dui = g(x)

by applying an appropriate variable substitution. This in turn immediately impliesthe desired continuity property. Note that for every fixed x the sum has finitely manynon-vanishing summands because of the finite support of φ and since the number ofsummands is independent of j. Therefore, the order of summation and integration isinterchangeable. In order to investigate H2, we define a sequence of functions (κk)k∈Zby

κk(x) =

∫Rd

Φj,k(u)[g(u)− g(x)]du.

These functions are Lipschitz continuous with a constant decreasing in j:

|κk(x)− κk(x)| ≤ Lip(g)O(2−jd/2)‖x− x‖l1 . (5.3)

Moreover, boundedness and Lipschitz continuity of φ yield

‖Φj,k‖∞ = O(2jd/2) and |Φj,k(x)− Φj,k(x)| = O(2j(d/2+1))‖x− x‖l1 . (5.4)

Thus,

|H2(x)−H2(x)| ≤∑k∈Zd

|Φj,k(x)| |κk(x)− κk(x)|+∑k∈Zd

|κk(x)| |Φj,k(x)− Φj,k(x)|

≤C‖x− x‖l1 +∑k∈Zd

|κk(x)| |Φj,k(x)− Φj,k(x)| .

It has to be distinguished whether or not x ∈ supp Φj,k in order to approximatethe second summand. (Here, supp denotes the support of a function.) In the firstcase it is helpful to investigate |κk(x)| = |

∫Rd Φj,k(u)[g(u)− g(x)]du|. The integrand

is non-trivial only if u ∈ supp Φj,k. In these situations, |g(u) − g(x)| = O(2−j) byLipschitz continuity. Consequently, we achieve

|κk(x)| ≤ O(2−j)

∫Rd

|Φj,k(u)|du = O(2−j(d/2+1))

which leads to ∑k∈Zd

|κk(x)| |Φj,k(x)− Φj,k(x)| = O(‖x− x‖l1)

since the number of non-vanishing summands is finite, independently of the values ofx and x. Therefore, Lipschitz continuity of H2 is obtained as long as x ∈ supp Φj,k.In the opposite case, we only have to consider the situation of x ∈ supp Φj,k sincethe setting x, x /∈ supp Φj,k is trivial. With the aid of (5.3) and (5.4), the first termof the r.h.s. of

|κk(x) [Φj,k(x)− Φj,k(x)]| ≤ |κk(x)− κk(x)| |Φj,k(x)|+ |κk(x)| |Φj,k(x)− Φj,k(x)|(5.5)

can be approximated from above by C‖x − x‖l1 . The investigation of the secondsummand is identical to the analysis of the case x ∈ supp Φj,k.Finally, we obtain |H2(x)−H2(x)| ≤ C‖x− x‖l1 where C <∞ is a constant that isindependent of j. This yields the assertion of the lemma.

Page 21: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

20

Lemma 5.2. Let g : Rd → R be a continuous function that is Lipschitz continuous onsome interval (−c, c)d, c > Sφ, where Sφ denotes the length of the support of the scalefunction φ. For arbitrary 0 < b < c−Sφ and K ∈ N there exists J = J(K, b) ∈ N suchthat for g and its approximation g(K,b) given by g(K,b)(x) =

∑k∈Zd αJ(K,b),kΦJ(K,b),k(x)

it holds

maxx∈[−b,b]d

|g(x) − g(K,b)(x)| ≤ 1/K.

Proof of Lemma 5.2. Given 0 < b < ∞, we define g(b)(x) := g(x)wb(x), where wb

is a Lipschitz continuous weight function with compact support. Moreover, wb isassumed to be bounded from above by 1 and wb(x) := 1 for x ∈ [−b − Sφ, b + Sφ]

d.

Additionally, we set α(b)J(K,b),k :=

∫Rd g

(b)(u)ΦJ(K,b),k(u)du. Hence,

maxx∈[−b,b]d

∣∣g(x) − g(K,b)(x)∣∣ ≤ max

x∈[−b,b]d

∣∣∣∣∣g(b)(x)−∑k∈Zd

α(b)J(K,b),kΦJ(K,b),k(x)

∣∣∣∣∣+ max

x∈[−b,b]d

∣∣∣∣∣∑k∈Zd

α(b)J(K,b),kΦJ(K,b),k(x) − g(K,b)(x)

∣∣∣∣∣= max

x∈[−b,b]dA(J)(x) + max

x∈[−b,b]dB(J)(x).

Since g(b) ∈ L2(Rd),

A(J) =

∣∣∣∣∣∣∑

j≥J(K,b)

∑k∈Zd

∑e∈E

β(b,e)j,k Ψ

(e)j,k

∣∣∣∣∣∣ with β(b,e)j,k =

∫Rd

g(b)(u)Ψ(e)j,k(u)du

holds true in the L2-sense. Lemma 5.1 implies equicontinuity of (A(J))J∈N on [−b, b]d.Therefore, the above equality is valid pointwise on [−b, b]d if additionally∑

j≥J(K,b)

∑k∈Zd

∑e∈E β

(b,e)j,k Ψ

(e)j,k is continuous on [−b, b]d. This can be proven by veri-

fying uniform convergence of∑N

j=J(K,b)

∑k∈Zd

∑e∈E β

(b,e)j,k Ψ

(e)j,k to

∑j≥J(K,b)

∑k∈Zd

∑e∈E β

(b,e)j,k Ψ

(e)j,k.

For all x ∈ [−b, b]d, the number of k ∈ Zd with∑

e∈E β(b,e)j,k Ψ

(e)j,k(x) 6= 0 is bounded

uniformly in j. Furthermore, ‖Ψ(e)j,k‖∞ = O(2jd/2),∀ j ∈ Z, k ∈ Zd, e ∈ E. As g(b) is

Lipschitz continuous and∫∞−∞ ψ(x)dx = 0, we obtain

|β(b,e)j,k | ≤

∫Rd

|g(b)(u)− g(b)(x)|Ψ(e)j,k(u)du = O(2−j(d/2+1)), ∀j ≥ j0(b)

for some j0(b) ∈ Z if Ψ(e)j,k(x) 6= 0 for some x ∈ [−b, b]d. Finally, we end up with

maxx∈[−b,b]d

∣∣∣∣∣∞∑

j=N+1

∑k∈Zd

∑e∈E

β(b,e)j,k Ψ

(e)j,k(x)

∣∣∣∣∣ ≤ C∞∑

j=N+1

2−j −→N→∞

0 (5.6)

and consequently,

maxx∈[−b,b]d

A(J)(x) = maxx∈[−b,b]d

∣∣∣∣∣∣∑

j≥J(K,b)

∑k∈Zd

∑e∈E

β(b,e)j,k Ψ

(e)j,k

∣∣∣∣∣∣ .

Page 22: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

21

Moreover, the inequality maxx∈[−b,b]d A(J)(x) ≤ 1

K, ∀ J(K, b) ≥ J0(K, b), for some

J0(K, b) ∈ N results from (5.6). The introduction of the finite set of indices

Z(J) :=k ∈ Zd |ΦJ(K,b),k(x) 6= 0 for some x ∈ [−b, b]d

leads to

maxx∈[−b,b]d

B(J)(x) = maxx∈[−b,b]d

∣∣∣ ∑k∈Z(J)

(αJ(K,b),k − α

(b)J(K,b),k

)ΦJ(K,b),k(x)

∣∣∣.This term is equal to 0 since the definition of g(b) implies αJ(K,b),k−α(b)

J(K,b),k = 0, ∀ k ∈Z.

Proof of Lemma 2.2. The assertion of the Lemma is verified in two steps. First,the bounded kernel hc, constructed in the proof of Lemma 2.1, is approximated by

h(K)c which is defined by h

(K)c (x, y) =

∑k1,k2∈Rd α

(c)J(K);k1,k2

ΦJ(K),k1(x)ΦJ(K),k2(y) with

α(c)J(K);k1,k2

=∫∫

Rd×Rd hc(x, y)ΦJ(K),k1(x)ΦJ(K),k2(y)dx dy. Here, the indices (J(K))K∈N

with J(K) −→K→∞

∞ are chosen such that the assertion of Lemma 5.2 holds true for

b ∈ R with P (X1 /∈ [−b, b]d) ≤ 1K

. Since the function h(K)c is not degenerate in

general, we introduce its degenerate counterpart

h(K)c (x, y) = h(K)

c (x, y)−∫

Rd

h(K)c (x, y)dF (x)−

∫Rd

h(K)c (x, y)dF (y)

+

∫∫Rd×Rd

h(K)c (x, y)dF (x)dF (y).

and denote the corresponding V -statistic by V(K)n,c .

Now, the structure of the proof is as follows. First, we prove

lim supn→∞

n2 E(Vn,c − V (K)

n,c

)2 −→K→∞

0. (5.7)

In a second step, it remains to show that for every fixed K

lim supn→∞

n2 E(V (K)

n,c − V (K,L)n,c

)2 −→L→∞

0. (5.8)

In order to verify (5.7), we rewrite n2 E(Vn,c − V

(K)n,c

)2

in terms of Zn with kernel func-

tionH := H(K) = hc−h(K)c . Hence it remains to verify that lim supn→∞

1n2

∑n−1r=1

∑4t=1 Z

(t)n,r

and supk∈N E|H(K)(H1, Xk)|2 tend to zero as K → ∞. Exemplarily, we investigate

lim supn→∞1n2

∑n−1r=1 Z

(1)n,r. The summands of Z

(1)n,r can be bounded as follows∣∣∣EH(K)(Xi, Xj)H

(K)(Xk, Xl)−H(K)(Xi, Xj)H(K)(Xk, Xl)

∣∣∣≤ E

∣∣∣H(K)(Xk, Xl)[H(K)(Xi, Xj)−H(K)(Xi, Xj)

]∣∣∣+ E

∣∣∣H(K)(Xi, Xj)[H(K)(Xk, Xl)−H(K)(Xk, Xl)

]∣∣∣≤ 2‖H(K)‖∞Lip(H(K))τr.

Page 23: Degenerate U- and V-statistics under weak …Leucht/u_stat_dep.pdfU- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15]

22

According to Lemma 5.1 Lip(H(K)) does not depend on K. The degeneracy of hc

implies ‖H(K)‖∞ ≤ 4‖hc − h(K)c ‖∞. As already indicated in the proof of Lemma 5.1,∫∞

−∞∑

l∈Z φ(y1 − l)φ(y2 − l)dy2 = 1 holds true. This leads to

‖hc − h(K)c ‖∞ =

∥∥∥∥∥∥∑

k1,k2∈Zd

ΦJ(K),k1(x)ΦJ(K),k2(y)

×∫∫

Rd×Rd

[hc(x, y)− hc(u, v)]ΦJ(K),k1(u)ΦJ(K),k2(v)dudv

∥∥∥∥∞.

If (x′, y′)′ does not lie in the support of ΦJ(K),k1ΦJ(K),k2 , the corresponding summandvanishes. Otherwise, Lipschitz continuity of h is employed for further approximations.Similarly to the proof of Lemma 5.1 we obtain∣∣∣∣∫

Rd

∫Rd

[hc(x, y)− hc(u, v)]ΦJ(K),k1(u)ΦJ(K),k2(v)du dv

∣∣∣∣ = O(2−J(K)(d+1)

).

This implies ‖hc− h(K)c ‖∞ = O(2−J(K)) since for every fixed tuple (x′, y′)′ only a finite

number of summands is non-vanishing. This number does neither depend on J(K)nor on the value of (x, y). It follows that

∣∣EH(K)(Xi, Xj)H(K)(Xk, Xl)

∣∣ ≤ C2−J(K)τr.

Due to ‖H(K)‖∞ ≤ 4‖hc − h(K)c ‖∞ = O(2−J(K)), supk∈N E|H(K)(H1, Xk)|2 → 0 as

K →∞. Set τ0 = 1. Summing up the achievements results in

lim supn→∞

n2E(Vn,c − V (K)n,c )2 ≤ C

[lim sup

n→∞

1

n2

n−1∑r=0

2−J(K)(r + 1)n2τr + 2−2J(K)

]−→

K→∞0,

which finishes the proof of (5.7).The main goal of the previous step was the multiplicative separation of the random

The main goal of the previous step was the multiplicative separation of the random variables which are accumulated in $h_c$. The aim of the second step is the approximation of $h_c^{(K)}$, whose representation is given by an infinite sum, by a function consisting of only finitely many summands. Similarly to the foregoing part of the proof, the approximation error $n^2\,E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2$ is reformulated in terms of $Z_n$ with kernel $H := H^{(L)} = \tilde h_c^{(K)} - \tilde h_c^{(K,L)}$. As before, we exemplarily take $n^{-2}\sum_{r=1}^{n-1} Z^{(1)}_{n,r}$ and $\sup_{k\in\mathbb{N}} E|H^{(L)}(X_1,X_k)|^2$ into further consideration. Concerning the summands of $Z^{(1)}_{n,r}$ we obtain
\[
\big|E\,H^{(L)}(X_i,X_j)H^{(L)}(X_k,X_l) - E\,H^{(L)}(\bar X_i,\bar X_j)H^{(L)}(\bar X_k,\bar X_l)\big|
\le E\big|H^{(L)}(X_k,X_l)\big[H^{(L)}(X_i,X_j)-H^{(L)}(\bar X_i,\bar X_j)\big]\mathbb{1}_{(X_k',X_l')'\in[-B,B]^{2d}}\big|
+ E\big|H^{(L)}(X_k,X_l)\big[H^{(L)}(X_i,X_j)-H^{(L)}(\bar X_i,\bar X_j)\big]\mathbb{1}_{(X_k',X_l')'\notin[-B,B]^{2d}}\big|
+ E\big|H^{(L)}(\bar X_i,\bar X_j)\big[H^{(L)}(X_k,X_l)-H^{(L)}(\bar X_k,\bar X_l)\big]\mathbb{1}_{(\bar X_i',\bar X_j')'\in[-B,B]^{2d}}\big|
+ E\big|H^{(L)}(\bar X_i,\bar X_j)\big[H^{(L)}(X_k,X_l)-H^{(L)}(\bar X_k,\bar X_l)\big]\mathbb{1}_{(\bar X_i',\bar X_j')'\notin[-B,B]^{2d}}\big|
=: E_1 + E_2 + E_3 + E_4
\]


for arbitrary $B>0$. Obviously, it suffices to take the first two summands into further consideration; the two remaining terms can be treated similarly. Since $\phi$ and $\psi$ have compact support, the number of overlapping functions within $(\Phi_{j_0,k})_{k\in\{-L,\dots,L\}^d}$ and $(\Psi_{j,k})_{k\in\{-L,\dots,L\}^d,\,j_0\le j<J(K)}$ can be bounded by a constant that is independent of $L$. By Lipschitz continuity of $\phi$ and $\psi$ this leads to uniform Lipschitz continuity of $(h_c^{(K,L)})_{L\in\mathbb{N}}$. Moreover, note that $(H^{(L)})_L$ is uniformly bounded. Due to the reformulation
\[
h_c^{(K)}(x,y) = \sum_{k_1,k_2\in\mathbb{Z}^d}\alpha^{(c)}_{j_0;k_1,k_2}\,\Phi_{j_0,k_1}(x)\,\Phi_{j_0,k_2}(y) + \sum_{j=j_0}^{J(K)-1}\sum_{k_1,k_2\in\mathbb{Z}^d}\sum_{e\in E}\beta^{(c,e)}_{j;k_1,k_2}\,\Psi^{(e)}_{j;k_1,k_2}(x,y),
\]
one can choose $(B=B(K,L))_{L\in\mathbb{N}}$ such that $\sup_{x,y\in[-B,B]^d}\big|h_c^{(K)}(x,y)-h_c^{(K,L)}(x,y)\big| = 0$ and $B\underset{L\to\infty}{\longrightarrow}\infty$. This setting allows for the following approximations:

\[
E_1 \le C\,\tau_r^{\delta}\Big[E\,|H^{(L)}(X_k,X_l)|^{1/(1-\delta)}\,\mathbb{1}_{(X_k',X_l')'\in[-B,B]^{2d}}\Big]^{1-\delta} \le C\,\tau_r^{\delta}\big[P\big(X_1\notin[-B,B]^d\big)\big]^{1-\delta},
\]
\[
E_2 \le C\,\tau_r^{\delta}\big[P\big(X_1\notin[-B,B]^d\big)\big]^{1-\delta}.
\]
Analogously, it can be shown that $\sup_{k\in\mathbb{N}_0} E\big[H^{(L)}(X_0,X_k)\big]^2 \le C\,P\big(X_1\notin[-B,B]^d\big)$. Finally, we obtain

\[
\limsup_{n\to\infty} n^2 E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2 \le C\,\big(P(X_1\notin[-B,B]^d)\big)^{1-\delta}\Big[\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}(r+1)\,n^2\,\tau_r^{\delta}\Big] \underset{L\to\infty}{\longrightarrow} 0.
\]
Hence, (5.7) and (5.8) hold.
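As an illustration (not part of the proof), the reformulation of $h_c^{(K)}$ into a coarse part at level $j_0$ plus wavelet details up to level $J(K)-1$ used in the second step is the usual multiresolution identity. The following sketch checks it numerically for the discrete one-dimensional Haar filters, again a simplification of the setting of the paper: peeling off detail layers from level $J$ down to $j_0$ and adding them back reproduces the fine-scale coefficients exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_analysis(c):
    """One analysis step: scaling coefficients at level j+1 ->
    (scaling coefficients at level j, wavelet/detail coefficients at level j)."""
    c = c.reshape(-1, 2)
    return (c[:, 0] + c[:, 1]) / np.sqrt(2.0), (c[:, 0] - c[:, 1]) / np.sqrt(2.0)

def haar_synthesis(coarse, detail):
    """Inverse step: rebuild the level-(j+1) scaling coefficients."""
    c = np.empty(2 * coarse.size)
    c[0::2] = (coarse + detail) / np.sqrt(2.0)
    c[1::2] = (coarse - detail) / np.sqrt(2.0)
    return c

# Fine-scale (level J) coefficients of some signal on a dyadic grid.
J, j0 = 8, 3
fine = np.sin(np.linspace(0.0, 4.0, 2 ** J)) + 0.1 * rng.standard_normal(2 ** J)

# Coarse part at level j0 plus all detail layers ...
coarse, details = fine.copy(), []
for _ in range(J - j0):
    coarse, d = haar_analysis(coarse)
    details.append(d)

# ... reproduces the level-J representation exactly.
rebuilt = coarse
for d in reversed(details):
    rebuilt = haar_synthesis(rebuilt, d)
print(float(np.max(np.abs(rebuilt - fine))))     # zero up to rounding (~1e-15)
```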

Proof of Lemma 2.3. In order to prove this result we follow the lines of the proofs of Lemma 2.1, Lemma 2.2 and Lemma 5.1 and carry out some modifications.
In a first step we reduce the problem to statistics with bounded kernels. To this end we use the modified approximation
\[
|H^{(c)}(x,y)-H^{(c)}(\bar x,\bar y)| \le \big[2 f(x,\bar x,y,\bar y) + g(x,\bar x) + g(y,\bar y)\big]\big[\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big] =: f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big],
\]
where $g$ is given by $g(x,\bar x) := \int_{\mathbb{R}^d} f(x,\bar x,z,z)\,dF(z)$. Under (A4)(i), Hölder's inequality yields
\[
E\big|H^{(c)}(Y_{k_1},Y_{k_2})-H^{(c)}(Y_{k_3},Y_{k_4})\big| \le \Big(E\Big[f_1(Y_{k_1},Y_{k_2},Y_{k_3},Y_{k_4})^{1/(1-\delta)}\sum_{i=1}^{4}\|Y_{k_i}\|_{l^1}\Big]\Big)^{1-\delta}\big(E\|Y_{k_1}-Y_{k_3}\|_{l^1}+E\|Y_{k_2}-Y_{k_4}\|_{l^1}\big)^{\delta}
\]
for $Y_{k_i}$ ($k_i\in\mathbb{N}$, $i=1,\dots,4$) as defined in (A4). Plugging this inequality into the calculations of the proof of Lemma 2.1 yields $\limsup_{n\to\infty} n^2\,E\big(V_n-V_n^{(c)}\big)^2 \underset{c\to\infty}{\longrightarrow} 0$.

The next step is the wavelet approximation of the bounded kernel $h_c$. Defining $h_c^{(K)}$ and $V^{(K)}_{n,c}$ as in the proof of Lemma 2.2, analogously to the proof of Lemma 5.1 there exists a $C>0$ such that
\[
|h_c^{(K)}(x,y)-h_c^{(K)}(\bar x,\bar y)| \le f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big] + |H_2(x,y)-H_2(\bar x,\bar y)|
\le C f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big] + \sum_{k_1,k_2\in\mathbb{Z}^d}|\kappa_{k_1,k_2}(x,y)|\,\big|\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)-\Phi_{J(K),k_1}(\bar x)\Phi_{J(K),k_2}(\bar y)\big|, \qquad (5.9)
\]
where $\kappa_{k_1,k_2}$ is given by $\kappa_{k_1,k_2}(x,y) := \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\Phi_{J(K);k_1,k_2}(u,v)\big[h_c(u,v)-h_c(x,y)\big]\,du\,dv$ and $H_2$ is defined as in the proof of Lemma 5.1. In order to approximate the last summand of (5.9), we distinguish again between the cases whether or not $(x',y')'\in\operatorname{supp}\Phi_{J(K),k_1}\Phi_{J(K),k_2}$. In the first case, an upper bound of order $O\big(\max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(x+c_1,\bar x,y+c_2,\bar y)\big)\,(\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1})$ can be achieved. Here, $S_\phi$ denotes the length of the support of $\phi$. In the second case a decomposition, similar to (5.5), can be employed which leads to the upper bound $O\big(f_1(x,\bar x,y,\bar y)+\max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(x+c_1,\bar x,y+c_2,\bar y)\big)\,(\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1})$. Consequently, we obtain

\[
|h_c^{(K)}(x,y)-h_c^{(K)}(\bar x,\bar y)| \le O\Big(f_1(x,\bar x,y,\bar y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(x+c_1,\bar x,y+c_2,\bar y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(\bar x+c_1,x,\bar y+c_2,y)\Big)\times\big(\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big) =: f_2(x,\bar x,y,\bar y)\,\big(\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big).
\]
This leads to $|H^{(K)}(x,y)-H^{(K)}(\bar x,\bar y)| \le f_3(x,\bar x,y,\bar y)\,(\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1})$ with $f_3(x,\bar x,y,\bar y) = 2 f_2(x,\bar x,y,\bar y) + \int_{\mathbb{R}^d} f_2(x,\bar x,z,z)\,dF(z) + \int_{\mathbb{R}^d} f_2(z,z,y,\bar y)\,dF(z)$. Note that under (A4), $E\,f_3(Y_i,Y_j,Y_k,Y_l)^{\eta}\big(\|Y_i\|_{l^1}+\|Y_j\|_{l^1}+\|Y_k\|_{l^1}+\|Y_l\|_{l^1}\big) < \infty$ if $J(K)$ is sufficiently large. Making some modifications in the proof of Lemma 2.2, $Z_n = n^2\,E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2$ can be approximated. As before we restrict ourselves to investigating $\sup_{k\in\mathbb{N}} E\big[H^{(K)}(X_1,X_k)\big]^2$ and $\sum_{r=1}^{n-1} Z^{(1)}_{n,r}$. The summands of $Z^{(1)}_{n,r}$ can be bounded as follows:

\[
\big|E\,H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l) - E\,H^{(K)}(\bar X_i,\bar X_j)H^{(K)}(\bar X_k,\bar X_l)\big|
\le E\big|H^{(K)}(X_k,X_l)\big[H^{(K)}(X_i,X_j)-H^{(K)}(\bar X_i,\bar X_j)\big]\big| + E\big|H^{(K)}(\bar X_i,\bar X_j)\big[H^{(K)}(X_k,X_l)-H^{(K)}(\bar X_k,\bar X_l)\big]\big|.
\]
Since further approximations are similar for both summands, we concentrate on the first one. Iterative application of Hölder's inequality leads to
\[
E\big|H^{(K)}(X_k,X_l)\big[H^{(K)}(X_i,X_j)-H^{(K)}(\bar X_i,\bar X_j)\big]\big| \le O\big(\tau_r^{\delta^2}\big)\Big[E\,\|X_j-\bar X_j\|_{l^1}\,f_3(X_i,\bar X_i,X_j,\bar X_j)^{1/(1-\delta)}\Big]^{\delta(1-\delta)}\Big[E\,|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\Big]^{1-\delta},
\]
as boundedness of $h_c$ implies uniform boundedness of $(H^{(K)})_K$. The middle factor is bounded because of (A4)(i). Therefore, it remains to examine $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$.


Let $b(K)$ be chosen such that $P(X_1\notin[-b(K),b(K)]^d)\le 1/K$; then
\[
E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)} = E\,|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\,\mathbb{1}_{X_k,X_l\in[-b(K),b(K)]^d} + O\big(P(X_1\notin[-b(K),b(K)]^d)\big)
\le 4\sup_{x,y\in[-b(K),b(K)]^d}|h_c(x,y)-h_c^{(K)}(x,y)|^{1/(1-\delta)} + O\big(P(X_1\notin[-b(K),b(K)]^d)\big).
\]
Due to Lemma 5.2 and the continuity of $f$ we obtain $1/K$ as an upper bound for the first term if $J(K)$ is chosen sufficiently large. Consequently, $\big|E\,H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)\big| \le C\,K^{\delta-1}\,\tau_r^{\delta^2}$, which implies that $Z^{(1)}_{n,r}$ tends to zero as $K$ increases. Furthermore, one obtains $\sup_{k\in\mathbb{N}_0} E\big[H^{(K)}(X_0,X_k)\big]^2 \underset{K\to\infty}{\longrightarrow} 0$ similarly to the investigation of $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$ above. Thus, we get $\limsup_{n\to\infty} n^2\,E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2 \underset{K\to\infty}{\longrightarrow} 0$.

Step three of the proof contains the verification of $\limsup_{n\to\infty} n^2\,E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2 \underset{L\to\infty}{\longrightarrow} 0$. For this purpose it suffices to plug a modified approximation of $H^{(L)}(x,y)-H^{(L)}(\bar x,\bar y)$ into the second part of the proof of Lemma 2.2. Lipschitz continuity of $h_c^{(K,L)}$ implies
\[
|H^{(L)}(x,y)-H^{(L)}(\bar x,\bar y)| \le f_4(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l^1}+\|y-\bar y\|_{l^1}\big]
\]
with $f_4(x,\bar x,y,\bar y) = C + f_3(x,\bar x,y,\bar y)$. Since, for $J(K)$ sufficiently large, $f_4$ satisfies the moment assumption of (A4)(i) with $a=0$, we obtain
\[
E\big|H^{(L)}(Y_{k_1},Y_{k_2})-H^{(L)}(Y_{k_3},Y_{k_4})\big| \le C\big[E\big(\|Y_{k_1}-Y_{k_3}\|_{l^1}+\|Y_{k_2}-Y_{k_4}\|_{l^1}\big)\big]^{\delta}.
\]
Hence, $\limsup_{n\to\infty} n^2\,E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2 \underset{L\to\infty}{\longrightarrow} 0$ under (A4). Summing up the three steps leads to
\[
\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2\,E\big(V_n-V^{(K,L)}_{n,c}\big)^2 = 0.
\]

Proof of Lemma 3.2. A positive variance of $Z$ implies the existence of constants $V>0$ and $c_0>0$ such that for every $c\ge c_0$ we can find $K_0\in\mathbb{N}$ such that for every $K\ge K_0$ there is an $L_0$ with $\operatorname{var}(Z_c^{(K,L)})\ge V$ for all $L\ge L_0$. Therefore, uniform equicontinuity of the distribution functions of $(((Z_c^{(K,L)})_L)_K)_c$ yields the desired property of $Z$. Using a matrix-based notation of $Z_c^{(K,L)}$ we obtain
\[
Z_c^{(K,L)} = \sum_{k_1,k_2=1}^{M(K,L)}\gamma^{(c,K,L)}_{k_1,k_2}\,Z^{(K,L)}_{k_1} Z^{(K,L)}_{k_2} = Z^{(K,L)\prime}\,\Gamma_c^{(K,L)}\,Z^{(K,L)},
\]
with a symmetric matrix of coefficients $\Gamma_c^{(K,L)}$ and a normal vector $Z^{(K,L)} = \big(Z^{(K,L)}_1,\dots,Z^{(K,L)}_{M(K,L)}\big)'$. Hence, $Z_c^{(K,L)}$ can be rewritten in the following way:

\[
Z_c^{(K,L)} \stackrel{d}{=} \tilde Y'\,U^{(K,L)\prime}\Lambda_c^{(K,L)} U^{(K,L)}\,\tilde Y = Y'\,\Lambda_c^{(K,L)}\,Y = \sum_{k=1}^{M(K,L)}\lambda^{(c,K,L)}_k Y_k^2.
\]
Here $U^{(K,L)}$ is a certain orthogonal matrix, $\Lambda_c^{(K,L)} := \operatorname{diag}\big(\lambda^{(c,K,L)}_1,\dots,\lambda^{(c,K,L)}_{M(K,L)}\big)$ with $|\lambda^{(c,K,L)}_1|\ge\dots\ge|\lambda^{(c,K,L)}_{M(K,L)}|$, and $Y$ as well as $\tilde Y$ are multivariate standard normally distributed random vectors. For notational simplicity we suppress the upper index $(c,K,L)$ in the sequel. Due to the above choice of the triple $(c,K,L)$, either $\sum_{k=1}^{4}\lambda_k^2$ or $\sum_{k=5}^{M(K,L)}\lambda_k^2$ is bounded from below by $V/4$. In the first case, $|\lambda_1|\ge\sqrt{V/16}$ holds true, which implies for arbitrary $\varepsilon>0$

\[
P\big(Z_c^{(K,L)}\in[x-\varepsilon,x+\varepsilon]\big) \le \int_{0}^{2\varepsilon} f_{\lambda_1 Y_1^2}(t)\,dt \le \frac{C(\varepsilon)}{\sqrt{V/16}} \qquad \forall\,x\in\mathbb{R},
\]
where $C(\varepsilon)\underset{\varepsilon\to 0}{\longrightarrow}0$. Here, the first inequality results from the fact that convolution preserves the continuity properties of the smoother function.
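As an illustration (not part of the proof), the first inequality of the preceding display can be checked by simulation: for a quadratic form $\sum_k\lambda_k Y_k^2$ in independent standard normal variables, small-interval probabilities are dominated by $P(|\lambda_1|Y_1^2\le 2\varepsilon) = \int_0^{2\varepsilon} f_{|\lambda_1|Y_1^2}(t)\,dt$. The eigenvalues, $\varepsilon$ and the simulation sizes below are arbitrary, hypothetical choices.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

lam = np.array([1.3, -0.8, 0.5, 0.2, 0.1])     # hypothetical eigenvalues, |lambda_1| largest
n_sim, eps = 200_000, 0.05

Y = rng.standard_normal((n_sim, lam.size))
Z = (Y ** 2 * lam).sum(axis=1)                 # quadratic form sum_k lambda_k Y_k^2

# P(|lambda_1| Y_1^2 <= 2 eps) = 2*Phi(sqrt(2*eps/|lambda_1|)) - 1 = erf(sqrt(eps/|lambda_1|)).
rhs = erf(sqrt(eps / abs(lam[0])))

# Largest small-interval probability over a grid of centres x.
xs = np.linspace(-3.0, 6.0, 181)
lhs = max(np.mean((Z >= x - eps) & (Z <= x + eps)) for x in xs)

print(f"max_x P(Z in [x-eps, x+eps]) ~ {lhs:.4f}  <=  {rhs:.4f}")
```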

In the opposite case, i.e. $\sum_{k=5}^{M(K,L)}\lambda_k^2\ge V/4$, it is possible to bound the uniform norm of the density function of $Z_c^{(K,L)}$ by means of its variance. To this end, we first consider the characteristic function $\varphi_{Z_c^{(K,L)}}$ of $Z_c^{(K,L)}$ and assume w.l.o.g. that $M(K,L)$ is divisible by 4. Defining a sequence $(\mu_k)_{k=1}^{M(K,L)/4}$ by $\mu_k = \lambda_{4k}$ for $k\in\{1,\dots,M(K,L)/4\}$ allows for the following approximation:
\[
\big|\varphi_{Z_c^{(K,L)}}(t)\big| = \prod_{j=1}^{M(K,L)}\big(1+(2\lambda_j t)^2\big)^{-1/4} \le \prod_{j=1}^{M(K,L)/4}\big(1+(2\mu_j t)^2\big)^{-1} \le \frac{1}{1+4\big(\mu_1^2+\cdots+\mu_{M(K,L)/4}^2\big)t^2}.
\]

By inverse Fourier transform we obtain the following result concerning the density function of $Z_c^{(K,L)}$:
\[
\big\|f_{Z_c^{(K,L)}}\big\|_\infty \le \frac{1}{2\pi}\,\big\|\varphi_{Z_c^{(K,L)}}\big\|_1 \le \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1}{1+4\big(\sqrt{\mu_1^2+\cdots+\mu_{M(K,L)/4}^2}\;t\big)^2}\,dt = \frac{1}{\sqrt{\mu_1^2+\cdots+\mu_{M(K,L)/4}^2}}\;\frac{1}{2\pi}\int_{0}^{\infty}\frac{1}{1+u^2}\,du
\]
\[
\le \frac{1}{2\sqrt{4\big(\mu_1^2+\cdots+\mu_{M(K,L)/4-1}^2\big)}} \le \frac{1}{2\sqrt{\lambda_5^2+\cdots+\lambda_{M(K,L)}^2}} \le \frac{1}{\sqrt{V}}.
\]

Thus, $P\big(Z_c^{(K,L)}\in[x-\varepsilon,x+\varepsilon]\big) \le \frac{2\varepsilon}{\sqrt{V}}$, which completes the study of the case $\sum_{k=5}^{M(K,L)}\lambda_k^2 > V/4$ and finally yields the assertion.
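As an illustration (not part of the proof), both ingredients of the preceding argument, the diagonalisation of a Gaussian quadratic form into a weighted sum of independent $\chi^2_1$ variables and the density bound $\|f\|_\infty \le (2\pi)^{-1}\|\varphi\|_1$ with $|\varphi(t)| = \prod_j(1+(2\lambda_j t)^2)^{-1/4}$, can be verified numerically. The coefficient matrix and all simulation parameters below are arbitrary, hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary symmetric coefficient matrix standing in for Gamma_c^{(K,L)}.
M = 12
A = rng.standard_normal((M, M))
Gamma = (A + A.T) / 2.0
lam = np.linalg.eigvalsh(Gamma)                        # eigenvalues lambda_1, ..., lambda_M

# 1) Distributional identity: Y' Gamma Y  =d  sum_k lambda_k Y_k^2  for Y ~ N(0, I_M).
n_sim = 100_000
Y = rng.standard_normal((n_sim, M))
quad_form = np.einsum('ni,ij,nj->n', Y, Gamma, Y)
weighted_chi2 = (rng.standard_normal((n_sim, M)) ** 2 * lam).sum(axis=1)
print(np.quantile(quad_form, [0.1, 0.5, 0.9]))
print(np.quantile(weighted_chi2, [0.1, 0.5, 0.9]))     # quantiles agree up to MC error

# 2) Density bound via the characteristic function:
#    |phi(t)| = prod_j (1 + (2 lambda_j t)^2)^(-1/4),  ||f||_inf <= (1/2pi) * int |phi(t)| dt.
t = np.linspace(-200.0, 200.0, 200_001)
abs_phi = np.prod((1.0 + (2.0 * np.outer(t, lam)) ** 2) ** (-0.25), axis=1)
bound = abs_phi.sum() * (t[1] - t[0]) / (2.0 * np.pi)  # Riemann sum; the tail beyond 200 is negligible here

# A crude histogram estimate of the density's maximum stays below the bound.
hist, _ = np.histogram(quad_form, bins=400, density=True)
print(float(hist.max()), "<=", float(bound))
```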

Proof of Lemma 3.2. This result is an immediate consequence of Theorem 3.1.

Acknowledgment. The author thanks Michael H. Neumann for his constructive advice and fruitful discussions.


This research was funded by the German Research Foundation DFG (project: NE 606/2-1).

References

[1] Arcones, M. A. and Giné, E. (1992). On the bootstrap of U and V statistics. Ann. Statist. 20, 655–674.
[2] Babbel, B. (1989). Invariance principles for U-statistics and von Mises functionals. J. Statist. Plann. Inference 22, 337–354.
[3] Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
[4] Carlstein, E. (1989). Degenerate U-statistics based on non-independent observations. Calcutta Statist. Assoc. Bull. 37, 55–65.
[5] Daubechies, I. (2002). Ten Lectures on Wavelets. Philadelphia, PA: Society for Industrial and Applied Mathematics.
[6] Dedecker, J. and Prieur, C. (2004). Couplage pour la distance minimale. C. R. Acad. Sci. Paris, Sér. I 338, 805–808.
[7] Dedecker, J. and Prieur, C. (2005). New dependence coefficients. Examples and applications to statistics. Probab. Theory Relat. Fields 132, 203–236.
[8] Dehling, H. and Mikosch, T. (1994). Random quadratic forms and the bootstrap for U-statistics. J. Multivariate Anal. 51, 392–413.
[9] Denker, M. (1982). Statistical decision procedures and ergodic theory. In Ergodic Theory and Related Topics (Vitte, 1981) (ed. Michel, H.). Math. Res. 12, 35–47. Berlin: Akademieverlag.
[10] Dewan, I. and Prakasa Rao, B. L. S. (2001). Asymptotic normality for U-statistics of associated random variables. J. Statist. Plann. Inference 97, 201–225.
[11] de Wet, T. (1987). Degenerate U- and V-statistics. South African Statist. J. 21, 99–129.
[12] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev. 41, 45–76.
[13] Doukhan, P. and Louhichi, S. (1999). A weak dependence condition and applications to moment inequalities. Stochastic Process. Appl. 84, 313–342.
[14] Doukhan, P. and Neumann, M. H. (2008). The notion of ψ-weak dependence and its applications to bootstrapping time series. Probab. Surv. 5, 146–168.
[15] Eagleson, G. K. (1979). Orthogonal expansions and U-statistics. Austral. J. Statist. 21, 221–237.
[16] Fan, Y. and Li, Q. (1999). Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing. J. Nonparametr. Stat. 10, 245–271.
[17] Feuerverger, A. and Mureika, R. A. (1977). The empirical characteristic function and its applications. Ann. Statist. 5, 88–97.
[18] Franke, J. and Wendel, M. (1992). A Bootstrap Approach for Nonlinear Autoregressions – Some Preliminary Results. In Bootstrapping and Related Techniques (eds. Jöckel, K. H., Rothe, G. and Sendler, W.). Lecture Notes in Economics and Mathematical Systems 376. Berlin: Springer.
[19] Ghosh, B. K. and Huang, W.-M. (1991). The Power and Optimal Kernel of the Bickel-Rosenblatt Test for Goodness of Fit. Ann. Statist. 19, 999–1009.
[20] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. New York: Springer.
[21] Huang, W. and Zhang, L.-X. (2006). Asymptotic normality for U-statistics of negatively associated random variables. Statist. Probab. Lett. 76, 1125–1131.
[22] Leucht, A. (2010). Some tests of L2-type for weakly dependent observations. In preparation.
[23] Leucht, A. and Neumann, M. H. (2009). Consistency of general bootstrap methods for degenerate U- and V-type statistics. J. Multivariate Anal. 100, 1622–1633.
[24] Neumann, M. H. and Paparoditis, E. (2008). Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations. Bernoulli 14, 14–46.
[25] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
[26] Wojtaszczyk, P. (1997). A Mathematical Introduction to Wavelets. London Mathematical Society Student Texts 37. Cambridge: Cambridge Univ. Press.