NONPARAMETRIC REGRESSION ANALYSIS

Gordon J. Johnston*

Department of Statistics
University of North Carolina, Chapel Hill

*This research was supported in part by the Air Force Office of Scientific Research under Grant AFOSR 75-2796 and by the Office of Naval Research under Grant N00014-75-C-0809.
TABLE OF CONTENTS

CHAPTER 1. Introduction, Summary and Related Work
1.1 Introduction------------------------------------------------- 1
1.2 Summary------------------------------------------------------ 3
1.3 Related Work------------------------------------------------- 4

CHAPTER 2. Consistency, Normality
2.1 δ-Function Sequences-----------------------------------------11
2.2 Nonparametric Density Function Estimation--------------------16
2.3 Pointwise Consistency Properties of m_n and m̃_n--------------18
2.4 Asymptotic Distribution of m_n-------------------------------24
2.5 Asymptotic Distribution of m̃_n-------------------------------36
2.6 Mean Integrated Square Error---------------------------------37

CHAPTER 3. Asymptotic Properties of Maximum Absolute Deviation
3.1 Preliminaries------------------------------------------------42
3.2 Maximum Absolute Deviation of m̃_n----------------------------47
3.3 Uniform Consistency of m_n and m̃_n---------------------------71

CHAPTER 4. An Example, Further Research
4.1 The Estimators m_n and m̃_n-----------------------------------74
4.2 An Example---------------------------------------------------75
4.3 Further Research---------------------------------------------76

FIGURES-----------------------------------------------------------78
BIBLIOGRAPHY------------------------------------------------------85
CHAPTER 1. INTRODUCTION, SUMMARY AND RELATED WORK
1.1 Introduction
Regression analysis is concerned with the study of the relationship
between a response variable Y and a set of predictor variables
X = (X₁, X₂, ..., X_k). An important aspect of regression analysis is the
estimation of the regression function, i.e. of the conditional expectation
of Y, given X. In classical regression analysis, the functional
form of the regression function is assumed to be known up to a finite set
of unknown parameters, which may be estimated from data.
If no such prior knowledge of the regression function exists, then
classical methods do not apply. However, in this case, it may still be
desirable to obtain an estimate of the regression function, either for
direct analysis or to establish a plausible model for use in the classical
regression analysis mentioned above.
Thus there is a need for regression analysis methods which do not
assume a specific mathematical form for the regression function, i.e.
nonparametric methods.
In this study, a type of nonparametric estimator of the regression
function m(x) = E[Y|X=x] will be investigated, where (X,Y) is a bivariate
random vector.
Let X and Y be random variables defined on a probability space
(Ω,F,P) with E|Y| < ∞. Denote the marginal distribution function of X
by F. Then the regression function m(x) is defined by
m(x) = E[Y|X=x], i.e. the (unique a.e. (dF)) Borel measurable function
m satisfying

(1.1.1)    ∫_{X⁻¹B} Y dP = ∫_B m(x) dF(x)

for all Borel sets B. If X and Y have a joint density function f(x,y), then
it follows that

(1.1.2)    m(x) = ∫ y f(x,y) dy / f(x)   if f(x) > 0 ,
           m(x) = 0                      if f(x) = 0 ,
is a version of the regression function, where f(x) denotes the marginal
density of X. Motivated by (1.1.2) and previous work on estimation of
density functions by δ-function sequences, Watson (1964) suggested an
estimator of m(x) of the form
(1.1.3)    m_n(x) = [ (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) ] / [ (1/n) Σ_{j=1}^n δ_n(x−X_j) ] ,
where (X₁,Y₁), (X₂,Y₂), ..., (X_n,Y_n) are independent observations on (X,Y)
and {δ_n(x)} is a sequence of weighting functions called a δ-function
sequence. The estimator m_n(x) defined in (1.1.3) will be investigated
here. By rewriting (1.1.3) as

m_n(x) = Σ_{i=1}^n Y_i [ δ_n(x−X_i) / Σ_{j=1}^n δ_n(x−X_j) ] ,

we have the intuitively appealing interpretation of m_n(x) as a weighted
average of the Y-observations, with the weights depending on x through
δ_n(x). Also, if one desires a smooth estimate of m(x), this can be
achieved through the choice of δ_n(x).
In certain situations, the marginal density f of X is known. For
example, suppose in an experiment, we are able to fix the level of the
predictor variable X, but we wish to randomize X to reduce sampling bias.
Then we would choose X randomly according to a known density f. This
situation also arises in certain optimization problems, where the value
of the function to be optimized can only be determined up to a random
error term (see Devroye (1978)). Since the denominator of (1.1.3) is
intended to estimate f, a reasonable way to use the knowledge of f might
be to use the modified estimator

(1.1.4)    m̃_n(x) = (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) / f(x) .

We provide some preliminary comparisons of the estimators m_n and m̃_n in
the known density case.
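Both estimators are easy to compute directly from their definitions. The sketch below is illustrative and not code from this dissertation: the Gaussian kernel, the bandwidth ε = 0.05, and the simulated model m(x) = sin 2πx with X uniform on [0,1] (so that the known density is f ≡ 1) are all assumptions chosen for the example.

```python
import numpy as np

def watson_estimator(x, X, Y, eps):
    """Watson's estimator m_n(x) of (1.1.3), with the kernel type
    delta-sequence delta_n(u) = eps^-1 K(u/eps), K Gaussian; the
    normalizing constants cancel in the ratio."""
    w = np.exp(-0.5 * ((x - X) / eps) ** 2)
    return np.sum(w * Y) / np.sum(w)

def modified_estimator(x, X, Y, eps, f):
    """Modified estimator for the known density case: the same kernel
    numerator, but divided by the known marginal density f rather
    than its estimate."""
    K = np.exp(-0.5 * ((x - X) / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))
    return np.mean(K * Y) / f(x)

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)                 # known density: f = 1 on [0,1]
Y = np.sin(2.0 * np.pi * X) + rng.normal(0.0, 0.2, n)

m_hat = watson_estimator(0.25, X, Y, eps=0.05)
m_tilde = modified_estimator(0.25, X, Y, eps=0.05, f=lambda u: 1.0)
print(m_hat, m_tilde)                        # both near m(0.25) = 1
```

At this interior point both estimates should be close to m(0.25) = 1; the two differ only in how the denominator is handled, which matters most near the boundary of the support of f.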
1.2 Summary
Since we will assume a specific mathematical form for neither the
regression function m nor the underlying probability distribution of
(X,Y), we could not reasonably expect to obtain small sample results for
the estimators in question. Thus we shall concern ourselves here almost
exclusively with asymptotic results, as the sample size grows larger.
In Chapter 2, we rigorously establish (weak) pointwise consistency
of m_n(x), which was proved heuristically by Watson (1964). Asymptotic
joint normality of m_n(x), taken at a finite number of points, is
demonstrated. In this last result, we significantly weaken a condition of
Schuster (1972) on the δ-function sequence used, at the expense of
some mild additional regularity conditions. In the known density case,
we establish consistency and asymptotic normality for m̃_n, so as to provide
a comparison with the asymptotic normality and consistency results for m_n.
We consider the mean integrated square error (MISE) of the numerator of
the estimator m_n. An explicit expression for the Fourier transform of the
δ-function which minimizes this MISE for each sample size n is derived,
much as Watson and Leadbetter (1963) did for density function estimators.
In Chapter 3, we consider the numerator of m̃_n, and show that its
supremum, taken over a finite interval, if properly centered and normalized,
converges in distribution to a random variable having an extreme value type
distribution. This result is then applied to establish uniform (weak)
consistency of m̃_n, with an associated rate of convergence.
In Chapter 4, we give some examples of calculations of the estimators
m_n and m̃_n from simulated data.
1.3 Related Work
Estimators of the regression function of the form (1.1.3), as well as
several other types of nonparametric estimators of the regression function,
have recently received attention in the literature. Here we survey the
recent work on this subject.
1.3.1 Kernel Type Estimators
Several authors have considered estimators of the form (1.1.3) when
the δ-function sequence is of kernel type. Kernel type δ-function
sequences (defined rigorously in Lemma 2.1.2) are of the form

δ_n(x) = ε_n⁻¹ K(x/ε_n) ,

where K is, e.g., a probability density function and ε_n is a positive real
sequence with ε_n → 0 as n → ∞.
Schuster (1972) considers the asymptotic normality of this type of
estimator. Because of the close relation between Schuster's work and
work presented in this dissertation, we defer discussion until Section
2.4.
Schuster and Yakowitz (1978) consider a global error criterion for
this type of estimator, and for its derivatives as estimators of the
derivatives of the regression function. Let g^(r) denote the r-th
derivative of the function g. Schuster and Yakowitz give conditions under
which, for any ε > 0 and n sufficiently large,

(1.3.1)    P{ sup_{a≤x≤b} |m_n^(r)(x) − m^(r)(x)| > ε } ≤ C/(n ε_n^{2N+2} ε²) ,

where N is a positive integer, [a,b] is a closed, bounded interval and
C is a constant not depending on n. If {ε_n} is such that n ε_n^{2N+2} → ∞
as n → ∞, then (1.3.1) implies that

sup_{a≤x≤b} |m_n^(r)(x) − m^(r)(x)| → 0 in probability

as n → ∞, so that this result is a type of uniform consistency result.
It would be of interest to determine a rate of convergence to be
associated with this result, i.e., a positive, real sequence {b_n} with
b_n → ∞ such that

b_n sup_{a≤x≤b} |m_n(x) − m(x)| → 0 in probability.

This question is addressed in Chapter 3 of this dissertation for the case
N = 0. Schuster and Yakowitz also consider the case where the X
variable is non-stochastic, i.e., we have {f(·;x), x∈[0,1]} as a family
of probability density functions and the object is to estimate

w(x) = ∫ y f(y;x) dy

on the basis of an independent sample Y_i, i = 1,...,n, where Y_i has
density f(·;x_i). They give conditions under which a result similar to
(1.3.1) holds for the so-called Priestley-Chao estimator

(1.3.2)    w_n(x) = (n ε_n)⁻¹ Σ_{i=1}^n K((x − x_i)/ε_n) Y_i

of w(x) (see Priestley and Chao (1972)).
In the non-stochastic X variable case as described above, Benedetti
(1974) shows that both the Watson estimator and the Priestley-Chao
estimator are (weakly) consistent and asymptotically normal for
appropriate values of x, but he points out some computational advantages of
the Watson estimator over the Priestley-Chao.
Konakov (1975) considers a quadratic deviation error criterion for
the Watson estimator with kernel type δ-sequences. Define the quadratic
deviation to be

T_n = ∫ [m_n(x) − m(x)]² f_n²(x) p(x) dx ,

where f_n is a kernel type estimator of the marginal density f of X and
p is a bounded integrable weight function. Konakov gives conditions
under which T_n, if properly normalized and centered, is asymptotically
normal. We do not consider quadratic deviation in this dissertation.
1.3.2 Nearest Neighbor Type Estimators.
Watson (1964) proposed estimating m(x) with the average of the Y
values corresponding to the k X values nearest to x, where k is some
integer. This type of estimator is called the k nearest neighbor
estimator of the regression function. Earlier work had been done on
the classification problem and on estimation of a probability density
function using nearest neighbor techniques. (See Fix and Hodges (1951),
Cover and Hart (1967), Cover (1968), and Loftsgaarden and Quesenberry
(1965) for work in these areas.)
Let k(n) be an integer depending on the sample size n and denote
by I_{k(n)}(x) the smallest open interval centered at x containing no less
than k of the X-observations. Then the k-nearest neighbor estimator
m̂_n can be written as

(1.3.3)    m̂_n(x) = k⁻¹ Σ_{i: X_i ∈ I_{k(n)}(x)} Y_i .
Stone (1977) points out that m̂_n(x) may be a discontinuous function, and
that in some cases, smoothness is a desirable property in a regression
function estimator. Lai (1977) proposes a modification of the k nearest
neighbor estimator which can have the desired smoothness property. This
estimator is very similar to the Watson estimator (1.1.3) with kernel
type δ-function sequence. Let W be a probability density with W(x) = 0
for |x| > 1. Then Lai's estimator is defined by

(1.3.4)    m̄_n(x) = Σ_{i=1}^n Y_i W((x−X_i)/R_{k(n)}(x)) / Σ_{i=1}^n W((x−X_i)/R_{k(n)}(x)) ,

where R_{k(n)}(x) is the radius of the interval I_{k(n)}(x). This estimator reduces
to the form (1.3.3) when W(x) = 1/2 for |x| ≤ 1. Lai proves the following.
1.3.1 Theorem. Assume W is continuous a.e., bounded and W(x) = 0 for
|x| > 1. If there exists an open set U₀ in R on which

i) f(x) is continuous, bounded, and f(x) > 0 ,
ii) E(|Y| | X=x) and E(max(Y,0) | X=x) are continuous functions of x,
iii) lim_{M→∞} sup_{x∈U₀} E( |Y| · 1{|Y|≥M}(Y) | X=x ) = 0 ,

and if EY² < ∞ and

iv) k(n)/n → 0 and k(n)/√n → ∞ ,

then

sup_{z∈A} |m̄_n(z) − m(z)| → 0

in probability for any compact set A ⊂ U₀. □
A similar result is proved for the estimator (1.1.4), with A an
interval, in Chapter 3 of this dissertation. There, more regularity
conditions are applied to obtain an associated rate of convergence.
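The two neighbor-based estimators above are straightforward to compute. The sketch below is an illustration only, not code from Lai's paper: the simulated model m(x) = x² with a uniform design and the choice k = 50 are assumptions of the example, and the triangular weight stands in for a generic smooth W.

```python
import numpy as np

def knn_estimator(x, X, Y, k):
    """k nearest neighbor estimator (1.3.3): average of the Y values
    belonging to the k X-observations nearest to x."""
    idx = np.argsort(np.abs(X - x))[:k]
    return Y[idx].mean()

def lai_estimator(x, X, Y, k, W):
    """Lai's smoothed estimator (1.3.4): weights W((x - X_i)/R_k(x)),
    where R_k(x) is the distance from x to its k-th nearest X value."""
    d = np.abs(X - x)
    R = np.sort(d)[k - 1]            # radius of the k nearest neighbor interval
    w = W((x - X) / R)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(1)
n, k = 4000, 50
X = rng.uniform(-1.0, 1.0, n)
Y = X ** 2 + rng.normal(0.0, 0.1, n)

box = lambda u: np.where(np.abs(u) <= 1.0, 0.5, 0.0)  # recovers (1.3.3)
tri = lambda u: np.maximum(1.0 - np.abs(u), 0.0)      # a smooth choice of W

m_knn = knn_estimator(0.5, X, Y, k)
m_lai = lai_estimator(0.5, X, Y, k, tri)
m_box = lai_estimator(0.5, X, Y, k, box)
print(m_knn, m_lai, m_box)           # all near m(0.5) = 0.25
```

With the box weight, the smoothed estimator averages exactly the k nearest Y values, so m_box reproduces the plain k nearest neighbor estimate, as the reduction to (1.3.3) asserts.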
Stone (1977) considers the following type of nonparametric regression
function estimator:

(1.3.5)    m̂_n(x) = Σ_{i=1}^n W_{ni}(x) Y_i ,

where W_{ni}(x) = W_{ni}(x; X₁,...,X_n) is a weight function. This estimator
reduces to the nearest neighbor, modified nearest neighbor and δ-function
type estimators discussed above for appropriate choices of the functions
W_{ni}. Stone gives general conditions on the weight functions W_{ni} for
m̂_n to be consistent in L_r, i.e., for

E|m̂_n(x) − m(x)|^r → 0

whenever E|Y|^r < ∞. Stone's work applies to give minimal conditions for
this type of consistency for some types of estimators, e.g., if k(n) → ∞
and k(n)/n → 0, then the k nearest neighbor estimator is consistent in
L_r. Stone, however, points out that it is not clear from his results
when an estimator of the Watson type is consistent in L_r.
1.3.3 Potential Function Methods.
In this method, introduced by Aizerman, Braverman, and Rozonoer
(1970), the regression function m is assumed to belong to a Hilbert
space H and have the representation m(x) = Σ_{i=1}^∞ c_i φ_i(x), where {φ_i} is
a complete system of functions of H. The estimator m̂_n is calculated
recursively by the formula

m̂_n(x) = m̂_{n−1}(x) + γ_n [Y_n − m̂_{n−1}(X_n)] K(x, X_n) ,

where K is a "potential function" of the form

K(x,y) = Σ_{i=1}^∞ λ_i² φ_i(x) φ_i(y) ,

{γ_n} is a sequence of real numbers, and m̂₀ is chosen arbitrarily.
We have the following type of consistency for this set-up. Suppose
Σ_i (c_i/λ_i)² < ∞, Σ_i γ_i = ∞, Σ_i γ_i² < ∞. Then

∫ [m̂_n(x) − m(x)]² f(x) dx → 0

in probability as n → ∞.
Fisher and Yakowitz (1976) obtain more general results for this
type of estimation, but for a more complicated error criterion.
1.3.4 Estimates Based on Ranks.
Let X_{n1} ≤ X_{n2} ≤ ... ≤ X_{nn} denote the ordered X values and define
the concomitant of X_{ni} = X_j to be Y_{ni} = Y_j. The set Y_{ni}, i = 1,...,n,
are sometimes called the induced order statistics. Yang (1977) proposes
the following estimator of m(x) based on concomitants:

m̂_n(x) = (1/n) Σ_{i=1}^n Y_{ni} ε_n⁻¹ K((F_n(x) − i/n)/ε_n) ,

where ε_n⁻¹ K(x/ε_n) is a kernel type δ-function sequence and F_n is the
empirical distribution function of the X values. Yang gives conditions
under which m̂_n is (weakly) consistent and asymptotically normal at
appropriate points x.
Bhattacharya (1976) discusses estimation of a function related to the
regression function based on concomitants. Let F denote the marginal
distribution function of X and define

h(t) = m∘F⁻¹(t) ,    H(t) = ∫₀ᵗ h(s) ds .

Natural estimators of H are

H_n(t) = n⁻¹ Σ_{i=1}^{[nt]} Y_{ni}

and a second estimator constructed with F in place of the empirical
ranks, if F is known. Bhattacharya obtains weak convergence in D[0,1]
results for these estimators and applies them to estimation and hypothesis
testing problems.
CHAPTER 2. CONSISTENCY, NORMALITY

2.1 δ-Function Sequences
A certain class of δ-function sequences was suggested originally by
Rosenblatt (1956) and, under slightly weaker conditions, by Parzen (1962)
for use in probability density function estimation. Leadbetter (1963)
and Watson and Leadbetter (1964) introduced a more general notion of
δ-function sequences, and our approach throughout this study will be
to obtain results for the more general type of δ-functions whenever
possible.
The following (2.1.1 - 2.1.4) is due to Leadbetter (1963).
2.1.1 Definition. A sequence of integrable functions {δ_n(x)} is called
a δ-function sequence if it satisfies the following set of conditions
(integrals with no limits of integration are meant to extend over the
entire real line):

C1. ∫ |δ_n(x)| dx < A for all n and some fixed A,
C2. ∫ δ_n(x) dx = 1 for all n,
C3. δ_n(x) → 0 uniformly on |x| ≥ A for any fixed A > 0,
C4. ∫_{|x|≥A} δ_n(x) dx → 0 for any fixed A > 0. □
The next lemma describes the type of δ-function sequence used by
Rosenblatt (1956) and Parzen (1962), although the conditions on the
function K are slightly different. This type of δ-function sequence will be
referred to as "kernel type" and K as a "kernel function."

2.1.2 Lemma. Let {ε_n} be a sequence of non-zero constants with ε_n → 0
as n → ∞ and let K be an integrable function such that ∫ K(x) dx = 1
and K(x) = o(x⁻¹) as |x| → ∞. Then {ε_n⁻¹ K(x/ε_n)} is a δ-function
sequence. □
The following lemma demonstrates the similarity between δ-function
sequences as defined above and the Dirac δ-function.

2.1.3 Lemma. If g(x) is integrable and continuous at x = 0 and {δ_n}
is a δ-function sequence, then g(x)δ_n(x) is integrable for each n and

∫ g(x) δ_n(x) dx → g(0) as n → ∞. □
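Lemma 2.1.3 is easy to check numerically. The sketch below is an illustration, not part of the text: it uses the kernel type sequence of Lemma 2.1.2 with K the standard normal density and ε_n = n^(−1/2) as an assumed concrete choice, approximating the integral by a Riemann sum on a grid.

```python
import numpy as np

def delta_n(x, n):
    """Kernel type delta-function sequence (Lemma 2.1.2) with K the
    standard normal density and eps_n = n**-0.5 (an assumed choice)."""
    eps = n ** -0.5
    return np.exp(-0.5 * (x / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

g = lambda x: np.cos(x) / (1.0 + x ** 2)   # integrable, continuous at 0; g(0) = 1

x = np.linspace(-20.0, 20.0, 400001)       # integration grid
dx = x[1] - x[0]
for n in (10, 100, 10000):
    val = np.sum(g(x) * delta_n(x, n)) * dx   # approximates the integral of g * delta_n
    print(n, val)                             # approaches g(0) = 1 as n grows
```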
2.1.4 Lemma. Let {δ_n(x)} be a δ-function sequence such that, for p > 1,

α_n(p) = ∫ |δ_n(u)|^p du < ∞ for each n.

Then α_n(p) → ∞ and

{δ_{n,p}(x)} = {|δ_n(x)|^p / α_n(p)}

is a δ-function sequence for p > 1. □
Rosenblatt (1971) states the following lemma, which gives a rate
of convergence for Lemma 2.1.3 when the δ-function sequence is of
kernel type. We include a proof for completeness.

2.1.5 Lemma. Suppose g is an integrable function with bounded, continuous
1st and 2nd derivatives. Let {ε_n⁻¹ K(x/ε_n)} be a δ-function sequence of
kernel type with

∫ uK(u) du = 0   and   ∫ u²|K(u)| du < ∞ .

Then

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = g(x) + O(ε_n²) ,

where the sequence represented by O does not depend on x.

Proof. Write

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = ∫ K(y) g(x−ε_n y) dy

and

g(x−ε_n y) = g(x) − g'(x) ε_n y + g''(τ) ε_n² y²/2 ,

where τ = τ_n(x,y) is between x and x−ε_n y. Thus

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = g(x) ∫ K(y) dy − g'(x) ε_n ∫ yK(y) dy
                               + (ε_n²/2) ∫ g''(τ) y² K(y) dy ,

and hence

| ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du − g(x) | ≤ ε_n² sup_t |g''(t)/2| ∫ |y² K(y)| dy ,

since ∫ K(u) du = 1 and ∫ uK(u) du = 0. The conclusion follows from
the last inequality. □
The following lemmas will be useful in the sequel.
2.1.6 Lemma. Let {δ_n} be a δ-function sequence and g an integrable
function. If g is continuous at x and y and x ≠ y, then

∫ δ_n(x−u) δ_n(y−u) g(u) du → 0

as n → ∞.

Proof. For convenience, assume x < y. By Lemma 2.1.3, δ_n(y−u)g(u)
is an integrable function for each n, as is δ_n(x−u)g(u). Choose A so
that x < A < y. Then

| ∫ δ_n(x−u) δ_n(y−u) g(u) du |
  ≤ ∫_{−∞}^{A} |δ_n(x−u) δ_n(y−u) g(u)| du + ∫_{A}^{∞} |δ_n(x−u) δ_n(y−u) g(u)| du
  ≤ sup_{u≤A} |δ_n(y−u)| ∫ |δ_n(x−u) g(u)| du
    + sup_{u≥A} |δ_n(x−u)| ∫ |δ_n(y−u) g(u)| du .

Now sup_{u≤A} |δ_n(y−u)| and sup_{u≥A} |δ_n(x−u)| converge to zero by C3 of
Definition 2.1.1. Further, ∫ |δ_n(x−u) g(u)| du < ∞ for each n by the
preceding remark, and, in fact, by Lemmas 2.1.3 and 2.1.4,

∫ |δ_n(x−u) g(u)| du ≤ A'   and   ∫ |δ_n(y−u) g(u)| du ≤ A'

for some constant A'. Thus ∫ |δ_n(x−u) g(u)| du is a bounded sequence,
and we have

sup_{u≤A} |δ_n(y−u)| ∫ |δ_n(x−u) g(u)| du → 0

as n → ∞. The same argument applies when x and y are interchanged, and
the conclusion follows. □
2.1.7 Lemma. Let {δ_n} be a δ-function sequence such that δ_n is an
even function for each n and g an integrable function such that g has
both left and right hand limits at 0. Then δ_n(x)g(x) is an integrable
function for each n and

∫ δ_n(x) g(x) dx → (g(0+) + g(0−))/2

as n → ∞.

Proof. Define

g⁺(x) = g(x) for x > 0,   g⁺(0) = g(0+),   g⁺(x) = g(−x) for x < 0 ,
g⁻(x) = g(−x) for x > 0,  g⁻(0) = g(0−),   g⁻(x) = g(x) for x < 0 .

Clearly, g⁺ and g⁻ are even functions, continuous at 0. Further, they are
both integrable functions, since, e.g.,

∫ |g⁺(u)| du = 2 ∫₀^∞ |g⁺(u)| du = 2 ∫₀^∞ |g(u)| du < ∞ .

Thus, by Lemma 2.1.3,

∫ δ_n(x) g⁺(x) dx → g⁺(0) = g(0+)

and, similarly, ∫ δ_n(x) g⁻(x) dx → g(0−) as n → ∞. But, since δ_n, g⁺
and g⁻ are even,

(1/2) [ ∫ |δ_n(x) g⁺(x)| dx + ∫ |δ_n(x) g⁻(x)| dx ]
  = ∫₀^∞ |δ_n(x) g(x)| dx + ∫_{−∞}^0 |δ_n(x) g(x)| dx ,

so that δ_n(x)g(x) is integrable, and

(1/2) [ ∫ δ_n(x) g⁺(x) dx + ∫ δ_n(x) g⁻(x) dx ]
  = ∫₀^∞ δ_n(x) g(x) dx + ∫_{−∞}^0 δ_n(x) g(x) dx
  = ∫ δ_n(x) g(x) dx → (g(0+) + g(0−))/2

by the preceding remark. □
2.2 Nonparametric Density Function Estimation.
Nonparametric methods of density function estimation have been studied
in great detail (see, e.g., Wegman (1972a) and Wegman (1972b) for a survey
and comparison of work in this area). Estimators of a density f(x) of the
form

(2.2.1)    f_n(x) = (1/n) Σ_{i=1}^n δ_n(x−X_i)

are of particular interest here because of the reliance of our proposed
regression function estimators on the same type of weighting functions
δ_n, and because f_n, defined in (2.2.1), appears in the denominator of m_n
as defined in equation (1.1.3).
Since X_i, i = 1,...,n are i.i.d. with common density f, we have

E f_n(x) = ∫ δ_n(x−u) f(u) du ,

which has f(x) as its limiting value as n → ∞, provided f is continuous
at x, by Lemma 2.1.3. That is, f_n is an asymptotically unbiased estimator
of f at continuity points of f. Further,

E f_n²(x) = (1/n)² E{ Σ_{i=1}^n δ_n²(x−X_i) + ΣΣ_{i≠j} δ_n(x−X_i) δ_n(x−X_j) }
          = (1/n) ∫ δ_n²(x−u) f(u) du
            + ((n−1)/n) [ ∫ δ_n(x−u) f(u) du ]² ,

so that

Var[f_n(x)] = (1/n) ∫ δ_n²(x−u) f(u) du − (1/n) [ ∫ δ_n(x−u) f(u) du ]² ,

and we thus have, if α_n = ∫ δ_n²(u) du < ∞ for each n, by Lemma 2.1.4,

(n/α_n) Var[f_n(x)] → f(x)

as n → ∞ at continuity points x of f. The above calculations (which
appear in Watson and Leadbetter (1964)) may be combined to give conditions
under which the mean square error of f_n converges to zero, as the
following lemma shows.
2.2.1 Lemma. Let {δ_n} be a δ-function sequence for which
α_n = ∫ δ_n²(u) du < ∞ for each n and α_n/n → 0 as n → ∞. Let x be a
continuity point of f. Then

E[f_n(x) − f(x)]² → 0 as n → ∞. □

2.2.2 Remark. By Chebychev's inequality,

P{ |f_n(x) − f(x)| > ε } ≤ ε⁻² E[f_n(x) − f(x)]²

for any ε > 0. We thus have f_n(x) → f(x) in probability, provided the
conditions of Lemma 2.2.1 are satisfied. That is, f_n(x) is a weakly
consistent estimator of f(x) for appropriate δ-function sequences and
points x.
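The estimator (2.2.1) and the consistency statement of Remark 2.2.2 can be illustrated numerically. The standard normal target density, the Gaussian kernel and the bandwidth ε_n = n^(−1/5) below are assumptions of this sketch, not choices made in the text.

```python
import numpy as np

def f_n(x, X, eps):
    """Density estimator (2.2.1) with delta_n(u) = eps^-1 K(u/eps),
    K the standard normal density; x may be an array of points."""
    u = (np.asarray(x) - X[:, None]) / eps
    return np.mean(np.exp(-0.5 * u ** 2), axis=0) / (eps * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
true_f = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

grid = np.array([-1.0, 0.0, 1.0])
for n in (100, 10000):
    X = rng.normal(size=n)
    eps = n ** -0.2                  # then alpha_n/n is of order 1/(n*eps) -> 0
    est = f_n(grid, X, eps)
    print(n, np.max(np.abs(est - true_f(grid))))   # error shrinks with n
```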
The preceding discussion on density estimation will suffice for our
discussion of pointwise consistency of our proposed regression estimators.
We will include other pertinent results on density estimation as they are
needed.
2.3 Pointwise Consistency Properties of m_n and m̃_n.
We begin our discussion by considering the numerator of the estimators
m_n and m̃_n defined in (1.1.3) and (1.1.4), respectively. Denote, for
convenience, the numerator by m_n*, i.e.,

m_n*(x) = (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) .

Then we have the following.
2.3.1 Lemma. Let {δ_n} be a δ-function sequence for which
α_n = ∫ δ_n²(u) du < ∞ for each n. Suppose EY² < ∞ and x is a point of
continuity of the functions f(u), m(u) = E[Y|X=u] and s(u) = E[Y²|X=u].
Then

(i) E m_n*(x) → m(x) f(x) ,
(ii) (n/α_n) Var[m_n*(x)] → s(x) f(x)

as n → ∞.

Proof. We will use the following two well known properties of the
regression function:

(2.3.1)    E h(Y) = ∫ E[h(Y)|X=x] f(x) dx

for any function h and random variable Y such that E|h(Y)| < ∞ ,

(2.3.2)    E[g(X)h(Y)|X=x] = g(x) E[h(Y)|X=x]

for any functions g and h such that E|g(X)h(Y)| < ∞.

Since (X_i,Y_i), i = 1,...,n are i.i.d., we have

E m_n*(x) = E Y δ_n(x−X)
          = ∫ E[Y δ_n(x−X)|X=u] f(u) du     by (2.3.1),
          = ∫ δ_n(x−u) E[Y|X=u] f(u) du     by (2.3.2),
          = ∫ δ_n(x−u) m(u) f(u) du .

Now, by assumption, m(u)f(u) is continuous at u = x. Thus (i) will
follow from Lemma 2.1.3 if we demonstrate that m(u)f(u) is an integrable
function. To verify this, note that, by Jensen's inequality,

|m(x)| = |E[Y|X=x]| ≤ E[|Y| | X=x] .

Thus

∫ |m(u)f(u)| du ≤ ∫ E[|Y| | X=u] f(u) du = E|Y| < ∞

by assumption, and (i) follows.

For (ii), note

E[m_n*(x)]² = (1/n)² E{ Σ_{i=1}^n [Y_i δ_n(x−X_i)]²
                        + ΣΣ_{i≠j} Y_i δ_n(x−X_i) Y_j δ_n(x−X_j) }
            = (1/n) ∫ s(u) f(u) δ_n²(x−u) du
              + ((n−1)/n) [ ∫ m(u) f(u) δ_n(x−u) du ]² ,

the last step following from (2.3.1) and (2.3.2), as used in the proof
of (i). Thus

Var[m_n*(x)] = (1/n) ∫ s(u) f(u) δ_n²(x−u) du
               − (1/n) [ ∫ m(u) f(u) δ_n(x−u) du ]² ,

and, since α_n → ∞ and {δ_n²/α_n} is a δ-function sequence, we have

(n/α_n) Var[m_n*(x)] → s(x) f(x) ,

as desired. □
We now use the preceding result to demonstrate the consistency of
the estimators m_n and m̃_n.
2.3.2 Theorem. Let {δ_n} be a δ-function sequence such that
α_n = ∫ δ_n²(u) du < ∞ for each n and α_n = o(n). Suppose EY² < ∞ and x
is a continuity point of f(u), m(u) and s(u), and that f(x) > 0. Then

(i) m_n(x) → m(x) in probability,
(ii) m̃_n(x) → m(x) in probability.

Proof. Since α_n/n → 0 by assumption, we have Var[m_n*(x)] → 0 by
Lemma 2.3.1. Thus

E[m_n*(x) − m(x)f(x)]² → 0

and by applying Chebychev's inequality as in Remark 2.2.2, we see that
m_n*(x) → m(x)f(x) in probability. Since, by definition,

m̃_n(x) = m_n*(x)/f(x)

and f(x) > 0, (ii) follows immediately.

For (i), write

|m_n − m| = |m_n*/f_n − m| ≤ |(m_n* − mf)/f_n| + |m(f − f_n)/f_n| ,

where we have suppressed the argument x for convenience. Now m_n* → mf
in probability by the remark above, and since f_n(x) → f(x) > 0 in
probability, we have

(m_n* − mf)/f_n → 0 in probability.

Similarly,

(f − f_n)/f_n → 0 in probability,

and (i) follows. □

We have, then, that both the estimators m_n and m̃_n are weakly consistent
estimators of m at continuity points of s, m and f. The following
corollary specializes the preceding theorem to kernel type δ-function
sequences.

2.3.3 Corollary. Suppose that {δ_n(x)} = {ε_n⁻¹ K(x/ε_n)} is a δ-function
sequence of kernel type with nε_n → ∞ as n → ∞ and ∫ K²(u) du < ∞. Assume
the other conditions of Theorem 2.3.2 are satisfied. Then the conclusions
of Theorem 2.3.2 hold.

Proof. We need only verify that α_n/n → 0. By definition,

α_n = ∫ ε_n⁻² K²(u/ε_n) du = ε_n⁻¹ ∫ K²(v) dv .

Then

α_n/n = (nε_n)⁻¹ ∫ K²(v) dv → 0

by assumption. □
We have so far been assuming that the density f of X is continuous
and positive at points where we wish to estimate the regression function.
An important case where these assumptions may not hold is when X is a
bounded random variable, i.e. when f has bounded support, and we desire an
estimate at a boundary point of the support of f. We have the following
result, which demonstrates that if m(x+) = m(x), m_n(x) is consistent
but m̃_n(x) is not.

2.3.4 Theorem. Let {δ_n} be a δ-function sequence, with δ_n even for
each n, such that α_n < ∞ for each n and α_n/n → 0. Suppose EY² < ∞ and
x is a point such that f, m and s have left and right hand limits at x
and f(x+) = f(x) > 0, f(x−) = 0. Then

(i) m̃_n(x) → m(x+)/2 in probability,
(ii) m_n(x) → m(x+) in probability.

Proof. By Lemma 2.1.7 it follows that

E m_n*(x) → m(x+) f(x)/2 .

By an argument similar to the one used in the proof of Lemma 2.3.1, it
follows that

f_n(x) → f(x)/2 in probability,
m_n*(x) → m(x+) f(x)/2 in probability.

Since m̃_n(x) = m_n*(x)/f(x), (i) follows. Since m_n(x) = m_n*(x)/f_n(x),
(ii) follows by an argument similar to the one used in Theorem 2.3.2. □
This theorem demonstrates that the estimator m̃_n displays an
"end effect" at the boundaries of the range of X which m_n does not
display. As we shall see in Chapter 4, this end effect represents a
possible disadvantage for m̃_n, depending on how m is defined at the
boundaries of its support. We now turn our attention to asymptotic
distributional properties of m_n and m̃_n.
2.4 Asymptotic Distribution of m_n.
Nadaraya (1964) and Schuster (1972) have considered the asymptotic
normality of the estimator m_n in terms of kernel type δ-function sequences.
Nadaraya states that, if Y is a bounded random variable and nε_n² → ∞,
then (nε_n)^{1/2} (m_n(x) − E m_n(x)) has an asymptotically normal distribution
with zero mean and variance s(x) ∫ K²(u) du / f(x), where

s(x) = E[Y²|X=x] .

Schuster (1972) points out that this expression for the variance is
incorrect and presents a result with the correct variance which at the same
time removes the restriction that Y be bounded and centers at m(x) instead
of E m_n(x). We state Schuster's result here for comparison with a new
result which represents, in some respects, an improvement over Schuster's.
2.4.1 Theorem. Let {ε_n⁻¹ K(x/ε_n)} be a δ-function sequence satisfying
the conditions:

(i) K(u) and uK(u) are bounded,
(ii) ∫ uK(u) du = 0, ∫ u²K(u) du < ∞ ,
(iii) nε_n² → ∞, nε_n⁵ → 0 as n → ∞.

Suppose x₁, x₂, ..., x_p are distinct points with f(x_i) > 0, i = 1,...,p.
Let w(u) = m(u)f(u) and assume f', w', s', f'', w'' exist and are bounded,
and that EY³ < ∞. Then

( (nε_n)^{1/2} (m_n(x₁) − m(x₁)), ..., (nε_n)^{1/2} (m_n(x_p) − m(x_p)) )

converges in distribution to a multivariate normal random vector with
zero mean vector and diagonal covariance matrix with i-th diagonal
element given by

σ²(x_i) ∫ K²(u) du / f(x_i) ,

where σ²(u) = s(u) − m²(u).
Schuster proves this theorem by using the Berry-Esseen theorem to
show the joint asymptotic normality of the numerator and denominator of
m_n. An application of the Mann-Wald theorem (Billingsley (1968)) then
yields the desired result. As we shall see, it is not necessary to
consider the joint distribution of the numerator and denominator of m_n in
order to establish asymptotic normality. Schuster's proof can thus be
simplified. Also, by using the Lindeberg-Feller central limit theorem,
instead of the Berry-Esseen theorem, we will be able to require the
δ-function sequence to satisfy a less restrictive condition, namely that
nε_n → ∞ instead of nε_n² → ∞.
We now present the new asymptotic normality result for m_n. The most
important difference between this theorem (when stated in terms of kernel
type δ-sequences) and Theorem 2.4.1 is that it only requires nε_n → ∞
instead of nε_n² → ∞. Also, it applies to general δ-function sequences.
There are minor differences in the other conditions which will be evident
in the statement of the theorem. We first state the main theorem,
and then prove a preliminary lemma before returning to the proof of the
theorem. We then specialize to kernel type δ-function sequences.
2.4.2 Theorem. Let {δ_n} be a sequence of δ-functions such that

γ_n = ∫ |δ_n(u)|^{2+η} du < ∞ for each n, for some η > 0 ,

α_n = ∫ δ_n²(u) du = o(n) as n → ∞

and

γ_n n^{−η/2} α_n^{−(1+η/2)} → 0 as n → ∞ .

Suppose E|Y|^{2+η} < ∞ and the distinct points x₁, x₂, ..., x_p are continuity
points of each of the functions f(x), m(x), s(x) = E[Y²|X=x] and
E[|Y|^{2+η}|X=x], and that f(x_i) > 0, i = 1,...,p. Then

(Z_n(x₁), ..., Z_n(x_p))

converges in distribution to a multivariate normal random vector with
zero mean vector and identity covariance matrix, where

Z_n(x) = (m_n(x) − g_n(x)) / [ (α_n/n) (σ²(x)/f(x)) ]^{1/2} ,

g_n(x) = E m_n*(x) / E f_n(x)

and

σ²(x) = s(x) − m²(x) . □
Since m_n = m_n*/f_n is a ratio of sums of random variables, a direct
central limit argument is not possible. However, note that

m_n − g_n = [ m_n*/f − (f_n/f) g_n ] (f/f_n)

and f/f_n → 1 in probability, so that m_n − g_n will have the same
asymptotic distribution as the term within square brackets above. The term
within square brackets is a sum of random variables, and thus standard
arguments may be used to establish its asymptotic normality. This is the
outline which the proof of Theorem 2.4.2 will follow, although the
notation will be more complicated since the proof will be in a
multivariate setting.
The following lemma establishes the asymptotic variance and covariance
structure needed in the proof of Theorem 2.4.2.
2.4.3 Lemma. Let {δ_n} be a δ-function sequence such that
α_n = ∫ δ_n²(u) du < ∞ for each n. Suppose x ≠ y are continuity points of
f, m, and s, and that f(x) > 0, f(y) > 0, and EY² < ∞. Define

R_n(z) = f(z)/f_n(z)

and

H_n(z) = (nf(z))⁻¹ Σ_{i=1}^n (Y_i − g_n(z)) δ_n(z−X_i) .

Then

(i) (n/α_n) Var[H_n(x)] → σ²(x)/f(x) as n → ∞ ,
(ii) n Cov[H_n(x), H_n(y)] → 0 as n → ∞ .

Proof. By definition,

E H_n(x) = E m_n*(x)/f(x) − g_n(x) E f_n(x)/f(x) = 0

since

g_n(x) = E m_n*(x)/E f_n(x) ,

and we thus have

Var[H_n(x)] = E H_n²(x)
            = (nf(x))⁻² E{ Σ_{i=1}^n [(Y_i − g_n(x)) δ_n(x−X_i)]² } ,

since the cross terms vanish. Hence

(n/α_n) Var[H_n(x)]
  = α_n⁻¹ (f(x))⁻² E[(Y − g_n(x)) δ_n(x−X)]²
  = α_n⁻¹ (f(x))⁻² { ∫ s(u)f(u) δ_n²(x−u) du
      − 2 g_n(x) ∫ m(u)f(u) δ_n²(x−u) du
      + g_n²(x) ∫ f(u) δ_n²(x−u) du }
  → (f(x))⁻² { s(x)f(x) − m²(x)f(x) }
  = σ²(x)/f(x) ,

since {δ_n²(u)/α_n} is a δ-function sequence by Lemma 2.1.4 and g_n(x) → m(x).

For (ii), note

n Cov[H_n(x), H_n(y)]
  = n E[H_n(x) H_n(y)]
  = (nf(x)f(y))⁻¹ E ΣΣ_{i,j=1}^n [(Y_i − g_n(x)) δ_n(x−X_i)] [(Y_j − g_n(y)) δ_n(y−X_j)]
  = (nf(x)f(y))⁻¹ Σ_{i=1}^n E[(Y_i − g_n(x)) δ_n(x−X_i) (Y_i − g_n(y)) δ_n(y−X_i)]
  = (f(x)f(y))⁻¹ ∫ δ_n(x−u) δ_n(y−u) q_n(u) du ,

where

q_n(u) = f(u) [ s(u) − m(u)(g_n(x) + g_n(y)) + g_n(x) g_n(y) ]

is continuous at u = x and u = y by assumption. Thus

n Cov[H_n(x), H_n(y)] → 0

by Lemma 2.1.6, and (ii) follows. □
We now return to the proof of Theorem 2.4.2. By the Cramér-Wold
device (e.g., Billingsley (1968)), it suffices to show

Σ_{k=1}^p t_k Z_n(x_k) → N(0, Σ_{k=1}^p t_k²) in distribution

for arbitrary real t₁,...,t_p, or, equivalently,

Σ_{k=1}^p t_k R_n(x_k) H_n(x_k) [ (α_n/n)(σ²(x_k)/f(x_k)) ]^{−1/2}
  → N(0, Σ_{k=1}^p t_k²) in distribution,

where R_n and H_n are as defined in Lemma 2.4.3. Since f_n(x_k) → f(x_k)
in probability, k = 1,...,p, it follows that R_n(x_k) → 1 in probability,
k = 1,...,p, and it thus suffices to show the same limit with each R_n(x_k)
replaced by 1. Now γ_n < ∞ for each n implies

α_n = ∫ δ_n²(u) du < ∞

for each n, since ∫_{|u|<A} δ_n²(u) du < ∞ for any finite A > 0 by Hölder's
inequality, and by C1 and C3 of Definition 2.1.1. Thus we have, by
Lemma 2.4.3,

Var[ Σ_{k=1}^p t_k H_n(x_k) ] = Σ_{k=1}^p t_k² Var[H_n(x_k)]
                                + ΣΣ_{k≠l} t_k t_l Cov[H_n(x_k), H_n(x_l)]
                              ~ (α_n/n) Σ_{k=1}^p t_k² σ²(x_k)/f(x_k)

as n → ∞. Hence it suffices to prove

V_n = Σ_{k=1}^p t_k H_n(x_k) / s_n → N(0,1) in distribution,

where s_n = { Var[ Σ_{k=1}^p t_k H_n(x_k) ] }^{1/2}. Since, by definition,

H_n(x) = (nf(x))⁻¹ Σ_{i=1}^n (Y_i − g_n(x)) δ_n(x−X_i) ,

we may write

V_n = Σ_{i=1}^n V_{n,i} ,

where the i.i.d. random variables V_{n,i}, i = 1,...,n, are defined by

V_{n,i} = (n s_n)⁻¹ Σ_{k=1}^p t_k (f(x_k))⁻¹ (Y_i − g_n(x_k)) δ_n(x_k − X_i) .

It then follows from the Lindeberg-Feller central limit theorem that if,
for some η > 0,

n E|V_{n,1}|^{2+η} → 0 ,

then

V_n → N(0,1) in distribution as n → ∞.

Now by applying the c_r inequality of Loève (1963) repeatedly we have

n E|V_{n,1}|^{2+η} ≤ Σ_{k=1}^p c_k(η) [ I_{k,n} + J_{k,n} ] ,

where c_k(η) depends only on k, η and the constants t₁,...,t_p, and
I_{k,n} and J_{k,n} are the contributions of the terms Y_i δ_n(x_k−X_i)
and g_n(x_k) δ_n(x_k−X_i), respectively. It is easily seen that

I_{k,n} ≤ C n^{−1−η} s_n^{−(2+η)} E|Y δ_n(x_k−X)|^{2+η} ,

with s_n² ~ (α_n/n) Σ_{k=1}^p t_k² σ²(x_k)/f(x_k), the last step following
by earlier calculations. Further,

E|Y δ_n(x_k−X)|^{2+η} = γ_n ∫ δ_n**(x_k−u) E[|Y|^{2+η} | X=u] f(u) du ,

where {δ_n**} = {|δ_n|^{2+η}/γ_n} is a δ-function sequence by Lemma 2.1.4.
Thus, for k = 1,2,...,p, we have

I_{k,n} = O( γ_n n^{−η/2} α_n^{−(1+η/2)} ) → 0 as n → ∞ ,

since

γ_n n^{−η/2} α_n^{−(1+η/2)} → 0

by assumption. Similar calculations yield

J_{k,n} → 0 as n → ∞ ,

and the proof is complete. □
We now give a version of Theorem 2.4.2 for kernel-type δ-function sequences, which may be compared with Schuster's Theorem 2.4.1.

2.4.4 Theorem. Suppose {δ_n(x)} is a δ-function sequence of kernel type satisfying

    (i) ∫ |K(u)|^{2+η} du < ∞ for some η > 0,

    (ii) ∫ u K(u) du = 0, ∫ u² K(u) du < ∞,

    (iii) n ε_n → ∞, n ε_n⁵ → 0 as n → ∞.

Suppose m(x) and f(x) have bounded, continuous 1st and 2nd derivatives, E|Y|^{2+η} < ∞, the distinct points x₁, x₂,...,x_p are continuity points of s(x) and E[|Y|^{2+η}|X=x], and f(x_k) > 0, k = 1,...,p. Then (Z_n*(x₁),...,Z_n*(x_p)) converges in distribution to a multivariate normal random vector with zero mean vector and identity covariance matrix, where

    Z_n*(x) = (n ε_n)^{1/2} (m̂_n(x) - m(x)) / {σ²(x) ∫ K²(u) du / f(x)}^{1/2}.
Proof. We first verify that this δ-function sequence satisfies the conditions of Theorem 2.4.2. Now

    α_n = ε_n^{-1} ∫ K²(u) du < ∞

for each n since ε_n ≠ 0 and ∫ K²(u) du < ∞. Further,

    α_n/n = (n ε_n)^{-1} ∫ K²(u) du → 0,

since n ε_n → ∞ by assumption. Similarly, γ_n < ∞ for each n, and

    γ_n n^{-η/2} α_n^{-1-η/2} = [∫ |K(u)|^{2+η} du][∫ K²(u) du]^{-1-η/2} (n ε_n)^{-η/2} → 0 as n → ∞,

since n ε_n → ∞ by assumption. Thus this type of δ-function sequence satisfies the requirements of Theorem 2.4.2, and since the remaining regularity conditions of Theorem 2.4.2 are clearly satisfied under the present assumptions, we have that the conclusion holds when m(x) is replaced by g_n(x) = E m_n*(x)/E f_n(x) in the expression for Z_n*(x). Hence, if we show that

    (n ε_n)^{1/2} (g_n(x) - m(x)) → 0,

then the conclusion will follow. Now

    (n ε_n)^{1/2} (g_n(x) - m(x))
    = (n ε_n)^{1/2} { [ε_n^{-1} ∫ K((x-u)/ε_n) m(u) f(u) du] / [ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] - m(x) }
    = (n ε_n)^{1/2} { [ε_n^{-1} ∫ K((x-u)/ε_n) m(u) f(u) du - m(x)f(x)
        + m(x)f(x) - m(x) ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] / [ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] }.

By Lemma 2.1.5, the numerator of the term within the brackets above is O(ε_n²) and the denominator converges to f(x) > 0. Thus

    (n ε_n)^{1/2} (g_n(x) - m(x)) = O((n ε_n⁵)^{1/2}) → 0,

since n ε_n⁵ → 0.  □
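The O(ε_n²) bound of Lemma 2.1.5 used above can be illustrated numerically: halving ε should divide the bias g_n(x) - m(x) by roughly four. The sketch below (an illustration under an assumed design, m(u) = u² with a standard normal f and the Epanechnikov kernel; it is not part of the thesis) checks this.

```python
import math

def K(v):
    # Epanechnikov kernel on [-1, 1]
    return 0.75*(1.0 - v*v) if abs(v) <= 1.0 else 0.0

def f(u):
    # standard normal design density (an assumption for this illustration)
    return math.exp(-0.5*u*u)/math.sqrt(2.0*math.pi)

m = lambda u: u*u

def g_eps(x, eps, steps=4000):
    # g(x) = [∫K((x-u)/ε)m(u)f(u)du] / [∫K((x-u)/ε)f(u)du], via u = x - εv
    h = 2.0/steps
    num = den = 0.0
    for j in range(steps):
        v = -1.0 + (j + 0.5)*h
        w = K(v)*h
        num += w*m(x - eps*v)*f(x - eps*v)
        den += w*f(x - eps*v)
    return num/den

x = 0.3
err1 = abs(g_eps(x, 0.05) - m(x))
err2 = abs(g_eps(x, 0.025) - m(x))
ratio = err1/err2   # close to 4, consistent with an O(ε²) bias
```

Under the stated smoothness, the ratio stabilizes near 4 as ε shrinks, which is exactly the second-order behavior the proof exploits.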
2.5 Asymptotic Distribution of m_n.

It is evident that, since m_n = m_n*/f is a sum of independent random variables, we may apply the Lindeberg-Feller central limit theorem in much the same way as we did in Theorem 2.4.2 to establish the asymptotic normality of (m_n(x) - E m_n(x))/{Var[m_n(x)]}^{1/2}. We established in (ii) of Lemma 2.3.1 that

    Var[m_n*(x)] ≈ (α_n/n) s(x) f(x)

for appropriate points x. Hence we have

    Var[m_n(x)] ≈ (α_n/n) s(x)/f(x).

We therefore have the following theorems, which we state without proof, since the proofs follow those of Theorems 2.4.2 and 2.4.4 very closely. The first theorem concerns the asymptotic normality of m_n for general δ-function sequences; the second for kernel-type δ-sequences.

2.5.1 Theorem. Under the conditions of Theorem 2.4.2, (W_n(x₁),...,W_n(x_p)) converges in distribution to a multivariate normal random vector with zero mean vector and identity covariance matrix, where

    W_n(x) = (m_n(x) - E m_n(x)) / {(α_n/n) s(x)/f(x)}^{1/2}.

2.5.2 Theorem. Under the conditions of Theorem 2.4.4, (W_n*(x₁),...,W_n*(x_p)) converges in distribution to a multivariate normal random vector with zero mean and identity covariance matrix, where

    W_n*(x) = (n ε_n)^{1/2} (m_n(x) - m(x)) / {s(x) ∫ K²(u) du / f(x)}^{1/2}.
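Theorem 2.5.2 lends itself to a quick simulation check. The sketch below (an illustration under an assumed design — uniform X on [0,1] with known density f ≡ 1, m(x) = x², Gaussian noise, Epanechnikov kernel — and not part of the thesis) repeatedly forms W_n*(x) at a fixed interior point and confirms that its sample mean and variance are near 0 and 1.

```python
import math, random

def K(u):
    # Epanechnikov kernel
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

LAM = 0.6          # λ(K) = ∫K²(u)du for this kernel
SIGMA = 0.5        # assumed noise standard deviation
m = lambda x: x*x  # assumed regression function

def W_star(x, n, eps, rng):
    # m_n(x) = m_n*(x)/f(x) with the known design density f ≡ 1 on [0,1]
    s = 0.0
    for _ in range(n):
        xi = rng.random()
        yi = m(xi) + rng.gauss(0.0, SIGMA)
        s += yi * K((x - xi)/eps)
    mn = s / (n*eps)                 # the estimator m_n(x)
    sx = m(x)**2 + SIGMA**2          # s(x) = E[Y²|X=x]
    scale = math.sqrt(sx * LAM)      # {s(x)∫K²(u)du/f(x)}^{1/2}
    return math.sqrt(n*eps) * (mn - m(x)) / scale

rng = random.Random(7)
n, eps = 400, 400**(-0.3)
draws = [W_star(0.5, n, eps, rng) for _ in range(300)]
mean = sum(draws)/len(draws)
var = sum((w - mean)**2 for w in draws)/len(draws)
# Theorem 2.5.2 predicts mean ≈ 0 and variance ≈ 1
```

The agreement is only approximate at finite n (there is a small O(ε_n²) bias term), which is why loose tolerances are appropriate.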
2.6 Mean Integrated Square Error.

The mean integrated square error (MISE) I_n of an estimator

    f_n(x) = n^{-1} Σ_{i=1}^n δ_n(x - X_i)

of a density f is defined as

    I_n = E ∫ (f_n(x) - f(x))² dx,

where δ_n and f are assumed to be square integrable. Watson and Leadbetter (1963) show that I_n is minimized for each n if δ_n is chosen to have a Fourier transform φ_{δ_n} expressible as

    φ_{δ_n}(t) = |φ_f(t)|² / [(1/n) + ((n-1)/n)|φ_f(t)|²],

where φ_f is the Fourier transform of f. (Fourier transforms of square integrable functions have the usual interpretation here.) For the regression estimation problem, Watson (1964) considers an analogous error criterion J_n*, where appropriate assumptions are made on δ_n and m to insure the finiteness of the integral involved. Watson states that J_n* is minimized for each n if δ_n is chosen so as to have Fourier transform

    φ_{δ_n}*(t) = |φ_{fm}(t)|² / [(1/n)EY² + ((n-1)/n)|φ_{fm}(t)|²],

where φ_{fm} is the Fourier transform of fm.
We assume here that δ_n and fm are square integrable functions and that EY² < ∞. We define the error criterion I_n by

    I_n = E ∫ (m_n*(x) - f(x)m(x))² dx,

where m_n* is the numerator of m_n. We will show here that I_n is also minimized for each n by choosing δ_n to have the Fourier transform φ_{δ_n}*, defined above. Note that I_n may be interpreted as the MISE of the numerator of m_n or m̂_n, disregarding the denominator.

By the definition of I_n and Parseval's formula, we have

(2.6.1)    I_n = E ∫ (m_n*(x) - f(x)m(x))² dx = (2π)^{-1} E ∫ |φ_{m_n*}(t) - φ_{fm}(t)|² dt,

where φ_{m_n*} is the Fourier transform of m_n*, so that I_n may be minimized by minimizing the extreme right hand side of expression (2.6.1) above. Now, by Fubini's theorem for positive functions,

    E ∫ |φ_{m_n*}(t) - φ_{fm}(t)|² dt = ∫ E|φ_{m_n*}(t) - φ_{fm}(t)|² dt,

so that I_n may be minimized by minimizing

(2.6.2)    E|φ_{m_n*}(t) - φ_{fm}(t)|²
for each t, where ḡ denotes the conjugate of the complex function g. Note that since fm is an integrable function,

(2.6.3)    φ_{fm}(t) = ∫ e^{itu} f(u)m(u) du.

Further,

(2.6.4)    φ_{m_n*}(t) = ∫ [n^{-1} Σ_{j=1}^n Y_j δ_n(u - X_j)] e^{itu} du
           = n^{-1} Σ_{j=1}^n Y_j e^{itX_j} ∫ δ_n(u) e^{itu} du
           = φ_{δ_n}(t) [n^{-1} Σ_{j=1}^n Y_j e^{itX_j}],

so that

(2.6.5)    E|φ_{m_n*}(t)|² = |φ_{δ_n}(t)|² E|n^{-1} Σ_{j=1}^n Y_j e^{itX_j}|²
           = |φ_{δ_n}(t)|² n^{-2} E Σ_{j,k=1}^n Y_j Y_k e^{itX_j} e^{-itX_k}.

Now

    E Y e^{itX} = ∫ m(u) f(u) e^{itu} du = φ_{fm}(t),

and thus (2.6.5) becomes

(2.6.5)'    E|φ_{m_n*}(t)|² = |φ_{δ_n}(t)|² [(1/n)EY² + ((n-1)/n)|φ_{fm}(t)|²].
n ..Finally, fran ~.6.3)and ~.6.41,we have
(2.6.6) E[~fm(t)~ #ret) + ~fm(tH *(t)]:mn mn
n -itX.= ~fm(t)~o (t)E[n- l r Y.e J]
n j=l J
1 n i.tX.+ ~fm(t)~o (t)E[n- r Y.e J]
n j=l J
= l~fm(t)12[~o (t) + ~o (t)]n n
Combining (2.6.3), (2.6.5) and (2.6.6) yields
(2.6.7)
= l~fm(t)12 - 2 Re[~o (t)]I~fm(t)12n
+ I~o (t)1 2[(1/n)EY2+ ((n-l)/n)t~fm(t)12]
n-
2 2= [(l/n)EY + ((n-l)/n)l~fm(t)1 ] •
l~fm(t)12 12
~on (t) - (1/n)Ey2 + ((n-l)/n)l~fm(t)12 I
"
.e
..
4i
the last equality following by completing the square and rearranging
tenns. Now
s J Im(u)lf(u)du
s J E[IYI]X=u]f(u)du = EIYI ,
so that
for all t, and hence (2.6.7) is minimized for each t by choosing'If
<Po = <Po .n n
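The completing-the-square step in (2.6.7) can be spot-checked numerically. The following sketch (an illustration, not part of the thesis) treats the criterion at a single fixed t as a function of the complex number φ_{δ_n}(t) and verifies that the claimed minimizer φ_{δ_n}*(t) beats nearby perturbations.

```python
def mise_term(phi_d, phi_fm, n, EY2):
    # E|φ_{m_n*}(t) - φ_{fm}(t)|² as a function of φ_{δ_n}(t), per (2.6.7)
    a = EY2/n + (n - 1)/n * abs(phi_fm)**2
    return abs(phi_fm)**2 - 2.0*phi_d.real*abs(phi_fm)**2 + abs(phi_d)**2 * a

def phi_star(phi_fm, n, EY2):
    # the claimed minimizer φ*_{δ_n}(t); note it is real-valued
    return abs(phi_fm)**2 / (EY2/n + (n - 1)/n * abs(phi_fm)**2)

phi_fm = 0.4 - 0.3j   # an arbitrary (hypothetical) value of φ_{fm}(t)
n, EY2 = 50, 2.0
best = phi_star(phi_fm, n, EY2)
base = mise_term(best, phi_fm, n, EY2)
worse = [mise_term(best + dz, phi_fm, n, EY2)
         for dz in (0.1, -0.1, 0.1j, -0.1j, 0.05 + 0.05j)]
# every perturbation strictly increases the criterion
```

Because the criterion is a positive-definite quadratic in φ_{δ_n}(t), any perturbation increases it by exactly a·|dz|², so the check is deterministic.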
3. ASYMPTOTIC PROPERTIES OF MAXIMUM ABSOLUTE DEVIATION
3.1 Preliminaries
Since our goal in many cases is the estimation of the regression function over the entire real line, or some subset of the real line, it is natural to investigate the behavior of our estimators under some global error criterion. An attempt in this direction was made in Section 2.6, where we considered mean integrated square error. This was not entirely satisfactory, however, since we were only able to determine the δ-function sequence which minimized the MISE of the numerator of the estimators in question, disregarding the denominator. In this chapter, we consider a different global error criterion, the maximum absolute deviation, defined as sup_{x∈I} |m_n(x) - m(x)|, where I is a closed, bounded interval of the real line, which we will take without loss of generality to be [0,1]. We shall mainly be concerned here with conditions under which the maximum absolute deviation converges to zero in probability (in this case we say that the estimator in question is uniformly consistent over I). We will also be able to find a large sample confidence band for the regression function, based on the estimator m_n.

Our method of analysis will follow the one briefly outlined below, used by Bickel and Rosenblatt (1973) and Rosenblatt (1976) for probability density estimators. For a density function estimator

    f_n(u) = (n ε_n)^{-1} Σ_{i=1}^n K((u - X_i)/ε_n),

the deviation about the mean, f_n(u) - E f_n(u), normalized so as to have
43
non-zero asymptotic standard deviation, may be written as
(3.1.1)k
(nEn) 2(fn(u) - Bfn(u))
[f(u)]li
where Zn is the empiriaaZ proaess defined by
where Fn is the empirical distribution function (EDF) of Xi'
i = l, ... ,n, and F is the cumulative distributioIl function of Xl'
Kornlas, Major and Tusnady (1975) have shown that a sequence of Brownian
bridges {Bn} on [0,1] may be constructed such that
(3.1.2) sup IZ (u) - Bn(F(u)) I = 0 (n-~log n)-oo<u<oo n
a.s. This fact is exploited, using integration by parts in (3.1.l), to
show that
k(log n) 2 sup IY (u)1
. o~wa n
= (log n)~ sup IYl (u)1 + opel)O~~l ,n
where Yl,n is the stochastic process obtained by replacing Zn(s) with
Bn(F(s)) in the defining expression for Yn. Further stages of approx
imation finally yield
(3.1.3)    (log n)^{1/2} sup_{0≤u≤1} |Y_{1,n}(u)| = (log n)^{1/2} sup_{0≤u≤1} |Y_{2,n}(u)| + o_p(1),

where Y_{2,n} is the Gaussian process on [0,1] defined by

    Y_{2,n}(u) = ε_n^{-1/2} ∫ K((u-s)/ε_n) dW(s),

where W is a Wiener process on ℝ. The asymptotic distribution of (log n)^{1/2} sup_{0≤u≤1} |Y_{2,n}(u)|, with proper centering constants, is determined, and, in light of (3.1.3), (log n)^{1/2} sup_{0≤u≤1} |Y_n(u)| has the same asymptotic distribution.

We will employ this method to determine the asymptotic distribution of the maximum absolute deviation of the numerator m_n* of the estimators m̂_n and m_n, properly normalized and centered. Algebraic manipulation and elementary analysis will then yield uniform consistency of the estimators, with an associated rate of convergence. Since the denominator of m_n is non-stochastic, an asymptotic confidence band for m, based on m_n, may also be specified.

In the forthcoming development, we will need to use integrals of the form

(3.1.4)    Y_n(t) = ∫∫ y K((t-x)/ε_n) dW(T(x,y)),

where T: ℝ² → [0,1]² is the transformation defined by

    T(x,y) = (F_{X|Y}(x|y), F_Y(y)),

and W(·,·) is the Wiener process on [0,1]². In this section, we will give conditions for the existence of (3.1.4) and prove some useful properties.
If H(s,t) is a real, measurable function on [0,1]², then it is well-known that the L₂ integral

    ∫∫ H(s,t) dW(s,t)

exists if

    ∫∫ H²(s,t) ds dt < ∞

(see Masani (1968), Chap. 5).

Suppose that f(x,y) > 0 for all real x and y, so that T is one-to-one and hence T⁻¹ is a well-defined function from [0,1]² to ℝ². Denote, for fixed n and t,

    G_t(x,y) = y K((t-x)/ε_n).

Then, by Theorem 5.19 of Masani (1968), we have

(3.1.5)    ∫∫_{ℝ²} y K((t-x)/ε_n) dW(T(x,y)) = ∫∫_{[0,1]²} G_t(T⁻¹(s,u)) dW(s,u)

in the sense that if either integral exists, then so does the other and they are equal. By the previous remark, the integral on the right hand side of (3.1.5) exists if

    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du < ∞.

Now

(3.1.6)    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du = ∫∫_{ℝ²} G_t²(x,y) |J(x,y)| dx dy,

where J(x,y) is the Jacobian of T (see, e.g., Buck (1965), Sec. 6.1, Thm. 4), if |J(x,y)| > 0 for all real x and y and G_t(x,y), f(x,y) and f_Y(y) are continuous. By definition,

    J(x,y) = f_{X|Y}(x|y) f_Y(y) = f(x,y) > 0

by assumption, using the obvious notation for conditional and marginal densities. Thus

    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du = ∫∫_{ℝ²} y² K²((t-x)/ε_n) f(x,y) dx dy
    = E Y² K²((t-X)/ε_n) < ∞

if, e.g., EY² < ∞ and K is bounded. We note that the above development holds if, instead of having f(x,y) > 0 for all real x and y, we have f(x,y) > 0 for x and y in some rectangle of ℝ², and the range of integration is appropriately adjusted. We will henceforth assume this to be true without comment.
We will now give properties of the integral (3.1.4) which will be useful in the future development. We will show

(3.1.7)    E Y_n(t) = 0,

(3.1.8)    E Y_n(t₁) Y_n(t₂) = ∫∫ y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy

for t₁ ≠ t₂. In view of (3.1.5) and the definition of the stochastic integral, (3.1.7) follows. For (3.1.8), we note, by (3.1.5) and (5.2) of Masani (1968),

    E Y_n(t₁) Y_n(t₂) = ∫∫ G_{t₁}(T⁻¹(s,u)) G_{t₂}(T⁻¹(s,u)) ds du
    = ∫∫ y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy,

as in (3.1.8).

We finally note, to close this section, that since W is a Gaussian process on [0,1]² and since Y_n(t) is an L₂ limit of linear combinations of W(·,·), we have that Y_n(t) is itself a (one-parameter) Gaussian stochastic process for each n, with mean given by (3.1.7) and covariance function given by (3.1.8).
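The isometry behind (3.1.7) and (3.1.8) is easy to illustrate in one dimension. The sketch below (an illustration, not part of the thesis; the Epanechnikov kernel and all numerical choices are assumptions) discretizes the L₂ integral ε^{-1/2} ∫ K((t-x)/ε) dW(x) appearing in Theorem 3.2.2 and checks by Monte Carlo that its mean is 0 and its variance is λ(K) = ∫K²(u)du.

```python
import math, random

def K(u):
    # Epanechnikov kernel; λ(K) = ∫K²(u)du = 0.6
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

def Z(t, eps, lo, hi, steps, rng):
    # Discretized Itô sum for ε^{-1/2} ∫ K((t-x)/ε) dW(x):
    # independent increments dW_j ~ N(0, dx) on a grid covering the kernel support
    dx = (hi - lo)/steps
    sd = math.sqrt(dx)
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5)*dx
        total += K((t - x)/eps) * rng.gauss(0.0, sd)
    return total / math.sqrt(eps)

rng = random.Random(1)
eps = 0.05
vals = [Z(0.5, eps, 0.4, 0.6, 200, rng) for _ in range(3000)]
mean = sum(vals)/len(vals)
var = sum((v - mean)**2 for v in vals)/len(vals)
# by the Wiener-integral isometry, mean ≈ 0 and var ≈ λ(K) = 0.6
```

Restricting the grid to [0.4, 0.6] is harmless here because K((0.5-x)/0.05) vanishes outside that interval.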
3.2 Maximum Absolute Deviation of m_n*.

It is convenient to introduce certain assumptions at this point which will be in force in our main theorem. Let f(x,y) denote the joint density of (X,Y), f_Y(y) the marginal density of Y, and let {a_n} be a real sequence with a_n → ∞ as n → ∞. We make the following assumptions:

(A1)    (log n) ε_n^{-3} ∫_{|y|≥a_n} y² f_Y(y) dy ≤ c for all n and some constant c,

(A2)    a_n ε_n^{-1/2} n^{-1/6} (log n)² → 0 as n → ∞,

(A3)    (log n) sup_{0≤x≤1} ∫_{|y|>a_n} y² f(x,y) dy → 0 as n → ∞.

(A4)    There exists a constant η > 0 such that

    g_n(x) = ∫_{-a_n}^{a_n} y² f(x,y) dy

satisfies g_n(x) > η for all x ∈ [0,1] and all n, and g_n has a continuous 1st derivative on some interval [-A,A]. Further, the functions

    s(x)f(x) = ∫ y² f(x,y) dy,    E[|Y| | X=x] f(x) = ∫ |y| f(x,y) dy

are uniformly bounded.

If Y is a bounded random variable, then clearly any sequence {a_n} with a_n → ∞ satisfies assumptions A1 and A3. If the marginal distribution of Y is normal and ε_n = n^{-δ} as in Theorem 3.2.1, then it is readily checked that {a_n} = {log n} satisfies A1 and A2.
We normalize m_n*(t) - E m_n*(t) by (n ε_n)^{-1/2}[s(t)f(t)]^{1/2}, which is proportional to its asymptotic standard deviation, thus defining the following stochastic process on [0,1]:

(3.2.1)    Y_n(t) = (n ε_n)^{1/2} [m_n*(t) - E m_n*(t)] / [s(t)f(t)]^{1/2}.
Then we have the following theorem.

3.2.1 Theorem. Suppose the kernel function K vanishes outside a finite interval [-A,A], is absolutely continuous and has a bounded derivative on [-A,A], and that the marginal density of X is positive on an interval containing [0,1]. Suppose A1-A4 hold. Then, for ε_n = n^{-δ}, 0 < δ < 1/2,

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Y_n(t)| / [λ(K)]^{1/2} - d_n ] < x } → e^{-2e^{-x}}

as n → ∞, where

    λ(K) = ∫ K²(u) du,

    d_n = (2δ log n)^{1/2} + (2δ log n)^{-1/2} { log(c₁(K)/π^{1/2}) + (1/2)[log δ + log log n] }

if

    c₁(K) = [K²(A) + K²(-A)] / [2λ(K)] > 0,

and otherwise

    d_n = (2δ log n)^{1/2} + (2δ log n)^{-1/2} log( [c₂(K)]^{1/2} / (2^{1/2} π) ),

where

    c₂(K) = ∫ [K'(u)]² du / [2λ(K)].
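The kernel functionals entering Theorem 3.2.1 are simple to compute. As an illustration (using the reconstructed definitions of λ(K), c₁(K) and c₂(K) above, and the Epanechnikov kernel as an assumed example — it vanishes at ±A, so the c₂ branch of d_n applies), a sketch:

```python
def K(u):
    # Epanechnikov kernel on [-1, 1]
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

def Kprime(u):
    return -1.5*u if abs(u) <= 1.0 else 0.0

def integrate(g, lo, hi, steps=20000):
    # midpoint rule; ample accuracy for these smooth integrands
    h = (hi - lo)/steps
    return sum(g(lo + (j + 0.5)*h) for j in range(steps)) * h

lam = integrate(lambda u: K(u)**2, -1.0, 1.0)             # λ(K) = 3/5
c1 = (K(1.0)**2 + K(-1.0)**2) / (2.0*lam)                 # 0: kernel vanishes at ±A
c2 = integrate(lambda u: Kprime(u)**2, -1.0, 1.0) / (2.0*lam)   # 1.5/1.2 = 1.25
```

For this kernel λ(K) = 0.6 and c₂(K) = 1.25 in closed form, so the numerical quadrature can be checked exactly.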
The proof of Theorem 3.2.1 is based on Theorems 3.2.2 and 3.2.3, which follow. Theorem 3.2.2 is due to Bickel and Rosenblatt (1973), who used it in proving a result similar to Theorem 3.2.1 for probability density estimators. We will here denote by ∫ K(t) dW(t) the L₂ integral of K with respect to the Wiener process W (see, e.g., Doob (1953), Chap. IX, Sec. 2).

3.2.2 Theorem. Suppose K(·) is a kernel function which vanishes outside [-A,A] and is absolutely continuous on [-A,A]. Define on [0,1] the stochastic process

    Z_n(t) = ε_n^{-1/2} ∫ K((t-x)/ε_n) dW(x),

where

    ε_n = n^{-δ}

with 0 < δ < 1/2 and W(x) is a Wiener process on (-∞,∞). Then

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Z_n(t)| / [λ(K)]^{1/2} - d_n ] < x } → e^{-2e^{-x}}

as n → ∞, where d_n and λ(K) are as in Theorem 3.2.1.
Theorem 3.2.3 is a special case of Theorem E of Révész (1976).

3.2.3 Theorem. Let X₁ and X₂ be independent random variables, each uniformly distributed over [0,1]. Define the empirical process of (X₁,X₂) on [0,1]²,

    Z_n(x₁,x₂) = n^{1/2}(F_n(x₁,x₂) - x₁x₂),

where F_n denotes the empirical distribution function of (X₁,X₂), based on n independent observations of the pair. Then one can define a sequence {B_n} of independent Brownian bridges on [0,1]² such that

(3.2.2)    sup_{0≤x₁,x₂≤1} |Z_n(x₁,x₂) - B_n(x₁,x₂)| = O(n^{-1/6}(log n)^{3/2}) a.s.
Proof of Theorem 3.2.1: For convenience, denote sup_{0≤t≤1} |g(t)| by ||g||, and note that, for any pair of sequences of processes {Z_{1,n}(t)} and {Z_{2,n}(t)} defined on [0,1],

    | ||Z_{1,n}|| - ||Z_{2,n}|| | ≤ ||Z_{1,n} - Z_{2,n}||.

Thus, if we show that

    (log n)^{1/2} ||Z_{1,n} - Z_{2,n}|| = o_p(1)

and

    (log n)^{1/2} [ ||Z_{2,n}|| - d_n ]

converges in law, then

    (log n)^{1/2} [ ||Z_{1,n}|| - d_n ]

also converges in law, and has the same limiting distribution.

We will apply the preceding remark to eventually "approximate" the process Y_n with the process Z_n of Theorem 3.2.2, thus obtaining the desired result. We will proceed through several stages of such approximation, and the details will be given in the sequence of lemmas which immediately follows the proof of this theorem.
We first note that Y_n may be written as

(3.2.3)    Y_n(t) = [s(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫ y K((t-x)/ε_n) dZ_n(x,y),

where Z_n is the (two-parameter) empirical process defined by

    Z_n(x,y) = n^{1/2}(F_n(x,y) - F(x,y)).

Now define the following processes on [0,1]:

(3.2.4)    Y_{0,n}(t) = [s(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y),

(3.2.5)    Y_{1,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y),

where

    s_n(t)f(t) = g_n(t) = ∫_{-a_n}^{a_n} y² f(t,y) dy,

(3.2.6)    Y_{2,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dB_n(T(x,y)),

where {B_n} is a sequence of Brownian bridges as in Theorem 3.2.3 and T: ℝ² → [0,1]² is the transformation defined by

(3.2.7)    T(x,y) = (F_{X|Y}(x|y), F_Y(y)),

(3.2.8)    Y_{3,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dW_n(T(x,y)),

where {W_n} is a sequence of independent Wiener processes used in constructing {B_n} as

    B_n(u,s) = W_n(u,s) - u s W_n(1,1),  0 ≤ u, s ≤ 1

(Révész (1976)),

(3.2.9)    Y_{4,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫ [s_n(x)f(x)]^{1/2} K((t-x)/ε_n) dW(x),

(3.2.10)   Y_{5,n}(t) = ε_n^{-1/2} ∫ K((t-x)/ε_n) dW(x),

where W is a Wiener process on (-∞,∞).

We have, by Lemma 3.2.4,

    ||Y_n - Y_{0,n}|| = o_p((log n)^{-1/2}),

where o_p(a_n) refers to a sequence of random variables A_n such that A_n/a_n → 0 in probability. Lemma 3.2.8 gives

    ||Y_{0,n} - Y_{1,n}|| = o_p((log n)^{-1/2}).

By Lemma 3.2.5,

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6} (log n)^{3/2}) a.s.,

and by A2,

    a_n ε_n^{-1/2} n^{-1/6} (log n)² → 0,

so that

    ||Y_{1,n} - Y_{2,n}|| = o_p((log n)^{-1/2}).

By Lemma 3.2.6,

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}) = o_p((log n)^{-1/2}),

since ε_n = n^{-δ} and hence ε_n log n → 0.
Now Y_{3,n} is a zero mean Gaussian process on [0,1] with covariance function

(3.2.11)    E Y_{3,n}(t₁) Y_{3,n}(t₂)
            = [g_n(t₁) g_n(t₂)]^{-1/2} ε_n^{-1} ∫∫_{|y|≤a_n} y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy

(cf. Section 3.1). The integral on the right hand side of (3.2.11) may be written as

    ε_n^{-1} ∫ s_n(x) f(x) K((t₁-x)/ε_n) K((t₂-x)/ε_n) dx.

Thus the process Y_{4,n} is a Gaussian process with the same covariance function as Y_{3,n}, i.e., they have the same finite dimensional distributions. Hence the asymptotic distribution of

    (2δ log n)^{1/2} [ ||Y_{3,n}|| / [λ(K)]^{1/2} - d_n ]

is the same as that of

    (2δ log n)^{1/2} [ ||Y_{4,n}|| / [λ(K)]^{1/2} - d_n ].

Further, by Lemma 3.2.7,

    ||Y_{4,n} - Y_{5,n}|| = o_p((log n)^{-1/2}).

By Theorem 3.2.2,

    (2δ log n)^{1/2} [ ||Y_{5,n}|| / [λ(K)]^{1/2} - d_n ]

has the desired limit distribution, and the theorem is proved.  □
3.2.4 Lemma. If A1 is satisfied and

    g(x) ≡ s(x)f(x) = ∫ y² f(x,y) dy

is bounded away from zero on [0,1], then

    ||Y_n - Y_{0,n}|| = o_p((log n)^{-1/2}).

Proof. Note

    Y_n(t) - Y_{0,n}(t) = [g(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y),

so that

    ||Y_n - Y_{0,n}|| ≤ ||g^{-1/2}|| sup_{0≤t≤1} | ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) |.

By assumption, ||g^{-1/2}|| < ∞, and thus it suffices to prove that

    (log n)^{1/2} sup_{0≤t≤1} | ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) | = o_p(1).

Now

(3.2.12)    (log n)^{1/2} ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) = Σ_{i=1}^n X_{n,i}(t) ≡ U_n(t),

say, where the X_{n,i}(t), i = 1,...,n, are i.i.d. with

    X_{n,i}(t) = (log n)^{1/2}(n ε_n)^{-1/2} [ Y_i I_{{|y|>a_n}}(Y_i) K((t-X_i)/ε_n)
                 - E Y_i I_{{|y|>a_n}}(Y_i) K((t-X_i)/ε_n) ]

and

    E X_{n,i}(t) = 0

for each t ∈ [0,1]. Thus

(3.2.13)    E U_n²(t) = n E X_{n,1}²(t),

and

(3.2.14)    E X_{n,1}²(t) ≤ (log n)(n ε_n)^{-1} E Y₁² I_{{|y|>a_n}}(Y₁) K²((t-X₁)/ε_n)
            ≤ k̄ (log n)(n ε_n)^{-1} ∫_{|y|>a_n} y² f_Y(y) dy,

where

    k̄ = sup_{-A≤u≤A} K²(u)

and A is as defined in Theorem 3.2.1.

Combining (3.2.13) and (3.2.14) yields

    E U_n²(t) ≤ k̄ (log n) ε_n^{-1} ∫_{|y|>a_n} y² f_Y(y) dy → 0

as n → ∞ by A1. This implies that

(3.2.15)    U_n(t) →p 0

for 0 ≤ t ≤ 1.

In order to show that

(3.2.16)    ||U_n|| →p 0,

we note that U_n(t) is an element of the space D[0,1] of right continuous functions with left hand limits for each n, and that, if we show that U_n converges weakly to the zero element of D[0,1], then (3.2.16) will follow, since ||·|| is a continuous functional on D[0,1]. Since (3.2.15) implies

    (U_n(t₁),...,U_n(t_k)) →p (0,...,0)

in ℝ^k for distinct points t₁, t₂,...,t_k of [0,1], it suffices to verify the following moment condition to show weak convergence of U_n (Billingsley (1968), Th. 15.6):

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ B (t₂-t₁)²,

where B is a constant and t₁ ≤ t ≤ t₂.

By the Schwarz inequality,

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ { E[U_n(t) - U_n(t₁)]² E[U_n(t₂) - U_n(t)]² }^{1/2}.

Defining

    G_n(t,t₁,x) = K((t-x)/ε_n) - K((t₁-x)/ε_n),

we have

    E[U_n(t) - U_n(t₁)]²
    = (log n)(n ε_n)^{-1} E{ Σ_{i=1}^n [Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)
        - E Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)] }²
    = (log n)(n ε_n)^{-1} Σ_{i=1}^n E[Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)
        - E Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)]²
    ≤ (log n)(n ε_n)^{-1} Σ_{i=1}^n E Y_i² I_{{|y|>a_n}}(Y_i) G_n²(t,t₁,X_i).

Since K has a bounded derivative on [-A,A], it satisfies a Lipschitz condition:

    |K(u) - K(s)| ≤ B₁ |u - s|,

where B₁ is a constant. Thus

    { E[U_n(t) - U_n(t₁)]² }^{1/2} ≤ B₁ (log n)^{1/2} ε_n^{-3/2} |t - t₁| { E Y₁² I_{{|y|>a_n}}(Y₁) }^{1/2}
    = B₁ (log n)^{1/2} ε_n^{-3/2} |t - t₁| { ∫_{|y|>a_n} y² f_Y(y) dy }^{1/2}.

Applying the same argument to

    { E[U_n(t₂) - U_n(t)]² }^{1/2}

yields

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ B₁² c (t₂-t₁)²

by A1 and using the fact that t₁ ≤ t ≤ t₂. The moment condition is therefore satisfied, and the result follows.  □
Before going on to Lemma 3.2.5, we state the useful integration by parts formula for Riemann-Stieltjes integrals on rectangles of ℝ². Let f and g be two functions defined on [0,1]². If all of the integrals below exist and are finite, then we have

(3.2.17)    ∫₀¹ ∫₀¹ f(x,y) dg(x,y) = ∫₀¹ ∫₀¹ g(x,y) df(x,y)
            + ∫₀¹ f(1,y) dg(1,y) - ∫₀¹ f(0,y) dg(0,y)
            + ∫₀¹ g(x,1) df(x,1) - ∫₀¹ g(x,0) df(x,0).

We note that if g(x,y) is a Wiener process on [0,1]² and f(x,y) is a measurable function on [0,1]² such that ∫₀¹ ∫₀¹ f(x,y) dg(x,y) exists, then (3.2.17) remains valid provided the integrals on the right hand side of (3.2.17) also exist.
3.2.5 Lemma. If K is absolutely continuous on [-A,A] and zero outside [-A,A], then

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6} (log n)^{3/2}) a.s.

Proof. First we note that the random pair

(3.2.18)    (X',Y') = T(X,Y),

where T: ℝ² → [0,1]² is defined by (3.2.7), is jointly uniformly distributed on [0,1]², X' and Y' are independent, and Z_n(T⁻¹(x',y')), 0 ≤ x', y' ≤ 1, is the empirical process of (X',Y') (Rosenblatt (1952)). Theorem 3.2.3 thus applies to (X',Y'), and we may conclude that

    sup_{0≤x',y'≤1} |B_n(x',y') - Z_n(T⁻¹(x',y'))| = O(n^{-1/6}(log n)^{3/2}) a.s.,

or, equivalently,

(3.2.19)    sup_{x,y∈ℝ} |B_n(T(x,y)) - Z_n(x,y)| = O(n^{-1/6}(log n)^{3/2}) a.s.

Applying the integration by parts formula (3.2.17), we have

(3.2.20)    ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y)
            = ∫_{u=-A}^{A} ∫_{y=-a_n}^{a_n} y K(u) dZ_n(t - ε_n u, y)
            = ∫_{-A}^{A} ∫_{-a_n}^{a_n} Z_n(t - ε_n u, y) d[y K(u)]
            + K(A) ∫_{y=-a_n}^{a_n} Z_n(t - ε_n A, y) dy
            + K(-A) ∫_{y=-a_n}^{a_n} Z_n(t + ε_n A, y) dy
            + a_n ∫_{u=-A}^{A} Z_n(t - ε_n u, a_n) dK(u)
            + a_n ∫_{u=-A}^{A} Z_n(t - ε_n u, -a_n) dK(u).

The second-to-last integral above may be rewritten, using ordinary (one-dimensional) integration by parts, and similarly for the last integral on the right hand side of (3.2.20).

By using a similar argument, we obtain (where the integrals are defined in the L₂ sense)

(3.2.21)    ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dB_n(T(x,y))
            = ∫_{u=-A}^{A} ∫_{y=-a_n}^{a_n} B_n(T(t - ε_n u, y)) d[y K(u)]
            + K(A) ∫_{y=-a_n}^{a_n} B_n(T(t - ε_n A, y)) dy
            + K(-A) ∫_{y=-a_n}^{a_n} B_n(T(t + ε_n A, y)) dy
            + a_n ∫_{u=-A}^{A} B_n(T(t - ε_n u, a_n)) dK(u)
            + a_n ∫_{u=-A}^{A} B_n(T(t - ε_n u, -a_n)) dK(u).

Subtracting (3.2.21) from (3.2.20) and using (3.2.19) and the assumption that K is absolutely continuous on [-A,A], we obtain

(3.2.22)    ε_n^{1/2} [g_n(t)]^{1/2} |Y_{1,n}(t) - Y_{2,n}(t)|
            = O(n^{-1/6}(log n)^{3/2}) · { 4a_n ∫_{-A}^{A} |K'(u)| du + 4a_n [K(A) + K(-A)] }
            = O(a_n n^{-1/6}(log n)^{3/2}) a.s.,

since

    ∫_{-A}^{A} |K'(u)| du < ∞.

Thus, since ||g_n^{-1/2}|| is a bounded sequence by assumption,

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6}(log n)^{3/2}) a.s.,

and the proof is complete.  □
We may write the sequence of Brownian bridges {B_n} of Theorem 3.2.3 as

(3.2.23)    B_n(x,y) = W_n(x,y) - x y W_n(1,1),

0 ≤ x, y ≤ 1, where {W_n} is a sequence of independent Wiener processes on [0,1]² (Révész (1976)). The next lemma shows that, for our purposes, the only significant part of (3.2.23) is W_n(x,y).

3.2.6 Lemma. If A4 holds, then

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}).

Proof. By definition of Y_{2,n} and Y_{3,n}, we have

    |Y_{2,n}(t) - Y_{3,n}(t)|
    = |W_n(1,1)| [g_n(t)]^{-1/2} ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) f(x,y) dx dy |,

since the Jacobian of the transformation T is f(x,y). Thus

    ε_n^{-1/2} ||Y_{2,n} - Y_{3,n}||
    ≤ |W_n(1,1)| ||g_n^{-1/2}|| sup_{0≤t≤1} ε_n^{-1} ∫ [∫ |y| f(x,y) dy] |K((t-x)/ε_n)| dx.

By A4,

    h(x) = ∫ |y| f(x,y) dy

is a bounded function and ||g_n^{-1/2}|| is a bounded sequence, so that for some constant M we have

    ε_n^{-1/2} ||Y_{2,n} - Y_{3,n}|| ≤ |W_n(1,1)| M ∫ |K(u)| du = O_p(1).

Thus

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}),

and the proof is complete.  □
3.2.7 Lemma. Under the assumptions of Theorem 3.2.1,

    ||Y_{4,n} - Y_{5,n}|| = o_p((log n)^{-1/2}).

Proof. By definition,

(3.2.24)    |Y_{4,n}(t) - Y_{5,n}(t)|
            = ε_n^{-1/2} | ∫ { [g_n(x)/g_n(t)]^{1/2} - 1 } K((t-x)/ε_n) dW(x) |
            = ε_n^{-1/2} | ∫_{-A}^{A} { [g_n(t - u ε_n)/g_n(t)]^{1/2} - 1 } K(u) dW(t - u ε_n) |.

By using integration by parts and the assumptions that g_n and K are absolutely continuous, we may bound the quantity on the extreme right hand side of (3.2.24) by the sum of the two boundary terms and

(3.2.25)    ε_n^{-1/2} | ∫_{-A}^{A} W(t - u ε_n) (d/du){ [ (g_n(t - u ε_n)/g_n(t))^{1/2} - 1 ] K(u) } du |,

writing the three resulting terms as

    J_{1,n}(t) + J_{2,n}(t) + J_{3,n}(t),

say, with J_{2,n} and J_{3,n} the boundary terms. We will show that the supremum over [0,1] of each of these three terms is O_p(ε_n^{1/2}), thus completing the proof.

First of all, note

    J_{2,n}(t) ≤ K(A) ε_n^{-1/2} | [g_n(t - A ε_n)/g_n(t)]^{1/2} - 1 | |W(t - A ε_n)|,

and similarly for J_{3,n}. Now consider

    sup_{0≤t≤1} ε_n^{-1/2} | [g_n(t - A ε_n)/g_n(t)]^{1/2} - 1 |.

By assumption, ||g_n^{-1/2}|| is a bounded sequence, and by the mean value theorem,

    ε_n^{-1/2} | [g_n(t - A ε_n)]^{1/2} - [g_n(t)]^{1/2} |
    = (1/2) ε_n^{-1/2} |g_n(t - A ε_n) - g_n(t)| |x_n(t,A)|^{-1/2},

where x_n(t,A) is between g_n(t - A ε_n) and g_n(t). Applying the mean value theorem to g_n yields

    |g_n(t - A ε_n) - g_n(t)| = A ε_n |g_n'(t_n(t,A))|,

where t_n(t,A) is between t - A ε_n and t. Thus

    sup_{0≤t≤1} J_{2,n}(t) ≤ K(A) (A/2) ε_n^{1/2} sup_{0≤t≤1} [ |g_n'(t_n(t,A))| / |x_n(t,A)|^{1/2} ] sup_t |W(t)|
    = O_p(ε_n^{1/2}),

since g_n' is uniformly bounded and g_n is bounded away from zero, by assumption. A similar argument shows that

    sup_{0≤t≤1} J_{3,n}(t) = O_p(ε_n^{1/2}),

so we now consider J_{1,n}.

Carrying out the differentiation in the integrand of J_{1,n}, we have

    J_{1,n}(t) ≤ |C_{1,n}(t)| + |C_{2,n}(t)|,

say, where C_{1,n} collects the terms containing K'(u) and C_{2,n} those containing g_n'. Now the non-stochastic terms in the integrand of C_{2,n} are uniformly bounded in their arguments and in n, by assumption. We therefore have

    ||C_{2,n}|| = O_p(ε_n^{1/2}).

For C_{1,n}, apply the same argument used in considering J_{2,n} to conclude that

    ||C_{1,n}|| ≤ C₁ ε_n^{1/2} sup_{0≤t≤1} ∫_{-A}^{A} |W(t - u ε_n) K'(u) u| du = O_p(ε_n^{1/2}),

where C₁ is a constant, and the proof is complete.  □
We now use the results proved thus far in showing that Y_{0,n} and Y_{1,n} are sufficiently close to one another.

3.2.8 Lemma. Under the assumptions of Theorem 3.2.1,

    ||Y_{0,n} - Y_{1,n}|| = o_p((log n)^{-1/2}).

Proof. We must show that

    sup_{0≤t≤1} { | [g(t)]^{-1/2} - [g_n(t)]^{-1/2} | ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y) | }
    = o_p((log n)^{-1/2}).

By the preceding four lemmas and Theorem 3.2.2,

    (2δ log n)^{1/2} [ ||Y_{1,n}|| / [λ(K)]^{1/2} - d_n ]

converges in distribution to some random variable, and is therefore an O_p(1) sequence. Since, by definition,

    d_n = O((log n)^{1/2}),

we have

    ||Y_{1,n}|| = O_p((log n)^{1/2}),

and since ||g_n^{-1/2}|| is a bounded sequence, we have

    sup_{0≤t≤1} ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y) | = O_p((log n)^{1/2}).

Thus it suffices to prove

    (log n) || g^{-1/2} - g_n^{-1/2} || → 0

as n → ∞. By the mean value theorem,

    | g^{-1/2} - g_n^{-1/2} | = (1/2) |g - g_n| |h_n|^{-3/2},

where h_n is between g_n and g. Since g_n and g are bounded away from zero, ||h_n^{-3/2}|| is a bounded sequence, and since, by A3,

    (log n) ||g_n - g|| → 0,

the result is proved.  □
Since m_n*(t) is an asymptotically unbiased estimator of m*(t) = m(t)f(t), it is natural to seek conditions under which E m_n*(t) may be replaced by m*(t) in Theorem 3.2.1. Define the process

    Y_n'(t) = (n ε_n)^{1/2} [m_n*(t) - m*(t)] / [s(t)f(t)]^{1/2}.

Then we have the following corollary to Theorem 3.2.1.

3.2.9 Corollary. Suppose all the conditions of Theorem 3.2.1 hold and in addition

    ε_n = n^{-δ},  1/5 < δ < 1/2,

K satisfies

    ∫ u K(u) du = 0,

and the function

    m*(t) = m(t)f(t) = ∫ y f(t,y) dy

has bounded, continuous 1st and 2nd derivatives. Then the conclusion of Theorem 3.2.1 holds, with Y_n' replacing Y_n.

Proof. According to the remark at the beginning of the proof of Theorem 3.2.1, it suffices to show

    (log n)^{1/2} ||Y_n - Y_n'|| = o_p(1).

But

    ||Y_n - Y_n'|| ≤ ||[s f]^{-1/2}|| (n ε_n)^{1/2} sup_{0≤t≤1} |E m_n*(t) - m*(t)|.

By assumption, ||[s f]^{-1/2}|| < ∞, and we know that, under the assumptions on m* and K,

    sup_{0≤t≤1} |E m_n*(t) - m*(t)| = O(ε_n²).

Since

    (log n)^{1/2} (n ε_n)^{1/2} ε_n² = (log n)^{1/2} (n ε_n⁵)^{1/2} → 0

for δ > 1/5, then

    (log n)^{1/2} ||Y_n - Y_n'|| = o_p(1),

and the proof is complete.  □
Based on this corollary, we may construct a confidence band for m(t), 0 ≤ t ≤ 1, as follows. Using the asymptotic distribution, we have

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Y_n'(t)| / [λ(K)]^{1/2} - d_n ] < C(α) } → 1 - α,

where

    C(α) = log 2 - log|log(1-α)|.

Inverting the above expression in the usual way, we obtain as a (1-α)×100% confidence band for m(t):

(3.2.26)    m_n(t) ± (n ε_n)^{-1/2} [ s(t)/f(t) ]^{1/2} [ C(α)/(2δ log n)^{1/2} + d_n ] [λ(K)]^{1/2},
            0 ≤ t ≤ 1.
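The band (3.2.26) is straightforward to evaluate once the kernel constants are known. The sketch below (an illustration; the d_n branches follow the reconstruction given in Theorem 3.2.1 and should be treated as assumptions) computes C(α) and checks the inversion of the limit law exp(-2e^{-x}) that defines it.

```python
import math

def C(alpha):
    # Quantile of the limit law exp(-2 e^{-x}): solve exp(-2 e^{-C}) = 1 - alpha
    return math.log(2.0) - math.log(abs(math.log(1.0 - alpha)))

def d_n(n, delta, c1=None, c2=None):
    # Centering constant of Theorem 3.2.1 as reconstructed above (an assumption);
    # kernels with K(±A) = 0 use the c2 branch
    L = 2.0 * delta * math.log(n)
    if c1 is not None and c1 > 0:
        return math.sqrt(L) + (math.log(c1/math.sqrt(math.pi))
                               + 0.5*(math.log(delta) + math.log(math.log(n))))/math.sqrt(L)
    return math.sqrt(L) + math.log(math.sqrt(c2)/(math.sqrt(2.0)*math.pi))/math.sqrt(L)

alpha = 0.1
x = C(alpha)
recovered = math.exp(-2.0*math.exp(-x))   # should equal 1 - alpha
dn = d_n(4000, 0.3, c2=1.25)              # Epanechnikov-style c2, hypothetical n and δ
```

The C(α) inversion is exact; the d_n value only feeds the half-width of the band and grows like (2δ log n)^{1/2}.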
3.3 Uniform Consistency of m̂_n and m_n.

We saw in Corollary 3.2.9 that the sequence of random variables

    (2δ log n)^{1/2} [ ||Y_n'|| / [λ(K)]^{1/2} - d_n ]

converges in distribution, and is thus an O_p(1) sequence. We employ this fact to show the uniform consistency of m_n*, and specify a rate of convergence.

3.3.1 Lemma. Under the conditions of Corollary 3.2.9,

(3.3.1)    sup_{0≤t≤1} |m_n*(t) - m(t)f(t)| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}].

Proof. By definition,

    d_n = O((log n)^{1/2}),

and thus

    (n ε_n)^{1/2} sup_{0≤t≤1} |m_n*(t) - m(t)f(t)| / [s(t)f(t)]^{1/2}
    = o_p((log n)^{-1/2}) + O((log n)^{1/2}) = O_p((log n)^{1/2}).

Now, using the fact that g(t) = s(t)f(t) is uniformly bounded, the conclusion follows.  □
We now use the preceding lemma to show uniform consistency of m̂_n and m_n.

3.3.2 Theorem. Under the conditions of Corollary 3.2.9, we have

(3.3.2)    ||m_n - m|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}],

(3.3.3)    ||m̂_n - m|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}].

Proof. Note that

    ||m_n - m|| = ||(m_n* - m*)/f|| ≤ ||f^{-1}|| ||m_n* - m*||,

where

    m*(t) = f(t)m(t).

By assumption, f is bounded away from zero on [0,1], and thus ||f^{-1}|| < ∞. An application of Lemma 3.3.1 thus proves (3.3.2).

For (3.3.3), note

    ||m̂_n - m|| = ||m_n*/f_n - m|| ≤ ||(m_n* - m*)/f_n|| + ||m*/f_n - m*/f|| = A + B,

say. Now

    A ≤ ||f_n^{-1}|| ||m_n* - m*|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}]

by Lemma 3.3.1. Further,

    B = ||m*(f - f_n)/(f_n f)|| ≤ ||m*/f_n|| ||f^{-1}|| ||f_n - f||
    = ||m*/f_n|| O_p[(log n)^{1/2} (n ε_n)^{-1/2}]

(Bickel and Rosenblatt (1973)). Since

    ||m*/f_n|| ≤ ||m*|| [ inf_{0≤t≤1} f_n(t) ]^{-1}

and it is easily verified that inf_{0≤t≤1} f_n(t) →p inf_{0≤t≤1} f(t) > 0, (3.3.3) follows.  □
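Theorem 3.3.2's sup-norm rate can be illustrated by simulation. The following sketch (an illustration only, using the Epanechnikov kernel and the model of the example in Chapter 4; it is not the author's code) computes sup |m̂_n(t) - m(t)| over a grid well inside the support, where it should be small for moderate n.

```python
import math, random

def K(u):
    # Epanechnikov kernel (an assumed choice for this illustration)
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

m = lambda x: x**3/3.0 + x**2

def m_hat(t, xs, ys, eps):
    # m̂_n(t) = m_n*(t)/f_n(t); the common (nε)^{-1} factors cancel in the ratio
    num = sum(y*K((t - x)/eps) for x, y in zip(xs, ys))
    den = sum(K((t - x)/eps) for x in xs)
    return num/den

rng = random.Random(3)
n = 4000
eps = n**(-0.21)
xs = [rng.uniform(-3.0, 2.0) for _ in range(n)]
ys = [m(x) + rng.gauss(0.0, 1.0) for x in xs]

grid = [-2.0 + 3.0*j/30 for j in range(31)]   # interior interval [-2, 1]
sup_err = max(abs(m_hat(t, xs, ys, eps) - m(t)) for t in grid)
```

The grid deliberately avoids the endpoints of the support, where the endpoint-bias remarks of Chapter 4 apply.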
4. AN EXAMPLE, FURTHER RESEARCH

As we noted in the introductory chapter, if the density of X is known, then either the estimator m̂_n or m_n may be used to estimate the regression function. Here we will summarize some results given in Chapter 2 which relate to the relative performance of m̂_n and m_n in this case. We then present an example in which m̂_n and m_n are computed from a set of simulated data.

4.1 The Estimators m̂_n and m_n.

We first note that, according to Theorem 2.3.4, if the density function of X has, say, an interval for its support and is non-zero at the endpoints of the interval, then m̂_n is a consistent estimator at the endpoints, whereas m_n is not. The implication of this for finite sample sizes is that m_n is likely to display a bias near the endpoints of the X variable which m̂_n will not have.

According to Theorems 2.4.4 and 2.5.2, under appropriate conditions, both m̂_n(x) and m_n(x) have asymptotic normal distributions with mean m(x) (for kernel type estimators). However, the sequence of scaling constants required for unit asymptotic variance differs for the two estimators; for m̂_n(x) it is {σ²(x) ∫K²(u)du / (n ε_n f(x))}^{1/2} and for m_n(x) it is {s(x) ∫K²(u)du / (n ε_n f(x))}^{1/2}. Since

    σ²(x) = s(x) - m²(x) ≤ s(x),

this indicates that m_n may display more dispersion about m for finite sample sizes than m̂_n.
4.2 An Example.

In order to illustrate the behavior of the estimators in one specific case, we have computed m̂_n and m_n for a set of artificial data. We have also computed the approximate confidence intervals given by (3.2.26) for m, based on m_n. The results of the computations are depicted in Figures 1-6, and we have also shown a scatterplot of the data and the true regression function on each figure. The data consist of n = 200 points (X_i, Y_i) chosen independently with X_i ~ U(-3,2) and

    Y_i = X_i³/3 + X_i² + Z_i,

where Z_i is a standard normal variable independent of X_i. Thus, for this data,

    m(x) = x³/3 + x².

All calculations are for kernel type estimators with kernel function given by a standard normal density function, truncated at ±3 and normalized so as to be a probability density.

Figures 1 and 2 show the estimators m̂_n and m_n, respectively, with ε_n = n^{-.21}, and Figures 3 and 4 show m̂_n and m_n with slightly less smoothing, ε_n = n^{-.4}. The previously discussed bias of m_n is evident at the upper endpoint in Figures 2 and 4, although m̂_n and m_n do not differ by very much at the lower endpoint. The difference in the asymptotic variances of m̂_n and m_n does not manifest itself strongly in this example, although m_n in Figure 4 has a slightly more variable appearance than m̂_n in Figure 3.

Figures 5 and 6 show the approximate confidence bands given by (3.2.26) for α = .1, and ε_n = n^{-.21} and ε_n = n^{-.4}, respectively. The confidence bands (3.2.26) are asymptotically valid for any subinterval of [-3,2]. In practice, however, one should consider these confidence bands to be approximately valid only for intervals well within the support of X, since the earlier remarks on the endpoint bias of m_n apply to the confidence bands also. These confidence bands were calculated using the true conditional second moment

    s(t) = E[Y²|X=t] = m²(t) + 1.

In practice, one would use an estimator of s(t), e.g. the consistent kernel-type estimator obtained by replacing Y_i with Y_i² in the definition of m̂_n.
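The example is easy to reproduce. The sketch below (an illustration, not the original computations; the truncation constant 0.9973 ≈ P(|Z| ≤ 3) is an assumption stated in the code) draws data from the stated model and evaluates both m̂_n = m_n*/f_n and m_n = m_n*/f at a point, using the truncated-normal kernel and ε_n = n^{-.21}.

```python
import math, random

def kernel(u):
    # standard normal density truncated at ±3, renormalized (0.9973 ≈ P(|Z| ≤ 3))
    if abs(u) > 3.0:
        return 0.0
    return math.exp(-0.5*u*u) / (math.sqrt(2.0*math.pi) * 0.9973)

def estimators(t, xs, ys, eps):
    n = len(xs)
    mstar = sum(y*kernel((t - x)/eps) for x, y in zip(xs, ys)) / (n*eps)  # numerator m_n*
    fn = sum(kernel((t - x)/eps) for x in xs) / (n*eps)                   # density estimate f_n
    f_true = 1.0/5.0                                                      # U(-3,2) density
    return mstar/fn, mstar/f_true    # (m̂_n(t), m_n(t))

rng = random.Random(0)
n = 2000                         # larger than the thesis example, to tighten the check
eps = n**(-0.21)
xs = [rng.uniform(-3.0, 2.0) for _ in range(n)]
ys = [x**3/3.0 + x**2 + rng.gauss(0.0, 1.0) for x in xs]

m = lambda x: x**3/3.0 + x**2
mhat, mbar = estimators(0.0, xs, ys, eps)   # m(0) = 0
```

At the interior point t = 0 both estimators should land close to m(0) = 0; near the endpoint x = 2 only m̂_n would remain reliable, per the remarks above.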
4.3 Further Research.

Theorem 3.2.1 was proved for the process

    Y_n(t) = (n ε_n)^{1/2} [m_n*(t) - E m_n*(t)] / [s(t)f(t)]^{1/2}.

It should be possible to carry out a similar program for the process

    V_n(t) = (n ε_n)^{1/2} [m̂_n(t) - E m_n*(t)/E f_n(t)] / [σ²(t)/f(t)]^{1/2}.

A first step in such a proof might be to show the equivalence of V_n to a suitable approximating process V_n' (in the sense that ||V_n - V_n'|| = o_p((log n)^{-1/2})). Successive approximations, as in Theorem 3.2.1, would lead eventually to the equivalence of V_n' to the Wiener process of Theorem 3.2.2, and thus to the asymptotic distribution of the maximum absolute deviation of V_n.

We have not been able to carry out the technical details of the proof of such a theorem. However, if it were to be proved, one application would be a confidence band such as (3.2.26), but based on m̂_n instead of m_n, and therefore narrower, since m̂_n is asymptotically less variable than m_n.
e• ,
- eESTIMATED REGRESSION, TRUE REGRESSION. SCATTER PLOT
7 . 0~ -- ur--------! ! -_,- ._-
6.0 f-
I5.0 r-
If. 0~I
3.0~
i2.0r--
I
I1.0 r=
........0=;
x
x
xx
x
x
x
x
xx
xo
~
2.01.0o-1.0-2.0
x
I II I-1.0' I
--3.0
-.21Figure 1. The Estimator mn With En = n
ESTIMATED REGRESSION, TRUE REGRESSION, SCATTER PLOT7 .0iii Iii
5.0
6.0
'-J\0
xx
x
~0x x
x x xx "x
x
xx
xx
xo
1.0
if.O
3.0
2.0
2.01.0o-1.0-2.0-1.0 ! , , I I ,
-3.0X
Figure 2 The Estimator ffi With E = n-· 21. n n
[Figure 3. The estimator $m_n$ with $\varepsilon_n = n^{-.4}$: estimated regression, true regression, and scatter plot.]
[Figure 4. The estimator $\tilde{m}_n$ with $\varepsilon_n = n^{-.4}$: estimated regression, true regression, and scatter plot.]
[Figure 5. Confidence band (3.2.26) with $\varepsilon_n = n^{-.21}$.]
[Figure 6. Confidence band (3.2.26) with $\varepsilon_n = n^{-.4}$.]
UNCLASSIFIED

REPORT DOCUMENTATION PAGE

Title: Smooth Nonparametric Regression Analysis
Type of report: Technical
Performing org. report number: Mimeo Series No.
Author: Gordon Johnston
Contract or grant numbers: AFOSR 75-2796; ONR N00014-75-C-0809
Performing organization: Department of Statistics, University of North Carolina, Chapel Hill, North Carolina 27514
Controlling offices: 1. Air Force Office of Scientific Research, Bolling AFB, DC; 2. Office of Naval Research
Report date: September 1979
Number of pages: 86
Security classification: UNCLASSIFIED
Distribution statement: Approved for Public Release; Distribution Unlimited
Supplementary notes: Ph.D. dissertation under the joint direction of Professors R.J. Carroll and M.R. Leadbetter
Key words: Regression function, Watson-type estimator, Pointwise weak consistency, Asymptotic joint normality

Abstract: This work investigates properties of the Watson-type estimator of the regression function E[Y|X=x] of a bivariate random vector (X,Y). Pointwise weak consistency of the estimator is demonstrated. Sufficient conditions are given for asymptotic joint normality of the estimator taken at a finite number of distinct points, and for the uniform consistency of the estimator over a bounded interval. A large-sample confidence band for the regression function, based on the estimator, is derived.