NONPARAMETRIC REGRESSION ANALYSIS

Gordon J. Johnston*

Department of Statistics
University of North Carolina, Chapel Hill

*This research was supported in part by the Air Force Office of Scientific Research under Grant AFOSR 75-2796 and by the Office of Naval Research under Grant N00014-75-C-0809.
TABLE OF CONTENTS

CHAPTER 1. Introduction, Summary and Related Work
1.1 Introduction------------------------------------------------- 1
1.2 Summary------------------------------------------------------ 3
1.3 Related Work------------------------------------------------- 4

CHAPTER 2. Consistency, Normality
2.1 δ-Function Sequences-----------------------------------------11
2.2 Nonparametric Density Function Estimation--------------------16
2.3 Pointwise Consistency Properties of m_n and m̃_n--------------18
2.4 Asymptotic Distribution of m_n-------------------------------24
2.5 Asymptotic Distribution of m̃_n-------------------------------36
2.6 Mean Integrated Square Error---------------------------------37

CHAPTER 3. Asymptotic Properties of Maximum Absolute Deviation
3.1 Preliminaries------------------------------------------------42
3.2 Maximum Absolute Deviation of m̃_n----------------------------47
3.3 Uniform Consistency of m_n and m̃_n---------------------------71

CHAPTER 4. An Example, Further Research
4.1 The Estimators m_n and m̃_n-----------------------------------74
4.2 An Example---------------------------------------------------75
4.3 Further Research---------------------------------------------76

FIGURES-----------------------------------------------------------78
BIBLIOGRAPHY------------------------------------------------------85
CHAPTER 1. INTRODUCTION, SUMMARY AND RELATED WORK
1.1 Introduction
Regression analysis is concerned with the study of the relationship
between a response variable Y and a set of predictor variables
X = (X₁, X₂, ..., X_k). An important aspect of regression analysis is the
estimation of the regression function, i.e. of the conditional expectation
of Y, given X. In classical regression analysis, the functional
form of the regression function is assumed to be known up to a finite set
of unknown parameters, which may be estimated from data.
If no such prior knowledge of the regression function exists, then
classical methods do not apply. However, in this case, it may still be
desirable to obtain an estimate of the regression function, either for
direct analysis or to establish a plausible model for use in the classical
regression analysis mentioned above.
Thus there is a need for regression analysis methods which do not
assume a specific mathematical form for the regression function, i.e.
nonparametric methods.
In this study, a type of nonparametric estimator of the regression
function m(x) = E[Y|X=x] will be investigated, where (X,Y) is a bivariate
random vector.
Let X and Y be random variables defined on a probability space
(Ω,F,P) with E|Y| < ∞. Denote the marginal distribution function of X
by F. Then the regression function m(x) is defined by
m(x) = E[Y|X=x], i.e. the (unique a.e. (dF)) Borel measurable function
m satisfying

(1.1.1)    ∫_{X⁻¹B} Y dP = ∫_B m(x) dF(x)

for all Borel sets B. If X and Y have a joint density function f(x,y), then
it follows that

(1.1.2)    m(x) = ∫ y f(x,y) dy / f(x)   if f(x) > 0 ,
           m(x) = 0                      if f(x) = 0 ,
is a version of the regression function, where f(x) denotes the marginal
density of X. Motivated by (1.1.2) and previous work on estimation of
density functions by δ-function sequences, Watson (1964) suggested an
estimator of m(x) of the form
(1.1.3)    m_n(x) = [ (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) ] / [ (1/n) Σ_{j=1}^n δ_n(x−X_j) ] ,
where (X₁,Y₁), (X₂,Y₂), ..., (X_n,Y_n) are independent observations on (X,Y)
and {δ_n(x)} is a sequence of weighting functions called a δ-function
sequence. The estimator m_n(x) defined in (1.1.3) will be investigated
here. By rewriting (1.1.3) as

m_n(x) = Σ_{i=1}^n Y_i [ δ_n(x−X_i) / Σ_{j=1}^n δ_n(x−X_j) ] ,

we have the intuitively appealing interpretation of m_n(x) as a weighted
average of the Y-observations, with the weights depending on x through
δ_n(x). Also, if one desires a smooth estimate of m(x), this can be
achieved through the choice of δ_n(x).
In certain situations, the marginal density f of X is known. For
example, suppose in an experiment, we are able to fix the level of the
predictor variable X, but we wish to randomize X to reduce sampling bias.
Then we would choose X randomly according to a known density f. This
situation also arises in certain optimization problems, where the value
of the function to be optimized can only be determined up to a random
error term (see Devroye (1978)). Since the denominator of (1.1.3) is
intended to estimate f, a reasonable way to use the knowledge of f might
be to use the modified estimator

(1.1.4)    m̃_n(x) = (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) / f(x) .

We provide some preliminary comparisons of the estimators m_n and m̃_n in
the known density case.
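Both estimators are easy to compute directly from their definitions. The sketch below is illustrative and not code from this dissertation: the Gaussian kernel, the bandwidth ε = 0.05, and the simulated model m(x) = sin 2πx with X uniform on [0,1] (so that the known density is f ≡ 1) are all assumptions chosen for the example.

```python
import numpy as np

def watson_estimator(x, X, Y, eps):
    """Watson's estimator m_n(x) of (1.1.3), with the kernel type
    delta-sequence delta_n(u) = eps^-1 K(u/eps), K Gaussian; the
    normalizing constants cancel in the ratio."""
    w = np.exp(-0.5 * ((x - X) / eps) ** 2)
    return np.sum(w * Y) / np.sum(w)

def modified_estimator(x, X, Y, eps, f):
    """Modified estimator for the known density case: the same kernel
    numerator, but divided by the known marginal density f rather
    than its estimate."""
    K = np.exp(-0.5 * ((x - X) / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))
    return np.mean(K * Y) / f(x)

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)                 # known density: f = 1 on [0,1]
Y = np.sin(2.0 * np.pi * X) + rng.normal(0.0, 0.2, n)

m_hat = watson_estimator(0.25, X, Y, eps=0.05)
m_tilde = modified_estimator(0.25, X, Y, eps=0.05, f=lambda u: 1.0)
print(m_hat, m_tilde)                        # both near m(0.25) = 1
```

At this interior point both estimates should be close to m(0.25) = 1; the two differ only in how the denominator is handled, which matters most near the boundary of the support of f.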
1.2 Summary
Since we will assume a specific mathematical form for neither the
regression function m nor the underlying probability distribution of
(X,Y), we could not reasonably expect to obtain small sample results for
the estimators in question. Thus we shall concern ourselves here almost
exclusively with asymptotic results, as the sample size grows larger.
In Chapter 2, we rigorously establish (weak) pointwise consistency
of m_n(x), which was proved heuristically by Watson (1964). Asymptotic
joint normality of m_n(x), taken at a finite number of points, is
demonstrated. In this last result, we significantly weaken a condition of
Schuster (1972) on the δ-function sequence used, at the expense of
some mild additional regularity conditions. In the known density case,
we establish consistency and asymptotic normality for m̃_n, so as to provide
a comparison with the asymptotic normality and consistency results for m_n.
We consider the mean integrated square error (MISE) of the numerator of
the estimator m_n. An explicit expression for the Fourier transform of the
δ-function which minimizes this MISE for each sample size n is derived,
much as Watson and Leadbetter (1963) did for density function estimators.
In Chapter 3, we consider the numerator of m̃_n, and show that its
supremum, taken over a finite interval, if properly centered and normalized,
converges in distribution to a random variable having an extreme value type
distribution. This result is then applied to establish uniform (weak)
consistency of m̃_n, with an associated rate of convergence.
In Chapter 4, we give some examples of calculations of the estimators
m_n and m̃_n from simulated data.
1.3 Related Work
Estimators of the regression function of the form (1.1.3), as well as
several other types of nonparametric estimators of the regression function,
have recently received attention in the literature. Here we survey the
recent work on this subject.
1.3.1 Kernel Type Estimators
Several authors have considered estimators of the form (1.1.3) when
the δ-function sequence is of kernel type. Kernel type δ-function
sequences (defined rigorously in Lemma 2.1.2) are of the form

δ_n(x) = ε_n⁻¹ K(x/ε_n) ,

where K is, e.g., a probability density function and ε_n is a positive real
sequence with ε_n → 0 as n → ∞.
Schuster (1972) considers the asymptotic normality of this type of
estimator. Because of the close relation between Schuster's work and
work presented in this dissertation, we defer discussion until Section
2.4.
Schuster and Yakowitz (1978) consider a global error criterion for
this type of estimator, and for its derivatives as estimators of the
derivatives of the regression function. Let g^(r) denote the r-th
derivative of the function g. Schuster and Yakowitz give conditions under
which, for any ε > 0 and n sufficiently large,

(1.3.1)    P{ sup_{a≤x≤b} |m_n^(r)(x) − m^(r)(x)| > ε } ≤ C/(n ε_n^{2N+2} ε²) ,

where N is a positive integer, [a,b] is a closed, bounded interval and
C is a constant not depending on n. If {ε_n} is such that n ε_n^{2N+2} → ∞
as n → ∞, then (1.3.1) implies that

sup_{a≤x≤b} |m_n^(r)(x) − m^(r)(x)| → 0 in probability

as n → ∞, so that this result is a type of uniform consistency result.
It would be of interest to determine a rate of convergence to be
associated with this result, i.e., a positive, real sequence {b_n} with
b_n → ∞ such that

b_n sup_{a≤x≤b} |m_n(x) − m(x)| → 0 in probability.

This question is addressed in Chapter 3 of this dissertation for the case
N = 0. Schuster and Yakowitz also consider the case where the X
variable is non-stochastic, i.e., we have {f(·;x), x∈[0,1]} as a family
of probability density functions and the object is to estimate

w(x) = ∫ y f(y;x) dy

on the basis of an independent sample Y_i, i = 1,...,n, where Y_i has
density f(·;x_i). They give conditions under which a result similar to
(1.3.1) holds for the so-called Priestley-Chao estimator

(1.3.2)    w_n(x) = (n ε_n)⁻¹ Σ_{i=1}^n K((x − x_i)/ε_n) Y_i

of w(x) (see Priestley and Chao (1972)).
In the non-stochastic X variable case as described above, Benedetti
(1974) shows that both the Watson estimator and the Priestley-Chao
estimator are (weakly) consistent and asymptotically normal for
appropriate values of x, but he points out some computational advantages of
the Watson estimator over the Priestley-Chao.
Konakov (1975) considers a quadratic deviation error criterion for
the Watson estimator with kernel type δ-sequences. Define the quadratic
deviation to be

T_n = ∫ [m_n(x) − m(x)]² f_n²(x) p(x) dx ,

where f_n is a kernel type estimator of the marginal density f of X and
p is a bounded integrable weight function. Konakov gives conditions
under which T_n, if properly normalized and centered, is asymptotically
normal. We do not consider quadratic deviation in this dissertation.
1.3.2 Nearest Neighbor Type Estimators.
Watson (1964) proposed estimating m(x) with the average of the Y
values corresponding to the k X values nearest to x, where k is some
integer. This type of estimator is called the k nearest neighbor
estimator of the regression function. Earlier work had been done on
the classification problem and on estimation of a probability density
function using nearest neighbor techniques. (See Fix and Hodges (1951),
Cover and Hart (1967), Cover (1968), and Loftsgaarden and Quesenberry
(1965) for work in these areas.)
Let k(n) be an integer depending on the sample size n and denote
by I_{k(n)}(x) the smallest open interval centered at x containing no less
than k of the X-observations. Then the k-nearest neighbor estimator
m̂_n can be written as

(1.3.3)    m̂_n(x) = k⁻¹ Σ_{i: X_i ∈ I_{k(n)}(x)} Y_i .
Stone (1977) points out that m̂_n(x) may be a discontinuous function, and
that in some cases, smoothness is a desirable property in a regression
function estimator. Lai (1977) proposes a modification of the k nearest
neighbor estimator which can have the desired smoothness property. This
estimator is very similar to the Watson estimator (1.1.3) with kernel
type δ-function sequence. Let W be a probability density with W(x) = 0
for |x| > 1. Then Lai's estimator is defined by

(1.3.4)    m̄_n(x) = Σ_{i=1}^n Y_i W((x−X_i)/R_{k(n)}(x)) / Σ_{i=1}^n W((x−X_i)/R_{k(n)}(x)) ,

where R_{k(n)}(x) is the radius of the interval I_{k(n)}(x). This estimator reduces
to the form (1.3.3) when W(x) = 1/2 for |x| ≤ 1. Lai proves the following.
1.3.1 Theorem. Assume W is continuous a.e., bounded and W(x) = 0 for
|x| > 1. If there exists an open set U₀ in R on which

i) f(x) is continuous, bounded, and f(x) > 0 ,
ii) E(|Y| | X=x) and E(max(Y,0) | X=x) are continuous functions of x,
iii) lim_{M→∞} sup_{x∈U₀} E( |Y| · 1{|Y|≥M}(Y) | X=x ) = 0 ,

and if EY² < ∞ and

iv) k(n)/n → 0 and k(n)/√n → ∞ ,

then

sup_{z∈A} |m̄_n(z) − m(z)| → 0

in probability for any compact set A ⊂ U₀. □
A similar result is proved for the estimator (1.1.4), with A an
interval, in Chapter 3 of this dissertation. There, more regularity
conditions are applied to obtain an associated rate of convergence.
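The two neighbor-based estimators above are straightforward to compute. The sketch below is an illustration only, not code from Lai's paper: the simulated model m(x) = x² with a uniform design and the choice k = 50 are assumptions of the example, and the triangular weight stands in for a generic smooth W.

```python
import numpy as np

def knn_estimator(x, X, Y, k):
    """k nearest neighbor estimator (1.3.3): average of the Y values
    belonging to the k X-observations nearest to x."""
    idx = np.argsort(np.abs(X - x))[:k]
    return Y[idx].mean()

def lai_estimator(x, X, Y, k, W):
    """Lai's smoothed estimator (1.3.4): weights W((x - X_i)/R_k(x)),
    where R_k(x) is the distance from x to its k-th nearest X value."""
    d = np.abs(X - x)
    R = np.sort(d)[k - 1]            # radius of the k nearest neighbor interval
    w = W((x - X) / R)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(1)
n, k = 4000, 50
X = rng.uniform(-1.0, 1.0, n)
Y = X ** 2 + rng.normal(0.0, 0.1, n)

box = lambda u: np.where(np.abs(u) <= 1.0, 0.5, 0.0)  # recovers (1.3.3)
tri = lambda u: np.maximum(1.0 - np.abs(u), 0.0)      # a smooth choice of W

m_knn = knn_estimator(0.5, X, Y, k)
m_lai = lai_estimator(0.5, X, Y, k, tri)
m_box = lai_estimator(0.5, X, Y, k, box)
print(m_knn, m_lai, m_box)           # all near m(0.5) = 0.25
```

With the box weight, the smoothed estimator averages exactly the k nearest Y values, so m_box reproduces the plain k nearest neighbor estimate, as the reduction to (1.3.3) asserts.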
Stone (1977) considers the following type of nonparametric regression
function estimator:

(1.3.5)    m̂_n(x) = Σ_{i=1}^n W_{ni}(x) Y_i ,

where W_{ni}(x) = W_{ni}(x; X₁,...,X_n) is a weight function. This estimator
reduces to the nearest neighbor, modified nearest neighbor and δ-function
type estimators discussed above for appropriate choices of the functions
W_{ni}. Stone gives general conditions on the weight functions W_{ni} for
m̂_n to be consistent in L_r, i.e., for

E|m̂_n(x) − m(x)|^r → 0

whenever E|Y|^r < ∞. Stone's work applies to give minimal conditions for
this type of consistency for some types of estimators, e.g., if k(n) → ∞
and k(n)/n → 0, then the k nearest neighbor estimator is consistent in
L_r. Stone, however, points out that it is not clear from his results
when an estimator of the Watson type is consistent in L_r.
1.3.3 Potential Function Methods.
In this method, introduced by Aizerman, Braverman, and Rozonoer
(1970), the regression function m is assumed to belong to a Hilbert
space H and have the representation m(x) = Σ_{i=1}^∞ c_i φ_i(x), where {φ_i} is
a complete system of functions of H. The estimator m̂_n is calculated
recursively by the formula

m̂_n(x) = m̂_{n−1}(x) + γ_n [Y_n − m̂_{n−1}(X_n)] K(x, X_n) ,

where K is a "potential function" of the form

K(x,y) = Σ_{i=1}^∞ λ_i² φ_i(x) φ_i(y) ,

{γ_n} is a sequence of real numbers, and m̂₀ is chosen arbitrarily.
We have the following type of consistency for this set-up. Suppose
Σ_i (c_i/λ_i)² < ∞, Σ_i γ_i = ∞, Σ_i γ_i² < ∞. Then

∫ [m̂_n(x) − m(x)]² f(x) dx → 0

in probability as n → ∞.
Fisher and Yakowitz (1976) obtain more general results for this
type of estimation, but for a more complicated error criterion.
1.3.4 Estimates Based on Ranks.
Let X_{n1} ≤ X_{n2} ≤ ... ≤ X_{nn} denote the ordered X values and define
the concomitant of X_{ni} = X_j to be Y_{ni} = Y_j. The set Y_{ni}, i = 1,...,n,
are sometimes called the induced order statistics. Yang (1977) proposes
the following estimator of m(x) based on concomitants:

m̂_n(x) = (1/n) Σ_{i=1}^n Y_{ni} ε_n⁻¹ K((F_n(x) − i/n)/ε_n) ,

where ε_n⁻¹ K(x/ε_n) is a kernel type δ-function sequence and F_n is the
empirical distribution function of the X values. Yang gives conditions
under which m̂_n is (weakly) consistent and asymptotically normal at
appropriate points x.
Bhattacharya (1976) discusses estimation of a function related to the
regression function based on concomitants. Let F denote the marginal
distribution function of X and define

h(t) = m∘F⁻¹(t) ,    H(t) = ∫₀ᵗ h(s) ds .

Natural estimators of H are

H_n(t) = n⁻¹ Σ_{i=1}^{[nt]} Y_{ni}

and a second estimator constructed with F in place of the empirical
ranks, if F is known. Bhattacharya obtains weak convergence in D[0,1]
results for these estimators and applies them to estimation and hypothesis
testing problems.
CHAPTER 2. CONSISTENCY, NORMALITY

2.1 δ-Function Sequences
A certain class of δ-function sequences was suggested originally by
Rosenblatt (1956) and, under slightly weaker conditions, by Parzen (1962)
for use in probability density function estimation. Leadbetter (1963)
and Watson and Leadbetter (1964) introduced a more general notion of
δ-function sequences, and our approach throughout this study will be
to obtain results for the more general type of δ-functions whenever
possible.
The following (2.1.1 - 2.1.4) is due to Leadbetter (1963).
2.1.1 Definition. A sequence of integrable functions {δ_n(x)} is called
a δ-function sequence if it satisfies the following set of conditions
(integrals with no limits of integration are meant to extend over the
entire real line):

C1. ∫ |δ_n(x)| dx < A for all n and some fixed A,
C2. ∫ δ_n(x) dx = 1 for all n,
C3. δ_n(x) → 0 uniformly on |x| ≥ A for any fixed A > 0,
C4. ∫_{|x|≥A} δ_n(x) dx → 0 for any fixed A > 0. □
The next lemma describes the type of δ-function sequence used by
Rosenblatt (1956) and Parzen (1962), although the conditions on the
function K are slightly different. This type of δ-function sequence will be
referred to as "kernel type" and K as a "kernel function."

2.1.2 Lemma. Let {ε_n} be a sequence of non-zero constants with ε_n → 0
as n → ∞ and let K be an integrable function such that ∫ K(x) dx = 1
and K(x) = o(x⁻¹) as |x| → ∞. Then {ε_n⁻¹ K(x/ε_n)} is a δ-function
sequence. □
The following lemma demonstrates the similarity between δ-function
sequences as defined above and the Dirac δ-function.

2.1.3 Lemma. If g(x) is integrable and continuous at x = 0 and {δ_n}
is a δ-function sequence, then g(x)δ_n(x) is integrable for each n and

∫ g(x) δ_n(x) dx → g(0) as n → ∞. □
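Lemma 2.1.3 is easy to check numerically. The sketch below is an illustration, not part of the text: it uses the kernel type sequence of Lemma 2.1.2 with K the standard normal density and ε_n = n^(−1/2) as an assumed concrete choice, approximating the integral by a Riemann sum on a grid.

```python
import numpy as np

def delta_n(x, n):
    """Kernel type delta-function sequence (Lemma 2.1.2) with K the
    standard normal density and eps_n = n**-0.5 (an assumed choice)."""
    eps = n ** -0.5
    return np.exp(-0.5 * (x / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

g = lambda x: np.cos(x) / (1.0 + x ** 2)   # integrable, continuous at 0; g(0) = 1

x = np.linspace(-20.0, 20.0, 400001)       # integration grid
dx = x[1] - x[0]
for n in (10, 100, 10000):
    val = np.sum(g(x) * delta_n(x, n)) * dx   # approximates the integral of g * delta_n
    print(n, val)                             # approaches g(0) = 1 as n grows
```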
2.1.4 Lemma. Let {δ_n(x)} be a δ-function sequence such that, for p > 1,

α_n(p) = ∫ |δ_n(u)|^p du < ∞ for each n.

Then α_n(p) → ∞ and

{δ_{n,p}(x)} = {|δ_n(x)|^p / α_n(p)}

is a δ-function sequence for p > 1. □
Rosenblatt (1971) states the following lemma, which gives a rate
of convergence for Lemma 2.1.3 when the δ-function sequence is of
kernel type. We include a proof for completeness.

2.1.5 Lemma. Suppose g is an integrable function with bounded, continuous
1st and 2nd derivatives. Let {ε_n⁻¹ K(x/ε_n)} be a δ-function sequence of
kernel type with

∫ uK(u) du = 0   and   ∫ u²|K(u)| du < ∞ .

Then

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = g(x) + O(ε_n²) ,

where the sequence represented by O does not depend on x.

Proof. Write

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = ∫ K(y) g(x−ε_n y) dy

and

g(x−ε_n y) = g(x) − g'(x) ε_n y + g''(τ) ε_n² y²/2 ,

where τ = τ_n(x,y) is between x and x−ε_n y. Thus

ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du = g(x) ∫ K(y) dy − g'(x) ε_n ∫ yK(y) dy
                               + (ε_n²/2) ∫ g''(τ) y² K(y) dy ,

and hence

| ε_n⁻¹ ∫ K((x−u)/ε_n) g(u) du − g(x) | ≤ ε_n² sup_t |g''(t)/2| ∫ |y² K(y)| dy ,

since ∫ K(u) du = 1 and ∫ uK(u) du = 0. The conclusion follows from
the last inequality. □
The following lemmas will be useful in the sequel.
2.1.6 Lemma. Let {δ_n} be a δ-function sequence and g an integrable
function. If g is continuous at x and y and x ≠ y, then

∫ δ_n(x−u) δ_n(y−u) g(u) du → 0

as n → ∞.

Proof. For convenience, assume x < y. By Lemma 2.1.3, δ_n(y−u)g(u)
is an integrable function for each n, as is δ_n(x−u)g(u). Choose A so
that x < A < y. Then

| ∫ δ_n(x−u) δ_n(y−u) g(u) du |
  ≤ ∫_{−∞}^{A} |δ_n(x−u) δ_n(y−u) g(u)| du + ∫_{A}^{∞} |δ_n(x−u) δ_n(y−u) g(u)| du
  ≤ sup_{u≤A} |δ_n(y−u)| ∫ |δ_n(x−u) g(u)| du
    + sup_{u≥A} |δ_n(x−u)| ∫ |δ_n(y−u) g(u)| du .

Now sup_{u≤A} |δ_n(y−u)| and sup_{u≥A} |δ_n(x−u)| converge to zero by C3 of
Definition 2.1.1. Further, ∫ |δ_n(x−u) g(u)| du < ∞ for each n by the
preceding remark, and, in fact, by Lemmas 2.1.3 and 2.1.4,

∫ |δ_n(x−u) g(u)| du ≤ A'   and   ∫ |δ_n(y−u) g(u)| du ≤ A'

for some constant A'. Thus ∫ |δ_n(x−u) g(u)| du is a bounded sequence,
and we have

sup_{u≤A} |δ_n(y−u)| ∫ |δ_n(x−u) g(u)| du → 0

as n → ∞. The same argument applies when x and y are interchanged, and
the conclusion follows. □
2.1.7 Lemma. Let {δ_n} be a δ-function sequence such that δ_n is an
even function for each n and g an integrable function such that g has
both left and right hand limits at 0. Then δ_n(x)g(x) is an integrable
function for each n and

∫ δ_n(x) g(x) dx → (g(0+) + g(0−))/2

as n → ∞.

Proof. Define

g⁺(x) = g(x) for x > 0,   g⁺(0) = g(0+),   g⁺(x) = g(−x) for x < 0 ,
g⁻(x) = g(−x) for x > 0,  g⁻(0) = g(0−),   g⁻(x) = g(x) for x < 0 .

Clearly, g⁺ and g⁻ are even functions, continuous at 0. Further, they are
both integrable functions, since, e.g.,

∫ |g⁺(u)| du = 2 ∫₀^∞ |g⁺(u)| du = 2 ∫₀^∞ |g(u)| du < ∞ .

Thus, by Lemma 2.1.3,

∫ δ_n(x) g⁺(x) dx → g⁺(0) = g(0+)

and, similarly, ∫ δ_n(x) g⁻(x) dx → g(0−) as n → ∞. But, since δ_n, g⁺
and g⁻ are even,

(1/2) [ ∫ |δ_n(x) g⁺(x)| dx + ∫ |δ_n(x) g⁻(x)| dx ]
  = ∫₀^∞ |δ_n(x) g(x)| dx + ∫_{−∞}^0 |δ_n(x) g(x)| dx ,

so that δ_n(x)g(x) is integrable, and

(1/2) [ ∫ δ_n(x) g⁺(x) dx + ∫ δ_n(x) g⁻(x) dx ]
  = ∫₀^∞ δ_n(x) g(x) dx + ∫_{−∞}^0 δ_n(x) g(x) dx
  = ∫ δ_n(x) g(x) dx → (g(0+) + g(0−))/2

by the preceding remark. □
2.2 Nonparametric Density Function Estimation.
Nonparametric methods of density function estimation have been studied
in great detail (see, e.g., Wegman (1972a) and Wegman (1972b) for a survey
and comparison of work in this area). Estimators of a density f(x) of the
form

(2.2.1)    f_n(x) = (1/n) Σ_{i=1}^n δ_n(x−X_i)

are of particular interest here because of the reliance of our proposed
regression function estimators on the same type of weighting functions
δ_n, and because f_n, defined in (2.2.1), appears in the denominator of m_n
as defined in equation (1.1.3).
Since X_i, i = 1,...,n are i.i.d. with common density f, we have

E f_n(x) = ∫ δ_n(x−u) f(u) du ,

which has f(x) as its limiting value as n → ∞, provided f is continuous
at x, by Lemma 2.1.3. That is, f_n is an asymptotically unbiased estimator
of f at continuity points of f. Further,

E f_n²(x) = (1/n)² E{ Σ_{i=1}^n δ_n²(x−X_i) + ΣΣ_{i≠j} δ_n(x−X_i) δ_n(x−X_j) }
          = (1/n) ∫ δ_n²(x−u) f(u) du
            + ((n−1)/n) [ ∫ δ_n(x−u) f(u) du ]² ,

so that

Var[f_n(x)] = (1/n) ∫ δ_n²(x−u) f(u) du − (1/n) [ ∫ δ_n(x−u) f(u) du ]² ,

and we thus have, if α_n = ∫ δ_n²(u) du < ∞ for each n, by Lemma 2.1.4,

(n/α_n) Var[f_n(x)] → f(x)

as n → ∞ at continuity points x of f. The above calculations (which
appear in Watson and Leadbetter (1964)) may be combined to give conditions
under which the mean square error of f_n converges to zero, as the
following lemma shows.
2.2.1 Lemma. Let {δ_n} be a δ-function sequence for which
α_n = ∫ δ_n²(u) du < ∞ for each n and α_n/n → 0 as n → ∞. Let x be a
continuity point of f. Then

E[f_n(x) − f(x)]² → 0 as n → ∞. □

2.2.2 Remark. By Chebychev's inequality,

P{ |f_n(x) − f(x)| > ε } ≤ ε⁻² E[f_n(x) − f(x)]²

for any ε > 0. We thus have f_n(x) → f(x) in probability, provided the
conditions of Lemma 2.2.1 are satisfied. That is, f_n(x) is a weakly
consistent estimator of f(x) for appropriate δ-function sequences and
points x.
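The estimator (2.2.1) and the consistency statement of Remark 2.2.2 can be illustrated numerically. The standard normal target density, the Gaussian kernel and the bandwidth ε_n = n^(−1/5) below are assumptions of this sketch, not choices made in the text.

```python
import numpy as np

def f_n(x, X, eps):
    """Density estimator (2.2.1) with delta_n(u) = eps^-1 K(u/eps),
    K the standard normal density; x may be an array of points."""
    u = (np.asarray(x) - X[:, None]) / eps
    return np.mean(np.exp(-0.5 * u ** 2), axis=0) / (eps * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
true_f = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

grid = np.array([-1.0, 0.0, 1.0])
for n in (100, 10000):
    X = rng.normal(size=n)
    eps = n ** -0.2                  # then alpha_n/n is of order 1/(n*eps) -> 0
    est = f_n(grid, X, eps)
    print(n, np.max(np.abs(est - true_f(grid))))   # error shrinks with n
```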
The preceding discussion on density estimation will suffice for our
discussion of pointwise consistency of our proposed regression estimators.
We will include other pertinent results on density estimation as they are
needed.
2.3 Pointwise Consistency Properties of m_n and m̃_n.
We begin our discussion by considering the numerator of the estimators
m_n and m̃_n defined in (1.1.3) and (1.1.4), respectively. Denote, for
convenience, the numerator by m_n*, i.e.,

m_n*(x) = (1/n) Σ_{i=1}^n Y_i δ_n(x−X_i) .

Then we have the following.
2.3.1 Lemma. Let {δ_n} be a δ-function sequence for which
α_n = ∫ δ_n²(u) du < ∞ for each n. Suppose EY² < ∞ and x is a point of
continuity of the functions f(u), m(u) = E[Y|X=u] and s(u) = E[Y²|X=u].
Then

(i) E m_n*(x) → m(x) f(x) ,
(ii) (n/α_n) Var[m_n*(x)] → s(x) f(x)

as n → ∞.

Proof. We will use the following two well known properties of the
regression function:

(2.3.1)    E h(Y) = ∫ E[h(Y)|X=x] f(x) dx

for any function h and random variable Y such that E|h(Y)| < ∞ ,

(2.3.2)    E[g(X)h(Y)|X=x] = g(x) E[h(Y)|X=x]

for any functions g and h such that E|g(X)h(Y)| < ∞.

Since (X_i,Y_i), i = 1,...,n are i.i.d., we have

E m_n*(x) = E Y δ_n(x−X)
          = ∫ E[Y δ_n(x−X)|X=u] f(u) du     by (2.3.1),
          = ∫ δ_n(x−u) E[Y|X=u] f(u) du     by (2.3.2),
          = ∫ δ_n(x−u) m(u) f(u) du .

Now, by assumption, m(u)f(u) is continuous at u = x. Thus (i) will
follow from Lemma 2.1.3 if we demonstrate that m(u)f(u) is an integrable
function. To verify this, note that, by Jensen's inequality,

|m(x)| = |E[Y|X=x]| ≤ E[|Y| | X=x] .

Thus

∫ |m(u)f(u)| du ≤ ∫ E[|Y| | X=u] f(u) du = E|Y| < ∞

by assumption, and (i) follows.

For (ii), note

E[m_n*(x)]² = (1/n)² E{ Σ_{i=1}^n [Y_i δ_n(x−X_i)]²
                        + ΣΣ_{i≠j} Y_i δ_n(x−X_i) Y_j δ_n(x−X_j) }
            = (1/n) ∫ s(u) f(u) δ_n²(x−u) du
              + ((n−1)/n) [ ∫ m(u) f(u) δ_n(x−u) du ]² ,

the last step following from (2.3.1) and (2.3.2), as used in the proof
of (i). Thus

Var[m_n*(x)] = (1/n) ∫ s(u) f(u) δ_n²(x−u) du
               − (1/n) [ ∫ m(u) f(u) δ_n(x−u) du ]² ,

and, since α_n → ∞ and {δ_n²/α_n} is a δ-function sequence, we have

(n/α_n) Var[m_n*(x)] → s(x) f(x) ,

as desired. □
We now use the preceding result to demonstrate the consistency of
the estimators m_n and m̃_n.
2.3.2 Theorem. Let {δ_n} be a δ-function sequence such that
α_n = ∫ δ_n²(u) du < ∞ for each n and α_n = o(n). Suppose EY² < ∞ and x
is a continuity point of f(u), m(u) and s(u), and that f(x) > 0. Then

(i) m_n(x) → m(x) in probability,
(ii) m̃_n(x) → m(x) in probability.

Proof. Since α_n/n → 0 by assumption, we have Var[m_n*(x)] → 0 by
Lemma 2.3.1. Thus

E[m_n*(x) − m(x)f(x)]² → 0

and by applying Chebychev's inequality as in Remark 2.2.2, we see that
m_n*(x) → m(x)f(x) in probability. Since, by definition,

m̃_n(x) = m_n*(x)/f(x)

and f(x) > 0, (ii) follows immediately.

For (i), write

|m_n − m| = |m_n*/f_n − m| ≤ |(m_n* − mf)/f_n| + |m(f − f_n)/f_n| ,

where we have suppressed the argument x for convenience. Now m_n* → mf
in probability by the remark above, and since f_n(x) → f(x) > 0 in
probability, we have

(m_n* − mf)/f_n → 0 in probability.

Similarly,

(f − f_n)/f_n → 0 in probability,

and (i) follows. □

We have, then, that both the estimators m_n and m̃_n are weakly consistent
estimators of m at continuity points of s, m and f. The following
corollary specializes the preceding theorem to kernel type δ-function
sequences.

2.3.3 Corollary. Suppose that {δ_n(x)} = {ε_n⁻¹ K(x/ε_n)} is a δ-function
sequence of kernel type with nε_n → ∞ as n → ∞ and ∫ K²(u) du < ∞. Assume
the other conditions of Theorem 2.3.2 are satisfied. Then the conclusions
of Theorem 2.3.2 hold.

Proof. We need only verify that α_n/n → 0. By definition,

α_n = ∫ ε_n⁻² K²(u/ε_n) du = ε_n⁻¹ ∫ K²(v) dv .

Then

α_n/n = (nε_n)⁻¹ ∫ K²(v) dv → 0

by assumption. □
We have so far been assuming that the density f of X is continuous
and positive at points where we wish to estimate the regression function.
An important case where these assumptions may not hold is when X is a
bounded random variable, i.e. when f has bounded support, and we desire an
estimate at a boundary point of the support of f. We have the following
result, which demonstrates that if m(x+) = m(x), m_n(x) is consistent
but m̃_n(x) is not.

2.3.4 Theorem. Let {δ_n} be a δ-function sequence, with δ_n even for
each n, such that α_n < ∞ for each n and α_n/n → 0. Suppose EY² < ∞ and
x is a point such that f, m and s have left and right hand limits at x
and f(x+) = f(x) > 0, f(x−) = 0. Then

(i) m̃_n(x) → m(x+)/2 in probability,
(ii) m_n(x) → m(x+) in probability.

Proof. By Lemma 2.1.7 it follows that

E m_n*(x) → m(x+) f(x)/2 .

By an argument similar to the one used in the proof of Lemma 2.3.1, it
follows that

f_n(x) → f(x)/2 in probability,
m_n*(x) → m(x+) f(x)/2 in probability.

Since m̃_n(x) = m_n*(x)/f(x), (i) follows. Since m_n(x) = m_n*(x)/f_n(x),
(ii) follows by an argument similar to the one used in Theorem 2.3.2. □
This theorem demonstrates that the estimator m̃_n displays an
"end effect" at the boundaries of the range of X which m_n does not
display. As we shall see in Chapter 4, this end effect represents a
possible disadvantage for m̃_n, depending on how m is defined at the
boundaries of its support. We now turn our attention to asymptotic
distributional properties of m_n and m̃_n.
2.4 Asymptotic Distribution of m_n.
Nadaraya (1964) and Schuster (1972) have considered the asymptotic
normality of the estimator m_n in terms of kernel type δ-function sequences.
Nadaraya states that, if Y is a bounded random variable and nε_n² → ∞,
then (nε_n)^{1/2} (m_n(x) − E m_n(x)) has an asymptotically normal distribution
with zero mean and variance s(x) ∫ K²(u) du / f(x), where

s(x) = E[Y²|X=x] .

Schuster (1972) points out that this expression for the variance is
incorrect and presents a result with the correct variance which at the same
time removes the restriction that Y be bounded and centers at m(x) instead
of E m_n(x). We state Schuster's result here for comparison with a new
result which represents, in some respects, an improvement over Schuster's.
2.4.1 Theorem. Let {ε_n⁻¹ K(x/ε_n)} be a δ-function sequence satisfying
the conditions:

(i) K(u) and uK(u) are bounded,
(ii) ∫ uK(u) du = 0, ∫ u²K(u) du < ∞ ,
(iii) nε_n² → ∞, nε_n⁵ → 0 as n → ∞.

Suppose x₁, x₂, ..., x_p are distinct points with f(x_i) > 0, i = 1,...,p.
Let w(u) = m(u)f(u) and assume f', w', s', f'', w'' exist and are bounded,
and that EY³ < ∞. Then

( (nε_n)^{1/2} (m_n(x₁) − m(x₁)), ..., (nε_n)^{1/2} (m_n(x_p) − m(x_p)) )

converges in distribution to a multivariate normal random vector with
zero mean vector and diagonal covariance matrix with i-th diagonal
element given by

σ²(x_i) ∫ K²(u) du / f(x_i) ,

where σ²(u) = s(u) − m²(u).
Schuster proves this theorem by using the Berry-Esseen theorem to
show the joint asymptotic normality of the numerator and denominator of
m_n. An application of the Mann-Wald theorem (Billingsley (1968)) then
yields the desired result. As we shall see, it is not necessary to
consider the joint distribution of the numerator and denominator of m_n in
order to establish asymptotic normality. Schuster's proof can thus be
simplified. Also, by using the Lindeberg-Feller central limit theorem,
instead of the Berry-Esseen theorem, we will be able to require the
δ-function sequence to satisfy a less restrictive condition, namely that
nε_n → ∞ instead of nε_n² → ∞.
We now present the new asymptotic normality result for m_n. The most
important difference between this theorem (when stated in terms of kernel
type δ-sequences) and Theorem 2.4.1 is that it only requires nε_n → ∞
instead of nε_n² → ∞. Also, it applies to general δ-function sequences.
There are minor differences in the other conditions which will be evident
in the statement of the theorem. We first state the main theorem,
and then prove a preliminary lemma before returning to the proof of the
theorem. We then specialize to kernel type δ-function sequences.
2.4.2 Theorem. Let {δ_n} be a sequence of δ-functions such that

γ_n = ∫ |δ_n(u)|^{2+η} du < ∞ for each n, for some η > 0 ,

α_n = ∫ δ_n²(u) du = o(n) as n → ∞

and

γ_n n^{−η/2} α_n^{−(1+η/2)} → 0 as n → ∞ .

Suppose E|Y|^{2+η} < ∞ and the distinct points x₁, x₂, ..., x_p are continuity
points of each of the functions f(x), m(x), s(x) = E[Y²|X=x] and
E[|Y|^{2+η}|X=x], and that f(x_i) > 0, i = 1,...,p. Then

(Z_n(x₁), ..., Z_n(x_p))

converges in distribution to a multivariate normal random vector with
zero mean vector and identity covariance matrix, where

Z_n(x) = (m_n(x) − g_n(x)) / [ (α_n/n) (σ²(x)/f(x)) ]^{1/2} ,

g_n(x) = E m_n*(x) / E f_n(x)

and

σ²(x) = s(x) − m²(x) . □
Since m_n = m_n*/f_n is a ratio of sums of random variables, a direct
central limit argument is not possible. However, note that

m_n − g_n = [ m_n*/f − (f_n/f) g_n ] (f/f_n)

and f/f_n → 1 in probability, so that m_n − g_n will have the same
asymptotic distribution as the term within square brackets above. The term
within square brackets is a sum of random variables, and thus standard
arguments may be used to establish its asymptotic normality. This is the
outline which the proof of Theorem 2.4.2 will follow, although the
notation will be more complicated since the proof will be in a
multivariate setting.
The following lemma establishes the asymptotic variance and covariance
structure needed in the proof of Theorem 2.4.2.
2.4.3 Lemma. Let {δ_n} be a δ-function sequence such that
α_n = ∫ δ_n²(u) du < ∞ for each n. Suppose x ≠ y are continuity points of
f, m, and s, and that f(x) > 0, f(y) > 0, and EY² < ∞. Define

R_n(z) = f(z)/f_n(z)

and

H_n(z) = (nf(z))⁻¹ Σ_{i=1}^n (Y_i − g_n(z)) δ_n(z−X_i) .

Then

(i) (n/α_n) Var[H_n(x)] → σ²(x)/f(x) as n → ∞ ,
(ii) n Cov[H_n(x), H_n(y)] → 0 as n → ∞ .

Proof. By definition,

E H_n(x) = E m_n*(x)/f(x) − g_n(x) E f_n(x)/f(x) = 0

since

g_n(x) = E m_n*(x)/E f_n(x) ,

and we thus have

Var[H_n(x)] = E H_n²(x)
            = (nf(x))⁻² E{ Σ_{i=1}^n [(Y_i − g_n(x)) δ_n(x−X_i)]² } ,

since the cross terms vanish. Hence

(n/α_n) Var[H_n(x)]
  = α_n⁻¹ (f(x))⁻² E[(Y − g_n(x)) δ_n(x−X)]²
  = α_n⁻¹ (f(x))⁻² { ∫ s(u)f(u) δ_n²(x−u) du
      − 2 g_n(x) ∫ m(u)f(u) δ_n²(x−u) du
      + g_n²(x) ∫ f(u) δ_n²(x−u) du }
  → (f(x))⁻² { s(x)f(x) − m²(x)f(x) }
  = σ²(x)/f(x) ,

since {δ_n²(u)/α_n} is a δ-function sequence by Lemma 2.1.4 and g_n(x) → m(x).

For (ii), note

n Cov[H_n(x), H_n(y)]
  = n E[H_n(x) H_n(y)]
  = (nf(x)f(y))⁻¹ E ΣΣ_{i,j=1}^n [(Y_i − g_n(x)) δ_n(x−X_i)] [(Y_j − g_n(y)) δ_n(y−X_j)]
  = (nf(x)f(y))⁻¹ Σ_{i=1}^n E[(Y_i − g_n(x)) δ_n(x−X_i) (Y_i − g_n(y)) δ_n(y−X_i)]
  = (f(x)f(y))⁻¹ ∫ δ_n(x−u) δ_n(y−u) q_n(u) du ,

where

q_n(u) = f(u) [ s(u) − m(u)(g_n(x) + g_n(y)) + g_n(x) g_n(y) ]

is continuous at u = x and u = y by assumption. Thus

n Cov[H_n(x), H_n(y)] → 0

by Lemma 2.1.6, and (ii) follows. □
We now return to the proof of Theorem 2.4.2. By the Cramér-Wold
device (e.g., Billingsley (1968)), it suffices to show

Σ_{k=1}^p t_k Z_n(x_k) → N(0, Σ_{k=1}^p t_k²) in distribution

for arbitrary real t₁,...,t_p, or, equivalently,

Σ_{k=1}^p t_k R_n(x_k) H_n(x_k) [ (α_n/n)(σ²(x_k)/f(x_k)) ]^{−1/2}
  → N(0, Σ_{k=1}^p t_k²) in distribution,

where R_n and H_n are as defined in Lemma 2.4.3. Since f_n(x_k) → f(x_k)
in probability, k = 1,...,p, it follows that R_n(x_k) → 1 in probability,
k = 1,...,p, and it thus suffices to show the same limit with each R_n(x_k)
replaced by 1. Now γ_n < ∞ for each n implies

α_n = ∫ δ_n²(u) du < ∞

for each n, since ∫_{|u|<A} δ_n²(u) du < ∞ for any finite A > 0 by Hölder's
inequality, and by C1 and C3 of Definition 2.1.1. Thus we have, by
Lemma 2.4.3,

Var[ Σ_{k=1}^p t_k H_n(x_k) ] = Σ_{k=1}^p t_k² Var[H_n(x_k)]
                                + ΣΣ_{k≠l} t_k t_l Cov[H_n(x_k), H_n(x_l)]
                              ~ (α_n/n) Σ_{k=1}^p t_k² σ²(x_k)/f(x_k)

as n → ∞. Hence it suffices to prove

V_n = Σ_{k=1}^p t_k H_n(x_k) / s_n → N(0,1) in distribution,

where s_n = { Var[ Σ_{k=1}^p t_k H_n(x_k) ] }^{1/2}. Since, by definition,

H_n(x) = (nf(x))⁻¹ Σ_{i=1}^n (Y_i − g_n(x)) δ_n(x−X_i) ,

we may write

V_n = Σ_{i=1}^n V_{n,i} ,

where the i.i.d. random variables V_{n,i}, i = 1,...,n, are defined by

V_{n,i} = (n s_n)⁻¹ Σ_{k=1}^p t_k (f(x_k))⁻¹ (Y_i − g_n(x_k)) δ_n(x_k − X_i) .

It then follows from the Lindeberg-Feller central limit theorem that if,
for some η > 0,

n E|V_{n,1}|^{2+η} → 0 ,

then

V_n → N(0,1) in distribution as n → ∞.

Now by applying the c_r inequality of Loève (1963) repeatedly we have

n E|V_{n,1}|^{2+η} ≤ Σ_{k=1}^p c_k(η) [ I_{k,n} + J_{k,n} ] ,

where c_k(η) depends only on k, η and the constants t₁,...,t_p, and
I_{k,n} and J_{k,n} are the contributions of the terms Y_i δ_n(x_k−X_i)
and g_n(x_k) δ_n(x_k−X_i), respectively. It is easily seen that

I_{k,n} ≤ C n^{−1−η} s_n^{−(2+η)} E|Y δ_n(x_k−X)|^{2+η} ,

with s_n² ~ (α_n/n) Σ_{k=1}^p t_k² σ²(x_k)/f(x_k), the last step following
by earlier calculations. Further,

E|Y δ_n(x_k−X)|^{2+η} = γ_n ∫ δ_n**(x_k−u) E[|Y|^{2+η} | X=u] f(u) du ,

where {δ_n**} = {|δ_n|^{2+η}/γ_n} is a δ-function sequence by Lemma 2.1.4.
Thus, for k = 1,2,...,p, we have

I_{k,n} = O( γ_n n^{−η/2} α_n^{−(1+η/2)} ) → 0 as n → ∞ ,

since

γ_n n^{−η/2} α_n^{−(1+η/2)} → 0

by assumption. Similar calculations yield

J_{k,n} → 0 as n → ∞ ,

and the proof is complete. □
We now give a version of Theorem 2.4.2 for kernel-type δ-function sequences, which may be compared with Schuster's Theorem 2.4.1.

2.4.4 Theorem. Suppose {δ_n(x)} is a δ-function sequence of kernel type satisfying

    (i) ∫ |K(u)|^{2+η} du < ∞ for some η > 0,

    (ii) ∫ u K(u) du = 0, ∫ u² K(u) du < ∞,

    (iii) n ε_n → ∞, n ε_n⁵ → 0 as n → ∞.

Suppose m(x) and f(x) have bounded, continuous 1st and 2nd derivatives, E|Y|^{2+η} < ∞, the distinct points x₁, x₂,...,x_p are continuity points of s(x) and E[|Y|^{2+η}|X=x], and f(x_k) > 0, k = 1,...,p. Then (Z_n*(x₁),...,Z_n*(x_p)) converges in distribution to a multivariate normal random vector with zero mean vector and identity covariance matrix, where

    Z_n*(x) = (n ε_n)^{1/2} (m̂_n(x) - m(x)) / {σ²(x) ∫ K²(u) du / f(x)}^{1/2}.
Proof. We first verify that this δ-function sequence satisfies the conditions of Theorem 2.4.2. Now

    α_n = ε_n^{-1} ∫ K²(u) du < ∞

for each n since ε_n ≠ 0 and ∫ K²(u) du < ∞. Further,

    α_n/n = (n ε_n)^{-1} ∫ K²(u) du → 0,

since n ε_n → ∞ by assumption. Similarly, γ_n < ∞ for each n, and

    γ_n n^{-η/2} α_n^{-1-η/2} = [∫ |K(u)|^{2+η} du][∫ K²(u) du]^{-1-η/2} (n ε_n)^{-η/2} → 0 as n → ∞,

since n ε_n → ∞ by assumption. Thus this type of δ-function sequence satisfies the requirements of Theorem 2.4.2, and since the remaining regularity conditions of Theorem 2.4.2 are clearly satisfied under the present assumptions, we have that the conclusion holds when m(x) is replaced by g_n(x) = E m_n*(x)/E f_n(x) in the expression for Z_n*(x). Hence, if we show that

    (n ε_n)^{1/2} (g_n(x) - m(x)) → 0,

then the conclusion will follow. Now

    (n ε_n)^{1/2} (g_n(x) - m(x))
    = (n ε_n)^{1/2} { [ε_n^{-1} ∫ K((x-u)/ε_n) m(u) f(u) du] / [ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] - m(x) }
    = (n ε_n)^{1/2} { [ε_n^{-1} ∫ K((x-u)/ε_n) m(u) f(u) du - m(x)f(x)
        + m(x)f(x) - m(x) ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] / [ε_n^{-1} ∫ K((x-u)/ε_n) f(u) du] }.

By Lemma 2.1.5, the numerator of the term within the brackets above is O(ε_n²) and the denominator converges to f(x) > 0. Thus

    (n ε_n)^{1/2} (g_n(x) - m(x)) = O((n ε_n⁵)^{1/2}) → 0,

since n ε_n⁵ → 0.  □
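The O(ε_n²) bound of Lemma 2.1.5 used above can be illustrated numerically: halving ε should divide the bias g_n(x) - m(x) by roughly four. The sketch below (an illustration under an assumed design, m(u) = u² with a standard normal f and the Epanechnikov kernel; it is not part of the thesis) checks this.

```python
import math

def K(v):
    # Epanechnikov kernel on [-1, 1]
    return 0.75*(1.0 - v*v) if abs(v) <= 1.0 else 0.0

def f(u):
    # standard normal design density (an assumption for this illustration)
    return math.exp(-0.5*u*u)/math.sqrt(2.0*math.pi)

m = lambda u: u*u

def g_eps(x, eps, steps=4000):
    # g(x) = [∫K((x-u)/ε)m(u)f(u)du] / [∫K((x-u)/ε)f(u)du], via u = x - εv
    h = 2.0/steps
    num = den = 0.0
    for j in range(steps):
        v = -1.0 + (j + 0.5)*h
        w = K(v)*h
        num += w*m(x - eps*v)*f(x - eps*v)
        den += w*f(x - eps*v)
    return num/den

x = 0.3
err1 = abs(g_eps(x, 0.05) - m(x))
err2 = abs(g_eps(x, 0.025) - m(x))
ratio = err1/err2   # close to 4, consistent with an O(ε²) bias
```

Under the stated smoothness, the ratio stabilizes near 4 as ε shrinks, which is exactly the second-order behavior the proof exploits.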
2.5 Asymptotic Distribution of m_n.

It is evident that, since m_n = m_n*/f is a sum of independent random variables, we may apply the Lindeberg-Feller central limit theorem in much the same way as we did in Theorem 2.4.2 to establish the asymptotic normality of (m_n(x) - E m_n(x))/{Var[m_n(x)]}^{1/2}. We established in (ii) of Lemma 2.3.1 that

    Var[m_n*(x)] ≈ (α_n/n) s(x) f(x)

for appropriate points x. Hence we have

    Var[m_n(x)] ≈ (α_n/n) s(x)/f(x).

We therefore have the following theorems, which we state without proof, since the proofs follow those of Theorems 2.4.2 and 2.4.4 very closely. The first theorem concerns the asymptotic normality of m_n for general δ-function sequences; the second for kernel-type δ-sequences.

2.5.1 Theorem. Under the conditions of Theorem 2.4.2, (W_n(x₁),...,W_n(x_p)) converges in distribution to a multivariate normal random vector with zero mean vector and identity covariance matrix, where

    W_n(x) = (m_n(x) - E m_n(x)) / {(α_n/n) s(x)/f(x)}^{1/2}.

2.5.2 Theorem. Under the conditions of Theorem 2.4.4, (W_n*(x₁),...,W_n*(x_p)) converges in distribution to a multivariate normal random vector with zero mean and identity covariance matrix, where

    W_n*(x) = (n ε_n)^{1/2} (m_n(x) - m(x)) / {s(x) ∫ K²(u) du / f(x)}^{1/2}.
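Theorem 2.5.2 lends itself to a quick simulation check. The sketch below (an illustration under an assumed design — uniform X on [0,1] with known density f ≡ 1, m(x) = x², Gaussian noise, Epanechnikov kernel — and not part of the thesis) repeatedly forms W_n*(x) at a fixed interior point and confirms that its sample mean and variance are near 0 and 1.

```python
import math, random

def K(u):
    # Epanechnikov kernel
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

LAM = 0.6          # λ(K) = ∫K²(u)du for this kernel
SIGMA = 0.5        # assumed noise standard deviation
m = lambda x: x*x  # assumed regression function

def W_star(x, n, eps, rng):
    # m_n(x) = m_n*(x)/f(x) with the known design density f ≡ 1 on [0,1]
    s = 0.0
    for _ in range(n):
        xi = rng.random()
        yi = m(xi) + rng.gauss(0.0, SIGMA)
        s += yi * K((x - xi)/eps)
    mn = s / (n*eps)                 # the estimator m_n(x)
    sx = m(x)**2 + SIGMA**2          # s(x) = E[Y²|X=x]
    scale = math.sqrt(sx * LAM)      # {s(x)∫K²(u)du/f(x)}^{1/2}
    return math.sqrt(n*eps) * (mn - m(x)) / scale

rng = random.Random(7)
n, eps = 400, 400**(-0.3)
draws = [W_star(0.5, n, eps, rng) for _ in range(300)]
mean = sum(draws)/len(draws)
var = sum((w - mean)**2 for w in draws)/len(draws)
# Theorem 2.5.2 predicts mean ≈ 0 and variance ≈ 1
```

The agreement is only approximate at finite n (there is a small O(ε_n²) bias term), which is why loose tolerances are appropriate.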
2.6 Mean Integrated Square Error.

The mean integrated square error (MISE) I_n of an estimator

    f_n(x) = n^{-1} Σ_{i=1}^n δ_n(x - X_i)

of a density f is defined as

    I_n = E ∫ (f_n(x) - f(x))² dx,

where δ_n and f are assumed to be square integrable. Watson and Leadbetter (1963) show that I_n is minimized for each n if δ_n is chosen to have a Fourier transform φ_{δ_n} expressible as

    φ_{δ_n}(t) = |φ_f(t)|² / [(1/n) + ((n-1)/n)|φ_f(t)|²],

where φ_f is the Fourier transform of f. (Fourier transforms of square integrable functions have the usual interpretation here.) For the regression estimation problem, Watson (1964) considers an analogous error criterion J_n*, where appropriate assumptions are made on δ_n and m to insure the finiteness of the integral involved. Watson states that J_n* is minimized for each n if δ_n is chosen so as to have Fourier transform

    φ_{δ_n}*(t) = |φ_{fm}(t)|² / [(1/n)EY² + ((n-1)/n)|φ_{fm}(t)|²],

where φ_{fm} is the Fourier transform of fm.
We assume here that δ_n and fm are square integrable functions and that EY² < ∞. We define the error criterion I_n by

    I_n = E ∫ (m_n*(x) - f(x)m(x))² dx,

where m_n* is the numerator of m_n. We will show here that I_n is also minimized for each n by choosing δ_n to have the Fourier transform φ_{δ_n}*, defined above. Note that I_n may be interpreted as the MISE of the numerator of m_n or m̂_n, disregarding the denominator.

By the definition of I_n and Parseval's formula, we have

(2.6.1)    I_n = E ∫ (m_n*(x) - f(x)m(x))² dx = (2π)^{-1} E ∫ |φ_{m_n*}(t) - φ_{fm}(t)|² dt,

where φ_{m_n*} is the Fourier transform of m_n*, so that I_n may be minimized by minimizing the extreme right hand side of expression (2.6.1) above. Now, by Fubini's theorem for positive functions,

    E ∫ |φ_{m_n*}(t) - φ_{fm}(t)|² dt = ∫ E|φ_{m_n*}(t) - φ_{fm}(t)|² dt,

so that I_n may be minimized by minimizing

(2.6.2)    E|φ_{m_n*}(t) - φ_{fm}(t)|²
for each t, where ḡ denotes the conjugate of the complex function g. Note that since fm is an integrable function,

(2.6.3)    φ_{fm}(t) = ∫ e^{itu} f(u)m(u) du.

Further,

(2.6.4)    φ_{m_n*}(t) = ∫ [n^{-1} Σ_{j=1}^n Y_j δ_n(u - X_j)] e^{itu} du
           = n^{-1} Σ_{j=1}^n Y_j e^{itX_j} ∫ δ_n(u) e^{itu} du
           = φ_{δ_n}(t) [n^{-1} Σ_{j=1}^n Y_j e^{itX_j}],

so that

(2.6.5)    E|φ_{m_n*}(t)|² = |φ_{δ_n}(t)|² E|n^{-1} Σ_{j=1}^n Y_j e^{itX_j}|²
           = |φ_{δ_n}(t)|² n^{-2} E Σ_{j,k=1}^n Y_j Y_k e^{itX_j} e^{-itX_k}.

Now

    E Y e^{itX} = ∫ m(u) f(u) e^{itu} du = φ_{fm}(t),

and thus (2.6.5) becomes

(2.6.5)'    E|φ_{m_n*}(t)|² = |φ_{δ_n}(t)|² [(1/n)EY² + ((n-1)/n)|φ_{fm}(t)|²].
n ..Finally, fran ~.6.3)and ~.6.41,we have
(2.6.6) E[~fm(t)~ #ret) + ~fm(tH *(t)]:mn mn
n -itX.= ~fm(t)~o (t)E[n- l r Y.e J]
n j=l J
1 n i.tX.+ ~fm(t)~o (t)E[n- r Y.e J]
n j=l J
= l~fm(t)12[~o (t) + ~o (t)]n n
Combining (2.6.3), (2.6.5) and (2.6.6) yields
(2.6.7)
= l~fm(t)12 - 2 Re[~o (t)]I~fm(t)12n
+ I~o (t)1 2[(1/n)EY2+ ((n-l)/n)t~fm(t)12]
n-
2 2= [(l/n)EY + ((n-l)/n)l~fm(t)1 ] •
l~fm(t)12 12
~on (t) - (1/n)Ey2 + ((n-l)/n)l~fm(t)12 I
"
.e
..
4i
the last equality following by completing the square and rearranging
tenns. Now
s J Im(u)lf(u)du
s J E[IYI]X=u]f(u)du = EIYI ,
so that
for all t, and hence (2.6.7) is minimized for each t by choosing'If
<Po = <Po .n n
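The completing-the-square step in (2.6.7) can be spot-checked numerically. The following sketch (an illustration, not part of the thesis) treats the criterion at a single fixed t as a function of the complex number φ_{δ_n}(t) and verifies that the claimed minimizer φ_{δ_n}*(t) beats nearby perturbations.

```python
def mise_term(phi_d, phi_fm, n, EY2):
    # E|φ_{m_n*}(t) - φ_{fm}(t)|² as a function of φ_{δ_n}(t), per (2.6.7)
    a = EY2/n + (n - 1)/n * abs(phi_fm)**2
    return abs(phi_fm)**2 - 2.0*phi_d.real*abs(phi_fm)**2 + abs(phi_d)**2 * a

def phi_star(phi_fm, n, EY2):
    # the claimed minimizer φ*_{δ_n}(t); note it is real-valued
    return abs(phi_fm)**2 / (EY2/n + (n - 1)/n * abs(phi_fm)**2)

phi_fm = 0.4 - 0.3j   # an arbitrary (hypothetical) value of φ_{fm}(t)
n, EY2 = 50, 2.0
best = phi_star(phi_fm, n, EY2)
base = mise_term(best, phi_fm, n, EY2)
worse = [mise_term(best + dz, phi_fm, n, EY2)
         for dz in (0.1, -0.1, 0.1j, -0.1j, 0.05 + 0.05j)]
# every perturbation strictly increases the criterion
```

Because the criterion is a positive-definite quadratic in φ_{δ_n}(t), any perturbation increases it by exactly a·|dz|², so the check is deterministic.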
3. ASYMPTOTIC PROPERTIES OF MAXIMUM ABSOLUTE DEVIATION
3.1 Preliminaries
Since our goal in many cases is the estimation of the regression function over the entire real line, or some subset of the real line, it is natural to investigate the behavior of our estimators under some global error criterion. An attempt in this direction was made in Section 2.6, where we considered mean integrated square error. This was not entirely satisfactory, however, since we were only able to determine the δ-function sequence which minimized the MISE of the numerator of the estimators in question, disregarding the denominator. In this chapter, we consider a different global error criterion, the maximum absolute deviation, defined as sup_{x∈I} |m_n(x) - m(x)|, where I is a closed, bounded interval of the real line, which we will take without loss of generality to be [0,1]. We shall mainly be concerned here with conditions under which the maximum absolute deviation converges to zero in probability (in this case we say that the estimator in question is uniformly consistent over I). We will also be able to find a large sample confidence band for the regression function, based on the estimator m_n.

Our method of analysis will follow the one briefly outlined below, used by Bickel and Rosenblatt (1973) and Rosenblatt (1976) for probability density estimators. For a density function estimator

    f_n(u) = (n ε_n)^{-1} Σ_{i=1}^n K((u - X_i)/ε_n),

the deviation about the mean, f_n(u) - E f_n(u), normalized so as to have
43
non-zero asymptotic standard deviation, may be written as
(3.1.1)k
(nEn) 2(fn(u) - Bfn(u))
[f(u)]li
where Zn is the empiriaaZ proaess defined by
where Fn is the empirical distribution function (EDF) of Xi'
i = l, ... ,n, and F is the cumulative distributioIl function of Xl'
Kornlas, Major and Tusnady (1975) have shown that a sequence of Brownian
bridges {Bn} on [0,1] may be constructed such that
(3.1.2) sup IZ (u) - Bn(F(u)) I = 0 (n-~log n)-oo<u<oo n
a.s. This fact is exploited, using integration by parts in (3.1.l), to
show that
k(log n) 2 sup IY (u)1
. o~wa n
= (log n)~ sup IYl (u)1 + opel)O~~l ,n
where Yl,n is the stochastic process obtained by replacing Zn(s) with
Bn(F(s)) in the defining expression for Yn. Further stages of approx
imation finally yield
(3.1.3)    (log n)^{1/2} sup_{0≤u≤1} |Y_{1,n}(u)| = (log n)^{1/2} sup_{0≤u≤1} |Y_{2,n}(u)| + o_p(1),

where Y_{2,n} is the Gaussian process on [0,1] defined by

    Y_{2,n}(u) = ε_n^{-1/2} ∫ K((u-s)/ε_n) dW(s),

where W is a Wiener process on ℝ. The asymptotic distribution of (log n)^{1/2} sup_{0≤u≤1} |Y_{2,n}(u)|, with proper centering constants, is determined, and, in light of (3.1.3), (log n)^{1/2} sup_{0≤u≤1} |Y_n(u)| has the same asymptotic distribution.

We will employ this method to determine the asymptotic distribution of the maximum absolute deviation of the numerator m_n* of the estimators m̂_n and m_n, properly normalized and centered. Algebraic manipulation and elementary analysis will then yield uniform consistency of the estimators, with an associated rate of convergence. Since the denominator of m_n is non-stochastic, an asymptotic confidence band for m, based on m_n, may also be specified.

In the forthcoming development, we will need to use integrals of the form

(3.1.4)    Y_n(t) = ∫∫ y K((t-x)/ε_n) dW(T(x,y)),

where T: ℝ² → [0,1]² is the transformation defined by

    T(x,y) = (F_{X|Y}(x|y), F_Y(y)),

and W(·,·) is the Wiener process on [0,1]². In this section, we will give conditions for the existence of (3.1.4) and prove some useful properties.
If H(s,t) is a real, measurable function on [0,1]², then it is well-known that the L₂ integral

    ∫∫ H(s,t) dW(s,t)

exists if

    ∫∫ H²(s,t) ds dt < ∞

(see Masani (1968), Chap. 5).

Suppose that f(x,y) > 0 for all real x and y, so that T is one-to-one and hence T⁻¹ is a well-defined function from [0,1]² to ℝ². Denote, for fixed n and t,

    G_t(x,y) = y K((t-x)/ε_n).

Then, by Theorem 5.19 of Masani (1968), we have

(3.1.5)    ∫∫_{ℝ²} y K((t-x)/ε_n) dW(T(x,y)) = ∫∫_{[0,1]²} G_t(T⁻¹(s,u)) dW(s,u)

in the sense that if either integral exists, then so does the other and they are equal. By the previous remark, the integral on the right hand side of (3.1.5) exists if

    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du < ∞.

Now

(3.1.6)    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du = ∫∫_{ℝ²} G_t²(x,y) |J(x,y)| dx dy,

where J(x,y) is the Jacobian of T (see, e.g., Buck (1965), Sec. 6.1, Thm. 4), if |J(x,y)| > 0 for all real x and y and G_t(x,y), f(x,y) and f_Y(y) are continuous. By definition,

    J(x,y) = f_{X|Y}(x|y) f_Y(y) = f(x,y) > 0

by assumption, using the obvious notation for conditional and marginal densities. Thus

    ∫∫_{[0,1]²} G_t²(T⁻¹(s,u)) ds du = ∫∫_{ℝ²} y² K²((t-x)/ε_n) f(x,y) dx dy
    = E Y² K²((t-X)/ε_n) < ∞

if, e.g., EY² < ∞ and K is bounded. We note that the above development holds if, instead of having f(x,y) > 0 for all real x and y, we have f(x,y) > 0 for x and y in some rectangle of ℝ², and the range of integration is appropriately adjusted. We will henceforth assume this to be true without comment.
We will now give properties of the integral (3.1.4) which will be useful in the future development. We will show

(3.1.7)    E Y_n(t) = 0,

(3.1.8)    E Y_n(t₁) Y_n(t₂) = ∫∫ y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy

for t₁ ≠ t₂. In view of (3.1.5) and the definition of the stochastic integral, (3.1.7) follows. For (3.1.8), we note, by (3.1.5) and (5.2) of Masani (1968),

    E Y_n(t₁) Y_n(t₂) = ∫∫ G_{t₁}(T⁻¹(s,u)) G_{t₂}(T⁻¹(s,u)) ds du
    = ∫∫ y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy,

as in (3.1.8).

We finally note, to close this section, that since W is a Gaussian process on [0,1]² and since Y_n(t) is an L₂ limit of linear combinations of W(·,·), we have that Y_n(t) is itself a (one-parameter) Gaussian stochastic process for each n, with mean given by (3.1.7) and covariance function given by (3.1.8).
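The isometry behind (3.1.7) and (3.1.8) is easy to illustrate in one dimension. The sketch below (an illustration, not part of the thesis; the Epanechnikov kernel and all numerical choices are assumptions) discretizes the L₂ integral ε^{-1/2} ∫ K((t-x)/ε) dW(x) appearing in Theorem 3.2.2 and checks by Monte Carlo that its mean is 0 and its variance is λ(K) = ∫K²(u)du.

```python
import math, random

def K(u):
    # Epanechnikov kernel; λ(K) = ∫K²(u)du = 0.6
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

def Z(t, eps, lo, hi, steps, rng):
    # Discretized Itô sum for ε^{-1/2} ∫ K((t-x)/ε) dW(x):
    # independent increments dW_j ~ N(0, dx) on a grid covering the kernel support
    dx = (hi - lo)/steps
    sd = math.sqrt(dx)
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5)*dx
        total += K((t - x)/eps) * rng.gauss(0.0, sd)
    return total / math.sqrt(eps)

rng = random.Random(1)
eps = 0.05
vals = [Z(0.5, eps, 0.4, 0.6, 200, rng) for _ in range(3000)]
mean = sum(vals)/len(vals)
var = sum((v - mean)**2 for v in vals)/len(vals)
# by the Wiener-integral isometry, mean ≈ 0 and var ≈ λ(K) = 0.6
```

Restricting the grid to [0.4, 0.6] is harmless here because K((0.5-x)/0.05) vanishes outside that interval.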
3.2 Maximum Absolute Deviation of m_n*.

It is convenient to introduce certain assumptions at this point which will be in force in our main theorem. Let f(x,y) denote the joint density of (X,Y), f_Y(y) the marginal density of Y, and let {a_n} be a real sequence with a_n → ∞ as n → ∞. We make the following assumptions:

(A1)    (log n) ε_n^{-3} ∫_{|y|≥a_n} y² f_Y(y) dy ≤ c for all n and some constant c,

(A2)    a_n ε_n^{-1/2} n^{-1/6} (log n)² → 0 as n → ∞,

(A3)    (log n) sup_{0≤x≤1} ∫_{|y|>a_n} y² f(x,y) dy → 0 as n → ∞.

(A4)    There exists a constant η > 0 such that

    g_n(x) = ∫_{-a_n}^{a_n} y² f(x,y) dy

satisfies g_n(x) > η for all x ∈ [0,1] and all n, and g_n has a continuous 1st derivative on some interval [-A,A]. Further, the functions

    s(x)f(x) = ∫ y² f(x,y) dy,    E[|Y| | X=x] f(x) = ∫ |y| f(x,y) dy

are uniformly bounded.

If Y is a bounded random variable, then clearly any sequence {a_n} with a_n → ∞ satisfies assumptions A1 and A3. If the marginal distribution of Y is normal and ε_n = n^{-δ} as in Theorem 3.2.1, then it is readily checked that {a_n} = {log n} satisfies A1 and A2.
We normalize m_n*(t) - E m_n*(t) by (n ε_n)^{-1/2}[s(t)f(t)]^{1/2}, which is proportional to its asymptotic standard deviation, thus defining the following stochastic process on [0,1]:

(3.2.1)    Y_n(t) = (n ε_n)^{1/2} [m_n*(t) - E m_n*(t)] / [s(t)f(t)]^{1/2}.
Then we have the following theorem.

3.2.1 Theorem. Suppose the kernel function K vanishes outside a finite interval [-A,A], is absolutely continuous and has a bounded derivative on [-A,A], and that the marginal density of X is positive on an interval containing [0,1]. Suppose A1-A4 hold. Then, for ε_n = n^{-δ}, 0 < δ < 1/2,

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Y_n(t)| / [λ(K)]^{1/2} - d_n ] < x } → e^{-2e^{-x}}

as n → ∞, where

    λ(K) = ∫ K²(u) du,

    d_n = (2δ log n)^{1/2} + (2δ log n)^{-1/2} { log(c₁(K)/π^{1/2}) + (1/2)[log δ + log log n] }

if

    c₁(K) = [K²(A) + K²(-A)] / [2λ(K)] > 0,

and otherwise

    d_n = (2δ log n)^{1/2} + (2δ log n)^{-1/2} log( [c₂(K)]^{1/2} / (2^{1/2} π) ),

where

    c₂(K) = ∫ [K'(u)]² du / [2λ(K)].
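The kernel functionals entering Theorem 3.2.1 are simple to compute. As an illustration (using the reconstructed definitions of λ(K), c₁(K) and c₂(K) above, and the Epanechnikov kernel as an assumed example — it vanishes at ±A, so the c₂ branch of d_n applies), a sketch:

```python
def K(u):
    # Epanechnikov kernel on [-1, 1]
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

def Kprime(u):
    return -1.5*u if abs(u) <= 1.0 else 0.0

def integrate(g, lo, hi, steps=20000):
    # midpoint rule; ample accuracy for these smooth integrands
    h = (hi - lo)/steps
    return sum(g(lo + (j + 0.5)*h) for j in range(steps)) * h

lam = integrate(lambda u: K(u)**2, -1.0, 1.0)             # λ(K) = 3/5
c1 = (K(1.0)**2 + K(-1.0)**2) / (2.0*lam)                 # 0: kernel vanishes at ±A
c2 = integrate(lambda u: Kprime(u)**2, -1.0, 1.0) / (2.0*lam)   # 1.5/1.2 = 1.25
```

For this kernel λ(K) = 0.6 and c₂(K) = 1.25 in closed form, so the numerical quadrature can be checked exactly.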
The proof of Theorem 3.2.1 is based on Theorems 3.2.2 and 3.2.3, which follow. Theorem 3.2.2 is due to Bickel and Rosenblatt (1973), who used it in proving a result similar to Theorem 3.2.1 for probability density estimators. We will here denote by ∫ K(t) dW(t) the L₂ integral of K with respect to the Wiener process W (see, e.g., Doob (1953), Chap. IX, Sec. 2).

3.2.2 Theorem. Suppose K(·) is a kernel function which vanishes outside [-A,A] and is absolutely continuous on [-A,A]. Define on [0,1] the stochastic process

    Z_n(t) = ε_n^{-1/2} ∫ K((t-x)/ε_n) dW(x),

where

    ε_n = n^{-δ}

with 0 < δ < 1/2 and W(x) is a Wiener process on (-∞,∞). Then

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Z_n(t)| / [λ(K)]^{1/2} - d_n ] < x } → e^{-2e^{-x}}

as n → ∞, where d_n and λ(K) are as in Theorem 3.2.1.
Theorem 3.2.3 is a special case of Theorem E of Révész (1976).

3.2.3 Theorem. Let X₁ and X₂ be independent random variables, each uniformly distributed over [0,1]. Define the empirical process of (X₁,X₂) on [0,1]²,

    Z_n(x₁,x₂) = n^{1/2}(F_n(x₁,x₂) - x₁x₂),

where F_n denotes the empirical distribution function of (X₁,X₂), based on n independent observations of the pair. Then one can define a sequence {B_n} of independent Brownian bridges on [0,1]² such that

(3.2.2)    sup_{0≤x₁,x₂≤1} |Z_n(x₁,x₂) - B_n(x₁,x₂)| = O(n^{-1/6}(log n)^{3/2}) a.s.
Proof of Theorem 3.2.1: For convenience, denote sup_{0≤t≤1} |g(t)| by ||g||, and note that, for any pair of sequences of processes {Z_{1,n}(t)} and {Z_{2,n}(t)} defined on [0,1],

    | ||Z_{1,n}|| - ||Z_{2,n}|| | ≤ ||Z_{1,n} - Z_{2,n}||.

Thus, if we show that

    (log n)^{1/2} ||Z_{1,n} - Z_{2,n}|| = o_p(1)

and

    (log n)^{1/2} [ ||Z_{2,n}|| - d_n ]

converges in law, then

    (log n)^{1/2} [ ||Z_{1,n}|| - d_n ]

also converges in law, and has the same limiting distribution.

We will apply the preceding remark to eventually "approximate" the process Y_n with the process Z_n of Theorem 3.2.2, thus obtaining the desired result. We will proceed through several stages of such approximation, and the details will be given in the sequence of lemmas which immediately follows the proof of this theorem.
We first note that Y_n may be written as

(3.2.3)    Y_n(t) = [s(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫ y K((t-x)/ε_n) dZ_n(x,y),

where Z_n is the (two-parameter) empirical process defined by

    Z_n(x,y) = n^{1/2}(F_n(x,y) - F(x,y)).

Now define the following processes on [0,1]:

(3.2.4)    Y_{0,n}(t) = [s(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y),

(3.2.5)    Y_{1,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y),

where

    s_n(t)f(t) = g_n(t) = ∫_{-a_n}^{a_n} y² f(t,y) dy,

(3.2.6)    Y_{2,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dB_n(T(x,y)),

where {B_n} is a sequence of Brownian bridges as in Theorem 3.2.3 and T: ℝ² → [0,1]² is the transformation defined by

(3.2.7)    T(x,y) = (F_{X|Y}(x|y), F_Y(y)),

(3.2.8)    Y_{3,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dW_n(T(x,y)),

where {W_n} is a sequence of independent Wiener processes used in constructing {B_n} as

    B_n(u,s) = W_n(u,s) - u s W_n(1,1),  0 ≤ u, s ≤ 1

(Révész (1976)),

(3.2.9)    Y_{4,n}(t) = [s_n(t)f(t)]^{-1/2} ε_n^{-1/2} ∫ [s_n(x)f(x)]^{1/2} K((t-x)/ε_n) dW(x),

(3.2.10)   Y_{5,n}(t) = ε_n^{-1/2} ∫ K((t-x)/ε_n) dW(x),

where W is a Wiener process on (-∞,∞).

We have, by Lemma 3.2.4,

    ||Y_n - Y_{0,n}|| = o_p((log n)^{-1/2}),

where o_p(a_n) refers to a sequence of random variables A_n such that A_n/a_n → 0 in probability. Lemma 3.2.8 gives

    ||Y_{0,n} - Y_{1,n}|| = o_p((log n)^{-1/2}).

By Lemma 3.2.5,

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6} (log n)^{3/2}) a.s.,

and by A2,

    a_n ε_n^{-1/2} n^{-1/6} (log n)² → 0,

so that

    ||Y_{1,n} - Y_{2,n}|| = o_p((log n)^{-1/2}).

By Lemma 3.2.6,

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}) = o_p((log n)^{-1/2}),

since ε_n = n^{-δ} and hence ε_n log n → 0.
Now Y_{3,n} is a zero mean Gaussian process on [0,1] with covariance function

(3.2.11)    E Y_{3,n}(t₁) Y_{3,n}(t₂)
            = [g_n(t₁) g_n(t₂)]^{-1/2} ε_n^{-1} ∫∫_{|y|≤a_n} y² K((t₁-x)/ε_n) K((t₂-x)/ε_n) f(x,y) dx dy

(cf. Section 3.1). The integral on the right hand side of (3.2.11) may be written as

    ε_n^{-1} ∫ s_n(x) f(x) K((t₁-x)/ε_n) K((t₂-x)/ε_n) dx.

Thus the process Y_{4,n} is a Gaussian process with the same covariance function as Y_{3,n}, i.e., they have the same finite dimensional distributions. Hence the asymptotic distribution of

    (2δ log n)^{1/2} [ ||Y_{3,n}|| / [λ(K)]^{1/2} - d_n ]

is the same as that of

    (2δ log n)^{1/2} [ ||Y_{4,n}|| / [λ(K)]^{1/2} - d_n ].

Further, by Lemma 3.2.7,

    ||Y_{4,n} - Y_{5,n}|| = o_p((log n)^{-1/2}).

By Theorem 3.2.2,

    (2δ log n)^{1/2} [ ||Y_{5,n}|| / [λ(K)]^{1/2} - d_n ]

has the desired limit distribution, and the theorem is proved.  □
3.2.4 Lemma. If A1 is satisfied and

    g(x) ≡ s(x)f(x) = ∫ y² f(x,y) dy

is bounded away from zero on [0,1], then

    ||Y_n - Y_{0,n}|| = o_p((log n)^{-1/2}).

Proof. Note

    Y_n(t) - Y_{0,n}(t) = [g(t)]^{-1/2} ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y),

so that

    ||Y_n - Y_{0,n}|| ≤ ||g^{-1/2}|| sup_{0≤t≤1} | ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) |.

By assumption, ||g^{-1/2}|| < ∞, and thus it suffices to prove that

    (log n)^{1/2} sup_{0≤t≤1} | ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) | = o_p(1).

Now

(3.2.12)    (log n)^{1/2} ε_n^{-1/2} ∫∫_{|y|>a_n} y K((t-x)/ε_n) dZ_n(x,y) = Σ_{i=1}^n X_{n,i}(t) ≡ U_n(t),

say, where the X_{n,i}(t), i = 1,...,n, are i.i.d. with

    X_{n,i}(t) = (log n)^{1/2}(n ε_n)^{-1/2} [ Y_i I_{{|y|>a_n}}(Y_i) K((t-X_i)/ε_n)
                 - E Y_i I_{{|y|>a_n}}(Y_i) K((t-X_i)/ε_n) ]

and

    E X_{n,i}(t) = 0

for each t ∈ [0,1]. Thus

(3.2.13)    E U_n²(t) = n E X_{n,1}²(t),

and

(3.2.14)    E X_{n,1}²(t) ≤ (log n)(n ε_n)^{-1} E Y₁² I_{{|y|>a_n}}(Y₁) K²((t-X₁)/ε_n)
            ≤ k̄ (log n)(n ε_n)^{-1} ∫_{|y|>a_n} y² f_Y(y) dy,

where

    k̄ = sup_{-A≤u≤A} K²(u)

and A is as defined in Theorem 3.2.1.

Combining (3.2.13) and (3.2.14) yields

    E U_n²(t) ≤ k̄ (log n) ε_n^{-1} ∫_{|y|>a_n} y² f_Y(y) dy → 0

as n → ∞ by A1. This implies that

(3.2.15)    U_n(t) →p 0

for 0 ≤ t ≤ 1.

In order to show that

(3.2.16)    ||U_n|| →p 0,

we note that U_n(t) is an element of the space D[0,1] of right continuous functions with left hand limits for each n, and that, if we show that U_n converges weakly to the zero element of D[0,1], then (3.2.16) will follow, since ||·|| is a continuous functional on D[0,1]. Since (3.2.15) implies

    (U_n(t₁),...,U_n(t_k)) →p (0,...,0)

in ℝ^k for distinct points t₁, t₂,...,t_k of [0,1], it suffices to verify the following moment condition to show weak convergence of U_n (Billingsley (1968), Th. 15.6):

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ B (t₂-t₁)²,

where B is a constant and t₁ ≤ t ≤ t₂.

By the Schwarz inequality,

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ { E[U_n(t) - U_n(t₁)]² E[U_n(t₂) - U_n(t)]² }^{1/2}.

Defining

    G_n(t,t₁,x) = K((t-x)/ε_n) - K((t₁-x)/ε_n),

we have

    E[U_n(t) - U_n(t₁)]²
    = (log n)(n ε_n)^{-1} E{ Σ_{i=1}^n [Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)
        - E Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)] }²
    = (log n)(n ε_n)^{-1} Σ_{i=1}^n E[Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)
        - E Y_i I_{{|y|>a_n}}(Y_i) G_n(t,t₁,X_i)]²
    ≤ (log n)(n ε_n)^{-1} Σ_{i=1}^n E Y_i² I_{{|y|>a_n}}(Y_i) G_n²(t,t₁,X_i).

Since K has a bounded derivative on [-A,A], it satisfies a Lipschitz condition:

    |K(u) - K(s)| ≤ B₁ |u - s|,

where B₁ is a constant. Thus

    { E[U_n(t) - U_n(t₁)]² }^{1/2} ≤ B₁ (log n)^{1/2} ε_n^{-3/2} |t - t₁| { E Y₁² I_{{|y|>a_n}}(Y₁) }^{1/2}
    = B₁ (log n)^{1/2} ε_n^{-3/2} |t - t₁| { ∫_{|y|>a_n} y² f_Y(y) dy }^{1/2}.

Applying the same argument to

    { E[U_n(t₂) - U_n(t)]² }^{1/2}

yields

    E{ |U_n(t) - U_n(t₁)| |U_n(t₂) - U_n(t)| } ≤ B₁² c (t₂-t₁)²

by A1 and using the fact that t₁ ≤ t ≤ t₂. The moment condition is therefore satisfied, and the result follows.  □
Before going on to Lemma 3.2.5, we state the useful integration by parts formula for Riemann-Stieltjes integrals on rectangles of ℝ². Let f and g be two functions defined on [0,1]². If all of the integrals below exist and are finite, then we have

(3.2.17)    ∫₀¹ ∫₀¹ f(x,y) dg(x,y) = ∫₀¹ ∫₀¹ g(x,y) df(x,y)
            + ∫₀¹ f(1,y) dg(1,y) - ∫₀¹ f(0,y) dg(0,y)
            + ∫₀¹ g(x,1) df(x,1) - ∫₀¹ g(x,0) df(x,0).

We note that if g(x,y) is a Wiener process on [0,1]² and f(x,y) is a measurable function on [0,1]² such that ∫₀¹ ∫₀¹ f(x,y) dg(x,y) exists, then (3.2.17) remains valid provided the integrals on the right hand side of (3.2.17) also exist.
3.2.5 Lemma. If K is absolutely continuous on [-A,A] and zero outside [-A,A], then

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6} (log n)^{3/2}) a.s.

Proof. First we note that the random pair

(3.2.18)    (X',Y') = T(X,Y),

where T: ℝ² → [0,1]² is defined by (3.2.7), is jointly uniformly distributed on [0,1]², X' and Y' are independent, and Z_n(T⁻¹(x',y')), 0 ≤ x', y' ≤ 1, is the empirical process of (X',Y') (Rosenblatt (1952)). Theorem 3.2.3 thus applies to (X',Y'), and we may conclude that

    sup_{0≤x',y'≤1} |B_n(x',y') - Z_n(T⁻¹(x',y'))| = O(n^{-1/6}(log n)^{3/2}) a.s.,

or, equivalently,

(3.2.19)    sup_{x,y∈ℝ} |B_n(T(x,y)) - Z_n(x,y)| = O(n^{-1/6}(log n)^{3/2}) a.s.

Applying the integration by parts formula (3.2.17), we have

(3.2.20)    ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y)
            = ∫_{u=-A}^{A} ∫_{y=-a_n}^{a_n} y K(u) dZ_n(t - ε_n u, y)
            = ∫_{-A}^{A} ∫_{-a_n}^{a_n} Z_n(t - ε_n u, y) d[y K(u)]
            + K(A) ∫_{y=-a_n}^{a_n} Z_n(t - ε_n A, y) dy
            + K(-A) ∫_{y=-a_n}^{a_n} Z_n(t + ε_n A, y) dy
            + a_n ∫_{u=-A}^{A} Z_n(t - ε_n u, a_n) dK(u)
            + a_n ∫_{u=-A}^{A} Z_n(t - ε_n u, -a_n) dK(u).

The second-to-last integral above may be rewritten, using ordinary (one-dimensional) integration by parts, and similarly for the last integral on the right hand side of (3.2.20).

By using a similar argument, we obtain (where the integrals are defined in the L₂ sense)

(3.2.21)    ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dB_n(T(x,y))
            = ∫_{u=-A}^{A} ∫_{y=-a_n}^{a_n} B_n(T(t - ε_n u, y)) d[y K(u)]
            + K(A) ∫_{y=-a_n}^{a_n} B_n(T(t - ε_n A, y)) dy
            + K(-A) ∫_{y=-a_n}^{a_n} B_n(T(t + ε_n A, y)) dy
            + a_n ∫_{u=-A}^{A} B_n(T(t - ε_n u, a_n)) dK(u)
            + a_n ∫_{u=-A}^{A} B_n(T(t - ε_n u, -a_n)) dK(u).

Subtracting (3.2.21) from (3.2.20) and using (3.2.19) and the assumption that K is absolutely continuous on [-A,A], we obtain

(3.2.22)    ε_n^{1/2} [g_n(t)]^{1/2} |Y_{1,n}(t) - Y_{2,n}(t)|
            = O(n^{-1/6}(log n)^{3/2}) · { 4a_n ∫_{-A}^{A} |K'(u)| du + 4a_n [K(A) + K(-A)] }
            = O(a_n n^{-1/6}(log n)^{3/2}) a.s.,

since

    ∫_{-A}^{A} |K'(u)| du < ∞.

Thus, since ||g_n^{-1/2}|| is a bounded sequence by assumption,

    ||Y_{1,n} - Y_{2,n}|| = O(a_n ε_n^{-1/2} n^{-1/6}(log n)^{3/2}) a.s.,

and the proof is complete.  □
We may write the sequence of Brownian bridges {B_n} of Theorem 3.2.3 as

(3.2.23)    B_n(x,y) = W_n(x,y) - x y W_n(1,1),

0 ≤ x, y ≤ 1, where {W_n} is a sequence of independent Wiener processes on [0,1]² (Révész (1976)). The next lemma shows that, for our purposes, the only significant part of (3.2.23) is W_n(x,y).

3.2.6 Lemma. If A4 holds, then

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}).

Proof. By definition of Y_{2,n} and Y_{3,n}, we have

    |Y_{2,n}(t) - Y_{3,n}(t)|
    = |W_n(1,1)| [g_n(t)]^{-1/2} ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) f(x,y) dx dy |,

since the Jacobian of the transformation T is f(x,y). Thus

    ε_n^{-1/2} ||Y_{2,n} - Y_{3,n}||
    ≤ |W_n(1,1)| ||g_n^{-1/2}|| sup_{0≤t≤1} ε_n^{-1} ∫ [∫ |y| f(x,y) dy] |K((t-x)/ε_n)| dx.

By A4,

    h(x) = ∫ |y| f(x,y) dy

is a bounded function and ||g_n^{-1/2}|| is a bounded sequence, so that for some constant M we have

    ε_n^{-1/2} ||Y_{2,n} - Y_{3,n}|| ≤ |W_n(1,1)| M ∫ |K(u)| du = O_p(1).

Thus

    ||Y_{2,n} - Y_{3,n}|| = O_p(ε_n^{1/2}),

and the proof is complete.  □
3.2.7 Lemma. Under the assumptions of Theorem 3.2.1,

    ||Y_{4,n} - Y_{5,n}|| = o_p((log n)^{-1/2}).

Proof. By definition,

(3.2.24)    |Y_{4,n}(t) - Y_{5,n}(t)|
            = ε_n^{-1/2} | ∫ { [g_n(x)/g_n(t)]^{1/2} - 1 } K((t-x)/ε_n) dW(x) |
            = ε_n^{-1/2} | ∫_{-A}^{A} { [g_n(t - u ε_n)/g_n(t)]^{1/2} - 1 } K(u) dW(t - u ε_n) |.

By using integration by parts and the assumptions that g_n and K are absolutely continuous, we may bound the quantity on the extreme right hand side of (3.2.24) by the sum of the two boundary terms and

(3.2.25)    ε_n^{-1/2} | ∫_{-A}^{A} W(t - u ε_n) (d/du){ [ (g_n(t - u ε_n)/g_n(t))^{1/2} - 1 ] K(u) } du |,

writing the three resulting terms as

    J_{1,n}(t) + J_{2,n}(t) + J_{3,n}(t),

say, with J_{2,n} and J_{3,n} the boundary terms. We will show that the supremum over [0,1] of each of these three terms is O_p(ε_n^{1/2}), thus completing the proof.

First of all, note

    J_{2,n}(t) ≤ K(A) ε_n^{-1/2} | [g_n(t - A ε_n)/g_n(t)]^{1/2} - 1 | |W(t - A ε_n)|,

and similarly for J_{3,n}. Now consider

    sup_{0≤t≤1} ε_n^{-1/2} | [g_n(t - A ε_n)/g_n(t)]^{1/2} - 1 |.

By assumption, ||g_n^{-1/2}|| is a bounded sequence, and by the mean value theorem,

    ε_n^{-1/2} | [g_n(t - A ε_n)]^{1/2} - [g_n(t)]^{1/2} |
    = (1/2) ε_n^{-1/2} |g_n(t - A ε_n) - g_n(t)| |x_n(t,A)|^{-1/2},

where x_n(t,A) is between g_n(t - A ε_n) and g_n(t). Applying the mean value theorem to g_n yields

    |g_n(t - A ε_n) - g_n(t)| = A ε_n |g_n'(t_n(t,A))|,

where t_n(t,A) is between t - A ε_n and t. Thus

    sup_{0≤t≤1} J_{2,n}(t) ≤ K(A) (A/2) ε_n^{1/2} sup_{0≤t≤1} [ |g_n'(t_n(t,A))| / |x_n(t,A)|^{1/2} ] sup_t |W(t)|
    = O_p(ε_n^{1/2}),

since g_n' is uniformly bounded and g_n is bounded away from zero, by assumption. A similar argument shows that

    sup_{0≤t≤1} J_{3,n}(t) = O_p(ε_n^{1/2}),

so we now consider J_{1,n}.

Carrying out the differentiation in the integrand of J_{1,n}, we have

    J_{1,n}(t) ≤ |C_{1,n}(t)| + |C_{2,n}(t)|,

say, where C_{1,n} collects the terms containing K'(u) and C_{2,n} those containing g_n'. Now the non-stochastic terms in the integrand of C_{2,n} are uniformly bounded in their arguments and in n, by assumption. We therefore have

    ||C_{2,n}|| = O_p(ε_n^{1/2}).

For C_{1,n}, apply the same argument used in considering J_{2,n} to conclude that

    ||C_{1,n}|| ≤ C₁ ε_n^{1/2} sup_{0≤t≤1} ∫_{-A}^{A} |W(t - u ε_n) K'(u) u| du = O_p(ε_n^{1/2}),

where C₁ is a constant, and the proof is complete.  □
We now use the results proved thus far in showing that Y_{0,n} and Y_{1,n} are sufficiently close to one another.

3.2.8 Lemma. Under the assumptions of Theorem 3.2.1,

    ||Y_{0,n} - Y_{1,n}|| = o_p((log n)^{-1/2}).

Proof. We must show that

    sup_{0≤t≤1} { | [g(t)]^{-1/2} - [g_n(t)]^{-1/2} | ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y) | }
    = o_p((log n)^{-1/2}).

By the preceding four lemmas and Theorem 3.2.2,

    (2δ log n)^{1/2} [ ||Y_{1,n}|| / [λ(K)]^{1/2} - d_n ]

converges in distribution to some random variable, and is therefore an O_p(1) sequence. Since, by definition,

    d_n = O((log n)^{1/2}),

we have

    ||Y_{1,n}|| = O_p((log n)^{1/2}),

and since ||g_n^{-1/2}|| is a bounded sequence, we have

    sup_{0≤t≤1} ε_n^{-1/2} | ∫∫_{|y|≤a_n} y K((t-x)/ε_n) dZ_n(x,y) | = O_p((log n)^{1/2}).

Thus it suffices to prove

    (log n) || g^{-1/2} - g_n^{-1/2} || → 0

as n → ∞. By the mean value theorem,

    | g^{-1/2} - g_n^{-1/2} | = (1/2) |g - g_n| |h_n|^{-3/2},

where h_n is between g_n and g. Since g_n and g are bounded away from zero, ||h_n^{-3/2}|| is a bounded sequence, and since, by A3,

    (log n) ||g_n - g|| → 0,

the result is proved.  □
Since m_n*(t) is an asymptotically unbiased estimator of m*(t) = m(t)f(t), it is natural to seek conditions under which E m_n*(t) may be replaced by m*(t) in Theorem 3.2.1. Define the process

    Y_n'(t) = (n ε_n)^{1/2} [m_n*(t) - m*(t)] / [s(t)f(t)]^{1/2}.

Then we have the following corollary to Theorem 3.2.1.

3.2.9 Corollary. Suppose all the conditions of Theorem 3.2.1 hold and in addition

    ε_n = n^{-δ},  1/5 < δ < 1/2,

K satisfies

    ∫ u K(u) du = 0,

and the function

    m*(t) = m(t)f(t) = ∫ y f(t,y) dy

has bounded, continuous 1st and 2nd derivatives. Then the conclusion of Theorem 3.2.1 holds, with Y_n' replacing Y_n.

Proof. According to the remark at the beginning of the proof of Theorem 3.2.1, it suffices to show

    (log n)^{1/2} ||Y_n - Y_n'|| = o_p(1).

But

    ||Y_n - Y_n'|| ≤ ||[s f]^{-1/2}|| (n ε_n)^{1/2} sup_{0≤t≤1} |E m_n*(t) - m*(t)|.

By assumption, ||[s f]^{-1/2}|| < ∞, and we know that, under the assumptions on m* and K,

    sup_{0≤t≤1} |E m_n*(t) - m*(t)| = O(ε_n²).

Since

    (log n)^{1/2} (n ε_n)^{1/2} ε_n² = (log n)^{1/2} (n ε_n⁵)^{1/2} → 0

for δ > 1/5, then

    (log n)^{1/2} ||Y_n - Y_n'|| = o_p(1),

and the proof is complete.  □
Based on this corollary, we may construct a confidence band for m(t), 0 ≤ t ≤ 1, as follows. Using the asymptotic distribution, we have

    P{ (2δ log n)^{1/2} [ sup_{0≤t≤1} |Y_n'(t)| / [λ(K)]^{1/2} - d_n ] < C(α) } → 1 - α,

where

    C(α) = log 2 - log|log(1-α)|.

Inverting the above expression in the usual way, we obtain as a (1-α)×100% confidence band for m(t):

(3.2.26)    m_n(t) ± (n ε_n)^{-1/2} [ s(t)/f(t) ]^{1/2} [ C(α)/(2δ log n)^{1/2} + d_n ] [λ(K)]^{1/2},
            0 ≤ t ≤ 1.
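The band (3.2.26) is straightforward to evaluate once the kernel constants are known. The sketch below (an illustration; the d_n branches follow the reconstruction given in Theorem 3.2.1 and should be treated as assumptions) computes C(α) and checks the inversion of the limit law exp(-2e^{-x}) that defines it.

```python
import math

def C(alpha):
    # Quantile of the limit law exp(-2 e^{-x}): solve exp(-2 e^{-C}) = 1 - alpha
    return math.log(2.0) - math.log(abs(math.log(1.0 - alpha)))

def d_n(n, delta, c1=None, c2=None):
    # Centering constant of Theorem 3.2.1 as reconstructed above (an assumption);
    # kernels with K(±A) = 0 use the c2 branch
    L = 2.0 * delta * math.log(n)
    if c1 is not None and c1 > 0:
        return math.sqrt(L) + (math.log(c1/math.sqrt(math.pi))
                               + 0.5*(math.log(delta) + math.log(math.log(n))))/math.sqrt(L)
    return math.sqrt(L) + math.log(math.sqrt(c2)/(math.sqrt(2.0)*math.pi))/math.sqrt(L)

alpha = 0.1
x = C(alpha)
recovered = math.exp(-2.0*math.exp(-x))   # should equal 1 - alpha
dn = d_n(4000, 0.3, c2=1.25)              # Epanechnikov-style c2, hypothetical n and δ
```

The C(α) inversion is exact; the d_n value only feeds the half-width of the band and grows like (2δ log n)^{1/2}.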
3.3 Uniform Consistency of m̂_n and m_n.

We saw in Corollary 3.2.9 that the sequence of random variables

    (2δ log n)^{1/2} [ ||Y_n'|| / [λ(K)]^{1/2} - d_n ]

converges in distribution, and is thus an O_p(1) sequence. We employ this fact to show the uniform consistency of m_n*, and specify a rate of convergence.

3.3.1 Lemma. Under the conditions of Corollary 3.2.9,

(3.3.1)    sup_{0≤t≤1} |m_n*(t) - m(t)f(t)| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}].

Proof. By definition,

    d_n = O((log n)^{1/2}),

and thus

    (n ε_n)^{1/2} sup_{0≤t≤1} |m_n*(t) - m(t)f(t)| / [s(t)f(t)]^{1/2}
    = o_p((log n)^{-1/2}) + O((log n)^{1/2}) = O_p((log n)^{1/2}).

Now, using the fact that g(t) = s(t)f(t) is uniformly bounded, the conclusion follows.  □
We now use the preceding lemma to show uniform consistency of m̂_n and m_n.

3.3.2 Theorem. Under the conditions of Corollary 3.2.9, we have

(3.3.2)    ||m_n - m|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}],

(3.3.3)    ||m̂_n - m|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}].

Proof. Note that

    ||m_n - m|| = ||(m_n* - m*)/f|| ≤ ||f^{-1}|| ||m_n* - m*||,

where

    m*(t) = f(t)m(t).

By assumption, f is bounded away from zero on [0,1], and thus ||f^{-1}|| < ∞. An application of Lemma 3.3.1 thus proves (3.3.2).

For (3.3.3), note

    ||m̂_n - m|| = ||m_n*/f_n - m|| ≤ ||(m_n* - m*)/f_n|| + ||m*/f_n - m*/f|| = A + B,

say. Now

    A ≤ ||f_n^{-1}|| ||m_n* - m*|| = O_p[(log n)^{1/2} (n ε_n)^{-1/2}]

by Lemma 3.3.1. Further,

    B = ||m*(f - f_n)/(f_n f)|| ≤ ||m*/f_n|| ||f^{-1}|| ||f_n - f||
    = ||m*/f_n|| O_p[(log n)^{1/2} (n ε_n)^{-1/2}]

(Bickel and Rosenblatt (1973)). Since

    ||m*/f_n|| ≤ ||m*|| [ inf_{0≤t≤1} f_n(t) ]^{-1}

and it is easily verified that inf_{0≤t≤1} f_n(t) →p inf_{0≤t≤1} f(t) > 0, (3.3.3) follows.  □
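Theorem 3.3.2's sup-norm rate can be illustrated by simulation. The following sketch (an illustration only, using the Epanechnikov kernel and the model of the example in Chapter 4; it is not the author's code) computes sup |m̂_n(t) - m(t)| over a grid well inside the support, where it should be small for moderate n.

```python
import math, random

def K(u):
    # Epanechnikov kernel (an assumed choice for this illustration)
    return 0.75*(1.0 - u*u) if abs(u) <= 1.0 else 0.0

m = lambda x: x**3/3.0 + x**2

def m_hat(t, xs, ys, eps):
    # m̂_n(t) = m_n*(t)/f_n(t); the common (nε)^{-1} factors cancel in the ratio
    num = sum(y*K((t - x)/eps) for x, y in zip(xs, ys))
    den = sum(K((t - x)/eps) for x in xs)
    return num/den

rng = random.Random(3)
n = 4000
eps = n**(-0.21)
xs = [rng.uniform(-3.0, 2.0) for _ in range(n)]
ys = [m(x) + rng.gauss(0.0, 1.0) for x in xs]

grid = [-2.0 + 3.0*j/30 for j in range(31)]   # interior interval [-2, 1]
sup_err = max(abs(m_hat(t, xs, ys, eps) - m(t)) for t in grid)
```

The grid deliberately avoids the endpoints of the support, where the endpoint-bias remarks of Chapter 4 apply.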
4. AN EXAMPLE, FURTHER RESEARCH

As we noted in the introductory chapter, if the density of X is known, then either the estimator m̂_n or m_n may be used to estimate the regression function. Here we will summarize some results given in Chapter 2 which relate to the relative performance of m̂_n and m_n in this case. We then present an example in which m̂_n and m_n are computed from a set of simulated data.

4.1 The Estimators m̂_n and m_n.

We first note that, according to Theorem 2.3.4, if the density function of X has, say, an interval for its support and is non-zero at the endpoints of the interval, then m̂_n is a consistent estimator at the endpoints, whereas m_n is not. The implication of this for finite sample sizes is that m_n is likely to display a bias near the endpoints of the X variable which m̂_n will not have.

According to Theorems 2.4.4 and 2.5.2, under appropriate conditions, both m̂_n(x) and m_n(x) have asymptotic normal distributions with mean m(x) (for kernel type estimators). However, the sequence of scaling constants required for unit asymptotic variance differs for the two estimators; for m̂_n(x) it is {σ²(x) ∫K²(u)du / (n ε_n f(x))}^{1/2} and for m_n(x) it is {s(x) ∫K²(u)du / (n ε_n f(x))}^{1/2}. Since

    σ²(x) = s(x) - m²(x) ≤ s(x),

this indicates that m_n may display more dispersion about m for finite sample sizes than m̂_n.
4.2 An Example.

In order to illustrate the behavior of the estimators in one specific case, we have computed m̂_n and m_n for a set of artificial data. We have also computed the approximate confidence intervals given by (3.2.26) for m, based on m_n. The results of the computations are depicted in Figures 1-6, and we have also shown a scatterplot of the data and the true regression function on each figure. The data consist of n = 200 points (X_i, Y_i) chosen independently with X_i ~ U(-3,2) and

    Y_i = X_i³/3 + X_i² + Z_i,

where Z_i is a standard normal variable independent of X_i. Thus, for this data,

    m(x) = x³/3 + x².

All calculations are for kernel type estimators with kernel function given by a standard normal density function, truncated at ±3 and normalized so as to be a probability density.

Figures 1 and 2 show the estimators m̂_n and m_n, respectively, with ε_n = n^{-.21}, and Figures 3 and 4 show m̂_n and m_n with slightly less smoothing, ε_n = n^{-.4}. The previously discussed bias of m_n is evident at the upper endpoint in Figures 2 and 4, although m̂_n and m_n do not differ by very much at the lower endpoint. The difference in the asymptotic variances of m̂_n and m_n does not manifest itself strongly in this example, although m_n in Figure 4 has a slightly more variable appearance than m̂_n in Figure 3.

Figures 5 and 6 show the approximate confidence bands given by (3.2.26) for α = .1, and ε_n = n^{-.21} and ε_n = n^{-.4}, respectively. The confidence bands (3.2.26) are asymptotically valid for any subinterval of [-3,2]. In practice, however, one should consider these confidence bands to be approximately valid only for intervals well within the support of X, since the earlier remarks on the endpoint bias of m_n apply to the confidence bands also. These confidence bands were calculated using the true conditional second moment

    s(t) = E[Y²|X=t] = m²(t) + 1.

In practice, one would use an estimator of s(t), e.g. the consistent kernel-type estimator obtained by replacing Y_i with Y_i² in the definition of m̂_n.
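The example is easy to reproduce. The sketch below (an illustration, not the original computations; the truncation constant 0.9973 ≈ P(|Z| ≤ 3) is an assumption stated in the code) draws data from the stated model and evaluates both m̂_n = m_n*/f_n and m_n = m_n*/f at a point, using the truncated-normal kernel and ε_n = n^{-.21}.

```python
import math, random

def kernel(u):
    # standard normal density truncated at ±3, renormalized (0.9973 ≈ P(|Z| ≤ 3))
    if abs(u) > 3.0:
        return 0.0
    return math.exp(-0.5*u*u) / (math.sqrt(2.0*math.pi) * 0.9973)

def estimators(t, xs, ys, eps):
    n = len(xs)
    mstar = sum(y*kernel((t - x)/eps) for x, y in zip(xs, ys)) / (n*eps)  # numerator m_n*
    fn = sum(kernel((t - x)/eps) for x in xs) / (n*eps)                   # density estimate f_n
    f_true = 1.0/5.0                                                      # U(-3,2) density
    return mstar/fn, mstar/f_true    # (m̂_n(t), m_n(t))

rng = random.Random(0)
n = 2000                         # larger than the thesis example, to tighten the check
eps = n**(-0.21)
xs = [rng.uniform(-3.0, 2.0) for _ in range(n)]
ys = [x**3/3.0 + x**2 + rng.gauss(0.0, 1.0) for x in xs]

m = lambda x: x**3/3.0 + x**2
mhat, mbar = estimators(0.0, xs, ys, eps)   # m(0) = 0
```

At the interior point t = 0 both estimators should land close to m(0) = 0; near the endpoint x = 2 only m̂_n would remain reliable, per the remarks above.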
4.3 Further Research.

Theorem 3.2.1 was proved for the process

    Y_n(t) = (n ε_n)^{1/2} [m_n*(t) - E m_n*(t)] / [s(t)f(t)]^{1/2}.

It should be possible to carry out a similar program for the process

    V_n(t) = (n ε_n)^{1/2} [m̂_n(t) - E m_n*(t)/E f_n(t)] / [σ²(t)/f(t)]^{1/2}.

A first step in such a proof might be to show the equivalence of V_n to a suitable approximating process V_n' (in the sense that ||V_n - V_n'|| = o_p((log n)^{-1/2})). Successive approximations, as in Theorem 3.2.1, would lead eventually to the equivalence of V_n' to the Wiener process of Theorem 3.2.2, and thus to the asymptotic distribution of the maximum absolute deviation of V_n.

We have not been able to carry out the technical details of the proof of such a theorem. However, if it were to be proved, one application would be a confidence band such as (3.2.26), but based on m̂_n instead of m_n, and therefore narrower, since m̂_n is asymptotically less variable than m_n.
e• ,
- eESTIMATED REGRESSION, TRUE REGRESSION. SCATTER PLOT
7 . 0~ -- ur--------! ! -_,- ._-
6.0 f-
I5.0 r-
If. 0~I
3.0~
i2.0r--
I
I1.0 r=
........0=;
x
x
xx
x
x
x
x
xx
xo
~
2.01.0o-1.0-2.0
x
I II I-1.0' I
--3.0
-.21Figure 1. The Estimator mn With En = n
ESTIMATED REGRESSION, TRUE REGRESSION, SCATTER PLOT7 .0iii Iii
5.0
6.0
'-J\0
xx
x
~0x x
x x xx "x
x
xx
xx
xo
1.0
if.O
3.0
2.0
2.01.0o-1.0-2.0-1.0 ! , , I I ,
-3.0X
Figure 2 The Estimator ffi With E = n-· 21. n n
[Figure 3. The estimator $m_n$ with $\varepsilon_n = n^{-.4}$: estimated regression, true regression, and scatter plot.]
[Figure 4. The estimator $\tilde{m}_n$ with $\varepsilon_n = n^{-.4}$: estimated regression, true regression, and scatter plot.]
[Figure 5. Confidence band (3.2.26) with $\varepsilon_n = n^{-.21}$.]
[Figure 6. Confidence band (3.2.26) with $\varepsilon_n = n^{-.4}$.]
UNCLASSIFIED

REPORT DOCUMENTATION PAGE

Title: Smooth Nonparametric Regression Analysis
Type of report: Technical
Performing org. report number: Mimeo Series No.
Author: Gordon Johnston
Contract or grant numbers: AFOSR 75-2796; ONR N00014-75-C-0809
Performing organization: Department of Statistics, University of North Carolina, Chapel Hill, North Carolina 27514
Controlling offices: 1. Air Force Office of Scientific Research, Bolling AFB, DC; 2. Office of Naval Research
Report date: September 1979
Number of pages: 86
Security classification: UNCLASSIFIED
Distribution statement: Approved for Public Release; Distribution Unlimited
Supplementary notes: Ph.D. dissertation under the joint direction of Professors R.J. Carroll and M.R. Leadbetter
Key words: Regression function, Watson-type estimator, Pointwise weak consistency, Asymptotic joint normality

Abstract: This work investigates properties of the Watson-type estimator of the regression function E[Y|X=x] of a bivariate random vector (X,Y). Pointwise weak consistency of the estimator is demonstrated. Sufficient conditions are given for asymptotic joint normality of the estimator taken at a finite number of distinct points, and for the uniform consistency of the estimator over a bounded interval. A large-sample confidence band for the regression function, based on the estimator, is derived.