
Optimal Kernel Weights Under a Power Criterion

Stephen BLYTH*

We develop an approach to choosing optimal kernel weights for nonparametric estimation of the slope of the regression function that uses maximization of power, rather than minimization of integrated mean squared error (IMSE), as its optimality criterion. This power criterion leads to optimal kernel weights whose derivation is simpler than under other criteria and provides an intuitive understanding of the nature of the optimality.

KEY WORDS: Nonparametric regression; Optimality criteria; Power.

The question of which kernel weights to use arises naturally when considering nonparametric estimation of the regression function, the derivatives of the regression function, or the density. Many papers that address the problem of determining an optimal kernel in such cases use minimization of integrated mean squared error (IMSE) as the optimality criterion. For example, Epanechnikov (1969) obtained his familiar optimal parabolic kernel that minimizes the IMSE of a class of density estimators. Gasser and Müller (1984) and Gasser, Müller, and Mammitzsch (1985) considered kernel estimators of the regression function and its derivatives and found kernels that minimized the IMSE of these estimators.

However, although minimization of IMSE is widely used as an optimality criterion, it is by no means the only possible candidate. An alternative criterion with advantageous attributes is advocated here: maximization of the power of a test. This power criterion was previously used by Ghosh and Huang (1991) to determine an optimal kernel for the Bickel-Rosenblatt test.

The derivation of optimal kernel weights under a power criterion is straightforward and provides an intuitive understanding of the nature of the optimality. We detail the case of inference concerning the slope of the nonparametric regression function; however, our approach generalizes in a straightforward manner to inference concerning the regression function itself or its second derivative.

1. THE POWER OPTIMALITY CRITERION

We consider inference concerning the slope of the nonparametric regression function, $g'(x) = (d/dx)E(Y \mid X = x)$. A natural test is $H_0\colon g'(x_0) = 0$ versus $H_a\colon g'(x_0) > 0$, where $x_0$ is some covariate value of interest. This hypothesis addresses the question of whether the conditional mean of $Y$ changes with the covariate at the value $X = x_0$, and it is of interest in many cases. For example, the correlation curve investigated by Doksum, Blyth, Bradlow, Meng, and Zhao (1993) vanishes at a point if and only if the regression slope vanishes at that point. The common linear hypothesis, namely that the regression coefficient vanishes, has the natural local nonlinear equivalent that $g'(x_0) = 0$ for some $x_0$. It is thus also natural to consider the test that the regression slope vanishes at a point and to investigate the power of that test. The optimality criterion associated with this test is maximization of power over some class of test statistics. In the next section we obtain optimal kernel weights under this criterion.

* Stephen Blyth is a SERC Postdoctoral Research Fellow, Department of Mathematics, Imperial College, London SW7 2BZ, U.K. Much of this work was completed while the author was at the Department of Statistics, Harvard University, Cambridge, MA 02138. The author was partially supported at Harvard University by NSF grants DMS-90-03216 and DMS-91-06752. The author thanks Kjell Doksum, two referees, and the associate editor for helpful suggestions on drafts of this article.

2. RESULTS

We adopt the "fixed design" model (see, for example, Härdle 1990). That is, for $n > 0$, we consider the covariate values to be nonrandom values $x_{1n} < \cdots < x_{nn}$. We assume that $Y_{in}$, $i = 1, \ldots, n$, are independent random variables with $E(Y_{in}) = g(x_{in})$ and $\operatorname{var}(Y_{in}) = \sigma^2(x_{in})$, where $g(x)$ has a second derivative bounded at $x_0$ and $\sigma^2(x)$ has a first derivative bounded at $x_0$. Further, we assume that $x_{in} = F^{-1}((i - 1/2)/n)$, $i = 1, \ldots, n$, for some cdf $F$ that admits a density $f(x)$ with $f(x_0) \neq 0$ and such that $f'$ exists at $x_0$. This generalizes the condition that the $x_{in}$ be equally spaced. We henceforth make implicit the dependence of $x_{in}$ on $n$ and write the observations $(x_1, Y_1), \ldots, (x_n, Y_n)$. We write
$$Y_i = g(x_i) + \sigma(x_i)\varepsilon_i,$$
where the $\varepsilon_i$ are independent with mean 0 and variance 1 but have otherwise arbitrary distribution functions $G_i$.
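As a concrete illustration of this setup, a minimal simulation sketch follows; the particular choices of $g$, $\sigma$, and $F$ are hypothetical, purely for demonstration, and are not taken from the paper.

```python
import numpy as np

# Simulate the fixed-design model Y_i = g(x_i) + sigma(x_i) * eps_i,
# with design points x_i = F^{-1}((i - 1/2) / n).
rng = np.random.default_rng(0)

n = 200
g = lambda x: np.sin(2 * np.pi * x)            # example regression function g
sigma = lambda x: 0.3 + 0.1 * x                # example scale function sigma(x)
F_inv = lambda u: np.sqrt(u)                   # quantile function of F(x) = x^2 on [0, 1]

x = F_inv((np.arange(1, n + 1) - 0.5) / n)     # nonrandom, ordered design points
y = g(x) + sigma(x) * rng.standard_normal(n)   # errors have mean 0, variance 1
```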

Inference at $x_0$ is based on
$$T_{k,x_0} = \sum_{i \in I_k(x_0)} W_{ki}(x_0)\, Y_i,$$
where $I_k(x_0)$ is the set of indices of the $k/2$ $x_i$ nearest to and less than $x_0$ and of the $k/2$ $x_i$ nearest to and greater than $x_0$, and the $W_{ki}(x_0)$ are the weights to be chosen optimally. ($k$ is taken to be even; the dependence of $k$ on $n$ is understood.)
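In code, the statistic might be assembled as follows. This is a sketch under stated assumptions: the function and variable names are ours, $x$ is assumed sorted with at least $k/2$ design points on each side of $x_0$, and $x$, $y$ are reused from the simulation sketch above.

```python
def T_statistic(x, y, x0, k, weights):
    """T_{k,x0} = sum over I_k(x0) of W_ki(x0) * Y_i, where I_k(x0) holds
    the k/2 design points nearest to and below x0 and the k/2 nearest to
    and above x0. `weights` maps the relabeled index
    j = -k/2, ..., -1, 1, ..., k/2 to W_kj(x0)."""
    below = np.where(x < x0)[0][-(k // 2):]    # k/2 nearest indices below x0
    above = np.where(x > x0)[0][:k // 2]       # k/2 nearest indices above x0
    idx = np.concatenate([below, above])
    j = np.concatenate([np.arange(-(k // 2), 0), np.arange(1, k // 2 + 1)])
    return float(np.sum(weights(j) * y[idx]))

# Example: linear weights W_kj = j (shown below to be optimal).
t = T_statistic(x, y, x0=0.5, k=20, weights=lambda j: j.astype(float))
```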

Remark 1. The equivalence between the kernel weights and kernels is established through the correspondences

$$W_{ki}(x_0) \leftrightarrow c\,K_{h_n}\bigl(F_n(x_i) - F_n(x_0)\bigr) \qquad \text{and} \qquad h_n \leftrightarrow \frac{k}{2n},$$
where $K$ is a kernel with support $[-1, 1]$, $K_h(x) = (1/h)K(x/h)$, $F_n$ is the empirical distribution function of the $X$'s, and $c$ is a normalizing factor independent of $i$.
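Since $F_n(x_i) - F_n(x_0) = j/n$ after relabeling the indices, this correspondence can be sketched as below (our function name; the normalizing factor $c$ is omitted).

```python
def kernel_weights(k, n, K):
    """Weights induced by a kernel K with support [-1, 1] through the
    Remark 1 correspondence W_ki <-> c * K_{h_n}(F_n(x_i) - F_n(x_0))
    with h_n = k / (2n). The argument of K is (j/n) / h_n = 2j / k,
    which lies in [-1, 1] for |j| <= k/2."""
    j = np.concatenate([np.arange(-(k // 2), 0), np.arange(1, k // 2 + 1)])
    h = k / (2 * n)
    return (1.0 / h) * K(j / (n * h))

w_linear = kernel_weights(20, 200, K=lambda u: u)   # K(u) = u gives W_kj proportional to j
```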



For inference concerning the slope of the nonparametric regression function, we use kernel weights of order $(1, \kappa)$, $\kappa \geq 1$ odd (see Gasser et al. 1985, p. 239). Thus we have the condition on the kernel weights
$$\sum_{i \in I_k(x_0)} W_{ki}(x_0) = 0.$$

Expanding $g(x_i)$ around $x_0$, and noting that the $g(x_0)$ term vanishes because the weights sum to 0, we have
$$E(T_{k,x_0}) = \sum_{i \in I_k(x_0)} W_{ki}(x_0)\bigl(g'(x_0)(x_i - x_0) + O((x_i - x_0)^2)\bigr).$$

Using the expansion
$$(x_i - x_0) = \frac{i - i_0}{n f(x_0)} + O\!\left(\frac{(i - i_0)^2}{n^2}\right),$$
where $i_0$ is the index of the design point nearest $x_0$,

we obtain, after relabeling the indices by $j = i - i_0$,
$$E(T_{k,x_0}) = \sum_{j=-k/2}^{k/2} W_{kj}(x_0)\left(\frac{j\,g'(x_0)}{n f(x_0)} + O\!\left(\frac{j^2}{n^2}\right)\right) \qquad (1)$$
and
$$\operatorname{var}(T_{k,x_0}) = \sum_{j=-k/2}^{k/2} W_{kj}^2(x_0)\bigl(\sigma^2(x_0) + O(j/n)\bigr).$$

Under the Lindeberg-Feller condition (see, for example, Chung 1968, p. 187),
$$\frac{T_{k,x_0} - E(T_{k,x_0})}{\sqrt{\operatorname{var}(T_{k,x_0})}} \xrightarrow{\;d\;} N(0, 1) \quad \text{as } n \to \infty,\ k \to \infty.$$

By Slutsky's theorem (see, for example, Bickel and Doksum 1977, p. 461), if we use a consistent estimator $\hat\sigma(x_0)$ to estimate $\sigma(x_0)$, then (provided that $k^5/n^4 \to 0$ as $n \to \infty$, $k \to \infty$) we can ignore the error term in (1), and under $H_0$
$$\frac{T_{k,x_0}}{\hat\sigma(x_0)\sqrt{\sum_{j=-k/2}^{k/2} W_{kj}^2(x_0)}} \xrightarrow{\;d\;} N(0, 1).$$

The left side of this expression is our test statistic.

Under these conditions, the asymptotic power of the size $\alpha$ test of $H_0\colon g'(x_0) = 0$ versus $H_a\colon g'(x_0) > 0$ is
$$\Phi\!\left(-z_\alpha + \frac{g'(x_0) \sum_{j=-k/2}^{k/2} j\,W_{kj}(x_0)}{\sigma(x_0)\, f(x_0)\, n \sqrt{\sum_{j=-k/2}^{k/2} W_{kj}^2(x_0)}}\right), \qquad (2)$$
where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution.

Provided that $n^2/k^3 \to 0$ as $n \to \infty$, then under the local alternatives $H_a^{(n)}\colon g'_{(n)}(x_0) = d\,n k^{-3/2}$, $d > 0$, the limiting power is nontrivial. This condition on the rate of $k$ is closely related to the condition that an estimator of $g'(x_0)$ based on $T_{k,x_0}$ be consistent (see Doksum et al. 1993).
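Expression (2) is straightforward to evaluate numerically; the following sketch (our naming, with scipy's standard normal cdf playing the role of $\Phi$) computes the asymptotic power for a given weight sequence.

```python
from scipy.stats import norm

def asymptotic_power(w, j, slope, sigma0, f0, n, alpha=0.05):
    """Asymptotic power (2) of the size-alpha test of H0: g'(x0) = 0
    versus Ha: g'(x0) > 0, for weights w = W_kj(x0) at relabeled
    indices j; slope = g'(x0), sigma0 = sigma(x0), f0 = f(x0)."""
    z_alpha = norm.ppf(1 - alpha)
    drift = slope * np.sum(j * w) / (sigma0 * f0 * n * np.sqrt(np.sum(w ** 2)))
    return norm.cdf(-z_alpha + drift)
```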

Maximization of the power in terms of the weights depends solely on the term
$$\frac{\sum_{j=-k/2}^{k/2} j\,W_{kj}(x_0)}{\sqrt{\sum_{j=-k/2}^{k/2} W_{kj}^2(x_0)}}.$$
By Cauchy-Schwarz,
$$\left(\sum_{j=-k/2}^{k/2} j\,W_{kj}(x_0)\right)^{2} \leq \left(\sum_{j=-k/2}^{k/2} j^2\right)\left(\sum_{j=-k/2}^{k/2} W_{kj}^2(x_0)\right),$$
with equality if and only if $W_{kj}(x_0)$ is proportional to $j$.

Therefore, the optimal weights are determined immediately: they are of the form $W_{kj}(x_0) = a_k j$, where $a_k$ does not depend on $j$. The optimal weights correspond to a kernel of the form $K(u) = u$, which has the same shape as the order $(1, 3)$ variance-minimizing kernel of Gasser et al. (1985, p. 243).
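A quick numerical check of this conclusion: the sketch below compares the power term above for the linear weights against two other weight shapes (all choices hypothetical, for illustration only).

```python
j = np.arange(-10, 11); j = j[j != 0]          # relabeled indices for k = 20

def power_term(w):
    """The quantity sum(j * W_kj) / sqrt(sum(W_kj^2)) driving the power."""
    return np.sum(j * w) / np.sqrt(np.sum(w ** 2))

for name, w in [("linear, W_kj = j ", j.astype(float)),
                ("uniform sign     ", np.sign(j).astype(float)),
                ("cubic, W_kj = j^3", (j ** 3).astype(float))]:
    print(name, round(power_term(w), 3))
# Only the linear weights attain the Cauchy-Schwarz bound
# sqrt(sum(j^2)) = sqrt(770), approximately 27.749.
```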

These results have a natural explanation. The assumptions here, as in the rest of the nonparametric regression literature, imply that the regression function is locally linear near $x_0$. Therefore, the weights maximizing the asymptotic power of the test that the regression slope vanishes at $x_0$ are those that account for this local linearity. Note that the statistic $T_{k,x_0}$ under the optimal weights is essentially the same as the local least squares regression slope. Thus this local result proves optimality of the local least squares estimate among statistics of the form $T_{k,x_0}$ for arbitrary regular error distributions $G_i$.
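To see the connection concretely, note that the least squares slope over the window is $\sum_i (x_i - \bar{x}) Y_i / \sum_i (x_i - \bar{x})^2$, and its numerator is $T_{k,x_0}$ with weights linear in $x_i$, hence approximately linear in $j$. A sketch, reusing the hypothetical $x$, $y$ from the simulation above:

```python
# T_{k,x0} with centered linear weights reproduces the local least squares
# slope exactly, up to the normalization sum((x_i - xbar)^2) over the window.
x0, k = 0.5, 20
below = np.where(x < x0)[0][-(k // 2):]
above = np.where(x > x0)[0][:k // 2]
xw, yw = x[np.concatenate([below, above])], y[np.concatenate([below, above])]

ls_slope = np.polyfit(xw, yw, 1)[0]                       # local least squares slope
t_lin = np.sum((xw - xw.mean()) * yw)                     # T with weights x_i - xbar
print(ls_slope, t_lin / np.sum((xw - xw.mean()) ** 2))    # agree up to rounding
```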

Remark 2. The use of kernels of order $(1, \kappa)$ for inference concerning the regression slope corresponds to the use by Gasser and Müller (1984) of the same family of kernels for estimation of the regression slope. Thus the present inference-driven optimality result is directly comparable to the IMSE optimality results in Gasser et al. (1985), both being derived using kernels from this family.

When using kernel weights that do not sum to 0, we are in effect adding a local average of the $Y_i$ to our test statistic. To see this, suppose that $\sum_j W_{kj}(x_0) = \overline{W}(x_0) \neq 0$. Then we can write $T_{k,x_0} = \sum_j \bigl(W_{kj}(x_0) - \overline{W}(x_0)/k\bigr) Y_j + \overline{W}(x_0) \sum_j Y_j / k$, where now the weights $\widetilde{W}_{kj}(x_0) = W_{kj}(x_0) - \overline{W}(x_0)/k$ sum to 0. When testing whether the regression slope is 0, adding this average does not increase the power of the test.
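This decomposition is easy to verify numerically; a minimal sketch (our function name, reusing the window responses `yw` from the sketch above, with deliberately non-zero-sum uniform weights):

```python
def split_statistic(w, yw):
    """Split T = sum(w * y) into a part with zero-sum weights plus
    W(x0) times the local average of the Y's."""
    k = len(w)
    w_bar = np.sum(w)                     # W(x0) = sum of the weights
    w_tilde = w - w_bar / k               # centered weights, sum to 0
    return np.sum(w_tilde * yw), w_bar * np.mean(yw)

part_slope, part_average = split_statistic(np.ones(20), yw)
# part_slope + part_average equals the original statistic sum(w * yw).
```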

Remark 3. The Lindeberg-Feller condition holds in the case when the $\varepsilon_i$ are identically distributed with law $F_0$, where $F_0$ does not depend on $x_1, \ldots, x_n$ or $n$, and where $0 < \operatorname{var}(\varepsilon_i) < \infty$, provided
$$\lim_{k \to \infty} \frac{\max_{i \in I_k(x_0)} \sigma^2(x_i)\, W_{ki}^2(x_0)}{\sum_{i \in I_k(x_0)} \sigma^2(x_i)\, W_{ki}^2(x_0)} = 0$$
(see, for example, Hájek and Šidák 1967, p. 153).
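For the optimal linear weights and a constant variance function, this ratio behaves roughly like $3/k$, so the condition is easily met; a small check (our naming, assuming constant $\sigma$):

```python
def hajek_sidak_ratio(k, sigma_vals=1.0):
    """max/sum ratio of sigma(x_i)^2 * W_ki(x0)^2 for the optimal linear
    weights; the condition requires this ratio to vanish as k -> infinity."""
    j = np.arange(-(k // 2), k // 2 + 1); j = j[j != 0]
    c = (sigma_vals * j.astype(float)) ** 2
    return c.max() / c.sum()

print(hajek_sidak_ratio(20), hajek_sidak_ratio(2000))  # roughly 3/k, so -> 0
```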

3. CONCLUSION

The approach to determining optimal kernel weights using a power criterion rather than an IMSE criterion is intuitively accessible and numerically simple. The optimal weights are found in a straightforward manner using the Cauchy-Schwarz inequality, and they are relatively simple in shape, being linear in the index $j$. This is similar to the experience of Ghosh and Huang (1991), who found a uniform optimal kernel for maximizing the power of the Bickel-Rosenblatt test.

[Received July 1991. Revised November 1992.]

REFERENCES

Bickel, P. J., and Doksum, K. A. (1977), Mathematical Statistics: Basic Ideas and Selected Topics, Oakland, CA: Holden-Day.

Chung, K. L. (1968), A Course in Probability Theory, New York: Harcourt, Brace and World.


Doksum, K. A., Blyth, S. J., Bradlow, E., Meng, X.-L., and Zhao, H. (1993), "Correlation Curves as Local Measures of Variance Explained by Regression," submitted to Journal of the American Statistical Association.

Epanechnikov, V. A. (1969), "Nonparametric Estimation of a Multivariate Probability Density," Theory of Probability and Its Applications, 14, 153-158.

Gasser, T., and Müller, H. G. (1984), "Estimating Regression Functions and Their Derivatives by the Kernel Method," Scandinavian Journal of Statistics, 11, 171-185.

Gasser, T., Müller, H. G., and Mammitzsch, V. (1985), "Kernels for Nonparametric Curve Estimation," Journal of the Royal Statistical Society, Ser. B, 47, 238-252.

Ghosh, B. K., and Huang, W.-M. (1991), "The Power and Optimal Kernel of the Bickel-Rosenblatt Test for Goodness of Fit," The Annals of Statistics, 19, 999-1009.

Hájek, J., and Šidák, Z. (1967), Theory of Rank Tests, New York: Academic Press.

Härdle, W. (1990), Applied Nonparametric Regression, New York: Cambridge University Press.
