Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation and Application. Jorge Barrientos-Marín. Doctoral thesis, Universidad de Alicante (Universitat d'Alacant), 2007


    Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application.

    Jorge Barrientos-Marín

    Advisor: Stefan Sperlich

    Quantitative Economics Doctorate, Departamento de Fundamentos del Análisis Económico

    Universidad de Alicante

    January 2007


  • To my wife and my family.


  • Acknowledgements

    The papers that make up this thesis are the result of five years of continuous work. Without doubt, however, this would not have been possible without the collaboration of many people, and I want to express my gratitude to all of them. If anyone goes unmentioned, it is most likely that my memory, as usual, has played a trick on me. I want to thank the members of the Departamento de Fundamentos del Análisis Económico, my professors, and especially my fellow students, who made these five years away from home bearable. Among the professors, special recognition is due to Antonio Villar, who always trusted me and advised me in difficult moments; to Juan Mora, for his encouragement and for the pleasant hours spent discussing results and theorems; and likewise to Javier Alvarez and to Lola, whose excellent courses encouraged me to follow the path of econometrics.

    Among my fellow students, I thank Alicia, who has always been a friend. I thank Ricardo for his help and countless favours (many of them pecuniary), and Paco, Silvio and Szaby for their pleasant company and sincere friendship. Rafael López's good humour and intelligence were a challenge for me. I thank José María for his great regard for me, which is mutual, and for his generosity; these years would have been less fun, and some Christmases sad, without his friendship.

    I cannot fail to mention the administrative staff (Mercedes, Mariló, Julio, Carlos and Lourdes), who were always ready to help me and were patient with my countless requests.

    I also thank Frédéric Ferraty and Philippe Vieu for their dedication; they provided me with the best atmosphere in which to write one of the chapters of this thesis. Juan and Mónica also deserve mention here: they welcomed me into their home, were always good company, and introduced me, in no superficial way, to French life.

    I thank my family, especially Patricia, my wife, who has supported me through all these years of semi-solitude waiting for this to end, always with patience and optimism; my mother, whom I know my absence always saddened; my aunts, for whom I am a source of pride; and José and Leticia Restrepo, for helping Patricia bear the burden of loneliness.

    I thank my friends in Colombia, Mauricio Alviar and Pedro, who believed from the beginning that this was achievable. I also mention Alejandro Gaviria, who continues to teach me to think like an economist and who gave me sound advice at just the right moment.

    Finally, special recognition is due to Stefan Sperlich, who has taught me a great deal about semi- and nonparametric econometrics. I will always be grateful to him, because he made sure this ended well, and he has been an exceptional advisor even from a distance.


  • Contents

    Acknowledgements
    Introduction and Summary
    Introduction and Summary in Spanish

    1 The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric
      1.1 Introduction
      1.2 Statistical Methods: Estimators and Test Statistics
        1.2.1 Estimators
        1.2.2 Test Statistics
      1.3 Resampling and Choice of Parameters
        1.3.1 Bootstrap Tests
        1.3.2 The Choice of Bandwidths h
        1.3.3 The Choice of Bandwidths k
        1.3.4 The Choice of Bootstrap Residuals
        1.3.5 An Alternative: Subsampling
        1.3.6 The Choice of Bootstrap Bandwidth
      1.4 Simulation Results
      1.5 Conclusions
      References

    2 Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves
      2.1 Introduction
      2.2 Additive Partially Linear Model and Testing Hypothesis
      2.3 The Shape of Engel Curves and Specification Testing
        2.3.1 Data Used in this Application
        2.3.2 Some Pictures of the Expenditure-Log Total Expenditure Relationship
        2.3.3 Specification Testing
      2.4 Conclusions and Future Research
      References

    3 Locally Modelled Regression and Functional Data
      3.1 Introduction
      3.2 Position of the Problem
      3.3 Functional locally modeled regression
        3.3.1 The p-dimensional case
        3.3.2 The infinite-dimensional case: the functional setting
      3.4 FFLM kernel-type estimator: asymptotic behavior
      3.5 FFLM regression in action
      3.6 Conclusions
      References

  • Introduction and Summary

    This thesis is composed of three chapters, in which we focus on three related but different issues regarding testing, estimation and theoretical developments¹. More precisely, in Chapter 1, "The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric", we are interested in choosing an appropriate smoothing parameter, a problem that is fundamental for the reasonable use of non- and semiparametric methods. In particular, for testing we note that this problem is not equivalent to the one in regression: at least from a theoretical point of view, the optimal smoothing parameter for testing has different rates from those which are optimal for estimation.

    While there is an increasing literature on how to find a proper smoothing parameter for the nonparametric alternative, almost nothing is known about how to choose a smoothing parameter in practice for the null hypothesis if it is also semi- or nonparametric. We do know that, at least asymptotically, oversmoothing is necessary in the pre-estimation of the null model used to generate the bootstrap samples; see Härdle and Marron (1990, 1991). In practice, however, this knowledge is of little help. The same can be said about various parameters and procedures to be chosen in practice when performing such tests. In this Chapter, we discuss all these choice questions. In particular, we study the problem of bandwidth choice for the pre-estimation used to generate bootstrap samples. As an alternative, we also briefly discuss the possibility of subsampling².

    ¹ Chapter 1 is joint work with Stefan Sperlich, and Chapter 3 is joint work with Frédéric Ferraty and Philippe Vieu.

    ² The authors gratefully acknowledge financial support from the Spanish DG de Investigación del Ministerio de Ciencia y Tecnología, SEJ2004-04583/ECON.

    In Chapter 2, "Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves", we focus on an application of the additive partially linear model and of some ideas drawn from Chapter 1. Our main goal is an application to consumer theory, more precisely to systems of Engel curves. The form of the Engel curve has long been a subject of discussion in applied econometrics, and so far there has been no definitive conclusion about it. In this Chapter an additive partially linear model is used to estimate semiparametrically the effect of total expenditure in this context. Additionally, we consider the nonparametric inclusion of some regressors that traditionally have a nonlinear effect, such as age and schooling. To that end, we compare an additive partially linear model with the fully nonparametric one using recent popular test statistics. Because inference in nonparametric regression can take place in a number of ways, the most natural is to use nonparametric regression as the alternative against a fully parametric or semiparametric null hypothesis. We then check whether an additive PLM provides a reasonable fit to our data, using different resampling schemes, bootstrap and subsampling, to compute critical values (p-values) for the proposed test statistics.

    Additionally, in this Chapter we deal with a well-known problem that is very common in the context of Engel curves: total expenditure may well be jointly determined with expenditure on the different goods, so an endogeneity problem may arise. To solve this problem we apply nonparametrically constructed regressors as instrumental variables. In particular, we use the nonparametric two-step approach with generated regressors and constructed variables (NP2SCV) due to Sperlich (2005). Our feeling is that a generated-variables approach in combination with an additive PLM can help us overcome, to some extent, any possible endogeneity problem, and that is exactly the procedure implemented in this Chapter.

    In Chapter 3, "Locally Modelled Regression and Functional Data"³, we are interested in extending nonparametric methods to the case where the regressors are functions (i.e. one observation could be a curve, a surface or any other object lying in an infinite-dimensional space). From a statistical point of view, this corresponds to a functional regression setting, because one wishes to predict a response Y from an explanatory functional variable X. In addition, only regularity conditions on the regression operator are assumed, which leads us to the nonparametric context: the subject of this work is nonparametric functional regression. Several recent works deal with nonparametric functional regression (see for instance Ferraty and Vieu (2002, 2005)). This method is essentially based on an extension of the well-known Nadaraya (1964)-Watson (1964) kernel estimator of the regression to the case of a functional explanatory variable. On the other hand, local linear ideas have been developed in the regression context for univariate and multivariate explanatory variables; see Wand and Jones (1995) for an overview. Our work can therefore be considered as an extension that combines nonparametric local regression methods with the ideas of functional variables. However, the functional setting makes both the asymptotic study and the implementation of a natural generalization of the multivariate local linear method difficult. We therefore focus on a simpler and faster local approach. Asymptotic properties are stated, and a functional dataset illustrates the good behavior of this fast functional locally modelled regression method.

    ³ Acknowledgement. The authors gratefully thank the members of the working group STAPH (http://www.lsp.ups-tlse.fr/staph) for their helpful comments and discussions. In addition, the first author acknowledges financial support from the Spanish Ministry of Education and Science, under project BEC2001-0535.
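    To make the functional setting concrete, the following Python sketch shows the basic functional Nadaraya-Watson estimator in the spirit of Ferraty and Vieu, that is, the local constant starting point, not the locally modelled (FFLM) estimator developed in Chapter 3. It assumes curves observed on a common grid, an L2 semi-metric approximated by the trapezoid rule, and a quadratic kernel supported on [0, 1]; these choices and the function names are illustrative assumptions.

```python
import numpy as np

def l2_semimetric(curves_a, curves_b, grid):
    """Approximate L2 distance between discretized curves (one curve per row).

    Uses trapezoid-rule weights on `grid`; returns a (len(curves_a), len(curves_b)) matrix.
    """
    steps = np.diff(grid)
    trap = np.zeros(len(grid))
    trap[:-1] += steps / 2.0
    trap[1:] += steps / 2.0
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt((diff ** 2) @ trap)

def functional_nw(x_new, curves, y, grid, h):
    """Functional Nadaraya-Watson prediction of a scalar response for each row of x_new."""
    u = l2_semimetric(x_new, curves, grid) / h
    # asymmetric quadratic kernel on [0, 1]: distances are nonnegative
    weights = np.where(u <= 1.0, 1.5 * (1.0 - u ** 2), 0.0)
    denom = weights.sum(axis=1)
    numer = weights @ y
    # if no training curve lies within distance h, fall back to the sample mean
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, numer / safe, y.mean())

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 101)
    amps = np.array([0.0, 1.0, 2.0, 3.0])
    curves = amps[:, None] * np.sin(np.pi * grid)[None, :]   # X_i(t) = a_i sin(pi t)
    print(functional_nw(curves[2:3], curves, amps, grid, h=1.0))
```

    The bandwidth h plays the same role as in the multivariate case, but it acts on distances between whole curves, so the choice of semi-metric is as important as the choice of h.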


  • Introduction and Summary in Spanish

    This thesis consists of three chapters, which address three different, although related, problems, ranging from estimation and hypothesis testing to theoretical developments. More precisely, in Chapter 1, "The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric", we are interested in the appropriate choice of a smoothing parameter, a problem that is fundamental for a reasonable use of semi- and nonparametric methods. In particular, for hypothesis testing we note that this problem is not equivalent to the one that arises in regression analysis, that is, in plain estimation. At least from a theoretical point of view, the smoothing parameter for hypothesis testing has different (convergence) rates from those that are optimal for estimation.

    While there is a growing literature on how to find an appropriate smoothing parameter for the alternative hypothesis, almost nothing is known about how to choose a smoothing parameter in practice for the null hypothesis if it is also semiparametric or even nonparametric. We only know that, asymptotically, an oversmoothed parameter is necessary in the pre-estimation of the model under the null used to generate the bootstrap samples; see Härdle and Marron (1990, 1991). In practice, however, this knowledge is of little help. The same can be said about various parameters and procedures that must be chosen in practice when applying a testing procedure. In this Chapter, then, we discuss these choice questions. In particular, we study the problem of choosing the smoothing parameter in the pre-estimation used to generate the bootstrap samples. As an alternative, we also briefly discuss the possibility of subsampling.

    In Chapter 2, "Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves", we focus on the application of additive partially linear models, based on some ideas from Chapter 1. Our goal is an application in consumer theory, specifically to systems of Engel curves. The shape of the Engel curve has long been a subject of research in applied econometrics, and so far there are no definitive conclusions about it. In this chapter an additive partially linear model is used to estimate semiparametrically the effect of total expenditure. Additionally, we consider the nonparametric inclusion of some regressors that traditionally have a nonlinear effect, such as age and schooling. To carry out this work, we compare an additive partially linear model with a fully nonparametric model using some recently developed test statistics. Since inference in nonparametric regression can be carried out in several ways, the most natural is to use nonparametric regression as the alternative hypothesis against a semiparametric null hypothesis. We then check whether a PLM provides a reasonable fit to the data, using different resampling methods to obtain critical values for these test statistics by bootstrap and subsampling.

    In this chapter we also deal with a common problem in the context of Engel curves, namely that total expenditure is jointly determined with expenditure on the different goods, so there is potential endogeneity. To solve this problem we use constructed regressors as instrumental variables, in addition to variables from other databases. In particular, we use the method developed by Sperlich (2005), called nonparametric two-step generated regressors and constructed variables (NP2SCV). Our impression is that NP2SCV, in combination with additive partially linear models, does help to eliminate endogeneity in the estimation of Engel curves.

    In Chapter 3, "Locally Modelled Regression and Functional Data", we are interested in extending nonparametric methods to the case where the regressors are functions (i.e. an observation could be a curve, a surface, or any other object lying in an infinite-dimensional space). From a statistical point of view, this corresponds to functional regression, because we wish to predict a response variable Y from a functional explanatory variable X. Moreover, only regularity conditions are imposed on the regression operator. This leads to a nonparametric context, so the subject of this work is nonparametric functional regression. Recently, several papers have dealt with nonparametric functional regression (see for instance Ferraty and Vieu (2002, 2005)). These essentially extend the Nadaraya (1964)-Watson (1964) kernel estimator to the case of a functional explanatory variable. On the other hand, local regression ideas have been developed in the context of univariate and multivariate regression; see Wand and Jones (1995). Our method is therefore an extension that combines local regression methods with current ideas on functional variables. However, this goal is not easy to reach as regards both the asymptotic study and the implementation of a natural generalization of the multivariate local linear method. We therefore focus on a simpler and faster local approach. Asymptotic properties are established, and functional data illustrate the good behaviour of this fast local regression method.


  • REFERENCES

    Ferraty, F. and P. Vieu (2004). Nonparametric Models for Functional Data, with Applications in Regression, Time Series Prediction and Curve Discrimination. Journal of Nonparametric Statistics, 16(1-2), 111-125.

    Ferraty, F. and P. Vieu (2006). Nonparametric Modelling for Functional Data Analysis: Theory and Practice. Springer, New York (in press).

    Härdle, W. and J. S. Marron (1990). Semiparametric Comparison of Regression Curves. Annals of Statistics, 18, 63-89.

    Härdle, W. and J. S. Marron (1991). Bootstrap Simultaneous Error Bars for Nonparametric Regression. Annals of Statistics, 19, 778-796.

    Sperlich, S. (2005). A Note on Nonparametric Estimation with Constructed Variables and Generated Regressors. Working Paper, Universidad Carlos III.

    Wand, M. P. and M. C. Jones (1995). Kernel Smoothing. Monographs on Statistics and Applied Probability, 60. Chapman & Hall.

    Watson, G. S. (1964). Smooth Regression Analysis. Sankhya, Series A, 26.


  • Chapter 1

    The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric

    1.1 Introduction

    In both applied and mathematical statistics, non- and semiparametric specification testing is still quite a popular research field. Any internet search engine can find several hundred papers dealing with this topic even when looking at the last five years only. It is therefore surprising that so few of them study the problem of choosing an appropriate smoothing parameter, a problem that is fundamental for the reasonable use of these methods. Unfortunately, for testing this problem is not equivalent to the one in regression: it is well known that, at least from a theoretical point of view, the optimal smoothing parameter for testing has different rates from those which are optimal for estimation.

    In the last couple of years there has been a growing literature on adaptive testing. In most cases, the adaptiveness refers to the smoothness of the alternative and deals with the choice of the smoothing parameter for the alternative, or for the test statistic; see e.g. Ledwina (1994), Spokoiny (1996, 1998), Kallenberg & Ledwina (1995), Härdle et al. (2001), Horowitz & Spokoiny (2001), Guerre & Lavergne (2005). Even though these methods have so far had little direct impact, in the sense that we could not find published papers using them (in practice or in theory), they have helped to improve our understanding of the problem.

    However, to our knowledge, all these papers concentrate on testing problems where the null hypothesis is fully parametric. It is not clear to what extent these methods help if the null hypothesis is semi- or nonparametric. This is not such a rare situation, since additivity tests already belong to this family. When the bootstrap is used to determine the critical value, these tests entail at least one more parameter choice: the bandwidth for pre-estimating the model under the null hypothesis, from which the bootstrap samples are later generated. This is necessary because in most cases the bandwidths for the estimation and for the bootstrap should have different rates; see e.g. Härdle & Marron (1990, 1991). Although these authors already mentioned the problem of choosing an appropriate bandwidth, it has hardly been addressed in practical applications. As a consequence, for most published procedures for testing, or for constructing confidence bands, with a semi- or nonparametric null hypothesis, there is no guarantee that the test holds its level or that the bands attain their nominal coverage probability. This has recently been confirmed in the work of Dette et al. (2005) and Rodríguez-Póo et al. (2004). In the former, however, it is not referred to as a bandwidth problem but rather as a problem of correlated designs and dimensionality, because the size distortion is much smaller for uncorrelated designs. In the latter paper the problem is avoided by using subsampling instead of the bootstrap. It should also be mentioned that in that simulation study the authors essentially use a parametric bootstrap, drawing the bootstrap errors from a distribution known up to a certain parameter. Although that unknown parameter depends on nonparametric nuisance parameters, knowledge of the distribution greatly mitigates the impact of the bandwidth on the critical value.
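    The bootstrap step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the procedure used in the simulations: it uses the wild bootstrap with Rademacher weights, resamples residuals from the null fit (only one of several possible choices), and treats `statistic` and `fit_null` as hypothetical user-supplied callables. What it illustrates is that the pre-estimate of the null model, with its own pilot bandwidth `g` (in theory somewhat oversmoothed), generates every bootstrap sample, so `g` directly shapes the approximated critical value.

```python
import numpy as np

def wild_bootstrap_pvalue(X, Y, statistic, fit_null, g, B=199, seed=0):
    """Bootstrap p-value for a test whose null model is itself nonparametric.

    statistic(X, Y) -> float   : test statistic, large values speak against H0
    fit_null(X, Y, g) -> array : pre-estimate of the null regression at the X_i,
                                 computed with the pilot bandwidth g (hypothetical)
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(X, Y)
    m_null = fit_null(X, Y, g)      # null model that generates the bootstrap samples
    resid = Y - m_null              # residuals under the null (one possible choice)
    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=len(Y))   # Rademacher wild-bootstrap weights
        y_star = m_null + resid * v                # bootstrap sample satisfying H0
        t_boot[b] = statistic(X, y_star)
    # proportion of bootstrap statistics at least as large as the observed one
    return float((1 + np.sum(t_boot >= t_obs)) / (B + 1))
```

    In an additivity test, `fit_null` would be a marginal integration fit and `statistic` a quantity such as $\tau_1$; the same `statistic` must be used on the original and on the bootstrap samples.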

    To study the problem outlined above in more detail, we concentrate on testing additivity. We limit ourselves to the test statistics proposed in Dette et al. (2005) and Rodríguez-Póo et al. (2004), but we try different modifications, methods of bandwidth choice, and subsampling. The aim is neither to find the most efficient additivity test nor to propose new ones. Our focus is only on finding a method that guarantees that the test holds its level while retaining nontrivial power when the null hypothesis and the resampling method are non- or semiparametric. After a review of the additivity tests considered here, we therefore study different procedures for bandwidth choice. Unfortunately, we have not found a generally valid method; our conclusion is basically that further research is necessary.
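    Subsampling, one of the alternatives tried here, needs no pre-estimation of the null model. A minimal Python sketch, assuming random subsamples of size b drawn without replacement (rather than all subsets) and a test that rejects for large values of a statistic that is already centred (for example $\tau_j - \hat\mu_j$) and is then multiplied by a user-supplied rate such as $nk^{d/2}$; the function name and arguments are hypothetical.

```python
import numpy as np

def subsampling_pvalue(X, Y, statistic, rate, b, n_sub=500, seed=0):
    """Subsampling p-value for a test that rejects for large values.

    statistic(X, Y) -> float : centred statistic; it should adapt its bandwidth to len(Y)
    rate(n) -> float         : normalizing rate, e.g. n * k(n) ** (d / 2)
    b                        : subsample size, with b -> infinity and b / n -> 0
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    t_full = rate(n) * statistic(X, Y)
    t_sub = np.empty(n_sub)
    for s in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)   # subsample without replacement
        t_sub[s] = rate(b) * statistic(X[idx], Y[idx])
    # share of normalized subsample statistics at least as large as the full-sample one
    return float(np.mean(t_sub >= t_full))
```

    Under the alternative the full-sample statistic grows faster than the subsample statistics, which is what gives the test its power; the price is the extra choice of b.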

    The rest of the chapter is organized as follows. In the next section we review the estimation and testing procedures considered in this work. In Section 1.3 we discuss the different scenarios among which the practitioner has to choose, including modifications of the test statistics and resampling methods. Section 1.4 summarizes the main findings from our simulation results, and Section 1.5 concludes.

    1.2 Statistical Methods: Estimators and Test Statistics

    1.2.1 Estimators

    We consider the following model:

    $$Y_i = m(X_i) + u_i, \qquad i = 1, 2, \ldots, n, \tag{1.1}$$

    with $\{(X_i, Y_i)\}_{i=1}^n \in \mathbb{R}^d \times \mathbb{R}$ i.i.d., $m: \mathbb{R}^d \to \mathbb{R}$ the unknown function of interest, $m(x) = E(Y \mid X = x)$, and $u_i$ i.i.d. random errors with $E[u_i] = 0$ and finite variance.

    The internalized Nadaraya-Watson estimator is defined as

    $$\hat m_k(x) = \frac{1}{n}\sum_{i=1}^n v_k(x, X_i)\, Y_i, \qquad v_k(x, X_i) = \hat f_k(X_i)^{-1}\, \mathcal{K}_k(x - X_i), \tag{1.2}$$

    where $\hat f_k(X_j) = \frac{1}{n}\sum_{r=1}^n \mathcal{K}_k(X_j - X_r)$ is a kernel density estimator (unlike in the standard Nadaraya-Watson estimator, here $\hat f_k(X_i)^{-1}$ appears inside the summation; see Jones et al. (1994)), and $\mathcal{K}_k(u) = \prod_{\alpha=1}^d K_k(u_\alpha)$ is a product kernel with $K_k(u) = k^{-1}K(u/k)$. Commonly, the kernel $K$ is assumed to be Lipschitz continuous with compact support and $\int |K(x)|\,dx < \infty$, $\int K(x)\,dx = 1$. Furthermore, $k$ is the bandwidth, assumed to go to zero as the sample size $n$ goes to infinity, but with $nk^d \to \infty$. Let $V_k$ be the $n \times n$ matrix whose $(j,i)$ element is $n^{-1} v_k(X_j, X_i)$; then $\hat m_k = V_k Y$, where $Y$ and $\hat m_k$ are $n \times 1$ vectors whose $j$th entries are $Y_j$ and $\hat m_k(X_j)$, respectively.
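    A direct Python transcription of (1.2) may help fix ideas. This sketch assumes an Epanechnikov kernel (Lipschitz continuous with compact support, as required above) and one bandwidth $k$ for all directions; `smoother_matrix` returns $V_k$, so that $\hat m_k = V_k Y$ at the design points. The function names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def product_kernel(diff, k):
    """Product kernel prod_a k^{-1} K(u_a / k) for differences of shape (..., d)."""
    return np.prod(epanechnikov(diff / k) / k, axis=-1)

def internalized_nw(x_eval, X, Y, k):
    """hat m_k(x) = n^{-1} sum_i hat f_k(X_i)^{-1} K_k(x - X_i) Y_i, cf. (1.2)."""
    n = len(Y)
    # density estimate at the design points; includes r = i, so it is strictly positive
    f_hat = product_kernel(X[:, None, :] - X[None, :, :], k).mean(axis=1)
    K = product_kernel(x_eval[:, None, :] - X[None, :, :], k)   # K_k(x - X_i)
    return (K / f_hat[None, :]) @ Y / n

def smoother_matrix(X, k):
    """V_k with (j, i) element n^{-1} v_k(X_j, X_i)."""
    n = X.shape[0]
    K = product_kernel(X[:, None, :] - X[None, :, :], k)
    f_hat = K.mean(axis=1)
    return K / f_hat[None, :] / n
```

    Because the density is evaluated at $X_i$ rather than at $x$, each observation's weight depends only on its own neighbourhood; this is what makes the linear-smoother representation $V_k Y$ convenient for the marginal integration step.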


    We are interested in the additive model, which we write as

    $$E(Y \mid X = x) = m_S(x) = \psi + \sum_{\alpha=1}^d m_\alpha(x_\alpha), \tag{1.3}$$

    where for identification we set $E_{X_\alpha}\{m_\alpha(X_\alpha)\} = \int m_\alpha(x) f_\alpha(x)\,dx = 0$ for all $\alpha$. Here $m_\alpha$, $\alpha = 1, \ldots, d$, are the marginal impact functions of the regressors, so $\psi$ is a constant equal to the unconditional expectation of $Y$. Writing $m(X) = m_\alpha(X_\alpha) + m_{-\alpha}(X_{-\alpha})$, where $X_{-\alpha}$ is the vector $X$ of all explanatory variables without $X_\alpha$, i.e. $X_{i,-\alpha} = (X_{i1}, \ldots, X_{i(\alpha-1)}, X_{i(\alpha+1)}, \ldots, X_{id})$, we can use the identification condition directly to estimate $m_\alpha$. The so-called marginal integration idea is based on the fact that, for fixed $x_\alpha$,

    $$E_{X_{-\alpha}}\left[m(x_\alpha, X_{-\alpha})\right] = \int m(x_\alpha, x_{-\alpha})\, f_{-\alpha}(x_{-\alpha})\,dx_{-\alpha} = \psi + m_\alpha(x_\alpha).$$

    Substituting for $m(\cdot)$ a nonparametric pre-estimator such as the one given in (1.2), a sample average for the expectation, and for $\psi$ simply $\hat\psi = \frac{1}{n}\sum_{i=1}^n Y_i$ gives (neglecting the constant for a moment for the sake of simplicity)

    $$\hat m_\alpha(x_\alpha) = \frac{1}{n}\sum_{i=1}^n w_{\alpha h}(x_\alpha, X_i)\, Y_i,$$

    where, with $\hat f$ and $\hat f_{-\alpha}$ kernel density estimates of the densities of $X$ and $X_{-\alpha}$,

    $$w_{\alpha h}(x_\alpha, X_i) = K_h(x_\alpha - X_{i\alpha})\, \frac{\hat f_{-\alpha}(X_{i,-\alpha})}{\hat f(X_i)}. \tag{1.4}$$

    Finally, we set $\hat m_S(X_j) = \hat\psi + \sum_{\alpha=1}^d \hat m_\alpha(X_{j\alpha})$ for each $j = 1, 2, \ldots, n$. Note that, defining $W_h := \sum_{\alpha=1}^d W_{\alpha h}$, with $W_{\alpha h}$ the $n \times n$ matrix with elements $n^{-1} w_{\alpha h}(X_{j\alpha}, X_i)$, one has $\hat m_S = \hat\psi + W_h Y$. For more details see Dette et al. (2005), where some of the test statistics we consider here are also introduced and discussed.
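    The marginal integration fit built from (1.4) can be sketched in Python as follows. It is a minimal illustration, not the authors' implementation: it assumes $d \ge 2$, an Epanechnikov kernel, and a single hypothetical bandwidth `g` for both density estimates $\hat f$ and $\hat f_{-\alpha}$; it also subtracts $\hat\psi$ from each component, so that $\hat m_S = \hat\psi + \sum_\alpha \hat m_\alpha$ does not count the constant $d + 1$ times.

```python
import numpy as np

def epan(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kde_at_points(Z, bw):
    """Product-kernel density estimate at the sample points Z (n x q), own point included."""
    diff = (Z[:, None, :] - Z[None, :, :]) / bw
    return np.prod(epan(diff) / bw, axis=-1).mean(axis=1)

def marginal_integration(X, Y, h, g):
    """Fitted additive null model hat m_S(X_j) = hat psi + sum_a hat m_a(X_ja).

    h: bandwidth of K_h in the direction of interest; g: bandwidth of the density
    estimates (both hypothetical choices; the text leaves pre-estimation bandwidths open).
    """
    n, d = X.shape
    psi = Y.mean()
    f_full = kde_at_points(X, g)                            # hat f(X_i)
    fitted = np.full(n, psi)
    for a in range(d):
        rest = np.delete(X, a, axis=1)                      # X_{i,-a}
        ratio = kde_at_points(rest, g) / f_full             # hat f_{-a}(X_{i,-a}) / hat f(X_i)
        K = epan((X[:, None, a] - X[None, :, a]) / h) / h   # K_h(X_ja - X_ia)
        W_a = K * ratio[None, :] / n                        # n^{-1} w_ah(X_ja, X_i)
        fitted += W_a @ Y - psi                             # centred component hat m_a(X_ja)
    return fitted
```

    Every observation enters each component through the ratio of density estimates, which is why the design correlation and the dimension, stressed in Dette et al. (2005), matter so much for the quality of this pre-estimate.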

    1.2.2 Test Statistics

    As mentioned above, we do not introduce new testing procedures but rather study two statistics which have already been studied in Dette et al. (2005), together with other additivity tests, and which turned out to perform best. We add a new test statistic motivated by one that was introduced recently by Rodríguez-Póo et al. (2005), and which performed excellently in the study by Roca-Pardiñas & Sperlich (2006). For more details on the test statistics readers are referred to these papers.

    The null hypothesis of interest is Ho : rrc(-) = ms(-) versus Hi : m(-) ^ ms{-).

    We consider the following two test statistics from Dette et al (2005) :

    n = Í ¿ ( m ( * i ) - m s ( * i ) ) M * í ) , n /—J ¿=i

    1 " T2 = -S^ei{rh{Xi)-rns(Xl))w{Xl),

    n ¿—•* n ¿ = 1

    where $\hat e_i = Y_i - \hat m_s(X_i)$, i.e. the residuals under the null hypothesis, and $\hat u_i = Y_i - \hat m(X_i)$, the residuals without restrictions. Obviously, $\tau_1$ directly calculates the integrated squared difference between the null and alternative models. Alternatively, $\tau_2$ seeks to mitigate the bias problem inherited from the estimate $\hat m$, which suffers from the curse of dimensionality. In Dette et al (2005) it is proved that, for both statistics, $n k^{d/2}(\tau_j - \mu_j)$ converges under the null to a normal variable with mean zero and variance $v_j^2$, $j = 1, 2$, where

    $$\mu_1 = E_{H_0}[\tau_1] = \frac{1}{n k^{d}} \int \sigma^2(x) w(x)\,dx \int K^2(x)\,dx + \ldots,$$

    $$\mu_2 = E_{H_0}[\tau_2] = \ldots$$


    where, for ease of presentation and implementation, $K(\cdot)$ is the same kernel function as in the last subsection, and k again its bandwidth. It is straightforward to derive from the above-mentioned paper that $n k^{d/2}(\tau_3 - \mu_3)$ converges under the null to a normal variable with mean zero and variance $v_3^2$, where

    $$\mu_3 = E_{H_0}[\tau_3] = \frac{1}{n k^{d}} \int (K * K)(x)\,dx \int \sigma^2(x) f^2(x) w(x)\,dx.$$

    All tests have been proven to be consistent, in the sense that under the alternative the standardized statistics $n k^{d/2}(\tau_j - \mu_j)$ diverge to infinity as $n \to \infty$.

    Finally, let us mention that we also studied other test statistics, e.g. further ones given in Dette et al (2005). These, however, performed even less satisfactorily, so we omit them here.
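    Once both fits are available, $\tau_1$ and $\tau_2$ are plain weighted sample averages. The following Python sketch assumes that the fitted values $\hat m(X_i)$ and $\hat m_s(X_i)$ have already been computed by whichever kernel estimators are used; the function name `dette_statistics` is our own, and the bias terms $\mu_j$ and variances $v_j^2$ are not computed.

```python
import numpy as np


def dette_statistics(y, m_alt, m_null, w):
    """Compute tau_1 and tau_2 from fitted values (illustrative sketch).

    y      : responses Y_i, shape (n,)
    m_alt  : unrestricted fit m_hat(X_i) (bandwidth k), shape (n,)
    m_null : additive fit m_hat_s(X_i) (bandwidth h), shape (n,)
    w      : weights w(X_i), e.g. 0/1 trimming indicators, shape (n,)
    """
    y, m_alt, m_null, w = (np.asarray(a, dtype=float) for a in (y, m_alt, m_null, w))
    diff = m_alt - m_null               # m_hat(X_i) - m_hat_s(X_i)
    e_hat = y - m_null                  # residuals under the null hypothesis
    tau1 = np.mean(diff ** 2 * w)       # weighted squared distance between the fits
    tau2 = np.mean(e_hat * diff * w)    # null residuals times the distance
    return float(tau1), float(tau2)
```

    Because $\tau_2$ multiplies the distance by the null residuals instead of squaring it, the squared estimation error of $\hat m$ does not enter it directly, which is the bias argument given above.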

    1.3 Resampling and Choice of Parameters

    As is well known, asymptotic expressions are of little help in practice for calculating exact critical values, for several reasons: bias and variance contain unknown expressions which have to be estimated nonparametrically, and the convergence rate is quite slow for large d. For this reason it is common to use resampling methods, either bootstrap or subsampling procedures, to approximate the critical value of the particular sample statistic. Unfortunately, unlike for subsampling, for the bootstrap it is not known how to choose in practice the smoothing parameter of the pre-estimation, i.e. of the model that is used to generate the bootstrap samples. From theory it is known that one should oversmooth somewhat (see for instance Härdle and Marron (1991) and the discussion below). For the choice of k (when estimating the alternative), some procedures are provided in the literature (see our brief discussion of adaptive tests in the introduction). We will come back to this point later in this section.


    1.3.1 Bootstrap Tests

    We give the general procedure first and then discuss some details:

    1. With bandwidth h, calculate the estimate $\hat m_s$ under the null hypothesis of additivity and its resulting residuals $\hat e_i$, $i = 1, \ldots, n$.

    2. With bandwidth k, calculate the estimator $\hat m$ for the conditional expectation without the additivity restriction, and the corresponding residuals $\hat u_i$, $i = 1, \ldots, n$.

    3. With the results from steps 1 and 2, calculate the test statistics $\tau_1$, $\tau_2$ and $\tau_3$.

    4. Repeat step 1, but now with a bandwidth $h_b$ which depends on h from step 1. We call the outcome $\hat m_s^{h_b}$, with residuals $\tilde e_i = Y_i - \hat m_s^{h_b}(X_i)$, $i = 1, \ldots, n$. Draw random variables $\varepsilon_i^*$ with $E[\varepsilon_i^*] = 0$ and $E[(\varepsilon_i^*)^j] = \hat u_i^j$ (respectively $\hat e_i^j$ or $\tilde e_i^j$, see the discussion below) for $j = 2, 3$ (respectively only $j = 2$, see below again). Set $Y_i^{*,b} = \hat m_s^{h_b}(X_i) + \varepsilon_i^*$, $i = 1, \ldots, n$. Repeat this B times. This defines B different bootstrap samples $\{(X_i, Y_i^{*,b})\}_{i=1}^{n}$, $b = 1, \ldots, B$.

    5. For each bootstrap sample from step 4, calculate the test statistics $\tau_j^{*,b}$, $j = 1, 2, 3$, $b = 1, \ldots, B$. Then, for each test statistic $\tau_j$, the critical value is approximated by the corresponding quantile of the empirical distribution of its B bootstrap analogues, $\hat F_j^*(u) = \frac{1}{B}\sum_{b=1}^{B} 1\{\tau_j^{*,b} \le u\}$. Recall that these are generated under the null hypothesis.
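    Steps 3 to 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in our simulations: the callables `statistic` and `fit_null_b` (hypothetical names) stand for the test statistic and for the null fit with bandwidth $h_b$, the bootstrap errors use golden-cut multipliers, and the helpers `toy_null_fit` and `toy_statistic` replace the kernel estimators by linear least-squares fits purely so that the sketch runs.

```python
import numpy as np

# Golden-cut multipliers V: E[V] = 0, E[V^2] = 1, E[V^3] = 1.
SQ5 = np.sqrt(5.0)
GC_VALUES = np.array([-(SQ5 - 1.0) / 2.0, (SQ5 + 1.0) / 2.0])
GC_PROBS = np.array([(SQ5 + 1.0) / (2.0 * SQ5), (SQ5 - 1.0) / (2.0 * SQ5)])


def wild_bootstrap_test(X, Y, statistic, fit_null_b, resid, B=200, alpha=0.05, rng=None):
    """Approximate the critical value of a statistic by the wild bootstrap.

    statistic(X, Y) -> float  : recomputes the test statistic on a sample
    fit_null_b(X, Y) -> array : null-model fit with the bootstrap bandwidth h_b
    resid                     : residuals whose moments the bootstrap errors mimic
    """
    rng = np.random.default_rng() if rng is None else rng
    Y = np.asarray(Y, dtype=float)
    resid = np.asarray(resid, dtype=float)
    tau = statistic(X, Y)                                # step 3, original sample
    m_b = fit_null_b(X, Y)                               # step 4, pre-estimation under H0
    tau_star = np.empty(B)
    for b in range(B):
        v = rng.choice(GC_VALUES, size=len(Y), p=GC_PROBS)
        y_star = m_b + resid * v                         # Y*_i = null fit + eps*_i
        tau_star[b] = statistic(X, y_star)               # step 5
    crit = np.quantile(tau_star, 1.0 - alpha)            # bootstrap critical value
    p_value = float(np.mean(tau_star >= tau))
    return tau, crit, p_value


# Hypothetical stand-ins: linear least squares instead of kernel smoothers.
def _ols_fit(Z, Y):
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Z @ beta


def toy_null_fit(X, Y):
    """Additive (here: linear) fit without interaction."""
    return _ols_fit(np.column_stack([np.ones(len(Y)), X]), Y)


def toy_statistic(X, Y):
    """Mean squared distance between a fit with interaction and the additive fit."""
    Z = np.column_stack([np.ones(len(Y)), X, X[:, 0] * X[:, 1]])
    return float(np.mean((_ols_fit(Z, Y) - toy_null_fit(X, Y)) ** 2))
```

    Which residuals are passed as `resid` ($\hat u_i$, $\hat e_i$ or $\tilde e_i$) is exactly the open question of Subsection 1.3.4; the sketch leaves that choice to the caller.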

    This procedure is well known, has been proven consistent for many test statistics, and has therefore been applied, with slight modifications, to many non- and semiparametric testing problems. However, several questions of practical importance remain open: the choice of the bandwidth h in step 1 and of k in step 2, how to generate the bootstrap errors $\varepsilon_i^*$ in step 4 (see above), and how to choose $h_b$. Finally, there is the question of how many bootstrap samples are necessary to get a reasonable


    approximation of the distribution in step 5. In this paper we will discuss all these

    questions except the last one.

    1.3.2 The Choice of Bandwidths h

    The problem of finding an optimal h is somewhat different from that of finding the optimal smoothing parameter k, which is directly linked to the optimal rate of the test statistic. For k it is clear that a theoretically optimal choice depends on the optimal rate at which the test can detect a deviation from the null hypothesis; for further details see the next subsection. In most cases, however, the estimator of the null model can converge faster than that of the alternative, so the asymptotics of the test statistics provide no theoretical guideline for an optimal choice of h. In other words, we have to rely on practical considerations.

    As there exist data-adaptive methods for finding the optimal bandwidth k for the alternative (compare the next subsection), one could argue that h should be chosen according to k. This way one could guarantee that the same smoothness is imposed on the regression function regardless of whether it is estimated under the null hypothesis or not. However, it is not clear whether this is always desirable. Moreover, we will see later that, on the one hand, the adaptive choice of k is computationally intensive and, on the other hand, $h_b$ depends on h. For k one needs a grid search, which would then have to be extended to the choice of h (as it would depend on k) and thus to the choice of $h_b$. Altogether we would get a procedure that is computationally quite unattractive.

    Intuitively, it seems desirable to have a reasonable estimate of the null model, and this is only guaranteed by a reasonable choice of h beforehand. We therefore recommend cross-validation or plug-in methods.

    1.3.3 The Choice of Bandwidths k

    It is known that a bandwidth k which is optimal for estimation is usually suboptimal for testing. More specifically, for testing, the optimal smoothing parameter converges to zero at a faster rate, i.e. we should undersmooth. In regression, cross-validation


    bandwidths have a tendency to undersmooth in practice, and they are also quite

    popular for nonparametric testing.
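    To illustrate cross-validation for the unrestricted estimator, here is a minimal sketch of leave-one-out cross-validation for k. It assumes a Nadaraya-Watson fit with the product quartic kernel (the estimator is not specified at this point, and the function names are hypothetical). The normalizing factor $k^{-d}$ cancels in the Nadaraya-Watson ratio and is therefore omitted.

```python
import numpy as np


def quartic(u):
    """Quartic (biweight) kernel 15/16 (1 - u^2)^2 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)


def loo_cv_score(X, Y, k):
    """Leave-one-out CV criterion of a Nadaraya-Watson fit with bandwidth k."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    diff = (X[:, None, :] - X[None, :, :]) / k      # (n, n, d) scaled differences
    W = np.prod(quartic(diff), axis=2)               # product kernel weights
    np.fill_diagonal(W, 0.0)                         # leave observation i out
    denom = W.sum(axis=1)
    ok = denom > 0                                   # skip points without neighbours
    if not ok.any():
        return np.inf
    fit = (W[ok] @ Y) / denom[ok]
    return float(np.mean((Y[ok] - fit) ** 2))


def cv_bandwidth(X, Y, grid):
    """Return the grid bandwidth minimising the leave-one-out CV criterion."""
    scores = [loo_cv_score(X, Y, k) for k in grid]
    return grid[int(np.argmin(scores))]
```

    The full $n \times n$ weight matrix keeps the sketch short but limits it to moderate n, such as the n = 100 used in Section 1.4.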

    As an alternative, let us consider the adaptive testing approach introduced, e.g., in Spokoiny (1996, 1998). It has been extended by Rodríguez-Póo et al (2004) to nonparametric testing problems such as those we consider here. The method is the same for each of our three test statistics, so we drop the index j of $\tau_j$, $j = 1, 2, 3$, in this subsection. Adapted to our problem, it works as follows:

    We consider simultaneously a family of tests $\{\tau_k, k \in \mathcal{K}\}$, where $\mathcal{K} = \{k_1, k_2, \ldots, k_P\}$ is a finite set of reasonable bandwidths. The theoretical maximal number P depends on n but is of no practical relevance; for details see Horowitz & Spokoiny (2001). Define

    $$\tau_{\max} = \max_{k \in \mathcal{K}} \frac{\tau_k - E_0[\tau_k]}{\mathrm{Var}^{1/2}[\tau_k]},$$

    where $E_0[\cdot]$ indicates the expectation under $H_0$. This studentization under the null only serves to correct for the differences in distribution caused by the different bandwidths k. Therefore, instead of $\mathrm{Var}^{1/2}[\tau_k]$ we could take something proportional to it without losing consistency, as long as it corrects for the differences in standard deviation caused by the different $k = k_1, \ldots, k_P$.

    A particularity of the bootstrap analogues of $\tau_{\max}$ is that one first needs to calculate the bootstrap statistics $(\tau_k)^{*,b}$ for all bandwidths k in the grid in order to obtain $(\tau_{\max})^{*,b}$ afterwards. Note that, for each k, the empirical moments of the bootstrap statistics $(\tau_k)^{*,b}$ (average, respectively standard deviation) can be used in practice as substitutes for $E_0[\tau_k]$, respectively $\mathrm{Var}^{1/2}[\tau_k]$. This is what we do in our simulation study.
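    The studentization with bootstrap moments can be sketched as below, assuming that the statistics $\tau_k$ and their bootstrap analogues $(\tau_k)^{*,b}$ have already been computed for every bandwidth in the grid (the array layout and the function name are our assumptions). The same centering and scaling are applied to the original and to the bootstrap statistics before taking the maximum over k.

```python
import numpy as np


def adaptive_max_test(tau_k, tau_k_star, alpha=0.05):
    """k-adaptive max test with bootstrap studentization (sketch).

    tau_k      : shape (P,), statistic tau_k on the original sample, one per bandwidth
    tau_k_star : shape (B, P), bootstrap statistics (tau_k)^{*,b}
    """
    tau_k = np.asarray(tau_k, dtype=float)
    tau_k_star = np.asarray(tau_k_star, dtype=float)
    centre = tau_k_star.mean(axis=0)            # substitute for E_0[tau_k]
    scale = tau_k_star.std(axis=0, ddof=1)      # substitute for Var^{1/2}[tau_k]
    t_max = float(np.max((tau_k - centre) / scale))
    # Bootstrap analogues (tau_max)^{*,b}: same studentization, maximum over k per b.
    t_max_star = np.max((tau_k_star - centre) / scale, axis=1)
    crit = float(np.quantile(t_max_star, 1.0 - alpha))
    return t_max, crit, bool(t_max > crit)
```

    Since the whole matrix of bootstrap statistics is needed before any maximum can be taken, the cost grows linearly in the number of bandwidths P, which is one reason for the computational concerns raised in Subsection 1.3.2.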

    1.3.4 The Choice of Bootstrap Residuals

    From a theoretical point of view, wild bootstrap errors should be drawn from the residuals of the alternative model, i.e. the $\hat u_i$ should be used in step 4 of Subsection 1.3.1 instead of $\tilde e_i$ or $\hat e_i$. This should maximize the power, as the variance of $\tilde e_i$ (and $\hat e_i$) can increase greatly with increasing distance between $H_0$ and the true model. Arguments in favor of using $\tilde e_i$ are purely practical: often the size


    distortion in bootstrap tests is worse when using $\hat u_i$ or $\hat e_i$; when using adaptive procedures as described in Subsection 1.3.3, it is not clear which of the $\hat u_i$ to use (they depend on k), or whether the $\hat u_i$ should even be estimated independently of the choice of k for the test; and at least in the study of Dette et al (2005), to which our study comes closest, the power loss is negligible, so the size argument is decisive.

    We conclude so far that, if no adaptive choice of k is made, it would be desirable to use the residuals $\hat u_i$ of $\hat m$, as long as one can control the size distortion.

    The second question is which distribution should be used for generating the random errors. In step 4 of the bootstrap procedure described in Subsection 1.3.1, one often takes a distribution that gives $\varepsilon_i^*$ with mean zero and $E[(\varepsilon_i^*)^j]$ equal to the j-th power of the chosen residual for j = 2, 3 (or even higher). The so-called golden-cut wild bootstrap, which matches the first three moments, is quite popular; see e.g. Härdle & Mammen (1993). More recently, in the context of size distortion of bootstrap tests, Davidson & Flachaire (2001) argue that for problems with moderate sample size the disadvantages of bootstraps that adapt higher-order moments outweigh their (asymptotic) advantages. We therefore compare different methods in our simulations (see Section 1.4).

    1.3.5 An Alternative: Subsampling

    An increasingly popular alternative to bootstrapping is the subsampling procedure; see Politis et al (1999). To date, as subsampling is commonly believed to converge more slowly in practice than the bootstrap, it has been used almost exclusively when the bootstrap fails, i.e. when it has been shown to be inconsistent. See Neumeyer & Sperlich (2006) for an example in a purely nonparametric testing context. However, Rodríguez-Póo et al (2004) use subsampling in the context discussed here, although the bootstrap is consistent there, because of the size distortion their bootstrap test suffered from (unless the sample size was huge). In both papers subsampling works well. The former also studies the automatic choice of the subsample size m (with m < n), which turns out to work in their simulations. As this method might be adapted to serve as a procedure for finding $h_b$, we briefly introduce subsampling and the automatic choice of the subsample size m:


    Let $\mathcal{Y} = \{(X_i, Y_i) \mid i = 1, \ldots, n\}$ be the original sample, and denote by $\tau(\mathcal{Y})$ the statistic calculated from this sample, leaving aside the index j = 1, 2, 3 for a moment. To determine the critical values we need to approximate

    $$Q(z) = P\big(n k^{d/2}\, \tau(\mathcal{Y}) < z\big). \qquad (1.6)$$

    Recall that under $H_0$ the centered statistic $n k^{d/2}(\tau - \mu)$ converges to an $N(0, v^2)$; for μ and v see Subsection 1.2.2. For finite sample size n, drawing B subsamples $\mathcal{Y}_m^b$, each of size m, we can approximate Q under $H_0$ by

    $$\hat Q(z) := \frac{1}{B}\sum_{b=1}^{B} 1\big\{m\, k_m^{d/2}\, \tau_{k_m}(\mathcal{Y}_m^b) < z\big\}.$$

    Note that the awkward notation comes from the fact that we have to adjust all bandwidths to the new sample size m. For example, suppose $k = k_0 n^{-\delta}$ with $k_0$ a constant. Then $\tau_{k_m}$ is calculated like τ but with bandwidth $k_m = k_0 m^{-\delta}$. Under the alternative $H_1$, of course, not only $n k^{d/2}\tau(\mathcal{Y})$ but also $m k_m^{d/2}\tau_{k_m}(\mathcal{Y}_m)$ diverges to infinity. Requiring $m/n \to 0$ guarantees that $n k^{d/2}\tau(\mathcal{Y})$ diverges (much) faster than its subsample analogues. Then $\hat Q$ underestimates the quantiles of Q, which yields the rejection of $H_0$.
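    A minimal sketch of this subsampling approximation follows. It assumes bandwidths of the form $k = k_0 n^{-\delta}$ and a user-supplied callable `statistic(X, Y, k)` (hypothetical) that computes τ with bandwidth k; subsamples are drawn without replacement, and any statistic with the scaling described above can be plugged in.

```python
import numpy as np


def subsampling_test(X, Y, statistic, k0, delta, m, d, B=200, alpha=0.05, rng=None):
    """Subsampling critical value for n k^{d/2} tau (illustrative sketch).

    statistic(X, Y, k) -> float : tau computed with bandwidth k (hypothetical callable)
    Bandwidths: k = k0 * n**(-delta) on the full sample, k_m = k0 * m**(-delta).
    """
    rng = np.random.default_rng() if rng is None else rng
    X, Y = np.asarray(X), np.asarray(Y)
    n = len(Y)
    k = k0 * n ** (-delta)
    t_full = n * k ** (d / 2.0) * statistic(X, Y, k)          # n k^{d/2} tau(Y)
    k_m = k0 * m ** (-delta)                                   # bandwidth adjusted to m
    t_sub = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=m, replace=False)             # subsample without replacement
        t_sub[b] = m * k_m ** (d / 2.0) * statistic(X[idx], Y[idx], k_m)
    crit = float(np.quantile(t_sub, 1.0 - alpha))              # (1 - alpha)-quantile of Q_hat
    return float(t_full), crit, bool(t_full > crit)
```

    The full-sample statistic is compared with the $(1-\alpha)$-quantile of $\hat Q$; the sketch takes m as given, and its choice is the topic of the next paragraph.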

    The problem here is to find a proper subsample size m. Actually, the optimal m is a function of the level α. Again we apply resampling methods: draw some pseudo samples $\mathcal{Y}^{*,l}$, $l = 1, \ldots, L$, of size n with the same distribution as $\mathcal{Y}$. For the desired level α, test $\tilde H_0: m(x) - m_s(x) = \hat m(x) - \hat m_s(x)$ in the same way as you want to test $H_0: m(x) = m_s(x)$, i.e. apply your particular test statistic to $\tilde H_0$ and use subsampling. From the L repetitions you can determine the empirical rejection level (estimated size) for your given α. Now find an m such that this empirical rejection level is approximately α. In practice, you choose from a grid of possible m the one whose estimated rejection level for $\tilde H_0$ is closest to α from below. Note that $\tilde H_0$ is always true up to an estimation error that should be almost the same as in your original test. The only drawback of this procedure is the enormous computational effort. For further details and examples see Politis et al (1999), Delgado et al (2001), and Neumeyer & Sperlich (2006).
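    The selection rule "closest to α from below" amounts to a small grid search. In this sketch the expensive part, estimating the size for a given tuning value from the L pseudo samples, is hidden in a user-supplied callable `estimate_size` (hypothetical); the same rule can be reused for any tuning parameter on a grid, such as the subsample size m.

```python
import numpy as np


def calibrate_on_grid(grid, estimate_size, alpha):
    """Pick the grid value whose estimated size is closest to alpha from below.

    estimate_size(g) -> float : empirical rejection rate of the test of the
                                pseudo-sample null over the L pseudo samples
                                when tuning value g is used (hypothetical callable)
    Returns None if no grid value keeps the level.
    """
    sizes = {g: float(estimate_size(g)) for g in grid}
    admissible = {g: s for g, s in sizes.items() if s <= alpha}
    if not admissible:
        return None
    return max(admissible, key=admissible.get)   # largest size not exceeding alpha


def empirical_size(rejections):
    """Share of pseudo samples in which the (true) pseudo-sample null was rejected."""
    return float(np.mean(np.asarray(rejections, dtype=bool)))
```

    Each call of `estimate_size` requires L complete subsampling tests, so the total cost is roughly the grid size times L times B evaluations of the statistic, which is the computational burden mentioned above.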


    1.3.6 The Choice of Bootstrap Bandwidth $h_b$

    In general, for many test statistics one could repeat the arguments outlined in Härdle & Marron (1990, 1991): for the mean of $\hat m_h(x) - m(x)$ under the conditional distribution of $Y_1, \ldots, Y_n \mid X_1, \ldots, X_n$, respectively of $\hat m_h^*(x) - \hat m_{h_b}(x)$ under the conditional distribution of $Y_1^*, \ldots, Y_n^* \mid X_1, \ldots, X_n$, we know from Rosenblatt (1969) that

    $$E_{Y|X}\big(\hat m_h(x) - m(x)\big) \approx h^2 \frac{\mu_2(K)}{2}\, m''(x), \qquad (1.8)$$

    $$E^*\big(\hat m_h^*(x) - \hat m_{h_b}(x)\big) \approx h^2 \frac{\mu_2(K)}{2}\, \hat m_{h_b}''(x), \qquad (1.9)$$

    where $\mu_2(K) = \int u^2 K(u)\,du$. Obviously, we need $\hat m_{h_b}''(x) - m''(x) \to 0$. The optimal bandwidth $h_b$ for estimating the second derivative is known to be much larger (in rates) than the optimal h for estimating the function itself. We can even give the optimal rate: for example, the optimal bandwidth for estimating $m_s''$ is of order $n^{-1/9}$ (instead of $n^{-1/5}$), an observation we make use of in our simulation studies in Section 1.4.

    As will be seen once more in Section 1.4, the common statement that $h_b$ has to oversmooth is of little help in practice. We therefore try the following automatic bandwidth choice: apply the same procedure as for the automatic choice of a proper subsample size m (last subsection) to find an adequate $h_b$ for a given level α. We explain this in more detail, and try it out, in our simulation study.

    1.4 Simulation Results

    To study all the points listed in the last section, we performed a large number of simulations. We give here only a summary of them, limiting the presentation to $\tau_j$, j = 1, 2, 3, one particular model, one specific (random) design, and sample size n = 100.

    The model considered is as follows. We use the same data generating process


    as Dette et al (2005). That is, we draw i.i.d. three-dimensional explanatory variables

    $$X_i \sim N(0, \Sigma_X) \quad \text{with} \quad \Sigma_X = \begin{pmatrix} 1 & 0.2 & 0.4 \\ 0.2 & 1 & 0.6 \\ 0.4 & 0.6 & 1 \end{pmatrix}$$

    and i.i.d. error terms $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$ to generate

    $$Y_i = X_{1,i} + X_{2,i}^2 + 2\sin(\pi X_{3,i}) + a X_{2,i} X_{3,i} + \varepsilon_i, \qquad i = 1, \ldots, n,$$

    with a = 0 to generate an additively separable model, or a = 2 for the alternative. Recall that the target is a test for additivity. Unless otherwise indicated, we set $\sigma_\varepsilon = 1$. Dette et al (2005) show that in the rather unrealistic situation where $\Sigma_X$ is the identity matrix (i.e. an uncorrelated design), the problem is greatly simplified, whereas a (much) more strongly correlated design than ours leads to identification problems for moderate sample sizes.
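    The data generating process can be simulated as follows. This is a Python/NumPy sketch with our own function and constant names, taking the second component to be $X_{2,i}^2$ (our reading of the model).

```python
import numpy as np

SIGMA_X = np.array([[1.0, 0.2, 0.4],
                    [0.2, 1.0, 0.6],
                    [0.4, 0.6, 1.0]])


def simulate(n, a, sigma_eps=1.0, rng=None):
    """Draw (X, Y) from the design; a = 0 gives the additive null, a = 2 the alternative."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.multivariate_normal(np.zeros(3), SIGMA_X, size=n)   # correlated design
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    m = x1 + x2 ** 2 + 2.0 * np.sin(np.pi * x3) + a * x2 * x3   # interaction only if a != 0
    Y = m + sigma_eps * rng.normal(size=n)
    return X, Y
```

    For a given seed the design X does not depend on a, so null and alternative samples can share the same covariates if desired.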

    All results in the tables are calculated from 250 replications using 200 bootstrap samples (or subsamples, respectively). For real data applications, 200 bootstrap samples are certainly very few, but in our simulations the results differed little when we increased the number to 500. We used the (multiplicative) quartic kernel throughout.

    In all three test statistics we use the weighting function $w(\cdot)$ to implement different degrees of trimming: we cut the outer 10%, 5% or 0% of the sample, where "outer" refers to the tails of the explanatory variables. This is done to remove boundary effects from the statistics. The tables only give results for 5% and 0% trimming, as the boundary effects turned out not to be a major problem.
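    One way to build such trimming weights is sketched below. The exact rule is an assumption on our part: the trimmed share is split equally between the two tails of every explanatory variable, and an observation is kept only if all its coordinates lie inside the retained ranges, so with several covariates the total share removed can exceed the nominal one.

```python
import numpy as np


def trimming_weights(X, trim):
    """0/1 weights w(X_i) removing the outer `trim` share in the tails of each covariate.

    Assumption: `trim` is split equally between the lower and the upper tail of every
    explanatory variable, and an observation is kept only if all coordinates survive.
    """
    X = np.asarray(X, dtype=float)
    X = X.reshape(len(X), -1)                    # treat 1-D input as a single covariate
    if trim <= 0:
        return np.ones(len(X))
    lo = np.quantile(X, trim / 2.0, axis=0)      # per-covariate lower cut-off
    hi = np.quantile(X, 1.0 - trim / 2.0, axis=0)
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    return inside.astype(float)
```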

    To further speed up our simulation studies, we first looked for an average cross-validation bandwidth k, which turned out to be $k_{opt} = 0.78$. Then we ran all simulations for the non-adaptive tests (compare Subsection 1.3.3) with $k_{opt}$. This was done not only for computational reasons but also because otherwise the size of the tests would additionally depend on the randomness induced by the estimation of k. For the adaptive test procedure, k ran over a grid of 10 bandwidths placed around $k_{opt}$. We verified that in most cases the maximum defining $\tau_{\max}$ was not attained at the boundary of the grid, i.e. at $k_{\min}$ or $k_{\max}$.


    As discussed above, the bandwidth choice problem is different for h. Here, the parameter responsible for the size, $h_b$, depends on both α (the level) and h. It is therefore no problem that h is chosen by cross-validation in each simulation run, as recommended in Subsection 1.3.2. For the internalized marginal integration estimator, cross-validation bandwidths were introduced by Kim et al (1999). For the nuisance directions $X_{-\alpha}$ (see equation (1.4) in Section 1.2.1) we used $h_{-\alpha} = 6 \cdot h$, as recommended in Dette et al (2005) and Hengartner & Sperlich (2005).

    We tried different bootstrap residuals (compare Subsection 1.3.4). Our simulations mainly seem to confirm the findings discussed above. Therefore, below we report only results referring to $\varepsilon_i^* = \tilde e_i V_i$, where the $V_i$ are i.i.d., drawn either from the golden-cut distribution

    $$V_i = \begin{cases} -(\sqrt{5} - 1)/2 & \text{with probability } p = (\sqrt{5} + 1)/(2\sqrt{5}), \\ (\sqrt{5} + 1)/2 & \text{with probability } 1 - p, \end{cases}$$

    or from the standard normal N(0, 1). However, we admit that it may be interesting to try further automatic choice procedures for $h_b$, in order to study again what effect the choice of residuals ($\hat u_i$, $\hat e_i$ or $\tilde e_i$) has.
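    The two multiplier distributions differ in how many moments they match. The sketch below (names are ours) draws bootstrap errors as residual times multiplier V and computes the exact moments of the golden-cut distribution, $E[V] = 0$ and $E[V^2] = E[V^3] = 1$; for $N(0, 1)$ multipliers $E[V^3] = 0$, so only the first two moments of the residuals are reproduced.

```python
import numpy as np

# Golden-cut two-point distribution for the multipliers V_i.
SQ5 = np.sqrt(5.0)
GOLDEN_VALUES = np.array([-(SQ5 - 1.0) / 2.0, (SQ5 + 1.0) / 2.0])
GOLDEN_PROBS = np.array([(SQ5 + 1.0) / (2.0 * SQ5), (SQ5 - 1.0) / (2.0 * SQ5)])


def golden_cut_moment(order):
    """Exact moment E[V**order] of the golden-cut distribution."""
    return float(np.sum(GOLDEN_PROBS * GOLDEN_VALUES ** order))


def bootstrap_errors(resid, kind, rng):
    """Return eps*_i = resid_i * V_i with i.i.d. multipliers V_i.

    kind = "golden"   : golden-cut V (matches mean 0, second and third moments)
    kind = "gaussian" : V ~ N(0, 1) (matches mean 0 and second moment only)
    """
    resid = np.asarray(resid, dtype=float)
    if kind == "golden":
        v = rng.choice(GOLDEN_VALUES, size=resid.shape, p=GOLDEN_PROBS)
    elif kind == "gaussian":
        v = rng.standard_normal(resid.shape)
    else:
        raise ValueError("kind must be 'golden' or 'gaussian'")
    return resid * v
```

    Matching fewer moments is precisely what the Davidson & Flachaire (2001) argument favors for moderate sample sizes.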

    Probably the most interesting and challenging point is the choice of $h_b$. We first give the results for several choices of $h_b$ with different bootstrap generating methods, for k-adaptive and non-adaptive procedures. To have $h_b$ as a function of h, to take into account $h/h_b \to 0$, and perhaps to validate the rate $n^{-1/9}$ (motivated in Subsection 1.3.6), we set $h_b = h\, n^{1/5 - 1/\kappa}$ and try different values κ ≤ 9.
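    If h is of order $n^{-1/5}$, then $h_b = h\, n^{1/5 - 1/\kappa}$ is of order $n^{-1/\kappa}$: κ = 5 reproduces h, κ = 9 corresponds to the rate $n^{-1/9}$ of Subsection 1.3.6, and κ < 5 undersmooths relative to h. A small Python illustration (the values h = 0.5 and n = 100 are arbitrary):

```python
def bootstrap_bandwidth(h, n, kappa):
    """Pre-estimation bandwidth h_b = h * n**(1/5 - 1/kappa)."""
    return h * n ** (1.0 / 5.0 - 1.0 / kappa)


# kappa = 5 returns h itself; kappa > 5 oversmooths and kappa < 5 undersmooths
# relative to h (for n > 1).
h, n = 0.5, 100
for kappa in range(4, 10):
    print(f"kappa = {kappa}: h_b = {bootstrap_bandwidth(h, n, kappa):.3f}")
```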

    Table 1.1 shows the results for the non-adaptive golden-cut bootstrap test. These results basically (i) confirm the statements of Dette et al (2005) for our context, and (ii) show that the problem is not solved simply by different smoothing in the pre-estimation. Oversmoothing, as generally recommended from a theoretical point of view, seems to go in the wrong direction. In particular, the hope that the results of Rosenblatt (1969) (see equations (1.8) and (1.9)) might give us a hint, or even provide a rule of thumb for the choice of $h_b$, is not confirmed here. $\tau_3$, introduced by Rodríguez-Póo et al (2004), clearly outperforms the others in this study (as it does in what follows).


    The results for the k-adaptive analogues (see Table 1.2) show hardly any improvement. In particular, the problem of choosing $h_b$, or in other words the size problem, is only mitigated for $\tau_3$.

    Following to some extent the findings of Davidson & Flachaire (2001), we then repeated these two studies with the Gaussian bootstrap (see above). Though there is some improvement in both size and power, the results in Tables 1.3 and 1.4 give us hope only for test statistic $\tau_3$. Note that the observation that slight undersmoothing produces much better results than oversmoothing did not change over the four different trials.

    Next, for comparison, we also provide a small simulation study where the critical values are approximated by subsampling, trying several subsample sizes m. The results are given in Table 1.5 for non-adaptive tests and in Table 1.6 for k-adaptive tests. We tried more sizes m for the non-adaptive tests but obtained reasonable results only for $\tau_3$. In contrast, looking at the k-adaptive versions, $\tau_1^{\max}$ and $\tau_2^{\max}$ seem to work, too, though with rather weak power. Table 1.6 is unfortunately misleading concerning $\tau_3^{\max}$: one needs a much smaller m to get reasonable results here. A small simulation study evaluating the automatic choice of m seems to indicate that this procedure might work, and it should therefore be tried for what is our main focus: the automatic choice of $h_b$.

    Therefore, we adjusted the automatic choice of the subsample size to find an adequate $h_b$ (see Subsection 1.3.6). This was done as follows, described here in detail for $\tau_3$. Let $\mathcal{Y}^* := \{(X_i^*, Y_i^*)\}_{i=1}^{n}$ be a member of the pseudo sequence introduced in Subsection 1.3.5. Then, for testing $\tilde H_0: m(x) - m_s(x) = \hat m(x) - \hat m_s(x)$ with sample $\mathcal{Y}^*$, an analogue of $\tau_3$ would be

    $$\frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{n k^{d}}\sum_{j=1}^{n} K\Big(\frac{X_i^* - X_j^*}{k}\Big)\{Y_j^* - \hat m_s(X_j^*)\}\Big]^2 w(X_i^*). \qquad (1.10)$$


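    Other statistics are certainly conceivable, e.g.

    $$\frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{n k^{d}}\sum_{j=1}^{n}\Big( K\Big(\frac{X_i - X_j^*}{k}\Big)\{Y_j^* - \hat m_s(X_j^*)\} - K\Big(\frac{X_i - X_j}{k}\Big)\{Y_j - \hat m_s(X_j)\}\Big)\Big]^2 w(X_i),$$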

    but they should all be asymptotically equivalent to (1.10). The procedure was performed with only L = 100 pseudo samples $\mathcal{Y}^*$. As the results varied widely, we were forced either to enlarge L considerably or to reduce $\sigma_\varepsilon$ considerably. For computational reasons we chose the second option and repeated the study with $\sigma_\varepsilon = 0.1$.

    Some results are summarized in Table 1.7. As can be seen, this time we put much more emphasis on undersmoothing. First, look at the lines for the pseudo-sample statistic (1.10) to find the κ giving the rejection level closest to α = 5% from below. Here, this is always κ = 3. Note that this might change depending on the trimming, α, the sample size, etc. It is important to understand that these lines can always be calculated, i.e. without knowing the true data generating process; we therefore call this method fully automatic. Now look at the lines for $\tau_3$, the test of interest. Indeed, κ = 3 is the best possible choice: it holds the level and has the strongest power of any κ respecting the level. This could be taken as an indication that our suggestion for selecting $h_b$ works. Unfortunately, the method does not work equally well for all α; specifically, it becomes quite inaccurate for α > 10%. We repeated this study for $\tau_1$ and $\tau_2$ as well. The results were always somewhat worse than for $\tau_3$, so they do not change our conclusion that this procedure seems to be an interesting and promising approach, but that further research is necessary.

    1.5 Conclusions

    We discuss the choices of all "parameters" a practitioner has to make when facing a kernel-based specification test where the null hypothesis is non- or semiparametric. We put parameters in quotation marks because we also refer to questions such as how to generate bootstrap errors. However, our main focus is the bootstrap and its size distortion in practice when the sample size is small or moderate.


    These points are illustrated with the popular problem of additivity testing. Naturally, one looks for an optimal trade-off between controlling the size under the null hypothesis $H_0$ and maximizing power. Even though these problems have already been discussed and studied in theory, it is as yet unclear how to set these parameters in practice. We show that theory is not just unhelpful here; at present, a reasonable application of tests of this kind is questionable.

    We try and compare many modifications that can be found in the literature, without finding any clue to an optimal, or even a reasonable, parameter choice. While there are different suggestions for individual problems, such as which residuals to take for the bootstrap or an adaptive choice of k, combining them gives puzzling results: in practice, the power sometimes goes down where it should increase, or the size becomes less precise where it should come closer to the nominal level.

    Altogether, we recommend certain procedures for particular test statistics. However, the main open question seems to be how to choose $h_b$ automatically. We suggest a new procedure, borrowed from subsampling theory, that seems to be a good way to go. However, further research is necessary to provide reliable procedures for the nonparametric testing problems considered here.


    REFERENCES

    Davidson, R. and Flachaire, E. (2001) The Wild Bootstrap, Tamed at Last. Working Paper 1000, Department of Economics, Queen's University.

    Delgado, M.A., Rodríguez-Póo, J.M. and Wolf, M. (2001) Subsampling Cube Root Asymptotics with an Application to Manski's MSE. Economics Letters, 73, 241-250.

    Dette, H., von Lieres und Wilkau, C. and Sperlich, S. (2005) A Comparison of Different Nonparametric Methods for Inference on Additive Models. Journal of Nonparametric Statistics, 17, 57-81.

    Guerre, E. and Lavergne, P. (2005) Data-Driven Rate-Optimal Specification Testing in Regression Models. Annals of Statistics, 33(2), 840-870.

    Härdle, W. and Marron, J.S. (1990) Semiparametric Comparison of Regression Curves. Annals of Statistics, 18, 63-89.

    Härdle, W. and Marron, J.S. (1991) Bootstrap Simultaneous Error Bars for Nonparametric Regression. Annals of Statistics, 19, 778-796.

    Härdle, W. and Mammen, E. (1993) Comparing Nonparametric Versus Parametric Regression Fits. Annals of Statistics, 21, 1926-1947.

    Härdle, W., Sperlich, S. and Spokoiny, V. (2001) Structural Tests in Additive Regression. Journal of the American Statistical Association, 96, 1333-1347.

    Hengartner, N.W. and Sperlich, S. (2005) Rate Optimal Estimation with the Integration Method in the Presence of Many Covariates. Journal of Multivariate Analysis, 95, 246-272.

    Horowitz, J.L. and Spokoiny, V. (2001) An Adaptive, Rate-Optimal Test of a Parametric Mean-Regression Model Against a Nonparametric Alternative. Econometrica, 69(3), 599-631.

    Jones, M.C., Davies, S.J. and Park, B.U. (1994) Versions of Kernel-Type Regression Estimators. Journal of the American Statistical Association, 89, 825-832.


    Kallenberg, W.C.M. and Ledwina, T. (1995) Consistency and Monte Carlo Simulations of a Data Driven Version of Smooth Goodness-of-Fit Tests. Annals of Statistics, 23, 1594-1608.

    Kim, W., Linton, O.B. and Hengartner, N. (1999) A Computationally Efficient Oracle Estimator of Additive Nonparametric Regression with Bootstrap Confidence Intervals. Journal of Computational and Graphical Statistics, 8, 278-297.

    Ledwina, T. (1994) Data-Driven Version of Neyman's Smooth Test of Fit. Journal of the American Statistical Association, 89, 1000-1005.

    Neumeyer, N. and Sperlich, S. (2006) Comparison of Separable Components in Different Samples. Forthcoming in the Scandinavian Journal of Statistics.

    Politis, D.N., Romano, J.P. and Wolf, M. (1999) Subsampling. Springer Series in Statistics. Springer.

    Roca-Pardiñas, J. and Sperlich, S. (2006) Testing the Link when the Index is Semiparametric: A Comparison Study. Working Paper, Universidad de Vigo, Spain.

    Rodríguez-Póo, J.M., Sperlich, S. and Vieu, P. (2004) An Adaptive Specification Test for Semiparametric Models. Working Paper, Universidad Carlos III de Madrid, Spain.

    Rosenblatt, M. (1969) Conditional Probability Density and Regression Estimators. In: Multivariate Analysis II, 25-31.

    Spokoiny, V. (1996) Adaptive Hypothesis Testing Using Wavelets. Annals of Statistics, 24, 2477-2498.

    Spokoiny, V. (1998) Adaptive and Spatially Adaptive Testing of a Nonparametric Hypothesis. Mathematical Methods of Statistics, 7, 245-273.


                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  K     T1     T2     T3       T1     T2     T3
    0%        5      4     .000   .000   .008     .000   .032   .248
    0%        5      5     .040   .000   .008     .004   .012   .184
    0%        5      6     .068   .000   .008     .012   .012   .184
    0%        5      7     .128   .000   .012     .016   .012   .196
    0%        5      8     .176   .000   .012     .028   .012   .228
    0%        5      9     .256   .000   .012     .028   .024   .252
    0%        10     4     .024   .000   .032     .004   .060   .448
    0%        10     5     .068   .000   .024     .008   .028   .312
    0%        10     6     .120   .000   .024     .020   .020   .292
    0%        10     7     .184   .000   .024     .032   .024   .300
    0%        10     8     .272   .000   .020     .036   .028   .304
    0%        10     9     .344   .000   .020     .056   .032   .340
    5%        5      4     .012   .000   .008     .016   .052   .248
    5%        5      5     .060   .000   .008     .020   .032   .192
    5%        5      6     .108   .000   .008     .028   .028   .184
    5%        5      7     .172   .000   .012     .040   .028   .184
    5%        5      8     .284   .000   .012     .064   .032   .228
    5%        5      9     .340   .000   .012     .080   .032   .244
    5%        10     4     .040   .000   .024     .024   .112   .448
    5%        10     5     .084   .000   .020     .036   .076   .308
    5%        10     6     .168   .000   .024     .044   .052   .284
    5%        10     7     .288   .000   .024     .064   .048   .292
    5%        10     8     .364   .000   .020     .076   .052   .308
    5%        10     9     .440   .000   .020     .116   .056   .332

    Table 1.1: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with the golden-cut wild bootstrap, using $h_b = h n^{1/5-1/K}$ for the pre-estimation.
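    The golden-cut wild bootstrap mentioned in the caption of Table 1.1 multiplies each pre-estimation residual by an independent two-point random variable. The Python sketch below assumes the standard golden-cut distribution, with support points (1 - √5)/2 and (1 + √5)/2 chosen so that the multiplier has mean 0, variance 1 and third moment 1. The function names are hypothetical, and the fitted values and residuals of the pilot fit are taken as given rather than computed with the chapter's estimator.

```python
import numpy as np

# Support points and probability of the golden-cut two-point distribution.
SQRT5 = np.sqrt(5.0)
V_LOW = (1.0 - SQRT5) / 2.0   # about -0.618
V_HIGH = (1.0 + SQRT5) / 2.0  # about 1.618 (the golden ratio)
P_LOW = (5.0 + SQRT5) / 10.0  # P(V = V_LOW), about 0.724


def golden_cut_multipliers(n, rng):
    """Draw n i.i.d. multipliers with mean 0, variance 1 and third moment 1."""
    return np.where(rng.random(n) < P_LOW, V_LOW, V_HIGH)


def wild_bootstrap_responses(fitted, residuals, rng):
    """One wild bootstrap sample: Y* = fitted + residual * V with golden-cut V."""
    return fitted + residuals * golden_cut_multipliers(len(residuals), rng)


def exact_moments():
    """Exact E[V], E[V^2], E[V^3] of the two-point law (0, 1, 1 up to rounding)."""
    p, q = P_LOW, 1.0 - P_LOW
    return tuple(p * V_LOW**k + q * V_HIGH**k for k in (1, 2, 3))


rng = np.random.default_rng(0)
# Hypothetical pilot fit: zero fitted values and unit residuals for five observations.
y_star = wild_bootstrap_responses(np.zeros(5), np.ones(5), rng)
```

    Matching the third moment, and not only the first two, is what distinguishes this choice from simpler sign-flipping multipliers when the residuals are skewed.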


                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  K     T1     T2     T3       T1     T2     T3
    0%        5      4     .004   .004   .028     .044   .032   .176
    0%        5      5     .004   .004   .020     .064   .056   .204
    0%        5      6     .000   .000   .012     .048   .036   .204
    0%        5      7     .000   .000   .000     .036   .032   .196
    0%        5      8     .000   .000   .000     .036   .012   .196
    0%        5      9     .000   .000   .000     .032   .008   .188
    0%        10     4     .016   .012   .076     .096   .072   .316
    0%        10     5     .012   .008   .072     .140   .120   .308
    0%        10     6     .008   .004   .056     .132   .092   .296
    0%        10     7     .000   .000   .028     .104   .052   .316
    0%        10     8     .000   .000   .008     .072   .044   .296
    0%        10     9     .000   .000   .008     .064   .036   .284
    5%        5      4     .008   .004   .016     .080   .052   .196
    5%        5      5     .000   .000   .016     .068   .024   .184
    5%        5      6     .000   .000   .008     .036   .016   .188
    5%        5      7     .000   .000   .004     .016   .012   .184
    5%        5      8     .000   .000   .004     .008   .008   .200
    5%        5      9     .000   .000   .004     .008   .004   .192
    5%        10     4     .020   .012   .080     .136   .120   .328
    5%        10     5     .008   .004   .064     .120   .096   .296
    5%        10     6     .004   .000   .040     .116   .060   .296
    5%        10     7     .000   .000   .024     .100   .036   .292
    5%        10     8     .000   .000   .008     .084   .024   .284
    5%        10     9     .000   .000   .008     .056   .016   .288

    Table 1.2: Rejection levels of the three k-adaptive test statistics with and without trimming. Critical values are determined with the golden-cut wild bootstrap, using $h_b = h n^{1/5-1/K}$ for the pre-estimation.


                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  K     T1     T2     T3       T1     T2     T3
    0%        5      4     .004   .000   .008     .000   .036   .340
    0%        5      5     .036   .000   .012     .004   .024   .236
    0%        5      6     .080   .000   .012     .008   .012   .216
    0%        5      7     .132   .000   .012     .016   .016   .224
    0%        5      8     .188   .000   .012     .028   .012   .240
    0%        5      9     .260   .000   .012     .032   .016   .248
    0%        10     4     .020   .000   .044     .012   .064   .560
    0%        10     5     .072   .000   .044     .012   .036   .380
    0%        10     6     .116   .000   .032     .020   .024   .336
    0%        10     7     .196   .000   .028     .036   .032   .332
    0%        10     8     .276   .000   .016     .044   .032   .344
    0%        10     9     .352   .000   .020     .068   .032   .372
    5%        5      4     .012   .000   .008     .008   .080   .324
    5%        5      5     .052   .000   .012     .008   .036   .236
    5%        5      6     .116   .000   .012     .028   .036   .212
    5%        5      7     .176   .000   .012     .040   .028   .220
    5%        5      8     .268   .000   .012     .064   .028   .244
    5%        5      9     .352   .000   .012     .096   .032   .260
    5%        10     4     .028   .000   .048     .036   .172   .556
    5%        10     5     .088   .000   .032     .036   .092   .372
    5%        10     6     .164   .000   .024     .048   .072   .332
    5%        10     7     .252   .000   .020     .060   .056   .328
    5%        10     8     .380   .000   .016     .092   .048   .340
    5%        10     9     .436   .000   .020     .120   .064   .376

    Table 1.3: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with Gaussian bootstrap, using $h_b = h n^{1/5-1/K}$ for the pre-estimation.
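    For the Gaussian bootstrap of Tables 1.3 and 1.4 the residual multipliers are standard normal. The sketch below shows, under that assumption, how a bootstrap critical value can be obtained from a null fit, together with the pre-estimation bandwidth h_b = h n^(1/5 - 1/K) as given in the table captions. The statistic stat_fn is a placeholder: the toy statistic used here is not T1, T2 or T3, and B, the sample size and the function names are illustrative choices.

```python
import numpy as np


def pilot_bandwidth(h, n, K):
    """Pre-estimation bandwidth h_b = h * n**(1/5 - 1/K)."""
    return h * n ** (1.0 / 5.0 - 1.0 / K)


def gaussian_bootstrap_critical_value(stat_fn, x, fitted_null, residuals,
                                      alpha=0.05, B=499, rng=None):
    """(1 - alpha) quantile of stat_fn over Gaussian-multiplier bootstrap samples.

    Each bootstrap response is generated under the null fit:
    Y* = fitted_null + residuals * Z, with Z i.i.d. N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    stats = np.empty(B)
    for b in range(B):
        y_star = fitted_null + residuals * rng.standard_normal(len(residuals))
        stats[b] = stat_fn(x, y_star)
    return np.quantile(stats, 1.0 - alpha)


def toy_stat(x, y):
    """Stand-in statistic for H0: E[Y | X] = 0 (ignores x)."""
    return np.sqrt(len(y)) * abs(y.mean())


rng = np.random.default_rng(0)
n = 500
x = rng.random(n)
y = rng.standard_normal(n)  # data generated under the null
crit = gaussian_bootstrap_critical_value(toy_stat, x, np.zeros(n), y,
                                         alpha=0.05, B=2000, rng=rng)
reject = toy_stat(x, y) > crit
```

    For this toy statistic the bootstrap critical value should be close to 1.96, the 95% quantile of |N(0, 1)|, because the residuals have unit variance.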


                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  K     T1     T2     T3       T1     T2     T3
    0%        5      4     .000   .000   .036     .048   .028   .220
    0%        5      5     .000   .000   .028     .048   .048   .204
    0%        5      6     .000   .000   .020     .052   .032   .216
    0%        5      7     .000   .000   .008     .032   .016   .200
    0%        5      8     .000   .000   .008     .024   .008   .204
    0%        5      9     .000   .000   .008     .016   .008   .200
    0%        10     4     .028   .008   .096     .124   .096   .364
    0%        10     5     .020   .004   .088     .184   .156   .340
    0%        10     6     .004   .000   .056     .172   .124   .328
    0%        10     7     .004   .000   .032     .116   .072   .324
    0%        10     8     .004   .000   .024     .092   .048   .296
    0%        10     9     .000   .000   .016     .076   .032   .304
    5%        5      4     .004   .004   .024     .064   .036   .220
    5%        5      5     .000   .000   .024     .048   .020   .200
    5%        5      6     .000   .000   .012     .032   .012   .196
    5%        5      7     .000   .000   .008     .016   .004   .200
    5%        5      8     .000   .000   .008     .016   .004   .200
    5%        5      9     .000   .000   .008     .016   .004   .204
    5%        10     4     .012   .012   .096     .136   .100   .368
    5%        10     5     .004   .000   .072     .124   .092   .332
    5%        10     6     .000   .000   .048     .100   .048   .300
    5%        10     7     .000   .000   .040     .072   .032   .312
    5%        10     8     .000   .000   .020     .052   .012   .292
    5%        10     9     .000   .000   .012     .040   .012   .296

    Table 1.4: Rejection levels of the three k-adaptive test statistics with and without trimming. Critical values are determined with Gaussian bootstrap, using $h_b = h n^{1/5-1/K}$ for the pre-estimation.


                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  m     T1     T2     T3       T1     T2     T3
    0%        5      50    .000   .000   .000     .004   .000   .028
    0%        5      40    .000   .000   .040     .004   .004   .212
    0%        10     50    .004   .000   .020     .004   .004   .248
    0%        10     40    .028   .000   .224     .004   .004   .744
    5%        5      50    .000   .000   .000     .000   .000   .032
    5%        5      40    .000   .000   .052     .000   .000   .202
    5%        10     50    .000   .000   .028     .000   .000   .232
    5%        10     40    .000   .000   .240     .000   .000   .732

    Table 1.5: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with subsampling, using subsamples of size m.
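    The subsampling critical values reported in Tables 1.5 and 1.6 approximate the null distribution of a statistic by recomputing it on many subsamples of size m. The generic sketch below follows the subsampling idea of Politis, Romano and Wolf (1999) for a root-n consistent statistic. The root-n rate is an assumption: the kernel statistics of this chapter converge at a different rate, so sqrt(m) would have to be replaced by the appropriate normalizing sequence. The function name and the toy statistic (the sample mean) are illustrative.

```python
import numpy as np


def subsampling_critical_value(data, stat_fn, m, alpha=0.05, B=1000, rng=None):
    """Subsampling critical value for a root-n consistent statistic.

    The law of sqrt(n) * (theta_n - theta) is approximated by the empirical law of
    sqrt(m) * (theta_b - theta_n), where theta_b is computed on a random subsample
    of size m drawn without replacement and theta_n on the full sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    theta_n = stat_fn(data)
    roots = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=m, replace=False)
        roots[b] = np.sqrt(m) * (stat_fn(data[idx]) - theta_n)
    return np.quantile(roots, 1.0 - alpha)


# Toy usage: sample mean of 1000 standard normal draws, subsamples of size m = 50.
data = np.random.default_rng(1).standard_normal(1000)
crit_50 = subsampling_critical_value(data, np.mean, m=50, alpha=0.05, B=2000,
                                     rng=np.random.default_rng(2))
```

    As in the tables, the result depends on the choice of m; because subsamples are drawn without replacement, the toy critical value is slightly below 1.645 by a factor of about sqrt(1 - m/n).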

                           Under H0 (a = 0.0)     Under H1 (a = 2.0)
    Trimming  α (%)  m     T1     T2     T3       T1     T2     T3
    0%        5      90    .000   .000   .000     .140   .148   .000
    0%        5      80    .000   .000   .000     .148   .160   .000
    0%        5      70    .056   .088   .000     .156   .168   .000
    0%        5      60    .244   .336   .000     .196   .236   .000
    0%        10     90    .000   .000   .000     .192   .196   .000
    0%        10     80    .028   .072   .000     .192   .208   .000
    0%        10     70    .208   .328   .000     .276   .308   .000
    0%        10     60    .584   .680   .000     .416   .484   .000
    5%        5      90    .000   .000   .000     .080   .104   .000
    5%        5      80    .000   .000   .000     .080   .104   .000
    5%        5      70    .008   .016   .000     .076   .088   .000
    5%        5      60    .060   .084   .000     .064   .076   .000
    5%        10     90    .000   .000   .000     .128   .152   .000
    5%        10     80    .008   .012   .000     .140   .148   .000
    5%        10     70    .048   .096   .000     .132   .160   .000
    5%        10     60    .196   .304   .000     .136   .168   .000

    Table 1.6: Rejection levels of the three k-adaptive test statistics with and without trimming. Critical values are determined with subsampling, using subsamples of size m.


         Under H0 (a = 0)                   Under H1 (a = 2)
         0% trimming      5% trimming       0% trimming      5% trimming
    K    (i)     (ii)     (i)     (ii)      (i)     (ii)     (i)     (ii)
    1    .012    .680     .012    .676      .001    .972     .001    .968
    2    .063    .392     .062    .380      .019    .932     .019    .936
    3    .028    .032     .028    .024      .042    .632     .042    .620
    4    .030    .012     .030    .012      .022    .380     .023    .368
    5    .032    .012     .032    .012      .015    .272     .015    .260
    6    .031    .012     .031    .012      .011    .260     .011    .252
    7    .029    .016     .029    .020      .009    .264     .010    .264

    Table 1.7: Rejection levels of two versions of T3, denoted (i) and (ii), for α = 5%, with and without trimming, using Gaussian bootstrap with $h_b = h n^{1/5-1/K}$ for the pre-estimation.


    Chapter 2

    Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves

    2.1 Introduction

    The specification of Engel curves in empirical microeconomics has been an important problem since the early studies of Working (1943) and Leser (1963) and the well-known work of Deaton and Muellbauer (1980a), in which they developed parametric structures such as the Almost Ideal and Translog demand models. Many microeconomic examples in which a separable structure is convenient for analysis and important for interpretability are provided in Deaton and Muellbauer (1980b). However, there is increasing empirical evidence that some form of nonlinearity is present in the specification of Engel curves. An alternative way of investigating nonlinear effects is to model consumer behavior by means of semi- and nonparametric additive structures. Moreover, non- and semiparametric regression provides an alternative to standard parametric regression, allowing the data to determine the local shape of the conditional mean.

    From an economic point of view, there are many reasons why it is of interest to recover a correct specification of Engel curves. Firstly, a correct specification allows us to examine the effects of indirect tax reforms. Secondly, it is important to specify how consumers respond to changes in total income, since this allows us to assess the impact of such changes on consumers' welfare.

    Consumer demand has become a very important field for applying non- and semiparametric methods. An interesting analysis of the cross-sectional behavior of consumers in the context of a fully nonparametric model can be found in Bierens and Pott-Buter (1990). Papers that implement semiparametric methods in empirical analyses of consumer demand include Banks, Blundell and Lewbel (1997) and Blundell, Duncan and Pendakur (1998). The latter paper is of special interest because its regression analysis is based on semi- and nonparametric specifications of Engel curves. It also tests the Working-Leser and PIGLOG null hypotheses, under which budget shares are linear in the log of total expenditure, against the well-known partially linear model. In this paper we estimate the Engel curves directly, as in Lyssiotou, Pashardes and Stengos (2003), among others.
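    As a reference point, the Working-Leser specification and a generic partially linear Engel curve can be written as follows. The notation is assumed here rather than taken from this chapter: w_j is the budget share of good j, x is total expenditure, z is a vector of demographic covariates with coefficients γ_j, and g_j is an unknown smooth function. The Working-Leser null is the special case in which g_j is linear in ln x.

```latex
\begin{align*}
  w_{j} &= \alpha_j + \beta_j \ln x + \varepsilon_j
        && \text{(Working-Leser)} \\
  w_{j} &= g_j(\ln x) + z^{\top}\gamma_j + \varepsilon_j
        && \text{(partially linear)}
\end{align*}
```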

    We estimate an additive partially linear model (PLM) in order to investigate consumer behavior using individual household data drawn from the Spanish Expenditure Survey (SES), and we use the results of the semiparametric analysis to examine how age, schooling and expenditure should be modelled in a system of Engel curves. The advantage of an additive PLM is that, within this model, the effects of expenditure, age and schooling on consumer demand can be investigated simultaneously in a semiparametric context.¹ There are several ways of estimating a nonparametric additive structure; the most important are smooth backfitting, series estimators and marginal integration. In this paper we use internalized marginal integration to estimate the nonparametric components of the additive PLM, mainly because there is currently no applied or theoretical study of testing procedures based on smooth backfitting.
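    Internalized marginal integration estimates one additive component by a single weighted kernel sum in which an estimated density ratio appears inside the summation. A minimal Python sketch for one component follows. It assumes continuous covariates, a Gaussian product kernel with one common bandwidth h for all density estimates, the sample mean of Y as the estimate of the constant, and no linear part, so it covers a purely additive model rather than the full PLM. The function names and the toy data are hypothetical.

```python
import numpy as np


def gauss(u):
    """Standard normal kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def imie_component(X, Y, alpha, grid, h):
    """Internalized marginal integration estimate of the centred component m_alpha.

    m_alpha(x) ~ (1/n) sum_i K_h(x - X_{i,alpha}) * f_other(X_{i,other}) / f(X_i) * Y_i - mean(Y),
    where f is the joint density and f_other the density of the remaining covariates.
    """
    n, d = X.shape
    scaled = (X[:, None, :] - X[None, :, :]) / h   # (n, n, d) pairwise differences
    k_each = gauss(scaled)                         # kernel value per coordinate
    f_joint = np.prod(k_each, axis=2).mean(axis=1) / h ** d
    others = [k for k in range(d) if k != alpha]
    f_other = np.prod(k_each[:, :, others], axis=2).mean(axis=1) / h ** (d - 1)
    weights = f_other / f_joint * Y                # "internalized" density ratio times Y
    k_grid = gauss((grid[:, None] - X[None, :, alpha]) / h) / h
    return k_grid @ weights / n - Y.mean()


# Toy additive model: Y = (X1^2 - 1) + X2 + noise, with independent N(0, 1) covariates.
rng = np.random.default_rng(0)
n = 800
X = rng.standard_normal((n, 2))
Y = (X[:, 0] ** 2 - 1.0) + X[:, 1] + 0.1 * rng.standard_normal(n)
grid = np.array([-1.0, 0.0, 1.0])
m1_hat = imie_component(X, Y, alpha=0, grid=grid, h=0.3)  # true values: 0, -1, 0
```

    Only one pass over the data per evaluation point is needed, which is the computational appeal of the internalized version; the price is that the estimated components need not add up to an exact additive projection.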

    Most of the papers that investigate consumer behavior in a nonparametric context focus on the appropriate way of modelling the shape of Engel curves. Those that focus on the one-dimensional nonparametric effect of log total expenditure on budget shares, using parametric indices to reflect demographic composition, include Blundell, Browning and Crawford (2003) and the references therein. In this paper we investigate consumer behavior in semi- and nonparametric terms, focusing on the nonparametric effects of total expenditure, age and schooling. Unless stated otherwise, age and schooling refer to the age and schooling of the household head. There is evidence suggesting that these variables have deeper effects than is generally assumed in parametric demand analysis (see Lyssiotou, Pashardes and Stengos (2001)). In fact, it is common practice to include the square of age and/or schooling, as well as higher-order terms, in parametric models to capture possible nonlinear effects.

    ¹ Analysis of consumer behavior can be carried out with fully nonparametric models. However, for the sake of interpretability and implementation, additive models avoid the well-known problems of multidimensional Nadaraya-Watson and local polynomial regression estimators, notably the curse of dimensionality.

    Inference in nonparametric regression can take place in a number of ways. The most natural is to use nonparametric regression as the alternative to a fully parametric or semiparametric null hypothesis. With this in mind, we investigate whether an additive PLM provides an adequate fit to our data, using different resampling schemes to obtain critical values of the test statistics. In this paper we apply some recently developed test statistics that are very popular in the literature on testing semiparametric hypotheses against nonparametric alternatives. These test statistics are in the spirit of Härdle and Mammen (1993) and Gozalo and Linton (2001), among others. On the other hand, there is growing interest in so-called adaptive testing methods, in which the test statistics adapt to the unknown smoothness of the alternative; see, among others, Horowitz and Spokoiny (2001) and Rodríguez-Póo, Sperlich and Vieu (2005). In this paper we adapt their ideas to our problem, with some differences; in particular, we use kernel smoothers.
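    To illustrate the adaptive idea, the sketch below computes a kernel-based statistic for several bandwidths, standardizes each one, and uses the maximum, with a critical value from a Rademacher wild bootstrap. It is in the spirit of Horowitz and Spokoiny (2001) but is neither their statistic nor the one used in this paper: the quadratic form T_h, its variance estimate V_h, the bandwidth grid and all function names are assumptions made for illustration, and the residuals are assumed to come from an already estimated null model.

```python
import numpy as np


def kernel_matrices(x, bandwidths):
    """Unnormalized Gaussian kernel matrices with zero diagonal, one per bandwidth h.

    Constant factors cancel after standardization, so normalization is omitted.
    """
    diff = x[:, None] - x[None, :]
    mats = []
    for h in bandwidths:
        k = np.exp(-0.5 * (diff / h) ** 2) / h
        np.fill_diagonal(k, 0.0)  # drop the i == j terms
        mats.append(k)
    return mats


def max_standardized_stat(r, mats):
    """max over h of T_h / sqrt(V_h), with T_h = sum_{i != j} r_i r_j K_h(x_i - x_j)."""
    r2 = r ** 2
    values = []
    for k in mats:
        t_h = r @ k @ r
        v_h = 2.0 * (r2 @ (k ** 2) @ r2)  # conditional variance estimate under H0
        values.append(t_h / np.sqrt(v_h))
    return max(values)


def adaptive_test(x, residuals, bandwidths, alpha=0.05, B=499, rng=None):
    """Compare the max statistic with a Rademacher wild-bootstrap (1 - alpha) quantile.

    With +/-1 multipliers the squared residuals, and hence every V_h, are unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    mats = kernel_matrices(x, bandwidths)
    observed = max_standardized_stat(residuals, mats)
    boot = np.empty(B)
    for b in range(B):
        signs = rng.choice([-1.0, 1.0], size=len(residuals))
        boot[b] = max_standardized_stat(residuals * signs, mats)
    critical = np.quantile(boot, 1.0 - alpha)
    return observed, critical, observed > critical


# Toy usage: residuals from the null "E[Y | X] = 0" when the truth is sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.random(200)
residuals = np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(200)
observed, critical, reject = adaptive_test(x, residuals, [0.05, 0.1, 0.2], B=199, rng=rng)
```

    Taking the maximum over the bandwidth grid is what makes the test adaptive: no single bandwidth has to be chosen in advance, and the bootstrap accounts for the multiplicity of the grid.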

    A further problem that we may have to consider is the endogeneity of regressors: in the context of Engel curves, total expenditure may well be jointly determined with expenditure on the different goods. The usual approach to this problem is instrumental variable estimation. We note two recently developed procedures for tackling endogenous regressors in nonparametric regression: the so-called nonparametric two-stage least squares (NP2SLS) of Newey and Powell (2003), and the nonparametric two-step approach with generated regressors and constructed variables (NP2SCV) of Sperlich (2005). Newey and Powell's (2003) approach is a cumbersome procedure involving the choice of a basis expansion in the first step, whereas Sperlich's approach only requires a nonparametric, semiparametric or even parametric construction of the regressors of interest in the first step. We believe that a generated-variables approach combined with the additive PLM can help us overcome, to some extent, any possible endogeneity problem, and that is exactly the procedure implemented in this paper.
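    A minimal linear sketch of the generated-regressor (control-function) idea follows: the first step constructs the residual of the endogenous regressor on an instrument, and the second step adds that residual as an extra regressor. This is an illustrative assumption, not necessarily the NP2SCV procedure of Sperlich (2005); in the setting of this paper the second step would be the additive PLM rather than least squares. The instrument z, the simulated design and the function names are hypothetical.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients via numpy.linalg.lstsq."""
    return np.linalg.lstsq(design, y, rcond=None)[0]


def control_function_slope(y, x, z):
    """Two-step generated-regressor estimate of the slope on an endogenous x.

    Step 1: regress x on (1, z) and keep the residual v_hat (the constructed variable).
    Step 2: regress y on (1, x, v_hat); v_hat absorbs the part of the error correlated with x.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    v_hat = x - Z @ ols(Z, x)
    W = np.column_stack([np.ones(n), x, v_hat])
    return ols(W, y)[1]


# Simulated example: true slope 2, error u correlated with the first-stage error v.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
u = rng.standard_normal(n)
v = 0.8 * u + 0.6 * rng.standard_normal(n)
x = z + v
y = 1.0 + 2.0 * x + u

naive = ols(np.column_stack([np.ones(n), x]), y)[1]  # inconsistent: about 2 + 0.8/2 = 2.4
corrected = control_function_slope(y, x, z)          # close to the true slope 2
```

    The naive slope is biased because cov(x, u) = 0.8 with var(x) = 2, whereas conditioning on the constructed residual removes the correlated part of the error.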

    The contribution of this work can be summarized as follows. Firstly, to our knowledge, we are the first to carry out an exploratory analysis of consumer behavior with data drawn from the Family Expenditure Survey for Spain using semiparametric models. Secondly, we apply recently developed methods to estimate the model, to test various model specifications and to correct for possible endogeneity of total expenditure. Thirdly, our estimates of the additive model are accompanied by a reasonable measure of the discrepancy between the fully nonparametric model and the additive estimate; an adequate model check is necessary whenever additive models are estimated (Dette, von Lieres and Sperlich (2004)). Additionally, our discrepancy measure adapts to the unknown smoothness of the nonparametric model, which constitutes a novelty in empirical economics.

    The rest of the paper is organized as follows. In Section 2 we provide some background on both the estimation and the testing procedures. In Section 3 we discuss the shape of Engel curves and report empirical results obtained from the application of the additive PLM. We also report the results of testing the additive specification, as well as the linearity of each nonparametric component in the additive PLM regression. Section 4 concludes.

    2.2 Additive Partially Linear Model and Testing Hypothesis

    There are many fields of empirical economics in which explanatory variables and their squares are included in regression analysis to capture nonlinear effects;


    In order to estimate the functions $m_\alpha(x_\alpha)$, we first estimate the function $m(x)$ with a multidimensional local smoother and then integrate out the variables other than $X_\alpha$. This method can be applied to estimate all the components, and finally the regression function $m(\cdot)$ is estimated by adding the estimated components to an estimator $\hat c$ of the constant $c$, so that

    $$\hat m(X_j) = \hat c + \sum_{\alpha=1}^{d}\left\{\frac{1}{n}\sum_{i=1}^{n} K_h\left(X_{j\alpha}-X_{i\alpha}\right)\frac{\hat f_{\underline{\alpha}}\left(X_{i\underline{\alpha}}\right)}{\hat f\left(X_i\right)}\,Y_i-\hat c\right\}$$

    for $j = 1, \ldots, n$. The expression used to estimate each component $m_\alpha(\cdot)$, defined in [4], is called the internalized marginal integration estimator (IMIE) because the estimated joint density appears under the summation sign. For a detailed explanation see Dette, von Lieres and Sperlich (2004) and the references therein. Note that the IMIE does not exactly provide the orthogonal projection onto the space of additive functions. In other words, the sum of the estimated nonparametric components does

    not necessarily recover the complete conditional mean because