Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation and Application. Jorge Barrientos-Marín
Tesis doctoral de la Universidad de Alicante. Tesi doctoral de la Universitat d'Alacant. 2007
Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation,
and Application.
Jorge Barrientos-Marín
Advisor: Stefan Sperlich
Quantitative Economics Doctorate, Departamento de Fundamentos del Análisis Económico
Universidad de Alicante
January 2007
To my wife and my family.
Acknowledgements
The articles that make up this thesis are the result of five years of continuous work. Without doubt, this would not have been possible without the help of many people, and I want to express my gratitude to all of them. If anyone goes unmentioned, it is most likely that my memory, as usual, is playing tricks on me. I would like to thank the members of the Departamento de Fundamentos del Análisis Económico, my professors and especially my fellow students, who made these five years far from home bearable. Among the professors, special recognition goes to Antonio Villar, who always trusted me and advised me in difficult moments; to Juan Mora, for his encouragement and for the pleasant hours spent discussing results and theorems; and to Javier Alvarez and Lola, whose excellent courses encouraged me to follow the path of econometrics.
Among my fellow students I thank Alicia, who has always been a friend. I thank Ricardo for his help and countless favours (many of them pecuniary), and Paco, Silvio and Szaby for their pleasant company and sincere friendship. Rafael López's good humour and intelligence were a challenge for me. I thank José Maria for his great esteem for me, which is mutual, and for his generosity; these years would have been less fun, and some Christmases sad, without his friendship.
I cannot fail to mention the administrative staff (Mercedes, Mariló, Julio, Carlos and Lourdes), who were always ready to help me and were patient with my countless requests.
I also thank Frédéric Ferraty and Philippe Vieu for their dedication; they provided the best atmosphere for writing one of the chapters of this thesis. Juan and Mónica deserve a mention here: they welcomed me into their home and were always good company, besides introducing me, in no superficial way, to the details of French life.
I thank my family, especially Patricia, my wife, who has supported me through all these years of semi-solitude while waiting for this to end, always with patience and optimism. I thank my mother, whom I know my absence always saddened, and my aunts, who are proud of me. I thank José and Leticia Restrepo for helping Patricia to carry the burden of loneliness.
I thank my friends in Colombia, Mauricio Alviar and Pedro, who believed from the beginning that this could be achieved. I also mention Alejandro Gaviria, who continues to teach me to think like an economist and who gave me sound advice at the right moment.
Finally, special recognition goes to Stefan Sperlich, who has taught me a great deal of semi- and nonparametric econometrics. I will always be grateful to him, because he took care that this ended well and has been an exceptional supervisor even from a distance.
Contents
Acknowledgements
Introduction and Summary
Introduction and Summary (translated from the Spanish version)
1 The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric
  1.1 Introduction
  1.2 Statistical Methods: Estimators and Test Statistics
    1.2.1 Estimators
    1.2.2 Test Statistics
  1.3 Resampling and Choice of Parameters
    1.3.1 Bootstrap Tests
    1.3.2 The Choice of Bandwidths h
    1.3.3 The Choice of Bandwidths k
    1.3.4 The Choice of Bootstrap Residuals
    1.3.5 An Alternative: Subsampling
    1.3.6 The Choice of Bootstrap Bandwidth h_b
  1.4 Simulation Results
  1.5 Conclusions
  References
2 Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves
  2.1 Introduction
  2.2 Additive Partially Linear Model and Testing Hypothesis
  2.3 The Shape of Engel Curves and Specification Testing
    2.3.1 Data Used in this Application
    2.3.2 Some Pictures of the Expenditure-Log Total Expenditure Relationship
    2.3.3 Specification Testing
  2.4 Conclusions and Future Research
  References
3 Locally Modelled Regression and Functional Data
  3.1 Introduction
  3.2 Position of the Problem
  3.3 Functional locally modeled regression
    3.3.1 The p-dimensional case
    3.3.2 The infinite-dimensional case: the functional setting
  3.4 FFLM kernel-type estimator: asymptotic behavior
  3.5 FFLM regression in action
  3.6 Conclusions
  References
Introduction and Summary
This thesis is composed of three chapters, which focus on three related but different issues regarding testing, estimation and theoretical developments [1]. More precisely, in Chapter 1, "The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric", we are interested in choosing an appropriate smoothing parameter, a problem that is fundamental for the reasonable use of non- and semiparametric methods. In particular, we note that for testing this problem is not equivalent to the one in regression. At least from a theoretical point of view, the optimal smoothing parameter for testing has different rates from those which are optimal for estimation.
While there exists an increasing literature on how to find a proper smoothing parameter for the nonparametric alternative, almost nothing is known on how to choose a smoothing parameter in practice for the null hypothesis if it is also semi- or nonparametric. We do know that, at least asymptotically, oversmoothing is necessary in the pre-estimation of the null model for generating the bootstrap samples, see Härdle and Marron (1990, 1991). However, in practice this knowledge is of little help. The same can be said about various parameters and procedures to be chosen in practice when performing such tests. In this chapter, we discuss all these choice questions. In particular, we study the problem of bandwidth choice for the pre-estimation used to generate bootstrap samples. As an alternative, we also briefly discuss the possibility of subsampling [2].
[1] Chapter 1 is joint work with Stefan Sperlich and Chapter 3 is joint work with Frédéric Ferraty and Philippe Vieu.
[2] The authors gratefully acknowledge financial support from the Spanish DG de Investigación del Ministerio de Ciencia y Tecnología, SEJ2004-04583/ECON.
In Chapter 2, "Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves", we focus on an application of the additive partially linear model, drawing on some ideas from the applications in Chapter 1. Our main goal is an application to consumer theory, more exactly to systems of Engel curves. The form of the Engel curve has long been a subject of discussion in applied econometrics, and until now there has been no definitive conclusion about its form. In this chapter an additive partially linear model is used to estimate semiparametrically the effect of total expenditure in this context. Additionally, we consider the nonparametric inclusion of some regressors which traditionally have a nonlinear effect, such as age and schooling. To that end we compare an additive partially linear model with the fully nonparametric one using recent popular test statistics. Inference in nonparametric regression can take place in a number of ways; the most natural is to use the nonparametric regression as the alternative against a fully parametric or semiparametric null hypothesis. We therefore check whether an additive PLM provides a reasonable fit to our data, using bootstrap and subsampling schemes to obtain critical values (p-values) for the proposed test statistics.
Additionally, in this chapter we deal with a well-known problem that is very common in the context of Engel curves: total expenditure may well be jointly determined with expenditure on different goods, so an endogeneity problem may arise. To address this problem we apply nonparametrically constructed regressors as instrumental variables. In particular, we use the nonparametric two-step estimator with generated regressors and constructed variables (NP2SCV) due to Sperlich (2005). Our feeling is that a generated-variables approach in combination with an additive PLM can help us to overcome, to some extent, any possible endogeneity problem, and that is exactly the procedure implemented in this chapter.
In Chapter 3, "Locally Modelled Regression and Functional Data" [3], we are interested in extending nonparametric methods to the case where the regressors are functions (i.e. one observation could be a curve, a surface or any other object lying in an infinite-dimensional space). From a statistical point of view, this corresponds to a functional regression setting, because one wishes to predict a response Y from an explanatory functional variable X. In addition, only regularity conditions on the regression operator are assumed, which leads us to the nonparametric context. The subject of this work is therefore nonparametric functional regression. Several recent works deal with nonparametric functional regression (see for instance Ferraty and Vieu (2002, 2005)). This method is essentially based on an extension of the well-known Nadaraya (1964)-Watson (1964) kernel regression estimator to the case of a functional explanatory variable. On the other hand, local linear ideas have been developed in the regression context for univariate and multivariate explanatory variables, see Wand and Jones (1995) for an overview of this topic. Our work can therefore be considered as an extension that combines nonparametric local regression methods with the ideas of functional variables. However, the functional setting makes neither the asymptotic study nor the implementation of a natural generalization of the multivariate local linear method easy. Therefore, we focus on a simpler and faster local approach. Asymptotic properties are stated, and a functional dataset illustrates the good behavior of this fast functional locally modelled regression method.
[3] Acknowledgement. The authors gratefully thank the members of the working group STAPH (http://www.lsp.ups-tlse.fr/staph) for their helpful comments and discussions. In addition, the first author acknowledges financial support from the Spanish Ministry of Education and Science, under project BEC2001-0535.
Introduction and Summary (translated from the Spanish version)
This thesis consists of three chapters, which focus on three different, although related, problems, ranging from estimation and hypothesis testing to theoretical developments. More precisely, in Chapter 1, "The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric", we are interested in the appropriate choice of a smoothing parameter, a problem that is fundamental for a reasonable use of semi- and nonparametric methods. In particular, for hypothesis testing, we note that this problem is not equivalent to the one that arises in regression analysis, that is, in simple estimation. At least from a theoretical point of view, the parameter chosen for hypothesis testing has rates (of convergence) different from those of the parameters that are optimal for estimation.
While there is a growing literature on how to find an appropriate parameter for the alternative hypothesis, almost nothing is known about how to choose a smoothing parameter in practice for the null hypothesis when it is also semiparametric or even nonparametric. We only know that, asymptotically, an oversmoothed parameter is necessary in the pre-estimation of the model under the null in order to generate the bootstrap samples; see Härdle and Marron (1990, 1991). In practice, however, this knowledge is of little help. The same can be said about several parameters and procedures that have to be chosen in practice when a testing procedure is applied. In this chapter we therefore discuss these choice questions. In particular, we study the problem of choosing the smoothing parameter in the pre-estimation used to generate the bootstrap samples. As an alternative, we also briefly discuss the possibility of subsampling.
In Chapter 2, "Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves", we focus on the application of additive partially linear models, based on some ideas from Chapter 1. Our goal is an application to consumer theory, specifically to systems of Engel curves. The form of the Engel curve has long been a subject of research in applied econometrics, and so far there are no definitive conclusions about its form. In this chapter an additive partially linear model is used to estimate semiparametrically the effect of total expenditure. Additionally, we consider the nonparametric inclusion of some regressors that traditionally have a nonlinear effect, such as age and schooling. To carry out this work, we compare an additive partially linear model with a fully nonparametric model using some recently developed test statistics. Since inference in nonparametric regression can be carried out in several ways, the most natural is to use the nonparametric regression as the alternative hypothesis against a semiparametric null hypothesis. We then check whether a PLM provides a reasonable fit to the data, using different resampling methods to obtain critical values, computed by bootstrap and subsampling, for the test statistics mentioned above.
In this chapter we also address a problem that is common in the context of Engel curves, namely that total expenditure is jointly determined with expenditure on the different goods, so there is potential endogeneity. To solve this problem we use constructed regressors as instrumental variables, in addition to variables from other databases. In particular, we use the method developed by Sperlich (2005), called nonparametrically generated or constructed regressors in two steps (NP2SCV). Our feeling is that NP2SCV, in combination with additive partially linear models, does help to remove the endogeneity in the estimation of Engel curves.
In Chapter 3, "Locally Modelled Regression and Functional Data", we are interested in extending nonparametric methods to the case where the regressors are functions (i.e. one observation could be a curve, a surface or any other object belonging to an infinite-dimensional space). From a statistical point of view, this corresponds to a functional regression, because we wish to predict a response variable Y from a functional explanatory variable X. Moreover, only regularity conditions on the regression operator are assumed, which leads to a nonparametric context. The subject of this work is therefore nonparametric functional regression. Several recent articles deal with nonparametric functional regression (see for example Ferraty and Vieu (2002, 2005)). These essentially consist of extending the Nadaraya (1964)-Watson (1964) kernel estimator to the case of a functional explanatory variable. On the other hand, local regression ideas have been developed in the context of univariate and multivariate regression, see Wand and Jones (1995). Our method is therefore an extension that combines local regression methods with current ideas on functional variables. The task is not easy as regards the asymptotic study and the implementation of a natural generalization of the multivariate local linear method. We therefore focus on a simpler and faster local approach. The asymptotic properties are established, and functional data illustrate the good behavior of this fast local regression method.
REFERENCES
Ferraty, F. and P. Vieu (2004). Nonparametric Models for Functional Data, with Applications in Regression, Time Series Prediction and Curve Discrimination. Journal of Nonparametric Statistics, 16, 1-2, 111-125.
Ferraty, F. and P. Vieu (2006). Nonparametric Modelling for Functional Data Analysis. Theory and Practice. Springer, New York (in print).
Härdle, W. and J. S. Marron (1990). Semiparametric Comparison of Regression Curves. Annals of Statistics, 18, 63-89.
Härdle, W. and J. S. Marron (1991). Bootstrap Simultaneous Error Bars for Nonparametric Regression. Annals of Statistics, 19, 778-796.
Sperlich, S. (2005). A Note on Nonparametric Estimation with Constructed Variables and Generated Regressors. Working Paper. Universidad Carlos III.
Wand, M. P. and M. C. Jones (1995). Kernel Smoothing. Monographs on Statistics and Applied Probability, 60. Chapman & Hall.
Watson, G. S. (1964). Smooth Regression Analysis. Sankhya Ser. A 26.
Chapter 1
The Size Problem of Kernel Based Bootstrap Tests when the Null is Nonparametric
1.1 Introduction
In both applied and mathematical statistics, non- and semiparametric specification testing is still quite a popular research field. Any internet search engine can find several hundred papers dealing with this topic even when looking at the last five years only. Therefore, it is surprising that so few of them study the problem of choosing an appropriate smoothing parameter, a problem that is fundamental for the reasonable use of these methods. Unfortunately, for testing this problem is not equivalent to the one in regression. It is well known that, at least from a theoretical point of view, the optimal smoothing parameter for testing has different rates from those which are optimal for estimation.
In the last couple of years there has been a growing amount of literature on adaptive testing. In most cases, the adaptiveness refers to the smoothness of the alternative and deals with the choice of the smoothing parameter for the alternative, or for the test statistic, see e.g. Ledwina (1994), Spokoiny (1996, 1998), Kallenberg & Ledwina (1995), Härdle et al. (2001), Horowitz & Spokoiny (2001), Guerre & Lavergne (2005). Even though these methods have so far had little direct impact, in the sense that we could not find published papers using them (in practice or in theory), they have been useful in providing a better understanding of the problem. However, to our knowledge, all these papers concentrate on testing problems where the null hypothesis is fully parametric. It is not clear to what extent these methods help if the null hypothesis is semi- or nonparametric. This is not such a rare situation, since additivity tests already belong to this family. When the bootstrap is used to determine the critical value, these tests entail at least one more parameter choice problem: pre-estimating the model under the null hypothesis in order to generate the bootstrap samples later. This is necessary because in most cases the bandwidths for the estimation and for the bootstrap should have different rates, see e.g. Härdle & Marron (1990, 1991). Although these authors have already mentioned the problem of choosing an appropriate bandwidth, in practical applications this problem has hardly been addressed. As a consequence, in most published procedures for testing or constructing confidence bands with a semi- or nonparametric null hypothesis, there is no guarantee that the test holds the level, or that the bands attain the nominal coverage probability. This has recently been confirmed in the work of Dette et al. (2005) and Rodríguez-Póo et al. (2004). However, in the former it is not referred to as a bandwidth problem but rather as a problem of correlated designs and dimensionality, because the size distortion is much smaller for uncorrelated designs. In the latter paper the problem is avoided by using subsampling instead of the bootstrap. It should also be mentioned that in that simulation study the authors face basically a parametric bootstrap, drawing the bootstrap errors from a distribution known up to a certain parameter. Although that unknown parameter depends on nonparametric nuisance parameters, knowledge of the distribution greatly mitigates the impact of the bandwidth on the critical value.
To study the problem outlined above in more detail we concentrate on the problem of testing additivity. We limit ourselves to test statistics proposed in Dette et al. (2005) and Rodríguez-Póo et al. (2004), but we try different modifications, methods of bandwidth choice, and subsampling. The aim is not to find the most efficient additivity test or to propose new ones. Our focus is only on finding a method that guarantees that the level is held, with nontrivial power, when the null hypothesis and the resampling method are non- or semiparametric. So, after a review of the additivity tests considered here, we study different procedures for bandwidth choice. Unfortunately, we have not found a generally valid method. Our conclusion is basically that further research is necessary.
The rest of the paper is organized as follows. In the next section we review the estimation and testing procedures considered in this work. In Section 1.3 we discuss the different scenarios among which the practitioner has to choose, including modifications of test statistics and resampling methods. Section 1.4 summarizes the main findings from our simulation results, and Section 1.5 concludes.
1.2 Statistical Methods: Estimators and Test Statistics
1.2.1 Estimators
We consider the following model:
$$Y_i = m(X_i) + u_i, \qquad i = 1, 2, \ldots, n, \tag{1.1}$$
with $\{(X_i, Y_i)\}_{i=1}^n \in \mathbb{R}^d \times \mathbb{R}$ i.i.d., $m : \mathbb{R}^d \to \mathbb{R}$ the unknown function of interest, $m(x) = E(Y \mid X = x)$, and $u_i$ i.i.d. random errors with $E[u_i] = 0$ and finite variance.
The internalized Nadaraya-Watson estimator is defined as
$$\hat m_k(x) = \sum_{i=1}^n v_k(x, X_i)\, Y_i, \quad \text{with } v_k(x, X_i) = \left(n \hat f_k(X_i)\right)^{-1} K_k(x - X_i), \tag{1.2}$$
where $\hat f_k(X_j) = \frac{1}{n} \sum_{r=1}^n K_k(X_j - X_r)$ is a kernel density estimator (unlike the standard Nadaraya-Watson estimator, here $\hat f_k(X_i)$ appears inside the summation, see Jones et al. (1994)), and $K_k(u) = \prod_{\alpha=1}^d K_k(u_\alpha)$ is a product kernel with $K_k(u_\alpha) = k^{-1} K(u_\alpha k^{-1})$.
Commonly, the kernel is assumed to be Lipschitz continuous with compact support and $\int |K(x)|\,dx < \infty$, $\int K(x)\,dx = 1$. Furthermore, $k$ is the bandwidth, assumed to go to zero as the sample size $n$ goes to infinity, but with $nk^d$ going to infinity. Let $V_k$ be the $n \times n$ matrix whose $(j, i)$ element is $v_k(X_j, X_i)$; then $\hat m_k = V_k Y$, where $\hat m_k$ and $Y$ are $n \times 1$ vectors with $\hat m_k(X_j)$ and $Y_j$ as their $j$th entries respectively.
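As a rough illustration of (1.2), the following Python sketch builds the matrix $V_k$ at the sample points and applies it to $Y$. It is a minimal sketch under stated assumptions, not the implementation used in this work: the quartic kernel, the bandwidth value, the simulated data and all function names (`quartic_kernel`, `product_kernel`, `internalized_nw_matrix`) are illustrative choices that do not come from the text.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel (an assumed choice): Lipschitz, support [-1, 1], integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def product_kernel(diff, k):
    """K_k(u) = prod_a k^{-1} K(u_a / k); the last axis of `diff` has length d."""
    return np.prod(quartic_kernel(diff / k) / k, axis=-1)

def internalized_nw_matrix(X, k):
    """Return the n x n matrix V_k whose (j, i) element is v_k(X_j, X_i)."""
    n = X.shape[0]
    # Kmat[j, i] = K_k(X_j - X_i)
    Kmat = product_kernel(X[:, None, :] - X[None, :, :], k)
    # f_hat[i] = (1/n) sum_r K_k(X_i - X_r), the kernel density estimate at X_i
    f_hat = Kmat.mean(axis=1)
    # Column i is divided by n * f_hat(X_i): the density sits inside the sum over i
    return Kmat / (n * f_hat[None, :])

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 2))
Y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.2 * rng.standard_normal(n)

V = internalized_nw_matrix(X, k=0.4)
m_hat = V @ Y  # fitted values at the sample points
```

With a symmetric kernel the columns of `V` sum to one, whereas the rows need not; this is the visible difference from the standard Nadaraya-Watson smoother, whose rows sum to one.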
We are interested in the additive model, which we write in terms of
$$E(Y \mid X = x) = m_S(x) = \psi + \sum_{\alpha=1}^d m_\alpha(x_\alpha), \tag{1.3}$$
where we set $E_{X_\alpha}\{m_\alpha(X_\alpha)\} = \int m_\alpha(x) f_\alpha(x)\,dx = 0$ for all $\alpha$ for identification. Here, $m_\alpha$, $\alpha = 1, \ldots, d$, are the marginal impact functions for each regressor. Therefore, $\psi$ is a constant equal to the unconditional expectation of $Y$. Writing $m(X) = m_\alpha(X_\alpha) + m_{-\alpha}(X_{-\alpha})$, where $X_{-\alpha}$ is the vector $X$ of all explanatory variables without $X_\alpha$, i.e. $X_{i,-\alpha} = (X_{i1}, \ldots, X_{i(\alpha-1)}, X_{i(\alpha+1)}, \ldots, X_{id})$, we can use the identification condition directly to estimate $m_\alpha$. The so-called marginal integration idea is based on the fact that for fixed $x_\alpha$ we have
$$E_{X_{-\alpha}}\left[m(x_\alpha, X_{-\alpha})\right] = \int m(x_\alpha, x_{-\alpha})\, f_{-\alpha}(x_{-\alpha})\,dx_{-\alpha} = \psi + m_\alpha(x_\alpha).$$
Substituting for $m(\cdot)$ a nonparametric pre-estimator such as the one given in (1.2), a sample average for the expectation, and for $\psi$ simply $\hat\psi = \frac{1}{n} \sum_{i=1}^n Y_i$ gives (neglecting the constant for a moment for the sake of simplicity):
$$\hat m_\alpha(x_\alpha) = \sum_{i=1}^n w_{\alpha h}(x_\alpha, X_{i\alpha})\, Y_i,$$
where
$$w_{\alpha h}(x_\alpha, X_{i\alpha}) = K_h(x_\alpha - X_{i\alpha})\, \frac{\hat f_{-\alpha}(X_{i,-\alpha})}{n\, \hat f(X_{i\alpha}, X_{i,-\alpha})}. \tag{1.4}$$
Finally, we set $\hat m_S(X_j) = \hat\psi + \sum_{\alpha=1}^d \hat m_\alpha(X_{j\alpha})$ for each $j = 1, 2, \ldots, n$. Note that defining $W_h := \sum_{\alpha=1}^d W_{\alpha h}$, with $W_{\alpha h}$ being the $n \times n$ matrices with $w_{\alpha h}(X_{j\alpha}, X_{i\alpha})$ as elements, one has $\hat m_S = \hat\psi + W_h Y$. For more details see Dette et al. (2005). Some of the test statistics we will consider here are also introduced and discussed there.
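The marginal integration weights can be sketched in Python as follows. This is a hedged illustration rather than the code behind the reported results: it assumes a quartic product kernel with one common bandwidth `h` in every direction, plugs kernel density estimates in for the joint and nuisance densities, and, unlike the simplified display in the text, subtracts $\hat\psi$ from each component so that the components are centred. The function names and the simulated data are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel (an assumed choice) with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def marginal_integration_fit(X, Y, h):
    """Additive fit at the sample points by marginal integration of the
    internalized pre-estimator. Returns (psi_hat, components, m_add)."""
    n, d = X.shape
    # K1[j, i, a] = h^{-1} K((X_ja - X_ia) / h): univariate kernel weights
    K1 = quartic_kernel((X[:, None, :] - X[None, :, :]) / h) / h
    # Joint density estimate at each X_i (product kernel over all directions)
    f_joint = np.prod(K1, axis=2).mean(axis=0)
    psi_hat = Y.mean()
    components = np.empty((n, d))
    for a in range(d):
        others = [b for b in range(d) if b != a]
        # Density estimate of the nuisance directions at X_{i,-a}
        f_minus = np.prod(K1[:, :, others], axis=2).mean(axis=0)
        # W_a[j, i] = K_h(X_ja - X_ia) * f_minus(X_{i,-a}) / (n * f_joint(X_i))
        W_a = K1[:, :, a] * (f_minus / (n * f_joint))[None, :]
        # W_a @ Y estimates psi + m_a, so the constant is removed here
        components[:, a] = W_a @ Y - psi_hat
    m_add = psi_hat + components.sum(axis=1)
    return psi_hat, components, m_add

# Hypothetical simulated additive data, for illustration only
rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.uniform(-1.0, 1.0, size=(n, d))
Y = 1.0 + np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.2 * rng.standard_normal(n)

psi_hat, components, m_add = marginal_integration_fit(X, Y, h=0.4)
```

Keeping the per-direction weight matrices separate makes it easy to inspect each estimated component on its own; summing them reproduces the role of $W_h$ in the text, up to the centring described above.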
1.2.2 Test Statistics
As mentioned above, we do not introduce new testing procedures but rather study two statistics which have already been studied in Dette et al. (2005), together with other additivity tests, and which have turned out to perform best. We add a new test statistic motivated by one that was introduced recently by Rodríguez-Póo et al. (2005), and which performed excellently in the study by Roca-Pardiñas & Sperlich (2006). For more details on the test statistics readers are referred to these papers.
The null hypothesis of interest is $H_0 : m(\cdot) = m_S(\cdot)$ versus $H_1 : m(\cdot) \neq m_S(\cdot)$.
We consider the following two test statistics from Dette et al. (2005):
$$\tau_1 = \frac{1}{n} \sum_{i=1}^n \left(\hat m(X_i) - \hat m_S(X_i)\right)^2 w(X_i),$$
$$\tau_2 = \frac{1}{n} \sum_{i=1}^n \hat e_i \left(\hat m(X_i) - \hat m_S(X_i)\right) w(X_i),$$
where $\hat e_i = Y_i - \hat m_S(X_i)$, i.e. the residuals under the null hypothesis, and $\hat u_i = Y_i - \hat m(X_i)$, the residuals without restrictions. Obviously, $\tau_1$ calculates directly the integrated squared difference between the null and alternative models. Alternatively, $\tau_2$ seeks to mitigate the bias problem inherited from the estimate $\hat m$, which suffers from the curse of dimensionality. In Dette et al. (2005) it is proved that for all $\tau_j$, the $nk^{d/2}(\tau_j - \mu_j)$ converge under the null to a normal variable with mean zero and variances $v_j^2$ for $j = 1, 2$ with
$$\mu_1 = E_{H_0}\{\tau_1\} = \frac{1}{nk^d} \int \sigma^2(x) w(x)\,dx \int K^2(x)\,dx + \ldots$$
$$\mu_2 = E_{H_0}\{\tau_2\} = \ldots$$
-
where for ease of presentation and implementation $K(\cdot)$ is the same kernel function as in the last subsection, and $k$ again its bandwidth. It is straightforward to derive from the above mentioned paper that $nk^{d/2}(\tau_3 - \mu_3)$ converges under the null to a normal variable with mean zero and variance $v_3^2$ for
$$\mu_3 = E_{H_0}\{\tau_3\} = \int (K * K)(x)\,dx \int \sigma^2(x) f^2(x) w(x)\,dx.$$
All tests have been proven to be consistent in the sense that, under the alternative, the statistics tend to infinity as $n$ tends to infinity.
Finally, let us mention that we have also studied other test statistics, e.g. those given in Dette et al. (2005) but not presented here. These, however, showed even less satisfactory performance, so we have skipped them in our presentation.
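Given fitted values under the null and under the alternative, the first two statistics reduce to weighted sample means. The sketch below is illustrative only: the function name `tau_statistics` is hypothetical, and the default weight $w \equiv 1$ is an assumption, since the text does not specify the weight function. Only $\tau_1$ and $\tau_2$ are covered.

```python
import numpy as np

def tau_statistics(Y, m_unres, m_add, w=None):
    """Compute tau_1 and tau_2 from fitted values at the sample points.

    Y       : responses
    m_unres : unrestricted fit m_hat(X_i)
    m_add   : additive fit m_hat_S(X_i) under the null
    w       : weights w(X_i); defaults to 1 (an assumption, not from the text)
    """
    Y = np.asarray(Y, dtype=float)
    m_unres = np.asarray(m_unres, dtype=float)
    m_add = np.asarray(m_add, dtype=float)
    w = np.ones_like(Y) if w is None else np.asarray(w, dtype=float)
    diff = m_unres - m_add   # difference between alternative and null fits
    e_hat = Y - m_add        # residuals under the null hypothesis
    tau1 = np.mean(diff**2 * w)
    tau2 = np.mean(e_hat * diff * w)
    return tau1, tau2

# Tiny worked example with made-up numbers
tau1, tau2 = tau_statistics(Y=[1.0, 2.0, 3.0],
                            m_unres=[1.0, 2.0, 4.0],
                            m_add=[1.0, 1.0, 3.0])
```

In the worked example the differences are (0, 1, 1) and the null residuals are (0, 1, 0), so `tau1` is 2/3 and `tau2` is 1/3.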
1.3 Resampling and Choice of Parameters
As is well known, asymptotic expressions are of little help in practice for calculating the exact critical value, for several reasons: bias and variance contain unknown expressions which have to be estimated nonparametrically, and the convergence rate is quite slow for large $d$. For this reason it is common to use resampling methods to approximate the critical value for the particular sample statistic. These can be bootstrap methods or subsampling procedures. Unfortunately, unlike for subsampling, for the bootstrap it is not known how to choose in practice the smoothing parameter for the pre-estimation of the model that is used to generate the bootstrap samples. From theory it is known that one should somewhat oversmooth (see for instance Härdle and Marron (1991) and the discussion below). For the choice of $k$ (when estimating the alternative), some procedures are provided in the literature (see our brief discussion of adaptive tests in the introduction). We will come back to this point later in this section.
1.3.1 Bootstrap Tests
We give the general procedure first and then discuss some details:
1. With bandwidth $h$, calculate the estimate $\hat m_S$ under the null hypothesis of
additivity and its resulting residuals $\tilde e_i$, $i = 1, \ldots, n$.
2. With bandwidth $k$, calculate the estimator $\hat m$ for the conditional expectation
without the additivity restriction, and the corresponding residuals $\hat u_i$,
$i = 1, \ldots, n$.
3. With the results from steps 1 and 2 we can calculate our test statistics $\tau_1$, $\tau_2$,
and $\tau_3$.
4. Repeat step 1 but now with a bandwidth $h_b$ which depends on $h$ from step 1. We
call the outcome $\hat m_S^{h_b}$, respectively $\hat e_i = Y_i - \hat m_S^{h_b}(X_i)$, $i = 1, \ldots, n$.
Draw random variables $e_i^*$ with $E[(e_i^*)^j] = \hat u_i^j$ (respectively $\tilde e_i^j$ or $\hat e_i^j$,
see discussion below) for $j = 1, 2, 3$ (respectively $j = 1, 2$, see below again).
Set $Y_i^* = \hat m_S^{h_b}(X_i) + e_i^*$, $i = 1, \ldots, n$. Repeat this $B$ times. This defines
$B$ different bootstrap samples $\{(X_i, Y_i^{*,b})\}_{i=1}^n$, $b = 1, \ldots, B$.
5. For each bootstrap sample from step 4 calculate the test statistics $\tau_j^{*,b}$, $j = 1, 2, 3$,
$b = 1, \ldots, B$. Then, for each test statistic $\tau_j$, $j = 1, 2, 3$, the critical value
is approximated by the corresponding quantiles of the distribution of the $B$
bootstrap analogues: $F^*(u) = \frac{1}{B} \sum_{b=1}^{B} 1\{\tau_j^{*,b} \le u\}$. Recall that they are
generated under the null hypothesis.
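The five steps can be sketched in code. The following is a minimal illustration under simplifying assumptions, not the procedure used in our simulations: instead of the additive null and the three test statistics, it uses a toy nonparametric null (the regression function depends on the first covariate only), a Nadaraya-Watson smoother with a product quartic kernel, a mean squared difference between the two fits as statistic, and Gaussian multipliers for the wild bootstrap errors. All function names are hypothetical.

```python
import random


def quartic(u):
    """Quartic (biweight) kernel on [-1, 1]."""
    return 0.9375 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def nw_fit(X, Y, bw, dims):
    """Nadaraya-Watson fit at the sample points, product kernel over `dims`."""
    fitted = []
    for xi in X:
        num = den = 0.0
        for xj, yj in zip(X, Y):
            w = 1.0
            for d in dims:
                w *= quartic((xi[d] - xj[d]) / bw)
            num += w * yj
            den += w
        fitted.append(num / den)  # den > 0: the term j = i contributes quartic(0)
    return fitted


def statistic(X, Y, h, k):
    null_fit = nw_fit(X, Y, h, dims=(0,))    # step 1: fit under the (toy) null
    alt_fit = nw_fit(X, Y, k, dims=(0, 1))   # step 2: unrestricted fit
    # step 3: toy statistic, mean squared difference of the two fits
    return sum((a - b) ** 2 for a, b in zip(alt_fit, null_fit)) / len(Y)


def wild_bootstrap_pvalue(X, Y, h, k, h_b, B, rng):
    tau = statistic(X, Y, h, k)
    pre_fit = nw_fit(X, Y, h_b, dims=(0,))   # step 4: pre-estimation with h_b
    resid = [y - f for y, f in zip(Y, pre_fit)]
    exceed = 0
    for _ in range(B):
        # wild bootstrap errors: residual times an independent N(0, 1) draw
        y_star = [f + e * rng.gauss(0.0, 1.0) for f, e in zip(pre_fit, resid)]
        if statistic(X, y_star, h, k) >= tau:  # step 5: bootstrap analogue
            exceed += 1
    return tau, exceed / B


rng = random.Random(1)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
Y = [x[0] ** 2 + 0.3 * rng.gauss(0.0, 1.0) for x in X]  # the toy null holds
tau, p_value = wild_bootstrap_pvalue(X, Y, h=0.5, k=0.7, h_b=0.6, B=30, rng=rng)
print(tau, p_value)
```

Rejecting when the returned share falls below the level is equivalent to comparing the statistic with the corresponding upper quantile of the bootstrap distribution. The Gaussian multipliers match only the first two conditional moments of the residuals.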
This procedure is well known, has proved to be consistent for many test sta-
tistics and has therefore been applied, certainly with slight modifications, to many
non- or semiparametric testing problems. However, several questions of practical
importance remain open: bandwidth choice h in step 1., bandwidth choice k in step
2., how to generate the bootstrap residuals e* in step 4. (see above), and how to
choose hb. Finally, how many bootstrap samples are necessary to get a reasonable
approximation of the distribution in step 5. In this paper we will discuss all these
questions except the last one.
1.3.2 The Choice of Bandwidths h
The problem of finding an optimal h is somewhat different from that of finding the
optimal smoothing parameter k which is directly linked to the optimal rate of the
test statistic. In that case it is clear that a theoretical optimal choice depends on the
optimal rate at which the test can detect a deviation from the null hypothesis. For
further details see the next subsection. In most cases, the estimator of the null model
can have faster convergence rates than that of the alternative, so the asymptotics
of the test statistics provide no theoretical guideline for an optimal choice of h. In
other words, we have to rely on practical issues.
As there exist data adaptive methods for finding the optimal bandwidth $k$ for
the alternative (compare next subsection) one could argue that $h$ should be chosen
according to $k$. This way one could guarantee that the same smoothness is imposed
on the regression function regardless of whether it is estimated under the null
hypothesis or not. However, it is not clear whether this is always wanted. Moreover,
we will see later that on the one hand the adaptive choice of $k$ is computationally
intensive, and on the other hand $h_b$ depends on $h$. For $k$ one needs a grid search
which then has to be extended to the choice of $h$ (as it then depends on $k$) and thus
to the choice of $h_b$. Altogether we would get a procedure that is computationally
quite unattractive.
Intuitively, it seems to be desirable to look for a reasonable estimation of the null
model. This is only guaranteed with a reasonable bandwidth choice of h beforehand.
We therefore recommend cross validation or plug-in methods.
1.3.3 The Choice of Bandwidths k
It is known that a bandwidth k which is optimal for estimation is usually suboptimal
for testing. More specifically, for testing the optimal smoothing parameter has faster
convergence rates, i.e. we should undersmooth. As for regression, cross validation
bandwidths have a tendency to undersmooth in practice, and they are also quite
popular for nonparametric testing.
As an alternative, let us consider the adaptive testing approach introduced e.g.
in Spokoiny (1996, 1998). It has been extended by Rodríguez-Póo et al (2004) to
nonparametric testing problems such as those we consider here. The method is the
same for each of our three test statistics, so we can skip the index $j$ of $\tau_j$, $j = 1, 2, 3$
in this subsection. Adapted to our problem it works as follows:
We consider simultaneously a family of tests $\{\tau_k, k \in \mathcal{K}\}$, where
$\mathcal{K} = \{k_1, k_2, \ldots, k_P\}$ is a finite set of reasonable bandwidths. The theoretical
maximal number $P$ depends on $n$ but is of no practical relevance, for details see
Horowitz & Spokoiny (2001). Define
$$\tau_{\max} = \max_{k \in \mathcal{K}} \frac{\tau_k - E_0[\tau_k]}{\mathrm{Var}^{1/2}[\tau_k]} ,$$
where $E_0[\cdot]$ indicates the expectation under $H_0$. This studentizing under the null is only
to correct for the deviations in distribution caused by the different bandwidths $k$.
Therefore, instead of $\mathrm{Var}^{1/2}[\tau_k]$ we could take something proportional to it without
losing consistency, as long as it corrects for the standard deviation caused by the
different $k = k_1, \ldots, k_P$.
A particularity of the bootstrap analogues of $\tau_{\max}$ is that one first needs to
calculate the bootstrap statistics $(\tau_k)^{*,b}$ for all $k \in \mathcal{K}$ to afterwards get
$(\tau_{\max})^{*,b}$. Note that for each $k$, the empirical moments of the bootstrap statistics
$(\tau_k)^{*,b}$ (average, respectively standard deviation) can be used as a substitute for
$E_0[\tau_k]$, respectively $\mathrm{Var}^{1/2}[\tau_k]$, in practice. This is what we do in our simulation study.
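The studentization by bootstrap moments can be written down in a few lines. The sketch below assumes that the statistic and its $B$ bootstrap analogues have already been computed for every bandwidth in the grid; the function name and the numbers in the usage example are hypothetical.

```python
import statistics


def adaptive_max(tau, tau_boot):
    """tau: dict bandwidth -> statistic.
    tau_boot: dict bandwidth -> list of B bootstrap statistics.
    Returns the studentized maximum and its B bootstrap analogues."""
    # Bootstrap average and standard deviation stand in for E_0 and Var^{1/2}.
    center = {k: statistics.fmean(v) for k, v in tau_boot.items()}
    scale = {k: statistics.stdev(v) for k, v in tau_boot.items()}

    def studentize(k, value):
        return (value - center[k]) / scale[k]

    tau_max = max(studentize(k, tau[k]) for k in tau)
    n_boot = len(next(iter(tau_boot.values())))
    boot_max = [
        max(studentize(k, tau_boot[k][b]) for k in tau) for b in range(n_boot)
    ]
    return tau_max, boot_max


# Hypothetical numbers for two bandwidths and B = 3 bootstrap replications.
tau = {0.5: 3.0, 1.0: 2.0}
tau_boot = {0.5: [1.0, 2.0, 3.0], 1.0: [0.0, 1.0, 2.0]}
tau_max, boot_max = adaptive_max(tau, tau_boot)
p_value = sum(v >= tau_max for v in boot_max) / len(boot_max)
print(tau_max, boot_max, p_value)
```

Note that the same centering and scaling is applied to the original statistic and to each bootstrap replication, so the maximum over the grid is taken over comparable quantities.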
1.3.4 The Choice of Bootstrap Residuals
From a theoretical point of view, wild bootstrap errors should be drawn from the
residuals of the alternative model, i.e. $\hat u_i$ should be used in Subsection 1.3.1 instead
of $\hat e_i$ or $\tilde e_i$. It is clear that this should maximize the power as the variance of $\hat e_i$ (and
$\tilde e_i$) can increase greatly with increasing distance between $H_0$ and the true model.
Arguments in favor of using $\hat e_i$ exist only under practical aspects: often the size
distortion in bootstrap tests is worse when using $\hat u_i$ or $\tilde e_i$; when using adaptive
procedures as described in Subsection 1.3.3, it is not that clear which of the
$\hat u_i$ to use or whether the $\hat u_i$ should even be estimated independently of the $k$-choice
for the test; at least in the study of Dette et al (2005) to which our study comes
closest, the power loss is negligible so the size argument is decisive.
We conclude so far that if no adaptive choice of $k$ is made, it would be desirable
to use $\hat u_i$ as long as one can control for the size distortion.
The second question is what kind of distribution for generating the random errors
should be used. In step 4 of the bootstrap procedure described in Subsection 1.3.1 a
distribution is often taken that gives $e_i^*$ with $E[(e_i^*)^j] = \hat e_i^j$ for $j = 1$ up to 3 (or even
more). The so called golden-cut wild bootstrap is also quite popular, see e.g. Härdle
& Mammen (1993). More recently, in the context of size distortion of bootstrap tests,
Davidson & Flachaire (2001) argue that for problems with moderate sample size
the disadvantages of the higher-order-moment adapting bootstraps outweigh their
(asymptotic) advantages. We therefore compare different methods in our simulations
(see Section 1.4).
1.3.5 An Alternative: Subsampling
An increasingly popular alternative to bootstrapping is the subsampling procedure,
see Politis et al (1999). To date, as subsampling is commonly believed to
converge more slowly in practice than bootstrapping, it has been used almost exclusively
when the bootstrap fails, i.e. has been proven not to converge. See Neumeyer &
Sperlich (2006) as an example in a purely nonparametric testing context. However,
Rodríguez-Póo et al (2004) introduce subsampling in the context that we discuss
here, although the bootstrap is consistent, because of the size distortion their
bootstrap test suffered from (until the sample size was huge). In both papers subsampling
works well. The former also studies the automatic choice of subsample size $m$ (with
$m < n$) which turns out to work in their simulations. As this method might be
remodeled to serve as a procedure for finding $h_b$, we briefly introduce subsampling
and the automatic choice of the subsample size $m$:
Let $\mathcal{Y} = \{(X_i, Y_i) \mid i = 1, \ldots, n\}$ be the original sample, and denote by $\tau(\mathcal{Y})$ the
original statistic calculated from this sample, leaving aside index $j = 1, 2, 3$ for a
moment. To determine the critical values we need to approximate
$$Q(z) = P\left( n \sqrt{k^d}\, \tau(\mathcal{Y}) \le z \right) . \qquad (1.6)$$
Recall that under $H_0$ this distribution converges to an $N(\mu, v^2)$, for $\mu$ and $v$ see
Subsection 1.2.2. For finite sample size $n$, drawing $B$ subsamples $\mathcal{Y}_b^m$, each of size
$m$, we can approximate $Q$ under $H_0$ by
$$\hat Q(z) := \frac{1}{B} \sum_{b=1}^{B} 1\left\{ m \sqrt{k_m^d}\, \tau_{k_m}(\mathcal{Y}_b^m) \le z \right\} . \qquad (1.7)$$
Note that the awkward notation comes from the fact that we have to adjust all
bandwidths for the new sample size $m$. For example, imagine $k = k_0 \cdot n^{-\delta}$ for $k_0$
being constant. Then $\tau_{k_m}$ is calculated like $\tau$ but with bandwidth
$k_m = k\, n^{\delta} m^{-\delta} = k_0 \cdot m^{-\delta}$.
Certainly, under the alternative $H_1$, not only $n \sqrt{k^d}\, \tau(\mathcal{Y})$ but also
$m \sqrt{k_m^d}\, \tau_{k_m}(\mathcal{Y}^m)$ converges to infinity. Demanding $m/n \to 0$ guarantees that
$n \sqrt{k^d}\, \tau(\mathcal{Y})$ converges (much) faster to infinity than the subsample analogues.
Then $\hat Q$ underestimates the quantiles of $Q$, which yields the rejection of $H_0$.
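A sketch of the subsampling approximation, including the bandwidth adjustment for the smaller sample size, might look as follows. It assumes i.i.d. data, subsamples drawn without replacement, and a user-supplied statistic taking a sample and a bandwidth; the function names and the placeholder statistic in the usage example are hypothetical.

```python
import math
import random


def rescaled_bandwidth(k0, size, delta):
    """Bandwidth k0 * size^(-delta), evaluated at sample or subsample size."""
    return k0 * size ** (-delta)


def subsample_cdf(sample, stat_fn, m, k0, delta, d, B, rng):
    """Empirical distribution of the normalized subsample statistics."""
    k_m = rescaled_bandwidth(k0, m, delta)
    scale = m * math.sqrt(k_m ** d)
    # Each subsample of size m is drawn without replacement.
    draws = [scale * stat_fn(rng.sample(sample, m), k_m) for _ in range(B)]

    def q_hat(z):
        return sum(v <= z for v in draws) / B

    return q_hat, draws


def toy_stat(sub, bandwidth):
    """Placeholder statistic (squared mean of Y); it ignores the bandwidth."""
    return (sum(y for _, y in sub) / len(sub)) ** 2


rng = random.Random(0)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
q_hat, draws = subsample_cdf(sample, toy_stat, m=32, k0=1.0, delta=0.2, d=3, B=100, rng=rng)
print(rescaled_bandwidth(1.0, 32, 0.2), q_hat(1.0))
```

The critical value at a given level is then read off as the corresponding upper quantile of `draws` and compared with the normalized full-sample statistic.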
The problem here is to find a proper subsample size $m$. Actually, the optimal
$m$ is a function of the level $\alpha$. Again we apply resampling methods: Draw some
pseudo sequences $\mathcal{Y}^{*,l}$, $l = 1, \ldots, L$, of $\mathcal{Y}$ of size $n$ with the same distribution as $\mathcal{Y}$.
For the desired level $\alpha$, test $\tilde H_0 : m(x) - m_S(x) = \hat m(x) - \hat m_S(x)$ the same way as
you want to test $H_0 : m(x) = m_S(x)$, i.e. applying your particular test statistic to
$\tilde H_0$ and using subsampling. From the $L$ repetitions you can determine the empirical
rejection level (estimated size) for your given $\alpha$. Now find an $m$ such that this
empirical rejection level is $\le \alpha$. In practice, you choose from a grid of possible $m$
the one whose estimated rejection level for $\tilde H_0$ is closest to $\alpha$ from below. Note that
$\tilde H_0$ is always true up to an estimation error that should be almost the same as in your
original test. The only drawback of this procedure is the enormous computational
effort. For further details and examples see Politis et al (1999), Delgado et al (2001),
and Neumeyer & Sperlich (2006).
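The selection rule itself, choosing from a grid the candidate whose estimated rejection level is closest to the nominal level from below, is simple to state in code. The sketch below assumes the rejection decisions for the pseudo samples have already been computed; the function names and the numbers are hypothetical.

```python
def empirical_level(rejections):
    """Share of pseudo samples for which the pseudo null was rejected."""
    return sum(rejections) / len(rejections)


def choose_from_grid(estimated_size, alpha):
    """estimated_size: dict candidate -> estimated rejection level.
    Returns the candidate closest to alpha from below, or None if every
    candidate exceeds alpha."""
    admissible = {c: s for c, s in estimated_size.items() if s <= alpha}
    if not admissible:
        return None
    return max(admissible, key=admissible.get)


# Hypothetical estimated sizes for a grid of subsample sizes m.
sizes = {20: 0.09, 30: 0.048, 40: 0.02, 50: 0.0}
print(choose_from_grid(sizes, alpha=0.05))
```

The same rule can be applied to any tuning parameter indexed by a grid, which is why it carries over from the subsample size to a bandwidth.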
1.3.6 The Choice of Bootstrap Bandwidth $h_b$
In general, for many test statistics one could repeat the arguments outlined in Härdle
& Marron (1990, 1991): For the mean of $\hat m_h(x) - m(x)$ under the conditional
distribution of $Y_1, \ldots, Y_n \mid X_1, \ldots, X_n$, respectively of $\hat m_h^*(x) - \hat m_{h_b}(x)$ under the conditional
distribution of $Y_1^*, \ldots, Y_n^* \mid X_1, \ldots, X_n$, we know from Rosenblatt (1969) that
$$E^{Y|X}\left( \hat m_h(x) - m(x) \right) \approx h^2 \frac{\mu(K)}{2} m''(x) , \qquad (1.8)$$
$$E^{*}\left( \hat m_h^*(x) - \hat m_{h_b}(x) \right) \approx h^2 \frac{\mu(K)}{2} \hat m_{h_b}''(x) , \qquad (1.9)$$
where $\mu(K) = \int u^2 K(u)\,du$. Obviously, we need that $\hat m_{h_b}''(x) - m''(x) \to 0$. The
optimal bandwidth $h_b$ for estimating the second derivative is known to be much
larger (in rates) than the optimal $h$ for estimating the function itself. We can even
give the optimal rate. For example, the optimal rate to estimate $m_S''$ is of the order
$n^{-1/9}$ (instead of $n^{-1/5}$), an observation we make use of in our simulation studies in
Section 1.4.
As will be seen once more in Section 1.4, the typical comment that $h_b$ has to be
oversmoothing is unhelpful in practice. We therefore try the following automatic
bandwidth choice: apply the same procedure used for the automatic choice of a
proper subsample size $m$ (last subsection) to find an adequate $h_b$ for a given level $\alpha$.
This is what we explain in more detail and afterwards try in our simulation study.
1.4 Simulation Results
To study all the points listed in the last section, we perform a huge number of
simulations. We give here only a summary of them; for example, limiting the
presentation to $\tau_j$, $j = 1, 2, 3$, one particular model, one specific (random) design, and
sample size $n = 100$.
The model considered is as follows: we consider the same data generating process
as Dette et al (2005). That is, we draw i.i.d. three dimensional explanatory variables
$$X_i \sim N(0, \Sigma_X) \quad \text{with} \quad \Sigma_X = \begin{pmatrix} 1 & 0.2 & 0.4 \\ 0.2 & 1 & 0.6 \\ 0.4 & 0.6 & 1 \end{pmatrix}$$
and i.i.d. error terms $e_i \sim N(0, \sigma_e^2)$ to generate
$$Y_i = X_{1,i} + X_{2,i}^2 + 2 \sin(\pi X_{3,i}) + a X_{2,i} X_{3,i} + e_i , \quad i = 1, \ldots, n ,$$
with $a = 0$ to generate an additive separable model, or $a = 2$ for the alternative.
Recall that the target is a test for additivity. Unless otherwise indicated, we set
$\sigma_e = 1$. Dette et al (2005) show that in the rather unrealistic situation where
$\Sigma_X$ is the identity matrix (i.e. with an uncorrelated design) the problem is
greatly simplified, whereas a (much) stronger correlated design than ours leads to
identification problems for moderate sample sizes.
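For readers who want to reproduce the design, the data generating process can be simulated with the standard library alone. The sketch below takes the second covariate to enter quadratically (an assumption of this sketch) and draws the correlated normal covariates through a hand-written Cholesky factor; the function names are hypothetical.

```python
import math
import random

SIGMA_X = [[1.0, 0.2, 0.4],
           [0.2, 1.0, 0.6],
           [0.4, 0.6, 1.0]]


def cholesky(A):
    """Lower triangular L with L L^T = A, for a positive definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def simulate(n, a, sigma_e, rng):
    """Draw (X, Y); a = 0 gives the additive model, a = 2 the alternative."""
    L = cholesky(SIGMA_X)
    X, Y = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        x = [sum(L[i][p] * z[p] for p in range(i + 1)) for i in range(3)]
        y = (x[0] + x[1] ** 2 + 2.0 * math.sin(math.pi * x[2])
             + a * x[1] * x[2] + rng.gauss(0.0, sigma_e))
        X.append(x)
        Y.append(y)
    return X, Y


rng = random.Random(2007)
X, Y = simulate(n=100, a=0.0, sigma_e=1.0, rng=rng)
print(len(X), len(Y))
```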
All results in the tables are calculated from 250 replications using 200 bootstrap
samples (or subsamples respectively). For real data applications 200 bootstrap
samples are certainly very few; but in our simulations the results differed little when we
increased the number to 500. We used the (multiplicative) quartic kernel throughout.
In all three test statistics we use the weighting function $w(\cdot)$ for different
trimming: we cut the outer 10%, 5% or 0% of the sample, where "outer" refers to the
tails of the explanatory variables. This is done to get rid of the boundary effects in
the statistics. The tables only give results for 5% and 0% as the boundary effects
turned out not to be a major problem.
To further speed up our simulation studies, we first looked for an average cross
validation bandwidth $k$, which turned out to be $k_{opt} = 0.78$. Then we did all our
simulations for the non-adaptive tests (compare Subsection 1.3.3) with $k_{opt}$. This
was done not only for computational reasons but also because otherwise the size of
the tests would also depend on the randomness induced by the estimation of $k$. For
the adaptive test procedure, $k$ ran over a grid of 10 bandwidths placed around $k_{opt}$.
We verified that in most cases $\tau_{\max}$ did not refer to the boundary, i.e. to $k_{\min}$ or
$k_{\max}$.
As discussed above, the bandwidth choice problem is different for $h$. Here, the
parameter responsible for the size, $h_b$, depends on both $\alpha$ (the level) and $h$.
Altogether, it is no problem here that $h$ is chosen by cross validation in each simulation
run as recommended in Subsection 1.3.2. For the internalized marginal integration
estimator, cross validation bandwidths were introduced by Kim et al (1999). For
the nuisance directions $X_{-\alpha}$ (see equation (1.4) in Section 1.2.1) we used $h_{-\alpha} = 6 \cdot h$
as recommended in Dette et al (2005) and Hengartner & Sperlich (2005).
We tried different bootstrap residuals (compare Subsection 1.3.4). Our
simulations mainly seem to confirm the findings discussed above. Therefore, below we
report only results referring to $e_i^* = \varepsilon_i \hat e_i$, where the $\varepsilon_i$ are i.i.d., drawn either from
the golden-cut distribution
$$\varepsilon_i = \begin{cases} -(\sqrt{5} - 1)/2 & \text{with probability } p = (\sqrt{5} + 1)/(2\sqrt{5}) \\ (\sqrt{5} + 1)/2 & \text{with probability } 1 - p \end{cases}$$
or from the standard normal $N(0, 1)$. However, we admit that it may be interesting
to try more, different automatic choice procedures for $h_b$, in order to study again
what effect the choice of residuals taken has ($\hat u_i$, $\tilde e_i$, or $\hat e_i$).
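As a quick check of the two multiplier distributions, the sketch below draws wild bootstrap errors as residual times multiplier. It assumes the usual two-point golden-cut law with support $-(\sqrt{5} - 1)/2$ and $(\sqrt{5} + 1)/2$, for which the first three moments of the multiplier are 0, 1 and 1; the function names are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
LOW = -(SQRT5 - 1.0) / 2.0             # taken with probability P_LOW
HIGH = (SQRT5 + 1.0) / 2.0             # taken with probability 1 - P_LOW
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)


def golden_cut(rng):
    """One draw from the two-point golden-cut distribution."""
    return LOW if rng.random() < P_LOW else HIGH


def golden_cut_moment(j):
    """Exact j-th moment of the golden-cut multiplier."""
    return P_LOW * LOW ** j + (1.0 - P_LOW) * HIGH ** j


def wild_errors(residuals, rng, scheme="golden"):
    """Wild bootstrap errors: each residual times an independent multiplier."""
    if scheme == "golden":
        return [e * golden_cut(rng) for e in residuals]
    return [e * rng.gauss(0.0, 1.0) for e in residuals]  # Gaussian multipliers


rng = random.Random(3)
print([round(golden_cut_moment(j), 12) for j in (1, 2, 3)])
print(wild_errors([0.5, -1.0, 2.0], rng))
```

The Gaussian multipliers share the first two moments but have a zero third moment, so they do not reproduce the skewness of the residuals.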
Probably the most interesting and challenging point is the choice of $h_b$. We first
give the results for several choices of $h_b$ with different bootstrap generating methods,
$k$-adaptive and non-adaptive procedures. To have $h_b$ as a function of $h$, to take also
into account $h/h_b \to 0$, and perhaps validate the rate $n^{-1/9}$ (motivated in Subsection
1.3.6) we set $h_b = h\, n^{1/5 - 1/\kappa}$ and try different $\kappa \le 9$.
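To see what this parametrization implies, the following sketch evaluates the ratio $h_b / h$ for a few values of $\kappa$ at $n = 100$. The value of $h$ is hypothetical; only the ratio matters here.

```python
def bootstrap_bandwidth(h, n, kappa):
    """Pre-estimation bandwidth h_b = h * n^(1/5 - 1/kappa)."""
    return h * n ** (1.0 / 5.0 - 1.0 / kappa)


n, h = 100, 0.5   # h is a hypothetical value
for kappa in range(3, 10):
    ratio = bootstrap_bandwidth(h, n, kappa) / h
    print(kappa, round(ratio, 3))
```

With $\kappa = 5$ the bootstrap bandwidth coincides with $h$; values $\kappa > 5$ oversmooth and values $\kappa < 5$ undersmooth relative to $h$. At $n = 100$ the ratio is roughly 0.79 for $\kappa = 4$ and roughly 1.5 for $\kappa = 9$.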
Table 1.1 shows the results for the non-adaptive golden-cut bootstrap test. These
results basically i) confirm the statements of Dette et al (2005) for our context;
and ii) show that the problem is not solved simply by different smoothing in the
pre-estimation. Oversmoothing, as generally recommended from a theoretical point of
view, seems to go in the wrong direction. In particular, the hope that the results
of Rosenblatt (1969) (see equations (1.8) and (1.9)) might give us a hint or even
provide a rule of thumb for the choice of $h_b$ is not confirmed here. $\tau_3$, introduced
by Rodríguez-Póo et al (2004), clearly outperforms the others in this study (as it
does in the following).
The results for the $k$-adaptive analogues, see Table 1.2, show hardly any
improvement. In particular, the problem of choosing $h_b$ or, in other words, the size
problem is only mitigated for $\tau_3$.
Following to some extent the findings of Davidson & Flachaire (2001) we then
repeated these two studies but with the Gaussian bootstrap, see above. Though
there is some improvement in both size and power, the results in Tables 1.3 and 1.4
give us hope only for test statistic $\tau_3$. Note that the observation that a slight
undersmoothing produces much better results than oversmoothing has not changed
over the four different trials.
Next, for comparison we also provide a small simulation study where the critical
values are approximated by subsampling, trying several subsample sizes $m$. The
results are given in Table 1.5 for non-adaptive tests, and in Table 1.6 for $k$-adaptive
tests. We tried more sizes $m$ for the non-adaptive test but got reasonable results
only for $\tau_3$. In contrast, looking at the $k$-adaptive versions, $\tau_1^{\max}$, $\tau_2^{\max}$ seem to
work, too, though with a rather weak power. Table 1.6 unfortunately is misleading
concerning $\tau_3^{\max}$; one needs a much smaller $m$ to get reasonable results here. A small
simulation study evaluating the automatic choice of $m$ seems to indicate that this
procedure might work and therefore should be tried for what is our main focus: the
automatic choice of $h_b$.
Therefore, we adjusted the automatic choice of the subsample size to find an
adequate $h_b$ (see Subsection 1.3.6). This was done as follows, described here in
detail for $\tau_3$. Let $\{(X_i^*, Y_i^*)\}_{i=1}^n := \mathcal{Y}^*$ be a member of the pseudo sequence introduced
in Subsection 1.3.5. Then, for testing $\tilde H_0 : m(x) - m_S(x) = \hat m(x) - \hat m_S(x)$ with
sample $\mathcal{Y}^*$, an analogue to $\tau_3$ would be
$$\tilde\tau_3 = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{n k^d} \sum_{j=1}^{n} K_k(X_i^* - X_j^*) \{ Y_j^* - \hat m_S(X_j^*) \} \right]^2 w(X_i^*) . \qquad (1.10)$$
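Other statistics are certainly thinkable, e.g.
$$\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{n k^d} \sum_{j=1}^{n} \left( K_k(X_i^* - X_j^*) \{ Y_j^* - \hat m_S(X_j^*) \} - K_k(X_i - X_j) \{ Y_j - \hat m_S(X_j) \} \right) \right]^2 w(X_i) ,$$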
but they should all be asymptotically equivalent to (1.10). The procedure was
performed with only $L = 100$ pseudo samples $\mathcal{Y}^*$. As the results varied widely
we were forced either to enlarge $L$ considerably or to reduce $\sigma_e$ considerably. For
computational reasons we decided on the second option and repeated the study with
$\sigma_e = 0.1$.
Some results are summarized in Table 1.7. As can be seen, this time we emphasize
the possibility of undersmoothing much more. You first have to look at $\tilde\tau_3$, the
analogue statistic from (1.10), to find the $\kappa$ giving the rejection level closest to
$\alpha = 5\%$ from below. Here, this is always $\kappa = 3$. Note that this might also change
depending on the trimming, $\alpha$, sample size, etc. It is important to understand that
the lines of $\tilde\tau_3$ can always be calculated, i.e. without knowing the true data
generating process. Therefore we call this method fully automatic. Now look at the
lines for $\tau_3$, the test of interest. Obviously, $\kappa = 3$ is indeed the best possible
choice; it holds the level and has the strongest power of any $\kappa$ respecting the level.
This could be taken as indicating that our suggestion for selecting $h_b$ works.
Unfortunately, this method does not work that well for all possible $\alpha$; specifically,
it becomes quite incorrect for $\alpha > 10\%$. We repeated this study also for $\tau_1$ and
$\tau_2$. The results were always somewhat worse than for $\tau_3$ so they do not change our
conclusion that this procedure seems to be an interesting and promising approach but
further research is necessary.
1.5 Conclusions
We discuss the choices of all "parameters" a practitioner has to use when facing a
kernel based specification test where the null hypothesis is non- or semiparametric.
We have set parameters in quotation marks because we refer here also to questions
such as how to generate bootstrap errors, etc. However, our main focus is the boot-
strap and its size distortion in practice when the sample size is small or moderate.
These points are illustrated with the popular problem of additivity testing.
Naturally, one looks for an optimal trade-off between controlling for size under the null
hypothesis $H_0$ and maximizing power. Even though these problems have already
been discussed and studied in theory, it is as yet unclear how to set these
parameters in practice. We show that theory is not just unhelpful here; at present, a
reasonable application of tests of these kinds is questionable.
We try and compare many modifications that can be found in the literature
without finding any clue to an optimal, or even a reasonable, parameter choice.
While there are different suggestions for single problems such as which residuals
to take for the bootstrap or an adaptive choice of $k$, combining them gives puzzling
results. Sometimes, in practice, when combining these suggestions, the power goes down
where it should increase or size becomes less precise where it should come closer to
the level.
Altogether, we have recommended certain procedures for particular test statistics.
However, the main open question seems to be how to find an automatic choice of
$h_b$. We suggest a new procedure, taken from subsampling theory, that seems to
be a good way to go. However, further research is necessary to provide reliable
procedures for the nonparametric testing problems considered here.
REFERENCES
Davidson, R. and Flachaire, E. (2001) The Wild Bootstrap, Tamed at Last. Working
Papers 1000, Queen's University, Department of Economics.
Delgado, M.A., Rodríguez-Póo, J.M., and Wolf, M. (2001) Subsampling Cube Root
Asymptotics with an Application to Manski's MSE. Economics Letters, 73, 241-250.
Dette, H., von Lieres und Wilkau, C., and Sperlich, S. (2005) A Comparison of
Different Nonparametric Methods for Inference on Additive Models. J. Nonparametric
Statistics, 17, 57-81.
Guerre, E. and Lavergne, P. (2005) Data-driven rate-optimal specification testing
in regression models. Annals of Statistics, 33(2), 840-870.
Härdle, W. and Marron, J.S. (1990) Semiparametric Comparison of Regression Curves.
Annals of Statistics, 18, 63-89.
Härdle, W. and Marron, J.S. (1991) Bootstrap Simultaneous Error Bars for Nonparametric
Regression. Annals of Statistics, 19, 778-796.
Härdle, W. and Mammen, E. (1993) Comparing Nonparametric Versus Parametric
Regression Fits. Annals of Statistics, 21, 1926-1947.
Härdle, W., Sperlich, S., and Spokoiny, V. (2001) Structural tests in additive
regression. J. Am. Statist. Assoc., 96, 1333-1347.
Hengartner, N.W. and Sperlich, S. (2005) Rate Optimal Estimation with the
Integration Method in the Presence of Many Covariates. Journal of Multivariate Analysis,
95, 246-272.
Horowitz, J.L. and Spokoiny, V. (2001) An Adaptive, Rate-optimal Test of a Parametric
Mean-Regression Model Against a Nonparametric Alternative. Econometrica,
69, No. 3, 599-631.
Jones, M.C., Davies, S.J., and Park, B.U. (1994) Versions of Kernel-Type Regression
Estimators. Journal of the American Statistical Association, 89, 825-832.
Kallenberg, W.C.M. and Ledwina, T. (1995) Consistency and Monte-Carlo simulations
of a data driven version of smooth goodness-of-fit tests. Annals of Statistics,
23, 1594-1608.
Kim, W., Linton, O.B., and Hengartner, N. (1999) A computationally efficient oracle
estimator of additive nonparametric regression with bootstrap confidence intervals.
J. of Computational and Graphical Statistics, 8, 278-297.
Ledwina, T. (1994) Data-driven version of Neyman's smooth test of fit. J. Amer.
Stat. Ass., 89, 1000-1005.
Neumeyer, N. and Sperlich, S. (2006) Comparison of Separable Components in
Different Samples. Forthcoming in the Scandinavian Journal of Statistics.
Politis, D.N., Romano, J.P., and Wolf, M. (1999) Subsampling. Springer Series in
Statistics. Springer.
Roca-Pardiñas, J. and Sperlich, S. (2006) Testing the link when the index is
semiparametric - A comparison study. Working Paper, Universidad de Vigo, Spain.
Rodríguez-Póo, J.M., Sperlich, S., and Vieu, P. (2004) An Adaptive Specification
Test For Semiparametric Models. Working Paper, Carlos III de Madrid, Spain.
Rosenblatt, M. (1969) Conditional Probability Density and Regression Estimators.
Multivariate Analysis II, 25-31.
Spokoiny, V. (1996) Adaptive hypothesis testing using wavelets. Annals of Statistics,
24, 2477-2498.
Spokoiny, V. (1998) Adaptive and spatially adaptive testing of a nonparametric
hypothesis. Math. Methods of Statist., 7, 245-273.
Trimming  $\alpha$(%)  $\kappa$   under $H_0$ ($a = 0.0$)          under $H_1$ ($a = 2.0$)
                                  $\tau_1$  $\tau_2$  $\tau_3$     $\tau_1$  $\tau_2$  $\tau_3$
0%        5            4          .000      .000      .008         .000      .032      .248
0%        5            5          .040      .000      .008         .004      .012      .184
0%        5            6          .068      .000      .008         .012      .012      .184
0%        5            7          .128      .000      .012         .016      .012      .196
0%        5            8          .176      .000      .012         .028      .012      .228
0%        5            9          .256      .000      .012         .028      .024      .252
0%        10           4          .024      .000      .032         .004      .060      .448
0%        10           5          .068      .000      .024         .008      .028      .312
0%        10           6          .120      .000      .024         .020      .020      .292
0%        10           7          .184      .000      .024         .032      .024      .300
0%        10           8          .272      .000      .020         .036      .028      .304
0%        10           9          .344      .000      .020         .056      .032      .340
5%        5            4          .012      .000      .008         .016      .052      .248
5%        5            5          .060      .000      .008         .020      .032      .192
5%        5            6          .108      .000      .008         .028      .028      .184
5%        5            7          .172      .000      .012         .040      .028      .184
5%        5            8          .284      .000      .012         .064      .032      .228
5%        5            9          .340      .000      .012         .080      .032      .244
5%        10           4          .040      .000      .024         .024      .112      .448
5%        10           5          .084      .000      .020         .036      .076      .308
5%        10           6          .168      .000      .024         .044      .052      .284
5%        10           7          .288      .000      .024         .064      .048      .292
5%        10           8          .364      .000      .020         .076      .052      .308
5%        10           9          .440      .000      .020         .116      .056      .332
Table 1.1: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with golden-cut wild bootstrap, using $h_b = h\, n^{1/5 - 1/\kappa}$ for the pre-estimation.
Trimming  $\alpha$(%)  $\kappa$   under $H_0$ ($a = 0.0$)          under $H_1$ ($a = 2.0$)
                                  $\tau_1$  $\tau_2$  $\tau_3$     $\tau_1$  $\tau_2$  $\tau_3$
0%        5            4          .004      .004      .028         .044      .032      .176
0%        5            5          .004      .004      .020         .064      .056      .204
0%        5            6          .000      .000      .012         .048      .036      .204
0%        5            7          .000      .000      .000         .036      .032      .196
0%        5            8          .000      .000      .000         .036      .012      .196
0%        5            9          .000      .000      .000         .032      .008      .188
0%        10           4          .016      .012      .076         .096      .072      .316
0%        10           5          .012      .008      .072         .140      .120      .308
0%        10           6          .008      .004      .056         .132      .092      .296
0%        10           7          .000      .000      .028         .104      .052      .316
0%        10           8          .000      .000      .008         .072      .044      .296
0%        10           9          .000      .000      .008         .064      .036      .284
5%        5            4          .008      .004      .016         .080      .052      .196
5%        5            5          .000      .000      .016         .068      .024      .184
5%        5            6          .000      .000      .008         .036      .016      .188
5%        5            7          .000      .000      .004         .016      .012      .184
5%        5            8          .000      .000      .004         .008      .008      .200
5%        5            9          .000      .000      .004         .008      .004      .192
5%        10           4          .020      .012      .080         .136      .120      .328
5%        10           5          .008      .004      .064         .120      .096      .296
5%        10           6          .004      .000      .040         .116      .060      .296
5%        10           7          .000      .000      .024         .100      .036      .292
5%        10           8          .000      .000      .008         .084      .024      .284
5%        10           9          .000      .000      .008         .056      .016      .288
Table 1.2: Rejection levels of the three $k$-adaptive test statistics with and without trimming. Critical values are determined with golden-cut wild bootstrap, using $h_b = h\, n^{1/5 - 1/\kappa}$ for the pre-estimation.
                      under H0 (a = 0.0)      under H1 (a = 2.0)
Trimming  α(%)   K     T1     T2     T3        T1     T2     T3
0%        5      4    .004   .000   .008      .000   .036   .340
0%        5      5    .036   .000   .012      .004   .024   .236
0%        5      6    .080   .000   .012      .008   .012   .216
0%        5      7    .132   .000   .012      .016   .016   .224
0%        5      8    .188   .000   .012      .028   .012   .240
0%        5      9    .260   .000   .012      .032   .016   .248
0%        10     4    .020   .000   .044      .012   .064   .560
0%        10     5    .072   .000   .044      .012   .036   .380
0%        10     6    .116   .000   .032      .020   .024   .336
0%        10     7    .196   .000   .028      .036   .032   .332
0%        10     8    .276   .000   .016      .044   .032   .344
0%        10     9    .352   .000   .020      .068   .032   .372
5%        5      4    .012   .000   .008      .008   .080   .324
5%        5      5    .052   .000   .012      .008   .036   .236
5%        5      6    .116   .000   .012      .028   .036   .212
5%        5      7    .176   .000   .012      .040   .028   .220
5%        5      8    .268   .000   .012      .064   .028   .244
5%        5      9    .352   .000   .012      .096   .032   .260
5%        10     4    .028   .000   .048      .036   .172   .556
5%        10     5    .088   .000   .032      .036   .092   .372
5%        10     6    .164   .000   .024      .048   .072   .332
5%        10     7    .252   .000   .020      .060   .056   .328
5%        10     8    .380   .000   .016      .092   .048   .340
5%        10     9    .436   .000   .020      .120   .064   .376
Table 1.3: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with Gaussian bootstrap, using $h_b = h n^{1/5 - 1/K}$ for the pre-estimation.
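To make the role of K in the pre-estimation bandwidth concrete, the short sketch below evaluates h n^(1/5 - 1/K) for the values of K reported in the table. The values h = 0.5 and n = 200 are placeholders chosen only for illustration; they are not the settings used in the simulations.

```python
def pre_estimation_bandwidth(h, n, K):
    """Bootstrap pre-estimation bandwidth h_b = h * n**(1/5 - 1/K)."""
    return h * n ** (1.0 / 5.0 - 1.0 / K)

# Placeholder values, assumed for illustration only.
h, n = 0.5, 200
bandwidths = {K: pre_estimation_bandwidth(h, n, K) for K in range(4, 10)}
for K, hb in bandwidths.items():
    print(f"K = {K}: h_b = {hb:.4f}")
```

For K = 5 the exponent is zero and h_b coincides with h; K < 5 gives a smaller pre-estimation bandwidth and K > 5 a larger one.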
                      under H0 (a = 0.0)      under H1 (a = 2.0)
Trimming  α(%)   K     T1     T2     T3        T1     T2     T3
0%        5      4    .000   .000   .036      .048   .028   .220
0%        5      5    .000   .000   .028      .048   .048   .204
0%        5      6    .000   .000   .020      .052   .032   .216
0%        5      7    .000   .000   .008      .032   .016   .200
0%        5      8    .000   .000   .008      .024   .008   .204
0%        5      9    .000   .000   .008      .016   .008   .200
0%        10     4    .028   .008   .096      .124   .096   .364
0%        10     5    .020   .004   .088      .184   .156   .340
0%        10     6    .004   .000   .056      .172   .124   .328
0%        10     7    .004   .000   .032      .116   .072   .324
0%        10     8    .004   .000   .024      .092   .048   .296
0%        10     9    .000   .000   .016      .076   .032   .304
5%        5      4    .004   .004   .024      .064   .036   .220
5%        5      5    .000   .000   .024      .048   .020   .200
5%        5      6    .000   .000   .012      .032   .012   .196
5%        5      7    .000   .000   .008      .016   .004   .200
5%        5      8    .000   .000   .008      .016   .004   .200
5%        5      9    .000   .000   .008      .016   .004   .204
5%        10     4    .012   .012   .096      .136   .100   .368
5%        10     5    .004   .000   .072      .124   .092   .332
5%        10     6    .000   .000   .048      .100   .048   .300
5%        10     7    .000   .000   .040      .072   .032   .312
5%        10     8    .000   .000   .020      .052   .012   .292
5%        10     9    .000   .000   .012      .040   .012   .296
Table 1.4: Rejection levels of the three k-adaptive test statistics with and without trimming. Critical values are determined with Gaussian bootstrap, using $h_b = h n^{1/5 - 1/K}$ for the pre-estimation.
                      under H0 (a = 0.0)      under H1 (a = 2.0)
Trimming  α(%)   m     T1     T2     T3        T1     T2     T3
0%        5      50   .000   .000   .000      .004   .000   .028
0%        5      40   .000   .000   .040      .004   .004   .212
0%        10     50   .004   .000   .020      .004   .004   .248
0%        10     40   .028   .000   .224      .004   .004   .744
5%        5      50   .000   .000   .000      .000   .000   .032
5%        5      40   .000   .000   .052      .000   .000   .202
5%        10     50   .000   .000   .028      .000   .000   .232
5%        10     40   .000   .000   .240      .000   .000   .732
Table 1.5: Rejection levels of the three original test statistics with and without trimming. Critical values are determined with subsampling, using subsamples of sizes m.
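The way a subsampling critical value depends on the subsample size m can be sketched as follows. This is a generic illustration and not the exact procedure behind the table: `statistic` is a hypothetical user-supplied function, and it is assumed to already carry whatever normalization makes its distribution comparable between samples of size m and size n.

```python
import numpy as np

def subsampling_critical_value(data, statistic, m, alpha, n_subsamples, rng):
    """Approximate the (1 - alpha) critical value of `statistic` by subsampling.

    Assumption of this sketch: `statistic` is already normalized so that its
    values on subsamples of size m are comparable with the full-sample value.
    """
    n = len(data)
    values = np.empty(n_subsamples)
    for b in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)  # draw without replacement
        values[b] = statistic(data[idx])
    return np.quantile(values, 1.0 - alpha)

# Toy usage with simulated data and a placeholder statistic.
rng = np.random.default_rng(0)
toy_data = rng.normal(size=100)
toy_stat = lambda d: np.sqrt(len(d)) * abs(d.mean())
crit = subsampling_critical_value(toy_data, toy_stat, m=40, alpha=0.05,
                                  n_subsamples=500, rng=rng)
```

The null hypothesis would then be rejected when the full-sample statistic exceeds `crit`. A smaller m gives subsamples that overlap less but may represent the full-sample distribution less well.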
                      under H0 (a = 0.0)      under H1 (a = 2.0)
Trimming  α(%)   m     T1     T2     T3        T1     T2     T3
0%        5      90   .000   .000   .000      .140   .148   .000
0%        5      80   .000   .000   .000      .148   .160   .000
0%        5      70   .056   .088   .000      .156   .168   .000
0%        5      60   .244   .336   .000      .196   .236   .000
0%        10     90   .000   .000   .000      .192   .196   .000
0%        10     80   .028   .072   .000      .192   .208   .000
0%        10     70   .208   .328   .000      .276   .308   .000
0%        10     60   .584   .680   .000      .416   .484   .000
5%        5      90   .000   .000   .000      .080   .104   .000
5%        5      80   .000   .000   .000      .080   .104   .000
5%        5      70   .008   .016   .000      .076   .088   .000
5%        5      60   .060   .084   .000      .064   .076   .000
5%        10     90   .000   .000   .000      .128   .152   .000
5%        10     80   .008   .012   .000      .140   .148   .000
5%        10     70   .048   .096   .000      .132   .160   .000
5%        10     60   .196   .304   .000      .136   .168   .000
Table 1.6: Rejection levels of the three k-adaptive test statistics with and without trimming. Critical values are determined with subsampling, using subsamples of sizes m.
            H0 (a = 0)                      H1 (a = 2)
Trimming    0%             5%               0%             5%
K = 1      .012   .680    .012   .676      .001   .972    .001   .968
K = 2      .063   .392    .062   .380      .019   .932    .019   .936
K = 3      .028   .032    .028   .024      .042   .632    .042   .620
K = 4      .030   .012    .030   .012      .022   .380    .023   .368
K = 5      .032   .012    .032   .012      .015   .272    .015   .260
K = 6      .031   .012    .031   .012      .011   .260    .011   .252
K = 7      .029   .016    .029   .020      .009   .264    .010   .264
Table 1.7: Rejection levels of T?, and T\ for α = 5%, with and without trimming, using Gaussian bootstrap with $h_b = h n^{1/5 - 1/K}$ for the pre-estimation.
Chapter 2
Estimating and Testing An Additive Partially Linear Model in a System of Engel Curves
2.1 Introduction
The specification of Engel curves in empirical microeconomics has
been an important problem since the early studies of Working (1943) and Leser
(1963) and the well-known work of Deaton and Muellbauer (1980a), in which they
developed parametric structures such as the Almost Ideal and Translog demand
models. Deaton and Muellbauer (1980b) provide many microeconomic examples
in which a separable structure is convenient for analysis and important
for interpretability. However, there is increasing empirical evidence that
some form of nonlinearity is present in the specification of Engel curves.
An alternative way of investigating nonlinear effects is to model consumer behavior
by means of semi- and nonparametric additive structures. Moreover, non- and
semiparametric regression provides an alternative to standard parametric regression,
allowing the data to determine the local shape of the conditional mean.
From an economic point of view there are many reasons why it is interesting to
recover a correct specification of Engel curves. Firstly, a correct specification allows
us to examine the nature of the effect of changes in indirect tax reforms. Secondly,
it is important to specify the response of consumers in the face of changes in total
income. Changes of this kind allow us to assess the impact on consumers' welfare.
Consumer demand has become a very important field for applying non- and semiparametric
methods. An interesting analysis of the cross-sectional behavior of consumers
in the context of a fully nonparametric model can be found in Bierens and
Pott-Buter (1990). Papers that implement semiparametric
methods in the empirical analysis of consumer demand include Banks, Blundell and
Lewbel (1997) and Blundell, Duncan and Pendakur (1998). The latter paper is of
special interest because its regression analysis is based on semi- and nonparametric
specifications of Engel curves. It also tests the Working-Leser and PIGLOG null hypothesis
against the well-known partially linear model in which budget expenditures are
linear in the log of total expenditure. In this paper we estimate the Engel curves
directly, as in Lyssiotou, Pashardes and Stengos (2003), among others.
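For orientation, the Working-Leser (PIGLOG) form and a partially linear counterpart can be written as below, in their usual textbook versions. The notation is assumed here for illustration and does not come from the thesis: $w_{ij}$ is the budget share of good $j$ for household $i$, $x_i$ is total expenditure, $z_i$ is a vector of household characteristics, and $g_j$ is an unknown smooth function.

```latex
% Working-Leser (PIGLOG): the budget share is linear in log total expenditure
w_{ij} = \alpha_j + \beta_j \log x_i + \varepsilon_{ij}

% Partially linear model: parametric in z_i, nonparametric in log x_i
w_{ij} = z_i^{\top} \gamma_j + g_j(\log x_i) + \varepsilon_{ij}
```

Under this notation the Working-Leser form is the special case of the partially linear model in which $g_j$ is linear in $\log x_i$.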
We estimate an additive partially linear model (PLM) in order to investigate
consumer behavior using individual household data drawn from the Spanish Expenditure
Survey (SES), and use the results of the semiparametric analysis to
examine the modelling of age, schooling and expenditure in a system of Engel curves.
The importance of using an additive PLM lies in the fact that, in the context
of this model, the effects of expenditure, age and schooling on consumer demand
can be investigated simultaneously in a semiparametric setting.¹ There are several
ways to estimate a nonparametric additive structure, and we mention only
the most important: smooth backfitting, series estimators and marginal integration.
In this paper we use internalized marginal integration to estimate the nonparametric
components of the additive PLM, mainly because at present there is no
applied or theoretical study of the testing procedure using smooth backfitting.
¹ Analysis of consumer behavior can be carried out with fully nonparametric models. However, for the sake of interpretability and implementation, additive models overcome the well-known problems arising from multidimensional Nadaraya-Watson and local polynomial regression estimators.
Most of the papers that investigate consumer behavior in a nonparametric context
focus on the appropriate way of modeling the form of the Engel curves.
Those that focus on the unidimensional nonparametric effect of log total expenditure on
budget expenditures, taking into account some parametric indexes to reflect demographic
composition, include Blundell, Browning and Crawford (2003) and references
therein. In this paper we investigate consumer behavior in semi- and nonparametric
terms, focusing on the nonparametric effects of total expenditure, age and
schooling. In this study, unless stated otherwise, the effects of age and schooling
refer to the age and schooling of the household head. There is evidence suggesting
that these have a deeper effect than generally assumed in parametric demand analysis
(see Lyssiotou, Pashardes and Stengos (2001)). In fact, it is common practice to include
the square of age and/or schooling, as well as their higher-order terms, in parametric
models to capture possible nonlinear effects.
Inference in nonparametric regression can take place in a number of ways. The
most natural is to use nonparametric regression as an alternative against a fully
parametric or semiparametric null hypothesis. With this in mind, we investigate
whether an additive PLM provides a reasonable fit to our data, using different
resampling schemes to obtain critical values of the test statistics. In this paper
we are interested in applying some recently developed test statistics which are very
popular in the literature on testing semiparametric hypotheses against nonparametric
alternatives. These test statistics are in the spirit of Härdle and Mammen
(1993) and Gozalo and Linton (2001), among others. On the other hand, there is a
growing interest in so-called adaptive testing methods, in which the test statistics
adapt to the unknown smoothness of the alternative; see, among others,
Horowitz and Spokoiny (2001) and Rodríguez-Póo, Sperlich and Vieu (2005). In
this paper we adapt their ideas, with some differences, to the kernel
smoothers used for our problem.
It should be remarked that a problem we may well have to consider is the
endogeneity of regressors. Note that in the context of Engel curves total expenditure
may well be jointly determined with expenditure on different goods. The approach
used to solve this problem is instrumental variable estimation. We highlight two
recently developed procedures for tackling the problem of endogenous regressors
in nonparametric regression: the so-called nonparametric two-step least
squares (NP2SLS) of Newey and Powell (2003), and the nonparametric two-step
estimator with generated regressors and constructed variables (NP2SCV) of Sperlich
(2005). Newey and Powell's (2003) approach is a cumbersome procedure involving
the choice of a basis expansion in the first step. Sperlich's approach, in contrast, only
requires a non-, semi- or even fully parametric construction of the regressors of interest in the
first step. Our feeling is that a generated-variables approach in combination with
an additive PLM can help us to overcome, to some extent, any possible endogeneity
problem, and that is exactly the procedure implemented in this paper.
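The general idea of constructing a regressor in a first step can be illustrated with a deliberately simple simulation. This is a sketch under strong assumptions and is not the NP2SCV procedure itself: both steps are linear here, whereas the second step in the text is an additive PLM, and the data, the instrument `z` and all function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def generated_regressor(z, x):
    """First step: fitted values of x from a linear regression on z."""
    Z = np.column_stack([np.ones_like(z), z])
    return Z @ ols(Z, x)

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                # instrument
u = rng.normal(size=n)                # unobserved shock causing endogeneity
x = 0.8 * z + u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.5 * x + u                 # outcome; the true slope is 0.5

x_hat = generated_regressor(z, x)
naive = ols(np.column_stack([np.ones(n), x]), y)[1]         # uses x directly
two_step = ols(np.column_stack([np.ones(n), x_hat]), y)[1]  # uses generated x_hat
print(f"naive slope: {naive:.3f}, two-step slope: {two_step:.3f}")
```

In this simulation the naive slope is pushed away from 0.5 by the correlation between `x` and `u`, while the slope based on the generated regressor stays close to it.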
The contribution of this work can be summarized as follows. First, we are the
first (to our knowledge) to carry out an exploratory analysis of consumer behavior
with data drawn from the Family Expenditure Survey for Spain using semiparametric
models. Second, we apply recently developed methods to estimate, test (various
model specifications) and correct for possible endogeneity of total expenditure.
Third, our estimates of the additive model are accompanied by a reasonable measure
of discrepancy between the fully nonparametric model and the additive
estimate. An adequate model check is necessary whenever additive
models are estimated (Dette, von Lieres and Sperlich (2004)). Additionally, our
measure of discrepancy adapts to the unknown smoothness of the nonparametric
model, and this constitutes a novelty in empirical economics.
The rest of the paper is organized as follows. In Section 2 we provide some background
for understanding both the estimation and the testing procedures. In Section
3 we discuss the shape of Engel curves and report the empirical results obtained from
applying the additive PLM. We also provide the results of testing the additive
specification as well as the linearity of each nonparametric component in the additive
PLM regression. Section 4 concludes.
2.2 Additive Partially Linear Model and Testing Hypothesis
There are many fields of empirical economics in which explanatory variables and
their second power are included in regression analysis to capture nonlinear effects;
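In order to estimate the functions $m_\alpha(x_\alpha)$ we first estimate the function $m(x)$ with
a multidimensional local smoother and then integrate out the variables other
than $X_\alpha$. This method can be applied to estimate all the components, and finally
the regression function $m(\cdot)$ is estimated by adding an estimator $\hat{\psi}$ of $\psi$ to the
sum of the estimated components, so that
$$\hat{m}(X_j) = \hat{\psi} + \sum_{\alpha} \frac{1}{n} \sum_{i=1}^{n} K_h(X_{j\alpha} - X_{i\alpha}) \frac{\hat{f}_{\underline{\alpha}}(X_{i\underline{\alpha}})}{\hat{f}(X_{i\alpha}, X_{i\underline{\alpha}})} Y_i \qquad [4]$$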
for $j = 1, \ldots, n$. The expression used to estimate each component $m_\alpha(\cdot)$ in
[4] is called the internalized marginal integration estimator (IMIE) because of
the joint density that appears under the summation sign. For a detailed explanation
see Dette, von Lieres and Sperlich (2004) and references therein. Note that the IMIE
does not provide exactly the orthogonal projection onto the space of additive functions.
In other words, the sum of the estimated nonparametric components does
not necessarily recover the complete conditional mean because