


The Canadian Journal of Statistics, Vol. 35, No. 3, 2007, Pages 399-411
La revue canadienne de statistique


A new estimation procedure for a partially nonlinear model via a mixed-effects approach

Runze LI and Lei NIE

Key words and phrases: Nonlinear mixed-effects model; partially linear model; partially nonlinear model.

MSC 2000: Primary 62F10.

Abstract: The authors consider the estimation of the parametric component of a partially nonlinear semiparametric regression model whose nonparametric component is viewed as a nuisance parameter. They show how estimation can proceed through a nonlinear mixed-effects model approach. They prove that under certain regularity conditions, the proposed estimate is consistent and asymptotically Gaussian. They investigate its finite-sample properties through simulations and illustrate its use with data on the relation between the photosynthetically active radiation and the net ecosystem-atmosphere exchange of carbon dioxide.

A new procedure based on a mixed-effects approach for estimating the parameters of a partially nonlinear model

Résumé: The authors consider the estimation of the parametric part of a partially nonlinear semiparametric regression model whose nonparametric component is regarded as a nuisance. They show how estimation can proceed by means of a nonlinear mixed-effects model. They prove that, under certain regularity conditions, the proposed estimator is consistent and asymptotically Gaussian. They study its finite-sample behaviour through simulations and illustrate its use with data relating photosynthetically absorbed radiation to the balance of carbon dioxide exchange between the atmosphere and the ecosystem.

1. INTRODUCTION

Parametric nonlinear regression models have been widely studied in the statistical and econometric literature. Many interesting examples and applications of nonlinear regression models are given in Gallant (1987) and Seber & Wild (1989). Motivated by an empirical study in ecology, we consider a partially nonlinear model

Y = α(U) + g(x; β) + ε,   (1)

where Y is the response variable, {U, x} are the associated covariates, α(·) is an unknown smooth function, g(x; β) is a prespecified function, β is a d-dimensional parameter vector, and ε is the random error with E(ε) = 0 and var(ε) = σ_ε².

When the baseline function α(U) is a constant, model (1) reduces to a parametric nonlinear regression model. Thus, model (1) can serve as a model diagnostic tool for the corresponding parametric nonlinear regression model. When g(x; β) = x^T β, model (1) reduces to the partially linear model

Y = α(U) + x^T β + ε.   (2)

The partially linear model has been popular in the statistical literature (Engle, Granger, Rice & Weiss 1986; Heckman 1986; Speckman 1988, and others). Härdle, Liang & Gao (2000) gave a systematic study of partially linear models. Some estimation procedures for (2), such as backfitting algorithms and profile likelihood approaches, can be extended to model (1). Li & Nie (2007) propose two estimation procedures for model (1) using the profile nonlinear least squares approach and the linear approximation approach. In practical implementation of these smoothing methods, one needs to choose a smoothing parameter. This requires an easily obtained √n-convergent initial estimate for β. Fan & Li (2004) and Fan & Huang (2005) proposed a


difference-based estimate as the initial estimate for β for partially linear models. However, the sampling properties of the difference-based estimate have not been systematically studied in the statistical literature. In this paper, we will develop a simple and easily implemented estimation procedure for β, and then systematically investigate the sampling properties of the proposed procedure.

In many situations, such as the example presented in Section 3.3, our primary interest is in estimating β, and the nonparametric baseline function α(U) may be viewed as a nuisance parameter. This allows us to develop a simple estimation procedure for β. We first establish a connection between the partially nonlinear model (1) and a nonlinear mixed-effects model. Our approach can be viewed as an extension or improvement of the difference-based estimate by Yatchew (1997) and Fan & Huang (2001). Based on the nonlinear mixed-effects model, we propose a new estimation procedure for β, using simple mixed-effects model techniques. The proposed estimation procedure corrects the finite sample approximation bias of the existing difference-based estimate (Yatchew 1997; Fan & Huang 2001) by introducing a local linear approximation. It also improves the efficiency of the existing difference-based estimate by using weighted least squares and introducing a weighting scheme based on a nonlinear mixed-effects model. The proposed estimation procedures can be easily implemented with existing statistical software packages. Furthermore, the proposed procedures avoid the selection of a smoothing parameter for the baseline function and do not require data analysts to acquire technical knowledge about nonparametric smoothing methods. Thus, the proposed procedures will be attractive to scientists who are already familiar with the parametric nonlinear regression methodology in their fields.

We will further demonstrate the effectiveness of the proposed estimation procedure. Root-n consistency and asymptotic normality of the resulting estimates are established. A robust standard error formula for the resulting estimate is proposed based on a sandwich formula, and it is empirically tested by Monte Carlo simulation studies. The simulation results show that our approach provides significant improvements over the previous difference-based methods of Yatchew (1997) and Fan & Huang (2001) for partially linear models. Simulation results and a real data analysis also demonstrate the usefulness of our approach in partially nonlinear models.

The rest of this paper is organized as follows. In Section 2, we propose an estimation procedure for β which employs existing estimation procedures for nonlinear mixed-effects models. In Section 3, we investigate the finite sample performance of the proposed estimator by a limited simulation study. We also illustrate the proposed methodology by analyzing an example from a study in ecology. Proofs are in the Appendix.

2. A NEW ESTIMATION PROCEDURE FOR β

Suppose that we have observations (x_i, u_i, y_i), i = 1, ..., n, from model (1) with u_i and x_i nonrandom. Without loss of generality, we assume that u_1 < ... < u_n. First of all, we observe that for i = 1, ..., n - 1,

y_{i+1} - y_i = α(u_{i+1}) - α(u_i) + g(x_{i+1}; β) - g(x_i; β) + e_i,

where the stochastic error e_i = ε_{i+1} - ε_i. If α(u) is a smooth function of u, Yatchew (1997) suggested the use of the approximation

α(u_{i+1}) - α(u_i) ≈ 0,   (3)

when u_i is close to u_{i+1}, and further advocated using ordinary least squares to estimate β. In order to obtain a better approximation, Fan & Huang (2001) proposed approximating α(u_{i+1}) - α(u_i) linearly,

α(u_{i+1}) - α(u_i) ≈ a_0 + a_1(u_{i+1} - u_i),   (4)

and using ordinary least squares to estimate β. Although there are many existing estimation approaches for partially linear models, with the approximation (3) or (4), the unknown coefficient


β in the partially linear model can be easily estimated by using least squares approaches without involving any smoothing techniques. The simulation results presented in Fan & Huang (2005) are encouraging. However, the sampling properties of Fan & Huang's proposal have not yet been studied. In this section, we explore this difference-based approach further for the partially nonlinear model (1).

The approximation (4) can be treated as a global linear approximation to α(u_{i+1}) - α(u_i), where the first derivative of α(U) is viewed as a constant. In general, the coefficients a_0 and a_1 in (4) may vary over the locations u_i and u_{i+1}. Thus, we consider a local linear approximation,

α(u_{i+1}) - α(u_i) ≈ a_{i1}(u_{i+1} - u_i).   (5)

In the presence of ties among the observation times u_i, this linear approximation to α(u_{i+1}) - α(u_i) still holds. Some theoretic insights into the approximations (3), (4) and (5) are given in A.2 of the Appendix. Compared with the global approximations (3) and (4), the local linear approximation (5) provides a much better finite sample approximation and can therefore be used to reduce the bias. In the tails of the u_i, the gap between u_i and u_{i+1} may be large. For such cases, we would discard the u values. For example, we may trim the observations by discarding the observations with the smallest and largest 5% of the U-values.

Thus, we have the following approximation:

y_{i+1} - y_i ≈ a_{i1}(u_{i+1} - u_i) + g(x_{i+1}; β) - g(x_i; β) + e_i.   (6)

The local linear approximation (5) introduces n - 1 nuisance parameters a_{i1}, i = 1, ..., n - 1. However, when the spacing between u_i and u_{i+1} is small, and under some mild conditions, α(u_{i+1}) - α(u_i) is negligible. Thus, to reduce the number of nuisance parameters, we view the a_{i1} as random variables which are independently and identically distributed with mean 0 and variance σ_a². See Appendix A.2 for a discussion of why the means are zero. Now model (6) becomes an approximate nonlinear mixed-effects model with linear random effects; see Vonesh & Carter (1992), for example. Hence nonlinear weighted least squares approaches can be directly applied to partially nonlinear models.

Note that a_{i1} is a random effect with zero mean. Then

E(y_{i+1} - y_i) = g(x_{i+1}; β) - g(x_i; β).

Since e_1, ..., e_{n-1} are correlated, weighted least squares should be used to incorporate the correlation structure. Denote y_D = (y_2 - y_1, ..., y_n - y_{n-1})^T and g_D(β) = {g(x_2; β) - g(x_1; β), ..., g(x_n; β) - g(x_{n-1}; β)}^T. Consider the general weighted nonlinear least squares problem

S(β) = {y_D - g_D(β)}^T W {y_D - g_D(β)},   (7)

where W is a weight matrix, usually referred to as a working covariance matrix. Minimizing (7) yields a weighted nonlinear least squares estimate for β. For example, taking W to be an identity matrix, the resulting estimate corresponds to the approach of Yatchew (1997) with the approximation (3). Although all of these models, including model (6), are not the actual true model, they provide good approximations. Based on this approximation, we expect some bias, although the bias will decrease and become negligible as the sample size increases. See Theorem 1 below for details. Model (6) motivates us to construct a good weight matrix W to characterize the variance-covariance matrix of the e_i. In other words, define W^{-1} to be (w_{ij})_{(n-1)×(n-1)}, and take

w_{ii} = σ_a²(u_{i+1} - u_i)² + 2σ_ε²,   w_{i,i+1} = w_{i+1,i} = -σ_ε²,   w_{ij} = 0 for |i - j| ≥ 2,

the covariance structure of the composite errors a_{i1}(u_{i+1} - u_i) + e_i implied by model (6).


We further assume that the a_{i1} and ε_i are independent and normally distributed. In our implementation, we set the initial value for β to be the minimizer of S(β) with W = I, and iterate between the following two steps until convergence is obtained:

Step 1. Estimate the variance parameters by minimizing S(β) + log{det(W^{-1})};

Step 2. Estimate β by minimizing S(β) with the variance parameters replaced by their estimates obtained in Step 1.

It is worth noting that the choice of W affects the efficiency of β̂, but not its √n-consistency.
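To make the two-step procedure concrete, here is a minimal sketch in Python (NumPy/SciPy). The authors implemented the method in SAS; the function names, the log-variance parameterization, the optimizer choices and the convergence tolerance below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def working_cov(u, sigma2_eps, sigma2_a):
    """Working covariance W^{-1} implied by model (6): the composite error
    a_{i1}(u_{i+1} - u_i) + (eps_{i+1} - eps_i) is tridiagonal, with variance
    sigma_a^2 (u_{i+1} - u_i)^2 + 2 sigma_eps^2 and lag-one covariance -sigma_eps^2."""
    du = np.diff(u)
    m = len(du)
    V = np.diag(sigma2_a * du**2 + 2.0 * sigma2_eps)
    V += np.diag(np.full(m - 1, -sigma2_eps), k=1)
    V += np.diag(np.full(m - 1, -sigma2_eps), k=-1)
    return V

def fit_partially_nonlinear(u, X, y, g, beta0, n_iter=20):
    """Difference-based mixed-effects estimate of beta for y = alpha(u) + g(x; beta) + eps.
    `g(X, beta)` is a user-supplied mean function returning an n-vector."""
    idx = np.argsort(u)
    u, X, y = u[idx], X[idx], y[idx]
    yD = np.diff(y)                                    # y_{i+1} - y_i

    def resid(beta):
        return yD - np.diff(g(X, beta))                # y_D - g_D(beta)

    beta = least_squares(resid, beta0).x               # initial fit with W = I (Yatchew-type)
    for _ in range(n_iter):
        r = resid(beta)

        def neg_loglik(log_theta):                     # Step 1: S(beta) + log det(W^{-1}) in the variances
            V = working_cov(u, np.exp(log_theta[0]), np.exp(log_theta[1]))
            sign, logdet = np.linalg.slogdet(V)
            return logdet + r @ np.linalg.solve(V, r)

        theta = minimize(neg_loglik, x0=np.log([max(np.var(r) / 2, 1e-6), 1.0]),
                         method="Nelder-Mead").x
        V = working_cov(u, np.exp(theta[0]), np.exp(theta[1]))
        L = np.linalg.cholesky(np.linalg.inv(V))       # W = V^{-1} = L L^T, so S(beta) = ||L^T r||^2

        beta_new = least_squares(lambda b: L.T @ resid(b), beta).x   # Step 2: weighted NLS for beta
        if np.max(np.abs(beta_new - beta)) < 1e-8:
            beta = beta_new
            break
        beta = beta_new
    return beta, np.exp(theta)
```

For example, g could be the light-response form of Section 3.3, g(X, beta) = beta[0] * X[:, 0] / (X[:, 0] + beta[1]).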

Define β_0 to be the true value of β. The √n-consistency and asymptotic normality of the resulting weighted nonlinear least squares estimator are given by Theorem 1.

THEOREM 1. Consider model (1) and suppose that Conditions (A)-(F), given in the Appendix, hold. With probability tending to one as n → ∞, there exists a minimizer β̂ of S(β) such that ‖β̂ - β_0‖ = O_P(n^{-1/2}) as n → ∞, and further,

√n (β̂ - β_0) →_D N(0, C^{-1} C* C^{-1}),

where "→_D" stands for convergence in distribution and where C and C* are as given in the Appendix.

The matrices C and C* depend on the unknown parameter β_0 and cannot be directly used to calculate the standard error of the resulting estimate. Following conventional techniques, we estimate the standard error using the sandwich formula. Denote by S′(β) and S″(β) the gradient vector and the Hessian matrix of S(β), respectively. The covariance matrix of β̂ can be estimated by

ĉov(β̂) = {S″(β̂)}^{-1} ĉov{S′(β̂)} {S″(β̂)}^{-1}.   (8)

Using this estimate, we can further construct an asymptotic confidence interval for each β_j. In our simulation studies in Section 3, the asymptotic confidence interval provided correct coverage probabilities with moderate sample sizes.
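In practice, (8) can be assembled from the closed-form gradient of S(β); the following plug-in choice is one concrete option (a sketch; the paper does not specify this particular estimator of cov{S′(β̂)}):

S′(β) = -2 g_D′(β)^T W {y_D - g_D(β)},   ĉov{S′(β̂)} = 4 g_D′(β̂)^T W V̂ W g_D′(β̂),

where V̂ is an estimate of the covariance of y_D - g_D(β̂), for example the fitted working covariance W^{-1}, and S″(β̂) can be computed analytically or by numerical differentiation.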

In Section 3 we will empirically compare the efficiency of the proposed estimate with that of the profile likelihood estimate under the framework of partially linear models. From our simulation studies, the proposed estimate appears almost as efficient as the profile likelihood estimates.


3. NUMERICAL STUDY AND APPLICATION

In Sections 3.1 and 3.2, we assess the finite sample performance of the proposed estimator by Monte Carlo simulations. In Section 3.3, we further illustrate the proposed methodology by an application to a real data example. All simulations were conducted using SAS code, which is available upon request.

3.1. Comparison of efficiency.

We first conduct some comparisons of the proposed estimate in terms of efficiency. For the partially linear model

Y = α(U) + x^T β + ε,

the profile likelihood approach proposed in Severini & Staniswalis (1994) yields a semiparametric efficient estimator for β, as described in Bickel, Klaassen, Ritov & Wellner (1993). The proposed estimation procedure for β in Section 2 can be applied directly to the partially linear model. This allows us to compare, using a Monte Carlo simulation study, the efficiency of the resulting estimate with the profile likelihood approach. In our simulation, we generate ε from the N(0, 1) distribution. For the covariates {U, x}, we consider two scenarios:


(A) (U and x are independent). We generate U from U(0, 1), the uniform distribution over [0, 1], and the covariate vector x = (x_1, x_2)^T is simulated from a normal distribution with mean zero and cov(x_i, x_j) = 0.5^{|i-j|};

(B) (U and x are correlated). We first generate (z_1, z_2, z_3) from the 3-dimensional standard normal distribution, and then we set x_1 = (z_1 + z_3)/√2, x_2 = z_2, and U = Φ{(z_2 + z_3)/√2},

where Φ(·) is the cumulative distribution function of the standard normal distribution. Thus, U follows a U(0, 1) distribution and is correlated with (x_1, x_2), whose marginal distributions are standard normal. The correlation between x_1 and U approximately equals 0.489, and the correlation between x_2 and U is approximately 0.691.
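For reference, scenario (B) can be generated directly from its definition; this short Python sketch mirrors the construction above (the seed and sample size are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal((n, 3))                  # (z1, z2, z3) iid standard normal
x1 = (z[:, 0] + z[:, 2]) / np.sqrt(2)            # x1 = (z1 + z3)/sqrt(2)
x2 = z[:, 1]                                     # x2 = z2
u = norm.cdf((z[:, 1] + z[:, 2]) / np.sqrt(2))   # U = Phi{(z2 + z3)/sqrt(2)}, so U ~ U(0, 1)
```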

We note that this simulation setup does not entirely match the assumptions of Theorem 1. This is because {u_i, x_i} do not constitute a fixed design, contrary to the assumption of the theorem.

We took the following two baseline functions:

α_1(u) = sin(⋯), α_2(u) = u², and set β_1 = β_2 = 1. We took sample sizes n = 50 and n = 200. For each case, we conducted 1000 Monte Carlo simulations, and for each simulation, we generated a different set of covariates. For the profile likelihood approach, we used local linear fitting to estimate the baseline function. The simulation results are summarized in Table 1, in which "Mean" stands for the average of the 1000 estimates of β_j, and "SD" is the standard deviation of these 1000 estimates and can be regarded as the true standard error of β̂_j. From Table 1, we can see that the proposed estimation procedure for β is almost as efficient as the semiparametric efficient estimator.

TABLE 1: Efficiency comparisons.

                       Profile         New             Yatchew         FH
β     n     α(U)       Mean (SD)       Mean (SD)       Mean (SD)       Mean (SD)

(U, x) independent
β1    50    α1         0.987 (0.148)   1.002 (0.145)   1.005 (0.180)   1.005 (0.180)
β2    50    α1         0.984 (0.149)   1.000 (0.155)   0.997 (0.180)   0.996 (0.180)
β1    200   α1         0.996 (0.072)   1.000 (0.074)   1.000 (0.086)   1.000 (0.087)
β2    200   α1         0.995 (0.073)   0.999 (0.073)   1.002 (0.086)   1.002 (0.087)
β1    50    α2         0.996 (0.143)   1.004 (0.140)   1.004 (0.181)   1.004 (0.181)
β2    50    α2         0.993 (0.151)   1.000 (0.150)   0.996 (0.182)   0.996 (0.182)
β1    200   α2         0.997 (0.072)   1.000 (0.073)   1.000 (0.086)   1.000 (0.087)
β2    200   α2         0.997 (0.072)   0.999 (0.073)   1.002 (0.087)   1.002 (0.087)

(U, x) correlated
β1    50    α1         0.974 (0.219)   1.005 (0.207)   0.985 (0.260)   0.986 (0.263)
β2    50    α1         1.018 (0.245)   0.995 (0.260)   1.016 (0.297)   1.015 (0.300)
β1    200   α1         0.994 (0.105)   1.000 (0.105)   1.015 (0.124)   0.948 (0.124)
β2    200   α1         1.005 (0.125)   1.015 (0.131)   0.949 (0.151)   0.930 (0.152)
β1    50    α2         0.981 (0.214)   1.000 (0.196)   0.985 (0.261)   0.986 (0.263)
β2    50    α2         0.993 (0.236)   1.009 (0.241)   1.014 (0.297)   1.010 (0.300)
β1    200   α2         0.996 (0.104)   0.999 (0.104)   1.023 (0.124)   0.954 (0.124)
β2    200   α2         0.997 (0.122)   1.019 (0.127)   0.949 (0.151)   0.933 (0.152)


Note that the standard deviations in the simulation study are unconditional standard deviations, while the empirical standard deviations can be regarded as true standard errors. The proposed estimation procedure is to minimize the weighted nonlinear least squares criterion (7). A natural question arising here is how much efficiency would be lost if one simply used either the approximation in (3) proposed in Yatchew (1997) or the global linear approximation in (4) proposed in Fan & Huang (2001). To address this question, we conducted a further 1000 simulations, with data being generated from the partially linear model. The simulation results are also summarized in Table 1, in which the columns labeled "Yatchew" and "FH" stand for the proposals of Yatchew (1997) and Fan & Huang (2001), respectively. According to Table 1, the performances of these two proposals are similar, and the proposed mixed-effects approach is more efficient than the methods of Yatchew and Fan & Huang. For example, their relative efficiency for β_1 with n = 50 is 1.541 = (0.180/0.145)². Thus, the proposed mixed-effects approach gains about 54% efficiency. Furthermore, the proposed approach is less biased than the methods of Yatchew and Fan & Huang, as we use a local linear approximation to α(U). The results for other cases are similar.

TABLE 2: Finite sample performance of β̂.

                  β1                      β2                      σ² (true value 1)
n     α(U)        Mean    SE     CP       Mean    SE     CP       Mean    SE     CP

(U, x) independent, β1 = 18, β2 = 0.8
50    α1          19.14   3.79   0.93     0.91    0.41   0.92     0.91    0.20   0.87
50    α2          19.13   3.69   0.93     0.92    0.40   0.91     0.92    0.19   0.86
200   α1          18.30   1.45   0.95     0.83    0.16   0.95     0.97    0.10   0.93
200   α2          18.27   1.41   0.94     0.83    0.15   0.95     0.98    0.10   0.92

(U, x) independent, β1 = β2 = 1
50    α1          1.01    0.12   0.94     1.00    0.12   0.95     0.90    0.20   0.85
50    α2          1.01    0.12   0.94     1.00    0.12   0.95     0.92    0.19   0.86
200   α1          1.00    0.06   0.94     1.00    0.06   0.95     0.97    0.10   0.93
200   α2          1.00    0.06   0.94     1.00    0.06   0.95     0.98    0.10   0.92

(U, x) correlated, β1 = 18, β2 = 0.8
50    α1          19.77   4.73   0.95     0.97    0.49   0.93     0.92    0.21   0.87
50    α2          17.95   3.91   0.94     0.85    0.41   0.90     0.93    0.21   0.89
200   α1          18.34   1.70   0.96     0.84    0.18   0.97     0.98    0.10   0.92
200   α2          17.93   1.54   0.94     0.80    0.16   0.94     0.98    0.10   0.92

(U, x) correlated, β1 = β2 = 1
50    α1          1.00    0.14   0.95     0.97    0.17   0.94     0.92    0.21   0.87
50    α2          1.02    0.15   0.95     1.06    0.18   0.92     0.92    0.20   0.87
200   α1          1.00    0.07   0.94     1.00    0.08   0.95     0.97    0.10   0.92
200   α2          1.00    0.07   0.94     1.01    0.08   0.94     0.98    0.10   0.92

3.2. Performance of β̂.

Now we assess the finite sample performance of the proposed estimator. In our simulation, we generated random samples from the model

Y = α(U) + g(x; β) + ε,


where ε, U, x and α(U) are as in Section 3.1. We consider two nonlinear functions g(·; ·): the first with β_1 = 18 and β_2 = 0.8 (close to the estimates for the real data example in Section 3.3), and the second with β_1 = β_2 = 1. Again, the sample sizes used were n = 50 and n = 200. For each case, we conducted 1000 Monte Carlo simulations, and for each simulation, we generated a different set of covariates.

The simulation results for β̂ are summarized in Table 2, in which "Mean" stands for the average of the 1000 estimates of β, and "SE" is the average of the 1000 standard errors computed from the sandwich formula (8). We also computed the coverage probability (CP) of a 95% confidence interval. From Table 2, we see that the bias of β̂ is very small. The coverage probability of a 95% confidence interval for β_j is very close to 0.95, which indicates that the sandwich formula performs well with moderate sample sizes. However, the coverage rate of a 95% confidence interval for σ² seems to be less than 0.95 when n = 50, since σ̂² is a slight underestimate of σ².

Bias correction techniques might be helpful when the sample size is small.

3.3. Application.

We illustrate the proposed methodology through an analysis of a data set from ecology. Of interest in this example is the study of how temperature affects the relationship between the response, the net ecosystem-atmosphere exchange of CO2 (NEE), and the photosynthetically active radiation (PAR). This data set consists of 1997 observations of NEE, PAR and temperature (T), and was collected over a subalpine forest at approximately 3050 meters elevation, using three-dimensional sonic anemometers on towers, during parts of the growth season of 1999.

For data collected from laboratory experiments in which climate variables, such as temperature and moisture availability, can be well controlled, the following empirical NEE-PAR relationship

NEE = R - β_1 PAR / (PAR + β_2) + ε   (9)

has been applied widely since remote sensing data became available (see, for instance, Monteith 1972, and more recent work by Ruimy, Kergoat, Bondeau et al. 1999, and references therein).

The temperature of a natural ecosystem cannot be controlled, and the parameter R most likely depends on the temperature. To examine how the temperature affects the NEE-PAR relationship, we take NEE as the response variable and PAR and T as covariates, and consider a fully nonparametric regression model

NEE = m(PAR, T) + ε,

where m(·, ·) is an unspecified smooth function, and ε is a random error with mean 0 and variance σ². Two-dimensional kernel regression was used to estimate the regression function m(·, ·). To examine how temperature affects the parameters in model (9), we plot m̂(PAR, T) versus PAR for given values of T. The lines in Figure 1(a) depict the plot of m̂(PAR, T) over PAR for T = 10.76, 13.29 and 15.41, which correspond to the three sample quartiles of T. From Figure 1(a), we can see the nonlinear relationship between PAR and NEE when the temperature is fixed. The parallel pattern of the three lines suggests that the partially nonlinear model (10) below may be appropriate. As evidenced from Figure 1(a), m̂(0, T), the regression function m̂(PAR, T) at PAR = 0, varies over temperature. This implies that R in model (9) likely depends on the temperature.




FIGURE 1: (a) Left panel: plot of the regression function of NEE over PAR when temperature T is fixed at the first sample quartile (10.76°C), the median (13.29°C) and the third sample quartile (15.41°C) of the observed temperatures. (b) Right panel: plot of R̂(T). The solid line is an estimate of the baseline function, and the dots are the partial residuals NEE_i + β̂_1 PAR_i/(PAR_i + β̂_2).

We next fit the data by the partially nonlinear model

NEE = R(T) - β_1 PAR / (PAR + β_2) + ε.   (10)

The proposed estimation procedure for β is used to fit the data set using model (10). It yields β̂_1 = 17.78 with standard error 0.414, and β̂_2 = 0.832 with standard error 0.0724. The estimate of σ_ε² is 12.38 (0.41).

To estimate the baseline function, we define the partial residuals

ê_i = NEE_i + β̂_1 PAR_i / (PAR_i + β̂_2).

This yields a synthetic nonparametric model with a single covariate

ê_i = R(T_i) + ε̃_i. The estimation of R(T) can be easily carried out by smoothing the partial residuals over temperature by various existing one-dimensional nonparametric smoothing procedures, such as spline smoothing and local polynomial regression. Here we directly apply the SAS procedure LOESS to estimate R(T) with smoothing parameter s = 0.4. The resulting estimate is depicted in Figure 1(b), from which we can see that the partial residuals have an increasing trend over temperature. The residual sum of squares is 24846.00.
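The partial-residual smoothing step can be reproduced with any off-the-shelf scatterplot smoother. The paper uses SAS PROC LOESS with smoothing parameter s = 0.4; the sketch below uses the lowess smoother from statsmodels, which is similar in spirit but not identical, so the fitted curves need not match exactly:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def smooth_baseline(partial_resid, temperature, frac=0.4):
    """Smooth the partial residuals over temperature to estimate R(T)."""
    fit = lowess(partial_resid, temperature, frac=frac, return_sorted=True)
    return fit[:, 0], fit[:, 1]   # sorted temperatures and fitted R(T) values
```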

If R(T) does not depend on the temperature T, model (10) reduces to model (9). Thus, it is of interest to compare the fits using models (9) and (10). Using nonlinear least squares, for model (9) we obtain R̂ = 3.0277, β̂_1 = 14.0113 and β̂_2 = 0.6272. The residual sum of squares is 32270.15. Compared with model (9), model (10) dramatically reduces the residual sum of squares, although model (10) may use more degrees of freedom for R(T).

It is of scientific interest to test whether R(T) really varies with temperature. This can be formulated as a nonparametric hypothesis testing problem:

H_0: R(T) = R_0  versus  H_1: R(T) ≠ R_0,   (11)

for some unknown constant R_0. Motivated by the F-test in linear regression models, we should compare the residual sums of squares under H_0 and under H_1. We next extend an F-test for (11).


Denote by RSS(H_0) and RSS(H_1) the residual sums of squares under H_0 and under H_1, respectively. Note that the traditional F-test is not well defined, since the dimension of the parameter space under H_0 is finite, while it is infinite under H_1. Define a generalized F-test statistic

F = n {RSS(H_0) - RSS(H_1)} / RSS(H_1).

Following Cai, Fan & Li (2000), we may calculate the critical values for the generalized F-test by bootstrapping the residuals and considering (u_i, x_i) as fixed. The residuals used for the bootstrap are ε̂_i = y_i - α̂(u_i) - g(x_i; β̂), where both α̂(u_i) and β̂ are estimated under H_1. In this example, the observed F-value is

F = (32270.15 - 24846.00) / (24846.00/1997) = 596.7

with P-value < 0.001 obtained by using 1000 bootstrap samples. Thus, R(T) depends on temperature. We also applied the generalized F-test for testing the hypothesis H_0: R(T) is a linear function of T, versus H_1: R(T) is not a linear function. The RSS(H_0) = 225510.88 and the resulting F-value is 22.69 with P-value < 0.001 obtained by using 1000 bootstrap samples. We further employ the generalized F-test for the hypothesis H_0: R(T) is a quadratic polynomial function of T, versus H_1: R(T) is not quadratic. The RSS(H_0) = 25143 and the corresponding F-value is 7.60 with P-value 0.006 obtained by using 1000 bootstrap samples. This implies that R(T) is not quadratic at the 0.05 level of significance. It is of interest to investigate how well the bootstrap procedure approximates the null distribution of the F-statistic, but this is beyond the scope of this paper.
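A minimal sketch of the residual bootstrap for the generalized F-test, treating (u_i, x_i) as fixed, is given below. The fitting routines fit_h0 and fit_h1 are user-supplied placeholders (for example, the parametric fit of model (9) under H_0 and the partially nonlinear fit of Section 2 plus partial-residual smoothing under H_1); their names and signatures are ours, not the paper's.

```python
import numpy as np

def generalized_f_test(u, X, y, fit_h0, fit_h1, n_boot=1000, seed=0):
    """Residual bootstrap for H0 vs H1 in (11).
    fit_h0(u, X, y) and fit_h1(u, X, y) must each return fitted values for y."""
    rng = np.random.default_rng(seed)
    n = len(y)
    yhat0, yhat1 = fit_h0(u, X, y), fit_h1(u, X, y)
    rss0, rss1 = np.sum((y - yhat0) ** 2), np.sum((y - yhat1) ** 2)
    f_obs = n * (rss0 - rss1) / rss1

    resid = y - yhat1                 # residuals estimated under H1
    resid = resid - resid.mean()
    f_boot = np.empty(n_boot)
    for b in range(n_boot):
        y_star = yhat0 + rng.choice(resid, size=n, replace=True)  # data generated under the H0 fit
        r0 = np.sum((y_star - fit_h0(u, X, y_star)) ** 2)
        r1 = np.sum((y_star - fit_h1(u, X, y_star)) ** 2)
        f_boot[b] = n * (r0 - r1) / r1
    return f_obs, np.mean(f_boot >= f_obs)   # observed statistic and bootstrap P-value
```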

4. CONCLUDING REMARKS

We have proposed a simple and effective estimation procedure for the partially nonlinear model, using a weighted nonlinear least squares approach. The proposed estimation procedures can easily be implemented using existing statistical software packages. Furthermore, the proposed procedures avoid the selection of a smoothing parameter for the baseline function and do not require data analysts to have technical knowledge of nonparametric smoothing methods. Thus, the proposed procedures should be attractive to scientists who are already familiar with nonlinear regression models such as (9).

Our approach generalizes the proposals in Yatchew (1997) and Fan & Huang (2001) in two respects: (1) we apply weighted least squares to the differenced data, rather than ordinary least squares; (2) we use a local approximation to the differenced baseline function. The first of these explains why our approach outperforms the methods in Yatchew (1997) and Fan & Huang (2001). When the weight matrix W is selected appropriately, weighted least squares can be far more efficient than ordinary least squares when the correlation among the data is high, as in the difference-based method, in which two consecutive differences, y_{i+1} - y_i and y_i - y_{i-1}, are highly correlated, with correlation about 0.5. As pointed out by the Associate Editor, if the ordered ys were generated by an independent increments process, then the method in Yatchew (1997) or Fan & Huang (2001) would presumably be better than the newly proposed method. In this paper, we consider settings in which the random errors are independent and identically distributed. All the assumptions made in the theorem are important, but in some situations they could be violated. For example, the proposed method might not work well when the random errors are dependent. This may occur when U represents time, and data are collected over time. Further study is needed for situations with dependent errors.
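The magnitude of that correlation follows from a short calculation. For independent and identically distributed errors with variance σ_ε², and with the design treated as fixed,

cov(y_{i+1} - y_i, y_i - y_{i-1}) = cov(ε_{i+1} - ε_i, ε_i - ε_{i-1}) = -σ_ε²,   var(y_{i+1} - y_i) = 2σ_ε²,

so consecutive differences have correlation -σ_ε²/(2σ_ε²) = -1/2, that is, about 0.5 in magnitude.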


APPENDIX

A.1. Proof of Theorem 1.

Before proving Theorem 1, we present the required regularity conditions. Let β = (β_1, ..., β_d)^T, ε_D = (e_1, ..., e_{n-1})^T, and α_D = {α(u_2) - α(u_1), ..., α(u_n) - α(u_{n-1})}^T.

Denote by g_{Di}′(β) the (n - 1)-dimensional column vector ∂g_D(β)/∂β_i, and by g_{Dij}″(β) the (n - 1)-dimensional column vector ∂²g_D(β)/∂β_i∂β_j. Define g_D′(β) to be the (n - 1) × d matrix with ith column g_{Di}′(β).

REGULARITY CONDITIONS.

(A) The random errors ε_i are independent and identically distributed with mean zero and finite positive variance σ_ε². Let β ∈ Θ, which is assumed to be a closed, bounded subset of R^d. The first and second derivatives of g_D(β) exist and are continuous for all β in a neighbourhood of β_0.

(B) There exists a constant matrix C(β) such that

n^{-1} g_D′(β)^T W g_D′(β) → C(β)

uniformly in β for β in a neighbourhood of β_0 as n → ∞. Furthermore, denote C to be C(β_0), which is assumed to be positive definite.

(C) There exists a constant, positive definite matrix C*(β) such that

n^{-1} g_D′(β)^T W W_n W g_D′(β) → C*(β)

uniformly in β for β in a neighbourhood of β_0 as n → ∞, where W_n is the true variance-covariance matrix of ε_D. Denote C* to be C*(β_0).

(D) n^{-1} g_{Di}′(β)^T g_{Di}′(β), i = 1, ..., d, and n^{-1} g_{Dij}″(β)^T g_{Dij}″(β), i, j = 1, ..., d, are uniformly bounded and uniformly convergent in β for β in a neighbourhood of β_0 as n → ∞.

(E) The largest eigenvalue of W is bounded.

(F) The baseline function α(·) has a continuous second derivative and is Lipschitz continuous on its domain, and u_{i+1} - u_i = o(n^{-1/2}).

Conditions (A)-(D) are typical regularity conditions for consistency and asymptotic normality of nonlinear least squares estimators (see Jennrich 1969 and Seber & Wild 1989). Condition (A) corresponds to Conditions A(1)-A(6) in Seber & Wild (1989, ch. 12); Conditions (B) and (C) are adapted respectively from Conditions A(7) and A(8) of Seber & Wild (1989, ch. 12). All these conditions are mild regularity assumptions. Conditions (B) and (C) guarantee that the sandwich estimate is positive definite. Condition (D) imposes some control over the first and second order derivatives of the objective function S(β). Condition (E) is a natural restriction on the working covariance matrix. Condition (F) is a new condition additional to the previous regularity conditions, and basically makes sure the approximation error from differencing is negligible. This condition plays an important role in the proof. If this condition is violated, the least squares estimate obtained by minimizing S(β) is biased.

Proof of Theorem 1. We first establish the √n-consistency of β̂. It suffices to show that, for any given η > 0, there exists a constant C such that

P{ inf_{‖t‖ = C} S(β_0 + n^{-1/2} t) > S(β_0) } ≥ 1 - η.   (12)


This implies that, with probability at least 1 - η, there exists a local minimizer in the ball {β_0 + t/√n : ‖t‖ ≤ C}. Hence, there exists a local minimizer β̂ such that ‖β̂ - β_0‖ = O_P(1/√n). Using a Taylor expansion,

S(β_0 + t/√n) - S(β_0) = n^{-1/2} t^T S′(β_0) + ½ n^{-1} t^T S″(β*) t ≜ I_1 + I_2,

where β* lies between β_0 + t/√n and β_0, and S′(β) and S″(β) are the gradient vector and the Hessian matrix of S(β), respectively. It is enough to show that I_2 dominates I_1 for a sufficiently large C. Let us calculate the order of I_1 first. Note that

S′(β_0) = -2 g_D′(β_0)^T W ε_D - 2 g_D′(β_0)^T W α_D.   (13)

By the Cauchy-Schwarz inequality,

|g_{Di}′(β_0)^T W α_D| ≤ {g_{Di}′(β_0)^T W g_{Di}′(β_0)}^{1/2} {α_D^T W α_D}^{1/2}

for i = 1, ..., d. Thus, under Conditions (B), (D), (E) and (F), the second term in (13) is of order o_P(√n). Note that E(ε_D) = 0. Under Condition (B), the first term in (13) is of order O_P(√n). Thus, n^{-1/2} S′(β_0) = O_P(1).

Next we deal with I_2. Let β = β_0 + t/√n. The (i, j)th element of the d × d matrix S″(β) is

-2 g_{Dij}″(β)^T W ε_D - 2 g_{Dij}″(β)^T W α_D + 2 g_{Di}′(β)^T W g_{Dj}′(β) - 2 g_{Dij}″(β)^T W {g_D(β_0) - g_D(β)}.

Under Condition (D), it follows from Jennrich (1969, Theorem 4) that

n^{-1} g_{Dij}″(β)^T W ε_D → 0

uniformly for β in a neighbourhood of β_0. Under Conditions (D)-(F), and using the Cauchy-Schwarz inequality, we have

n^{-1} g_{Dij}″(β)^T W α_D → 0  and  n^{-1} g_{Dij}″(β)^T W {g_D(β_0) - g_D(β)} → 0,

uniformly for β in a neighbourhood of β_0. Therefore, by Condition (B),

n^{-1} S″(β*) → 2C   (14)

in probability. Note that C is positive definite. If we choose a sufficiently large C, then I_2 dominates I_1. Thus, (12) holds.

Let S_j′(β) denote the jth component of S′(β), and S_j″(β) the jth row of S″(β). Using a Taylor expansion, for j = 1, ..., d,

0 = S_j′(β̂) = S_j′(β_0) + S_j″(β_j*)(β̂ - β_0),   (15)

where β_j* lies between β̂ and β_0. Using (14),

n^{-1} S_j″(β_j*) → 2 C_j

in probability, where C_j is the jth row of C. Using (12) together with (15),

2 n^{1/2} C {1 + o_P(1)} (β̂ - β_0) = 2 n^{-1/2} g_D′(β_0)^T W ε_D + o_P(1).

Under Conditions (C)-(F),

n^{-1/2} g_D′(β_0)^T W ε_D →_D N(0, C*).


Finally, by Slutsky's Theorem and (14),

√n (β̂ - β_0) →_D N(0, C^{-1} C* C^{-1}),

which completes the proof.

A.2. Theoretic insights into the approximations used in difference-based estimation methods.

In this section, we provide some insights into the difference-based approximations. From Condition (F), the baseline function α(·) is smooth, and u_{i+1} - u_i = o(1/√n). Thus, for some u_{i0} ∈ (u_i, u_{i+1}), we can make the following approximations:

α(u_i) ≈ α(u_{i0}) + α′(u_{i0})(u_i - u_{i0})

and

α(u_{i+1}) ≈ α(u_{i0}) + α′(u_{i0})(u_{i+1} - u_{i0}).

Then

α(u_{i+1}) - α(u_i) ≈ α′(u_{i0})(u_{i+1} - u_i) = O_P(1/√n)   (16)

according to Condition (F). This explains why the approximation in (3) proposed by Yatchew (1997) holds and results in a √n-consistent estimate, as demonstrated in our simulations. The approximation (4) proposed by Fan & Huang (2001) can be derived in a similar way. From (16), it follows that

α(u_{i+1}) - α(u_i) = a_1(u_{i+1} - u_i) + {α′(u_{i0}) - a_1}(u_{i+1} - u_i) ≈ a_1(u_{i+1} - u_i) + o_P(1/√n),

where a_1 = ∫ α′(u) du is a global constant. The local approximation in (5) can be derived as follows. From (16),

α(u_{i+1}) - α(u_i) ≈ α′(u_{i0})(u_{i+1} - u_i).

This yields the local approximation (5), with a_{i1} = α′(u_{i0}).

In an earlier version of this paper, we considered another local approximation

α(u_{i+1}) - α(u_i) ≈ a_{i0} + a_{i1}(u_{i+1} - u_i).   (17)

This approximation attempts to further reduce the approximation bias by introducing an extra parameter a_{i0}. However, from our simulations, it appears that (17) has a similar performance to the local approximation (5).

ACKNOWLEDGEMENTS

The authors are grateful to the Editor, the Associate Editor and two anonymous referees for their constructive comments and suggestions, which greatly improved the quality of the paper. In particular, the discussion in Section A.2 should be credited to an anonymous referee. The authors also want to thank Dr C. Yi for providing the data analyzed in Section 3.3. Li's research was supported by two NSF grants.


REFERENCES

P. J. Bickel, C. A. J. Klaassen, Y. Ritov & J. A. Wellner (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

Z. Cai, J. Fan & R. Li (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95, 888-902.

R. F. Engle, C. W. J. Granger, J. Rice & A. Weiss (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310-320.

J. Fan & L. Huang (2001). Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association, 96, 640-652.

J. Fan & T. Huang (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli, 11, 1031-1057.

J. Fan & R. Li (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710-723.

A. R. Gallant (1987). Nonlinear Statistical Models. Wiley, New York.

W. Härdle, H. Liang & J. Gao (2000). Partially Linear Models. Springer, New York.

N. Heckman (1986). Spline smoothing in partly linear models. Journal of the Royal Statistical Society Series B, 48, 244-248.

R. I. Jennrich (1969). Asymptotic properties of nonlinear least squares estimators. The Annals of Mathematical Statistics, 40, 633-643.

R. Li & L. Nie (2007). Statistical inferences on partially nonlinear models and their applications. Submitted for publication.

J. L. Monteith (1972). Solar radiation and productivity in tropical ecosystems. Journal of Applied Ecology, 9, 747-766.

A. Ruimy, L. Kergoat, A. Bondeau & the participants of the Potsdam NPP Model Intercomparison (1999). Comparing global models of terrestrial net primary productivity (NPP): analysis of differences in light absorption and light-use efficiency. Global Change Biology, 5, 56-64.

G. A. F. Seber & C. J. Wild (1989). Nonlinear Regression. Wiley, New York.

T. A. Severini & J. G. Staniswalis (1994). Quasi-likelihood estimation in semiparametric models. Journal of the American Statistical Association, 89, 501-511.

P. Speckman (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society Series B, 50, 413-436.

E. F. Vonesh & R. L. Carter (1992). Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics, 48, 1-17.

A. Yatchew (1997). An elementary estimator of the partial linear model. Economics Letters, 57, 135-143.

Received 5 August 2005 Accepted 22 March 2007

Runze LI: rli@stat.psu.edu
Department of Statistics
Pennsylvania State University
University Park, PA 16802, USA

Lei NIE: ln54@georgetown.edu
Department of Biostatistics, Bioinformatics, and Biomathematics
Georgetown University
Washington, DC 20057, USA