
Penalized Function-on-Function Regression

Andrada E. Ivanescu · Ana-Maria Staicu · Fabian Scheipl · Sonja Greven

Received: November 12, 2014

Abstract A general framework for smooth regression of a functional response on one or multiple functional predictors is proposed. Using the mixed model representation of penalized regression expands the scope of function-on-function regression to many realistic scenarios. In particular, the approach can accommodate a densely or sparsely sampled functional response as well as multiple functional predictors that are observed on the same or different domains than the functional response, on a dense or sparse grid, and with or without noise. It also allows for seamless integration of continuous or categorical covariates and provides approximate confidence intervals as a by-product of the mixed model inference. The proposed methods are accompanied by easy-to-use and robust software implemented in the pffr function of the R package refund^1. Methodological developments are general, but were inspired by and applied to a Diffusion Tensor Imaging (DTI) brain tractography dataset.

Keywords functional data analysis · functional regression model · mixed model · multiple functional predictors · penalized splines · tractography data

Andrada E. Ivanescu, Department of Mathematical Sciences, Montclair State University, tel: +1-973-655-7233, fax: +1-973-655-7686, E-mail: [email protected]

Ana-Maria Staicu, Department of Statistics, North Carolina State University, E-mail: [email protected]

Fabian Scheipl, Department of Statistics, Ludwig-Maximilians-University Munich, E-mail: [email protected]

Sonja Greven, Department of Statistics, Ludwig-Maximilians-University Munich, E-mail: [email protected]

^1 The refund package is available from CRAN at http://CRAN.R-project.org/package=refund


1 Introduction

Research in functional regression, where the responses and/or predictors are curves, has received great attention recently: see, e.g., Ramsay and Silverman (2005, Ch 12); Ferraty and Vieu (2006, 2009); Aneiros-Perez and Vieu (2008); Horvath and Kokoszka (2012, Ch 8). However, the lack of working statistical inference tools that are flexible enough for the many settings encountered in practice, and that are fully implemented in software, is a serious methodological and computational gap in the literature. In this paper we propose an estimation and inference framework to study the association between a functional response and one or multiple functional predictors in a variety of settings. Specifically, we introduce penalized function-on-function regression (PFFR), implemented in the pffr function of the R package refund (Crainiceanu et al. 2014). The proposed framework is very flexible and accommodates 1) multiple functional predictors observed on the same or different domains than the response, in various realistic scenarios such as predictors and response observed on dense or sparse grids, as well as 2) linear and nonlinear effects of multiple scalar covariates.

The paper considers linear relationships between the functional response and the functional predictor, so that the effect of the predictor is expressed as the integral of the corresponding covariate weighted by a smooth bivariate coefficient function. Function-on-function linear regression models are well known; see, e.g., Ramsay and Silverman (2005, Ch 16) and Horvath and Kokoszka (2012, sec 8.3).

Most of the work in this area (e.g., Aguilera et al. 1999; Yao, Muller and Wang 2005b; He, Muller, Wang and Wang 2010; Wu, Fan and Muller 2010; Horvath and Kokoszka 2012, sec 8.3) considers regression with a single functional predictor, or with an additional lagged-response functional explanatory process (see Valderrama et al. 2009), and is based on an eigenbasis expansion for both the functional predictor and the corresponding functional coefficient. While these approaches are intuitive, they are also subject to two subtly dangerous problems. First, estimating the number of components used for the expansion of the functional predictors is known to be difficult, and, as noted in Crainiceanu, Staicu and Di (2009), the shape and interpretation of the functional parameter can change dramatically when one includes one or two additional principal components. This problem is exacerbated by the fact that eigenfunctions corresponding to smaller eigenvalues tend to be much wigglier in applications. Second, the smoothness of the functional parameter is induced by the choice of basis dimension of the functional predictor. This may lead to strong under-smoothing when the functional parameter is much smoother than the higher-order principal components. Furthermore, principal component-based function-on-function regression methods given in the literature currently do not incorporate a large number of functional predictors or effects of additional scalar covariates. While such an extension, at least for linear effects of scalar covariates, might be possible using the hybrid principal component approach of Ramsay and Silverman (2005, Ch 10), the quality of estimation, optimal scaling of variables, and scalability to a large number of functional and/or scalar covariates would certainly need further research. Penalized approaches with basis function expansions for function-on-function regression (e.g., Ramsay and Silverman 2005, ch 14, ch 16; Horvath and Kokoszka 2012, p. 130) and their implementations are not currently developed to the fullest generality, focusing on one functional predictor and no scalar covariates (Matsui et al. 2009), or on the concurrent model, for a setting where both variables are observed over the same domain (e.g., Ramsay and Silverman 2005, ch 14).

Regression models for functional responses with a non-linear effect of the functional predictor have been considered in Ferraty, Laksaci, Tadj and Vieu (2011) and Ferraty, Van Keilegom and Vieu (2012). While such modeling approaches are more general in how they account for the effect of the functional predictor, it is not clear how to extend them to account for additional functional or scalar predictors. Recent work (Fan et al. 2014) discusses a functional response model with scalar covariates and a single index model for the functional covariates; this is an interesting approach focusing on prediction, though not on inference for the estimated effects. Kadri et al. (2011) consider a nonparametric approach with both functional and scalar covariates based on reproducing kernels, but this short proceedings paper gives no details on implementation, choice of kernel, selection of the smoothing parameter, simulations or applications; it does not consider inference in addition to prediction, nor does it provide software. Most of the discussed approaches are furthermore limited to the case of functional responses and functional covariates that are not sparsely observed.

We propose a novel solution for the linear function-on-function regression setting. Our approach is inspired by the penalized functional regression (PFR) of Goldsmith et al. (2011), developed for the simpler case of scalar-on-function regression. It uses basis function expansions of the smooth coefficient function(s), based on pre-determined bases, and applies quadratic roughness penalties to avoid overfitting. The smoothness of the functional coefficients is controlled by smoothing parameters, which are estimated using restricted maximum likelihood (REML) in an associated mixed model. Estimation of the functional coefficients is carried out under the working assumption of independent errors.

To the best of the authors' knowledge, the developed pffr function is the first publicly available software for fitting function-on-function regression models with two or more functional predictors and additional scalar covariates. The fda package (Ramsay, Wickham, Graves and Hooker 2014) in R includes the functions linmod and fRegress for function-on-function regression. However, linmod (see Ramsay, Hooker and Graves, 2009, Ch 10.3) cannot handle multiple functional predictors or linear or non-linear effects of scalar covariates. Also, the fRegress function is restricted to concurrent associations, where response and predictors are observed on the same domain. The PACE package (Yao, Muller, and Wang 2005b) in Matlab (2014) accommodates neither multiple functional predictors nor linear or smooth effects of additional scalar covariates.

The main contributions of this paper thus are the following: 1) it develops a modeling and estimation framework for function-on-function regression flexible enough to accommodate multiple functional covariates, functional responses and/or covariates observed on possibly different, non-equidistant, or sparse grids, as well as smooth effects of scalar covariates; 2) it describes model-based and bootstrap-based confidence intervals; 3) it provides an open source implementation in the R package refund and describes the implementation in detail; and 4) it tests these methods in an application and in extensive simulations, including realistic scenarios such as covariates or responses observed sparsely and with noise.

The remainder of the article proceeds as follows. We present our PFFR methodology in Section 2. Section 3 describes the software implementation of this method in detail. The performance of PFFR in simulations is discussed in Section 4. Section 5 presents an application to the motivating Diffusion Tensor Imaging (DTI) tractography dataset. Section 6 provides our discussion.

2 Mixed model representation of function-on-function regression

2.1 Function-on-function regression framework

While the approach is general, the presentation is focused on the case of two functional predictors, which is also relevant in our DTI application in Section 5. Let $Y_i(t_{ij})$ be the functional response for subject $i$ measured at $t_{ij} \in \mathcal{T}$, an interval on the real line, $1 \le i \le n$, $1 \le j \le m_i$, where $n$ is the number of curves or subjects and $m_i$ is the number of observations for curve $i$. We assume that the observed data for the $i$th subject are $[\{Y_i(t_{ij})\}_j, \{X_{i1}(s_{ik})\}_k, \{X_{i2}(r_{iq})\}_q, W_i]$, where $\{X_{i1}(s_{ik}) : 1 \le k \le K_i\}$ and $\{X_{i2}(r_{iq}) : 1 \le q \le Q_i\}$ are functional predictors and $W_i$ is a $p \times 1$ vector of scalar covariates. Furthermore, it is assumed that $\{X_{i1}(s_{ik})\}$ and $\{X_{i2}(r_{iq})\}$ are square-integrable, finite sample realizations of underlying stochastic processes $\{X_1(s) : s \in \mathcal{S}\}$ and $\{X_2(r) : r \in \mathcal{R}\}$, respectively, where $\mathcal{S}$ and $\mathcal{R}$ are intervals on the real line. We start with the following model for $Y_i(t_{ij})$:

$$Y_i(t_{ij}) = W_i^t \gamma + \beta_0(t_{ij}) + \int_{\mathcal{S}} \beta_1(t_{ij}, s) X_{i1}(s)\, ds + \int_{\mathcal{R}} \beta_2(t_{ij}, r) X_{i2}(r)\, dr + \varepsilon_i(t_{ij}), \quad (1)$$

where the mean function is modeled semiparametrically (Ruppert, Wand and Carroll 2003; Wood 2006) and consists of two components: a linear parametric function $W_i^t \gamma$ to account for the covariates $W_i$, whose effects are assumed linear and constant along the interval $\mathcal{T}$, where $\gamma$ is a $p$-dimensional parameter, and an overall nonparametric function $\beta_0(t)$. The effect of the functional predictors is captured by the component $\int_{\mathcal{S}} \beta_1(t_{ij}, s) X_{i1}(s)\, ds + \int_{\mathcal{R}} \beta_2(t_{ij}, r) X_{i2}(r)\, dr$. The regression parameter functions, $\beta_1(\cdot, \cdot)$ and $\beta_2(\cdot, \cdot)$, are assumed to be smooth and square integrable over their domains. The errors $\varepsilon_i$ are zero-mean random processes and are uncorrelated with the functional predictors and scalar covariates. The model will be extended in several different ways in the following sections.

2.1.1 Extension to sparsely observed functional responses

Our approach is able to accommodate sparseness of the observed response trajectories. This work is the first, to the best of our knowledge, to consider a sparsely observed functional response within the general setting of multiple functional and scalar predictors. Let $\{t_{ij} : j = 1, \ldots, m_i\}$ be the set of grid points at which the response for curve $i$ is observed, such that $\cup_{i=1}^{n} \{t_{ij}\}_{j=1}^{m_i}$ is dense in $\mathcal{T}$. PFFR can easily accommodate such a scenario with only a few modifications in the implementation, which we outline in Section 3.2. Section 4.2 provides simulation results obtained in this setting.

2.1.2 Extension to corruptly observed functional predictors

When the functional predictors are observed with error or not on the same fine grid, the underlying smooth curves need to be estimated first. If grid points are relatively fine, then a popular method is to smooth each trajectory separately using common smoothing approaches (Ramsay and Silverman 2005; Ramsay, Hooker and Graves 2009, Ch 5). Zhang and Chen (2007) provide justification for local polynomial methods, showing that this approach can estimate the true curves with asymptotically negligible error under some regularity assumptions. If the grids of time points are sparse, the smoothing is done by pooling information across curves. We follow an approach similar to Yao et al. (2005a): first consider an undersmoothed estimator of the covariance function and then smooth it to remove the measurement error. At the latter step we use penalized spline-based smoothing, as previously employed by Goldsmith et al. (2011). In such cases, the PFFR methodology is recommended to be applied to the reconstructed smooth trajectories, $\hat{X}_{i1}(s)$ and $\hat{X}_{i2}(r)$, which, in addition, are centered to have zero mean function. This approach is investigated numerically in Section 4.3.

2.2 A penalized criterion for function-on-function regression

In this section we introduce a representation of model (1) that allows us to estimate the model using available mixed model software, as detailed in Section 2.3. We discuss the case when the predictor curves, $X_{i1}(\cdot)$ and $X_{i2}(\cdot)$, are measured densely and without noise. For simplicity of presentation, consider the case when $K_i = K$, $s_{ik} = s_k$, $Q_i = Q$, and $r_{iq} = r_q$ for every $i$, $k$, and $q$. Additionally, assume that the response curves are measured on a common grid, that is, $m_i = m$ and $t_{ij} = t_j$ for every $i$ and $j$.

To start with, we represent the functional intercept using a basis function expansion as $\beta_0(t) \approx \sum_{l=1}^{\kappa_0} A_{0,l}(t)\beta_{0,l}$, where $A_{0,l}(\cdot)$ is a known univariate basis and the $\beta_{0,l}$ are the corresponding coefficients. Then, we use numerical integration to approximate the linear "function-on-function" terms; for example, $\int_{\mathcal{S}} \beta_1(t, s) X_{i1}(s)\,ds$ can be approximated by

$$\int_{\mathcal{S}} \beta_1(t, s) X_{i1}(s)\,ds \approx \sum_{k=1}^{K} \Delta_k \beta_1(t, s_k) X_{i1}(s_k),$$

where $s_k$, $k = 1, \ldots, K$, forms a grid of points in $\mathcal{S}$ and the $\Delta_k$ are the lengths of the corresponding intervals. The next step is to expand $\beta_1(t, s) \approx \sum_{l=1}^{\kappa_1} a_{1,l}(t, s)\beta_{1,l}$, where $a_{1,l}(\cdot, \cdot)$ is a bivariate basis and the $\beta_{1,l}$ are the corresponding coefficients. Thus, $\int_{\mathcal{S}} \beta_1(t, s) X_{i1}(s)\,ds$ can be approximated by

$$\sum_{l=1}^{\kappa_1} \left\{ \sum_{k=1}^{K} a_{1,l}(t, s_k)\,\{\Delta_k X_{i1}(s_k)\} \right\} \beta_{1,l} = \sum_{l=1}^{\kappa_1} \left\{ \sum_{k=1}^{K} a_{1,l}(t, s_k)\,\tilde{X}_{i1}(s_k) \right\} \beta_{1,l},$$

where $\tilde{X}_{i1}(s_k) = \Delta_k X_{i1}(s_k)$, and the accuracy of this approximation depends on how dense the grid $\{s_k : k = 1, \ldots, K\}$ is. Note that the $\tilde{X}_{i1}$ values can be included directly in the function-on-function term, and we do not apply a basis expansion reconstruction approach for $X_{i1}$ if $X_{i1}$ is observed densely and without noise. Using similar notation, $\int_{\mathcal{R}} \beta_2(t, r) X_{i2}(r)\,dr$ can be approximated by

$$\sum_{l=1}^{\kappa_2} \left\{ \sum_{q=1}^{Q} a_{2,l}(t, r_q)\,\tilde{X}_{i2}(r_q) \right\} \beta_{2,l}.$$
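The accuracy of the Riemann-type approximation above is easy to check numerically. The following sketch (in Python/NumPy purely for illustration; the paper's software is in R) compares the quadrature sum $\sum_k \Delta_k \beta_1(t, s_k) X_{i1}(s_k)$ against a dense-grid evaluation of the integral, using smooth stand-ins for $\beta_1$ and $X_{i1}$ that are not from the paper:

```python
import numpy as np

# Illustrative stand-ins for a smooth coefficient surface and a predictor curve.
beta1 = lambda t, s: np.cos(np.pi * t) * np.sin(np.pi * s)
X = lambda s: s ** 2

def quad_approx(t, grid):
    """Riemann-type approximation: sum_k Delta_k * beta1(t, s_k) * X(s_k)."""
    deltas = np.gradient(grid)          # interval lengths attached to the grid points
    return np.sum(deltas * beta1(t, grid) * X(grid))

t = 0.3
coarse = np.linspace(0, 1, 20)          # K = 20 quadrature points
fine = np.linspace(0, 1, 20000)         # reference: very dense grid
reference = np.trapz(beta1(t, fine) * X(fine), fine)

err = abs(quad_approx(t, coarse) - reference)
print(err)  # small; shrinks further as the s-grid gets denser
```

As the text notes, the approximation error is driven entirely by how dense the grid $\{s_k\}$ is, so doubling $K$ in this sketch visibly reduces `err`.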

Thus, we approximate model (1) by the additive model

$$Y_i(t) = W_i^t \gamma + \sum_{l=1}^{\kappa_0} A_{0,l}(t)\beta_{0,l} + \sum_{l=1}^{\kappa_1} A_{1,l,i}(t)\beta_{1,l} + \sum_{l=1}^{\kappa_2} A_{2,l,i}(t)\beta_{2,l} + \varepsilon_i(t), \quad (2)$$


where $A_{1,l,i}(t) = \sum_{k=1}^{K} a_{1,l}(t, s_k)\tilde{X}_{i1}(s_k)$ and $A_{2,l,i}(t) = \sum_{q=1}^{Q} a_{2,l}(t, r_q)\tilde{X}_{i2}(r_q)$ are known because the predictor functions are observed without noise.

While the presentation is provided in full generality, the various choices involved are crucial when one develops practical software. Our approach to smoothing is to use rich bases that reasonably exceed the maximum complexity of the parameter functions to be estimated and then penalize the roughness of these functions. This translates into choosing a large number of basis functions, $\kappa_0$, $\kappa_1$, $\kappa_2$, and applying the penalties $\lambda_0 P_0(\beta_0)$, $\lambda_1 P_1(\beta_1)$, and $\lambda_2 P_2(\beta_2)$, where $\beta_d$ is the vector of all parameters $\beta_{d,l}$ for $d = 0, 1, 2$. Thus, if we denote by $\mu_i(t; \gamma, \beta_0, \beta_1, \beta_2)$ the mean of $Y_i(t)$, our penalized criterion to be minimized is

$$\sum_{i,j} \|Y_i(t_j) - \mu_i(t_j; \gamma, \beta_0, \beta_1, \beta_2)\|^2 + \lambda_0 P_0(\beta_0) + \lambda_1 P_1(\beta_1) + \lambda_2 P_2(\beta_2). \quad (3)$$

This is a penalized least squares, or penalized integrated squared error, criterion, and is a natural extension of the integrated residual sum of squares criterion discussed in Ramsay and Silverman (2005, Ch 16.1).

The penalties $P_0(\beta_0)$, $P_1(\beta_1)$, and $P_2(\beta_2)$ are of known functional form, with the amount of shrinkage controlled by the three scalar smoothing parameters $\lambda_0$, $\lambda_1$, and $\lambda_2$. We employ quadratic penalties, $P_0(\beta_0) = \beta_0^t D_0 \beta_0$, $P_1(\beta_1) = \beta_1^t D_1 \beta_1$, and $P_2(\beta_2) = \beta_2^t D_2 \beta_2$, where $D_0$, $D_1$, $D_2$ are known penalty matrices associated with the chosen bases. Examples include integrated squared second derivative penalties (cf. Wood 2006, Section 3.2.2) and B-spline bases with a difference penalty (Eilers and Marx, 1996), but also thin plate splines, cubic regression splines, cyclic cubic regression splines, or Duchon splines (Wood, 2006). Among the possible criteria for selecting the smoothing parameters, Generalized Cross Validation (GCV), AIC, and Restricted Maximum Likelihood (REML) are the most popular. We prefer REML in a particular mixed model due to its favorable comparison to GCV in terms of MSE and stability (Reiss and Ogden 2009; Wood 2011); our preference is also motivated by theoretical results showing that REML-based selection of the smoothing parameters is more robust than AIC to moderate violations of the independent error assumption; see Krivobokova and Kauermann (2007). Estimation of the smoothing parameters and model parameters is described next.
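The effect of a quadratic roughness penalty of the kind just described can be sketched in a few lines (Python/NumPy for illustration only; the paper uses penalized spline bases in R, and the Gaussian-bump basis below is a hypothetical stand-in for a rich basis):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size)   # one noisy curve

# Rich basis: Gaussian bumps as a stand-in for a spline basis.
knots = np.linspace(0, 1, 20)
B = np.exp(-0.5 * ((t[:, None] - knots[None, :]) / 0.05) ** 2)

# Second-order difference penalty D = P'P (Eilers & Marx style).
P = np.diff(np.eye(knots.size), n=2, axis=0)
D = P.T @ P

def fit(lam):
    """Minimize ||y - B beta||^2 + lam * beta' D beta (normal equations)."""
    return np.linalg.solve(B.T @ B + lam * D, B.T @ y)

rough = lambda beta: beta @ D @ beta
b_small, b_large = fit(0.01), fit(100.0)
print(rough(b_large) < rough(b_small))  # True
```

The single knob is the smoothing parameter: at the optimum, the penalized roughness $\beta^t D \beta$ is non-increasing in $\lambda$, which the final comparison confirms numerically.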

2.3 Mixed model representation of function-on-function regression

We note that the penalized criterion (3) with the quadratic penalty choices described in the previous section becomes

$$\frac{1}{\sigma^2_\varepsilon} \sum_{i,j} \|Y_i(t_j) - \mu_i(t_j; \gamma, \beta_0, \beta_1, \beta_2)\|^2 + \frac{\lambda_0}{\sigma^2_\varepsilon}\,\beta_0^t D_0 \beta_0 + \frac{\lambda_1}{\sigma^2_\varepsilon}\,\beta_1^t D_1 \beta_1 + \frac{\lambda_2}{\sigma^2_\varepsilon}\,\beta_2^t D_2 \beta_2,$$

where $\sigma^2_\varepsilon$ is the variance of the errors $\varepsilon_i(t)$ in model (1). In the literature on semiparametric regression (see, e.g., Ruppert et al. 2003), such a penalized log-likelihood has a well-established connection to mixed model estimation, where the $\beta$ coefficients are treated during estimation as if they were random effects. The intuition behind this approach is that a penalty can be thought of as similar to a prior in a Bayesian setting, or to a normal distribution for random effects in a mixed model context, in terms of incorporating prior knowledge (here, of smoothness); in fact, the quadratic penalty corresponds to the logarithm of a normal density for the coefficients. Mathematically, by denoting $\sigma^2_0 = \sigma^2_\varepsilon/\lambda_0$, $\sigma^2_1 = \sigma^2_\varepsilon/\lambda_1$, $\sigma^2_2 = \sigma^2_\varepsilon/\lambda_2$ and following arguments identical to those in Ruppert et al. (2003, sec 4.9), minimizing the penalized criterion (3) is equivalent to obtaining the best linear unbiased predictor in the mixed model

$$Y_i(t) \sim N\{\mu_i(t; \gamma, \beta_0, \beta_1, \beta_2), \sigma^2_\varepsilon\}; \quad \beta_0 \sim N(0, \sigma^2_0 D_0^{-1}); \quad \beta_1 \sim N(0, \sigma^2_1 D_1^{-1}); \quad \beta_2 \sim N(0, \sigma^2_2 D_2^{-1}), \quad (4)$$

where $D_0$, $D_1$, and $D_2$ are known, and shrinkage of the functional parameters is controlled by $\sigma^2_0$, $\sigma^2_1$, and $\sigma^2_2$, respectively. There is a slight abuse of notation in model (4), as the matrices $D_0$, $D_1$, and $D_2$ are typically not invertible. Indeed, in many cases, only a subset of the coefficients is penalized, or the penalty matrix is rank deficient. These are well-known problems in penalized regression, and the standard solution (Wood 2006, ch 6.6) is to either replace the inverse with a particular generalized inverse or to separate the coefficients that are penalized from those that are not. For example, for the penalty corresponding to $\beta_1$, the eigendecomposition of the matrix $D_1$ is used to split the coefficients into two kinds: one kind that is not penalized, i.e., in the null space of the penalty matrix, which would be estimated as fixed coefficients, and the other kind, say $b_1^R$, that is penalized (not in the null space of the penalty) and that would be treated as random effects during estimation using the distributional assumption $b_1^R \sim N(0, \sigma^2_1 D_{1+}^{-1})$, where $D_{1+}$ is a projection of $D_1$ into the cone of positive definite matrices; see Section 6.6.1 of Wood (2006) for a detailed presentation of this partitioning of penalized coefficient vectors into a mixed model representation with fixed and random components for producing smooth estimators of the parameters.

Replacing the penalized approach with the mixed model (4) has many favorable consequences. First, it allows estimation of the smoothing parameters as certain variance ratios in this mixed model, $\lambda_k = \sigma^2_\varepsilon/\sigma^2_k$, $k = 0, 1, 2$, using restricted maximum likelihood (REML) estimation. This has favorable theoretical properties, as discussed in the previous subsection, and avoids computationally costly cross-validation. Second, models can naturally be extended in a likelihood framework to adapt to different levels of data complexity, such as the addition of smooth effects of scalar covariates. This enables us to expand the scope of function-on-function regression substantially. Third, inferential methods originally developed for mixed models transfer to function-on-function regression. In particular, approximate confidence intervals (Wahba 1983; Nychka 1988; Ruppert et al. 2003) can be obtained as a by-product of the fitting algorithm. This provides a statistically sound solution to an important problem that is currently unaddressed in function-on-function regression. We discuss implementation of this inferential tool when independent, identically distributed (i.i.d.) errors in model (1) are assumed. Online Appendix A discusses first steps towards related inference tools for settings with non-i.i.d. errors. Moreover, tests on the shape of the association between responses and covariates are available as well. For suitably chosen bases and penalties, likelihood ratio tests of linearity versus non-linearity, or constancy versus non-constancy, of the coefficient surfaces along the lines described in Crainiceanu and Ruppert (2004) and Greven, Crainiceanu, Kuchenhoff, and Peters (2008) can be performed with the R package RLRsim (Scheipl, Greven and Kuchenhoff, 2008).
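The equivalence between the penalized least squares fit and the best linear unbiased predictor in the mixed model can be verified numerically in a toy setting (a Python/NumPy sketch for illustration; the penalty matrix is taken invertible here so that the random effects distribution is proper, sidestepping the rank-deficiency discussion above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 8
X = rng.normal(size=(n, p))                # toy design matrix
y = rng.normal(size=n)                     # toy response
sig2, lam = 0.5, 3.0                       # error variance and smoothing parameter
D = np.eye(p) + 0.3 * np.ones((p, p))      # invertible penalty matrix (toy choice)

# Penalized least squares: argmin ||y - X b||^2 + lam * b' D b
b_pen = np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# BLUP in the mixed model b ~ N(0, Sigma) with Sigma = (sig2 / lam) * D^{-1},
# i.e. E[b | y] = Sigma X' (X Sigma X' + sig2 I)^{-1} y:
Sigma = (sig2 / lam) * np.linalg.inv(D)
b_blup = Sigma @ X.T @ np.linalg.solve(X @ Sigma @ X.T + sig2 * np.eye(n), y)

print(np.allclose(b_pen, b_blup))  # True: the two solutions coincide
```

The identity holds because $\sigma^2_\varepsilon \Sigma^{-1} = \lambda D$, which is exactly the reparameterization $\sigma^2_k = \sigma^2_\varepsilon/\lambda_k$ used in the text.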


We propose to fit model (4) using frequentist mixed model software based on REML estimation of the variance components. The robust mgcv package (Wood 2014) in R (R Development Core Team 2014), which is designed for penalized regression and has a built-in capability to construct the penalty matrices appropriate for the specified spline bases, can be used for model fitting.

3 Function-on-function regression via mixed models software

3.1 Implementation for functional regression with densely observed functional response

We now turn our attention to implementation. In particular, we explain how model (4) can be fit using the mgcv package (Wood 2014) in R. Note that our representation of the function-on-function regression model, (2) and (3), is a penalized additive model where the original basis functions are re-weighted via the expressions $A_{1,l,i}(t) = \sum_{k=1}^{K} a_{1,l}(t, s_k)\tilde{X}_{i1}(s_k)$ and $A_{2,l,i}(t) = \sum_{q=1}^{Q} a_{2,l}(t, r_q)\tilde{X}_{i2}(r_q)$. If $A_{0,l}(\cdot)$, $a_{1,l}(\cdot, \cdot)$, and $a_{2,l}(\cdot, \cdot)$ are, for example, thin plate spline bases, then the mgcv fit can simply be expressed as

fit <- gam(Y ~ W + s(t_0) + s(t_1, s_0, by=DX1) + s(t_2, r, by=DX2),
           method="REML")

These very short software lines deserve an in-depth explanation to illustrate their direct connection to our mixed model representation of function-on-function regression. In order to fit the model, the response functions $Y_i(t)$ are stacked in an $nm \times 1$ dimensional vector $Y = \{Y_1(t_1), \ldots, Y_1(t_m), Y_2(t_1), \ldots, Y_n(t_m)\}^t$, where $n$ is the number of curves (or subjects) and $m$ is the number of observations per curve. This vector is labeled Y. The $p \times 1$ dimensional vectors of covariates $W_i$ are also stacked using the same rules used for $Y_i(t)$. More precisely, for every subject $i$ the vector $W_i$ is repeated $m$ times and the rows are row-stacked in an $m \times p$ dimensional matrix; these subject-specific matrices are further row-stacked across subjects to form an $mn \times p$ dimensional matrix $W$. This matrix is labeled W. The next step is to define a grid of points for the smooth function $\beta_0(t)$ in model (1). The grid corresponds exactly to the stacking of the response functions and is defined as the $mn \times 1$ dimensional vector $t_0 = (t_1, \ldots, t_m, \ldots, t_1, \ldots, t_m)^t$, obtained by stacking $n$ repetitions of the grid vector $(t_1, \ldots, t_m)^t$. This vector is labeled t_0. The expression s(t_0) thus fits a penalized univariate thin-plate spline at the grid points in the vector $t_0$.

So far, data manipulation and labeling have been quite straightforward. However, fitting the functional part of the model requires some degree of software customization. We focus on the first functional component and build three $nm \times K$ dimensional matrices: 1) $t_1$, obtained by column-binding $K$ copies of the $nm \times 1$ dimensional vector $t_0$; this matrix is labeled t_1; 2) $s_0$, obtained by row-binding $nm$ copies of the $1 \times K$ dimensional vector $(s_1, \ldots, s_K)$; this matrix is labeled s_0; and 3) $\tilde{X}_1$, the $nm \times K$ dimensional matrix

$$\begin{pmatrix}
\tilde{X}_{11}(s_1) & \tilde{X}_{11}(s_2) & \ldots & \tilde{X}_{11}(s_K) \\
\vdots & & & \vdots \\
\tilde{X}_{11}(s_1) & \tilde{X}_{11}(s_2) & \ldots & \tilde{X}_{11}(s_K) \\
\tilde{X}_{21}(s_1) & \tilde{X}_{21}(s_2) & \ldots & \tilde{X}_{21}(s_K) \\
\vdots & & & \vdots \\
\tilde{X}_{n1}(s_1) & \tilde{X}_{n1}(s_2) & \ldots & \tilde{X}_{n1}(s_K)
\end{pmatrix},$$

where each row $\{\tilde{X}_{i1}(s_1), \ldots, \tilde{X}_{i1}(s_K)\}$ is repeated $m$ times, for $i = 1, \ldots, n$; this matrix is labeled DX1. With these notations, the expression s(t_1, s_0, by=DX1) is essentially building $\sum_l A_{1,l,i}(t)\beta_{1,l}$, where $A_{1,l,i}(t) = \sum_{k=1}^{K} a_{1,l}(t, s_k)\tilde{X}_{i1}(s_k)$. Without the option by=DX1, the expression s(t_1, s_0) would build the bivariate thin-plate spline basis $a_{1,l}(t, s_k)$, whereas adding by=DX1 averages these bivariate bases along the second dimension using the weights $\tilde{X}_{i1}(s_k)$ derived from the first functional predictor. A similar construction is done for the second functional predictor. Smoothing for all three penalized splines is done via REML, as indicated by the option method="REML".
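The construction of the objects labeled t_1, s_0, and DX1 is plain array bookkeeping; the following sketch mirrors the description above for a toy case (Python/NumPy for illustration only; the actual implementation is in R, and the variable names are the labels used in the text):

```python
import numpy as np

n, m, K = 3, 4, 5                          # subjects, points per response curve, s-grid size
t_grid = np.linspace(0, 1, m)              # common response grid (t_1, ..., t_m)
s_grid = np.linspace(0, 1, K)              # predictor grid (s_1, ..., s_K)
Xmat = np.arange(n * K, dtype=float).reshape(n, K)   # toy evaluations X_i1(s_k)

delta = np.full(K, 1.0 / (K - 1))          # quadrature weights Delta_k (toy: equal lengths)
Xtil = Xmat * delta                        # tilde-X: predictor values times interval lengths

t_0 = np.tile(t_grid, n)                                   # nm-vector of response time points
t_1 = np.repeat(t_0[:, None], K, axis=1)                   # column-bind K copies of t_0
s_0 = np.tile(s_grid[None, :], (n * m, 1))                 # row-bind nm copies of the s-grid
DX1 = np.repeat(Xtil, m, axis=0)                           # each subject's row repeated m times

print(t_1.shape, s_0.shape, DX1.shape)     # all (n*m, K)
```

Each of the three matrices has one row per stacked observation $(i, j)$ and one column per quadrature point $s_k$, which is exactly the layout mgcv's by= mechanism expects.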

The implementation presented above is simple, but the great flexibility of the gam function allows multiple useful extensions. First, it is obvious that the method and software can be adapted to a larger number of functional predictors. The number of basis functions used for each smooth can be adjusted; for example, requesting $\kappa_0 = 10$ basis functions for the univariate thin-plate penalized spline for the $\beta_0(\cdot)$ function is achieved by replacing s(t_0) with s(t_0, k=10). Moreover, the implementation can accommodate unevenly sampled grid points both for the functional response and the predictors; indeed, nothing in the modeling or implementation requires equally spaced grids. Changing from (isotropic) bivariate thin-plate splines to (anisotropic) tensor product splines based on two univariate bases can be done by replacing s(t_1, s_0, by=DX1) with te(t_1, s_0, by=DX1). This changes criterion (3) only slightly, such that $\beta_1$ then has two additive penalties, in the $s$ and $t$ directions, respectively. We can also easily incorporate linear effects of scalar covariates allowed to vary smoothly along $t$ (varying coefficients), that is, $z_i\beta_3(t)$, using expressions of the type s(t_0, by=Z), where Z is obtained using a strategy similar to the one for the functional predictors. For large datasets, the function bam is more computationally efficient than gam.

In practice, it is useful to have a dedicated, user-friendly interface that automatically takes care of the stacking, concatenation and multiplication operations described here, calls the appropriate estimation routines in mgcv, and returns a rich model object that can easily be summarized, visualized, and validated. pffr offers a formula-based interface that accepts functional predictors and responses in standard matrix form, i.e., the $i$th row contains the function evaluations for subject $i$ on an index vector like $t_0$ or $s$. It returns a model object whose fit can be summarized, plotted and compared with other model formulations without any programming effort by the user, through the convenient and fully documented functions summary, plot and predict. The model formula syntax used to specify models is very similar to the established mgcv formula syntax, to lower the barrier to entry for users familiar with the mgcv package; i.e., to specify model (2), we use


10 Andrada E. Ivanescu et al.

fit.pffr <- pffr(Ymat ~ c(W) + ff(X1mat, yind=t) + ff(X2mat, yind=t))

where Ymat, X1mat, X2mat are matrices containing the function evaluations, and where ff(Xmat, yind=t) denotes a linear function-on-function term ∫ X(s)β(t, s) ds.

A functional intercept β0(t) is included by default. The term c(W) corresponds to a constant effect of the covariates in W, i.e., W^t γ. By default, pffr associates scalar covariates with an effect varying smoothly over the domain of the response, i.e., ~W yields an effect Wγ(t). Our implementation also supports non-linear effects of scalar covariates zi that may or may not be constant over the domain of the response, i.e., f(zi) or f(zi, t), specified as c(s(z)) or s(z), respectively, as well as multivariate non-linear effects of scalar covariates z1i, z2i that may or may not be constant over the domain of the response, i.e., f(z1i, z2i) or f(z1i, z2i, t), specified as c(te(z1, z2)) or te(z1, z2), respectively.
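As a sketch, the term types above can be collected as formula objects (nothing is fitted here, and the matrix and covariate names are illustrative); with refund attached, each would be passed to pffr:

```r
# Formula objects illustrating the scalar-covariate conventions described above.
f_const    <- Ymat ~ c(W) + ff(X1mat, yind = t)         # constant effect W^t gamma
f_varying  <- Ymat ~ W + ff(X1mat, yind = t)            # smoothly varying W gamma(t)
f_nonlin   <- Ymat ~ c(s(z)) + ff(X1mat, yind = t)      # constant non-linear f(z)
f_nonlin_t <- Ymat ~ s(z) + ff(X1mat, yind = t)         # index-varying f(z, t)
f_biv      <- Ymat ~ te(z1, z2) + ff(X1mat, yind = t)   # bivariate f(z1, z2, t)
```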

Prior to applying our pffr procedure, we recommend centering the functional predictors. For each functional predictor, the mean function is estimated (e.g. Ramsay, Hooker and Graves 2009, Sec. 6.1; Bunea, Ivanescu and Wegkamp 2011) and subtracted from each curve i. When the functional predictors are centered, the functional intercept has the interpretation of the overall mean function for observations with functional predictor values at the respective mean values.
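A minimal base-R sketch of this centering step, using the raw pointwise sample mean as the simplest possible estimate of the mean function (variable names are illustrative):

```r
set.seed(1)
X1mat  <- matrix(rnorm(20 * 50, mean = 3), nrow = 20)  # 20 curves on a 50-point grid
mu_hat <- colMeans(X1mat)                              # crude estimate of the mean function
X1c    <- sweep(X1mat, 2, mu_hat)                      # centered predictor curves
```

In practice a smooth estimate of the mean function, as in the references above, would replace the plain colMeans.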

3.2 Implementation for functional regression with sparsely observed functional response

We have already seen in Section 3.1 that the class of models covered by our framework is much broader than model (1); specifically, the proposed framework easily accommodates multiple functional predictors, varying coefficient effects, and non-linear effects of one or multiple scalar covariates. Here we present extensions to another realistic setting. PFFR can easily accommodate a function-on-function regression scenario in which the functional response is sparsely observed, with only few modifications in the fitting procedure. First, the vector of responses, labeled Y, accommodates subject responses observed at different time points, and has the form Y = {Y1(t11), . . . , Y1(t1m1), Y2(t21), . . . , Y2(t2m2), . . . , Yn(tnmn)}^t. Second, the vector labeled t_0 has the form t0 = (t11, . . . , t1m1, . . . , tnmn)^t. Third, the matrix of covariates, labeled W, is obtained by taking mi copies of the 1 × p row vector of covariates W_i for subject i, column-stacking these into an mi × p-dimensional matrix W_i, and then further column-stacking these matrices across subjects. The functional components labeled DX1 and DX2 are constructed using a similar logic. There is no modification in the definitions of the vectors labeled t1 and s0. Our implementation in pffr performs these modified pre-processing steps automatically if a suitable data structure for sparsely observed functional responses is provided. With these adjustments, the fitting procedure can proceed with gam as described in Section 3.1. Section 4.2 provides simulation results obtained in this setting.
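The stacking just described can be sketched in base R for a toy example with n = 3 subjects (all names are illustrative):

```r
m  <- c(3, 2, 4)                                         # m_i, number of points per subject
tt <- list(c(.1, .4, .9), c(.2, .8), c(.1, .3, .5, .7))  # subject-specific time points t_ij
Yl <- lapply(tt, function(ti) sin(2 * pi * ti))          # toy responses Y_i(t_ij)
W  <- matrix(1:6, nrow = 3)                              # n x p covariate matrix (p = 2)

Yvec <- unlist(Yl)                                       # stacked response vector Y
t_0  <- unlist(tt)                                       # stacked index vector t_0
Wmat <- W[rep(seq_len(nrow(W)), times = m), , drop = FALSE]  # m_i copies of each row W_i
```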

4 Simulation study

We conduct a simulation study to evaluate the performance of PFFR in realistic scenarios, including both densely and sparsely observed functional predictors, and for a functional response that is densely or sparsely sampled.


Due to the lack of other available implemented methods for regression with multiple functional and scalar covariates, we compare PFFR to a sequence of scalar-on-function regressions; see Section 4.4 for a comparison with linmod (Ramsay et al. 2009) in the simpler setting of a single functional covariate and no additional scalar covariates. More precisely, for every fixed t, the function-on-function regression model (1) becomes a scalar-on-function regression that can be fit using, for example, PFR (Goldsmith et al. 2011), as implemented in the R function pfr from the refund package. PFR provides estimates of the functional parameters β1(t, s) and β2(t, r) for every fixed t. Aggregating these functions over t leads to surface estimators that are then smoothed using a bivariate penalized spline smoother. This approach is labeled modified-PFR in the remainder of the paper. Without the bivariate smoothing, modified-PFR was found not to be as competitive. For the modified-PFR method, the upper and lower bound surfaces of the confidence intervals were obtained analogously, by smoothing the upper and lower bounds estimated by PFR for each fixed t.
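A rough base-R sketch of the aggregate-then-smooth idea behind modified-PFR: per-t slice estimates of β1(t, s) are stacked into a surface and then smoothed. A real run would obtain each slice from refund::pfr; here the slices are simulated as noisy evaluations of a smooth surface, and the bivariate penalized spline smoother is approximated by successive univariate smooth.spline passes in s and then in t.

```r
set.seed(2)
tg <- seq(0.1, 6, by = 0.1)                    # grid in t (60 points)
sg <- seq(0.1, 5, by = 0.1)                    # grid in s (50 points)
beta_true <- outer(tg, sg, function(t, s) cos(t * pi / 3) * sin(s * pi / 5))
beta_raw  <- beta_true +                       # stand-in for rough per-t PFR estimates
  matrix(rnorm(length(beta_true), sd = 0.3), nrow = length(tg))
beta_s   <- apply(beta_raw, 1, function(v) smooth.spline(sg, v)$y)  # smooth each t-slice in s
beta_fit <- apply(beta_s, 1, function(v) smooth.spline(tg, v)$y)    # then smooth in t
```

Smoothing the stacked surface reduces the roughness of the slice-wise estimates, which is exactly the effect the modified-PFR post-processing is meant to achieve.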

4.1 Densely sampled functional predictors

The curves Yi(t) were observed on an equally spaced grid t ∈ {j/10 : j = 1, 2, ..., 60}, and the functional predictors were observed on equally spaced grids, but on different domains: Xi1(s) on s ∈ {k/10 : k = 1, 2, ..., 50} and Xi2(r) on r ∈ {q/10 : q = 1, 2, ..., 70}. The bivariate functional parameters β1(t, s) = cos(tπ/3) sin(sπ/5) and β2(t, r) = √(tr)/4.2 have comparable range and are displayed in Figure 2 (top panels). The functional intercept is β0(t) = 2e^{−(t−2.5)²}, and the random errors εi(tj) were simulated i.i.d. N(0, σ²ε).

We generated n functional responses Yi(t) from model (1) by approximating the integrals via Riemann sums on a dense grid for each domain S and R. For the first functional predictor we considered the mean zero process Vi1(s) = Xi1(s) + δi1(s), where Xi1(s) = Σ_{k=1}^{Es} {v_{ik1} sin(kπs/5) + v_{ik2} cos(kπs/5)} with Es = 10, and where v_{ik1}, v_{ik2} ∼ N(0, 1/k⁴) are independent across subjects i. For the second functional predictor we considered Vi2(r) = Xi2(r) + δi2(r), where Xi2(r) = Σ_{k=1}^{Er} (2√2/(kπ)) U_{ik} sin(kπr/7) with Er = 40, where U_{ik} ∼ N(0, 1), and where δi1(s), δi2(r) ∼ N(0, σ²X) are mutually independent. More precisely, Xi1(s) and Xi2(r) were the true underlying processes used to generate Yi(t) from model (1), and Vi1(s) and Vi2(r) were the actually observed functional predictors. The choice of Xi1(s), β1(·, ·) and β2(·, ·) is similar to the choices in Goldsmith et al. (2011), whereas Xi2(r) is a modification of a Brownian bridge. For the scalar covariates, we considered a binary variable W1 = 1{Unif[0, 1] ≥ .75} and a continuous variable W2 ∼ N(10, 5²). A univariate cubic B-spline basis with κ0 = 10 basis functions and a second-order difference penalty (Eilers and Marx, 1996) was used to fit β0(·). Tensor products of cubic B-splines with κ1 = κ2 = 25 basis functions and second-order difference penalties in both directions were used to fit β1(·, ·) and β2(·, ·). For more complex functional parameters, increasing the number of basis functions may be necessary to capture the increased complexity. In our setting, increasing the number of basis functions for the bivariate smoothers from 25 to 100 did not significantly affect the fit or the computation times.
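The data-generating process above can be sketched in base R, with the integral term of model (1) approximated by a Riemann sum on the s grid (for brevity, only the first predictor and the noiseless case σX = 0 are shown):

```r
set.seed(3)
n <- 100; Es <- 10
sg <- seq(0.1, 5, by = 0.1)                    # s in {k/10 : k = 1, ..., 50}
tg <- seq(0.1, 6, by = 0.1)                    # t in {j/10 : j = 1, ..., 60}
X1 <- matrix(0, n, length(sg))
for (k in 1:Es) {                              # truncated sine/cosine expansion of X_i1
  v1 <- rnorm(n, sd = 1 / k^2)                 # v_ik1 ~ N(0, 1/k^4)
  v2 <- rnorm(n, sd = 1 / k^2)                 # v_ik2 ~ N(0, 1/k^4)
  X1 <- X1 + outer(v1, sin(k * pi * sg / 5)) + outer(v2, cos(k * pi * sg / 5))
}
beta1 <- outer(tg, sg, function(t, s) cos(t * pi / 3) * sin(s * pi / 5))
ds <- 0.1
lin_pred <- X1 %*% t(beta1) * ds               # Riemann sum of int_S beta1(t, s) X_i1(s) ds
```

Row i of lin_pred holds the functional linear predictor evaluated on the t grid; adding β0(tg) and i.i.d. errors would complete the simulated response.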

We considered all possible combinations of the following choices:

1. Number of subjects: (a) n = 100 and (b) n = 200.

Page 12: Penalized Function-on-Function Regressionstaicu/papers/PFFR_COST_2014_11_v0.pdfpffr function of the R package refund1. Methodological developments are general, but were inspired by

12 Andrada E. Ivanescu et al.

2. Functional predictors: (a) noiseless, σX = 0, and (b) noisy, σX = 0.2.
3. Standard deviation for the error: σε = 1.
4. Effects of the scalar covariates: (a) no scalar covariates, and (b) scalar covariates W1 and W2 with γ1 = 1 and γ2 = −0.5.

Fig. 1 The left panel displays a sample of 200 simulated functional responses Yi(t) with σε = 1. The middle and right panels display 200 simulated functions from Xi1(s) and Xi2(r), respectively, highlighting three examples of the functional predictors with no error (solid) and with measurement error σX = 0.2 (dashed).

The combination of the various choices provides eight different scenarios, and for each scenario we simulated 500 data sets. For illustration, Figure 1 displays one simulated data set for scenario 4(a). Curves for three subjects are highlighted in color, with each color representing one subject across the three panels. PFFR uses two steps for case 2(b). First, the functional predictors were estimated as X̂i1(s) and X̂i2(r) using the smoothing approach previously used in Goldsmith et al. (2011). Second, these estimated functions were used instead of Xi1(s) and Xi2(r) when fitting model (1).

We computed the integrated mean squared error (IMSE), integrated squared bias (IBIAS²), and integrated variance (IVAR), where

IMSE{β̂(t, s)} = ∫_T ∫_S E[{β̂(t, s) − β(t, s)}²] dt ds,
IBIAS²{β̂(t, s)} = ∫_T ∫_S [E{β̂(t, s)} − β(t, s)]² dt ds, and
IVAR{β̂(t, s)} = ∫_T ∫_S Var{β̂(t, s)} dt ds.

Here E{β̂(t, s)} and Var{β̂(t, s)} were estimated by the empirical mean and variance of β̂(t, s) across the 500 simulations. To characterize the properties of the pointwise confidence intervals we report the integrated actual pointwise coverage (IAC) and the integrated actual width (IAW), where IAC = E[∫_T ∫_S 1{β(t, s) ∈ CIp(t, s)} dt ds] and CIp is the pointwise approximate confidence interval for the model parameter β(t, s), based on an independence assumption for the errors. We implemented approximate pointwise confidence intervals for the nominal level 0.95. For example, an approximate 95% pointwise confidence interval CIp(t0, s0) for β1(t0, s0) can be constructed as β̂1(t0, s0) ± 1.96 sd{β̂1(t0, s0)}. As β̂1(t0, s0) = Σ_{l=1}^{κ1} a_{1,l}(t0, s0) β̂_{1,l} for every (t0, s0), we have sd{β̂1(t0, s0)} = √{a1(t0, s0) Σ̂1 a1(t0, s0)^t}, where Σ̂1 is the estimated covariance matrix of β̂1 and a1(t0, s0) = {a_{1,l}(t0, s0)}_l. We use the Bayesian posterior covariance matrix (see Ruppert et al. 2003), available under the Vp specification for gam; alternatively, for REML estimation, a Vc option that additionally accounts for smoothing parameter uncertainty is available in the mgcv R package (Wood 2014) and can in some cases lead to slightly improved coverage. The width of this confidence interval is 3.92 sd{β̂1(t0, s0)}, and IAW = 3.92 E[∫_T ∫_S sd{β̂1(t, s)} dt ds]. As a measure of the accuracy of the fit we provide the functional R², denoted fR² and computed as fR² = 1 − {Σ_i Σ_j [Yi(tj) − μ̂i(tj)]²} / {Σ_i Σ_j [Yi(tj) − μ̄_Y(tj)]²}, where μ̄_Y(tj) = Σ_{i=1}^n Yi(tj)/n and the μ̂i(tj) are the estimates of μi(tj) from the fitted model. Table 1 compares the averages of these measures over the simulations, whereas Online Appendix B displays boxplots of these measures for the bivariate functional parameters, calculated for each data set. Overall, our results indicate that PFFR outperforms modified-PFR in terms of estimation precision. To provide intuition for these results, Figure 2 displays the fits using PFFR and PFR obtained in one simulation for the setting n = 200, σε = 1, σX = 0 and case 4(a). Both methods capture the general features of the true parameter surfaces well, but the modified-PFR method does not borrow strength between neighboring values of the functional responses, since it is derived from PFR. As our results show, this causes unnecessary roughness of the estimates along the t dimension. In contrast, PFFR provides a much smoother surface that better approximates the shape of the true underlying functions.

Results in Table 1 can be summarized as follows. PFFR performs better than modified-PFR in terms of IMSE for almost all the estimated parameters, and gives comparable performance in a few cases, irrespective of the noise level in the functional predictors and the presence of additional non-functional covariates; see the columns labeled IMSE. For the bivariate functional parameters, the variability of the MSEs of PFFR and modified-PFR is comparable, and the median MSE of PFFR is smaller than that of modified-PFR; see Online Appendix B. As the number of subjects increases, the IMSE of both methods decreases, confirming that, in the settings considered, the methodology yields consistent estimators; this is expected to hold more generally, see, for example, Claeskens, Krivobokova, and Opsomer (2009) for the asymptotics of penalized splines. In terms of the performance of the pointwise approximate confidence intervals, the PFFR intervals are reasonably narrow and have a coverage probability that is relatively close to the nominal level; see the columns corresponding to IAW and IAC in Table 1. By contrast, confidence intervals for modified-PFR are constructed based on models for each t separately, which do not make use of the full information available, and the intervals are thus excessively wide. While this results in coverage close to 1 for some settings, in other settings the point estimates have such high IMSE and IBIAS² values that the intervals are incorrectly centered and give very low coverage despite the large width. Overall, confidence intervals based on PFFR are narrower, more properly centered, and give coverage more reliably close to the nominal level, despite some amount of undercoverage in some of the settings. For PFFR confidence intervals, as the number of subjects increases, IAW decreases, as expected, while IAC is maintained close to the nominal level. The approximate confidence intervals discussed in this section thus provide results that overall perform rather close to the 0.95 nominal level. Alternative bootstrap confidence intervals are discussed in Online Appendix A. Overall, when the functional predictors are measured with error, the estimation of parameters and the coverage of the confidence intervals tend to deteriorate slightly; otherwise, results are similar to the setting without noise. The accuracy of the fit seems similar for PFFR and modified-PFR, as illustrated by fR².

Table 1 Results for function-on-function regression with densely sampled functional predictors. Numbers are averages across 500 simulations; √IMSE, √IVAR and √IBIAS² are ×10³.

                               σX = 0                                    σX = 0.2
         Method         √IMSE  √IVAR √IBIAS²  IAC   IAW    fR²     √IMSE  √IVAR √IBIAS²  IAC   IAW    fR²

Scenario 4(a), n = 100
β0(t)    PFFR           39.20  35.80  15.97  0.92  0.14            42.12  38.97  15.99  0.90  0.14
         modified-PFR   40.43  40.06   5.46  1.00  0.40            43.86  43.47   5.78  0.99  0.40
β1(t,s)  PFFR           42.54  41.64   8.70  0.98  0.17  92.55%    49.49  48.72   8.71  0.96  0.18  92.40%
         modified-PFR  134.68  26.13 132.12  0.77  0.28  92.34%   137.56  27.95 134.69  0.76  0.28  92.19%
β2(t,r)  PFFR           37.34  25.07  27.67  0.88  0.08            41.63  31.82  26.84  0.83  0.09
         modified-PFR   65.00  28.59  58.37  0.97  0.35            68.18  31.38  60.53  0.96  0.35

Scenario 4(a), n = 200
β0(t)    PFFR           29.49  25.53  14.76  0.90  0.10            31.29  27.62  14.69  0.88  0.10
         modified-PFR   27.69  27.46   3.58  1.00  0.28            29.84  29.64   3.45  0.99  0.28
β1(t,s)  PFFR           32.92  32.08   7.38  0.97  0.13  92.59%    37.89  37.17   7.34  0.96  0.14  92.44%
         modified-PFR   99.40  23.51  96.58  0.83  0.25  92.45%   103.25  25.14 100.14  0.82  0.25  92.29%
β2(t,r)  PFFR           30.22  18.50  23.89  0.89  0.06            34.46  24.60  24.14  0.83  0.07
         modified-PFR   46.30  22.62  40.39  0.98  0.30            49.03  25.16  42.08  0.98  0.30

Scenario 4(b), n = 100
γ1       PFFR           30.40  30.40   0.21  0.96  0.12            42.95  42.93   1.18  0.85  0.12
         modified-PFR   33.90  33.90   0.27  1.00  0.94            44.45  44.42   1.49  1.00  0.95
γ2       PFFR            2.81   2.80   0.12  0.92  0.01             3.78   3.77   0.27  0.85  0.01
         modified-PFR    3.10   3.10   0.05  1.00  0.08             3.96   3.95   0.35  1.00  0.08
β0(t)    PFFR           53.93  51.43  16.23  0.79  0.14  94.94%    65.21  63.06  16.58  0.70  0.14  94.84%
         modified-PFR  115.68 112.44  27.17  0.99  1.15  94.75%   122.83 119.73  27.43  0.99  1.17  94.64%
β1(t,s)  PFFR           43.42  42.40   9.33  0.97  0.17            51.31  50.53   8.87  0.95  0.18
         modified-PFR  135.36  26.70 132.70  0.77  0.28           138.43  28.72 135.42  0.76  0.28
β2(t,r)  PFFR           36.54  24.59  27.02  0.89  0.08            42.70  32.64  27.53  0.82  0.08
         modified-PFR   65.72  29.16  58.89  0.97  0.35            68.74  31.66  61.02  0.96  0.35

Scenario 4(b), n = 200
γ1       PFFR           21.41  21.40   0.34  0.94  0.08            29.83  29.78   1.59  0.86  0.08
         modified-PFR   23.25  23.25   0.56  1.00  0.65            30.15  30.10   1.74  1.00  0.66
γ2       PFFR            1.83   1.83   0.08  0.95  0.01             2.57   2.57   0.05  0.85  0.01
         modified-PFR    1.94   1.94   0.10  1.00  0.05             2.65   2.65   0.03  1.00  0.05
β0(t)    PFFR           38.42  35.45  14.81  0.80  0.10  94.97%    47.04  44.60  14.96  0.71  0.10  94.86%
         modified-PFR   80.07  78.79  14.25  1.00  0.80  94.84%    84.52  83.03  15.78  0.99  0.81  94.74%
β1(t,s)  PFFR           33.00  32.22   7.10  0.97  0.13            39.09  38.34   7.65  0.95  0.14
         modified-PFR  100.16  23.09  97.46  0.82  0.25           103.74  25.79 100.49  0.82  0.25
β2(t,r)  PFFR           30.39  18.91  23.79  0.89  0.06            34.45  25.33  23.35  0.84  0.07
         modified-PFR   46.65  22.85  40.67  0.98  0.30            49.86  25.62  42.77  0.98  0.30

Fig. 2 Displayed are the bivariate functional parameters β1(t, s) (left panels) and β2(t, r) (right panels): true values (top panels), estimates for one simulation iteration via PFFR (middle panels), and estimates via PFR (bottom panels). Scenario: n = 200, σε = 1, σX = 0 and case 4(a).

Compared to PFFR, the modified-PFR alternative yields inferior results. First, fit results tend to be rougher and may require additional and specialized smoothing. Second, running a large number of scalar-on-function regressions can lead to longer computation times. Third, confidence intervals are not easy to obtain, which may further affect computation times.

4.2 Sparsely sampled functional response

Consider again scenario 4(a) described in Section 4.1, and assume that for each subject i the functional response is observed at mi points sampled at random from t ∈ {j/10 : j = 1, 2, ..., 60}. Estimation of the parameter functions was carried out using our PFFR approach with κ0 = 10 basis functions for the univariate spline basis and κ1 = κ2 = 25 for the bivariate spline bases. The estimates are evaluated using the same measures as described in Section 4.1. In this situation of a sparsely sampled functional response, our method does not have any competitors.

Table 2 shows the results for two sparsity levels, mi = 20 and mi = 6. As expected, the sparsity of the functional response affects both the bias and the variance of the parameter estimators, as well as the width of the confidence intervals; compare Table 2 with the PFFR results of Table 1 corresponding to scenario 4(a) and n = 200. Nevertheless, PFFR continues to show very good performance for many levels of missingness in the functional response, with coverage of the confidence intervals similar to before.

4.3 Functional predictors sampled with moderate sparsity

The sparse design for the functional predictors was generated by starting with scenario 4(a) described in Section 4.1, with a few changes. The number of eigenfunctions for the two functional predictors was set to Es = 2 and Er = 4, respectively. For each functional predictor curve i, 1 ≤ i ≤ n, we randomly sampled Ki points from s ∈ {k/10 : k = 1, 2, ..., 50} and Qi points from r ∈ {q/10 : q = 1, 2, ..., 70}. We apply the same smoothing approach used in Goldsmith et al. (2011) to reconstruct the trajectories for each functional predictor; see the digitally available R program for Section 4.3 at the first author's website.

For illustration, Figure 3 displays the simulated data and predicted values for four curves: two for process Xi1(s), displayed in the two leftmost columns, and two for process Xi2(r), displayed in the two rightmost columns. The true underlying curves are shown as black dotted lines, the actually observed data are the red dots, and the predicted curves are the solid red lines. Visual inspection of these plots indicates that the smoothing procedure recovers the underlying signal quite well. This is one of the main reasons PFFR continues to perform well in these scenarios.

Table 3 displays the results for our model parameters for the case of functional predictors observed with moderate sparsity, n = 200 subjects and different sparsity


Table 2 Results for function-on-function regression with sparsely sampled functional response and densely sampled functional predictors. Numbers are averages across 500 simulations; √IMSE, √IVAR and √IBIAS² are ×10³.

                           σX = 0                             σX = 0.2
         Method     √IMSE  √IVAR √IBIAS²  IAC   IAW     √IMSE  √IVAR √IBIAS²  IAC   IAW

Scenario 4(a), n = 200, mi = 20
β0(t)    PFFR       48.00  44.76  17.31  0.91  0.17     49.07  45.87  17.42  0.91  0.17
β1(t,s)  PFFR       49.06  47.83  10.90  0.97  0.20     52.47  51.33  10.91  0.97  0.20
β2(t,r)  PFFR       40.49  28.91  28.35  0.89  0.10     43.78  31.68  30.21  0.85  0.10

Scenario 4(a), n = 200, mi = 6
β0(t)    PFFR       83.77  79.32  26.92  0.92  0.29     85.13  80.71  27.06  0.92  0.30
β1(t,s)  PFFR       79.39  75.62  24.15  0.97  0.29     81.99  78.20  24.63  0.97  0.30
β2(t,r)  PFFR       55.93  43.67  34.93  0.88  0.15     57.81  44.68  36.68  0.87  0.15


Fig. 3 Prediction of trajectories for functional predictors in simulation settings with Ki = 12, Qi = 25 points per curve, and σX = 0.20. Shown are: Xi1(sik) for two subjects (leftmost two panels) and Xi2(riq) for the same subjects (rightmost two panels). For each panel we have the true signal (dotted lines), the observed signal (points), and the predicted signal (solid lines).

levels. Both the IMSE and the coverage performance of PFFR are affected by the sparsity of the predictor functions. Due to the sparsity of the functional predictors, a reduced number of basis functions is used for estimating the bivariate parameters: throughout this simulation exercise we used κ0 = 10 basis functions for the univariate basis and κ1 = κ2 = 20 basis functions for the two bivariate bases.

4.4 Simulations: A Single Functional Predictor

In this section we compare pffr to the fully functional regression model of Ramsay and Silverman (2005, Ch. 16), which is implemented in the linmod function in the R package fda (Ramsay, Hooker and Graves, 2009, Sections 10.3-10.4). As linmod cannot handle more than one functional covariate or sparsely sampled functional responses and/or covariates, we consider a model with a single functional predictor and densely observed functions. Our simulation design for this section corresponds to that in Section 4.1, scenario 4(a), σX = 0, with the exception that the model now has a single functional predictor, Xi1(s), instead of two functional predictors. We generate n realizations of functional responses from the model

Yi(t) = β0(t) + ∫_S β1(t, s) Xi1(s) ds + εi(t).

The linmod function requires the construction of functional data objects for the functional response and the functional predictor prior to the model fitting step. We first outline the requirements for setting up the functional data objects, and then provide details on using the linmod function. One first has to choose the basis for object generation (Ramsay, Hooker and Graves, 2009, Section 5.2.4), which


Table 3 Results for function-on-function regression with densely sampled functional response and functional predictors sampled with moderate sparsity. Numbers are averages across 500 simulations; √IMSE, √IVAR and √IBIAS² are ×10³.

                               Ki = 15, Qi = 30                          Ki = 12, Qi = 25
         Method         √IMSE  √IVAR √IBIAS²  IAC   IAW    fR²     √IMSE  √IVAR √IBIAS²  IAC   IAW    fR²

Scenario 4(a), n = 200
β0(t)    PFFR           35.67  32.46  14.78  0.84  0.10            38.42  35.43  14.83  0.82  0.11
         modified-PFR   33.93  33.73   3.65  0.99  0.29            36.61  36.42   3.74  0.99  0.29
β1(t,s)  PFFR           69.83  68.32  14.44  0.88  0.18  92.13%    80.75  79.38  14.80  0.86  0.19  91.96%
         modified-PFR  123.87  35.44 118.69  0.78  0.27  91.97%   131.04  38.28 125.33  0.76  0.27  91.81%
β2(t,r)  PFFR           51.51  49.71  13.48  0.72  0.08            55.99  54.48  12.94  0.70  0.08
         modified-PFR   61.97  33.82  51.93  0.97  0.32            64.94  35.41  54.43  0.97  0.32


must be the same as the basis used for the representation of the regression parameters: the functional intercept β0(t) and the bivariate regression slope β1(t, s). In addition to the choice of basis for the functional data representation, linmod requires the manual specification of the smoothing parameter. We use a generalized cross-validation search across 41 values ranging from 10^-5 to 10^5 in multiplicative steps of 10^0.25, in a manner similar to the implementation described in Section 5.3 of Ramsay, Hooker and Graves (2009). Separate cross-validations are performed for obtaining the functional data objects for the functional response and the functional predictor, respectively.
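The 41-value grid just described is simply equally spaced on the log10 scale:

```r
# Smoothing-parameter candidates: 10^-5, 10^-4.75, ..., 10^5 (41 values).
lambda_grid <- 10^seq(-5, 5, by = 0.25)
```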

The linmod function requires the specification of three additional smoothing parameters: one for the functional intercept, and one each for the s and t directions of the regression coefficient β1(t, s). The specification of these three parameters, necessary for fitting with linmod, is determined with the help of a leave-one-curve-out cross-validation; Section 10.1.3 of Ramsay, Hooker and Graves (2009) presents implementation examples for a leave-one-curve-out cross-validation. We use the choices λ_β0 ∈ {10^-4, 10^-5, 10^-6} for the smoothing parameter for β0, λ_β1^(s) ∈ {10^2, 10^3, 10^4} for the smoothing parameter in the s direction for β1(t, s), and λ_β1^(t) ∈ {10^2, 10^3, 10^4} for the smoothing parameter in the t direction for β1(t, s). These are extended choices compared to the implementation of the linmod function introduced in Section 10.4 of Ramsay, Hooker and Graves (2009). The constructed grid consists of 27 options for the leave-one-curve-out cross-validation. We executed our search using the cross-validated integrated squared error

CVISE(λ_β0, λ_β1^(s), λ_β1^(t)) = Σ_{i=1}^n ∫_T {Yi(t) − Ŷi^(−i)(t)}² dt,

where Ŷi^(−i)(t) = β̂0^(−i)(t) + ∫_S β̂1^(−i)(t, s) Xi1(s) ds was computed as the predicted value for Yi(t) when curve i was omitted from estimation. We tracked the computation time for the linmod model fitting (which includes the generalized cross-validations and the leave-one-curve-out cross-validation); for sample size n = 100 the time was approximately 20 minutes on a 2 GHz MacBook. Table 4 shows the computation time for all considered approaches: our approach is faster than linmod by two orders of magnitude. This is likely due to the need for cross-validation for linmod, as opposed to REML estimation in our case.

Table 4 also shows that our approach improves on linmod in terms of mean squared error, bias, variance and fR². This finding might be related to the fact that REML can determine the optimal smoothing parameters more precisely than a grid search using cross-validation can within a reasonable computation time, especially for multiple smoothing parameters. Another benefit of our method might be that not having to smooth the outcome functions before estimation allows for a more accurate incorporation of uncertainty into the model fit. The current linmod function does not permit obtaining confidence intervals for the regression parameters, whereas pffr enables the estimation of lower and upper bounds for pointwise confidence intervals.

5 Application to DTI-MRI tractography

We consider a diffusion tensor imaging (DTI) brain tractography study of multiple sclerosis (MS) patients; cf. Greven, Crainiceanu, Caffo and Reich (2010), Goldsmith et al. (2011), Staicu, Crainiceanu, Reich and Ruppert (2012), McLean


Table 4 Results for function-on-function regression with a single densely sampled functional predictor. Numbers are averages across 200 simulations; √IMSE, √IVAR and √IBIAS² are ×10³. Computation time for one fit, listed as cpu, is reported in minutes.

n = 100
         Method         √IMSE  √IVAR √IBIAS²
β0(t)    PFFR           38.46  35.04  15.83
         modified-PFR   37.81  37.41   5.50
         linmod         93.20  28.60  88.71
β1(t,s)  PFFR           42.83  41.94   8.69
         modified-PFR   75.34  43.10  61.80
         linmod         83.24  47.66  68.24

         Method         fR²     cpu
         PFFR           75.46%  0.08
         modified-PFR   75.11%  0.06
         linmod         74.86%  26.66

et al. (2014). Multiple sclerosis is a demyelinating, autoimmune-mediated disease that is associated with brain lesions and results in severe disability. Little is known about in-vivo demyelination, including whether it is a global or local phenomenon, whether certain areas of the brain demyelinate faster, or whether lesion formation in certain areas is associated with demyelination in other areas of the brain. It is also unclear how these patterns in the demyelination process differ in MS patients from the normal aging process in healthy subjects. Here we attempt to provide an answer to some of these questions using function-on-function regression.

We focus on three major tracts that are easy to recognize and identify on the MRI scans: the corpus callosum (CCA), corticospinal (CTS) and optical radiation (OPR) tracts. The study comprises 160 MS patients and 42 healthy controls, who are observed at multiple visits. At each visit, fractional anisotropy (FA) is extracted along these three major tracts for each subject, resulting in three functional measurements per observation. For illustration, the panels in Figure 4 display the FA in the MS and control groups for the three tracts, CCA (left), CTS (middle) and OPR (right), at the baseline visit. Depicted in red/blue/green are the FA measurements corresponding to three subjects in each group (subjects are color coded).

Our goal here is mainly exploratory, as we are trying to further understand the spatial and temporal course of the disease. Following Tievsky et al. (1999) and Song et al. (2002), we use FA as our proxy variable for demyelination of the white matter tracts; larger FA values are closely associated with less demyelination and fewer lesions. MS is typically associated with lesions and axon demyelination in the corpus callosum. In advanced stages of MS, there is evidence for significant neuronal loss in the corpus callosum (e.g. Evangelou et al. 2000, Ozturk et al. 2010). To study the spatial association between demyelination along the CCA, CTS and OPR, we focus in our first regression model on the CCA, one of the main structures in the brain, as the response, and on the CTS and OPR as the covariates. It would also be possible, and potentially of interest, to look at regression models with each of the other two white matter tracts as the response. Interpretation of the associations between demyelination in certain areas of the different tracts can potentially inform medical hypotheses on the spatial progression of the disease.



In addition, models for each of the three responses could also be thought of as potential future building blocks in a mediation analysis (Lindquist 2012) with functional variables, to investigate the causal pathway of the effect of changes in the brain during MS on disability outcomes (e.g. Huang, Goldsmith, Reiss, Reich and Crainiceanu 2013). This would involve both extension of the Lindquist (2012) method to the fully functional setting and expert clinical knowledge to interpret the neurological findings.

Consider that the response of interest is the FA along the CCA tract, and the functional predictors are the FA along the left CTS and left OPR tracts, at the baseline visit. The first step is to de-noise and deal with missing data in the functional predictors, as discussed in Section 2. Following the notation in Section 2, Yi(t) denotes the FA along the CCA tract at location t, and Xi1(s) and Xi2(r) denote the centered, de-noised FA along the left CTS tract at location s and the left OPR tract at location r, respectively, of subject i. We use regression model (1), where Wi is the vector of age and gender of subject i. The proposed methods are employed to estimate the regression functions β0(·), β1(·, ·) and β2(·, ·) as well as the regression parameters γ.
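Assuming the functional observations are stored as matrices (rows indexing subjects, columns indexing grid points), a fit of this kind of model with the pffr function in refund might be sketched as follows; the object and variable names here are illustrative placeholders, not from the paper:

```r
library(refund)

# Assumed inputs (hypothetical names):
#   Y_cca:  n x T matrix of FA along the CCA tract (functional response)
#   X_cts:  n x S matrix of centered, de-noised FA along the left CTS tract
#   X_opr:  n x R matrix of centered, de-noised FA along the left OPR tract
#   t_grid, s_grid, r_grid: locations along the respective tracts
#   age, gender: scalar covariates (gender coded with female as reference)

fit <- pffr(Y_cca ~ ff(X_cts, xind = s_grid) +  # bivariate coefficient surface beta_1(t, s)
                    ff(X_opr, xind = r_grid) +  # bivariate coefficient surface beta_2(t, r)
                    age + gender,               # scalar effects gamma
            yind = t_grid)

summary(fit)  # smooth terms, scalar coefficients, and standard errors
plot(fit)     # estimated intercept function beta_0(t) and coefficient surfaces
```

The `ff()` terms declare linear function-on-function effects integrated over the predictor's domain, and the intercept function β0(t) is included by default.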

Figure 5 shows the regression function estimates along with information on their pointwise approximate 95% confidence intervals for each of the two groups: MS subjects (top row) and controls (bottom row). We begin with β0(t), which accounts for a relative R2 of 28.9% for the MS group and 38% for the control group. For the MS group, the estimated overall mean of the FA for the CCA tract has a wavy shape, with two main peaks, one near the beginning of the tract, around location 10, and one at the end, around 90. For the control group, the estimated mean function shows similar patterns, but we observe lower FA values for the MS group than for the control; this is biologically plausible, as FA values tend to decrease in MS-affected subjects due to lesions or demyelination. Next, we consider the effect of the FA for the left CTS tract; the relative R2 corresponding to this predictor is 10.8% for the MS group and 15.9% for the control group. The estimated function β1(t, s), displayed in the middle row of Figure 5, shows the association between the FA for the CTS tract at location s and the FA of the CCA tract at location t. Pointwise 95% confidence intervals for the coefficient functions are constructed under an independence assumption on the errors, and are displayed using different colors of the facets of the surface mesh. Specifically, red represents positive values, β1(t, s) > 0, that are found to be significant, blue represents significant negative values, β1(t, s) < 0; light red and light blue correspond to the remaining positive and negative values, respectively.
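This four-way color coding can be reproduced from any pair of estimate and standard-error surfaces; a minimal sketch, using generic matrices rather than pffr's internal output format (the function and variable names are illustrative):

```r
# Assumed inputs: beta_hat and beta_se, matrices holding the estimated
# coefficient surface and its pointwise standard errors on a (t, s) grid.
classify_facets <- function(beta_hat, beta_se, crit = 1.96) {
  sig <- abs(beta_hat) > crit * beta_se           # outside pointwise ~95% interval
  ifelse(beta_hat > 0,
         ifelse(sig, "red",  "lightred"),         # positive: significant / not
         ifelse(sig, "blue", "lightblue"))        # negative: significant / not
}

# Toy example on a 2 x 2 grid:
classify_facets(matrix(c(0.5, -0.5, 0.1, -0.1), nrow = 2),
                matrix(0.1, nrow = 2, ncol = 2))
```

The returned character matrix can then be passed as facet colors to a surface plotting routine such as persp().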

For the control group, most of the CTS-FA values (corresponding to locations below 40) are positively associated with FA values along the CCA, indicating that demyelination occurs simultaneously in those tracts. The coefficient surface for the control group is mostly constant, indicating a spatially homogeneous association for most regions of these two tracts. In MS patients, on the other hand, there are areas (30 to 45, and around 10) of the CTS for which FA measures show negative associations with the FA values in the CCA. This might indicate regions of those tracts where demyelination due to lesions typically occurs only in one tract at a time, while stronger positive associations may suggest that lesions often occur simultaneously in those areas. Our results support the expected pattern that the pathological demyelination in MS patients is a more localized phenomenon than demyelination in control patients due to aging. These findings, while exploratory, may yield new insights into the motivating question of whether lesion formation in certain areas of the brain is associated with demyelination in other areas of the brain and can lead to hypotheses for further studies.

Fig. 4 FA in the CCA tract (left panels), FA in the left CTS tract (middle) and FA in the left OPR tract (right panels) for the MS patients (top panels) and controls (bottom panels) observed at the baseline visit. In each of the top and bottom panels, depicted are the FA measurements for three subjects, with each mark representing a subject.

Fig. 5 Estimates of the regression functions β0 (top panels), β1 (middle panels) and β2 (bottom panels) corresponding to the MS group (left column) and the control group (right column). The dashed lines (top panels) correspond to the pointwise 95% confidence intervals. The facets of the surface mesh are marked: darker shades for more than approximately two standard errors above/below zero, and lighter shades for less than approximately two standard errors above/below zero.

We examine the effect of the FA of the left OPR tract, as estimated by β2(t, r). The profile of the left OPR tract seems to be less predictive of the response for the control group (relative R2 is 12.6%) and more predictive for the MS group (relative R2 is 25.7%). Here, the associations between the FA of the OPR and the FA of the CCA seem somewhat similar between the two groups, but more pronounced for the MS patients, especially for the inferior OPR; see Figure 5, bottom panels.

The estimates of the additional covariates, age and gender (reference group: female), are 0.0002* (0.00004) and 0.0008 (0.0009) for the MS group, and −0.0003* (0.00005) and −0.0052* (0.0018) for the control group, respectively, where the asterisks indicate significance at level 0.01 and standard errors are displayed in parentheses. Negative age effects as seen in the controls are expected, as myelination and FA values tend to decrease with age even in healthy subjects.
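Significance at level 0.01 for these scalar effects corresponds to |estimate/SE| exceeding the two-sided normal critical value of about 2.58. Taking the printed estimates and standard errors as given, this can be checked directly:

```r
# Reported scalar coefficient estimates and standard errors (from the text)
est <- c(ms_age = 0.0002, ms_gender = 0.0008, ctrl_age = -0.0003, ctrl_gender = -0.0052)
se  <- c(ms_age = 0.00004, ms_gender = 0.0009, ctrl_age = 0.00005, ctrl_gender = 0.0018)

z <- est / se                # approximate z-statistics
abs(z) > qnorm(0.995)        # two-sided test at level 0.01
```

The comparison flags the age effects in both groups and the gender effect in the control group, matching the asterisks above.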

6 Discussion

We develop a novel method for function-on-function regression that: 1) is applicable to one or multiple functional predictors observed on the same or different domains than the functional response; 2) accommodates sparsely and/or noisily observed functional predictors as well as a sparsely observed functional response; 3) accommodates linear or non-linear effects of additional scalar predictors; 4) produces likelihood-based approximate confidence intervals, based on a working independence assumption on the errors, as a by-product of the associated mixed model framework; 5) is accompanied by open-source and fully documented software. In Online Appendix A we briefly discuss and numerically investigate bootstrap-based confidence intervals to account for non-i.i.d. error structure.

PFFR was applied and tested in a variety of scenarios and showed good results in both simulations and a medical application. Equally important, PFFR is an easy-to-use, automatic fitting procedure implemented in the pffr function of the R package refund.

Recent work by Scheipl, Staicu and Greven (2014) develops a framework of regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. While the focus is on modeling the covariance structure for correlated functional responses, Scheipl et al. (2014) successfully use our PFFR methods for modeling the dependence of the functional mean on functional and scalar covariates within their framework. This shows that the methods introduced in this paper are potentially useful in a much wider range of settings than considered here and can be seen as an important building block in the development of flexible regression models for functional responses.

Electronic supplementary material

Additional material is available with our article as supplementary material online. It is organized as


Online Appendix A, B, and C. Online Appendix A outlines our pffr bootstrap confidence intervals. Online Appendix B consists of box plots of numerical measures of mean squared error, coverage of the approximate confidence intervals, and interval width for the β1 and β2 parameters under the simulation settings presented in Section 4.1. Online Appendix C consists of an additional exploratory study of our data for the temporal progression of demyelination in MS patients across two consecutive visits, both within the same location and with spatial propagation.

Acknowledgements

Staicu’s research was supported by U.S. National Science Foundation grant number DMS 1007466 and by the NCSU Faculty Research and Professional Development grant. Fabian Scheipl and Sonja Greven were funded by Emmy Noether grant GR 3793/1-1 from the German Research Foundation. We thank Daniel Reich and Peter Calabresi for the DTI tractography data. We would like to thank Ciprian Crainiceanu for helpful discussions and comments on earlier versions of the paper.

References

1. Aguilera, A., Ocana, F., and Valderrama, M. (1999), Forecasting with unequally spaced data by a functional principal component approach, Test, 8(1), 233-253.

2. Aneiros-Perez, G. and Vieu, P. (2008), Nonparametric time series prediction: A semi-functional partial linear modeling, Journal of Multivariate Analysis, 99, 834-857.

3. Basser, P., Mattiello, J., and LeBihan, D. (1994), MR Diffusion Tensor Spectroscopy and Imaging, Biophysical Journal, 66, 259-267.

4. Basser, P., Pajevic, S., Pierpaoli, C., and Duda, J. (2000), In Vivo Fiber Tractography using DT-MRI Data, Magnetic Resonance in Medicine, 44, 625-632.

5. Bunea, F., Ivanescu, A. E., and Wegkamp, M. H. (2011), Adaptive Inference for the Mean of a Gaussian Process in Functional Data, Journal of the Royal Statistical Society, Series B, 73(4), 531-558.

6. Claeskens, G., Krivobokova, T., and Opsomer, J. D. (2009), Asymptotic Properties of Penalized Splines Estimators, Biometrika, 96(3), 529-544.

7. Crainiceanu, C., Reiss, P., Goldsmith, J., Huang, L., Huo, L., Scheipl, F., Swihart, B., Greven, S., Harezlak, J., Kundu, M. G., Zhao, Y., McLean, M., and Xiao, L. (2014), refund: Regression with Functional Data. Website: http://CRAN.R-project.org/package=refund

8. Crainiceanu, C. M., and Ruppert, D. (2004), Likelihood Ratio Tests in Linear Mixed Models with One Variance Component, Journal of the Royal Statistical Society, Series B, 66(1), 165-185.

9. Crainiceanu, C. M., Staicu, A.-M., and Di, C. (2009), Generalized Multilevel Functional Regression, Journal of the American Statistical Association, 104(488), 177-194.

10. Eilers, P. and Marx, B. (1996), Flexible smoothing with B-splines and penalties, Statistical Science, 11(2), 89-121.

11. Evangelou, N., Konz, D., Esiri, M. M., Smith, S., Palace, J., and Matthews, P. M. (2000), Regional axonal loss in the corpus callosum correlates with cerebral white matter lesion volume and distribution in multiple sclerosis, Brain, 123(9), 1845-1849.

12. Fan, Y., Foutz, N., James, G. M., and Jank, W. (2014), Functional response additive model estimation with online virtual stock markets, The Annals of Applied Statistics, to appear.

13. Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2011), Kernel regression with functional response, Electronic Journal of Statistics, 5, 159-171.

14. Ferraty, F., Van Keilegom, I. and Vieu, P. (2012), Regression when both response and predictor are functions, Journal of Multivariate Analysis, 109, 10-28.


15. Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis, Springer Series in Statistics, New York.

16. Ferraty, F. and Vieu, P. (2009), Additive prediction and boosting for functional data, Computational Statistics and Data Analysis, 53, 1400-1413.

17. Goldsmith, J., Bobb, J., Crainiceanu, C. M., Caffo, B. S., and Reich, D. (2011), Penalized Functional Regression, Journal of Computational and Graphical Statistics, 20(4), 830-851.

18. Goldsmith, J., Greven, S., and Crainiceanu, C. M. (2013), Corrected confidence bands for functional data, Biometrics, 69(1), 41-51.

19. Greven, S., Crainiceanu, C. M., Caffo, B. S., and Reich, D. (2010), Longitudinal Functional Principal Component Analysis, Electronic Journal of Statistics, 4, 1022-1054.

20. Greven, S., Crainiceanu, C. M., Kuchenhoff, H., and Peters, A. (2008), Restricted likelihood ratio testing for zero variance components in linear mixed models, Journal of Computational and Graphical Statistics, 17(4), 870-891.

21. He, G., Muller, H.-G., Wang, J.-L., and Wang, W. (2010), Functional Linear Regression via Canonical Analysis, Bernoulli, 16(3), 705-729.

22. Horvath, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, Springer Series in Statistics, New York.

23. Huang, L., Goldsmith, J., Reiss, P. T., Reich, D. S., and Crainiceanu, C. M. (2013), Bayesian scalar-on-image regression with application to association between intracranial DTI and cognitive outcomes, NeuroImage, 83, 210-223.

24. Kadri, H., Preux, P., Duflos, E., and Canu, S. (2011), Multiple Functional Regression with both Discrete and Continuous Covariates, in Ferraty, F. (ed.), Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics, 189-195, Physica-Verlag HD.

25. Krivobokova, T. and Kauermann, G. (2007), A note on penalized spline smoothing with correlated errors, Journal of the American Statistical Association, 102(480), 1328-1337.

26. Lindquist, M. A. (2012), Functional causal mediation analysis with an application to brain connectivity, Journal of the American Statistical Association, 107(500), 1297-1309.

27. Matsui, H., Kawano, S., and Konishi, S. (2009), Regularized functional regression modeling for functional response and predictors, Journal of Math for Industry, 1(3), 17-25.

28. McLean, M. W., Hooker, G., Staicu, A.-M., Scheipl, F., and Ruppert, D. (2014), Functional generalized additive models, Journal of Computational and Graphical Statistics, 23(1), 249-269.

29. Malfait, N. and Ramsay, J. O. (2003), The historical functional linear model, The Canadian Journal of Statistics, 31(2), 185-201.

30. Matlab, The MathWorks, Inc., Natick, Massachusetts, United States, 2014.

31. Nychka, D. (1988), Confidence Intervals for Smoothing Splines, Journal of the American Statistical Association, 83, 1134-1143.

32. Ozturk, A., Smith, S. A., Gordon-Lipkin, E. M., Harrison, D. M., Shiee, N., Pham, D. L., Caffo, B. S., Calabresi, P. A., and Reich, D. S. (2010), MRI of the corpus callosum in multiple sclerosis: association with disability, Multiple Sclerosis, 16(2), 166-177.

33. R Development Core Team (2014), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org.

34. Ramsay, J. O., Wickham, H., Graves, S., and Hooker, G. (2014), fda: Functional Data Analysis. Website: http://CRAN.R-project.org/package=fda

35. Ramsay, J. O., Hooker, G., and Graves, S. (2009), Functional Data Analysis with R and Matlab, Springer, New York.

36. Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis, Springer-Verlag, New York.

37. Reiss, P., and Ogden, T. (2009), Smoothing Parameter Selection for a Class of Semiparametric Linear Models, Journal of the Royal Statistical Society, Series B, 71(2), 505-523.

38. Ruppert, D., Wand, M. P., and Carroll, R. J. (2003), Semiparametric Regression, Cambridge University Press, Cambridge, UK.

39. Scheipl, F., Greven, S., and Kuchenhoff, H. (2008), Size and Power of Tests for a Zero Random Effect Variance or Polynomial Regression in Additive and Linear Mixed Models, Computational Statistics and Data Analysis, 52(7), 3283-3299.

40. Scheipl, F., Staicu, A.-M., and Greven, S. (2014), Functional Additive Mixed Models, Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2014.901914.

41. Song, S. K., Sun, S. W., Ramsbottom, M. J., Chang, C., Russell, J. and Cross, A. H. (2002), Dysmyelination revealed through MRI as increased radial (but unchanged axial) diffusion of water, Neuroimage, 17(3), 1429-1436.


42. Staicu, A.-M., Crainiceanu, C. M., Reich, D. S. and Ruppert, D. (2012), Modeling Functional Data with Spatially Heterogeneous Shape Characteristics, Biometrics, 68(2), 331-343.

43. Tievsky, A. L., Ptak, T. and Farkas, J. (1999), Investigation of apparent diffusion coefficient and diffusion tensor anisotropy in acute and chronic multiple sclerosis lesions, American Journal of Neuroradiology, 20(8), 1491-1499.

44. Valderrama, M. J., Ocana, F. A., Aguilera, A. M., and Ocana-Peinado, F. M. (2010), Forecasting pollen concentration by a two-step functional model, Biometrics, 66(2), 578-585.

45. Wahba, G. (1983), Bayesian ‘Confidence Intervals’ for the Cross-Validated Smoothing Spline, Journal of the Royal Statistical Society, Series B, 45, 133-150.

46. Wood, S. N. (2006), Generalized Additive Models: An Introduction with R, Chapman & Hall/CRC, New York.

47. Wood, S. N. (2011), Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models, Journal of the Royal Statistical Society, Series B, 73(1), 3-36.

48. Wood, S. N. (2014), mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation. Website: http://CRAN.R-project.org/package=mgcv

49. Wu, Y., Fan, J., and Muller, H.-G. (2010), Varying-coefficient Functional Linear Regression, Bernoulli, 16(3), 730-758.

50. Yao, F., Muller, H.-G., and Wang, J.-L. (2005a), Functional Data Analysis for Sparse Longitudinal Data, Journal of the American Statistical Association, 100(470), 577-590.

51. Yao, F., Muller, H.-G., and Wang, J.-L. (2005b), Functional Linear Regression Analysis for Longitudinal Data, The Annals of Statistics, 33(6), 2873-2903.

52. Yao, F. and Muller, H.-G. (2010), Functional quadratic regression, Biometrika, 97(1), 49-64.

53. Zhang, J. T., and Chen, J. (2007), Statistical inferences for functional data, The Annals of Statistics, 35(3), 1052-1079.