

Research Article
Comparative Analysis for Robust Penalized Spline Smoothing Methods

Bin Wang, Wenzhong Shi, and Zelang Miao

Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Correspondence should be addressed to Wenzhong Shi; lsgiwzshi@gmail.com

Received 2 May 2014; Accepted 17 June 2014; Published 16 July 2014

Academic Editor: Yoshinori Hayafuji

Copyright © 2014 Bin Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Smoothing noisy data is commonly encountered in the engineering domain, and robust penalized regression spline models are currently perceived to be the most promising methods for coping with this issue, owing to their flexibility in capturing nonlinear trends in the data and effectively alleviating the disturbance from outliers. Against such a background, this paper conducts a thorough comparative analysis of two popular robust smoothing techniques, the $M$-type estimator and $S$-estimation for penalized regression splines, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the corresponding algorithms reorganized under a unified framework. The performance of the two estimators is thoroughly evaluated in terms of fitting accuracy, robustness, and execution time on the MATLAB platform. Elaborate comparative experiments demonstrate that robust penalized spline smoothing methods can resist the noise effect, in contrast to the nonrobust penalized LS spline regression method. Furthermore, the $M$-estimator exerts stable performance only for observations with moderate perturbation error, whereas the $S$-estimator behaves fairly well even for heavily contaminated observations, but consumes more execution time. These findings can serve as guidance for selecting an appropriate approach for smoothing noisy data.

1. Introduction

The penalized spline smoothing technique has been perceived as a popular nonparametric approach for smoothing noisy data in the engineering domain [1-3] during the past 20 years, owing to its ease of fitting and flexible choice of the inner knots and smoothing parameter. The idea of combining regression splines, with a substantially smaller number of inner knots than the sample size, with a penalty parameter, which compromises between the fitting accuracy and the smoothness of the regression curve, can be traced back at least to O'Sullivan [4], who utilized a cubic B-spline basis for estimation in ill-posed inverse problems.

During the early stage, Kelly and Rice [5] and Besse et al. [6] approximated smoothing splines by hybrid splines equipped with inner knots equal to the data samples and a penalty parameter determining the amount of smoothing of the regression curve, which was usually calculated according to the (generalized) cross-validation criterion. Other criteria included Mallows' $C_p$ criterion [7] and the Akaike information criterion [8, 9]. Besides, Eilers and Marx [10] proposed flexible criteria for choosing an optimal penalty parameter acting on the B-spline coefficients. Wand [11] additionally derived a quick and simple approximate formula for the optimal penalty parameter of penalized spline regression. Moreover, Ruppert and Carroll [12] investigated spatially varying penalties, and Ruppert [13] recommended the selection of designed inner knots. More discussion and examples on penalized regression spline models can be found in the excellent monograph by Ruppert et al. [14].

Subsequent exploration of penalized spline smoothing remained a hot area, arousing great interest from both theoretical researchers and practicing engineers for several years. For example, the theoretical aspects and practical applications of penalized spline smoothing were fully discussed in a dissertation by Krivobokova [15] from the perspective of financial mathematics. Crainiceanu et al. [16] investigated the implementation of nonparametric Bayesian analysis for penalized

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 642475, 11 pages, http://dx.doi.org/10.1155/2014/642475


spline regression using WinBUGS, in which the Markov chain Monte Carlo (MCMC) mixing operator was substantially improved by employing the low-rank thin plate spline basis. Besides, Marra and Radice [2] presented an accessible overview of generalized additive models based on penalized regression splines and explored their potential applications in medical research. Generally, the penalized regression spline model refers to the estimation of univariate smooth functions based on noisy data. However, some researchers have also put forward its extension into higher dimensions. Typical models include the thin plate regression splines [17], which are low-rank smoothers obtained by truncating the basis of standard thin plate splines, thereby requiring less computation to fit large data sets, and the penalized bivariate splines [18], with application on triangulations for irregular regions.

The regression curve for the penalized spline is established by minimizing the sum of squared residuals subject to a bound on the size of the spline coefficients. Accordingly, a closed-form formula can be derived from the penalized minimization problem. Apparently, such penalized least squares regression splines cannot resist the disturbance from outliers. A simple and direct idea is to replace the squared residuals with some slower-increasing loss function, similarly as employed in the $M$-regression estimator [19, 20], for alleviating the outlier effects from atypical observations. Early research regarding the $M$-type smoothing technique started from Huber et al.'s [21] and Cox's [22] work on cubic regression splines. Another robust model is the $S$-estimation [23], which possesses a high breakdown point and minimizes the residual scale in an extremely robust manner. A fast algorithm has been created by Salibian-Barrera and Yohai [24] for computing the $S$-regression estimates. These two estimators have been elaborately compared in linear regression by Çetin and Toka [25], with the results indicating that the $S$-estimator is more capable of resisting outlier disturbance than the $M$-estimator, but with lower efficiency.

Until about a decade ago, however, little work had been delivered taking penalized spline regression into the category of robust estimation. Oh et al. [26] suggested a robust smoothing approach applied to the period analysis of variable stars by simply imposing Huber's favorite loss function upon the generalized cross-validation criterion, and Lee and Oh [27] proposed an iterative algorithm for calculating the $M$-type estimator of penalized regression splines via introducing empirical pseudodata. Afterwards, Finger [28] confirmed that such an $M$-type estimator compromises between the very robust estimators and the very efficient least squares type estimators for penalized spline regressions, when investigating the performance of different estimation techniques in crop insurance applications. Meanwhile, by replacing the least squares estimation with a suitable $S$-estimator, Tharmaratnam et al. [29] put forward an $S$-estimation for penalized regression splines as well, which proves to be equivalent to a weighted penalized least squares regression and behaves well even for heavily contaminated observations.

Against such a background, this paper performs a thorough comparative analysis of these two popular robust smoothing techniques, the $M$-type estimator and $S$-estimation for penalized regression splines, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the corresponding algorithms reorganized under a unified framework. The rest of the paper is organized as follows. Section 2 reelaborates the $M$-type estimator and $S$-estimation for penalized regression splines, starting from the least squares estimator, and reorganizes the corresponding algorithms within a unified framework. Section 3 describes the comparative experiments upon both simulated synthetic data and one real weather balloon data set. Concluding remarks are included in Section 4.

2. Robust Penalized Regression Splines

The following discussion is devoted to the reelaboration of penalized regression splines starting from the origin, with their derivation processes reformulated and the corresponding practical algorithms reorganized under a unified framework.

The idea of the penalized regression spline begins with the following model:

$$y_i \sim N(m(x_i), \sigma_\varepsilon^2), \quad i = 1, \ldots, n, \quad (1)$$

where $m(x)$ is supposed to be a smooth but unknown regression function, which needs to be estimated based on the sample observations $(x_i, y_i)$, $i = 1, \ldots, n$, and $\sigma_\varepsilon^2$ denotes the constant variance of the random deviation error between the response variable and the regression function $m(x)$.

Penalized spline smoothing is a flexible concept, since different basis functions may correspond to different penalized spline regression functions. A common selection is the truncated polynomial basis, although other choices can also be explored straightforwardly:

$$m(x; \beta) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{j=1}^{K} \beta_{p+j} (x - \kappa_j)_+^p, \quad (2)$$

where $(x - \kappa_j)_+ = \max(x - \kappa_j, 0)$, $\beta = (\beta_0, \beta_1, \ldots, \beta_{p+K})^T$ denotes a vector of regression coefficients, $\kappa_1, \ldots, \kappa_K$ are the specified inner knots, and $p$ indicates the exponential order of the truncated power basis.
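As a concrete illustration, the truncated power basis of (2) can be evaluated as follows. This is a minimal NumPy sketch (Python as a stand-in for the paper's MATLAB environment; the function name is illustrative), with the cubic order $p = 3$ as a default:

```python
import numpy as np

def truncated_power_basis(x, knots, p=3):
    """Evaluate F(x) = (1, x, ..., x^p, (x-k_1)_+^p, ..., (x-k_K)_+^p)
    for each sample, returning the n x (p+1+K) design matrix."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, p + 1, increasing=True)            # 1, x, ..., x^p
    trunc = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0) ** p
    return np.hstack([poly, trunc])
```

Each row of the returned matrix is $F(x_i)^T$, so stacking the rows reproduces the design matrix of (5) below.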

2.1. Penalized Least Squares Regression Splines. Given a group of sample observations $(x_1, y_1), \ldots, (x_n, y_n)$, an increasingly popular way of obtaining an estimate of $m(\cdot)$ is by transforming it into a least squares problem, in which we seek the member of the class $m(x; \beta)$ that minimizes the sum of squared residuals. To prevent overfitting, a constraint on the spline coefficients is always imposed; that is,

$$\min_{\beta \in \mathbb{R}^{p+K+1}} \sum_{i=1}^{n} (y_i - m(x_i; \beta))^2 \quad \text{subject to} \quad \sum_{j=1}^{K} \beta_{p+j}^2 \le C \quad (3)$$


for some constant $C > 0$, as discussed in Ruppert et al. [14]. Thus, by employing the Lagrange multiplier, the coefficients $\hat\beta_{\mathrm{LS}}$ of the penalized least squares regression spline are equivalent to the minimizer of

$$\sum_{i=1}^{n} (y_i - m(x_i; \beta))^2 + \lambda \sum_{j=1}^{K} \beta_{p+j}^2 \quad (4)$$

for some smoothing (penalty) parameter $\lambda \ge 0$, with respect to $\beta$, which compromises between the bias and the variance of the fitted regression spline curve.

Introduce $F(x) = (1, x, \ldots, x^p, (x - \kappa_1)_+^p, \ldots, (x - \kappa_K)_+^p)^T$, the spline design matrix

$$X = (F(x_1)^T, \ldots, F(x_n)^T)^T = \begin{pmatrix} 1 & x_1 & \cdots & x_1^p & (x_1 - \kappa_1)_+^p & \cdots & (x_1 - \kappa_K)_+^p \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^p & (x_n - \kappa_1)_+^p & \cdots & (x_n - \kappa_K)_+^p \end{pmatrix}, \quad (5)$$

the vector of responses $y = (y_1, \ldots, y_n)^T$, and the diagonal matrix $D = \operatorname{diag}(0_{(p+1) \times 1}, 1_{K \times 1})$, which indicates that only the spline coefficients are penalized. Thus the closed-form minimizer of the objective function (4) arrives by direct calculation, namely,

$$\hat\beta_{\mathrm{LS}} = (X^T X + \lambda D)^{-1} X^T y. \quad (6)$$

Consequently, the corresponding estimation vector $\hat{m} = (\hat{m}(x_1), \ldots, \hat{m}(x_n))^T$ generated by the penalized least squares regression spline can be represented as

$$\hat{m}_{\mathrm{LS}} = X \hat\beta_{\mathrm{LS}} = X (X^T X + \lambda D)^{-1} X^T y. \quad (7)$$
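The closed forms (6) and (7) amount to a single ridge-type linear solve. A minimal sketch, assuming the design matrix of (5) has already been built (Python as a stand-in for the paper's MATLAB; names are illustrative):

```python
import numpy as np

def penalized_ls_fit(X, y, lam, p, K):
    """Closed-form minimizer (6): beta = (X'X + lam*D)^{-1} X'y, with
    D = diag(0_{p+1}, 1_K) so only the spline coefficients are penalized."""
    D = np.diag(np.r_[np.zeros(p + 1), np.ones(K)])
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return beta, X @ beta      # coefficients and fitted values (7)
```

With $\lambda = 0$ the fit reduces to ordinary least squares on the spline basis.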

For figuring out the penalty parameter $\lambda$ in (7), the (generalized) cross-validation technique is employed hereafter, which leaves one observation out of the residual sum of squares so as to avoid overfitting the regression spline. The cross-validation criterion is given by

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{m}_{-i}(x_i; \lambda))^2, \quad (8)$$

where $\hat{m}_{-i}(x_i; \lambda)$ is the regression spline estimated by leaving out the $i$th observation point. The optimized penalty parameter $\lambda$ can be calculated by minimizing $\mathrm{CV}(\lambda)$ over $\lambda \ge 0$. For the linear smoothing matrix $H_\lambda = X(X^T X + \lambda D)^{-1} X^T$, it can be verified [14] that

$$\hat{m}_{-i}(x_i; \lambda) = \sum_{j \ne i} \frac{H_\lambda^{ij}}{1 - H_\lambda^{ii}}\, y_j. \quad (9)$$

Substituting $\hat{m}_{-i}(x_i; \lambda)$ in (8) with formula (9) results in

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{m}(x_i; \lambda)}{1 - H_\lambda^{ii}} \right)^2. \quad (10)$$

The generalized cross-validation criterion is proposed [30] simply by replacing $H_\lambda^{ii}$ in (10) with the average $\operatorname{tr}(H_\lambda)/n$, namely,

$$\mathrm{GCV}(\lambda) = n \sum_{i=1}^{n} \left( \frac{y_i - \hat{m}(x_i; \lambda)}{n - \operatorname{tr}(H_\lambda)} \right)^2. \quad (11)$$
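Selecting $\lambda$ by (11) can be sketched as a simple grid search. A NumPy illustration (the grid and the function names are assumptions, not from the paper):

```python
import numpy as np

def gcv_score(X, y, lam, D):
    """GCV(lam) of (11) for the linear smoother H = X(X'X + lam*D)^{-1}X'."""
    H = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
    n = len(y)
    resid = y - H @ y
    return n * np.sum(resid ** 2) / (n - np.trace(H)) ** 2

def select_lambda(X, y, D, grid):
    """Pick the grid value minimizing the GCV score."""
    return min(grid, key=lambda lam: gcv_score(X, y, lam, D))
```

In practice one would search a logarithmically spaced grid of candidate penalties.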

2.2. M-Type Estimator for Penalized Regression Splines. The above estimate $\hat{m}_{\mathrm{LS}}$ of the penalized least squares regression spline may greatly suffer from various robustness issues, especially when the data is contaminated with outliers. For alleviating this effect, a penalized regression estimator can be straightforwardly constructed by replacing the previous squared residual loss function with the following $M$-type criterion [27]:

$$\sum_{i=1}^{n} \rho(y_i - m(x_i; \beta)) + \lambda \sum_{j=1}^{K} \beta_{p+j}^2, \quad (12)$$

where $\rho$ is even, nondecreasing on $[0, +\infty)$, and $\rho(0) = 0$; a common choice is the Huber loss function with cutoff $c > 0$:

$$\rho_c(t) = \begin{cases} t^2, & |t| \le c, \\ 2c|t| - c^2, & |t| > c. \end{cases} \quad (13)$$

Apparently, the above Huber function is a parabola in the vicinity of zero, and it increases linearly at the level $|t| > c$. A default choice of the tuning constant is $c = 1.345$, which aims for 95% asymptotic efficiency with respect to the standard normal distribution.
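The Huber loss (13) and the clipping function $\psi_c$ used below can be written directly. A small NumPy sketch with the default cutoff $c = 1.345$ (function names are illustrative):

```python
import numpy as np

def huber_rho(t, c=1.345):
    """Huber loss (13): t^2 inside [-c, c], linear growth 2c|t| - c^2 outside."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= c, t ** 2, 2 * c * np.abs(t) - c ** 2)

def huber_psi(s, c=1.345):
    """The clipping function psi_c(s) = max(-c, min(c, s)) used in (14)."""
    return np.clip(s, -c, c)
```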

Denote the derivative of $\rho_c$ as $\psi_c(s) = \max[-c, \min(c, s)]$. When a set of fitted residuals $r_i = y_i - m(x_i; \beta)$, $i = 1, \ldots, n$, from the penalized spline regression is given, a robust $M$-scale estimator [19, 20] can be employed for estimating the standard deviation $\sigma_\varepsilon$ of these residuals as follows:

$$\sum_{i=1}^{n} \psi_c \left( \frac{y_i - m(x_i; \beta)}{\hat\sigma_\varepsilon} \right) = 0. \quad (14)$$

A common choice for a measure of scale with a high finite-sample breakdown point is the value of $\omega$ determined by

$$P(|r - r_{0.5}| < \omega) = \frac{1}{2}, \quad (15)$$

where $r_{0.5}$ is the population median. Besides, the standard estimation of $\omega$ usually employs the median absolute deviation (MAD) statistic, which is defined as

$$\mathrm{MAD} = \operatorname{Median}\{|r_1 - M_r|, \ldots, |r_n - M_r|\}, \quad (16)$$

where $M_r$ is the usual sample median of the fitted residuals, and MAD is actually the sample median of the $n$ values $|r_1 - M_r|, \ldots, |r_n - M_r|$, with its finite-sample breakdown point approximately being 0.5 [20]. Suppose the fitted residuals are randomly sampled from a normal distribution; note that


Step 0. Input an initial curve estimate $\hat{m}_p^{(0)}$, a termination tolerance $\varepsilon$, and the maximum iteration number $\mathrm{Iter}_{\max}$. Meanwhile, set $k = 0$ and conduct the following loop iterations.
Step 1. Estimate $\hat\sigma_\varepsilon$ by using MADN in (17).
Step 2. Produce the empirical pseudodata according to (19).
Step 3. Calculate the penalized least squares estimator $\hat{m}_p^{(k+1)}$ defined in (20) for the pseudodata, where the penalty parameter $\lambda$ is chosen according to the generalized cross-validation (GCV) criterion formulated by (11) in the foregoing subsection.
Step 4. If $\|\hat{m}_p^{(k+1)} - \hat{m}_p^{(k)}\| < \varepsilon \|\hat{m}_p^{(k)}\|$ or $k = \mathrm{Iter}_{\max}$, terminate and output the robust $M$-type penalized spline estimate $\hat{m}_M = \hat{m}_p^{(k+1)}$; else set $k = k + 1$ and continue from Step 1.

Algorithm 1: $M$-type iterative algorithm for penalized regression splines.

MAD does not estimate the corresponding standard deviation $\sigma_\varepsilon$ but rather estimates $Z_{0.75}\sigma_\varepsilon$ (with $Z_{0.75}$ being the 0.75 quantile of the standard normal distribution). To put MAD in a more familiar context, it typically needs to be rescaled so as to estimate $\sigma_\varepsilon$, especially when the residuals are sampled from a normal distribution. In particular,

$$\mathrm{MADN} = \frac{\mathrm{MAD}}{z_{0.75}} \approx 1.4826\,\mathrm{MAD}. \quad (17)$$
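A minimal sketch of the MADN scale estimate of (16)-(17), using the standard rescaling constant $1/z_{0.75} \approx 1.4826$ (the helper name is illustrative):

```python
import numpy as np

def madn(residuals):
    """MADN of (16)-(17): median absolute deviation from the residual
    median, rescaled by 1/z_0.75 so it is consistent for a normal sigma."""
    r = np.asarray(residuals, dtype=float)
    mad = np.median(np.abs(r - np.median(r)))
    return 1.4826 * mad
```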

By utilizing the aforementioned spline basis functions, an $M$-type penalized spline estimator using the standardized residuals can be computed as $\hat{m}_M = X \hat\beta_M$, with its regression coefficients given by

$$\hat\beta_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho_c \left( \frac{y_i - m(x_i; \beta)}{\hat\sigma_\varepsilon} \right) + \lambda \beta^T D \beta. \quad (18)$$

Due to the nonlinear nature of $\rho_c$ and $\psi_c$, it is nontrivial to seek the minimizer of (18) while satisfying (14). Under such circumstances, an $M$-type iterative algorithm for calculating the penalized regression splines is proposed [27] by introducing the empirical pseudodata

$$z = \hat{m}_p + \frac{\psi_c(y - \hat{m}_p)}{2}, \quad (19)$$

where $\hat{m}_p$ is the penalized least squares estimator corresponding to the above pseudodata:

$$\hat{m}_p = X (X^T X + \lambda D)^{-1} X^T z. \quad (20)$$

As proved in Theorem 1 of Lee and Oh's [27] work, the penalized least squares estimator $\hat{m}_p$ converges to $\hat{m}_M$ asymptotically, which implies an $M$-type iterative algorithm consisting of the steps in Algorithm 1.

Regarding the $M$-type estimator for the penalized regression splines (see Algorithm 1), the initial curve estimate $\hat{m}_p^{(0)}$ in Step 0 is suggested [27] to be chosen as the nonrobust penalized least squares regression spline given by (7).
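The steps of Algorithm 1 can be sketched as follows. For brevity this illustration holds the penalty parameter $\lambda$ fixed rather than reselecting it by GCV at every iteration as in Step 3, so it is a simplified stand-in, not the paper's exact procedure (names are illustrative):

```python
import numpy as np

def m_type_psline(X, y, lam, D, c=1.345, tol=1e-6, max_iter=100):
    """Sketch of the M-type iterative algorithm: iterate pseudodata
    z = m + psi_c(y - m)/2 on MADN-standardized residuals and refit
    the penalized least squares smoother (20)."""
    S = np.linalg.solve(X.T @ X + lam * D, X.T)   # (X'X + lam*D)^{-1} X'
    m = X @ (S @ y)                               # Step 0: LS start (7)
    for _ in range(max_iter):
        r = y - m
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # Step 1: MADN
        if scale < 1e-12:                         # residuals numerically zero
            return m
        psi = np.clip(r / scale, -c, c) * scale   # clipped residuals
        z = m + psi / 2.0                         # Step 2: pseudodata (19)
        m_new = X @ (S @ z)                       # Step 3: refit (20)
        if np.linalg.norm(m_new - m) < tol * np.linalg.norm(m):  # Step 4
            return m_new
        m = m_new
    return m
```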

2.3. S-Estimation for Penalized Regression Splines. The robustness properties of the aforementioned $M$-type estimator for penalized regression splines are not completely satisfactory, since such estimators have a low breakdown point [31], as clarified by Maronna and Yohai [32] for general $M$-estimators when considering both the regression and the scale simultaneously.

Another more robust and relatively recent approach for smoothing noisy data is the $S$-estimation for penalized regression splines [29], whose core idea is utilizing a flexible $S$-estimator to replace the traditional least squares estimator. The coefficient vector $\hat\beta_S$ of the penalized $S$-regression spline is defined as follows:

$$\hat\beta_S = \arg\min_{\beta}\; n \hat\sigma_\varepsilon^2(\beta) + \lambda \beta^T D \beta, \quad (21)$$

in which, for each $\beta$, $\hat\sigma_\varepsilon^2(\beta)$ satisfies

$$\frac{1}{n} \sum_{i=1}^{n} \rho \left( \frac{y_i - F(x_i)^T \beta}{\hat\sigma_\varepsilon(\beta)} \right) = E_\Phi(\rho(Z)), \quad (22)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution, $Z \sim N(0, 1)$, and the loss function is taken from Tukey's bisquare family [33]:

$$\rho_d(u) = \begin{cases} 3\left(\dfrac{u}{d}\right)^2 - 3\left(\dfrac{u}{d}\right)^4 + \left(\dfrac{u}{d}\right)^6, & |u| \le d, \\[2pt] 1, & |u| > d. \end{cases} \quad (23)$$
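Tukey's bisquare loss (23) can be written directly in NumPy; note that the cutoff $d = 1.547$ below is an assumed common choice (roughly a 50% breakdown point) and is not fixed by the text:

```python
import numpy as np

def tukey_bisquare_rho(u, d=1.547):
    """Tukey bisquare loss (23): 3(u/d)^2 - 3(u/d)^4 + (u/d)^6 for
    |u| <= d, constant 1 beyond, so large outliers stop contributing."""
    t = np.asarray(u, dtype=float) / d
    inside = 3 * t**2 - 3 * t**4 + t**6
    return np.where(np.abs(t) <= 1.0, inside, 1.0)
```

The boundedness of this loss is exactly what gives the $S$-estimator its high breakdown point, at the cost of a nonconvex objective.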

It is known from the theoretical assumption that the fitted residual vector should follow a normal distribution with mean zero, that is, $r_i \sim N(0, \sigma_\varepsilon^2)$, $i = 1, \ldots, n$; however, its median statistic $\operatorname{Median}(r)$, in which $r = (r_1, \ldots, r_n)^T$, may depart from zero to a large extent in general practice. Considering this, we ameliorate the estimation of $\hat\sigma_\varepsilon(\beta)$ by using MADN in (17) with the addition of the absolute value of the residual median, namely,

$$\hat\sigma_\varepsilon(\beta) = \mathrm{MADN} + |\operatorname{Median}(r(\beta))|. \quad (24)$$

The penalized $S$-estimator for the regression spline model is computed as $\hat{m}_S = X \hat\beta_S$, with its coefficient vector $\hat\beta_S$ satisfying both (21) and (22) simultaneously. The following formula derivations indicate that such a penalized $S$-estimator can actually be equivalent to a weighted penalized least squares regression.

Before proceeding with the derivation, note that the symbol $\hat\beta_S$ is simply written as $\beta$ just for clarity. Firstly, when (21) reaches its minimum, the following equation should be satisfied:

$$n \hat\sigma_\varepsilon(\beta)\, \nabla\hat\sigma_\varepsilon(\beta) + \lambda D \beta = 0. \quad (25)$$

On the other hand, taking the derivative of the $M$-scale function in (22) with respect to $\beta$ leads to

$$\sum_{i=1}^{n} \rho' \left( \frac{r_i(\beta)}{\hat\sigma_\varepsilon(\beta)} \right) \left( \frac{-F(x_i)^T \hat\sigma_\varepsilon(\beta) - r_i(\beta)\, \nabla\hat\sigma_\varepsilon(\beta)^T}{\hat\sigma_\varepsilon^2(\beta)} \right) = 0, \quad (26)$$

in which $r_i(\beta) = y_i - F(x_i)^T \beta$, $\tilde{r}_i(\beta) = r_i(\beta)/\hat\sigma_\varepsilon(\beta)$, and $\Lambda(\beta) = \operatorname{diag}(\rho'(\tilde{r}_i(\beta))/\tilde{r}_i(\beta))$, $i = 1, \ldots, n$. Deformation of the above formula produces

$$\nabla\hat\sigma_\varepsilon(\beta) = -\frac{\sum_{i=1}^{n} \rho'(r_i(\beta)/\hat\sigma_\varepsilon(\beta))\, F(x_i)}{\sum_{i=1}^{n} \rho'(r_i(\beta)/\hat\sigma_\varepsilon(\beta))\, (r_i(\beta)/\hat\sigma_\varepsilon(\beta))} = -\frac{\hat\sigma_\varepsilon(\beta)\, X^T \Lambda(\beta)\, r(\beta)}{r(\beta)^T \Lambda(\beta)\, r(\beta)}. \quad (27)$$

Substituting $\nabla\hat\sigma_\varepsilon(\beta)$ of (27) into (25),

$$-\frac{n \hat\sigma_\varepsilon^2(\beta)}{r(\beta)^T \Lambda(\beta)\, r(\beta)}\, X^T \Lambda(\beta)\, r(\beta) + \lambda D \beta = 0. \quad (28)$$

Recall that $r(\beta) = y - X\beta$, and introduce the new symbol $\tau(\beta) = n \hat\sigma_\varepsilon^2(\beta) / (r(\beta)^T \Lambda(\beta)\, r(\beta))$; then we have

$$-\tau(\beta)\, X^T \Lambda(\beta) (y - X\beta) + \lambda D \beta = 0. \quad (29)$$

Thus a further simple deformation arrives at the coefficient vector $\hat\beta_S$ in an iteration form:

$$\beta = \left( X^T \Lambda(\beta)\, X + \frac{\lambda}{\tau(\beta)} D \right)^{-1} X^T \Lambda(\beta)\, y, \quad (30)$$

which forms the basis for constructing an iterative algorithm for the penalized $S$-regression spline.

The above formula derivations verify that $\hat\beta_S$ is actually the solution to the following weighted penalized least squares regression problem:

$$\min_{\beta} \left\| \Lambda(\beta)^{1/2} (y - X\beta) \right\|^2 + \frac{\lambda}{\tau(\beta)}\, \beta^T D \beta. \quad (31)$$

Such a representation indicates that the GCV criterion in (11) can be employed straightforwardly with the response variable $\tilde{y} = \Lambda(\beta)^{1/2} y$, the weighted design matrix $\tilde{X} = \Lambda(\beta)^{1/2} X$, and the penalty term $\lambda/\tau(\beta)$. Meanwhile, because some weight elements are zero, the penalty parameter $\lambda$ is calculated based on the following regularized GCV criterion:

$$\mathrm{RGCV}(\lambda) = \frac{n_w \left\| \Lambda(\beta)^{1/2} (y - X\beta) \right\|^2}{(n_w - \operatorname{tr}(\tilde{H}_\lambda))^2}, \quad (32)$$

where $\tilde{H}_\lambda = \tilde{X}(\tilde{X}^T \tilde{X} + (\lambda/\tau(\beta)) D)^{-1} \tilde{X}^T = \Lambda(\beta)^{1/2} X (X^T \Lambda(\beta) X + (\lambda/\tau(\beta)) D)^{-1} X^T \Lambda(\beta)^{1/2}$, and $n_w$ denotes the number of nonzero elements in the weight matrix $\Lambda(\beta)$.

Consequently, based on the foregoing in-depth discussions, the $S$-estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that since the employed loss function $\rho_d$ is bounded, giving rise to the nonconvexity of $\hat\sigma_\varepsilon(\beta)$, the objective function in (21) may have several critical points, each corresponding to a local minimum. Therefore, the convergent critical point of the iterations derived from the above algorithm depends completely on the prescribed initial regression coefficient vector $\hat\beta^{(0)}$ in Step 0 of Algorithm 2.

One possible solution to cope with this issue, proposed by Tharmaratnam et al. [29], is based on random sampling theory and can be briefly described as follows. Firstly, create $J$ random subsamples of a proper size, for example, one fifth of the sample number, namely, $n_s = \max(p + K + 1, \lfloor n/5 \rfloor)$, from the original sample observations. Each subsample brings about an initial candidate for the coefficient vector, for instance, assigned by least squares fitting for the penalized regression splines, or by the $M$-type estimator at a small additional computation cost, with the initial regression coefficient vector of the $j$th subsample denoted by $\hat\beta_j^{(0)}$; starting from it, the final convergent vector produced through Algorithm 2 is written as $\hat\beta_j^\star$, $j = 1, \ldots, J$. Eventually, the global optimum is taken among all these $J$ convergent vectors as the one corresponding to the minimum objective function in (21), namely,

$$\hat\beta_S = \arg\min_{1 \le j \le J}\; n \hat\sigma_\varepsilon^2(\hat\beta_j^\star) + \lambda\, \hat\beta_j^{\star T} D\, \hat\beta_j^\star. \quad (33)$$

Therefore, the robust $S$-estimator for penalized regression splines can be computed as $\hat{m}_S = X \hat\beta_S$.

3. Experimental Performance Evaluation

This section is devoted to comparative experiments upon both simulated synthetic data and one real weather balloon data set for the foregoing penalized spline smoothing methods, including the nonrobust penalized least squares regression spline (LS) presented by (7) and the two robust smoothing approaches, the $M$-type estimator [27] and $S$-estimation [29] for penalized regression splines, as described in Algorithms 1 and 2.


Step 0. Input an initial regression coefficient vector β^(0), the termination tolerance ε, and the maximum iteration number Itermax. Meanwhile, set k = 0 and enter the following loop iterations.

Step 1. Estimate σ̂_ε based on (24), that is, σ̂_ε(β^(k)) = MADN(r(β^(k))), the normalized MAD of the current residuals.

Step 2. Compute the intermediate variables τ(β^(k)) and Λ(β^(k)).

Step 3. Calculate the coefficient vector β^(k+1) according to (30) as

β^(k+1) = (Xᵀ Λ(β^(k)) X + (λ/τ(β^(k))) D)⁻¹ Xᵀ Λ(β^(k)) y,

where the penalty parameter λ is figured out based on the regularized GCV criterion (32).

Step 4. If ‖β^(k+1) − β^(k)‖ < ε‖β^(k)‖ or k = Itermax, terminate and output the convergent coefficient vector of the S-estimator for the penalized regression spline, β̂ = β^(k+1); else accumulate the counter by setting k = k + 1 and repeat from Step 1.

Algorithm 2: S-estimator iterative algorithm for penalized regression splines.
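Step 3 of the algorithm is a weighted ridge-type solve; a minimal Python/NumPy sketch (with the weight matrix passed as a vector w and τ as a scalar, both standing in for Λ(β^(k)) and τ(β^(k)), which the paper computes from the current residuals in Step 2):

```python
import numpy as np

def s_update_step(X, y, w, lam, tau, D):
    # Step 3 of Algorithm 2 as a weighted ridge-type solve:
    # beta = (X' W X + (lam / tau) D)^{-1} X' W y, with W = diag(w).
    # Here w and tau stand for Lambda(beta^(k)) and tau(beta^(k)).
    XtW = X.T * w                        # X' W without forming diag(w)
    A = XtW @ X + (lam / tau) * D
    return np.linalg.solve(A, XtW @ y)
```

With unit weights and λ = 0 this reduces to ordinary least squares, which is one way to sanity-check an implementation.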

3.1. Configuration and Criteria. It should be mentioned that the exponential order p = 3 for the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14], κ_k = ((k + 1)/(K + 2))th sample quantile of Unique(x), k = 1, ..., K, together with a simple choice of the knot number K = min(35, (1/4) × Number of Unique(x)), where x = (x₁, x₂, ..., x_n)ᵀ. Besides, the corresponding penalty parameter λ is determined according to the generalized cross-validation (GCV) criterion for the LS fitting and M-estimation methods, and according to the regularized GCV criterion for the S-estimation method. Meanwhile, the termination tolerance ε = 10⁻⁶ and a maximum iteration number of 100 are assigned as the initial input parameters in Algorithms 1 and 2, and J = 5 subsamples are randomly generated for exploring the ultimate optimum.
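For instance, the knot rule above translates into a few lines of Python/NumPy (function name is ours, for illustration only):

```python
import numpy as np

def default_knots(x):
    # Ruppert-style default: K = min(35, floor(#unique(x)/4)) knots placed
    # at the ((k+1)/(K+2))-th sample quantiles of the unique x values,
    # for k = 1, ..., K.
    ux = np.unique(x)
    K = int(min(35, len(ux) // 4))
    probs = [(k + 1) / (K + 2) for k in range(1, K + 1)]
    return np.quantile(ux, probs)
```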

Meanwhile, the hardware and software environment established for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) running MATLAB R2014a on Windows 7 SP1 Ultimate (64-bit).

3.2. Synthetic Data. The simulation study is designed the same as in Lee and Oh [27], with the testing function

y_i = sin(2π(1 − x_i)²) + 0.5 ε_i, i = 1, ..., n.    (34)

The designed samples are generated with their horizontal coordinates uniformly drawn from the open interval (0, 1) and the error distribution ε taken from nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution U(−1, 1); (2) standard normal distribution N(0, 1); (3) logistic distribution with pdf eˣ/(1 + eˣ)²; (4) double exponential distribution with pdf (1/2)e^(−|x|); (5) contaminated normal distribution I, 0.95N(0, 1) + 0.05N(0, 10), that is, error drawn 95% from N(0, 1) and 5% from N(0, 10); (6) contaminated normal distribution II, 0.9N(0, 1) + 0.1N(0, 10); (7) slash distribution defined from two statistically independent random variables following (1) and (2), that is, N(0, 1)/U(0, 1); (8) Cauchy distribution with pdf 1/(π(1 + x²)), whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution 0.9N(0, 1) + 0.1N(30, 1). For the sake of reducing the simulation variability, the sample size is fixed at n = 200. It should be mentioned that the error scenarios (1)–(8) are symmetrically distributed and listed in ascending order of the heaviness of their tails, while scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from N(30, 1).
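A few of these scenarios can be simulated as follows (a sketch; note that we read N(0, 10) as a normal with standard deviation 10, which is an assumption, since the paper writes the contaminating component simply as N(0, 10)):

```python
import numpy as np

def draw_errors(name, n, rng):
    # Samplers for a few of the nine error scenarios.
    if name == "uniform":      # scenario (1): U(-1, 1)
        return rng.uniform(-1.0, 1.0, n)
    if name == "normal":       # scenario (2): N(0, 1)
        return rng.standard_normal(n)
    if name == "cn1":          # scenario (5): 0.95 N(0,1) + 0.05 N(0,10)
        mask = rng.random(n) < 0.05
        return np.where(mask, rng.normal(0.0, 10.0, n), rng.standard_normal(n))
    if name == "slash":        # scenario (7): N(0,1) / U(0,1), heavy-tailed
        return rng.standard_normal(n) / rng.uniform(0.0, 1.0, n)
    if name == "asymmetric":   # scenario (9): 0.9 N(0,1) + 0.1 N(30,1)
        mask = rng.random(n) < 0.1
        return np.where(mask, rng.normal(30.0, 1.0, n), rng.standard_normal(n))
    raise ValueError(f"unknown scenario: {name}")
```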

Figure 1 presents the spline regression curves from the penalized LS fitting, the M-estimator, and the S-estimator for one simulated set of scattered samples in the nine possible scenarios, each with a subfigure panel demonstrating the comparative regression results for these three penalized spline smoothing methods.

For further quantitative evaluation of the comparative analysis of these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized here to quantify the goodness of fit between the regression spline m̂_j(x) of the jth simulation replicate from each of the above outlier scenarios and the true simulated function m(x) = sin(2π(1 − x)²):

ASE_j = (1/n) Σ_{i=1}^{n} (m(x_i) − m̂_j(x_i))², j = 1, ..., J,    (35)

with the specified replicate number J = 10³ for each distribution.
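Equation (35) is straightforward to compute against the true test curve; a sketch (function name ours):

```python
import numpy as np

def ase(x, m_hat):
    # Average squared error (35) of a fitted curve against the true
    # test function m(x) = sin(2*pi*(1 - x)^2) at the design points.
    m_true = np.sin(2.0 * np.pi * (1.0 - np.asarray(x)) ** 2)
    return float(np.mean((m_true - np.asarray(m_hat)) ** 2))
```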

Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of fitting


[Figure 1 graphic: nine panels, (a) Uniform, (b) Normal, (c) Logistic, (d) D-exponential, (e) CN1, (f) CN2, (g) Slash, (h) Cauchy, and (i) Asymmetric, each plotting the data with outliers, the true curve, the LS fitting, the M-estimator, and the S-estimator over x ∈ [0, 1].]

Figure 1: Spline regression curves (with the true testing function in black solid line) from the penalized LS fitting (blue dotted), M-estimator (green dot-dashed), and S-estimator (red dashed) for the simulated samples (black circle-dot) in the 9 possible scenarios.

accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate the box plots of both the ASEs and the code running times using (a) penalized LS-estimation, (b) penalized M-estimation, and (c) penalized S-estimation for the spline smoothing methods, as well as the median value comparisons of these two indicators, for the simulated samples from each of the 9 distribution scenarios.

From the ASE comparisons in Figure 2, it can be found that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with more complex conditions, such as the three latter error distributions possessing heavy tails, thereby bringing about large ASEs, which are specially rescaled to


[Figure 2 graphic: box plots of the ASEs for (a) the LS fitting, (b) the M-estimator, and (c) the S-estimator over the nine scenarios (Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric), plus (d) a median-ASE comparison; the LS panel carries an auxiliary vertical axis reaching 25 for the heavy-tailed scenarios.]

Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios with replicate number J = 10³.

the additional vertical coordinate on the right so as to be accommodated with the ASE results for the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized M-type estimation presents satisfactory regression fittings for the distributions with moderate tail weights, like scenarios (2)–(5), which are located in the middle of each panel. Meanwhile, the robust S-estimator provides comparatively meritorious smoothing results, especially for complicated circumstances like the slash, Cauchy, or asymmetric distributions with heavy tails.

In terms of robustness, the LS-estimator simply does not qualify for mention in this respect. Instead, the M-estimator demonstrates stable performance for the designed samples with moderate perturbation error, and the S-estimator behaves dominantly stable for all kinds of error source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming one is the S-estimator, followed by the M-estimator, with the least time-consuming being the LS-estimator, since it is treated as the initial curve estimate for the two former approaches. Moreover, although both the S-estimator and the M-estimator are composed of similar iterative processes, random sampling theory is further involved in the former model for exploring the ultimate solution of a nonconvex optimization problem.

3.3. Balloon Data. This section is devoted to the comparative experiments of the aforementioned penalized spline smoothing methods upon a set of weather balloon data [35]. The data consist of 4984 observations collected from a balloon at about 30 kilometers altitude, with the outliers caused by the fact that the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

Figure 4 displays these balloon observational data together with the regression fitting curves of the penalized LS method, the robust penalized M-estimation method, and the robust S-estimation method. It can be explicitly discovered by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, and this phenomenon appears even more significant around the horizontal-axis position of 0.8, while the other two robust splines pass approximately through the majority of the observations. Meanwhile,


[Figure 3 graphic: box plots of the run times in seconds for (a) the LS fitting (on the order of 10⁻³ s), (b) the M-estimator, and (c) the S-estimator over the nine scenarios, plus (d) a median-run-time comparison.]

Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios with replicate number J = 10³.

[Figure 4 graphic: the balloon data over x ∈ [0, 1] with the three fitted curves overlaid.]

Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), M-estimator (green dot-dashed), and S-estimator (red solid line).

the S-estimator seems to perform slightly better than the M-estimator, since the former accords with the data trend better, especially at the ridge and the two terminal nodes.

At the same time, under the platform specified in Section 3.1, the algorithm execution times on this weather balloon data set with 4984 observations are 23 s, 351 s, and 589 s for the penalized LS regression, the penalized M-type estimator, and the penalized S-estimator, respectively.

4 Conclusions

This paper has conducted a comprehensive comparative analysis of the two currently popular robust penalized spline smoothing methods, the penalized M-type estimator and the penalized S-estimation, both of which were reelaborated starting from their origins, with their derivation processes reformulated and the ultimate algorithms reorganized under a unified framework.

The experiments consist of, firstly, one simulated synthetic data set with outliers in 9 possible scenarios and, then, a practical weather balloon data set for the aforementioned penalized regression splines, whose performances are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

The conclusions from these experiments are that the robust penalized spline smoothing methods can be resistant to the


influence of outliers, compared with the nonrobust penalized LS spline fitting method. Further, the M-estimator can exert stable performance only for observed samples with moderate perturbation error, while the S-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, though necessarily at the cost of more execution time. Therefore, taking both the solving accuracy and the implementation efficiency of the fitted regression spline into account, the M-type estimator is generally preferred, whereas if only the fitting credibility is of concern, then the S-estimator naturally deserves to be the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3–21, 1989.
[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107–125, 2010.
[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.
[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502–518, 1986.
[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071–1085, 1990.
[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255–270, 1997.
[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661–675, 1973.
[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255–265, 1973.
[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271–293, 1998.
[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89–102, 1996.
[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936–940, 1999.
[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.
[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735–757, 2002.
[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, 2003.
[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.
[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.
[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95–114, 2003.
[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263–281, 2014.
[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, 2012.
[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley, 1981.
[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530–551, 1983.
[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256–272, Springer, New York, 1984.
[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.
[25] M. Çetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747–752, 2011.
[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15–30, 2004.
[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159–171, 2007.
[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217–230, 2013.
[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609–625, 2010.
[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.
[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157–184, 1983.
[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699–703, 1991.
[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147–192, 1974.
[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, 8th edition, 2010.
[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–801, 1993.




spline regression using WinBUGS, in which the Markov chain Monte Carlo (MCMC) mixing operator was substantially improved by employing the low-rank thin plate spline basis. Besides, Marra and Radice [2] presented an accessible overview of generalized additive models based on penalized regression splines and explored their potential applications in medical research. Generally, the penalized regression spline model refers to the estimation of univariate smooth functions based on noisy data. However, some researchers have also put forward its extension into higher dimensions. Typical models include the thin plate regression splines [17], which are low-rank smoothers obtained by truncating the basis of standard thin plate splines, thereby requiring less computation to fit large data sets, and the penalized bivariate splines [18], with application on triangulations for irregular regions.

The regression curve for the penalized spline is established by minimizing the sum of squared residuals subject to a bound on the size of the spline coefficients. Accordingly, a closed-form formula can be derived from the penalized minimization problem. Apparently, such penalized least squares regression splines cannot resist the disturbance from outliers. A simple and direct idea is to replace the squared residuals by some slower-increasing loss function, similarly as employed in the M-regression estimator [19, 20], for alleviating the outlier effects from atypical observations. Early research regarding the M-type smoothing technique started from Huber et al.'s [21] and Cox's [22] work on cubic regression splines. Another robust model is the S-estimation [23], which possesses a high breakdown point and minimizes the residual scale in an extremely robust manner. A fast algorithm was created by Salibian-Barrera and Yohai [24] for computing the S-regression estimates. These two estimators have been elaborately compared in linear regression by Çetin and Toka [25], with the results indicating that the S-estimator is more capable of resisting outlier disturbance than the M-estimator, but with lower efficiency.

Until about a decade ago, however, little work had been delivered taking penalized spline regression into the category of robust estimation. Oh et al. [26] suggested a robust smoothing approach, implemented for the period analysis of variable stars, by simply imposing Huber's favorite loss function upon the generalized cross-validation criterion. And Lee and Oh [27] proposed an iterative algorithm for calculating the M-type estimator of penalized regression splines via introducing empirical pseudodata. Afterwards, Finger [28] confirmed that such an M-type estimator compromises between the very robust estimators and the very efficient least squares type estimators for penalized spline regressions, when investigating the performance of different estimation techniques in crop insurance applications. Meanwhile, by replacing the least squares estimation with a suitable S-estimator, Tharmaratnam et al. [29] put forward an S-estimation for penalized regression splines as well, which proves to be equivalent to a weighted penalized least squares regression and behaves well even for heavily contaminated observations.

Against such a background, this paper performs a thorough comparative analysis mainly based on these two popular robust smoothing techniques, the M-type estimator and the S-estimation for penalized regression splines, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the corresponding algorithms reorganized under a unified framework. The rest of the paper is organized as follows. Section 2 reelaborates the M-type estimator and the S-estimation for penalized regression splines, starting from the least squares estimator, and reorganizes the corresponding algorithms within a unified framework. Section 3 describes the comparative experiments upon both simulated synthetic data and one real weather balloon data set. Concluding remarks are included in Section 4.

2 Robust Penalized Regression Splines

The following discussion is devoted to the reelaboration of the penalized regression splines starting from their origin, with their derivation processes reformulated and the corresponding practical algorithms reorganized under a unified framework.

The idea of the penalized regression spline begins with the following model:

y_i ∼ N(m(x_i), σ²_ε), i = 1, ..., n,    (1)

where m(x) is supposed to be a smooth but unknown regression function, which needs to be estimated based on the sample observations (x_i, y_i), i = 1, ..., n, and σ²_ε denotes the constant variance of the random deviation error between the response variable and the regression function m(x).

The penalized spline smoothing concept is flexible, since different basis functions may correspond to different penalized spline regression functions. A common selection is the truncated polynomial basis, but other choices can also be explored similarly:

m(x; β) = β₀ + β₁x + ⋯ + β_p x^p + Σ_{j=1}^{K} β_{p+j} (x − κ_j)₊^p,    (2)

where (x − κ_j)₊ = max(x − κ_j, 0), β = (β₀, β₁, ..., β_{p+K})ᵀ denotes the vector of regression coefficients, κ₁, ..., κ_K are the specified inner knots, and p indicates the exponential order of the truncated power basis.

2.1. Penalized Least Squares Regression Splines. Given a group of sample observations (x₁, y₁), ..., (x_n, y_n), an increasingly popular way of obtaining the estimate of m(·) is via transforming it into a least squares problem, where we need to seek out the member of the class m(x; β) which minimizes the sum of squared residuals. To prevent overfitting, a constraint on the spline coefficients is usually imposed, that is,

min_{β ∈ R^{p+K+1}} Σ_{i=1}^{n} (y_i − m(x_i; β))², subject to Σ_{j=1}^{K} β²_{p+j} ≤ C,    (3)


for some constant C > 0, as discussed in Ruppert et al. [14]. Thus, by employing a Lagrange multiplier, the coefficients of the penalized least squares regression spline β̂_LS are equivalent to the minimizer of

Σ_{i=1}^{n} (y_i − m(x_i; β))² + λ Σ_{j=1}^{K} β²_{p+j}    (4)

with respect to β, for some smoothing (penalty) parameter λ ≥ 0, which balances the bias and variance tradeoff of the fitted regression spline curve.

Introduce F(x) = (1, x, ..., x^p, (x − κ₁)₊^p, ..., (x − κ_K)₊^p)ᵀ and the spline design matrix

X = (F(x₁), ..., F(x_n))ᵀ =
[ 1  x₁  ⋯  x₁^p   (x₁ − κ₁)₊^p   ⋯  (x₁ − κ_K)₊^p ]
[ ⋮  ⋮       ⋮          ⋮                 ⋮          ]
[ 1  x_n ⋯  x_n^p  (x_n − κ₁)₊^p  ⋯  (x_n − κ_K)₊^p ],    (5)

together with the vector of responses y = (y₁, ..., y_n)ᵀ and the diagonal matrix D = diag(0_{(p+1)×1}, 1_{K×1}), which indicates that only the spline coefficients are penalized. Thus, the closed-form minimizer of the objective function (4) arrives by direct calculation, namely,

β̂_LS = (XᵀX + λD)⁻¹ Xᵀ y.    (6)

Consequently, the corresponding estimation vector m̂ = (m̂(x₁), ..., m̂(x_n))ᵀ generated by the penalized least squares regression spline can be represented as

m̂_LS = X β̂_LS = X (XᵀX + λD)⁻¹ Xᵀ y.    (7)
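Equations (5)–(7) translate directly into code; a minimal Python/NumPy sketch (function names are ours, for illustration only):

```python
import numpy as np

def design_matrix(x, knots, p=3):
    # Truncated-power-basis design matrix X of (5):
    # columns 1, x, ..., x^p, (x - k_1)_+^p, ..., (x - k_K)_+^p.
    x = np.asarray(x, float)[:, None]
    poly = x ** np.arange(p + 1)
    trunc = np.maximum(x - np.asarray(knots), 0.0) ** p
    return np.hstack([poly, trunc])

def penalized_ls(X, y, lam, p=3):
    # Closed form (6): beta = (X'X + lam*D)^{-1} X'y, where
    # D = diag(0_{p+1}, 1_K) penalizes only the spline coefficients.
    K = X.shape[1] - (p + 1)
    D = np.diag(np.r_[np.zeros(p + 1), np.ones(K)])
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

With λ = 0 this reduces to an unpenalized least squares fit, which provides an easy consistency check.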

To determine the penalty parameter λ in (7), the (generalized) cross-validation technique is employed hereafter, which is computed by leaving one observation out of the residual sum of squares so as to avoid overfitting in the regression spline. The cross-validation criterion is given by

CV(λ) = (1/n) Σ_{i=1}^{n} (y_i − m̂₋ᵢ(x_i; λ))²,    (8)

where m̂₋ᵢ(x_i; λ) is the regression spline estimated by leaving out the ith observation point. The optimized penalty parameter λ can be calculated by minimizing CV(λ) over λ ≥ 0. For the linear smoothing matrix H_λ = X(XᵀX + λD)⁻¹Xᵀ, it can be verified [14] that

m̂₋ᵢ(x_i; λ) = Σ_{j≠i} (H^{ij}_λ / (1 − H^{ii}_λ)) y_j.    (9)

Substituting formula (9) for m̂₋ᵢ(x_i; λ) in (8) results in

CV(λ) = (1/n) Σ_{i=1}^{n} ((y_i − m̂(x_i; λ)) / (1 − H^{ii}_λ))².    (10)

The generalized cross-validation criterion is proposed [30] simply by replacing H^{ii}_λ in (10) with the average tr(H_λ)/n, namely,

GCV(λ) = n Σ_{i=1}^{n} ((y_i − m̂(x_i; λ)) / (n − tr(H_λ)))².    (11)
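Criterion (11) can be evaluated directly; the Python/NumPy sketch below forms H_λ explicitly, which is adequate for moderate n (an assumption of this sketch, not a scalable implementation):

```python
import numpy as np

def gcv_score(X, y, D, lam):
    # GCV(lam) of (11): n * sum(r_i^2) / (n - tr(H_lam))^2,
    # with H_lam = X (X'X + lam*D)^{-1} X' and residuals r = y - H_lam y.
    n = len(y)
    H = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
    r = y - H @ y
    return n * float(np.sum(r ** 2)) / (n - np.trace(H)) ** 2
```

The penalty is then chosen by minimizing `gcv_score` over a grid of λ values, typically logarithmically spaced.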

2.2. M-Type Estimator for Penalized Regression Splines. The above estimate m̂_LS of the penalized least squares regression spline may greatly suffer from various robustness issues, especially when the data are contaminated with outliers. To alleviate this effect, a penalized regression estimator can be straightforwardly constructed by replacing the previous squared residual loss function with the following M-type criterion [27]:

Σ_{i=1}^{n} ρ(y_i − m(x_i; β)) + λ Σ_{j=1}^{K} β²_{p+j},    (12)

where ρ is even, nondecreasing on [0, +∞), and satisfies ρ(0) = 0; a common choice is the Huber loss function with cutoff c > 0:

ρ_c(t) = t²,  |t| ≤ c,
ρ_c(t) = 2c|t| − c²,  |t| > c.    (13)

Apparently, the above Huber function is a parabola in the vicinity of zero and increases linearly beyond the level |t| > c. The default choice of the tuning constant c = 1.345 aims at a 95% asymptotic efficiency with respect to the standard normal distribution.
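The Huber loss (13) and its clipping function ψ_c in code (a sketch; note that with the t² convention of (13) the exact derivative is 2ψ_c):

```python
import numpy as np

def rho_huber(t, c=1.345):
    # Huber loss (13): quadratic on [-c, c], linear 2c|t| - c^2 outside;
    # the two branches meet continuously at |t| = c.
    t = np.asarray(t, float)
    return np.where(np.abs(t) <= c, t ** 2, 2.0 * c * np.abs(t) - c ** 2)

def psi_huber(t, c=1.345):
    # psi_c(t) = max(-c, min(c, t)); with the t^2 convention in (13)
    # the exact derivative of rho_c is 2 * psi_c.
    return np.clip(np.asarray(t, float), -c, c)
```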

Denote the derivative of ρ_c by ψ_c(s) = max[−c, min(c, s)]. When a set of fitted residuals r_i = y_i − m(x_i; β), i = 1, ..., n, from the penalized spline regression is given, a robust M-scale estimator [19, 20] can be employed for estimating the standard deviation σ_ε of these residuals as follows:

Σ_{i=1}^{n} ψ_c((y_i − m(x_i; β)) / σ̂_ε) = 0.    (14)

A common choice for a measure of scale with a high finite sample breakdown point is the value ω determined by

P(|r − r₀.₅| < ω) = 1/2,    (15)

where $r_{0.5}$ is the population median. Besides, the standard estimation of $\omega$ usually employs the median absolute deviation (MAD) statistic, defined as

\[
\mathrm{MAD} = \operatorname{Median}\bigl\{|r_1 - M_r|, \ldots, |r_n - M_r|\bigr\}, \tag{16}
\]

where $M_r$ is the usual sample median of the fitted residuals, so that MAD is the sample median of the $n$ values $|r_1 - M_r|, \ldots, |r_n - M_r|$, with a finite-sample breakdown point of approximately 0.5 [20]. Suppose the fitted residuals are randomly sampled from a normal distribution; note that

4 Mathematical Problems in Engineering

Step 0. Input an initial curve estimate $\hat{\mathbf{m}}_p^{(0)}$, a termination tolerance $\epsilon$, and the maximum iteration number Iter_max. Set $k = 0$ and conduct the following loop iterations.
Step 1. Estimate $\hat\sigma_\varepsilon$ by using MADN in (17).
Step 2. Produce the empirical pseudo data according to (19).
Step 3. Calculate the penalized least-squares estimator $\hat{\mathbf{m}}_p^{(k+1)}$ defined in (20) for the pseudo data, where the penalty parameter $\lambda$ is chosen according to the generalized cross-validation (GCV) criterion (11) of the foregoing subsection.
Step 4. If $\|\hat{\mathbf{m}}_p^{(k+1)} - \hat{\mathbf{m}}_p^{(k)}\| < \epsilon\,\|\hat{\mathbf{m}}_p^{(k)}\|$ or $k = $ Iter_max, terminate and output the robust $M$-type penalized spline estimate $\hat{\mathbf{m}}_M = \hat{\mathbf{m}}_p^{(k+1)}$; else set $k = k + 1$ and continue from Step 1.

Algorithm 1: $M$-type iterative algorithm for penalized regression splines.

MAD does not estimate the corresponding standard deviation $\sigma_\varepsilon$ but rather estimates $Z_{0.75}\,\sigma_\varepsilon$ (with $Z_{0.75}$ being the 0.75 quantile of the standard normal distribution). To put MAD in a more familiar context, it typically needs to be rescaled so as to estimate $\sigma_\varepsilon$ when the residuals are sampled from a normal distribution. In particular,

\[
\mathrm{MADN} = \frac{\mathrm{MAD}}{Z_{0.75}} \approx 1.4826\,\mathrm{MAD}. \tag{17}
\]
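The normalized MAD of (16)-(17) is a one-liner worth seeing concretely; the sketch below is Python with a hypothetical function name, and its key property is that a single gross outlier leaves the scale estimate unchanged.

```python
from statistics import median

def madn(residuals):
    """Normalized median absolute deviation, (16)-(17):
    MADN = MAD / Z_0.75 ≈ 1.4826 * MAD, a robust scale estimate that
    is consistent for the standard deviation under normal residuals."""
    m = median(residuals)
    mad = median(abs(r - m) for r in residuals)
    return 1.4826 * mad
```

Replacing the largest observation of `[1, 2, 3, 4, 5]` by 100 does not move the estimate at all, which is exactly the breakdown behavior discussed above.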

By utilizing the aforementioned spline basis functions, an $M$-type penalized spline estimator based on the standardized residuals can be computed as $\hat{\mathbf{m}}_M = X\hat\beta_M$, with its regression coefficients given by

\[
\hat\beta_M = \arg\min_{\beta}\ \sum_{i=1}^{n}\rho_c\!\left(\frac{y_i - m(x_i;\beta)}{\hat\sigma_\varepsilon}\right) + \lambda\,\beta^{T}D\beta. \tag{18}
\]

Due to the nonlinear nature of $\rho_c$ and $\psi_c$, it is nontrivial to seek the minimizer of (18) while satisfying (14). Under such circumstances, an $M$-type iterative algorithm for calculating the penalized regression spline was proposed [27] by introducing the empirical pseudo data

\[
\mathbf{z} = \hat{\mathbf{m}}_p + \frac{\psi_c\bigl(\mathbf{y} - \hat{\mathbf{m}}_p\bigr)}{2}, \tag{19}
\]

where $\hat{\mathbf{m}}_p$ is the penalized least-squares estimator corresponding to the above pseudo data:

\[
\hat{\mathbf{m}}_p = X\bigl(X^{T}X + \lambda D\bigr)^{-1}X^{T}\mathbf{z}. \tag{20}
\]

As proved in Theorem 1 of Lee and Oh's work [27], the penalized least-squares estimator $\hat{\mathbf{m}}_p$ converges to $\hat{\mathbf{m}}_M$ asymptotically, which suggests an $M$-type iterative algorithm consisting of the steps in Algorithm 1.

Regarding the $M$-type estimator for penalized regression splines (see Algorithm 1), the initial curve estimate $\hat{\mathbf{m}}_p^{(0)}$ in Step 0 is suggested [27] to be the nonrobust penalized least-squares regression spline given by (7).
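A compact NumPy sketch of Algorithm 1 follows, under two simplifying assumptions that depart from the text: the penalty parameter $\lambda$ is held fixed instead of being reselected by GCV at each pass, and the residuals are standardized by MADN before clipping, in the spirit of (18). The design matrix `X` and penalty matrix `D` are those of Section 2.1; all names are hypothetical.

```python
import numpy as np

def huber_psi(s, c=1.345):
    """Clipped residuals, as in (14)."""
    return np.clip(s, -c, c)

def m_type_spline(X, D, y, lam, tol=1e-6, max_iter=100):
    """M-type iteration (Algorithm 1): repeatedly refit a penalized LS
    spline (20) to the pseudo data z = m + psi_c(...)/2 of (19)."""
    A = np.linalg.solve(X.T @ X + lam * D, X.T)   # fixed solve operator of (20)
    m = X @ (A @ y)                               # Step 0: nonrobust LS start (7)
    for _ in range(max_iter):
        r = y - m
        scale = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)  # MADN (17)
        z = m + scale * huber_psi(r / scale) / 2.0  # pseudo data, cf. (19)
        m_new = X @ (A @ z)                         # penalized LS refit (20)
        if np.linalg.norm(m_new - m) < tol * max(np.linalg.norm(m), 1e-12):
            return m_new                            # Step 4: converged
        m = m_new
    return m
```

On a straight line with one gross outlier, the iteration stays close to the clean trend while the plain LS fit is dragged toward the outlier.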

2.3. $S$-Estimation for Penalized Regression Splines. The robustness properties of the aforementioned $M$-type estimator for penalized regression splines are not completely satisfactory, since it has a low breakdown point [31], as clarified by Maronna and Yohai [32] for general $M$-estimators when both the regression and the scale are considered simultaneously.

Another, more robust and relatively recent, approach for smoothing noisy data is $S$-estimation for penalized regression splines [29], whose core idea is to replace the traditional least-squares estimator with a flexible $S$-estimator. The coefficient vector $\hat\beta_S$ of the penalized $S$-regression spline is defined as

\[
\hat\beta_S = \arg\min_{\beta}\ n\,\hat\sigma_\varepsilon^{2}(\beta) + \lambda\,\beta^{T}D\beta, \tag{21}
\]

in which, for each $\beta$, $\hat\sigma_\varepsilon^{2}(\beta)$ satisfies

\[
\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(\frac{y_i - F(x_i)^{T}\beta}{\hat\sigma_\varepsilon(\beta)}\right) = E_\Phi\bigl(\rho(Z)\bigr), \tag{22}
\]

where $\Phi$ is the cumulative distribution function of the standard normal distribution $Z \sim N(0, 1)$, and the loss function is taken from Tukey's bisquare family [33]:

\[
\rho_d(u) =
\begin{cases}
3\bigl(\tfrac{u}{d}\bigr)^{2} - 3\bigl(\tfrac{u}{d}\bigr)^{4} + \bigl(\tfrac{u}{d}\bigr)^{6}, & |u| \le d,\\[4pt]
1, & |u| > d.
\end{cases} \tag{23}
\]
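Unlike the Huber loss, Tukey's bisquare (23) is bounded, so gross outliers contribute at most a constant. The Python sketch below uses $d = 1.547$, a common choice associated with a 50% breakdown point; that value is an assumption for illustration, since the text does not fix $d$, and the function name is hypothetical.

```python
def tukey_rho(u, d=1.547):
    """Tukey bisquare loss of (23): smooth polynomial inside [-d, d],
    constant 1 outside, so any single outlier's influence is bounded.
    d = 1.547 is an assumed common default, not taken from the text."""
    if abs(u) > d:
        return 1.0
    v = (u / d) ** 2
    return 3 * v - 3 * v ** 2 + v ** 3
```

Note the two branches meet continuously at $|u| = d$, where the polynomial equals exactly 1.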

It is known from the theoretical assumptions that the fitted residual vector should follow a normal distribution with mean zero, that is, $r_i \sim N(0, \sigma_\varepsilon^{2})$, $i = 1, \ldots, n$; however, its median statistic $\operatorname{Median}(\mathbf{r})$, with $\mathbf{r} = (r_1, \ldots, r_n)^{T}$, may depart from zero to a large extent in general practice. Considering this, we ameliorate the estimation of $\hat\sigma_\varepsilon(\beta)$ by using MADN in (17) with the addition of the absolute value of the residual median, namely

\[
\hat\sigma_\varepsilon(\beta) = \mathrm{MADN} + \bigl|\operatorname{Median}\bigl(\mathbf{r}(\beta)\bigr)\bigr|. \tag{24}
\]

The penalized $S$-estimator for the regression spline model is computed as $\hat{\mathbf{m}}_S = X\hat\beta_S$, with its coefficient vector $\hat\beta_S$ satisfying both (21) and (22) simultaneously. The following derivations indicate that this penalized $S$-estimator is actually equivalent to a weighted penalized least-squares regression.

Before proceeding with the derivation, note that the symbol $\hat\beta_S$ is simply written as $\beta$ for clarity. Firstly, when (21) reaches its minimum, the following equation should be satisfied:

\[
n\,\hat\sigma_\varepsilon(\beta)\,\nabla\hat\sigma_\varepsilon(\beta) + \lambda D\beta = 0. \tag{25}
\]

On the other hand, taking the derivative of the $M$-scale function in (22) with respect to $\beta$ leads to

\[
\sum_{i=1}^{n}\rho'\!\left(\frac{r_i(\beta)}{\hat\sigma_\varepsilon(\beta)}\right)
\frac{-F(x_i)\,\hat\sigma_\varepsilon(\beta) - r_i(\beta)\,\nabla\hat\sigma_\varepsilon(\beta)}{\hat\sigma_\varepsilon^{2}(\beta)} = 0, \tag{26}
\]

in which $r_i(\beta) = y_i - F(x_i)^{T}\beta$, $\tilde r_i(\beta) = r_i(\beta)/\hat\sigma_\varepsilon(\beta)$, and $\Lambda(\beta) = \operatorname{diag}\bigl(\rho'(\tilde r_i(\beta))/\tilde r_i(\beta)\bigr)$, $i = 1, \ldots, n$. Rearranging the above formula produces

\[
\nabla\hat\sigma_\varepsilon(\beta)
= -\frac{\sum_{i=1}^{n}\rho'\bigl(r_i(\beta)/\hat\sigma_\varepsilon(\beta)\bigr)\,F(x_i)}
        {\sum_{i=1}^{n}\rho'\bigl(r_i(\beta)/\hat\sigma_\varepsilon(\beta)\bigr)\,\bigl(r_i(\beta)/\hat\sigma_\varepsilon(\beta)\bigr)}
= -\frac{\hat\sigma_\varepsilon(\beta)\,X^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}{\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}. \tag{27}
\]

Substituting $\nabla\hat\sigma_\varepsilon(\beta)$ of (27) into (25) gives

\[
-\frac{n\,\hat\sigma_\varepsilon^{2}(\beta)}{\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}\,X^{T}\Lambda(\beta)\,\mathbf{r}(\beta) + \lambda D\beta = 0. \tag{28}
\]

Recalling that $\mathbf{r}(\beta) = \mathbf{y} - X\beta$ and introducing the new symbol $\tau(\beta) = n\,\hat\sigma_\varepsilon^{2}(\beta)/\bigl(\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)\bigr)$, we have

\[
-\tau(\beta)\,X^{T}\Lambda(\beta)\bigl(\mathbf{y} - X\beta\bigr) + \lambda D\beta = 0. \tag{29}
\]

Thus a further simple rearrangement arrives at the coefficient vector $\hat\beta_S$ in an iterative form,

\[
\beta = \left(X^{T}\Lambda(\beta)X + \frac{\lambda}{\tau(\beta)}D\right)^{-1}X^{T}\Lambda(\beta)\,\mathbf{y}, \tag{30}
\]

which forms the basis for constructing an iterative algorithm for the penalized $S$-regression spline.
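One fixed-point step of (30) can be sketched in NumPy as below, assuming Tukey's bisquare with an assumed default $d = 1.547$ (so $\rho'(r)/r = (6/d^2)(1-(r/d)^2)^2$ inside $[-d, d]$ and 0 outside), the scale (24), and a fixed $\lambda$ rather than one chosen by the RGCV criterion; function names are hypothetical.

```python
import numpy as np

def tukey_weight(r):
    """Diagonal of Lambda(beta) in (30): rho'_d(r)/r for Tukey's bisquare,
    equal to (6/d^2)(1 - (r/d)^2)^2 inside [-d, d] and 0 outside.
    d = 1.547 is an assumed default, not fixed by the text."""
    d = 1.547
    w = (6.0 / d ** 2) * (1.0 - (r / d) ** 2) ** 2
    return np.where(np.abs(r) <= d, w, 0.0)

def s_update(X, D, y, beta, lam):
    """One fixed-point step of (30): a weighted penalized LS solve with
    weights Lambda(beta) and penalty rescaled by 1/tau(beta)."""
    r = y - X @ beta
    med = np.median(r)
    scale = 1.4826 * np.median(np.abs(r - med)) + abs(med)      # scale (24)
    rt = r / max(scale, 1e-12)                                  # standardized residuals
    w = tukey_weight(rt)                                        # Lambda diagonal
    tau = X.shape[0] * scale ** 2 / max(r @ (w * r), 1e-12)     # tau(beta) of (29)
    XtW = X.T * w                                               # X^T Lambda(beta)
    return np.linalg.solve(XtW @ X + (lam / tau) * D, XtW @ y)
```

Iterating `s_update` from a least-squares start drives the outlier weights to zero, so the fit settles near the uncontaminated trend.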

The above derivations verify that $\hat\beta_S$ is actually the solution to the following weighted penalized least-squares regression problem:

\[
\min_{\beta}\ \bigl\|\Lambda(\beta)^{1/2}(\mathbf{y} - X\beta)\bigr\|^{2} + \frac{\lambda}{\tau(\beta)}\,\beta^{T}D\beta. \tag{31}
\]

This representation indicates that the GCV criterion in (11) can be employed straightforwardly with response variable $\tilde{\mathbf{y}} = \Lambda(\beta)^{1/2}\mathbf{y}$, weighted design matrix $\tilde X = \Lambda(\beta)^{1/2}X$, and penalty term $\lambda/\tau(\beta)$. Meanwhile, because some weight elements may be zero, the penalty parameter $\lambda$ is instead calculated based on the following regularized GCV criterion:

\[
\mathrm{RGCV}(\lambda) = \frac{n_w\,\bigl\|\Lambda(\beta)^{1/2}(\mathbf{y} - X\beta)\bigr\|^{2}}{\bigl(n_w - \operatorname{tr}(H_\lambda)\bigr)^{2}}, \tag{32}
\]

where
\[
H_\lambda = \tilde X\bigl(\tilde X^{T}\tilde X + (\lambda/\tau(\beta))D\bigr)^{-1}\tilde X^{T}
= \Lambda(\beta)^{1/2}X\bigl(X^{T}\Lambda(\beta)X + (\lambda/\tau(\beta))D\bigr)^{-1}X^{T}\Lambda(\beta)^{1/2},
\]
and $n_w$ denotes the number of nonzero elements in the weight matrix $\Lambda(\beta)$.

Consequently, based on the foregoing discussion, the $S$-estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that since the employed loss function $\rho_d$ is bounded, giving rise to the nonconvexity of $\hat\sigma_\varepsilon(\beta)$, the objective function in (21) may have several critical points, each corresponding to a local minimum. Therefore, the critical point to which the iterations of the above algorithm converge depends entirely on the prescribed initial regression coefficient vector $\hat\beta^{(0)}$ in Step 0 of Algorithm 2. One possible solution to cope with this issue, proposed by

Tharmaratnam et al. [29], is based on random sampling theory and can be briefly described as follows: firstly, create $J$ random subsamples of a proper size, for example one fifth of the sample number, namely $n_s = \max(p + K + 1, \lfloor n/5 \rfloor)$, from the original sample observations. Each subsample brings about an initial candidate for the coefficient vector, assigned for instance by least-squares fitting of the penalized regression spline, or by the $M$-type estimator at a small additional computational cost, with the initial regression coefficient vector of the $j$th subsample denoted by $\hat\beta_j^{(0)}$. Starting from it, the final convergent vector produced through Algorithm 2 is written as $\hat\beta_j^{\star}$, $j = 1, \ldots, J$. Eventually, the global optimum is taken among all these $J$ convergent vectors as the one corresponding to the minimum objective function in (21), namely

\[
\hat\beta_S = \arg\min_{1 \le j \le J}\ n\,\hat\sigma_\varepsilon^{2}\bigl(\hat\beta_j^{\star}\bigr) + \lambda\,\hat\beta_j^{\star T}D\hat\beta_j^{\star}. \tag{33}
\]

Therefore, the robust $S$-estimator for penalized regression splines can be computed as $\hat{\mathbf{m}}_S = X\hat\beta_S$.
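The subsample-then-select device of (33) separates cleanly into two small helpers: drawing the $J$ index subsamples of size $n_s = \max(p + K + 1, \lfloor n/5 \rfloor)$, and picking the best of the $J$ refined candidates. The Python sketch below assumes each candidate has already been run through Algorithm 2; the names and the `objective` callback (which should evaluate the penalized criterion of (21)) are hypothetical.

```python
import random

def draw_subsample_indices(n, p, K, J, seed=0):
    """J random subsamples of size n_s = max(p + K + 1, n // 5),
    as in the random-restart scheme of [29]."""
    rng = random.Random(seed)
    n_s = max(p + K + 1, n // 5)
    return [rng.sample(range(n), n_s) for _ in range(J)]

def best_of_candidates(candidates, objective):
    """Selection rule (33): among the J convergent coefficient vectors,
    keep the one minimizing the penalized S objective of (21)."""
    return min(candidates, key=objective)
```

With the paper's configuration ($n = 200$, $p = 3$, $K = 35$), each subsample has $n_s = \max(39, 40) = 40$ observations.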

3. Experimental Performance Evaluation

This section is devoted to comparative experiments, upon both simulated synthetic data and one real weather balloon data set, for the penalized spline smoothing methods explored above, including the nonrobust penalized least-squares regression spline (LS) given by (7) and the two robust smoothing approaches, the $M$-type estimator [27] and $S$-estimation [29], as described in Algorithms 1 and 2.


Step 0. Input an initial regression coefficient vector $\hat\beta^{(0)}$, the termination tolerance $\epsilon$, and the maximum iteration number Iter_max. Set $k = 0$ and enter the following loop iterations.
Step 1. Estimate $\hat\sigma_\varepsilon$ based on (24), that is, $\hat\sigma_\varepsilon(\hat\beta^{(k)}) = \mathrm{MADN} + |\operatorname{Median}(\mathbf{r}(\hat\beta^{(k)}))|$.
Step 2. Compute the intermediate variables $\tau(\hat\beta^{(k)})$ and $\Lambda(\hat\beta^{(k)})$.
Step 3. Calculate the coefficient vector $\hat\beta^{(k+1)}$ according to (30) as
\[
\hat\beta^{(k+1)} = \left(X^{T}\Lambda(\hat\beta^{(k)})X + \frac{\lambda}{\tau(\hat\beta^{(k)})}D\right)^{-1}X^{T}\Lambda(\hat\beta^{(k)})\,\mathbf{y},
\]
where the penalty parameter $\lambda$ is determined by the regularized GCV criterion (32).
Step 4. If $\|\hat\beta^{(k+1)} - \hat\beta^{(k)}\| < \epsilon\,\|\hat\beta^{(k)}\|$ or $k = $ Iter_max, terminate and output the convergent coefficient vector of the penalized $S$-regression spline, $\hat\beta_S = \hat\beta^{(k+1)}$; else accumulate the counter by setting $k = k + 1$ and repeat from Step 1.

Algorithm 2: $S$-estimator iterative algorithm for penalized regression splines.

3.1. Configuration and Criteria. It should be mentioned that the exponential order $p = 3$ of the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14]: $\kappa_k$ is the $\bigl((k+1)/(K+2)\bigr)$th sample quantile of Unique($\mathbf{x}$), $k = 1, \ldots, K$, together with the simple choice of knot number $K = \min\bigl(35, \tfrac{1}{4} \times \text{number of Unique}(\mathbf{x})\bigr)$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)^{T}$. Besides, the penalty parameter $\lambda$ is determined according to the generalized cross-validation (GCV) criterion for the LS fitting and $M$-estimation methods, and according to the regularized GCV criterion for the $S$-estimation method. Meanwhile, the termination tolerance $\epsilon = 10^{-6}$ and a maximum iteration number of 100 are assigned as initial input parameters in Algorithms 1 and 2, and $J = 5$ subsamples are randomly generated for exploring the ultimate optimum.
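The knot-placement rule above can be sketched in a few lines of Python; the nearest-rank quantile used here is an assumption, since the text does not specify a quantile definition, and the function name is hypothetical.

```python
def default_knots(x, p=3):
    """Default knot choice of Ruppert et al. [14]:
    K = min(35, len(unique)/4) knots placed at the ((k+1)/(K+2))-th
    sample quantiles of the unique x values, k = 1..K."""
    u = sorted(set(x))
    K = int(min(35, len(u) / 4))
    knots = []
    for k in range(1, K + 1):
        q = (k + 1) / (K + 2)
        idx = min(int(q * (len(u) - 1)), len(u) - 1)  # nearest-rank quantile (assumed)
        knots.append(u[idx])
    return K, knots
```

For a design of 100 distinct points this yields $K = 25$ interior knots, in increasing order.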

Meanwhile, the hardware and software environment for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) running MATLAB R2014a on Windows 7 SP1 Ultimate (64-bit).

3.2. Synthetic Data. The simulation study is designed the same as in Lee and Oh [27], with the testing function

\[
y_i = \sin\bigl(2\pi(1 - x_i)^{2}\bigr) + 0.5\,\varepsilon_i, \quad i = 1, \ldots, n. \tag{34}
\]

The designed samples are generated with their horizontal axes uniformly drawn from the open interval $(0, 1)$, and the error distribution $\varepsilon$ employs nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution $U(-1, 1)$; (2) standard normal distribution $N(0, 1)$; (3) logistic distribution with pdf $e^{x}/(1 + e^{x})^{2}$; (4) double exponential distribution with pdf $\frac{1}{2}e^{-|x|}$; (5) contaminated normal distribution I, $0.95N(0,1) + 0.05N(0,10)$, that is, the error constituted with 95% from $N(0,1)$ and 5% from $N(0,10)$; (6) contaminated normal distribution II, $0.9N(0,1) + 0.1N(0,10)$; (7) slash distribution, defined from two statistically independent random variables following (1) and (2), that is, $N(0,1)/U(0,1)$; (8) Cauchy distribution with pdf $1/\bigl(\pi(1 + x^{2})\bigr)$, whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution $0.9N(0,1) + 0.1N(30,1)$. For the sake of reducing simulation variability, the sample size is fixed at $n = 200$. It should be mentioned that the error scenarios (1)-(8) above are symmetrically distributed and listed in ascending order of tail heaviness, while scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from $N(30,1)$.
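A few of the heavier-tailed scenarios above are easy to sample directly; the Python sketch below illustrates the slash ratio, the inverse-transform method for the Cauchy (the inverse CDF of $1/(\pi(1+x^2))$ is $\tan(\pi(u - 1/2))$), and contaminated normal I. The function name and scenario labels are hypothetical.

```python
import math
import random

def draw_error(scenario, rng):
    """Sample one error term for three of the nine scenarios:
    slash = N(0,1) divided by a uniform on (0,1]; Cauchy via the
    inverse transformation of a uniform; contaminated normal I
    mixing N(0,1) (95%) with N(0,10) (5%)."""
    if scenario == "slash":
        return rng.gauss(0, 1) / (1.0 - rng.random())   # denominator in (0, 1]
    if scenario == "cauchy":
        return math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF method [34]
    if scenario == "cn1":
        return rng.gauss(0, 1) if rng.random() < 0.95 else rng.gauss(0, 10)
    raise ValueError(scenario)
```

Even for the contaminated scenarios, the sample median stays near zero while the extreme draws can be arbitrarily large, which is precisely what stresses the nonrobust LS fit.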

Figure 1 presents the spline regression curves from the penalized LS fitting, the $M$-estimator, and the $S$-estimator for one simulated scattered sample in each of the nine scenarios, with each subfigure panel demonstrating the comparative regression results of the three penalized spline smoothing methods.

For a further quantitative evaluation of these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized to quantify the goodness of fit between the regression spline $\hat m_j(x)$ of the $j$th simulation replicate from each of the above outlier scenarios and the true simulated function $m(x) = \sin(2\pi(1 - x)^{2})$:

\[
\mathrm{ASE}_j = \frac{1}{n}\sum_{i=1}^{n}\bigl(m(x_i) - \hat m_j(x_i)\bigr)^{2}, \quad j = 1, \ldots, J, \tag{35}
\]

with the replicate number specified as $J = 10^{3}$ for each distribution.
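The ASE of (35) against the true testing function of (34) is a two-line computation; the Python sketch below uses hypothetical function names.

```python
import math

def m_true(x):
    """True testing function of (34): m(x) = sin(2*pi*(1 - x)^2)."""
    return math.sin(2 * math.pi * (1 - x) ** 2)

def ase(fit, xs):
    """Average squared error (35) of one fitted curve over the design points."""
    return sum((m_true(x) - f) ** 2 for x, f in zip(xs, fit)) / len(xs)
```

A perfect replicate scores 0, and the per-scenario box plots of Figure 2 are simply the distributions of this quantity over the $J = 10^3$ replicates.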

Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of fitting


[Figure 1: nine panels (a)-(i), one per error scenario: Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric; each panel plots the data with outliers, the true curve, the LS fitting, the M-estimator, and the S-estimator over the interval (0, 1).]

Figure 1: Spline regression curves (with the true testing function as a black solid line) from the penalized LS fitting (blue dotted), the $M$-estimator (green dot-dashed), and the $S$-estimator (red dashed) for the simulated samples (black circle-dots) in the 9 possible scenarios.

accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate the box plots of the ASEs and of the code running times using (a) penalized LS-estimation, (b) penalized $M$-estimation, and (c) penalized $S$-estimation, as well as the median-value comparisons of these two indicators, for the simulated samples from each of the 9 distribution scenarios.

From the ASE comparisons in Figure 2, it can be found that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with more complex conditions, such as the three last error distributions possessing heavy tails, thereby bringing about large ASEs, which are specially rescaled to

[Figure 2: four panels, (a) ASE for LS fitting, (b) ASE for the M-estimator, (c) ASE for the S-estimator, and (d) median ASE comparison, each over the 9 scenarios: Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric.]

Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized $M$-estimation, and (c) penalized $S$-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios, with replicate number $J = 10^{3}$.

the additional vertical coordinate on the right so as to be accommodated with the ASE results for the other error sources in the same window (refer to panels (a) and (d) of Figure 2). In comparison, the robust penalized $M$-type estimation presents satisfactory regression fittings for the distributions with moderate tail weights, like scenarios (2)-(5), which are located in the middle of each panel. Meanwhile, the robust $S$-estimator provides comparatively meritorious smoothing results, especially under complicated circumstances like the slash, Cauchy, or asymmetric distributions with heavy tails.

In terms of robustness, the LS-estimator simply does not deserve mention in this respect. Instead, the $M$-estimator demonstrates stable performance for the designed samples with moderate perturbation error, while the $S$-estimator behaves dominantly stably for all kinds of error-source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming is the $S$-estimator, followed by the $M$-estimator, with the least time-consuming being the LS-estimator, since the latter serves as the initial curve estimate for the two former approaches. Moreover, although both the $S$-estimator and the $M$-estimator consist of similar iterative processes, random sampling is further involved in the former for exploring the ultimate solution of a nonconvex optimization problem.

3.3. Balloon Data. This section is devoted to comparative experiments with the aforementioned penalized spline smoothing methods upon a set of weather balloon data [35]. The data consist of 4984 observations collected by a balloon at about 30 kilometers altitude, with outliers caused by the fact that the balloon slowly rotates, making the ropes from which the measuring instrument is suspended cut off the direct radiation from the sun.

As illustrated above, Figure 4 displays these balloon observations together with the regression fitting curves of the penalized LS method, the robust penalized $M$-estimation method, and the robust $S$-estimation method. It can be explicitly discovered by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, and this phenomenon appears even more significant around the horizontal-axis position of 0.8, while the two robust splines pass approximately through the majority of the observations. Meanwhile,

[Figure 3: four panels, (a) run time for LS fitting, (b) run time for the M-estimator, (c) run time for the S-estimator, and (d) median run time comparison, each over the 9 scenarios: Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric.]

Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized $M$-estimation, and (c) penalized $S$-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios, with replicate number $J = 10^{3}$.

[Figure 4: balloon data with the fitted curves of the LS fitting, the M-estimator, and the S-estimator.]

Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), the $M$-estimator (green dot-dashed), and the $S$-estimator (red solid line).

the $S$-estimator seems to behave slightly superior to the $M$-estimator, since the former accords better with the data trend, especially at the ridge and the two terminal nodes.

At the same time, under the platform specified in Section 3.1, the algorithm execution times on this weather balloon data set with 4984 observations are 23 s, 351 s, and 589 s for the penalized LS regression, the penalized $M$-type estimator, and the penalized $S$-estimator, respectively.

4. Conclusions

This paper conducts a comprehensive comparative analysis of two currently popular robust penalized spline smoothing methods, the penalized $M$-type estimator and penalized $S$-estimation, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the ultimate algorithms reorganized under a unified framework.

The experiments consist of, firstly, simulated synthetic data with outliers in 9 possible scenarios and, then, a practical weather balloon data set for the aforementioned penalized regression splines, whose performances are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

The conclusion from these experiments is that the robust penalized spline smoothing methods can resist the influence of outliers, compared with the nonrobust penalized LS spline fitting method. Further, the $M$-estimator exerts stable performance only for observed samples with moderate perturbation error, while the $S$-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, though necessarily at more execution time. Therefore, taking both the solution accuracy and the implementation efficiency of the fitted regression spline into account, the $M$-type estimator is generally preferred, whereas if only the fitting credibility is of concern, the $S$-estimator naturally deserves to be the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3-21, 1989.
[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107-125, 2010.
[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205-223, 2000.
[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502-518, 1986.
[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071-1085, 1990.
[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255-270, 1997.
[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661-675, 1973.
[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255-265, 1973.
[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271-293, 1998.
[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89-102, 1996.
[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936-940, 1999.
[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205-223, 2000.
[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735-757, 2002.
[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.
[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.
[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.
[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95-114, 2003.
[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263-281, 2014.
[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73-101, 1964.
[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.
[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley Online Library, 1981.
[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530-551, 1983.
[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256-272, Springer, New York, NY, USA, 1984.
[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414-427, 2006.
[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747-752, 2011.
[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15-30, 2004.
[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159-171, 2007.
[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217-230, 2013.
[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609-625, 2010.
[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377-403, 1979.
[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157-184, 1983.
[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699-703, 1991.
[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147-192, 1974.
[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.
[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782-801, 1993.


Page 3: Research Article Comparative Analysis for Robust Penalized ... · CV ( ) = 1 =1 ... Another more robust and relatively latest approach for smoothing the noisy data is the -estimation

Mathematical Problems in Engineering

for some constant C > 0, as discussed in Ruppert et al. [14]. Thus, by employing a Lagrange multiplier, the coefficients of the penalized least squares (LS) regression spline are equivalent to the minimizer of

$$\sum_{i=1}^{n} \left( y_i - m(x_i;\beta) \right)^2 + \lambda \sum_{j=1}^{K} \beta_{p+j}^2 \qquad (4)$$

for some smoothing (penalty) parameter λ ≥ 0, with respect to β, which balances the bias–variance tradeoff of the fitted regression spline curve.

Introduce the truncated power basis vector F(x) = (1, x, …, x^p, (x − κ_1)_+^p, …, (x − κ_K)_+^p)^T, the spline design matrix

$$\mathbf{X} = \left( F(x_1)^T, \ldots, F(x_n)^T \right)^T = \begin{pmatrix} 1 & x_1 & \cdots & x_1^p & (x_1-\kappa_1)_+^p & \cdots & (x_1-\kappa_K)_+^p \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^p & (x_n-\kappa_1)_+^p & \cdots & (x_n-\kappa_K)_+^p \end{pmatrix}, \qquad (5)$$

the vector of responses y = (y_1, …, y_n)^T, and the diagonal matrix D = diag(0_{(p+1)×1}, 1_{K×1}), which indicates that only the spline coefficients are penalized. The closed-form minimizer of the objective function (4) then follows by direct calculation, namely,

$$\hat{\beta}_{\mathrm{LS}} = \left( \mathbf{X}^T \mathbf{X} + \lambda D \right)^{-1} \mathbf{X}^T \mathbf{y}. \qquad (6)$$

Consequently, the corresponding estimation vector m̂ = (m̂(x_1), …, m̂(x_n))^T generated by the penalized least squares regression spline can be represented as

$$\hat{\mathbf{m}}_{\mathrm{LS}} = \mathbf{X} \hat{\beta}_{\mathrm{LS}} = \mathbf{X} \left( \mathbf{X}^T \mathbf{X} + \lambda D \right)^{-1} \mathbf{X}^T \mathbf{y}. \qquad (7)$$
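The closed forms (6) and (7) are direct to implement. Below is a minimal NumPy sketch; the paper's experiments run in MATLAB, so the Python function names and the fixed λ here are illustrative assumptions (λ is normally selected by cross-validation, as discussed next in the text):

```python
import numpy as np

def truncated_power_basis(x, knots, p=3):
    """Design matrix (5): columns 1, x, ..., x^p, (x-k_1)_+^p, ..., (x-k_K)_+^p."""
    poly = np.vander(x, p + 1, increasing=True)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** p
    return np.hstack([poly, trunc])

def penalized_ls_spline(x, y, knots, p=3, lam=1.0):
    """Closed-form minimizer (6) and fitted curve (7)."""
    X = truncated_power_basis(x, knots, p)
    D = np.diag(np.r_[np.zeros(p + 1), np.ones(len(knots))])  # penalize spline coefficients only
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return beta, X @ beta
```

Because the polynomial part is unpenalized (the leading zeros in D), a polynomial trend of degree at most p is reproduced exactly for any λ.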

To determine the penalty parameter λ in (7), the (generalized) cross-validation technique is employed hereafter, which leaves one observation out of the residual sum of squares so as to avoid overfitting the regression spline. The cross-validation criterion is given by

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{m}_{-i}(x_i;\lambda) \right)^2, \qquad (8)$$

where m̂_{−i}(x_i; λ) is the regression spline estimated by leaving out the i-th observation. The optimized penalty parameter λ is obtained by minimizing CV(λ) over λ ≥ 0. For the linear smoothing matrix H_λ = X(X^T X + λD)^{−1} X^T, it can be verified [14] that

$$\hat{m}_{-i}(x_i;\lambda) = \sum_{j \neq i} \frac{H_\lambda^{ij}}{1 - H_\lambda^{ii}}\, y_j. \qquad (9)$$

Substituting formula (9) for m̂_{−i}(x_i; λ) in (8) results in

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{m}(x_i;\lambda)}{1 - H_\lambda^{ii}} \right)^2. \qquad (10)$$

The generalized cross-validation criterion is obtained [30] simply by replacing H_λ^{ii} in (10) with the average tr(H_λ)/n, namely,

$$\mathrm{GCV}(\lambda) = n \sum_{i=1}^{n} \left( \frac{y_i - \hat{m}(x_i;\lambda)}{n - \mathrm{tr}(H_\lambda)} \right)^2. \qquad (11)$$
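The GCV selection (11) amounts to a one-dimensional search. A grid-search sketch follows, assuming the design matrix X and penalty matrix D of (5)–(6) are already built; the function name and grid are illustrative:

```python
import numpy as np

def gcv_select(X, y, D, lambdas):
    """Pick lambda minimizing GCV (11): n * ||y - H y||^2 / (n - tr(H))^2."""
    n = len(y)
    best = None
    for lam in lambdas:
        # Smoother matrix H_lam = X (X'X + lam*D)^-1 X'
        H = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
        resid = y - H @ y
        gcv = n * np.sum(resid ** 2) / (n - np.trace(H)) ** 2
        if best is None or gcv < best[1]:
            best = (lam, gcv)
    return best[0]
```

In practice a logarithmically spaced grid is used, since the effective degrees of freedom tr(H_λ) vary slowly in λ.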

2.2. M-Type Estimator for Penalized Regression Splines. The above estimate m̂_LS of the penalized least squares regression spline may greatly suffer from various robustness issues, especially when the data are contaminated with outliers. To alleviate this effect, a penalized regression estimator can be straightforwardly constructed by replacing the previous squared residual loss function with the following M-type criterion [27]:

$$\sum_{i=1}^{n} \rho\left( y_i - m(x_i;\beta) \right) + \lambda \sum_{j=1}^{K} \beta_{p+j}^2, \qquad (12)$$

where ρ is even, nondecreasing on [0, +∞), and ρ(0) = 0. A common choice is the Huber loss function with cutoff c > 0,

$$\rho_c(t) = \begin{cases} t^2, & |t| \le c, \\ 2c|t| - c^2, & |t| > c. \end{cases} \qquad (13)$$

Apparently, the above Huber function is a parabola in the vicinity of zero and increases linearly beyond the level |t| > c. The default choice of the tuning constant, c = 1.345, aims for a 95% asymptotic efficiency with respect to the standard normal distribution.
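The Huber loss (13) and its associated clipped score ψ_c, used just below in the scale equation (14), can be written compactly; c = 1.345 is the default tuning constant from the text:

```python
import numpy as np

def rho_huber(t, c=1.345):
    """Huber loss (13): quadratic near zero, linear beyond the cutoff c."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= c, t ** 2, 2 * c * np.abs(t) - c ** 2)

def psi_huber(s, c=1.345):
    """Clipped score psi_c(s) = max(-c, min(c, s)); proportional to rho_c'."""
    return np.clip(s, -c, c)
```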

Denote the score function associated with ρ_c (its derivative scaled by 1/2) as ψ_c(s) = max[−c, min(c, s)]. When a set of fitted residuals r_i = y_i − m(x_i; β), i = 1, …, n, from the penalized spline regression is given, a robust M-scale estimator [19, 20] can be employed for estimating the standard deviation σ_ε of these residuals as follows:

$$\sum_{i=1}^{n} \psi_c\left( \frac{y_i - m(x_i;\beta)}{\hat{\sigma}_\varepsilon} \right) = 0. \qquad (14)$$

A common choice for a measure of scale with a high finite-sample breakdown point is the value ω determined by

$$P\left( \left| r - r_{0.5} \right| < \omega \right) = \frac{1}{2}, \qquad (15)$$

where r_{0.5} is the population median. Besides, the standard estimation of ω usually employs the median absolute deviation (MAD) statistic, defined as

$$\mathrm{MAD} = \mathrm{Median}\left\{ |r_1 - M_r|, \ldots, |r_n - M_r| \right\}, \qquad (16)$$

where M_r is the usual sample median of the fitted residuals; MAD is thus the sample median of the n values |r_1 − M_r|, …, |r_n − M_r|, and its finite-sample breakdown point is approximately 0.5 [20]. Suppose the fitted residuals are randomly sampled from a normal distribution; note that MAD does not estimate the corresponding standard deviation σ_ε but rather z_{0.75} σ_ε (with z_{0.75} being the 0.75 quantile of the standard normal distribution). To put MAD in a more familiar context, it typically needs to be rescaled so as to estimate σ_ε when the residuals come from a normal distribution. In particular,

$$\mathrm{MADN} = \frac{\mathrm{MAD}}{z_{0.75}} \approx 1.4826\,\mathrm{MAD}. \qquad (17)$$

Algorithm 1 (M-type iterative algorithm for penalized regression splines).
Step 0. Input an initial curve estimate m̂_p^(0), a termination tolerance ε, and the maximum iteration number Iter_max. Set k = 0 and conduct the following loop iterations.
Step 1. Estimate σ̂_ε by using MADN in (17).
Step 2. Produce the empirical pseudodata according to (19).
Step 3. Calculate the penalized least squares estimator m̂_p^(k+1) defined in (20) for the pseudodata, where the penalty parameter λ is chosen according to the generalized cross-validation (GCV) criterion (11) formulated in the foregoing subsection.
Step 4. If ‖m̂_p^(k+1) − m̂_p^(k)‖ < ε‖m̂_p^(k)‖ or k = Iter_max, terminate and output the robust M-type penalized spline estimate m̂_M = m̂_p^(k+1); else set k = k + 1 and continue from Step 1.
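The MADN rescaling (17) then reads, in a hypothetical helper using the standard factor 1.4826 ≈ 1/z_{0.75}:

```python
import numpy as np

def madn(r):
    """Normalized MAD (17): MAD / z_0.75 ~= 1.4826 * MAD, a robust scale estimate."""
    r = np.asarray(r, dtype=float)
    mad = np.median(np.abs(r - np.median(r)))
    return 1.4826 * mad
```

Unlike the sample standard deviation, this estimate is barely moved by a single gross outlier, which is exactly the property needed in Step 1 of Algorithm 1.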

By utilizing the aforementioned spline basis functions, an M-type penalized spline estimator using the standardized residuals can be computed as m̂_M = X β̂_M, with its regression coefficients given by

$$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho_c\left( \frac{y_i - m(x_i;\beta)}{\hat{\sigma}_\varepsilon} \right) + \lambda \beta^T D \beta. \qquad (18)$$

Due to the nonlinear nature of ρ_c and ψ_c, it is nontrivial to seek the minimizer of (18) while satisfying (14). Under such circumstances, an M-type iterative algorithm for calculating the penalized regression spline is proposed [27] by introducing the empirical pseudodata

$$\mathbf{z} = \hat{\mathbf{m}}_p + \frac{\psi_c\left( \mathbf{y} - \hat{\mathbf{m}}_p \right)}{2}, \qquad (19)$$

where m̂_p is the penalized least squares estimator corresponding to the above pseudodata,

$$\hat{\mathbf{m}}_p = \mathbf{X}\left( \mathbf{X}^T\mathbf{X} + \lambda D \right)^{-1}\mathbf{X}^T \mathbf{z}. \qquad (20)$$

As proved in Theorem 1 of Lee and Oh's [27] work, the penalized least squares estimator m̂_p converges to m̂_M asymptotically, which implies an M-type iterative algorithm consisting of the steps in Algorithm 1.

Regarding the M-type estimator for the penalized regression splines (see Algorithm 1), the initial curve estimate m̂_p^(0) in Step 0 is suggested [27] to be the nonrobust penalized least squares regression spline given by (7).
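Putting the pieces together, Algorithm 1 can be sketched as follows. This is a simplified illustration, not the authors' MATLAB code: the penalty λ is held fixed rather than reselected by GCV at each pass, and the Huber score is applied to MADN-standardized residuals and rescaled, matching the standardized-residual formulation of (18):

```python
import numpy as np

def m_type_spline(x, y, knots, p=3, lam=1.0, c=1.345, tol=1e-6, max_iter=100):
    """Sketch of Algorithm 1: M-type penalized spline via iterated pseudodata."""
    # Truncated power basis (5) and penalty matrix D
    poly = np.vander(x, p + 1, increasing=True)
    X = np.hstack([poly, np.maximum(x[:, None] - knots[None, :], 0.0) ** p])
    D = np.diag(np.r_[np.zeros(p + 1), np.ones(len(knots))])
    A = np.linalg.solve(X.T @ X + lam * D, X.T)            # (X'X + lam*D)^-1 X'
    m = X @ (A @ y)                                        # Step 0: LS start, eq. (7)
    for _ in range(max_iter):
        r = y - m
        sigma = 1.4826 * np.median(np.abs(r - np.median(r)))  # Step 1: MADN, eq. (17)
        psi = sigma * np.clip(r / sigma, -c, c)               # Huber score on standardized residuals
        z = m + psi / 2.0                                     # Step 2: pseudodata, eq. (19)
        m_new = X @ (A @ z)                                   # Step 3: refit, eq. (20)
        if np.linalg.norm(m_new - m) < tol * np.linalg.norm(m):
            return m_new                                      # Step 4: converged
        m = m_new
    return m
```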

2.3. S-Estimation for Penalized Regression Splines. The robustness properties of the aforementioned M-type estimator for penalized regression splines are not completely satisfactory, since it has a low breakdown point [31], as clarified by Maronna and Yohai [32] for general M-estimators when the regression and scale are considered simultaneously.

Another, more robust and relatively recent approach for smoothing noisy data is S-estimation for penalized regression splines [29], whose core idea is to replace the traditional least squares estimator with a flexible S-estimator. The coefficient vector β̂_S of the penalized S-regression spline is defined as

$$\hat{\beta}_S = \arg\min_{\beta} \frac{n}{2}\, \hat{\sigma}_\varepsilon^2(\beta) + \lambda \beta^T D \beta, \qquad (21)$$

in which, for each β, σ̂_ε²(β) satisfies

$$\frac{1}{n}\sum_{i=1}^{n} \rho\left( \frac{y_i - F(x_i)^T\beta}{\hat{\sigma}_\varepsilon(\beta)} \right) = E_\Phi\left[ \rho(Z) \right], \qquad (22)$$

where Φ is the cumulative distribution function of the standard normal distribution, Z ∼ N(0, 1), and the loss function is taken from Tukey's bisquare family [33]:

$$\rho_d(u) = \begin{cases} 3\left(\dfrac{u}{d}\right)^2 - 3\left(\dfrac{u}{d}\right)^4 + \left(\dfrac{u}{d}\right)^6, & |u| \le d, \\[4pt] 1, & |u| > d. \end{cases} \qquad (23)$$
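Tukey's bisquare loss (23) in code; the default cutoff d = 1.547 (the constant making E_Φ[ρ(Z)] = 1/2 for this unit-capped bisquare, hence a 50% breakdown scale) is an assumption, since the paper does not state its choice of d:

```python
import numpy as np

def rho_bisquare(u, d=1.547):
    """Tukey bisquare loss (23): smooth inside [-d, d], constant 1 outside."""
    u = np.asarray(u, dtype=float)
    v = (u / d) ** 2
    return np.where(np.abs(u) <= d, 3 * v - 3 * v ** 2 + v ** 3, 1.0)
```

Because the loss is bounded by 1, arbitrarily gross outliers contribute a fixed amount to (22), which is the source of the high breakdown point (and of the nonconvexity discussed later).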

By the theoretical assumptions, the fitted residual vector should follow a normal distribution with zero mean, that is, r_i ∼ N(0, σ_ε²), i = 1, …, n; in general practice, however, its median Median(r), where r = (r_1, …, r_n)^T, may depart greatly from zero. Considering this, we ameliorate the estimation of σ̂_ε(β) by using MADN in (17) with the addition of the absolute value of the residual median, namely,

$$\hat{\sigma}_\varepsilon(\beta) = \mathrm{MADN} + \left| \mathrm{Median}\left( \mathbf{r}(\beta) \right) \right|. \qquad (24)$$

The penalized S-estimator for the regression spline model is computed as m̂_S = X β̂_S, with its coefficient vector β̂_S satisfying both (21) and (22) simultaneously. The following derivations indicate that this penalized S-estimator is actually equivalent to a weighted penalized least squares regression.

Before proceeding with the derivation, note that the symbol β̂_S is simply written as β for clarity. Firstly, when (21) reaches its minimum, the following equation should be satisfied:

$$n\, \hat{\sigma}_\varepsilon(\beta)\, \nabla\hat{\sigma}_\varepsilon(\beta) + \lambda D \beta = 0. \qquad (25)$$

On the other hand, taking the derivative of the M-scale function in (22) with respect to β leads to

$$\sum_{i=1}^{n} \rho'\left( \frac{r_i(\beta)}{\hat{\sigma}_\varepsilon(\beta)} \right) \left( \frac{-F(x_i)^T \hat{\sigma}_\varepsilon(\beta) - r_i(\beta)\, \nabla\hat{\sigma}_\varepsilon(\beta)^T}{\hat{\sigma}_\varepsilon(\beta)^2} \right) = 0, \qquad (26)$$

in which r_i(β) = y_i − F(x_i)^T β, r̃_i(β) = r_i(β)/σ̂_ε(β), and Λ(β) = diag(ρ'(r̃_i(β))/r̃_i(β)), i = 1, …, n. Rearranging the above formula produces

$$\nabla\hat{\sigma}_\varepsilon(\beta) = -\frac{\sum_{i=1}^{n} \rho'\left( r_i(\beta)/\hat{\sigma}_\varepsilon(\beta) \right) F(x_i)}{\sum_{i=1}^{n} \rho'\left( r_i(\beta)/\hat{\sigma}_\varepsilon(\beta) \right) \left( r_i(\beta)/\hat{\sigma}_\varepsilon(\beta) \right)} = -\frac{\hat{\sigma}_\varepsilon(\beta)\, \mathbf{X}^T \Lambda(\beta)\, \mathbf{r}(\beta)}{\mathbf{r}(\beta)^T \Lambda(\beta)\, \mathbf{r}(\beta)}. \qquad (27)$$

Substituting ∇σ̂_ε(β) from (27) into (25) gives

$$-\frac{n\, \hat{\sigma}_\varepsilon(\beta)^2}{\mathbf{r}(\beta)^T \Lambda(\beta)\, \mathbf{r}(\beta)}\, \mathbf{X}^T \Lambda(\beta)\, \mathbf{r}(\beta) + \lambda D \beta = 0. \qquad (28)$$

Recalling that r(β) = y − Xβ and introducing the new symbol τ(β) = n σ̂_ε(β)² / (r(β)^T Λ(β) r(β)), we have

$$-\tau(\beta)\, \mathbf{X}^T \Lambda(\beta) \left( \mathbf{y} - \mathbf{X}\beta \right) + \lambda D \beta = 0. \qquad (29)$$

A further simple rearrangement thus arrives at the coefficient vector β̂_S in an iteration form,

$$\beta = \left( \mathbf{X}^T \Lambda(\beta)\, \mathbf{X} + \frac{\lambda}{\tau(\beta)} D \right)^{-1} \mathbf{X}^T \Lambda(\beta)\, \mathbf{y}, \qquad (30)$$

which forms the basis for constructing an iterative algorithm for the penalized S-regression spline.

The above derivations verify that β̂_S is actually the solution to the following weighted penalized least squares regression problem:

$$\min_{\beta}\ \left\| \Lambda(\beta)^{1/2} \left( \mathbf{y} - \mathbf{X}\beta \right) \right\|^2 + \frac{\lambda}{\tau(\beta)}\, \beta^T D \beta. \qquad (31)$$

Such a representation indicates that the GCV criterion in (11) can be employed straightforwardly with the weighted response ỹ = Λ(β)^{1/2} y, the weighted design matrix X̃ = Λ(β)^{1/2} X, and the penalty term λ/τ(β). Meanwhile, because some weight elements may be zero, the penalty parameter λ is calculated based on the following regularized GCV criterion:

$$\mathrm{RGCV}(\lambda) = \frac{n_w \left\| \Lambda(\beta)^{1/2} \left( \mathbf{y} - \mathbf{X}\beta \right) \right\|^2}{\left( n_w - \mathrm{tr}(\tilde{H}_\lambda) \right)^2}, \qquad (32)$$

where H̃_λ = X̃(X̃^T X̃ + (λ/τ(β))D)^{−1} X̃^T = Λ(β)^{1/2} X (X^T Λ(β) X + (λ/τ(β)) D)^{−1} X^T Λ(β)^{1/2}, and n_w denotes the number of nonzero elements in the weight matrix Λ(β).

Consequently, based on the foregoing in-depth discussion, the S-estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that since the employed loss function ρ_d is bounded, giving rise to the nonconvexity of σ̂_ε(β), the objective function in (21) may have several critical points, each corresponding to a local minimum. Therefore, the critical point to which the iterations of the above algorithm converge depends entirely on the prescribed initial regression coefficient vector β̂^(0) in Step 0 of Algorithm 2.

One possible solution to this issue, proposed by Tharmaratnam et al. [29], is based on random sampling theory and can be briefly described as follows. First, create J random subsamples of a proper size, for example, one fifth of the sample size, namely n_s = max(p + K + 1, ⌊n/5⌋), from the original observations. Each subsample yields an initial candidate for the coefficient vector, assigned, for instance, by least squares fitting of the penalized regression spline or by the M-type estimator at a small additional computational cost; denote the initial regression coefficient vector of the j-th subsample by β̂_j^(0), and the final convergent vector produced from it through Algorithm 2 by β̂_j^⋆, j = 1, …, J. Eventually, the global optimum is taken among all J convergent vectors as the one minimizing the objective function in (21), namely,

$$\hat{\beta}_S = \arg\min_{1 \le j \le J}\ \frac{n}{2}\, \hat{\sigma}_\varepsilon^2\left( \hat{\beta}_j^\star \right) + \lambda\, \hat{\beta}_j^{\star T} D\, \hat{\beta}_j^\star. \qquad (33)$$

Therefore, the robust S-estimator for penalized regression splines can be computed as m̂_S = X β̂_S.

3. Experimental Performance Evaluation

This section is devoted to comparative experiments, upon both simulated synthetic data and one real weather balloon data set, for the foregoing penalized spline smoothing methods, including the nonrobust penalized least squares regression spline (LS) given by (7) and the two robust smoothing approaches, the M-type estimator [27] and the S-estimation [29] for penalized regression splines, as described in Algorithms 1 and 2.

Algorithm 2 (S-estimator iterative algorithm for penalized regression splines).
Step 0. Input an initial regression coefficient vector β̂^(0), a termination tolerance ε, and the maximum iteration number Iter_max. Set k = 0 and enter the following loop iterations.
Step 1. Estimate σ̂_ε based on (24), that is, σ̂_ε(β̂^(k)) = MADN + |Median(r(β̂^(k)))|.
Step 2. Compute the intermediate variables τ(β̂^(k)) and Λ(β̂^(k)).
Step 3. Calculate the coefficient vector β̂^(k+1) according to (30), namely, β̂^(k+1) = (X^T Λ(β̂^(k)) X + (λ/τ(β̂^(k))) D)^{−1} X^T Λ(β̂^(k)) y, where the penalty parameter λ is determined by the regularized GCV criterion (32).
Step 4. If ‖β̂^(k+1) − β̂^(k)‖ < ε‖β̂^(k)‖ or k = Iter_max, terminate and output the convergent coefficient vector of the penalized S-regression spline, β̂_S = β̂^(k+1); else increment the counter by setting k = k + 1 and repeat from Step 1.
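One pass of Algorithm 2 can be sketched as a weighted ridge update. For Tukey's bisquare, ρ'(u)/u = (6/d²)(1 − (u/d)²)² inside [−d, d]; the constant 6/d² cancels between Λ(β) and τ(β) in (30), so plain bisquare weights suffice. The fixed λ, the cutoff d = 1.547, and the function name are illustrative assumptions:

```python
import numpy as np

def s_step(X, y, beta, D, lam, d=1.547):
    """One iteration of Algorithm 2: scale (24), bisquare weights, update (30)."""
    r = y - X @ beta
    sigma = 1.4826 * np.median(np.abs(r - np.median(r))) + abs(np.median(r))  # eq. (24)
    u = r / sigma
    w = np.where(np.abs(u) <= d, (1.0 - (u / d) ** 2) ** 2, 0.0)  # bisquare weights; 0 for gross outliers
    tau = len(y) * sigma ** 2 / ((r * w) @ r)                     # tau(beta), up to the cancelling constant
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X + (lam / tau) * D, X.T @ W @ y)
```

Observations whose standardized residual exceeds d receive exactly zero weight, which is why the RGCV criterion (32) counts only the n_w nonzero weights.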

3.1. Configuration and Criteria. It should be incidentally mentioned that the exponential order p = 3 of the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14]: κ_k = the ((k + 1)/(K + 2))-th sample quantile of Unique(x), k = 1, …, K, together with a simple choice of the knot number K = min(35, (1/4) × number of Unique(x)), where x = (x_1, x_2, …, x_n)^T. Besides, the corresponding penalty parameter λ is determined according to the generalized cross-validation (GCV) criterion for the LS fitting and M-estimation methods, and according to the regularized GCV criterion for the S-estimation method. Meanwhile, the termination tolerance ε = 10^{−6} and a maximum iteration number of 100 are assigned as the initial input parameters in Algorithms 1 and 2, and J = 5 subsamples are randomly generated for exploring the ultimate optimum.

The hardware and software environment established for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) with MATLAB R2014a installed upon Windows 7 SP1 Ultimate (64 bit).

3.2. Synthetic Data. The simulated study is designed the same as in Lee and Oh [27], with the testing function

$$y_i = \sin\left( 2\pi(1 - x_i)^2 \right) + 0.5\,\varepsilon_i, \quad i = 1, \ldots, n. \qquad (34)$$

The designed samples are generated with their horizontal coordinates uniformly drawn from the open interval (0, 1), and the error distribution ε employs nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution U(−1, 1); (2) standard normal distribution N(0, 1); (3) logistic distribution with pdf e^x/(1 + e^x)²; (4) double exponential distribution with pdf (1/2)e^{−|x|}; (5) contaminated normal distribution I, 0.95 N(0, 1) + 0.05 N(0, 10), that is, error constituted 95% from N(0, 1) and 5% from N(0, 10); (6) contaminated normal distribution II, 0.9 N(0, 1) + 0.1 N(0, 10); (7) slash distribution, defined from two statistically independent random variables following (1) and (2), that is, N(0, 1)/U(0, 1); (8) Cauchy distribution with pdf 1/(π(1 + x²)), whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution, 0.9 N(0, 1) + 0.1 N(30, 1). For the sake of reducing simulation variability, the sample size is fixed at n = 200. It should be mentioned incidentally that the error scenarios (1)–(8) are symmetrically distributed and listed in ascending order of tail heaviness, and scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from N(30, 1).

Figure 1 presents the spline regression curves from the penalized LS fitting, the M-estimator, and the S-estimator for one simulated set of scattered samples in the nine possible scenarios, with each subfigure panel demonstrating the comparative regression results of the three penalized spline smoothing methods.

For further quantitative evaluation of these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized here to quantify the goodness of fit between the regression spline m̂_j(x) of the j-th simulation replicate under each of the above outlier scenarios and the true simulated function m(x) = sin(2π(1 − x)²):

$$\mathrm{ASE}_j = \frac{1}{n} \sum_{i=1}^{n} \left( m(x_i) - \hat{m}_j(x_i) \right)^2, \quad j = 1, \ldots, J, \qquad (35)$$

with the specified replicate number J = 10³ for each distribution.
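The ASE criterion (35) is a one-liner:

```python
import numpy as np

def ase(m_true, m_hat):
    """Average squared error (35) between the true curve and one fitted replicate."""
    m_true, m_hat = np.asarray(m_true, dtype=float), np.asarray(m_hat, dtype=float)
    return np.mean((m_true - m_hat) ** 2)
```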

Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

Figure 1: Spline regression curves (with the true testing function in a black solid line) from the penalized LS fitting (blue dotted), M-estimator (green dot-dashed), and S-estimator (red dashed) for the simulated samples (black circle-dot) in 9 possible scenarios.

In the following, Figures 2 and 3 illustrate the box plots of the ASEs and code running times using (a) penalized LS-estimation, (b) penalized M-estimation, and (c) penalized S-estimation for the spline smoothing methods, as well as the median-value comparisons of these two indicators, for the simulated samples from each of the 9 distribution scenarios.

Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios, with replicate number J = 10³.

From the ASE comparisons in Figure 2, it can be found that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with more complex conditions, such as the three last error distributions possessing heavy tails, thereby bringing about large ASEs, which are specially rescaled to the additional vertical coordinate on the right so as to be accommodated with the ASE results for the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized M-type estimation presents satisfactory regression fittings for the distributions with moderate tail weights, like scenarios (2)–(5), which are located in the middle of each panel. Meanwhile, the robust S-estimator provides comparatively meritorious smoothing results, especially for complicated circumstances like the slash, Cauchy, or asymmetric distributions with heavy tails.

In terms of robustness, the LS-estimator simply does not qualify for consideration in this respect. Instead, the M-estimator demonstrates stable performance for the designed samples with moderate perturbation error, while the S-estimator remains dominantly stable across all the error source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming method is the S-estimator, followed by the M-estimator, with the least costly being the LS-estimator, since the latter serves as the initial curve estimate for the two former approaches. Moreover, although both the S-estimator and the M-estimator consist of similar iterative processes, random sampling is further involved in the former model for exploring the ultimate solution of a nonconvex optimization problem.

3.3. Balloon Data. This section is devoted to comparative experiments of the aforementioned penalized spline smoothing methods upon a set of weather balloon data [35]. The data consist of 4984 observations collected from a balloon at about 30 kilometers altitude, with outliers caused by the fact that the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios, with replicate number J = 10³.

Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), M-estimator (green dot-dashed), and S-estimator (red solid line).

As illustrated above, Figure 4 displays these balloon observational data together with the regression fitting curves of the penalized LS method, the robust penalized M-estimation method, and the robust S-estimation method. It can be explicitly observed by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, a phenomenon even more significant around the position 0.8 on the horizontal axis, whereas the two robust splines pass approximately through the majority of the observations. Meanwhile, the S-estimator appears to behave slightly better than the M-estimator, since the former accords better with the data trend, especially at the ridge and the two terminal nodes.

At the same time, under the platform specified in Section 3.1, the algorithm execution times on this weather balloon data set of 4984 observations are 23 s, 351 s, and 589 s for the penalized LS regression, the penalized M-type estimator, and the penalized S-estimator, respectively.

4. Conclusions

This paper conducts a comprehensively comparative analysis of the two currently popular robust penalized spline smoothing methods, the penalized M-type estimator and the penalized S-estimation, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the ultimate algorithms reorganized under a unified framework.

The experiments consist of, firstly, a simulated synthetic data set with outliers in 9 possible scenarios and, secondly, a practical weather balloon data set, upon which the performances of the aforementioned penalized regression splines are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

The conclusions from these experiments are that the robust penalized spline smoothing methods are resistant to the influence of outliers, compared with the nonrobust penalized LS spline fitting method. Further, the M-estimator exerts stable performance only for observed samples with moderate perturbation error, while the S-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, though at the necessary cost of more execution time. Therefore, taking both the solution accuracy and the implementation efficiency of the fitted regression spline into account, the M-type estimator is generally preferred, whereas if only fitting credibility is of concern, the S-estimator naturally deserves to be the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3–21, 1989.

[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107–125, 2010.

[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502–518, 1986.

[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071–1085, 1990.

[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255–270, 1997.

[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661–675, 1973.

[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255–265, 1973.

[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271–293, 1998.

[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89–102, 1996.

[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936–940, 1999.

[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735–757, 2002.

[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.

[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.

[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.

[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95–114, 2003.

[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263–281, 2014.

[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.

[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.

[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley Online Library, 1981.

[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530–551, 1983.

[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256–272, Springer, New York, NY, USA, 1984.

[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.

[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747–752, 2011.

[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15–30, 2004.

[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159–171, 2007.

[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217–230, 2013.

[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609–625, 2010.

[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.

Mathematical Problems in Engineering 11

[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157–184, 1983.

[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699–703, 1991.

[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147–192, 1974.

[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.

[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–801, 1993.




Step 0. Input an initial curve estimate $\hat{\mathbf{m}}_p^{(0)}$, a termination tolerance $\epsilon$, and the maximum iteration number $\mathrm{Iter}_{\max}$; set $k = 0$ and conduct the following loop iterations.
Step 1. Estimate $\hat{\sigma}_\varepsilon$ by using MADN in (17).
Step 2. Produce the empirical pseudo-data according to (19).
Step 3. Calculate the penalized least-squares estimator $\hat{\mathbf{m}}_p^{(k+1)}$ defined in (20) for the pseudo-data, where the penalty parameter $\lambda$ is chosen according to the generalized cross-validation (GCV) criterion formulated by (11) in the foregoing subsection.
Step 4. If $\|\hat{\mathbf{m}}_p^{(k+1)} - \hat{\mathbf{m}}_p^{(k)}\| < \epsilon\,\|\hat{\mathbf{m}}_p^{(k)}\|$ or $k = \mathrm{Iter}_{\max}$, terminate and output the robust $M$-type penalized spline estimate $\hat{\mathbf{m}}_M = \hat{\mathbf{m}}_p^{(k+1)}$; else set $k = k + 1$ and continue from Step 1.

Algorithm 1: $M$-type iterative algorithm for penalized regression splines.

MAD does not estimate the corresponding standard deviation $\sigma_\varepsilon$ but rather estimates $z_{0.75}\,\sigma_\varepsilon$ (with $z_{0.75}$ being the 0.75 quantile of the standard normal distribution). To put MAD in a more familiar context, it typically needs to be rescaled so as to estimate $\sigma_\varepsilon$, especially when the residuals are sampled from a normal distribution. In particular,
$$\mathrm{MADN} = \frac{\mathrm{MAD}}{z_{0.75}} \approx 1.4826\,\mathrm{MAD}. \tag{17}$$
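As a concrete illustration, the MADN scale estimate of (17) takes only a few lines (a minimal NumPy sketch; the function name `madn` is ours, not from the paper):

```python
import numpy as np

def madn(r):
    """Normalized median absolute deviation of eq. (17): a robust estimate
    of the residual standard deviation, consistent for Gaussian residuals."""
    r = np.asarray(r, dtype=float)
    mad = np.median(np.abs(r - np.median(r)))
    return mad / 0.6745  # z_{0.75} ~= 0.6745, i.e. MADN ~= 1.4826 * MAD
```

On clean Gaussian residuals MADN approaches the true standard deviation, while a small fraction of gross outliers barely moves it, which is exactly why it is preferred here to the nonrobust sample standard deviation.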

By utilizing the aforementioned spline basis functions, an $M$-type penalized spline estimator using the standardized residuals can be computed as $\hat{\mathbf{m}}_M = \mathbf{X}\hat{\boldsymbol{\beta}}_M$, with its regression coefficients given as follows:
$$\hat{\boldsymbol{\beta}}_M = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_c\!\left(\frac{y_i - m(x_i;\boldsymbol{\beta})}{\hat{\sigma}_\varepsilon}\right) + \lambda\,\boldsymbol{\beta}^T D\,\boldsymbol{\beta}. \tag{18}$$

Due to the nonlinear nature of $\rho_c$ and $\psi_c$, it is nontrivial to seek the minimizer of (18) while satisfying (14). Under such circumstances, an $M$-type iterative algorithm for calculating the penalized regression splines is proposed [27] by introducing the empirical pseudo-data
$$\mathbf{z} = \hat{\mathbf{m}}_p + \frac{\psi_c\!\left(\mathbf{y} - \hat{\mathbf{m}}_p\right)}{2}, \tag{19}$$
where $\hat{\mathbf{m}}_p$ is the penalized least-squares estimator corresponding to the above pseudo-data:
$$\hat{\mathbf{m}}_p = \mathbf{X}\left(\mathbf{X}^T\mathbf{X} + \lambda D\right)^{-1}\mathbf{X}^T\mathbf{z}. \tag{20}$$
As proved in Theorem 1 of Lee and Oh's work [27], the penalized least-squares estimator $\hat{\mathbf{m}}_p$ converges to $\hat{\mathbf{m}}_M$ asymptotically, which implies an $M$-type iterative algorithm consisting of the steps in Algorithm 1.

Regarding the $M$-type estimator for the penalized regression splines (see Algorithm 1), the initial curve estimate $\hat{\mathbf{m}}_p^{(0)}$ in Step 0 is suggested [27] to be the nonrobust penalized least-squares regression spline presented by (7).
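Assuming the basis matrix $\mathbf{X}$ and penalty matrix $D$ are already built, Algorithm 1 can be sketched as follows (an illustrative NumPy sketch, not the authors' code; a Huber $\psi_c$ applied to standardized residuals is assumed, and the nonrobust penalized LS fit serves as the initial estimate as suggested above):

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber psi function: identity inside [-c, c], clipped outside."""
    return np.clip(u, -c, c)

def m_type_pspline(X, D, y, lam, c=1.345, tol=1e-6, max_iter=100):
    """Sketch of Algorithm 1: alternate a robust scale estimate (MADN),
    the pseudo-data of eq. (19), and the penalized LS solve of eq. (20)."""
    A = np.linalg.solve(X.T @ X + lam * D, X.T)  # (X'X + lam*D)^{-1} X'
    m = X @ (A @ y)                              # Step 0: nonrobust penalized LS fit
    for _ in range(max_iter):
        r = y - m
        sigma = np.median(np.abs(r - np.median(r))) / 0.6745  # MADN, eq. (17)
        z = m + sigma * huber_psi(r / sigma, c) / 2           # pseudo-data, eq. (19)
        m_new = X @ (A @ z)                                   # eq. (20)
        if np.linalg.norm(m_new - m) < tol * np.linalg.norm(m):
            return m_new                                      # Step 4: converged
        m = m_new
    return m
```

At the fixed point the clipped residuals satisfy the $M$-estimation normal equations, so gross outliers contribute at most a bounded amount to the fit.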

2.3. S-Estimation for Penalized Regression Splines. The robustness properties of the aforementioned $M$-type estimator for penalized regression splines are not completely satisfactory, since it has a low breakdown point [31], as clarified by Maronna and Yohai [32] for general $M$-estimators when both the regression and the scale are considered simultaneously.

Another more robust and relatively recent approach for smoothing noisy data is the $S$-estimation for penalized regression splines [29], whose core idea is to utilize a flexible $S$-estimator in place of the traditional least-squares estimator. The coefficient vector of the penalized $S$-regression spline, $\hat{\boldsymbol{\beta}}_S$, is defined as follows:
$$\hat{\boldsymbol{\beta}}_S = \arg\min_{\boldsymbol{\beta}} \; n\,\hat{\sigma}_\varepsilon^2(\boldsymbol{\beta}) + \lambda\,\boldsymbol{\beta}^T D\,\boldsymbol{\beta}, \tag{21}$$

in which, for each $\boldsymbol{\beta}$, $\hat{\sigma}_\varepsilon^2(\boldsymbol{\beta})$ satisfies
$$\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(\frac{y_i - F(x_i)^T\boldsymbol{\beta}}{\hat{\sigma}_\varepsilon(\boldsymbol{\beta})}\right) = E_\Phi\!\left(\rho(Z)\right), \tag{22}$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution, $Z \sim N(0,1)$, and the loss function is taken from Tukey's bisquare family [33]:

$$\rho_d(u) = \begin{cases} 3\left(\dfrac{u}{d}\right)^2 - 3\left(\dfrac{u}{d}\right)^4 + \left(\dfrac{u}{d}\right)^6, & |u| \le d, \\[2mm] 1, & |u| > d. \end{cases} \tag{23}$$
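Tukey's bisquare loss of (23) is straightforward to implement; a small sketch (the helper name `tukey_rho` is ours):

```python
import numpy as np

def tukey_rho(u, d=1.547):
    """Tukey bisquare loss, eq. (23): smooth near zero and bounded at 1
    beyond the tuning constant d, so any single outlier's contribution
    to the objective is capped."""
    u = np.asarray(u, dtype=float)
    t = (u / d) ** 2
    return np.where(np.abs(u) <= d, 3 * t - 3 * t ** 2 + t ** 3, 1.0)
```

The default $d = 1.547$ is the constant commonly paired with $S$-estimation to reach a 50% breakdown point; the boundedness of $\rho_d$ is precisely what makes the $S$-objective nonconvex, as noted later in this section.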

Under the theoretical assumption, each fitted residual should follow a normal distribution with mean zero, that is, $r_i \sim N(0, \sigma_\varepsilon^2)$, $i = 1, \ldots, n$; in general practice, however, the median statistic $\mathrm{Median}(\mathbf{r})$, with $\mathbf{r} = (r_1, \ldots, r_n)^T$, may depart greatly from zero. Considering this, we ameliorate the estimation of $\hat{\sigma}_\varepsilon(\boldsymbol{\beta})$ by using MADN in (17) with the addition of the absolute value of the residual median, namely,
$$\hat{\sigma}_\varepsilon(\boldsymbol{\beta}) = \mathrm{MADN} + \left|\mathrm{Median}\!\left(\mathbf{r}(\boldsymbol{\beta})\right)\right|. \tag{24}$$

The penalized $S$-estimator for the regression spline model is computed as $\hat{\mathbf{m}}_S = \mathbf{X}\hat{\boldsymbol{\beta}}_S$, with its coefficient vector $\hat{\boldsymbol{\beta}}_S$ satisfying both (21) and (22) simultaneously. The following derivations indicate that such a penalized $S$-estimator is actually equivalent to a weighted penalized least-squares regression.

Before proceeding with the derivation, note that for clarity the symbol $\hat{\boldsymbol{\beta}}_S$ is written simply as $\boldsymbol{\beta}$. Firstly, when (21) reaches its minimum, the following equation should be satisfied:
$$n\,\hat{\sigma}_\varepsilon(\boldsymbol{\beta})\,\nabla\hat{\sigma}_\varepsilon(\boldsymbol{\beta}) + \lambda D\boldsymbol{\beta} = 0. \tag{25}$$

On the other hand, taking the derivative of the $M$-scale function in (22) with respect to $\boldsymbol{\beta}$ leads to
$$\sum_{i=1}^{n}\rho'\!\left(\frac{r_i(\boldsymbol{\beta})}{\hat{\sigma}_\varepsilon(\boldsymbol{\beta})}\right)\left(\frac{-F(x_i)^T\,\hat{\sigma}_\varepsilon(\boldsymbol{\beta}) - r_i(\boldsymbol{\beta})\,\nabla\hat{\sigma}_\varepsilon(\boldsymbol{\beta})^T}{\hat{\sigma}_\varepsilon^2(\boldsymbol{\beta})}\right) = 0, \tag{26}$$
in which $r_i(\boldsymbol{\beta}) = y_i - F(x_i)^T\boldsymbol{\beta}$, $\tilde{r}_i(\boldsymbol{\beta}) = r_i(\boldsymbol{\beta})/\hat{\sigma}_\varepsilon(\boldsymbol{\beta})$, and $\Lambda(\boldsymbol{\beta}) = \mathrm{diag}\!\left(\rho'(\tilde{r}_i(\boldsymbol{\beta}))/\tilde{r}_i(\boldsymbol{\beta})\right)$, $i = 1, \ldots, n$.

Rearranging the above formula produces

$$\nabla\hat{\sigma}_\varepsilon(\boldsymbol{\beta}) = -\frac{\sum_{i=1}^{n}\rho'\!\left(r_i(\boldsymbol{\beta})/\hat{\sigma}_\varepsilon(\boldsymbol{\beta})\right)F(x_i)}{\sum_{i=1}^{n}\rho'\!\left(r_i(\boldsymbol{\beta})/\hat{\sigma}_\varepsilon(\boldsymbol{\beta})\right)\left(r_i(\boldsymbol{\beta})/\hat{\sigma}_\varepsilon(\boldsymbol{\beta})\right)} = -\frac{\hat{\sigma}_\varepsilon(\boldsymbol{\beta})\,\mathbf{X}^T\Lambda(\boldsymbol{\beta})\,\mathbf{r}(\boldsymbol{\beta})}{\mathbf{r}(\boldsymbol{\beta})^T\Lambda(\boldsymbol{\beta})\,\mathbf{r}(\boldsymbol{\beta})}. \tag{27}$$

Substituting $\nabla\hat{\sigma}_\varepsilon(\boldsymbol{\beta})$ of (27) into (25) gives
$$-\frac{n\,\hat{\sigma}_\varepsilon^2(\boldsymbol{\beta})}{\mathbf{r}(\boldsymbol{\beta})^T\Lambda(\boldsymbol{\beta})\,\mathbf{r}(\boldsymbol{\beta})}\,\mathbf{X}^T\Lambda(\boldsymbol{\beta})\,\mathbf{r}(\boldsymbol{\beta}) + \lambda D\boldsymbol{\beta} = 0. \tag{28}$$
Recall that $\mathbf{r}(\boldsymbol{\beta}) = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$, and introduce the new symbol $\tau(\boldsymbol{\beta}) = n\,\hat{\sigma}_\varepsilon^2(\boldsymbol{\beta})/\!\left(\mathbf{r}(\boldsymbol{\beta})^T\Lambda(\boldsymbol{\beta})\,\mathbf{r}(\boldsymbol{\beta})\right)$; then we have

$$-\tau(\boldsymbol{\beta})\,\mathbf{X}^T\Lambda(\boldsymbol{\beta})\,(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \lambda D\boldsymbol{\beta} = 0. \tag{29}$$
A further simple rearrangement arrives at the coefficient vector $\hat{\boldsymbol{\beta}}_S$ in an iteration form:
$$\boldsymbol{\beta} = \left(\mathbf{X}^T\Lambda(\boldsymbol{\beta})\,\mathbf{X} + \frac{\lambda}{\tau(\boldsymbol{\beta})}D\right)^{-1}\mathbf{X}^T\Lambda(\boldsymbol{\beta})\,\mathbf{y}, \tag{30}$$
which forms the basis for constructing an iterative algorithm for the penalized $S$-regression spline.
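Concretely, one pass of the fixed-point update (30) can be sketched as follows (an illustrative NumPy sketch with our own helper names, assuming $\mathbf{X}$, $D$, and $\mathbf{y}$ as above); the diagonal of $\Lambda(\boldsymbol{\beta})$ and the scalar $\tau(\boldsymbol{\beta})$ are computed from the current residuals:

```python
import numpy as np

def bisquare_psi(u, d=1.547):
    """rho'_d(u) = (6u/d^2) * (1 - (u/d)^2)^2 inside [-d, d], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= d, (6 * u / d ** 2) * (1 - (u / d) ** 2) ** 2, 0.0)

def s_update(X, D, y, beta, lam, d=1.547):
    """One pass of the fixed-point update in eq. (30)."""
    r = y - X @ beta
    # Robust scale of eq. (24): MADN plus |median| of the residuals
    sigma = np.median(np.abs(r - np.median(r))) / 0.6745 + abs(np.median(r))
    u = r / sigma
    # Diagonal of Lambda(beta): rho'(u)/u, with its limit 6/d^2 at u = 0
    w = np.where(u != 0.0, bisquare_psi(u, d) / np.where(u != 0.0, u, 1.0), 6.0 / d ** 2)
    tau = len(y) * sigma ** 2 / (r @ (w * r))   # tau(beta), cf. eq. (28)
    Xw = X * w[:, None]                         # Lambda(beta) X
    return np.linalg.solve(X.T @ Xw + (lam / tau) * D, Xw.T @ y)
```

Iterating `s_update` to a fixed point, wrapped in the random-subsample multi-start discussed below, recovers the full $S$-estimator; residuals far beyond $d$ standard scale units receive exactly zero weight.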

The above derivations verify that $\hat{\boldsymbol{\beta}}_S$ is actually the solution to the following weighted penalized least-squares regression problem:
$$\min_{\boldsymbol{\beta}} \left\|\Lambda(\boldsymbol{\beta})^{1/2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right\|^2 + \frac{\lambda}{\tau(\boldsymbol{\beta})}\,\boldsymbol{\beta}^T D\,\boldsymbol{\beta}. \tag{31}$$

This representation indicates that the GCV criterion in (11) can be employed straightforwardly with the weighted response variable $\tilde{\mathbf{y}} = \Lambda(\boldsymbol{\beta})^{1/2}\mathbf{y}$, the weighted design matrix $\tilde{\mathbf{X}} = \Lambda(\boldsymbol{\beta})^{1/2}\mathbf{X}$, and the penalty term $\lambda/\tau(\boldsymbol{\beta})$. Meanwhile, since some weight elements may be zero, the penalty parameter $\lambda$ is calculated based on the following regularized GCV criterion:
$$\mathrm{RGCV}(\lambda) = \frac{n_w\left\|\Lambda(\boldsymbol{\beta})^{1/2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right\|^2}{\left(n_w - \mathrm{tr}(\mathbf{H}_\lambda)\right)^2}, \tag{32}$$
where $\mathbf{H}_\lambda = \tilde{\mathbf{X}}\left(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}} + (\lambda/\tau(\boldsymbol{\beta}))D\right)^{-1}\tilde{\mathbf{X}}^T = \Lambda(\boldsymbol{\beta})^{1/2}\mathbf{X}\left(\mathbf{X}^T\Lambda(\boldsymbol{\beta})\mathbf{X} + (\lambda/\tau(\boldsymbol{\beta}))D\right)^{-1}\mathbf{X}^T\Lambda(\boldsymbol{\beta})^{1/2}$, and $n_w$ denotes the number of nonzero elements in the weight matrix $\Lambda(\boldsymbol{\beta})$.

Consequently, based on the foregoing in-depth discussions, the $S$-estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that, since the employed loss function $\rho_d$ is bounded, giving rise to the nonconvexity of $\hat{\sigma}_\varepsilon(\boldsymbol{\beta})$, the objective function in (21) may have several critical points, each of which corresponds to only a local minimum. Therefore, the convergent critical point of the iterations derived from the above algorithm depends completely on the prescribed initial regression coefficient vector $\hat{\boldsymbol{\beta}}^{(0)}$

in Step 0 of Algorithm 2.

One possible solution to cope with this issue, proposed by Tharmaratnam et al. [29], is based on random sampling theory. Briefly, one first creates $J$ random subsamples of a proper size, for example one fifth of the sample number, namely $n_s = \max(p + K + 1, \lfloor n/5 \rfloor)$, from the original sample observations. Each subsample brings about an initial candidate of the coefficient vector, for instance assigned by least-squares fitting for the penalized regression splines or by the $M$-type estimator at a small additional computational cost, with the initial regression coefficient vector of the $j$th subsample denoted by $\hat{\boldsymbol{\beta}}_j^{(0)}$; starting from it, the final convergent vector produced through Algorithm 2 is written as $\hat{\boldsymbol{\beta}}_j^{\star}$, $j = 1, \ldots, J$. Eventually, the global optimum is taken among all these $J$ convergent vectors to be the one corresponding to the minimum objective function in (21), namely,
$$\hat{\boldsymbol{\beta}}_S = \arg\min_{1 \le j \le J} \; n\,\hat{\sigma}_\varepsilon^2\!\left(\hat{\boldsymbol{\beta}}_j^{\star}\right) + \lambda\,\hat{\boldsymbol{\beta}}_j^{\star T} D\,\hat{\boldsymbol{\beta}}_j^{\star}. \tag{33}$$

Therefore, the robust $S$-estimator for penalized regression splines can be computed as $\hat{\mathbf{m}}_S = \mathbf{X}\hat{\boldsymbol{\beta}}_S$.
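The random-subsample multi-start described above can be sketched generically (a minimal sketch; `fit_fn` is a hypothetical callback that runs the $S$-iteration from a subsample-based initial vector and returns the coefficient vector together with its objective value from (21)):

```python
import numpy as np

def multistart_s(X, D, y, lam, fit_fn, J=5, rng=None):
    """Best-of-J random-subsample starts, as in Tharmaratnam et al. [29]:
    the candidate with the smallest objective (21) wins."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    ns = max(X.shape[1], n // 5)  # subsample size: at least p+K+1, about n/5
    best_beta, best_obj = None, np.inf
    for _ in range(J):
        idx = rng.choice(n, size=ns, replace=False)   # random subsample indices
        beta, obj = fit_fn(X, D, y, lam, idx)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta
```

Because each start costs a full iteration run, the multi-start is the main reason the $S$-estimator is slower than the $M$-type estimator in the experiments below.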

3. Experimental Performance Evaluation

This section is devoted to comparative experiments upon both simulated synthetic data and one real weather balloon data set for the foregoing penalized spline smoothing methods, including the nonrobust penalized least-squares regression spline (LS) presented by (7) and the two robust smoothing approaches, the $M$-type estimator [27] and the $S$-estimation [29] for penalized regression splines, as described in Algorithms 1 and 2.


Step 0. Input an initial regression coefficient vector $\hat{\boldsymbol{\beta}}^{(0)}$, the termination tolerance $\epsilon$, and the maximum iteration number $\mathrm{Iter}_{\max}$; set $k = 0$ and enter the following loop iterations.
Step 1. Estimate $\hat{\sigma}_\varepsilon$ based on (24), that is, $\hat{\sigma}_\varepsilon(\hat{\boldsymbol{\beta}}^{(k)}) = \mathrm{MADN} + |\mathrm{Median}(\mathbf{r}(\hat{\boldsymbol{\beta}}^{(k)}))|$.
Step 2. Compute the intermediate variables $\tau(\hat{\boldsymbol{\beta}}^{(k)})$ and $\Lambda(\hat{\boldsymbol{\beta}}^{(k)})$.
Step 3. Calculate the coefficient vector $\hat{\boldsymbol{\beta}}^{(k+1)}$ according to (30) as
$$\hat{\boldsymbol{\beta}}^{(k+1)} = \left(\mathbf{X}^T\Lambda(\hat{\boldsymbol{\beta}}^{(k)})\,\mathbf{X} + \frac{\lambda}{\tau(\hat{\boldsymbol{\beta}}^{(k)})}D\right)^{-1}\mathbf{X}^T\Lambda(\hat{\boldsymbol{\beta}}^{(k)})\,\mathbf{y},$$
where the penalty parameter $\lambda$ is determined by the regularized GCV criterion (32).
Step 4. If $\|\hat{\boldsymbol{\beta}}^{(k+1)} - \hat{\boldsymbol{\beta}}^{(k)}\| < \epsilon\,\|\hat{\boldsymbol{\beta}}^{(k)}\|$ or $k = \mathrm{Iter}_{\max}$, terminate and output the convergent coefficient vector for the penalized regression spline of the $S$-estimator, $\hat{\boldsymbol{\beta}}^{\star} = \hat{\boldsymbol{\beta}}^{(k+1)}$; else set $k = k + 1$ and repeat from Step 1.

Algorithm 2: $S$-estimator iterative algorithm for penalized regression splines.

3.1. Configuration and Criteria. It should be mentioned that the exponential order $p = 3$ of the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14], $\kappa_k = \left((k+1)/(K+2)\right)$th sample quantile of $\mathrm{Unique}(\mathbf{x})$, $k = 1, \ldots, K$, together with the simple choice of knot number $K = \min\!\left(35, \tfrac{1}{4} \times \#\mathrm{Unique}(\mathbf{x})\right)$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$. Besides, the corresponding penalty parameter $\lambda$ is determined according to the generalized cross-validation (GCV) criterion for the LS fitting and $M$-estimation methods, and according to the regularized GCV criterion for the $S$-estimation method. Meanwhile, the termination tolerance $\epsilon = 10^{-6}$ and a maximum iteration number of 100 are assigned as the initial input parameters in Algorithms 1 and 2, and $J = 5$ subsamples are randomly generated for exploring the ultimate optimum.
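The knot rule quoted above is easy to state in code (a sketch; the function name `default_knots` is ours):

```python
import numpy as np

def default_knots(x):
    """Knot choice of Ruppert et al. [14]: K = min(35, n_unique/4) knots
    placed at the ((k+1)/(K+2))-th sample quantiles of the unique x values."""
    ux = np.unique(x)
    K = int(min(35, len(ux) / 4))
    probs = (np.arange(1, K + 1) + 1.0) / (K + 2)
    return np.quantile(ux, probs)
```

Capping $K$ at 35 keeps the basis dimension, and hence the per-iteration linear solves, bounded even for large samples such as the balloon data set.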

Meanwhile, the hardware and software environment established for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) running MATLAB R2014a upon Windows 7 SP1 Ultimate (64-bit).

3.2. Synthetic Data. The simulated study is designed the same as in Lee and Oh [27], with the testing function
$$y_i = \sin\!\left(2\pi(1 - x_i)^2\right) + 0.5\,\varepsilon_i, \quad i = 1, \ldots, n. \tag{34}$$
The designed samples are generated with their horizontal axes uniformly drawn from the open interval $(0, 1)$, and the error distribution $\varepsilon$ employs nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution $U(-1, 1)$; (2) standard normal distribution $N(0, 1)$; (3) logistic distribution with pdf $e^x/(1 + e^x)^2$; (4) double exponential distribution with pdf $\frac{1}{2}e^{-|x|}$; (5) contaminated normal distribution I, $0.95\,N(0,1) + 0.05\,N(0,10)$, that is, the error constituted with 95% from $N(0,1)$ and 5% from $N(0,10)$; (6) contaminated normal distribution II, $0.9\,N(0,1) + 0.1\,N(0,10)$; (7) slash distribution defined from two statistically independent random variables following (1) and (2), that is, $N(0,1)/U(0,1)$; (8) Cauchy distribution with pdf $1/\!\left(\pi(1 + x^2)\right)$, whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution $0.9\,N(0,1) + 0.1\,N(30,1)$. For the sake of reducing the simulation variability, the sample size is fixed at $n = 200$. It should be mentioned that the error scenarios (1)–(8) are symmetrically distributed and listed in ascending order of the heaviness of their tails, while scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from $N(30,1)$.
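For instance, the test samples under the contaminated-normal scenario (5) can be generated as follows (a sketch; the function names are ours):

```python
import numpy as np

def contaminated_normal(n, frac=0.05, sd_out=10.0, rng=None):
    """Contaminated normal errors: (1 - frac) N(0,1) + frac N(0, sd_out^2)."""
    rng = np.random.default_rng() if rng is None else rng
    out = rng.random(n) < frac
    return np.where(out, rng.normal(0.0, sd_out, n), rng.normal(0.0, 1.0, n))

def simulate(n=200, rng=None):
    """Samples from the testing function (34) with contaminated-normal noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * (1.0 - x) ** 2) + 0.5 * contaminated_normal(n, rng=rng)
    return x, y
```

The other eight scenarios only swap the error generator, so the same harness serves the whole simulation design.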

Figure 1 presents the spline regression curves from the penalized LS fitting, the $M$-estimator, and the $S$-estimator for one simulation of scattered samples in the nine possible scenarios, with each subfigure panel demonstrating the comparative regression results for these three penalized spline smoothing methods.

For further quantitative evaluation of these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized to qualify the goodness of fit between the regression spline $\hat{m}_j(x)$ of the $j$th simulation replicate from each of the above outlier scenarios and the true simulated function $m(x) = \sin(2\pi(1 - x)^2)$:
$$\mathrm{ASE}_j = \frac{1}{n}\sum_{i=1}^{n}\left(m(x_i) - \hat{m}_j(x_i)\right)^2, \quad j = 1, \ldots, J, \tag{35}$$

with the specified replicate number $J = 10^3$ for each distribution.
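The ASE of (35) is a one-liner (sketch; the function name `ase` is ours):

```python
import numpy as np

def ase(m_true, m_hat):
    """Average squared error of eq. (35) between the true curve values
    m(x_i) and the fitted spline values at the same design points."""
    m_true = np.asarray(m_true, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    return float(np.mean((m_true - m_hat) ** 2))
```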

Figure 1: Spline regression curves (with the true testing function in black solid line) from the penalized LS fitting (blue dotted), $M$-estimator (green dot-dashed), and $S$-estimator (red dashed) for the simulated samples (black circle-dot) in 9 possible scenarios.

Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate the box plots of both the ASEs and the code running times using (a) penalized LS-estimation, (b) penalized $M$-estimation, and (c) penalized $S$-estimation for the spline smoothing methods, as well as the median-value comparisons of these two indicators, for the simulated samples from each of the 9 distribution scenarios.

Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized $M$-estimation, (c) penalized $S$-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios with replicate number $J = 10^3$.

From the ASE comparisons in Figure 2, it can be found that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with the more complex conditions, such as the three last error distributions possessing heavy tails, thereby bringing about large ASEs, which are specially rescaled to the additional vertical coordinate on the right so as to be accommodated with the ASE results for the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized $M$-type estimation presents satisfactory regression fittings for the distributions with moderate tail weights, like scenarios (2)–(5), which are located in the middle of each panel. Meanwhile, the robust $S$-estimator provides comparatively meritorious smoothing results, especially for complicated circumstances like the slash, Cauchy, or asymmetric distributions with heavy tails.

In terms of robustness, the LS-estimator offers no resistance worth mentioning. Instead, the $M$-estimator demonstrates stable performance for the designed samples with moderate perturbation error, and the $S$-estimator behaves dominantly stable for all kinds of error-source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming method is the $S$-estimator, followed by the $M$-estimator, with the least consuming being the LS-estimator, since the latter serves as the initial curve estimate for the two former approaches. Moreover, although both the $S$-estimator and the $M$-estimator are composed of similar iterative processes, the random sampling strategy is further involved in the former model for exploring the ultimate solution of a nonconvex optimization problem.

3.3. Balloon Data. This section is devoted to comparative experiments of the aforementioned penalized spline smoothing methods upon a set of weather balloon data [35]. The data consist of 4984 observations collected from a balloon at about 30 kilometers altitude, with the outliers caused by the fact that the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized $M$-estimation, (c) penalized $S$-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios with replicate number $J = 10^3$.

Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), $M$-estimator (green dot-dashed), and $S$-estimator (red solid line).

As illustrated in Figure 4, the balloon observational data are displayed together with the regression fitting curves by the penalized LS method, the robust penalized $M$-estimation method, and the robust $S$-estimation method. It can be explicitly discovered by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, and this phenomenon appears even more significant around the horizontal-axis position of 0.8, while the other two robust splines pass approximately through the majority of the observations. Meanwhile, the $S$-estimator appears slightly superior to the $M$-estimator, since the former accords with the data trend better, especially at the ridge and the two terminal nodes.

At the same time, under the platform specified in Section 3.1, the algorithm execution times for this weather balloon data set with 4984 observations are 2.3 s, 35.1 s, and 58.9 s for the penalized LS regression, the penalized $M$-type estimator, and the penalized $S$-estimator, respectively.

4. Conclusions

This paper conducts a comprehensive comparative analysis of the two currently popular robust penalized spline smoothing methods, the penalized $M$-type estimator and the penalized $S$-estimation, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the ultimate algorithms reorganized under a unified framework.

The experiments consist of, firstly, one simulated synthetic data set with outliers in 9 possible scenarios and, then, a practical weather balloon data set for the aforementioned penalized regression splines, whose performances are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

The conclusions from these experiments are that the robust penalized spline smoothing methods can be resistant to the influence of outliers, compared with the nonrobust penalized LS spline fitting method. Further, the $M$-estimator exerts stable performance only for observed samples with moderate perturbation error, while the $S$-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, but at the cost of more execution time. Therefore, taking both the solving accuracy and the implementation efficiency of the fitted regression spline into account, the $M$-type estimator is generally preferred, whereas if only the fitting credibility is of concern, the $S$-estimator naturally deserves to be the first choice.



[26] H Oh D Nychka T Brown and P Charbonneau ldquoPeriodanalysis of variable stars by robust smoothingrdquo Journal of theRoyal Statistical Society C Applied Statistics vol 53 no 1 pp15ndash30 2004

[27] T C M Lee and H Oh ldquoRobust penalized regression splinefitting with application to additive mixed modelingrdquo Computa-tional Statistics vol 22 no 1 pp 159ndash171 2007

[28] R Finger ldquoInvestigating the performance of different estima-tion techniques for crop yield data analysis in crop insuranceapplicationsrdquoAgricultural Economics vol 44 no 2 pp 217ndash2302013

[29] K Tharmaratnam G Claeskens C Croux and M Salibian-Barrera ldquo119878-estimation for penalized regression splinesrdquo Journalof Computational and Graphical Statistics vol 19 no 3 pp 609ndash625 2010

[30] P Craven and G Wahba ldquoSmoothing noisy data with splinefunctions estimating the correct degree of smoothing by themethod of generalized cross-validationrdquo Numerische Mathe-matik vol 31 no 4 pp 377ndash403 1979

Mathematical Problems in Engineering 11

[31] D L Donoho and P J Huber ldquoThe notion of breakdown pointrdquoin A Festschrift for Erich L Lehmann pp 157ndash184 1983

[32] R A Maronna and V J Yohai ldquoThe breakdown point of simul-taneous general119872 estimates of regression and scalerdquo Journal ofthe American Statistical Association vol 86 no 415 pp 699ndash703 1991

[33] A E Beaton and JW Tukey ldquoThe fitting of power series mean-ing polynomials illustrated on band-spectroscopic datardquo Tech-nometrics vol 16 no 2 pp 147ndash192 1974

[34] S M Ross A First Course in Probability Pearson Prentice HallSaddle River NJ USA 8th edition 2010

[35] L Davies and U Gather ldquoThe identification of multiple out-liersrdquo Journal of the American Statistical Association vol 88 no423 pp 782ndash801 1993



Mathematical Problems in Engineering 5

actually be equivalent to a weighted penalized least squares regression.

Before proceeding with the derivation, note that the symbol $\hat{\beta}_S$ is simply written as $\beta$, just for clarity. Firstly, when (21) reaches its minimum, the following equation should be satisfied:
$$n\hat{\sigma}_{\varepsilon}(\beta)\,\nabla\hat{\sigma}_{\varepsilon}(\beta) + \lambda D\beta = 0. \tag{25}$$

On the other hand, taking the derivative of the $M$-scale function in (22) with respect to $\beta$ leads to
$$\sum_{i=1}^{n}\rho'\!\left(\frac{r_i(\beta)}{\hat{\sigma}_{\varepsilon}(\beta)}\right)\frac{-F(x_i)^{T}\hat{\sigma}_{\varepsilon}(\beta) - r_i(\beta)\,\nabla\hat{\sigma}_{\varepsilon}(\beta)^{T}}{\hat{\sigma}_{\varepsilon}(\beta)^{2}} = 0, \tag{26}$$
in which $r_i(\beta) = y_i - F(x_i)^{T}\beta$, $\tilde{r}_i(\beta) = r_i(\beta)/\hat{\sigma}_{\varepsilon}(\beta)$, and $\Lambda(\beta) = \mathrm{diag}\!\left(\rho'(\tilde{r}_i(\beta))/\tilde{r}_i(\beta)\right)$, $i = 1, \dots, n$.

Rearranging the above formula produces
$$\nabla\hat{\sigma}_{\varepsilon}(\beta) = -\frac{\sum_{i=1}^{n}\rho'\!\left(r_i(\beta)/\hat{\sigma}_{\varepsilon}(\beta)\right)F(x_i)}{\sum_{i=1}^{n}\rho'\!\left(r_i(\beta)/\hat{\sigma}_{\varepsilon}(\beta)\right)\left(r_i(\beta)/\hat{\sigma}_{\varepsilon}(\beta)\right)} = -\frac{\hat{\sigma}_{\varepsilon}(\beta)\,\mathbf{X}^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}{\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}. \tag{27}$$

Substituting $\nabla\hat{\sigma}_{\varepsilon}(\beta)$ of (27) into (25) gives
$$-\frac{n\hat{\sigma}_{\varepsilon}(\beta)^{2}}{\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)}\,\mathbf{X}^{T}\Lambda(\beta)\,\mathbf{r}(\beta) + \lambda D\beta = 0. \tag{28}$$

Recalling that $\mathbf{r}(\beta) = \mathbf{y} - \mathbf{X}\beta$ and introducing the new symbol $\tau(\beta) = n\hat{\sigma}_{\varepsilon}(\beta)^{2}/\mathbf{r}(\beta)^{T}\Lambda(\beta)\,\mathbf{r}(\beta)$, we have
$$-\tau(\beta)\,\mathbf{X}^{T}\Lambda(\beta)\,(\mathbf{y} - \mathbf{X}\beta) + \lambda D\beta = 0. \tag{29}$$

A further simple rearrangement then expresses the coefficient vector $\hat{\beta}_S$ in an iteration form:
$$\beta = \left(\mathbf{X}^{T}\Lambda(\beta)\,\mathbf{X} + \frac{\lambda}{\tau(\beta)}D\right)^{-1}\mathbf{X}^{T}\Lambda(\beta)\,\mathbf{y}, \tag{30}$$
which forms the basis for constructing an iterative algorithm for the penalized $S$-regression spline.

The above derivations verify that $\hat{\beta}_S$ is actually the solution to the following weighted penalized least squares regression problem:
$$\min_{\beta}\;\left\|\Lambda(\beta)^{1/2}(\mathbf{y} - \mathbf{X}\beta)\right\|^{2} + \frac{\lambda}{\tau(\beta)}\beta^{T}D\beta. \tag{31}$$

This representation indicates that the GCV criterion in (11) can be employed straightforwardly with response variable $\tilde{\mathbf{y}} = \Lambda(\beta)^{1/2}\mathbf{y}$, weighted design matrix $\tilde{\mathbf{X}} = \Lambda(\beta)^{1/2}\mathbf{X}$, and penalty term $\lambda/\tau(\beta)$. Meanwhile, because some weight elements may be zero, the penalty parameter $\lambda$ is calculated based on the following regularized GCV criterion:
$$\mathrm{RGCV}(\lambda) = \frac{n_w\left\|\Lambda(\beta)^{1/2}(\mathbf{y} - \mathbf{X}\beta)\right\|^{2}}{\left(n_w - \mathrm{tr}(\mathbf{H}_{\lambda})\right)^{2}}, \tag{32}$$
where $\mathbf{H}_{\lambda} = \tilde{\mathbf{X}}\big(\tilde{\mathbf{X}}^{T}\tilde{\mathbf{X}} + (\lambda/\tau(\beta))D\big)^{-1}\tilde{\mathbf{X}}^{T} = \Lambda(\beta)^{1/2}\mathbf{X}\big(\mathbf{X}^{T}\Lambda(\beta)\mathbf{X} + (\lambda/\tau(\beta))D\big)^{-1}\mathbf{X}^{T}\Lambda(\beta)^{1/2}$, and $n_w$ denotes the number of nonzero elements in the weight matrix $\Lambda(\beta)$.

Consequently, based on the foregoing in-depth discussion, the $S$-estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that, since the employed loss function $\rho_d$ is bounded, giving rise to the nonconvexity of $\hat{\sigma}_{\varepsilon}(\beta)$, the objective function in (21) may have several critical points, each of which corresponds only to a local minimum. The critical point to which the iterations of the above algorithm converge therefore depends entirely on the initial regression coefficient vector $\hat{\beta}^{(0)}$ prescribed in Step 0 of Algorithm 2.

One possible way to cope with this issue, proposed by Tharmaratnam et al. [29], is based on random sampling: first create $J$ random subsamples of a proper size, for example one fifth of the sample number, namely $n_s = \max(p + K + 1, \lfloor n/5 \rfloor)$, from the original sample observations. Each subsample brings about an initial candidate for the coefficient vector, assigned for instance by least squares fitting of the penalized regression spline, or by the $M$-type estimator at a small additional computational cost; the initial regression coefficient vector of the $j$th subsample is denoted by $\hat{\beta}_j^{(0)}$, starting from which the final convergent vector produced through Algorithm 2 is written as $\hat{\beta}_j^{\star}$, $j = 1, \dots, J$. Eventually, the global optimum is taken among all these $J$ convergent vectors as the one corresponding to the minimum objective function in (21), namely
$$\hat{\beta}_S = \arg\min_{1 \le j \le J}\; n\hat{\sigma}_{\varepsilon}^{2}(\hat{\beta}_j^{\star}) + \lambda\,\hat{\beta}_j^{\star T}D\,\hat{\beta}_j^{\star}. \tag{33}$$

Therefore, the robust $S$-estimator for penalized regression splines can be computed as $\hat{\mathbf{m}}_S = \mathbf{X}\hat{\beta}_S$.
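To make the criterion concrete, the following is a minimal sketch of evaluating the regularized GCV score (32) over a grid of penalty values. It assumes a diagonal weight matrix and, for illustration only, substitutes a cubic polynomial basis with unit weights for the spline design; all names are hypothetical.

```python
import numpy as np

def rgcv(lam, X, y, W, D, tau):
    """Regularized GCV score (32) for one candidate penalty value lam.

    W is the diagonal weight matrix Lambda(beta) from the current iterate,
    and tau is the scale factor tau(beta); the effective ridge penalty is
    lam / tau.
    """
    Wh = np.sqrt(W)                        # Lambda^{1/2} (W is diagonal)
    Xt, yt = Wh @ X, Wh @ y                # weighted design and response
    A = Xt.T @ Xt + (lam / tau) * D
    H = Xt @ np.linalg.solve(A, Xt.T)      # hat matrix H_lambda
    beta = np.linalg.solve(A, Xt.T @ yt)
    rss = np.sum((yt - Xt @ beta) ** 2)    # ||Lambda^{1/2}(y - X beta)||^2
    nw = np.count_nonzero(np.diag(W))      # number of nonzero weights n_w
    return nw * rss / (nw - np.trace(H)) ** 2

# Toy illustration: choose lambda on a grid, with a cubic polynomial
# stand-in for the spline basis and all weights equal to one.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 50))
X = np.vander(x, 4, increasing=True)
y = np.sin(2 * np.pi * (1 - x) ** 2) + 0.1 * rng.standard_normal(50)
W = np.eye(50)
D = np.diag([0.0, 0.0, 1.0, 1.0])          # penalize only curvature terms
lams = 10.0 ** np.arange(-4, 3)
best = min(lams, key=lambda l: rgcv(l, X, y, W, D, tau=1.0))
```

In the actual algorithm, `W` and `tau` would come from the current iterate rather than being fixed as above.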

3. Experimental Performance Evaluation

This section is devoted to comparative experiments, on both simulated synthetic data and one real weather balloon data set, with the foregoing penalized spline smoothing methods: the nonrobust penalized least squares regression spline (LS) presented in (7) and the two robust smoothing approaches, the $M$-type estimator [27] and $S$-estimation [29] for penalized regression splines, as described in Algorithms 1 and 2.

Step 0. Input an initial regression coefficient vector $\hat{\beta}^{(0)}$, the termination tolerance $\varepsilon$, and the maximum iteration number $\mathrm{Iter}_{\max}$. Set $k = 0$ and enter the following loop.

Step 1. Estimate $\hat{\sigma}_{\varepsilon}$ based on (24), that is, $\hat{\sigma}_{\varepsilon}(\hat{\beta}^{(k)}) = \mathrm{MADN}(\mathbf{r}(\hat{\beta}^{(k)}))$, the normalized median absolute deviation about the median of the current residuals.

Step 2. Compute the intermediate variables $\tau(\hat{\beta}^{(k)})$ and $\Lambda(\hat{\beta}^{(k)})$.

Step 3. Calculate the coefficient vector $\hat{\beta}^{(k+1)}$ according to (30) as
$$\hat{\beta}^{(k+1)} = \left(\mathbf{X}^{T}\Lambda(\hat{\beta}^{(k)})\,\mathbf{X} + \frac{\lambda}{\tau(\hat{\beta}^{(k)})}D\right)^{-1}\mathbf{X}^{T}\Lambda(\hat{\beta}^{(k)})\,\mathbf{y},$$
where the penalty parameter $\lambda$ is determined by the regularized GCV criterion (32).

Step 4. If $\|\hat{\beta}^{(k+1)} - \hat{\beta}^{(k)}\| < \varepsilon\|\hat{\beta}^{(k)}\|$ or $k = \mathrm{Iter}_{\max}$, terminate and output the convergent coefficient vector for the penalized regression spline of the $S$-estimator, $\hat{\beta}^{\star} = \hat{\beta}^{(k+1)}$; else accumulate the counter by setting $k = k + 1$ and repeat from Step 1.

Algorithm 2: $S$-estimator iterative algorithm for penalized regression splines.
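As a concrete illustration of Algorithm 2, the sketch below implements the iteration in Python, assuming Tukey's biweight loss with tuning constant $d = 1.547$ and, for simplicity, a fixed penalty $\lambda$ (the algorithm above additionally reselects $\lambda$ by the regularized GCV in each pass). The basis, names, and tuning values are illustrative, not the authors' MATLAB code.

```python
import numpy as np

def biweight_psi(u, d=1.547):
    """rho'(u) for Tukey's biweight; d = 1.547 targets 50% breakdown."""
    return np.where(np.abs(u) <= d, u * (1.0 - (u / d) ** 2) ** 2, 0.0)

def s_step(beta, X, y, D, lam):
    """One pass of Algorithm 2: MADN scale, weights Lambda, tau, ridge solve."""
    r = y - X @ beta
    sigma = 1.4826 * np.median(np.abs(r - np.median(r)))  # MADN of residuals
    u = r / sigma
    w = np.ones_like(u)                                   # diagonal of Lambda
    nz = u != 0.0
    w[nz] = biweight_psi(u[nz]) / u[nz]                   # rho'(u)/u -> 1 at 0
    tau = len(y) * sigma ** 2 / np.sum(w * r * r)         # tau(beta)
    A = X.T @ (w[:, None] * X) + (lam / tau) * D
    return np.linalg.solve(A, X.T @ (w * y))              # update (30)

def s_spline(X, y, D, lam, tol=1e-6, itermax=100):
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)    # LS start
    for _ in range(itermax):
        beta_new = s_step(beta, X, y, D, lam)
        if np.linalg.norm(beta_new - beta) < tol * np.linalg.norm(beta):
            return beta_new
        beta = beta_new
    return beta

# Fit contaminated data: 10% of the points pushed far off the curve.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
X = np.vander(x, 4, increasing=True)
y = np.sin(2 * np.pi * (1 - x) ** 2) + 0.1 * rng.standard_normal(100)
y[::10] += 10.0
beta_s = s_spline(X, y, np.eye(4), lam=1e-3)
```

Because the objective is nonconvex, a practical implementation would run this loop from the $J$ random-subsample starts described above and keep the solution minimizing (21), per (33).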

3.1. Configuration and Criteria. It should be incidentally mentioned that the exponent $p = 3$ for the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14]: $\kappa_k = \left((k+1)/(K+2)\right)$th sample quantile of $\mathrm{Unique}(\mathbf{x})$, $k = 1, \dots, K$, together with a simple choice of the knot number $K = \min\!\left(35, \tfrac{1}{4} \times \text{number of } \mathrm{Unique}(\mathbf{x})\right)$, where $\mathbf{x} = (x_1, x_2, \dots, x_n)^{T}$. Besides, the penalty parameter $\lambda$ is determined by the generalized cross-validation (GCV) criterion for the LS fitting and $M$-estimation methods and by the regularized GCV criterion for the $S$-estimation method. Meanwhile, the termination tolerance $\varepsilon = 10^{-6}$ and a maximum iteration number of 100 are assigned as the initial input parameters in Algorithms 1 and 2, and $J = 5$ subsamples are randomly generated for exploring the ultimate optimum.

The hardware and software environment established for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) running MATLAB R2014a on Windows 7 SP1 Ultimate (64-bit).
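The knot rule above can be sketched in a few lines; the quantile-based placement follows the stated formula, with $\mathrm{Unique}(\mathbf{x})$ taken as the distinct design points. The function name is illustrative.

```python
import numpy as np

def default_knots(x):
    """Knot rule of Ruppert et al. used in the paper:
    K = min(35, #Unique(x)/4) knots placed at the ((k+1)/(K+2))-th
    sample quantiles of the unique design points, k = 1, ..., K."""
    ux = np.unique(x)
    K = int(min(35, len(ux) / 4))
    probs = [(k + 1) / (K + 2) for k in range(1, K + 1)]
    return np.quantile(ux, probs)
```

For the simulated samples below ($n = 200$ distinct points), this rule caps the knot count at $K = 35$.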

3.2. Synthetic Data. The simulation study is designed the same as in Lee and Oh [27], with the testing function
$$y_i = \sin\!\left(2\pi(1 - x_i)^{2}\right) + 0.5\varepsilon_i, \quad i = 1, \dots, n. \tag{34}$$
The designed samples are generated with their horizontal coordinates uniformly drawn from the open interval $(0, 1)$, and the error distribution $\varepsilon$ takes nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution $U(-1, 1)$; (2) standard normal distribution $N(0, 1)$; (3) logistic distribution with pdf $e^{x}/(1 + e^{x})^{2}$; (4) double exponential distribution with pdf $\tfrac{1}{2}e^{-|x|}$; (5) contaminated normal distribution I, $0.95N(0, 1) + 0.05N(0, 10)$, that is, the error drawn 95% from $N(0, 1)$ and 5% from $N(0, 10)$; (6) contaminated normal distribution II, $0.9N(0, 1) + 0.1N(0, 10)$; (7) slash distribution, defined as the ratio of two statistically independent random variables following (2) and (1), that is, $N(0, 1)/U(0, 1)$; (8) Cauchy distribution with pdf $1/\pi(1 + x^{2})$, whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution $0.9N(0, 1) + 0.1N(30, 1)$. For the sake of reducing the simulation variability, the sample size is fixed at $n = 200$. It should be mentioned incidentally that error scenarios (1)-(8) are symmetrically distributed and listed in ascending order of tail heaviness, and scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from $N(30, 1)$.
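For reference, samplers for several of the heavier-tailed scenarios can be sketched as follows: the mixtures are drawn componentwise, and the Cauchy sampler uses the inverse-transform method mentioned above. Function names and defaults are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def contaminated_normal(n, frac=0.05, scale=10.0):
    """(1-frac) N(0,1) + frac N(0, scale^2): the CN I / CN II scenarios."""
    heavy = rng.random(n) < frac
    return np.where(heavy, rng.normal(0.0, scale, n), rng.normal(0.0, 1.0, n))

def slash(n):
    """Slash distribution N(0,1)/U(0,1), an extremely heavy-tailed error."""
    return rng.standard_normal(n) / rng.random(n)

def cauchy(n):
    """Standard Cauchy via inverse transform: F^{-1}(u) = tan(pi(u - 1/2))."""
    return np.tan(np.pi * (rng.random(n) - 0.5))

def asymmetric(n, frac=0.1, shift=30.0):
    """0.9 N(0,1) + 0.1 N(30,1): the one-sided outliers of scenario (9)."""
    out = rng.random(n) < frac
    return np.where(out, rng.normal(shift, 1.0, n), rng.normal(0.0, 1.0, n))

# One simulated data set under the Cauchy scenario, per (34) with n = 200:
n = 200
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * (1.0 - x) ** 2) + 0.5 * cauchy(n)
```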

Figure 1 presents the spline regression curves from the penalized LS fitting, the $M$-estimator, and the $S$-estimator for one simulated set of scattered samples under the nine possible scenarios, each shown in a subfigure panel comparing the regression results of the three penalized spline smoothing methods.

For further quantitative evaluation of these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized here to quantify the goodness of fit between the regression spline $\hat{m}_j(x)$ of the $j$th simulation replicate, under each of the above outlier scenarios, and the true simulated function $m(x) = \sin(2\pi(1 - x)^{2})$:
$$\mathrm{ASE}_j = \frac{1}{n}\sum_{i=1}^{n}\left(m(x_i) - \hat{m}_j(x_i)\right)^{2}, \quad j = 1, \dots, J, \tag{35}$$
with the replicate number specified as $J = 10^{3}$ for each distribution.
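Computing (35) for one replicate is a one-line operation; the sketch below evaluates it on the true test function as a sanity check.

```python
import numpy as np

def ase(m_true, m_hat):
    """Average squared error (35) between true and fitted curve values."""
    m_true, m_hat = np.asarray(m_true, float), np.asarray(m_hat, float)
    return float(np.mean((m_true - m_hat) ** 2))

x = np.linspace(0.0, 1.0, 200)
m = np.sin(2.0 * np.pi * (1.0 - x) ** 2)
print(ase(m, m))  # a perfect fit gives 0.0
```

In the study, the $J = 10^{3}$ per-replicate values are summarized by their median, as in panel (d) of Figure 2.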

Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of the fitting


Figure 1: Spline regression curves (true testing function in black solid line) from the penalized LS fitting (blue dotted), $M$-estimator (green dot-dashed), and $S$-estimator (red dashed) for the simulated samples (black circle-dot) in the 9 possible scenarios: Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, and Asymmetric.

accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate the box plots of the ASEs and code running times using (a) penalized LS-estimation, (b) penalized $M$-estimation, and (c) penalized $S$-estimation for the spline smoothing methods, together with comparisons of the median values of these two indicators for the simulated samples from each of the 9 distribution scenarios.

From the ASE comparisons in Figure 2, it can be found that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with more complex conditions, such as the three last error distributions possessing heavy tails, thereby bringing about large ASEs, which are specially rescaled to


Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized $M$-estimation, (c) penalized $S$-estimation, and (d) the comparison of median ASE for the simulated samples from each of the 9 distribution scenarios, with replicate number $J = 10^{3}$.

the additional vertical coordinate on the right, so as to be accommodated with the ASE results for the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized $M$-type estimation presents satisfactory regression fits for the distributions with moderate tail weights, like scenarios (2)-(5), which are located in the middle of each panel. Meanwhile, the robust $S$-estimator provides comparatively meritorious smoothing results, especially for complicated circumstances like the slash, Cauchy, or asymmetric distributions with heavy tails.

In terms of robustness, the LS-estimator simply does not qualify for mention in this respect. Instead, the $M$-estimator demonstrates stable performance for the designed samples with moderate perturbation error, and the $S$-estimator behaves dominantly stable across all kinds of error source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming method is the $S$-estimator, followed by the $M$-estimator, with the least consuming being the LS-estimator, since it is treated as the initial curve estimate for the two former approaches. Moreover, although both the $S$-estimator and the $M$-estimator consist of similar iterative processes, random sampling is further involved in the former for exploring the ultimate solution of a nonconvex optimization problem.

3.3. Balloon Data. This section is devoted to comparative experiments with the aforementioned penalized spline smoothing methods on a set of weather balloon data [35]. The data consist of 4984 observations collected by a balloon at about 30 kilometers altitude, with the outliers caused by the fact that the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

Figure 4 displays these balloon observational data together with the regression curves fitted by the penalized LS method, the robust penalized $M$-estimation method, and the robust $S$-estimation method. It can be explicitly observed by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers (this phenomenon appears even more significant around the horizontal-axis position of 0.8), while the other two robust splines pass approximately through the majority of the observations. Meanwhile,


Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized $M$-estimation, (c) penalized $S$-estimation, and (d) the comparison of median run time for the simulated samples from each of the 9 distribution scenarios, with replicate number $J = 10^{3}$.


Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), $M$-estimator (green dot-dashed), and $S$-estimator (red solid line).

the $S$-estimator seems to behave slightly better than the $M$-estimator, since the former accords with the data trend better, especially at the ridge and the two terminal nodes.

At the same time, on the platform specified in Section 3.1, the algorithm execution times for this weather balloon data set with 4984 observations are 2.3 s, 3.51 s, and 5.89 s for the penalized LS regression, the penalized $M$-type estimator, and the penalized $S$-estimator, respectively.

4. Conclusions

This paper conducts a comprehensive comparative analysis of the two currently popular robust penalized spline smoothing methods, the penalized $M$-type estimator and penalized $S$-estimation, both of which are reelaborated starting from their origins, with their derivation processes reformulated and the ultimate algorithms reorganized under a unified framework.

The experiments comprise first a simulated synthetic data set with outliers in 9 possible scenarios and then a practical weather balloon data set for the aforementioned penalized regression splines, whose performances are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.

The conclusions from these experiments are that robust penalized spline smoothing methods can be resistant to the influence of outliers, in contrast with the nonrobust penalized LS spline fitting method. Further, the $M$-estimator exerts stable performance only for observed samples with moderate perturbation error, while the $S$-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, though necessarily at more execution time. Therefore, taking both the solution accuracy and the implementation efficiency of the fitted regression spline into account, the $M$-type estimator is generally preferred, whereas if only the fitting credibility is of concern, the $S$-estimator naturally deserves to be the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3-21, 1989.

[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107-125, 2010.

[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205-223, 2000.

[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502-518, 1986.

[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071-1085, 1990.

[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255-270, 1997.

[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661-675, 1973.

[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255-265, 1973.

[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271-293, 1998.

[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89-102, 1996.

[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936-940, 1999.

[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205-223, 2000.

[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735-757, 2002.

[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.

[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.

[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.

[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95-114, 2003.

[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263-281, 2014.

[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73-101, 1964.

[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.

[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley, 1981.

[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530-551, 1983.

[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256-272, Springer, New York, NY, USA, 1984.

[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414-427, 2006.

[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747-752, 2011.

[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15-30, 2004.

[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159-171, 2007.

[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217-230, 2013.

[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609-625, 2010.

[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377-403, 1979.

[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157-184, 1983.

[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699-703, 1991.

[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147-192, 1974.

[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.

[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782-801, 1993.


Page 6: Research Article Comparative Analysis for Robust Penalized ... · CV ( ) = 1 =1 ... Another more robust and relatively latest approach for smoothing the noisy data is the -estimation

6 Mathematical Problems in Engineering

Step 0. Input an initial coefficient vector β^(0), the termination tolerance ε, and the maximum iteration number Iter_max. Set k = 0 and enter the following loop.

Step 1. Estimate the robust scale σ̂(β^(k)) based on (24), that is, via the normalized median absolute deviation (MADN) of the residual vector r(β^(k)).

Step 2. Compute the intermediate variables τ(β^(k)) and Λ(β^(k)).

Step 3. Calculate the coefficient vector β^(k+1) according to (30) as

    β^(k+1) = (X^T Λ(β^(k)) X + (λ / τ(β^(k))) D)^{-1} X^T Λ(β^(k)) y,

where the penalty parameter λ is selected by the regularized GCV criterion (32).

Step 4. If ‖β^(k+1) − β^(k)‖ < ε‖β^(k)‖ or k = Iter_max, terminate and output the convergent coefficient vector of the penalized regression spline S-estimator, β̂ = β^(k+1); otherwise increment the counter, k = k + 1, and repeat from Step 1.

Algorithm 2: S-estimator iterative algorithm for penalized regression splines.
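To make the iteration concrete, the loop of Algorithm 2 can be sketched numerically as below. This is an illustrative sketch, not the authors' code: the Tukey bisquare weight function (with c = 1.547), the MADN scale estimate, and the scalar τ taken as the mean weight are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

def madn(r):
    # Normalized median absolute deviation: a robust scale estimate
    # (the 0.6745 factor makes it consistent at the normal distribution).
    return np.median(np.abs(r - np.median(r))) / 0.6745

def bisquare_weights(u, c=1.547):
    # Tukey bisquare weights; c = 1.547 is the usual 50%-breakdown tuning.
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w

def s_estimator_spline(X, D, y, lam, tol=1e-6, iter_max=100):
    # Iteratively reweighted penalized LS in the spirit of Algorithm 2:
    # start from the nonrobust penalized LS fit, then alternate scale
    # estimation and weighted ridge-type solves until convergence.
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)   # Step 0: LS start
    for k in range(iter_max):
        r = y - X @ beta
        sigma = madn(r)                                  # Step 1: robust scale
        w = bisquare_weights(r / sigma)                  # Step 2: Lambda diagonal
        tau = np.mean(w)                                 # Step 2: scalar tau (assumed form)
        XtW = X.T * w                                    # X^T Lambda, Lambda kept implicit
        beta_new = np.linalg.solve(XtW @ X + (lam / tau) * D, XtW @ y)  # Step 3
        if np.linalg.norm(beta_new - beta) < tol * np.linalg.norm(beta):
            return beta_new                              # Step 4: converged
        beta = beta_new
    return beta
```

Because the bisquare loss is nonconvex, this iteration only finds a local solution, which is why the paper restarts it from several random subsamples.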

3.1. Configuration and Criteria. It should be mentioned that the exponential order p = 3 of the truncated power basis is employed for all three methods, and the knot locations are selected in accordance with the choice recommended by Ruppert et al. [14]: κ_k = the ((k + 1)/(K + 2))th sample quantile of Unique(x), k = 1, ..., K, together with the simple choice of knot number K = min(35, (1/4) × number of Unique(x)), where x = (x_1, x_2, ..., x_n)^T. Besides, the corresponding penalty parameter λ is determined by the generalized cross-validation (GCV) criterion for the LS fitting and M-estimation methods, and by the regularized GCV criterion for the S-estimation method. Meanwhile, the termination tolerance ε = 10^(-6) and the maximum iteration number 100 are assigned as the initial input parameters of Algorithms 1 and 2, and J = 5 subsamples are randomly generated for exploring the ultimate optimum.
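By way of illustration, the knot rule and the degree p = 3 truncated power basis described here can be coded in a few lines (a sketch under the stated rule, not the authors' implementation):

```python
import numpy as np

def default_knots(x):
    # Knot choice recommended by Ruppert et al. [14]: K knots, where
    # K = min(35, number of unique x values / 4), and the k-th knot is
    # the ((k + 1)/(K + 2))-th sample quantile of the unique x values.
    ux = np.unique(x)
    K = int(min(35, len(ux) / 4))
    probs = [(k + 1) / (K + 2) for k in range(1, K + 1)]
    return np.quantile(ux, probs)

def tpf_basis(x, knots, p=3):
    # Truncated power basis of degree p: [1, x, ..., x^p, (x - kappa_k)_+^p].
    cols = [x ** j for j in range(p + 1)]
    cols += [np.clip(x - kap, 0.0, None) ** p for kap in knots]
    return np.column_stack(cols)
```

For n = 200 distinct design points this yields K = 35 knots and a basis matrix with p + 1 + K = 39 columns.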

Meanwhile, the hardware and software environment established for the experiments below is a Dell XPS 8700 desktop (CPU: Intel Core i7-4770, 3.40 GHz) running MATLAB R2014a on Windows 7 SP1 Ultimate (64-bit).

3.2. Synthetic Data. The simulation study is designed as in Lee and Oh [27], with the testing function

    y_i = sin(2π(1 − x_i)²) + 0.5ε_i,  i = 1, ..., n.  (34)

The samples are generated with their horizontal coordinates drawn uniformly from the open interval (0, 1), and the error distribution ε follows one of nine possible scenarios, with the probability density function (pdf) specified where necessary: (1) uniform distribution U(−1, 1); (2) standard normal distribution N(0, 1); (3) logistic distribution with pdf e^x/(1 + e^x)²; (4) double exponential distribution with pdf (1/2)e^(−|x|); (5) contaminated normal distribution I, 0.95N(0, 1) + 0.05N(0, 10), that is, error drawn 95% from N(0, 1) and 5% from N(0, 10); (6) contaminated normal distribution II, 0.9N(0, 1) + 0.1N(0, 10); (7) slash distribution, defined as the ratio N(0, 1)/U(0, 1) of two statistically independent random variables; (8) Cauchy distribution with pdf 1/(π(1 + x²)), whose simulated outliers can be generated from the uniform distribution by the inverse transformation technique [34]; and (9) an asymmetric normal mixture distribution, 0.9N(0, 1) + 0.1N(30, 1). To reduce simulation variability, the sample size is fixed at n = 200. It should be mentioned that error scenarios (1)–(8) are symmetrically distributed and listed in ascending order of tail heaviness, while scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from N(30, 1).
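For reproducibility, the nine error scenarios and testing function (34) could be simulated as follows. This is a sketch: in particular, it assumes that N(0, 10) denotes a normal with variance 10, and the scenario labels are ad hoc names introduced here.

```python
import numpy as np

def sample_error(name, n, rng):
    # Hedged sketch of the nine error scenarios of Section 3.2.
    if name == "uniform":
        return rng.uniform(-1.0, 1.0, n)
    if name == "normal":
        return rng.standard_normal(n)
    if name == "logistic":
        return rng.logistic(0.0, 1.0, n)           # pdf e^x / (1 + e^x)^2
    if name == "d-exponential":
        return rng.laplace(0.0, 1.0, n)            # pdf (1/2) e^{-|x|}
    if name in ("cn1", "cn2"):                     # contaminated normals I and II
        eps = 0.05 if name == "cn1" else 0.10
        sd = np.where(rng.random(n) < eps, np.sqrt(10.0), 1.0)
        return sd * rng.standard_normal(n)
    if name == "slash":                            # N(0, 1) / U(0, 1)
        return rng.standard_normal(n) / rng.uniform(0.0, 1.0, n)
    if name == "cauchy":                           # inverse transform of U(0, 1)
        return np.tan(np.pi * (rng.random(n) - 0.5))
    if name == "asymmetric":                       # 0.9 N(0,1) + 0.1 N(30,1)
        mix = rng.random(n) < 0.10
        return np.where(mix, rng.normal(30.0, 1.0, n), rng.standard_normal(n))
    raise ValueError(name)

def simulate(name, n=200, rng=None):
    # Testing function (34): y_i = sin(2*pi*(1 - x_i)^2) + 0.5 * eps_i.
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * (1.0 - x) ** 2) + 0.5 * sample_error(name, n, rng)
    return x, y
```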

Figure 1 presents the spline regression curves from the penalized LS fitting, M-estimator, and S-estimator for one simulated set of scattered samples in each of the nine scenarios, with each subfigure panel demonstrating the comparative regression results of the three penalized spline smoothing methods.

For further quantitative evaluation of the three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is used to quantify the goodness of fit between the regression spline m̂_j(x), obtained in the jth simulation replicate of each outlier scenario, and the true simulated function m(x) = sin(2π(1 − x)²):

    ASE_j = (1/n) Σ_{i=1}^{n} (m(x_i) − m̂_j(x_i))²,  j = 1, ..., J,  (35)

with the replicate number specified as J = 10³ for each distribution.
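The ASE of (35) is straightforward to compute; a minimal sketch, where m and m_hat are callables evaluated at the design points:

```python
import numpy as np

def ase(m, m_hat, x):
    # Average squared error (35) between the true curve m and a fitted
    # spline m_hat, both evaluated at the design points x.
    return np.mean((m(x) - m_hat(x)) ** 2)
```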

[Figure 1 appears here: nine panels over x ∈ (0, 1), one per error scenario (Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric).]

Figure 1: Spline regression curves (with the true testing function as a black solid line) from the penalized LS fitting (blue dotted), M-estimator (green dot-dashed), and S-estimator (red dashed) for the simulated samples (black circle-dots) in the 9 possible scenarios.

Performances of the penalized spline smoothing methods are thoroughly evaluated in terms of fitting accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate box plots of the ASEs and of the code running times using (a) penalized LS-estimation, (b) penalized M-estimation, and (c) penalized S-estimation, as well as comparisons of the median values of these two indicators, for the simulated samples from each of the 9 distribution scenarios.

From the ASE comparisons in Figure 2, it can be seen that the penalized LS-estimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distributions. However, it cannot cope with more complex conditions, such as the three last error distributions with their heavy tails, thereby bringing about large ASEs, which are specially rescaled to

the additive vertical coordinate on the right, so as to be accommodated with the ASE results of the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized M-type estimation presents satisfactory regression fits for the distributions with moderate tail weights, such as scenarios (2)–(5), located in the middle of each panel. Meanwhile, the robust S-estimator provides comparatively meritorious smoothing results, especially under complicated circumstances such as the slash, Cauchy, or asymmetric distributions with heavy tails.

[Figure 2 appears here: per-scenario box plots; the LS panel carries an additional rescaled vertical axis on the right for the large ASEs of the heavy-tailed scenarios.]

Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios with replicate number J = 10³.

In terms of robustness, the LS-estimator simply does not qualify for consideration. Instead, the M-estimator demonstrates stable performance for the designed samples with moderate perturbation error, and the S-estimator behaves dominantly stably across all the error-source scenarios.

Unfortunately, as for execution efficiency, the most time-consuming method is the S-estimator, followed by the M-estimator, with the least costly being the LS-estimator, since the latter serves as the initial curve estimate for the two former approaches. Moreover, although the S-estimator and M-estimator consist of similar iterative processes, random subsampling is additionally involved in the former for exploring the ultimate solution of a nonconvex optimization problem.
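The random-subsample search mentioned above can be sketched as follows; here `fit` and `objective` are hypothetical placeholders for the iterative solver and for the robust scale of the residuals, and the subsample size m is an arbitrary choice for illustration.

```python
import numpy as np

def best_of_subsamples(X, D, y, lam, fit, objective, J=5, m=50, seed=0):
    # Because the S-objective is nonconvex, the iteration is restarted
    # from several random subsamples (J = 5 in Section 3.1) and the
    # candidate with the smallest robust objective value is kept.
    rng = np.random.default_rng(seed)
    best, best_val = None, np.inf
    for _ in range(J):
        idx = rng.choice(len(y), size=m, replace=False)
        # Subsample least-squares fit as the starting point.
        beta0 = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        beta = fit(X, D, y, lam, beta0)
        val = objective(y - X @ beta)
        if val < best_val:
            best, best_val = beta, val
    return best
```

This extra search is the main reason the S-estimator costs more run time than the M-estimator, despite the similar inner loops.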

3.3. Balloon Data. This section is devoted to comparative experiments with the aforementioned penalized spline smoothing methods on a set of weather balloon data [35]. The data consist of 4984 observations collected by a balloon at about 30 kilometers altitude; the outliers arise because the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

[Figure 3 appears here: per-scenario box plots of run times, with the LS panel on a ×10⁻³ s scale.]

Figure 3: Box plots of run times using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios with replicate number J = 10³.

Figure 4 displays these balloon observations together with the regression curves fitted by the penalized LS method, the robust penalized M-estimation method, and the robust penalized S-estimation method. It can be explicitly observed by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, a phenomenon that is even more pronounced around the horizontal-axis position 0.8, while the two robust splines pass approximately through the majority of the observations. Meanwhile, the S-estimator appears to behave slightly better than the M-estimator, since the former accords better with the data trend, especially at the ridge and at the two terminal nodes.

[Figure 4 appears here: the balloon data over x ∈ (0, 1) with the three fitted curves.]

Figure 4: Regression curves fitted to the balloon data using the penalized spline smoothing models, including the penalized LS fitting (blue dashed), M-estimator (green dot-dashed), and S-estimator (red solid line).

At the same time, under the platform specified in Section 3.1, the algorithm execution times on this weather balloon data set of 4984 observations are 2.3 s, 3.51 s, and 5.89 s for the penalized LS regression, the penalized M-type estimator, and the penalized S-estimator, respectively.

4. Conclusions

This paper conducts a comprehensive comparative analysis of the two currently popular robust penalized spline smoothing methods, the penalized M-type estimator and the penalized S-estimator, both of which are re-elaborated from their origins, with their derivations reformulated and the resulting algorithms reorganized under a unified framework.

The experiments comprise, first, a synthetic data set with outliers drawn from 9 possible scenarios and, second, a practical weather balloon data set; on both, the penalized regression splines are thoroughly evaluated in terms of fitting accuracy, robustness, and execution time.

These experiments show that the robust penalized spline smoothing methods resist the influence of outliers, in contrast with the nonrobust penalized LS spline fitting method. Further, the M-estimator performs stably only for samples with moderate perturbation error, whereas the S-estimator behaves well even for cases with heavy tail weights or a higher percentage of contamination, albeit at the cost of more execution time. Therefore, when both the accuracy and the efficiency of the fitted regression spline matter, the M-type estimator is generally preferred; if only fitting credibility is of concern, the S-estimator naturally deserves to be the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3–21, 1989.
[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107–125, 2010.
[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.
[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502–518, 1986.
[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071–1085, 1990.
[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255–270, 1997.
[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661–675, 1973.
[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255–265, 1973.
[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271–293, 1998.
[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89–102, 1996.
[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936–940, 1999.
[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.
[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735–757, 2002.
[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.
[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.
[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.
[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95–114, 2003.
[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263–281, 2014.
[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.
[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley Online Library, 1981.
[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530–551, 1983.
[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256–272, Springer, New York, NY, USA, 1984.
[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.
[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747–752, 2011.
[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15–30, 2004.
[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159–171, 2007.
[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217–230, 2013.
[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609–625, 2010.
[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.
[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157–184, 1983.
[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699–703, 1991.
[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147–192, 1974.
[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.
[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–801, 1993.


Page 7: Research Article Comparative Analysis for Robust Penalized ... · CV ( ) = 1 =1 ... Another more robust and relatively latest approach for smoothing the noisy data is the -estimation

Mathematical Problems in Engineering 7

0 02 04 06 08 1minus2

minus1

0

1

2 Uniform

(a)

0 02 04 06 08 1minus2

minus1

0

1

2

3 Normal

(b)

0 02 04 06 08 1minus4

minus2

0

2

4 Logistic

(c)

0 02 04 06 08 1minus4

minus2

0

2

4 D-exponential

(d)

0 02 04 06 08 1minus15

minus10

minus5

0

5

10 CN1

(e)

0 02 04 06 08 1minus10

minus5

0

5

10 CN2

(f)

minus20

minus10

0

10

20 Slash

0 02 04 06 08 1

Data with outliersTrue curveLS fitting

M-estimatorS-estimator

(g)

minus10

minus5

0

5

10

15 Cauchy

0 02 04 06 08 1

Data with outliersTrue curveLS fitting

M-estimatorS-estimator

(h)

minus5

0

5

10

15 Asymmetric

0 02 04 06 08 1

Data with outliersTrue curveLS fitting

M-estimatorS-estimator

(i)

Figure 1 Spline regression curves (with true testing function in black solid line) from the penalized LS fitting (blue dotted)119872-estimator(green dot-dashed) and 119878-estimator (red dashed) for the simulated samples (black circle-dot) in 9 possible scenarios

accuracy robustness and execution time In the followingFigures 2 and 3 illustrate the box plots of both ASEs and coderunning times using (a) penalized LS-estimation (b) penal-ized 119872-estimation and (c) penalized 119878-estimation for thespline smoothing methods as well as their median valuecomparisons of these two indicators for the simulated sam-ples from each of the 9 distribution scenarios

From the ASE comparisons in Figure 2 it can be foundthat the penalized LS-estimation behaves well for the sym-metric error scenarios with small tail weights such as thesimple uniform and standard normal distribution Howeverit cannot cope with the complex conditions such as thethree latter error distributions possessing heavy tails therebybinging about large ASEs which are specially rescaled to

8 Mathematical Problems in Engineering

0

005

01

015

02

025

ASE

for L

S fit

ting

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

25

20

15

10

5

0

(a)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

0

005

01

015

02

ASE

for M

-esti

mat

or

(b)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

0

005

01

015

02

ASE

for S

-esti

mat

or

(c)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

0

05

1

15

2

25

0

005

01

015

02

025

Med

ian

ASE

LS fittingM-estimatorS-estimator

(d)

Figure 2 Box plots of ASEs using (a) penalized LS-estimation (b) penalized 119872-estimation (c) penalized 119878-estimation and (d) thecomparison of Median ASE for the simulated samples from each of the 9 distribution scenarios with replicate number 119869 = 103

the additive vertical coordinate on the right so as to beaccommodated with the ASEs results for other error sourcesin the same window (refer to panels (a) and (d) in Figure 2)In comparison the robust penalized 119872-type estimationpresents satisfactory regression fittings for the distributionswith moderate tail weights like (2)sim(5) scenarios which arelocated at the middle in each panel Meanwhile the robust 119878-estimator provides the comparatively meritorious smoothingresults especially for the complicated circumstances like slashCauchy or the asymmetric distributions with heavy tails

In terms of the robustness LS-estimator simply does nothave the qualifications to be mentioned from this aspectInstead the 119872-estimator demonstrates stable performancefor the designed samples with moderate perturbation errorand the 119878-estimator behaves dominantly stable for all kindsof error source scenarios

Unfortunately as for the execution efficiency the mosttime-elapsed one is the 119878-estimator followed by the 119872-estimator with the least consuming estimator being the LS-estimator since it has been treated as the initial curve estimatefor the two former approaches Moreover although both 119878-estimator and119872-estimator are composed of similar iterative

process the random sampling theory is further involved inthe former model for exploring the ultimate solution for anonconvex optimization problem

33 Balloon Data This section is devoted to the comparativeexperiments of the aforementioned penalized spline smooth-ingmethods upon a set of weather balloon data [35]The dataconsist of 4984 observations collected from a balloon at about30 kilometers altitude with the outliers caused due to the factthat the balloon slowly rotates causing the ropes from whichthe measuring instrument is suspended to cut off the directradiation from the sun

As illustrated above Figure 4 displays these balloonobservational data together with the regression fitting curvesby the penalized LS method the robust penalized 119872-estimation method and the robust 119878-estimation methodBesides it can also be explicitly discovered by contrast thatthe non-robust LS regression curve suffers from the dis-turbance of the outliers and this phenomenon seems evenmore significantly around the position of horizontal axisbeing 08 while the other two robust splines pass throughapproximately the majority of the observations Meanwhile

Mathematical Problems in Engineering 9

3

4

5

6

7

Run

time f

or L

S fit

ting

(s)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

times10minus3

(a)

0

005

01

015

02

025

Run

time f

or M

-esti

mat

or (s

)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

(b)

00102030405060708

Run

time f

or S

-esti

mat

or (s

)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

(c)

0005

01015

02025

03035

04

Med

ian

run

time (

s)

Uni

form

Nor

mal

Logi

stic

D-e

xpon

entia

l

CN1

CN2

Slas

h

Cauc

hy

Asy

mm

etric

LS fittingM-estimatorS-estimator

(d)

Figure 3 Box plots of run times using (a) penalized LS-estimation (b) penalized 119872-estimation (c) penalized 119878-estimation and (d) thecomparison of median run time for the simulated samples from each of the 9 distribution scenarios with replicate number 119869 = 103

0 02 04 06 08 10

05

1

15

2

25

Balloon dataLS fitting

M-estimatorS-estimator

Figure 4 Regression fitted curves for the balloon data using penal-ized spline smoothing models including the penalized LS fitting(blue dashed) 119872-estimator (green dot-dashed) and 119878-estimator(red solid line)

the 119878-estimator seems to behave slightly more superior to the119872-estimator since the former accords with the data trendbetter especially at the ridge and two terminal nodes

At the same time under the platform specified in Sec-tion 31 the algorithm execution times of this weatherballoon data set with 4984 observations are 23 s 351 s and589 s for the penalized LS regression the penalized119872-typeestimator and the penalized 119878-estimator respectively

4 Conclusions

This paper conducts a comprehensively comparative analysisof the current two popular robust penalized spline smoothingmethods the penalized119872-type estimator and the penalized119878-estimation both of which are reelaborated starting fromtheir origins with their derivation process reformulated andthe ultimate algorithms reorganized under a unified frame-work

Experiments consist of firstly one simulated syntheticdata set with the outliers in 9 possible scenarios and then apractical weather balloon data set for the aforementionedpenalized regression spline whose performances are thor-oughly evaluated from the aspects of fitting accuracy robust-ness and execution time

Conclusions from these experiments are that robustpenalized spline smoothing methods can be resistant to the

10 Mathematical Problems in Engineering




Figure 2: Box plots of ASEs using (a) penalized LS-estimation, (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median ASE, for the simulated samples from each of the 9 distribution scenarios (Uniform, Normal, Logistic, D-exponential, CN1, CN2, Slash, Cauchy, Asymmetric), with replicate number J = 10^3.

the additional vertical coordinate on the right, so that they can be accommodated alongside the ASE results for the other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized M-type estimation presents satisfactory regression fits for the distributions with moderate tail weights, that is, scenarios (2)–(5), which are located in the middle of each panel. Meanwhile, the robust S-estimator provides comparatively meritorious smoothing results, especially under complicated circumstances such as the slash, Cauchy, or asymmetric distributions with heavy tails.
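For reference, the ASE criterion summarized by these box plots can be sketched in a few lines. This is an illustrative Python version (the paper's experiments run on MATLAB); the function name and the toy inputs are hypothetical, with the fitted and true curve values assumed to be evaluated at the same design points:

```python
def average_squared_error(f_hat, f_true):
    """ASE of a fitted curve against the true regression function,
    evaluated at the n design points: (1/n) * sum_i (f_hat_i - f_true_i)^2."""
    n = len(f_true)
    return sum((fh - ft) ** 2 for fh, ft in zip(f_hat, f_true)) / n

# Toy check: a fit that is off by a constant 0.1 at every design point
truth = [0.0, 0.5, 1.0, 0.5]
fit = [t + 0.1 for t in truth]
print(average_squared_error(fit, truth))  # ~0.01
```

Replicating such a computation J = 10^3 times per scenario yields the distributions of ASE values that the box plots display.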

In terms of robustness, the LS-estimator simply does not deserve mention. The M-estimator, by contrast, demonstrates stable performance for the designed samples with moderate perturbation error, and the S-estimator behaves dominantly stable under all of the error-source scenarios.

Unfortunately, in terms of execution efficiency, the most time-consuming method is the S-estimator, followed by the M-estimator, with the cheapest being the LS-estimator, since the latter serves as the initial curve estimate for the two former approaches. Moreover, although the S-estimator and the M-estimator are composed of similar iterative processes, random sampling is further involved in the former model to explore the ultimate solution of a nonconvex optimization problem.
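To make the shared iterative core concrete, the following Python sketch shows an M-type iteration: iteratively reweighted least squares (IRLS) with Huber weights, a MAD residual scale, and a truncated-linear spline basis under a ridge-type penalty. This is an illustrative reconstruction under stated assumptions, not the authors' MATLAB implementation; the basis, knot grid, smoothing parameter, and function names are hypothetical choices.

```python
import numpy as np

def huber_weights(u, c=1.345):
    """Huber psi(u)/u weights: 1 for |u| <= c, else c/|u|."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def penalized_m_spline(x, y, knots, lam=0.1, c=1.345, n_iter=30):
    """IRLS for a penalized M-type spline: repeatedly solve the weighted,
    penalized normal equations (B'WB + lam*D) beta = B'Wy, with W updated
    from the Huber weights of the scaled residuals."""
    B = np.column_stack([np.ones_like(x), x] +
                        [np.clip(x - k, 0.0, None) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalize only knot terms
    beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)  # penalized LS start
    for _ in range(n_iter):
        r = y - B @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # MAD
        W = huber_weights(r / scale, c)
        BW = B * W[:, None]
        beta = np.linalg.solve(BW.T @ B + lam * D, BW.T @ y)
    return beta, B

# Toy use: a straight line with one gross outlier
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + rng.normal(0.0, 0.05, x.size)
y[30] += 5.0  # single heavy contamination
beta, B = penalized_m_spline(x, y, knots=np.linspace(0.1, 0.9, 9))
fit = B @ beta  # stays close to 2x despite the outlier
```

The S-type iteration reuses this reweighting step, but the residual scale is itself updated as an M-scale of the residuals, and random subsamples of the data supply candidate starting values for the nonconvex objective, which is where its extra run time comes from.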

3.3. Balloon Data. This section is devoted to comparative experiments with the aforementioned penalized spline smoothing methods on a set of weather balloon data [35]. The data consist of 4984 observations collected from a balloon at about 30 kilometers altitude; the outliers arise because the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.

Figure 4 displays these balloon observational data together with the regression fitting curves obtained by the penalized LS method, the robust penalized M-estimation method, and the robust S-estimation method. It can be explicitly observed by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, a phenomenon that appears even more significant around the position 0.8 on the horizontal axis, while the two robust splines pass approximately through the majority of the observations.

Figure 3: Box plots of run times (seconds) using (a) penalized LS-estimation (on a 10^-3 s scale), (b) penalized M-estimation, (c) penalized S-estimation, and (d) the comparison of median run time, for the simulated samples from each of the 9 distribution scenarios, with replicate number J = 10^3.

Figure 4: Regression fitted curves for the balloon data using the penalized spline smoothing models, including the penalized LS fit (blue dashed), the M-estimator (green dot-dashed), and the S-estimator (red solid line).

Meanwhile, the S-estimator appears slightly superior to the M-estimator, since the former accords with the data trend better, especially at the ridge and the two terminal nodes.

At the same time, under the platform specified in Section 3.1, the algorithm execution times on this weather balloon data set of 4984 observations are 23 s, 351 s, and 589 s for the penalized LS regression, the penalized M-type estimator, and the penalized S-estimator, respectively.
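Run-time comparisons of this kind are typically gathered by repeating each fit and reporting a median to damp one-off scheduler or warm-up spikes. A minimal, platform-neutral Python sketch (the paper's timings come from MATLAB, so the helper name and dummy workloads here are purely illustrative):

```python
import time

def median_runtime(fit_fn, n_runs=5):
    """Median wall-clock time (seconds) of fit_fn() over n_runs repetitions."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fit_fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[n_runs // 2]

# Example: a heavier dummy workload should report a larger median time
fast = lambda: sum(i * i for i in range(10_000))
slow = lambda: sum(i * i for i in range(200_000))
print(median_runtime(fast) < median_runtime(slow))  # expected: True
```

Using the median rather than the minimum or mean mirrors how the simulated run times in Figure 3(d) are summarized.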

4. Conclusions

This paper conducts a comprehensive comparative analysis of two currently popular robust penalized spline smoothing methods, the penalized M-type estimator and the penalized S-estimator, both of which are re-elaborated starting from their origins, with their derivation processes reformulated and the resulting algorithms reorganized under a unified framework.

The experiments comprise, first, a simulated synthetic data set with outliers under 9 possible scenarios and, second, a practical weather balloon data set; on these, the performances of the aforementioned penalized regression splines are thoroughly evaluated in terms of fitting accuracy, robustness, and execution time.

Conclusions from these experiments are that robust penalized spline smoothing methods can resist the influence of outliers, in contrast to the nonrobust penalized LS spline fitting method. Further, the M-estimator exerts stable performance only for observed samples with moderate perturbation error, while the S-estimator behaves dominantly well even for cases with heavy tail weights or a higher percentage of contamination, but at the cost of more execution time. Therefore, taking both the solution accuracy and the implementation efficiency of the fitted regression spline into account, the M-type estimator is generally preferred, whereas if only the fitting credibility is of concern, the S-estimator naturally becomes the first choice.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. In addition, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3–21, 1989.

[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107–125, 2010.

[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502–518, 1986.

[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071–1085, 1990.

[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255–270, 1997.

[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661–675, 1973.

[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255–265, 1973.

[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271–293, 1998.

[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89–102, 1996.

[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936–940, 1999.

[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735–757, 2002.

[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.

[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.

[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.

[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95–114, 2003.

[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263–281, 2014.

[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.

[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.

[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley, 1981.

[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530–551, 1983.

[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256–272, Springer, New York, NY, USA, 1984.

[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.

[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747–752, 2011.

[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15–30, 2004.

[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159–171, 2007.

[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217–230, 2013.

[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609–625, 2010.

[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.

[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157–184, 1983.

[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699–703, 1991.

[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147–192, 1974.

[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.

[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–801, 1993.



10 Mathematical Problems in Engineering

influence of outliers compared with the nonrobust penalizedLS spline fitting method Further the119872-estimator can exertstable performance only for the observed samples withmoderate perturbation error while the 119878-estimator behavesdominantly well even for the cases with heavy tail weightsor higher percentage of contaminations but necessary withmore executive timeTherefore taking both the solving accu-racy and implementation efficiency of the fitted regressionspline into account119872-type estimator is generally preferredwhereas if only the fitting credibility is of concern then the119878-estimator naturally deserves to become the first choice

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Mr Bin WANG is the beneficiary of a doctoral grant fromthe AXA Research Fund Besides the authors acknowl-edge the funding support from the Ministry of Scienceand Technology of China (Project nos 2012BAJ15B04 and2012AA12A305) and the National Natural Science Founda-tion (Project no 41331175)

References

[1] J H Friedman and B W Silverman ldquoFlexible parsimonioussmoothing and additive modelingrdquo Technometrics vol 31 no1 pp 3ndash21 1989

[2] G Marra and R Radice ldquoPenalised regression splines theoryand application to medical researchrdquo Statistical Methods inMedical Research vol 19 no 2 pp 107ndash125 2010

[3] D Ruppert and R J Carroll ldquoSpatially-adaptive penalties forspline fittingrdquo Australian and New Zealand Journal of Statisticsvol 42 no 2 pp 205ndash223 2000

[4] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash518 1986

[5] C Kelly and J Rice ldquoMonotone smoothing with applicationto dose-response curves and the assessment of synergismrdquoBiometrics vol 46 no 4 pp 1071ndash1085 1990

[6] P C Besse H Cardot and F Ferraty ldquoSimultaneous non-parametric regressions of unbalanced longitudinal datardquo Com-putational Statistics amp Data Analysis vol 24 no 3 pp 255ndash2701997

[7] C L Mallows ldquoSome comments on Cprdquo Technometrics vol 15no 4 pp 661ndash675 1973

[8] H Akaike ldquoMaximum likelihood identification of Gaussianautoregressive moving average modelsrdquo Biometrika vol 60 no2 pp 255ndash265 1973

[9] C M Hurvich J S Simonoff and C Tsai ldquoSmoothing param-eter selection in nonparametric regression using an improvedAkaike information criterionrdquo Journal of the Royal StatisticalSociety B vol 60 no 2 pp 271ndash293 1998

[10] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash102 1996

[11] M PWand ldquoOn the optimal amount of smoothing in penalisedspline regressionrdquo Biometrika vol 86 no 4 pp 936ndash940 1999

[12] D Ruppert and R J Carroll ldquoSpatially-adaptive penalties forspline fittingrdquo Australian amp New Zealand Journal of Statisticsvol 42 no 2 pp 205ndash223 2000

[13] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[14] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge MassUSA 2003

[15] T Krivobokova Theoretical and Practical Aspects of PenalizedSpline Smoothing Bielefeld University 2006

[16] CM Crainiceanu D Ruppert andM PWand ldquoBayesian anal-ysis for penalized spline regression usingWinBUGSrdquo Journal ofStatistical Software vol 14 no 14 2005

[17] S N Wood ldquoThin plate regression splinesrdquo Journal of the RoyalStatistical Society B Statistical Methodology vol 65 no 1 pp95ndash114 2003

[18] L Zhou and H Pan ldquoSmoothing noisy data for irregularregions using penalized bivariate splines on triangulationsrdquoComputational Statistics vol 29 no 1-2 pp 263ndash281 2014

[19] P J Huber ldquoRobust estimation of a location parameterrdquo Annalsof Mathematical Statistics vol 35 no 1 pp 73ndash101 1964

[20] R R Wilcox Introduction to Robust Estimation and HypothesisTesting Academic Press New York NY USA 2012

[21] P J Huber E Ronchetti and MyiLibrary Robust StatisticsWiley Online Library 1981

[22] D D Cox ldquoAsymptotics for 119872-type smoothing splinesrdquo TheAnnals of Statistics vol 11 no 2 pp 530ndash551 1983

[23] P Rousseeuw and V Yohai ldquoRobust regression by means of S-estimatorsrdquo in Robust and Nonlinear Time Series Analysis vol26 of Lecture Notes in Statistics pp 256ndash272 Springer NewYork NY USA 1984

[24] M Salibian-Barrera and V J Yohai ldquoA fast algorithm for S-regression estimatesrdquo Journal of Computational and GraphicalStatistics vol 15 no 2 pp 414ndash427 2006

[25] M CetIn and O Toka ldquoThe comparing of S-estimator andM-estimators in linear regressionrdquo Gazi University Journal ofScience vol 24 no 4 pp 747ndash752 2011

[26] H Oh D Nychka T Brown and P Charbonneau ldquoPeriodanalysis of variable stars by robust smoothingrdquo Journal of theRoyal Statistical Society C Applied Statistics vol 53 no 1 pp15ndash30 2004

Mathematical Problems in Engineering

influence of outliers compared with the nonrobust penalized LS spline fitting method. Furthermore, the M-estimator delivers stable performance only for observations with moderate perturbation error, whereas the S-estimator performs well even under heavy-tailed errors or a high percentage of contamination, at the cost of more execution time. Therefore, when both the fitting accuracy and the computational efficiency of the regression spline matter, the M-type estimator is generally preferred; if only the fitting reliability is of concern, the S-estimator is the natural first choice.
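The experiments in the paper were run in MATLAB; as an illustration only, the M-type idea it compares (iteratively reweighted penalized least squares with Huber weights, which downweight large residuals instead of squaring them) can be sketched in Python/NumPy. The function names, truncated-linear knot basis, and tuning constants (Huber k = 1.345, MAD residual scale) below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def huber_weights(r, k=1.345):
    """Huber psi(r)/r weights: 1 inside [-k, k], k/|r| outside.
    k = 1.345 is the conventional 95%-efficiency tuning constant."""
    a = np.abs(r)
    w = np.ones_like(a)
    big = a > k
    w[big] = k / a[big]
    return w

def m_penalized_spline(x, y, num_knots=15, lam=0.1, max_iter=50, tol=1e-8):
    """M-type penalized spline fit via iteratively reweighted
    penalized least squares on a truncated-linear basis."""
    # Basis: [1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+]
    knots = np.linspace(x.min(), x.max(), num_knots + 2)[1:-1]
    B = np.column_stack([np.ones_like(x), x,
                         np.maximum(x[:, None] - knots[None, :], 0.0)])
    # Ridge-type penalty on the knot coefficients only
    D = np.diag([0.0, 0.0] + [1.0] * num_knots)
    beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)  # LS start
    for _ in range(max_iter):
        r = y - B @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale
        w = huber_weights(r / max(s, 1e-12))
        beta_new = np.linalg.solve(B.T @ (w[:, None] * B) + lam * D,
                                   B.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return B @ beta
```

With 5% gross outliers added to a noisy sine curve, the weighted solve assigns those points weights near k·s/|r|, so the fitted curve tracks the clean trend far more closely than the contaminated observations do. An S-estimator would instead re-estimate the scale s by minimizing an M-scale of the residuals at every step, which is what buys its higher breakdown point at extra computational cost.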

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Mr. Bin Wang is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).

References

[1] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, no. 1, pp. 3–21, 1989.

[2] G. Marra and R. Radice, "Penalised regression splines: theory and application to medical research," Statistical Methods in Medical Research, vol. 19, no. 2, pp. 107–125, 2010.

[3] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian and New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[4] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502–518, 1986.

[5] C. Kelly and J. Rice, "Monotone smoothing with application to dose-response curves and the assessment of synergism," Biometrics, vol. 46, no. 4, pp. 1071–1085, 1990.

[6] P. C. Besse, H. Cardot, and F. Ferraty, "Simultaneous non-parametric regressions of unbalanced longitudinal data," Computational Statistics & Data Analysis, vol. 24, no. 3, pp. 255–270, 1997.

[7] C. L. Mallows, "Some comments on Cp," Technometrics, vol. 15, no. 4, pp. 661–675, 1973.

[8] H. Akaike, "Maximum likelihood identification of Gaussian autoregressive moving average models," Biometrika, vol. 60, no. 2, pp. 255–265, 1973.

[9] C. M. Hurvich, J. S. Simonoff, and C. Tsai, "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society B, vol. 60, no. 2, pp. 271–293, 1998.

[10] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89–102, 1996.

[11] M. P. Wand, "On the optimal amount of smoothing in penalised spline regression," Biometrika, vol. 86, no. 4, pp. 936–940, 1999.

[12] D. Ruppert and R. J. Carroll, "Spatially-adaptive penalties for spline fitting," Australian & New Zealand Journal of Statistics, vol. 42, no. 2, pp. 205–223, 2000.

[13] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735–757, 2002.

[14] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, Mass, USA, 2003.

[15] T. Krivobokova, Theoretical and Practical Aspects of Penalized Spline Smoothing, Bielefeld University, 2006.

[16] C. M. Crainiceanu, D. Ruppert, and M. P. Wand, "Bayesian analysis for penalized spline regression using WinBUGS," Journal of Statistical Software, vol. 14, no. 14, 2005.

[17] S. N. Wood, "Thin plate regression splines," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 95–114, 2003.

[18] L. Zhou and H. Pan, "Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations," Computational Statistics, vol. 29, no. 1-2, pp. 263–281, 2014.

[19] P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.

[20] R. R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Academic Press, New York, NY, USA, 2012.

[21] P. J. Huber and E. Ronchetti, Robust Statistics, Wiley Online Library, 1981.

[22] D. D. Cox, "Asymptotics for M-type smoothing splines," The Annals of Statistics, vol. 11, no. 2, pp. 530–551, 1983.

[23] P. Rousseeuw and V. Yohai, "Robust regression by means of S-estimators," in Robust and Nonlinear Time Series Analysis, vol. 26 of Lecture Notes in Statistics, pp. 256–272, Springer, New York, NY, USA, 1984.

[24] M. Salibian-Barrera and V. J. Yohai, "A fast algorithm for S-regression estimates," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.

[25] M. Cetin and O. Toka, "The comparing of S-estimator and M-estimators in linear regression," Gazi University Journal of Science, vol. 24, no. 4, pp. 747–752, 2011.

[26] H. Oh, D. Nychka, T. Brown, and P. Charbonneau, "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society C: Applied Statistics, vol. 53, no. 1, pp. 15–30, 2004.

[27] T. C. M. Lee and H. Oh, "Robust penalized regression spline fitting with application to additive mixed modeling," Computational Statistics, vol. 22, no. 1, pp. 159–171, 2007.

[28] R. Finger, "Investigating the performance of different estimation techniques for crop yield data analysis in crop insurance applications," Agricultural Economics, vol. 44, no. 2, pp. 217–230, 2013.

[29] K. Tharmaratnam, G. Claeskens, C. Croux, and M. Salibian-Barrera, "S-estimation for penalized regression splines," Journal of Computational and Graphical Statistics, vol. 19, no. 3, pp. 609–625, 2010.

[30] P. Craven and G. Wahba, "Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation," Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1979.


[31] D. L. Donoho and P. J. Huber, "The notion of breakdown point," in A Festschrift for Erich L. Lehmann, pp. 157–184, 1983.

[32] R. A. Maronna and V. J. Yohai, "The breakdown point of simultaneous general M estimates of regression and scale," Journal of the American Statistical Association, vol. 86, no. 415, pp. 699–703, 1991.

[33] A. E. Beaton and J. W. Tukey, "The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data," Technometrics, vol. 16, no. 2, pp. 147–192, 1974.

[34] S. M. Ross, A First Course in Probability, Pearson Prentice Hall, Saddle River, NJ, USA, 8th edition, 2010.

[35] L. Davies and U. Gather, "The identification of multiple outliers," Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–801, 1993.
