Mathematical and Computer Modelling 44 (2006) 304–318
www.elsevier.com/locate/mcm
Two-parameter ridge regression and its convergence to the eventual pairwise model
Stan Lipovetsky∗
GfK Custom Research Inc., 8401 Golden Valley Road, Minneapolis, MN 55427, United States
Received 22 September 2005; received in revised form 6 January 2006; accepted 9 January 2006
Abstract
A generalization of the one-parameter ridge regression to a two-parameter model and its asymptotic behavior, which has various
better fitting characteristics, is considered. For the two-parameter model, the coefficients of regression, their t -statistics and net
effects, the residual variance and coefficient of multiple determination, the characteristics of bias, efficiency, and generalized
cross-validation are very stable and as the ridge parameter increases they eventually reach asymptotic levels. In particular, the
beta coefficients of multiple two-parameter ridge regression models converge to a solution proportional to the coefficients of
paired regressions. The suggested technique produces robust regression models not prone to multicollinearity, with interpretable
coefficients and other characteristics convenient for analysis of the models.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Ridge regression; Multicollinearity; Robust regression; Paired models
1. Introduction
Regression analysis is one of the main tools of statistical modeling widely used for the prediction of the
dependent variable from a set of independent variables. Regressions are very useful for prediction, but can be not
so helpful for the analysis and interpretation of the role of the individual predictors because of their multicollinearity.
Multicollinearity makes the parameter estimates fluctuate wildly with a negligible change in the sample, yields signs
of the regression coefficients opposite to the signs of the pair correlations, results in theoretically important variables
having insignificant coefficients, causes a reduction in statistical power, and leads to wider confidence intervals for the
coefficients with the result that they could be incorrectly identified as being insignificant [1,2]. For overcoming the
deficiencies of multicollinearity the one-parameter ridge regression is probably the most widely used technique [3–8],
although some other approaches have also been developed [9–12]. The one-parameter ridge model, or Ridge-1, secures
the solution for an ill-conditioned or even degenerate matrix of predictor correlations. However, in comparison with
ordinary least squares (OLS) regression, Ridge-1 has a worse quality of fit. It also does not satisfy the Pythagorean
relationship between the empirical sum of squares and those explained and not explained by the regression. This makes
Ridge-1 similar to non-linear models, with the results more difficult to interpret. Parameter k of Ridge-1 is usually
∗ Tel.: +1 763 542 0800; fax: +1 763 542 0864. E-mail address: [email protected].
0895-7177/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mcm.2006.01.017
chosen to be a positive number close to zero, because with a larger k the coefficients of the model quickly approach
zero, and the regression quality characteristics quickly diminish. But with a small k the coefficients of Ridge-1 are
usually similar to those of OLS, and therefore could have opposite signs to paired correlation signs, which can impair
the interpretability of the regressor influence in the model.
In this paper a technique of two-parameter ridge regression, or Ridge-2, is considered, and various characteristics
of ridge solutions are described, including the effective number of parameters, the residual error variance, the
bias and variance of the parameter estimates, and the generalized cross-validation (GCV) of the prediction quality
[13–17]. With all these characteristics Ridge-2 always outperforms the Ridge-1 model. Besides providing a better
approximation, Ridge-2 has good properties of orthogonality between residuals and predicted values of the dependent
variable. Unlike for Ridge-1, increasing the parameter k stabilizes the Ridge-2 coefficients, which eventually converge
to a solution proportional to the paired regressions of the dependent variable by the independent ones. Thus, the
multiple regression coefficients attain the interpretability of the paired relations, and at the same time the quality of fit
remains high.
This paper is organized as follows. Sections 2 and 3 describe the ordinary least squares and Ridge-1 regressions,
respectively. Section 4 considers the generalization to Ridge-2 regression, with several criteria for the choice of the
tuning parameter, and convergence to the eventual pairwise ridge solution. Section 5 presents numerical results, and
Section 6 summarizes.
2. OLS regression
Let us consider briefly the ordinary least squares (OLS) regression model and some of its features. For the
standardized (centered and normalized by standard deviation) variables a multiple linear regression is yi = β1xi1 + · · · + βnxin + εi, or in matrix form:
y = ỹ + ε = Xβ + ε, (1)
where X is an N × n matrix with the elements xij of the ith observations (i = 1, . . . , N) of the jth independent variables (j = 1, . . . , n), y is the vector of observations for the dependent variable, β is the nth-order vector of beta coefficients for the standardized regression, ỹ = Xβ is the value of the dependent variable predicted by the model, and ε is
a vector of deviations from the theoretical relationship. The least squares objective for the regression corresponds to
minimizing the sum of squared deviations:
S² = ‖ε‖² = (y − Xβ)′(y − Xβ) = 1 − 2β′r + β′Cβ, (2)
where a prime denotes transposition, the variance of the standardized y equals one, y′y = 1, and the symbols C and r correspond to the correlation matrix C = X′X and the vector of the correlations with the dependent variable r = X′y.
The first-order condition ∂S²/∂β = 0 for the minimum of the objective (2) yields a normal system of equations
with the corresponding solution:
Cβ = r, β = C⁻¹r. (3)
The vector of standardized coefficients of regression β in the OLS solution (3) is defined via the inverse correlation matrix C⁻¹. The quality of the model is estimated using the residual sum of squares (2), or using the coefficient of multiple determination defined as:
R² = 1 − S² = 2β′r − β′Cβ = β′(2r − Cβ). (4)
At the minimum of the objective (2) the coefficient of multiple determination reaches its maximum, which (taking
into account the relations (3)) can be presented in the equivalent forms:
R² = β′r = β′Cβ. (5)
The terms βjrj of the scalar product in (5) are the net effects used to estimate the contribution of the regressors to the
model (1).
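As a concrete illustration of the standardized OLS computations (2)–(5), the following minimal sketch builds C and r from synthetic data and checks the Pythagorean identity (10). All data and variable names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Synthetic data; the coefficients and noise level are arbitrary choices.
rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

def unit_scale(v):
    """Center and scale to unit norm, per the paper's convention y'y = 1."""
    v = v - v.mean(axis=0)
    return v / np.linalg.norm(v, axis=0)

Xs, ys = unit_scale(X), unit_scale(y)
C = Xs.T @ Xs                 # correlation matrix of predictors, C = X'X
r = Xs.T @ ys                 # correlations with the dependent variable, r = X'y
beta = np.linalg.solve(C, r)  # normal equations C beta = r, Eq. (3)

R2 = beta @ r                              # multiple determination, Eq. (5)
S2 = 1 - 2 * beta @ r + beta @ C @ beta    # residual sum of squares, Eq. (2)
net_effects = beta * r                     # contributions beta_j r_j summing to R2

print(round(R2 + S2, 12))     # Pythagorean identity (10): R2 + S2 = 1  -> 1.0
```

The net effects sum to R² by construction, which is the decomposition used throughout the paper.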
Let us describe some properties of OLS regression needed for further consideration. Using the theoretical vector ỹ from (1), it is possible to represent the system Cβ = r (3) as X′ỹ = X′y. This means that the vector of paired correlations
of predictors with the empirical y coincides with such a vector of correlations with the theoretical ỹ. Using the residual errors ε = y − Xβ = y − ỹ from (1), it is easy to represent the system X′ỹ = X′y as follows:
X′ε = 0. (6)
This shows the orthogonality between each regressor xk (the columns in matrix X) and the vector of residual errors. For non-standardized variables, when the first column in the design matrix represents an identity vector e (corresponding to the intercept in the regression), the first equation in the system (6) expresses the equality of the total errors to zero, e′ε = 0, which also represents the equality of the means of the empirical and theoretical vectors. A scalar product
of the beta coefficients with the vectors in (6) shows that the relation of orthogonality with the errors also holds for the theoretical ỹ:
β′X′ε = ỹ′ε = 0. (7)
Using ε = y − ỹ in (7) yields the equality:
y′ỹ = ỹ′ỹ = β′X′Xβ = R², (8)
or the projection of the empirical on the theoretical vector of the dependent variable equals the theoretical sum of squares for the vector ỹ, which is the coefficient (5) of multiple determination. The sum of squares
y′y = (ỹ + ε)′(ỹ + ε) = ỹ′ỹ + 2ỹ′ε + ε′ε, (9)
using the relation y′y = 1 from (2), the coefficient (8), the orthogonality (7), and the residual sum of squares (2), can be
represented as:
1 = R² + S². (10)
It is a Pythagorean relationship between the unit of the original empirical sum of squares and the sums of squares explained (R²) and not explained (S²) by the regression. This decomposition of the empirical standardized squared norm of the dependent variable is very convenient for estimating the model quality using the total F-statistic defined by the ratio of the variances for the regression and the residual portions in (10).
If any regressors are highly correlated or multicollinear, the matrix C (3) becomes ill-conditioned, so its determinant
is close to zero, and the inverse matrix in (3) produces a solution with highly inflated values of the coefficients
of regression. The values of these coefficients often have signs opposite to the corresponding pair correlations of
regressors with the dependent variable, presumably important variables become statistically insignificant, and some
of the net effects become negative. Such a regression can be used for prediction, but it becomes useless for the analysis
and interpretation of the predictors’ role in the model.
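The sign-reversal effect just described can be reproduced with a small hypothetical example (the correlation values below are invented for illustration): two predictors correlated at 0.95, both positively correlated with y, yet the OLS solution (3) assigns one of them a negative coefficient.

```python
import numpy as np

# Hypothetical ill-conditioned correlation structure: both pair correlations
# with y are positive, but near-collinearity flips one OLS sign.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])   # predictor correlation matrix, det close to 0
r = np.array([0.9, 0.8])      # pair correlations with y, both positive

beta = np.linalg.solve(C, r)  # OLS solution (3)
print(beta)                   # the second coefficient comes out negative
```

Here the second net effect β₂r₂ is negative, exactly the interpretability failure the text attributes to multicollinearity.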
3. Ridge-1 regression
One-parameter ridge regression is widely used for overcoming the difficulties of multicollinearity. For the
standardized variables, adding to the least squares objective (2) a penalizing function of the squared norm for the
vector of regression coefficients (which prevents their inflation) yields a conditional objective:
S² = ‖ε‖² + k‖βrd‖² = 1 − 2β′rd r + β′rd C βrd + k β′rd βrd, (11)
where βrd denotes a vector of the ridge regression estimates for the coefficients in (1), and k is a positive parameter.
Minimizing the objective (11) using vector βrd yields a system of equations and its corresponding solution as follows:
(C + kI)βrd = r, βrd = (C + kI)⁻¹r, (12)
where I is the identity matrix of nth order. This solution exists even for a singular matrix of correlations C . If k = 0
the Ridge-1 model (11) and (12) reduces to the OLS regression model (2) and (3).
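A minimal sketch of the Ridge-1 solution (12), on the same kind of hypothetical ill-conditioned correlation inputs as above (not the paper's data): a small ridge k already restores the sign of the problematic coefficient, and k = 0 recovers OLS.

```python
import numpy as np

# Hypothetical correlation inputs with strong collinearity.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])

def ridge1(C, r, k):
    """Ridge-1 solution (12): (C + kI) beta = r."""
    return np.linalg.solve(C + k * np.eye(C.shape[0]), r)

beta_ols = ridge1(C, r, 0.0)   # k = 0 reduces to the OLS solution (3)
beta_rd = ridge1(C, r, 0.1)    # small ridge shrinks and re-signs the estimates
print(beta_ols, beta_rd)
```

The ridge estimates have a smaller norm than the OLS estimates, which is the inflation control the penalty in (11) provides.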
Ridge-1 has proved to be very helpful in applied regression modeling, although its results lack the quality of fit
of a regular regression, and cannot always be clearly interpreted because its properties are more similar to those of a
non-linear regression. For instance, the coefficient of multiple determination (4) with the solution (12) is:
R²rd = β′rd(2r − Cβrd) = 2r′(C + kI)⁻¹r − r′(C + kI)⁻¹C(C + kI)⁻¹r. (13)
But r′(C + kI)⁻¹r ≠ r′(C + kI)⁻¹C(C + kI)⁻¹r, or in other notation
β′rd r ≠ β′rd C βrd, (14)
so the coefficient of multiple determination (13) of Ridge-1 cannot be reduced to an expression similar to (5) for the OLS model. With the theoretical vector ỹrd = Xβrd from the Ridge-1 solution, the relation (14) can be represented as y′ỹrd ≠ ỹ′rd ỹrd, so Ridge-1 does not satisfy the relations (5) and (10) holding for the OLS regression.
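The inequality (14) can be checked numerically; with the hypothetical correlation inputs used above, β′rd r − β′rd C βrd equals k‖βrd‖², so it is zero only at k = 0.

```python
import numpy as np

# Hypothetical correlation inputs (same illustrative C and r as before).
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])

for k in (0.0, 0.5, 2.0):
    beta = np.linalg.solve(C + k * np.eye(2), r)   # Ridge-1, Eq. (12)
    lhs, rhs = beta @ r, beta @ C @ beta           # the two sides of (14)
    print(k, round(lhs - rhs, 6))                  # difference vanishes only at k = 0
```

The positive gap lhs − rhs = k‖βrd‖² is exactly the term that breaks the Pythagorean decomposition (10) for Ridge-1.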
The eigenproblem Ca = λa for the matrix of correlations among the regressors yields the spectral decomposition C = A diag(λj) A′, where A is the matrix containing the eigenvectors aj in its columns, and diag(λj) is a diagonal matrix of the eigenvalues λj. By the eigenproblem results, the Ridge-1 solution (12) can be presented as follows:
βrd = A diag((λj + k)⁻¹) A′r. (15)
Increasing k pushes the Ridge-1 solution (15) to zero proportionally to the reciprocal of the k value. The coefficient
of multiple determination (13) can be represented as:
R²rd = r′A diag(2/(λj + k) − λj/(λj + k)²) A′r = r′A diag((λj + 2k)/(λj + k)²) A′r
= r′(C + kI)⁻¹(C + 2kI)(C + kI)⁻¹r = β′rd C βrd + 2k β′rd βrd. (16)
So R²rd ≠ β′rd C βrd, in contrast with the OLS case (5). The quality of fit measured by the coefficient of multiple determination for the Ridge-1 model, (16), reaches zero proportionally to the reciprocal of k.
Some other characteristics of the Ridge-1 model include the effective number of parameters, the residual error variance,
bias and variance of the parameter estimates, and generalized cross-validation (GCV) of the prediction quality. The
effective number of parameters is:
n∗ = Tr((C + kI)⁻¹C) = Σ_{j=1}^{n} λj/(λj + k), (17)
where Tr denotes the trace of the matrix. For k = 0 the value (17) coincides with the total number n of the predictors,
and with a larger k the value of n∗ diminishes. The residual error variance is:
S²res = (1 − R²rd)/(N − n − 1), (18)
where R²rd is defined by (16), and the effective number of parameters n∗, (17), can be used in place of the number n.
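The effective number of parameters (17) can be sketched directly from the eigenvalues; the correlation matrix below is a hypothetical two-predictor example, so n = 2.

```python
import numpy as np

# Hypothetical collinear correlation matrix; eigenvalues are about 1.95 and 0.05.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])

def n_eff(C, k):
    """Effective number of parameters, Eq. (17): sum of lambda_j/(lambda_j + k)."""
    lam = np.linalg.eigvalsh(C)
    return float(np.sum(lam / (lam + k)))

print(n_eff(C, 0.0))   # 2.0, the full number of predictors at k = 0
print(n_eff(C, 0.5))   # smaller than 2: the ridge shrinks the model's complexity
```

As k grows, each term λj/(λj + k) falls toward zero, which is the monotone decrease of n∗ stated in the text.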
The expectation for the vector of parameters is:
M(βrd) = (C + kI)⁻¹M(r) = (C + kI)⁻¹X′M(y) = (C + kI)⁻¹X′M(ỹ + ε)
= (C + kI)⁻¹(X′Xβrd + M(ε)) = (C + kI)⁻¹Cβrd, (19)
with the error expectation M(ε) = 0. For any k > 0 the matrix product (C + kI)⁻¹C does not coincide with the identity matrix, so the Ridge-1 solution is biased, while the OLS (k = 0) solution is not. A measure of the mean bias can be constructed as the squared norm of the difference between the matrix (19) and the identity matrix divided by the effective number of parameters, (17):
Bias(βrd) = (1/n∗)‖(C + kI)⁻¹C − I‖² = (1/n∗) Tr(diag(λj/(λj + k) − 1)²) = (k²/n∗) Σ_{j=1}^{n} 1/(λj + k)². (20)
The variance of the parameter estimates is defined via the squared norm of the deviation of the solution (12) from its
expectation (19):
Var(βrd) · n = ‖βrd − M(βrd)‖² = β′rd βrd − 2β′rd M(βrd) + M(βrd)′M(βrd). (21)
The generalized cross-validation criterion widely used for choosing the k value is defined as:
GCV = N‖(I − H)y‖² / (Tr(I − H))², (22)
where the hat-matrix H corresponds to the projection of the empirical to the theoretical vector of the dependent variable for the Ridge-1 model, or ỹ = Xβrd = X(C + kI)⁻¹X′y ≡ Hy. Note that Tr(H) = Tr(X′X(C + kI)⁻¹), so it equals the effective number of parameters, (17). Then (22) can be represented explicitly as follows:
GCV = N‖y − ỹ‖²/(N − n∗)² = [1/(1 − n∗/N)] · Σ_{i=1}^{N}(yi − ỹi)²/(N − n∗), (23)
so the meaning of the GCV characteristic is the unbiased residual error variance estimated from the effective number
of parameters. The fraction n∗/ N (the finite population correction) is usually negligible, and the GCV as an unbiased
residual variance has a minimum among the number of variables n∗, or in the k-parameter profile.
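The GCV profile (23) can be traced over k and scanned for its minimum. The sketch below uses synthetic near-collinear data (an illustrative assumption, not the paper's cars data) and the eigenvalue form (17) for Tr(H).

```python
import numpy as np

# Synthetic standardized data with a near-collinear predictor pair.
rng = np.random.default_rng(1)
N = 100
X = rng.normal(size=(N, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=N)    # near-collinear column
y = X[:, 0] + X[:, 1] + rng.normal(size=N)

def unit_scale(v):
    v = v - v.mean(axis=0)
    return v / np.linalg.norm(v, axis=0)

Xs, ys = unit_scale(X), unit_scale(y)
C, r = Xs.T @ Xs, Xs.T @ ys
lam = np.linalg.eigvalsh(C)

def gcv(k):
    """GCV profile for Ridge-1, Eq. (23), with Tr(H) from Eq. (17)."""
    beta = np.linalg.solve(C + k * np.eye(3), r)
    resid = ys - Xs @ beta
    n_eff = np.sum(lam / (lam + k))
    return N * (resid @ resid) / (N - n_eff) ** 2

ks = np.logspace(-4, 1, 60)
profile = [gcv(k) for k in ks]
k_best = ks[int(np.argmin(profile))]
print(k_best)
```

With this construction the profile is well defined for every k in the grid, and its argmin plays the role of the chosen ridge parameter.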
4. Ridge-2 regression and its convergence to the eventual ridge model
Consider a generalization of the regularization (11) with two additional items:
S² = ‖y − Xb‖² + k1‖b‖² + k2‖X′y − b‖² + k3‖y′(y − Xb)‖²
= (1 − 2b′r + b′Cb) + k1(b′b) + k2(r′r − 2b′r + b′b) + k3(1 − 2b′r + b′rr′b). (24)
The vector b is an estimator of the coefficients of regression (1) from the multiple objective (24) with positive
parameters k. The first two items in (24) coincide with those in the Ridge-1 objective (11). The next item, with k2, pushes the estimates b closer to the paired correlations r with the dependent variable, which helps to get a solution with interpretable coefficients. The last item, with k3, expresses the relation y′ε = y′y − y′ỹ = 1 − ỹ′ỹ = 1 − R² due to (8), so minimizing it corresponds to obtaining the maximum coefficient of multiple determination. Some other regularization criteria have also been considered [18–20].
Minimization of (24) yields a matrix equation Cb + k1b + k2b + k3rr′b = r + k2r + k3r. The scalar product
r′b can be considered as another constant and combined with the parameter k3, so this item on the left-hand side is
proportional to vector r and can be transferred to the right-hand side of this equation. Then combining constants on
each side of this equation, it is easy to reduce it to the following system with the corresponding solution:
(C + kI)b = qr, b = q(C + kI)⁻¹r, (25)
where k and q are two new constant parameters. It is a Ridge-2 model that differs from the Ridge-1 (12) only in
the parameter q, so the solution of Ridge-2 is proportional to the Ridge-1 solution. This seemingly minor change
produces, as will be shown, very important improvements.
For a current value of the free ridge parameter k , the value of the second parameter q can be found using a criterion
of maximum quality of the regression fit. For such a criterion, consider first the coefficient of multiple determination
(4). Substituting solution (25) into (4) yields a function quadratic in q:
R̃² = 2q[r′(C + kI)⁻¹r] − q²[r′(C + kI)⁻¹C(C + kI)⁻¹r] ≡ 2qQ1 − q²Q2, (26)
where R̃² denotes the coefficient of multiple determination for the Ridge-2 model, and Q1 and Q2 denote the two quadratic forms in the brackets of (26). The value of q where this concave function reaches its maximum is:
q = Q1/Q2 = r′(C + kI)⁻¹r / [r′(C + kI)⁻¹C(C + kI)⁻¹r]. (27)
So the second parameter in the solution (25) is uniquely defined as a quotient of two quadratic forms dependent on
one parameter k . Substituting q from (27) into (26) yields the maximum coefficient of multiple determination, as well
as its representation in the following two equivalent forms:
R̃² = [r′(C + kI)⁻¹r]² / [r′(C + kI)⁻¹C(C + kI)⁻¹r] = r′b = b′Cb. (28)
This expression shows that the coefficient of multiple determination for Ridge-2 is presented similarly to the OLS regression (5).
Using the theoretical vector ỹ = Xb, it is easy to see that the right-hand side equality in (28) presents the property y′ỹ = ỹ′ỹ for Ridge-2, similar to (8) for OLS regression. This equality leads to ỹ′ε = 0, so (7) and all the relations (8)–(10) are true for Ridge-2. The conditions of orthogonality of the predictors with the errors (6) are not fulfilled exactly, but numerical evaluations show that the deviations from orthogonality for Ridge-2 are always very close to zero. Instead of orthogonality for each predictor (6), there exists orthogonality of their linear combination with the weights of the coefficients of the Ridge-2 regression. Indeed, rewriting ỹ′ε = 0 as b′X′ε = 0 shows that the scalar product of the vector of the solution b with the vector X′ε of deviations from orthogonality equals zero.
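These Ridge-2 identities can be verified numerically. The sketch below (synthetic data, an illustrative assumption) computes q = Q1/Q2 per (27) and checks R̃² = r′b = b′Cb from (28) together with the orthogonality b′X′ε = 0.

```python
import numpy as np

# Synthetic standardized data with a near-collinear predictor pair.
rng = np.random.default_rng(2)
N = 120
X = rng.normal(size=(N, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=N)
y = X[:, 0] + X[:, 1] + rng.normal(size=N)

def unit_scale(v):
    v = v - v.mean(axis=0)
    return v / np.linalg.norm(v, axis=0)

Xs, ys = unit_scale(X), unit_scale(y)
C, r = Xs.T @ Xs, Xs.T @ ys

k = 1.0
s = np.linalg.solve(C + k * np.eye(3), r)   # (C + kI)^{-1} r
Q1 = r @ s                                  # r'(C+kI)^{-1} r
Q2 = s @ C @ s                              # r'(C+kI)^{-1} C (C+kI)^{-1} r
q = Q1 / Q2                                 # tuning parameter, Eq. (27)
b = q * s                                   # Ridge-2 solution, Eq. (25)

R2_tilde = 2 * q * Q1 - q ** 2 * Q2         # Eq. (26) at its maximum
eps = ys - Xs @ b                           # Ridge-2 residuals
print(round(abs(R2_tilde - r @ b), 12),
      round(abs(b @ Xs.T @ eps), 12))       # 0.0 0.0
```

Both checks hold to machine precision: the maximized (26) equals r′b, and the solution vector is orthogonal to the deviations X′ε even though each predictor separately is not.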
Thus, in the Ridge-2 solution (25) the term k serves for regularization of an ill-conditioned matrix, and the term q
is used for tuning the fit quality. The coefficient of multiple determination for the Ridge-2 model with the additional
parameter is always bigger than that of the Ridge-1 model. Using the term (27) in (25) yields the Ridge-2 solution in
the explicit form:
b = [r′(C + kI)⁻¹r / (r′(C + kI)⁻¹C(C + kI)⁻¹r)] (C + kI)⁻¹r. (29)
Consider the behavior of this solution as the parameter k increases. In the limit of big k, the matrix C + kI acquires a dominant diagonal, so the inverse matrix (C + kI)⁻¹ reduces to the scalar matrix k⁻¹I. The terms k⁻² in the numerator and denominator of (29) mutually cancel, and the Ridge-2 solution eventually converges to an asymptote independent of k:
b = [k⁻²r′r / (k⁻²r′Cr)] r = [r′r/(r′Cr)] r ≡ γr, (30)
where the constant γ is defined as the always positive ratio of two quadratic forms. Thus, unlike the Ridge-1 solution (12), which diminishes to zero, the coefficients of the Ridge-2 solution become proportional to the elements of the vector r, (30), of the pairwise regressions, whose beta coefficients equal the pair correlations of y with each regressor. The signs of the Ridge-2 coefficients b coincide with the signs of the pair correlations r. This guarantees clear interpretability of the eventual Ridge-2 solution, and the positive net effect contributions bjrj = γrj² (5) of the regressors to the coefficient of multiple determination (28).
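The pairwise limit (30) is easy to see numerically: for a very large k the Ridge-2 solution coincides with γr up to a vanishing correction. Correlation inputs below are hypothetical.

```python
import numpy as np

# Hypothetical correlation inputs.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])

def ridge2(C, r, k):
    """Ridge-2 solution b = q (C + kI)^{-1} r with q from Eq. (27)."""
    s = np.linalg.solve(C + k * np.eye(len(r)), r)
    q = (r @ s) / (s @ C @ s)
    return q * s

gamma = (r @ r) / (r @ C @ r)        # limit constant, Eq. (30)
b_big_k = ridge2(C, r, 1e8)          # very large ridge parameter
print(np.allclose(b_big_k, gamma * r))   # True
```

The limit keeps the signs of r, so the eventual coefficients inherit the interpretability of the pair correlations.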
Behaviors of the tuning parameter q (27) and of the coefficient of multiple determination (28) with k increasing
are as follows, respectively:
q = k⁻¹r′r / (k⁻²r′Cr) = γk, R̃² = k⁻²(r′r)² / (k⁻²r′Cr) = γ(r′r), (31)
with the same constant γ as in (30). Thus, eventually q is linearly proportional to k, but the coefficient of multiple determination goes to a constant independent of k as k increases, so the quality of fit is not decreasing. The constant
of proportionality in (30) and (31) can be estimated using the squared norm defined by the maximum eigenvalue of
the eigenproblem Ca = λa, so:
γ = r′r/(r′Cr) ≥ ‖r‖²/(λmax‖r‖²) = 1/λmax. (32)
Numerical runs support the above-described features of the eventual ridge regression. For increasing parameter k, the Ridge-2 coefficient of multiple determination R̃², (28), stays consistently close to the maximum R² value, (5), of the OLS model, in accordance with (31), while the Ridge-1 coefficient R²rd, (16), quickly diminishes to zero. This means that in Ridge-2 modeling, k can be increased without losing the quality of regression until the asymptotic solution (30) is reached, with the interpretable coefficients of multiple regression proportional to the pair correlations of y with the x’s.
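The contrast between (16) and (28) for a large k can be sketched on the hypothetical correlation inputs used earlier: the Ridge-1 fit collapses while the Ridge-2 fit stays near its asymptote γr′r.

```python
import numpy as np

# Hypothetical correlation inputs.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])

def r2_ridge1(k):
    """Ridge-1 coefficient of multiple determination, Eq. (13)/(16)."""
    beta = np.linalg.solve(C + k * np.eye(2), r)
    return 2 * beta @ r - beta @ C @ beta

def r2_ridge2(k):
    """Ridge-2 coefficient of multiple determination, Eq. (28)."""
    s = np.linalg.solve(C + k * np.eye(2), r)
    return (r @ s) ** 2 / (s @ C @ s)

print(r2_ridge1(50.0) < 0.1, r2_ridge2(50.0) > 0.5)   # True True
```

At k = 50 the Ridge-1 fit has essentially vanished, while the Ridge-2 value is already close to its limit γr′r from (31).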
Consider characteristics of Ridge-2 regression similar to those of the relations (17)–(22) of Ridge-1. The effective number of parameters n∗2 for Ridge-2 is defined as (17) multiplied by the term q, (27). For big k the term q becomes proportional to k, (31), and using (32) yields:
n∗2 = qTr((C + kI)⁻¹C) = (k/λmax)(Tr(C)/k) = n/λmax. (33)
So the effective number of parameters for Ridge-2 reaches an asymptote. The residual error variance (18) for Ridge-2 can be found from the coefficient of multiple determination, (28). Using its eventual value, (31), and the effective number (33), it becomes:
S²res = (1 − r′r/λmax)/(N − n/λmax − 1), (34)
so for big k the residual variance is also stabilized.
The expectation for the Ridge-2 parameters equals the Ridge-1 expression (19) multiplied by q:
M(b) = q(C + kI)⁻¹M(r) = q(C + kI)⁻¹X′M(ỹ + ε) = q(C + kI)⁻¹Cb. (35)
Unlike for (19), where the bias increases with the k value, the bias of the Ridge-2 coefficients stabilizes. Indeed, due to (31) and (32), the expression (35) reduces to M(b) = (1/λmax)Cb, which does not depend on k. The mean bias for Ridge-2 can be constructed similarly to that for Ridge-1, (20). Eventually, for big k, this expression goes to a constant:
Bias(b) = (1/n∗2)‖q(C + kI)⁻¹C − I‖² = (λmax/n) Tr(diag(λj/λmax − 1)²), (36)
where the relations (31)–(33) are used. It is interesting to note that the scalar product of the expectation (35) with the
vector of correlations yields exactly the expression (28):
r′M(b) = qr′(C + kI)⁻¹Cb = q²r′(C + kI)⁻¹C(C + kI)⁻¹r = R̃², (37)
where q, (27), and b, (25), are used. So in spite of the bias, the coefficient of multiple determination for Ridge-2 is the same for the vector of the solution b (25) and for its expectation M(b) (35). This property does not hold for Ridge-1 regression. The efficiency of the solution is defined according to the variance of the parameter estimates for Ridge-2, similarly to the case for Ridge-1, (21), via the squared norm of the deviation of the solution (25) from its expectation, (35):
Var(b) · n = ‖b − M(b)‖² = b′b − 2qb′(C + kI)⁻¹Cb + q²b′(C + kI)⁻²C²b. (38)
The variance can be estimated using the effective number of parameters (33) in place of the total n.
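The identity (37) is easy to confirm numerically: the expectation M(b) differs from b, yet both give the same coefficient of multiple determination against r. Correlation inputs below are hypothetical.

```python
import numpy as np

# Hypothetical correlation inputs.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])
k = 1.0

A = np.linalg.inv(C + k * np.eye(2))   # (C + kI)^{-1}
s = A @ r
q = (r @ s) / (s @ C @ s)              # tuning parameter, Eq. (27)
b = q * s                              # Ridge-2 solution, Eq. (25)
Mb = q * A @ C @ b                     # expectation of the estimates, Eq. (35)
R2_tilde = (r @ s) ** 2 / (s @ C @ s)  # Ridge-2 fit, Eq. (28)

print(round(abs(r @ Mb - R2_tilde), 12))   # 0.0, the identity (37)
```

Note that Mb and b themselves are visibly different vectors here, which is exactly the bias the text describes; only their products with r coincide.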
The tuning parameter q can be found not only from the criterion (26) of the maximum of the coefficient of multiple determination. Consider a criterion of the minimum variance of the coefficient estimates, (38): with b from (25), the expression (38) reduces to:
Var(b) · n = q²[r′(C + kI)⁻²r] − 2q³[r′(C + kI)⁻³Cr] + q⁴[r′(C + kI)⁻⁴C²r] ≡ q²Q3 − 2q³Q4 + q⁴Q5, (39)
where Q3, Q4, and Q5 denote the corresponding quadratic forms in the brackets (this notation is used to distinguish
the quadratic forms in (39) from those in (26) and (27)). Note that the matrix C and the matrices of its functions
are commutative because their sets of eigenvectors are the same. The derivative of (39) with respect to q yields
an equation 2q(Q3 − 3qQ4 + 2q²Q5) = 0, and the only root defining the local minimum of the variance (39) is q = (3Q4 + √(9Q4² − 8Q3Q5))/(4Q5). This q guarantees the maximum efficiency of the estimates and can be used in the solution (25) and (26) and in other characteristics of the Ridge-2 regression.
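The minimum-variance root of (39) can be checked numerically: the quadratic forms Q3, Q4, Q5 are built from powers of (C + kI)⁻¹ (which commute with C), and the closed-form root makes the derivative of (39) vanish. Inputs are hypothetical.

```python
import numpy as np

# Hypothetical correlation inputs and an arbitrary ridge parameter.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])
k = 0.5

A = np.linalg.inv(C + k * np.eye(2))    # (C + kI)^{-1}
Q3 = r @ A @ A @ r                      # r'(C+kI)^{-2} r
Q4 = r @ A @ A @ A @ C @ r              # r'(C+kI)^{-3} C r
Q5 = r @ A @ A @ A @ A @ C @ C @ r      # r'(C+kI)^{-4} C^2 r

# Closed-form nonzero stationary root of Eq. (39).
q = (3 * Q4 + np.sqrt(9 * Q4 ** 2 - 8 * Q3 * Q5)) / (4 * Q5)

# Derivative of q^2 Q3 - 2 q^3 Q4 + q^4 Q5 with respect to q.
grad = 2 * q * Q3 - 6 * q ** 2 * Q4 + 4 * q ** 3 * Q5
print(round(abs(grad), 10))   # 0.0
```

For these inputs the discriminant 9Q4² − 8Q3Q5 is positive, so the root is real, and the derivative vanishes at it to machine precision.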
The tuning parameter q can also be found by minimization of the GCV criterion (22) applied to Ridge-2. The
estimated value of the dependent variable for the solution (25) is ỹ = Xb = qX(C + kI)⁻¹X′y ≡ H2y, where the
projection hat-matrix for the Ridge-2 model is denoted as H2. Substituting it into the GCV, (22), yields:
GCV = N[1 − 2q(r′(C + kI)⁻¹r) + q²(r′(C + kI)⁻¹C(C + kI)⁻¹r)] / (N − qTr((C + kI)⁻¹C))² = N(1 − 2qQ1 + q²Q2)/(N − qn∗)², (40)
where Q1 and Q2 denote the same quadratic forms as in (26), and n∗ is the effective number of parameters, (17). The
expression in the numerator of (40) corresponds to the residual sum of squares 1 − R̃², with R̃² defined in (26), so the GCV criterion (40) can be interpreted as an unbiased residual variance for the Ridge-2 model. Minimizing (40) with respect to the parameter q yields the expression q = (Q1 − n∗/N)/(Q2 − Q1n∗/N). The fraction n∗/N is usually small and diminishes as k increases, so this expression is actually close to q = Q1/Q2, that is, the same q parameter as in (27). This guarantees the predictive optimum due to the GCV criterion (40). Numerical simulations demonstrate that the results of Ridge-2 modeling are very similar with any of the criteria, (26), (39), or (40), used to define the tuning parameter q.
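The closeness of the GCV-based tuning to the maximum-R̃² value can be sketched directly; the correlation inputs are hypothetical and N plays the role of the sample size.

```python
import numpy as np

# Hypothetical correlation inputs and sample size.
C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
r = np.array([0.9, 0.8])
k, N = 0.5, 100

A = np.linalg.inv(C + k * np.eye(2))   # (C + kI)^{-1}
Q1 = r @ A @ r                         # r'(C+kI)^{-1} r
Q2 = r @ A @ C @ A @ r                 # r'(C+kI)^{-1} C (C+kI)^{-1} r
n_star = np.trace(A @ C)               # effective number of parameters, Eq. (17)

q_r2 = Q1 / Q2                                         # maximum-R2 value, Eq. (27)
q_gcv = (Q1 - n_star / N) / (Q2 - Q1 * n_star / N)     # GCV minimizer of Eq. (40)
print(abs(q_gcv - q_r2) < 0.05 * q_r2)                 # True
```

With n∗/N below one percent here, the two tuning rules differ by a fraction of a percent, illustrating why the criteria give very similar Ridge-2 models.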
All regular statistical characteristics of a multiple regression can be constructed for the Ridge-2 model as well,
including the variance–covariance matrix for the coefficient estimates and their t-statistics. The F-statistic for Ridge-2 regression with the property of the regular decomposition (10) can be defined similarly to OLS regression via the
coefficient of multiple determination. The transformation of the standardized coefficients b to the coefficients a in the
original units can be performed in the same way as for the regular regression, and the Ridge-2 solution is invariant with respect to the scale transformation of the variables, similarly to the OLS regression.
Thus, Ridge-2 can reach the eventual pair regression coefficients in multiple regression without losing the quality
of fit and prediction. Suppose all paired correlations in r are positive, or the scales of the predictors with negative
correlations with y can be reversed to make all correlations positive. The positive solution in the ridge modeling can
be achieved when the matrix C + k I with k increasing begins to satisfy the conditions of the Farkas lemma, or some
other known criteria [21,22]. It is always possible to obtain the positive solution simply by increasing k , and no matter
how large k is for the Ridge-2 model, the eventual model is practically at the same level of quality of fit.
5. Numerical examples
Consider numerical runs for the various characteristics of the ridge models traced by means of the increasing k value. For example, the cars data set from [23] is used, which is also available in [24] (“cu.summary” data file). The data
contains dimensions and mechanical specifications for 111 different cars, supplied by the manufacturers and measured
for consumer reports. The variables are: y — weight, pounds; x1 — length overall, inches; x2 — wheel base length,
inches; x3 — width, inches; x4 — height, inches; x5 — front–head distance between the car’s head-liner and the head
of a 5 ft 9 in. front seat passenger, inches; x6 — rear–head distance between the car’s head-liner and the head of a 5 ft
9 in. rear seat passenger, inches; x7 — front leg room maximum, inches; x8 — rear seating, the fore-and-aft room,
inches; x9 — front shoulder room, inches; x10 — rear shoulder room, inches; x11 — turning circle radius, feet; x12
— displacement of the engine, cubic inches; x13 — HP, the net horsepower; x14 — tank fuel refill capacity, gallons; x15 — HP revolution red line, or the maximum safe engine speed, rpm. The cars’ weights are used for the regression
modeling with the dimension and specification variables. Weight is an aggregate characteristic that has a strong impact
on other cumulative characteristics of a car, such as mileage per gallon, and price (their correlations with weight equal
−0.87 and 0.70, respectively). The aim of the modeling consists of the evaluation of the predictor contributions to the weight variation, which in turn can help to identify the better engineering solutions.
A general view of the main characteristics of regressions is presented in Fig. 1, where various characteristics are
traced by means of the k parameter of ridge regressions. Three variants of the Ridge-2 regressions are presented: with
the tuning parameter q defined by the criteria of maximum coefficient of multiple determination (26), of minimum
variance (39), and of minimum GCV (40). In the curves in Fig. 1, these criteria are denoted as maxR2, minVar, and minGCV. In Fig. 1(A) we see that these three parameters q behave very similarly: they have similar positive constant slopes (31) as k increases, while for Ridge-1 this parameter is constant, q = 1. The curve for q defined by the maxR2 criterion is higher than the other curves.
In Fig. 1(B) the effective number of parameters is shown. The value of n∗ for Ridge-1 is diminishing, by (17), while for Ridge-2 it stabilizes, in accordance with (33), which in this case is n∗2 = n/λmax = 15/7.16 = 2.09. The best performance corresponds to the q curve defined by the maxR2 criterion. In Fig. 1(C) the residual error variance
(A) Tuning parameter q. (B) Effective number of parameters.
(C) Residual error variance. (D) Coeff. of multiple determination.
(E) Quotient (beta∗r /R2). (F) Bias of beta estimates.
(G) Variance of beta estimates. (H) Generalized cross-validation.
Fig. 1. Ridge trace of the main regression characteristics.
(18) and (34) (with the effective number of parameters) is shown — it grows quickly for Ridge-1 but becomes stable
for Ridge-2 models. The lowest error variance corresponds to the curve of q defined by the maxR2 criterion.
In Fig. 1(D) the coefficients of multiple determination are presented — they go to the constant asymptote (31)
slightly below the maximum value for Ridge-2, but decrease hyperbolically, (16), for Ridge-1. The best values of the
multiple determination are obtained, naturally, with q defined by the maxR2 criterion. Fig. 1(E) shows the quotient of
the scalar product b′r divided by the coefficient of multiple determination. Only for the Ridge-2 model with q defined by maxR2 is this ratio, by (28), identically equal to one. For the other two variants of q, with the
minimum variance or GCV, this ratio is less than one but still constant, while for Ridge-1 with inequality (14), this ratio
decreases as k grows. This means that Ridge-2 with q, (27), and Ridge-1 have the best and the worst orthogonality
features, respectively.
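The quotient just discussed can be illustrated with a small numerical sketch. The paper's formulas (14), (27) and (28) are not reproduced in this section, so the code below assumes the Ridge-1 solution b = (C + kI)⁻¹r in standardized variables and a Ridge-2 estimator of the assumed form β = qb, with the scalar q chosen to maximize R² = 2β′r − β′Cβ, which gives q = (b′r)/(b′Cb). Under those assumptions the quotient β′r/R² is identically one for Ridge-2, matching the flat maxR2 curve in Fig. 1(E).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic collinear data (illustrative only; not the cars data set).
n, p = 100, 5
Z = rng.normal(size=(n, 2))
X = Z[:, [0, 0, 1, 1, 0]] + 0.3 * rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.5, 0.2, 0.1]) + rng.normal(size=n)

# Standardize, so X'X/n is the correlation matrix C and X'y/n the vector r.
X = (X - X.mean(0)) / X.std(0)
y = (y - y.mean()) / y.std()
C = X.T @ X / n
r = X.T @ y / n

def ridge1(k):
    """One-parameter ridge solution b = (C + kI)^(-1) r."""
    return np.linalg.solve(C + k * np.eye(p), r)

def ridge2(k):
    """Assumed two-parameter form beta = q*b, with q maximizing
    R^2 = 2*beta'r - beta'C*beta, i.e. q = (b'r)/(b'Cb)."""
    b = ridge1(k)
    q = (b @ r) / (b @ C @ b)
    return q * b, q

def r2(beta):
    """Coefficient of multiple determination in standardized variables."""
    return 2 * beta @ r - beta @ C @ beta

for k in (0.1, 1.0, 10.0):
    b1 = ridge1(k)
    b2, q = ridge2(k)
    # For Ridge-2 with the maxR2 q, the quotient beta'r / R^2 equals one.
    print(k, round(r2(b1), 3), round(r2(b2), 3), round((b2 @ r) / r2(b2), 6))
```

In the printout the Ridge-2 R² stays above the Ridge-1 value at every k, and the last column stays at one.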
Fig. 1(F) presents the bias of the coefficient estimates (20) and (36) for Ridge-1 and Ridge-2 — again, this
characteristic is worsening for the former and is fairly stable for the latter model. Ridge-2 with q defined by maxR2
demonstrates the best results. The next plot, Fig. 1(G), shows the variance of the coefficient estimates (constructed
with the effective number of parameters) — the Ridge-1 variance (21) is quickly growing while the Ridge-2 variance
(38) reaches a constant asymptote. The Ridge-2 curves with q defined by minVar and minGCV have the smallest
coefficient variances, and the curve of maxR2 is very close to them.
And Fig. 1(H) presents the behavior of the GCV, (23) and (40) — after its minimum, the GCV rises quickly for the Ridge-1 model, but it stays at practically the same level for the Ridge-2 regressions. This means that the unbiased estimate of the residual error worsens for Ridge-1 but is consistent for Ridge-2 along the parameter k profiling.
Here too the Ridge-2 curves with q defined by minVar and minGCV have the smallest unbiased residual variances,
although the curve of maxR2 is practically indistinguishable from them.
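The GCV profile for the one-parameter model can be sketched with the standard Golub–Heath–Wahba criterion. This is a generic illustration on synthetic data, not the paper's expressions (23) and (40); n_eff below is the effective number of parameters tr H(k) = Σ λj/(λj + k).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic standardized data (illustrative only).
n, p = 100, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)
y = X @ rng.normal(size=p) + rng.normal(size=n)
y = (y - y.mean()) / y.std()

C = X.T @ X / n
lam = np.linalg.eigvalsh(C)

def gcv(k):
    """Golub-Heath-Wahba GCV: (RSS/n) / (1 - n_eff/n)^2,
    where n_eff = tr(hat matrix) = sum of lambda_j / (lambda_j + k)."""
    beta = np.linalg.solve(C + k * np.eye(p), X.T @ y / n)
    resid = y - X @ beta
    n_eff = np.sum(lam / (lam + k))
    return (resid @ resid / n) / (1 - n_eff / n) ** 2

ks = np.linspace(0.01, 5.0, 200)
scores = [gcv(k) for k in ks]
k_best = ks[int(np.argmin(scores))]
print("GCV-optimal k:", round(k_best, 3))
```

Tracing gcv(k) over a grid of k values reproduces the qualitative shape of the Ridge-1 curve in Fig. 1(H): a minimum followed by steady growth.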
Fig. 2 presents the ridge profiles for each of the beta coefficients of the regressions with fifteen regressors. We
see that for each beta coefficient the curves of Ridge-2 with q taken according to the criteria of maxR2, minVar, and minGCV are practically indistinguishable, while Ridge-1 is always distant from them. Along k, the Ridge-2 profiles of the beta coefficients go to constant asymptotes eventually proportional to the paired correlation of y with each x, (30), while the Ridge-1 betas (15) promptly diminish to zero (horizontal line). Several variables (x5, x8, x9, and x12) have
negative signs for k = 0, or for the OLS model. These signs of the coefficients in multiple regressions are opposite
to the signs of the paired correlations of these x’s with y. At k = 0.05, two of these regressors, x9 and x12, already
have a positive relation to the dependent variable. Eventually, x5 attains a positive coefficient at k = 1.95; then x8 also
becomes positive, beginning at k = 6.6. The last variable x15 is negatively correlated with the dependent variable, so
its coefficient stays negative all along the trace.
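The sign reversals along k can be reproduced in a minimal two-variable sketch. The correlation values below are hypothetical, not taken from the cars data; they are chosen so that the OLS solution C⁻¹r gives the second coefficient a sign opposite to its positive correlation with y, while the ridge solution (C + kI)⁻¹r turns it positive at a small k and becomes proportional to r as k grows.

```python
import numpy as np

# Hypothetical correlations: two highly correlated predictors, both
# positively correlated with y, producing an OLS sign flip.
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])
r = np.array([0.9, 0.8])

def beta_ridge1(k):
    """Ridge trace of the standardized coefficients, (C + kI)^(-1) r."""
    return np.linalg.solve(C + k * np.eye(2), r)

for k in (0.0, 0.05, 0.5, 5.0, 50.0):
    b = beta_ridge1(k)
    # At k = 0 (OLS) the second coefficient is negative; as k grows the
    # solution becomes proportional to r, so both signs turn positive.
    print(k, np.round(b, 4))
```

Rescaling the trace by a positive scalar q does not change the signs, so the same qualitative picture applies to the two-parameter profiles in Fig. 2.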
The last plot, Fig. 3, presents the ridge trace of the absolute value of the t -statistics for the beta coefficients of
regressions. Again, the results are noticeably better for Ridge-2 than for Ridge-1 models. This is true for two reasons.
First, the beta coefficients themselves have higher absolute values for Ridge-2 than for Ridge-1 regressions. Second, the standard errors of the beta coefficients, being proportional to the regression residual errors √(1 − R²), are smaller for Ridge-2, with its higher value of multiple determination, than for Ridge-1. For instance, taking k = 10, the Ridge-2 and Ridge-1 coefficients of multiple determination are R̃2 ≈ 0.88 and R2rd ≈ 0.55 (Fig. 1(D)). The t-statistics are reciprocally proportional to the standard errors of the models, so those for the Ridge-2 and Ridge-1 models are related as √((1 − R2rd)/(1 − R̃2)) ≈ 1.9. So the t-values for the beta coefficients are approximately twice as large for
Ridge-2 as for Ridge-1 — see Fig. 3. Most of the coefficient t-value curves go above the critical t-value (the horizontal line at 1.96, corresponding to the 95% confidence level for a two-tailed normal distribution, or approximately for t-statistics with more than a hundred degrees of freedom). Only the x8 variable eventually becomes insignificant in all the models.
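The quoted factor of roughly two follows directly from the R² values read off Fig. 1(D): standard errors scale with √(1 − R²), so the t-statistics scale inversely.

```python
import math

# R^2 values at k = 10 quoted in the text (Fig. 1(D)).
r2_ridge1 = 0.55
r2_ridge2 = 0.88

# t-statistics are inversely proportional to the standard errors,
# which scale with sqrt(1 - R^2).
ratio = math.sqrt((1 - r2_ridge1) / (1 - r2_ridge2))
print(round(ratio, 2))  # ≈ 1.94, i.e. roughly twice as large
```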
Considering all the characteristics described above, we prefer the model with q defined by the maxR2 criterion, (27), since it covers most of the optimal qualities, performs best by the coefficient of multiple determination, and possesses the property of orthogonality (28). In some numerical runs the minVar and minGCV criteria could yield unfeasible solutions, such as a negative q for a particular k value, while the maxR2 criterion (27) works well, always producing a positive q. Thus, the q obtained using the maxR2 criterion is recommended for practical applications.
Table 1 presents results for the regular OLS and Ridge-2 regressions. First, the pair correlations of the dependent
variable with the regressors are shown. In the next three columns the OLS beta coefficients (3), together with their
t -statistics and the net effects, are presented. The variables x5, x8, x9, x12 have opposite signs in their relationship
with the dependent variable in the multiple regression in comparison with correlations because of multicollinearity.
The total of the net effects equals the coefficient (5) of multiple determination R2 = 0.95, and the F -statistic equals
117.7, so the quality of the model is high. The results of Ridge-2 modeling are given on the right-hand side of Table 1.
The choice of q was made using the criterion (27). The solution for the multiple regression with coefficients of the same signs
as their correlations with the dependent variable is reached with the ridge parameter k = 6.6, and the corresponding
Fig. 2. Ridge trace of the beta coefficients.
tuning parameter is q = 1.98. The next two columns after the Ridge-2 beta coefficients contain their t -statistics (these
are noticeably higher than in the OLS regression), and deviations from orthogonality, (6), which are close to zero. The
last column shows the net effects and their total, equal to the coefficient (28) of multiple determination ˜ R2 = 0.88 (the
F -statistic is 44.9), which is close to the quality of the OLS regression. All net effects are positive and can be compared
according to their contributions. For the corresponding Ridge-1, the inequality (14) is numerically 0.666 > 0.445.
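The net-effect decomposition reported in Table 1 can be sketched under the assumption, consistent with the totals in the table, that the j-th net effect is βj rj in standardized variables. At the OLS solution Cβ = r, so R² = 2β′r − β′Cβ = β′r, and the net effects sum exactly to R². The small correlation matrix below is illustrative, not the cars data.

```python
import numpy as np

# Hypothetical correlation matrix C and correlations r with y.
C = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
r = np.array([0.7, 0.6, 0.5])

beta = np.linalg.solve(C, r)            # standardized OLS coefficients
net = beta * r                          # assumed net effects: beta_j * r_j
R2 = 2 * beta @ r - beta @ C @ beta     # coefficient of multiple determination

print("net effects:", np.round(net, 3))
print("total:", round(net.sum(), 3), " R2:", round(R2, 3))
```

With collinear predictors a βj can take the sign opposite to rj, producing the negative net effects seen in the OLS columns of Table 1.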
Fig. 3. Ridge trace of the t -statistics for the beta coefficients.
The ratio of coefficients of multiple determination of Ridge-2 and Ridge-1 is 1.33, so the Ridge-2 model performs
better by 33%.
To estimate the stability of the regression quality, a hundred random subsets of 80 out of the 100 available observations were drawn, and the coefficients of multiple determination for the OLS, Ridge-1, and Ridge-2 models were evaluated. The results are presented on the left-hand side of Table 2. The regular regression
has the highest R2, the Ridge-2 regression follows it closely, and Ridge-1 has noticeably worse results. Also, about
Table 1
Cars data: OLS and Ridge-2 regressions

Variable           Cor    OLS                           Ridge-2
                          Beta   t-Stat.  Net eff.      Beta   t-Stat.  Orthog.  Net eff.
x1  Length         .81    .36     5.9      .29          .12     18.2     −.01     .10
x2  Wheel          .80    .11     1.7      .09          .11     17.8     −.04     .09
x3  Width          .86    .19     2.3      .16          .13     20.2     −.02     .11
x4  Height         .44    .32     5.9      .14          .07      9.0      .06     .04
x5  Front.Hd       .25   −.05    −1.5     −.01          .02      2.6     −.11     .00
x6  Rear.Hd        .34    .01     0.2      .00          .05      6.1     −.03     .01
x7  Front.Leg      .31    .02     0.8      .01          .05      5.5      .02     .02
x8  Rear.Seat      .15   −.18    −3.6     −.03          .00      0.0     −.15     .00
x9  Front.Shld     .82   −.03    −0.4     −.02          .12     18.6     −.02     .09
x10 Rear.Shld      .21    .04     0.9      .01          .02      2.2     −.08     .00
x11 Turning        .77    .04     0.9      .03          .11     15.7     −.03     .08
x12 Disp.          .85   −.09    −1.0     −.07          .13     19.6      .00     .11
x13 HP             .70    .35     5.7      .25          .12     15.5      .10     .10
x14 Tank           .85    .08     1.5      .07          .13     19.2      .04     .12
x15 HP.revs       −.48   −.08    −1.4      .04         −.06     −7.3      .09     .02
R2                                         .95                                    .88
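As a rounding-level arithmetic check on Table 1, each OLS net effect should equal the product of the beta coefficient and the pair correlation; a few rows copied from the table:

```python
# (cor, OLS beta, OLS net effect) copied from Table 1.
rows = {
    "x1":  (0.81,  0.36,  0.29),
    "x4":  (0.44,  0.32,  0.14),
    "x13": (0.70,  0.35,  0.25),
    "x15": (-0.48, -0.08, 0.04),   # negative beta times negative cor
}
for name, (cor, beta, net) in rows.items():
    print(name, round(beta * cor, 2), net)  # products agree up to rounding
```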
Table 2
Cars data: sampling for R2 and the residual root mean square error
Statistic R2 RMSE
OLS k Ridge-1 q Ridge-2 OLS k Ridge-1 q Ridge-2
Min .94 1.50 .66 1.25 .87 135.2 1.50 179.4 1.27 160.6
Max .97 7.00 .88 2.02 .93 302.2 6.50 353.3 1.91 241.1
Mean .95 3.68 .78 1.56 .90 172.3 3.67 270.5 1.56 200.3
Std .01 1.35 .06 0.19 .01 31.9 1.45 41.0 0.21 21.0
half of the observations were randomly picked out, and the other half were used in the forecasting of the dependent
variable using the models. Estimations of the residual root mean square error (RMSE) are presented on the right-
hand side of Table 2. Again, the Ridge-2 results are close to the OLS model, having practically the same predictive
power. But unlike the OLS regression, the Ridge-2 model has all of its regression coefficients interpretable, with positive net effects.
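The resampling scheme just described can be outlined generically; the sketch below uses synthetic data (the cars data set is not reproduced) and evaluates R² of the OLS fit over a hundred 80-of-100 subsamples.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data standing in for the 100-car data set.
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def ols_r2(Xs, ys):
    """R^2 of an OLS fit with intercept (via centering)."""
    Xc = Xs - Xs.mean(0)
    yc = ys - ys.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return 1 - (resid @ resid) / (yc @ yc)

r2s = []
for _ in range(100):
    idx = rng.choice(n, size=80, replace=False)
    r2s.append(ols_r2(X[idx], y[idx]))

print("mean R2:", round(float(np.mean(r2s)), 3),
      "std:", round(float(np.std(r2s)), 3))
```

The Ridge-1 and Ridge-2 columns of a table like Table 2 can be obtained the same way by swapping in the corresponding ridge estimators.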
For the second example, a data set from a real study of customer satisfaction with a bank is used. The variables are:
y — overall satisfaction with the bank, x1 — likelihood of using the bank in future, x2 — likelihood of recommending
the bank, x3 — satisfaction with the value provided, x4 — satisfaction with loan experience, x5 — service best fitted
to needs, x6 — feeling like a valued customer, x7 — approval time versus expectations, x8 — kept informed on
financial status, x9 — promptness of communication, x10 — satisfaction with problem processing. There are about 5600 observations, and all the variables are positively correlated. The research investigated the predictors' contributions to the explanation of the dependent variable's variance.
Table 3 (arranged similarly to Table 1) shows that the coefficient of multiple determination R2 = 0.64 (the F-statistic equals 980), so the model is of high quality. However, the variables x7 and x9 have negative net effects,
although each regressor increases the coefficient of determination, which is a multicollinearity effect. The Ridge-2
model reached all positive coefficients with the ridge parameter k = 0.07, and the corresponding second parameter
is q = 1.018. The next columns contain t -statistics for the coefficients, orthogonality values which are very close to
zero, and the net effects and their total, equal to the coefficient of multiple determination ˜ R2 = 0.63 (the F -statistic
is 977). These results are very close to the regular regression, but all have positive net effects that can be used for
comparison of the predictors by means of their contributions to the dependent variable.
To estimate the stability of the regressions, various sampling evaluations were performed. About 400 random
samples of 100 observations were used to evaluate coefficients of multiple determination for the OLS, Ridge-1, and
Table 3
Bank data: OLS and Ridge-2 regressions

Variable           Cor    OLS                           Ridge-2
                          Beta   t-Stat.  Net eff.      Beta   t-Stat.  Orthog.  Net eff.
x1  In Future      .53    .02     2.1      .01          .04      3.6     −.01     .02
x2  Recommend      .61    .10     8.4      .06          .11     10.0     −.00     .06
x3  Value          .71    .18    11.7      .13          .17     14.2     −.00     .12
x4  Loan           .72    .29    21.9      .22          .27     24.0      .01     .20
x5  Fit Needs      .60    .05     4.1      .03          .06      5.9     −.01     .04
x6  Feeling        .70    .22    16.3      .15          .21     18.4      .00     .14
x7  Timely         .32   −.01    −0.6     −.00          .00      0.0     −.01     .00
x8  Info           .60    .04     2.8      .02          .05      4.3     −.01     .03
x9  Communicate    .53   −.01    −0.5     −.00          .01      0.54    −.01     .00
x10 Problems       .57    .03     2.6      .02          .04      4.2     −.01     .02
R2                                         .64                                    .63
Table 4
Bank data: sampling for R2 and residual root mean square error
Statistic R2 RMSE
OLS k Ridge-1 q Ridge-2 OLS k Ridge-1 q Ridge-2
Min .44 .00 .40 1.00 .43 .86 .00 .86 1.00 .85
Max .84 4.55 .82 1.84 .82 .96 .48 .96 1.10 .96
Mean .67 .67 .64 1.14 .65 .90 .14 .90 1.03 .90
Std .08 .56 .08 0.11 .08 .02 .11 .02 0.02 .02
Ridge-2 models. Table 4, containing descriptive statistics (arranged similarly to Table 2), shows that the R2 results for the regular and ridge models are close; however, in particular data subsets the regular regressions have negative signs not only for x7 and x9 but also for some of the other predictors. At the same time, the ridge regressions all have positive coefficients corresponding to the chosen level of the parameter k. Also, half (about 2800) of the observations were randomly picked out for constructing models, and the other half of the data set was used for the forecasting of
the dependent variable by means of the models and for estimation of the residual root mean square error. Table 4
also presents the RMSE results obtained in 100 samplings using the OLS and ridge regressions. Again, Ridge-2 demonstrates high quality, achieving practically the same predictive power as the linear model while having interpretable coefficients and positive inputs from all the predictors.
6. Summary
One-parameter ridge regression uses a regularization term to reduce the influence of multicollinearity on the regression coefficients. The two-parameter ridge regression considered here uses an additional term for tuning the quality of the model to the highest possible level. In contrast to the one-parameter ridge model, the two-parameter ridge regression has such useful properties as orthogonality between the variables and the residual errors, less bias and a more efficient solution, and higher precision of fit, and it encompasses other helpful features of linear regression. The results of the two-parameter ridge regression are robust, not prone to multicollinearity, and easily interpretable. With increase of the ridge parameter the model eventually reaches an asymptotic level, with the beta coefficients of the multiple regression proportional to the coefficients of the paired regressions, which guarantees positive shares of the variables' individual roles in the model. The suggested approach should be useful for theoretical consideration and could serve numerous practical aims of regression modeling and analysis.
Acknowledgements
The author thanks the reviewers for their suggestions that improved the paper.