researcharticle revision: variance inflation in regressionpeople.virginia.edu/~der/pdf/vif.pdf ·...

15
Hindawi Publishing Corporation Advances in Decision Sciences Volume 2013, Article ID 671204, 15 pages http://dx.doi.org/10.1155/2013/671204 Research Article Revision: Variance Inflation in Regression D. R. Jensen 1 and D. E. Ramirez 2 1 Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA 2 Department of Mathematics, University of Virginia, P.O. Box 400137, Charlottesville, VA 22904, USA Correspondence should be addressed to D. E. Ramirez; [email protected] Received 12 October 2012; Accepted 10 December 2012 Academic Editor: Khosrow Moshirvaziri Copyright © 2013 D. R. Jensen and D. E. Ramirez. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Variance Inflation Factors (VIFs) are reexamined as conditioning diagnostics for models with intercept, with and without centering regressors to their means as oſt debated. Conventional VIFs, both centered and uncentered, are flawed. To rectify matters, two types of orthogonality are noted: vector-space orthogonality and uncorrelated centered regressors. e key to our approach lies in feasible Reference models encoding orthogonalities of these types. For models with intercept it is found that (i) uncentered VIFs are not ratios of variances as claimed, owing to infeasible Reference models; (ii) instead they supply informative angles between subspaces of regressors; (iii) centered VIFs are incomplete if not misleading, masking collinearity of regressors with the intercept; and (iv) variance deflation may occur, where ill-conditioned data yield smaller variances than their orthogonal surrogates. Conventional VIFs have all regressors linked, or none, oſten untenable in practice. Beyond these, our models enable the unlinking of regressors that can be unlinked, while preserving dependence among those intrinsically linked. Moreover, known collinearity indices are extended to encompass angles between subspaces of regressors. To reaccess ill-conditioned data, we consider case studies ranging from elementary examples to data from the literature. 1. Introduction Given {Y = X + } of full rank with zero-mean, uncorrelated and homoscedastic errors, the equations {X X = X Y} yield the OLS estimators for = [ 1 ,..., ] as unbiased with dispersion matrix ( ) = 2 V and V= [ ] = (X X) −1 . Ill-conditioning, as near dependent columns of X, exerts profound and interlinked consequences, causing “crucial elements of X X to be large and unstable,” “creating inflated variances”; estimates excessive in magnitude, irregular in sign, and “very sensitive to small changes in X”; and unstable algorithms having “degraded numerical accuracy.” See [13] for example. Ill-conditioning diagnostics include the condition number 1 (X X), the ratio of its largest to smallest eigenvalues, and the Variance Inflation Factors {VIF( )= ;1≤≤} with U = X X, that is, ratios of actual ( ) to “ideal” (1/ ) variances, had the columns of X been orthogonal. On scaling the latter to unit lengths and ordering {VIF( )= ;1≤ ≤ } as { 1 2 ≥ ⋅⋅⋅ ≥ }, 1 is identified in [4] as “the best single measure of the conditioning of the data.” In addition, the bounds 1 1 (X X) ≤ ( 1 +⋅⋅⋅+ ) of [5] apply also in stepwise regression as in [69]. Users deserve to be apprised not only that data are ill conditioned, but also about workings of the diagnostics themselves. Accordingly, we undertake here to rectify long- standing misconceptions in the use and properties of VIFs, necessarily retracing through several decades to their begin- nings. To choose a model, with or without intercept, is sub- stantive, is specific to each experimental paradigm, and is beyond the scope of the present study. Whatever the choice, fixed in advance in a particular setting, these models follow on specializing = +1, then = , respectively, as 0 :{Y = X 0 + } with intercept, and M :{Y = X + } without, where X 0 =[1 , X]; X =[X 1 , X 2 ,..., X ] comprise vectors of regressors; and = [ 0 , ] with 0 as inter- cept and = [ 1 ,..., ] as slopes. e VIFs as defined are essentially undisputed for models in w , as noted in [10], serving to gage effects of nonorthogonal regressors

Upload: others

Post on 18-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Hindawi Publishing CorporationAdvances in Decision SciencesVolume 2013 Article ID 671204 15 pageshttpdxdoiorg1011552013671204

Research ArticleRevision Variance Inflation in Regression

D R Jensen1 and D E Ramirez2

1 Department of Statistics Virginia Tech Blacksburg VA 24061 USA2Department of Mathematics University of Virginia PO Box 400137 Charlottesville VA 22904 USA

Correspondence should be addressed to D E Ramirez dervirginiaedu

Received 12 October 2012 Accepted 10 December 2012

Academic Editor Khosrow Moshirvaziri

Copyright copy 2013 D R Jensen and D E Ramirez This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Variance Inflation Factors (VIFs) are reexamined as conditioning diagnostics for models with intercept with and without centeringregressors to their means as oft debated Conventional VIFs both centered and uncentered are flawed To rectify matters two typesof orthogonality are noted vector-space orthogonality and uncorrelated centered regressorsThe key to our approach lies in feasibleReference models encoding orthogonalities of these types For models with intercept it is found that (i) uncentered VIFs are notratios of variances as claimed owing to infeasible Reference models (ii) instead they supply informative angles between subspacesof regressors (iii) centered VIFs are incomplete if not misleading masking collinearity of regressors with the intercept and (iv)variance deflation may occur where ill-conditioned data yield smaller variances than their orthogonal surrogates ConventionalVIFs have all regressors linked or none often untenable in practice Beyond these our models enable the unlinking of regressorsthat can be unlinked while preserving dependence among those intrinsically linked Moreover known collinearity indices areextended to encompass angles between subspaces of regressors To reaccess ill-conditioned data we consider case studies rangingfrom elementary examples to data from the literature

1 Introduction

Given Y = X120573+120598 of full rank with zero-mean uncorrelatedand homoscedastic errors the 119901 equations X1015840X120573 = X1015840Yyield the OLS estimators for 120573 = [120573

1 120573

119901]1015840 as unbiased

with dispersion matrix 119881() = 1205902V and V= [119907

119894119895] = (X1015840X)

minus1Ill-conditioning as near dependent columns of X exertsprofound and interlinked consequences causing ldquocrucialelements of X1015840X to be large and unstablerdquo ldquocreating inflatedvariancesrdquo estimates excessive in magnitude irregular insign and ldquovery sensitive to small changes inXrdquo and unstablealgorithms having ldquodegraded numerical accuracyrdquo See [1ndash3]for example

Ill-conditioning diagnostics include the condition number1198881(X1015840X) the ratio of its largest to smallest eigenvalues and

the Variance Inflation Factors VIF(120573119895) = 119906119895119895119907119895119895 1 le 119895 le 119901

with U = X1015840X that is ratios of actual (119907119895119895) to ldquoidealrdquo (1119906

119895119895)

variances had the columns ofX been orthogonal On scalingthe latter to unit lengths and ordering VIF(120573

119895) = 119907

119895119895 1 le

119895 le 119901 as 1198811ge 1198812ge sdot sdot sdot ge 119881

119901 1198811is identified in [4] as

ldquothe best single measure of the conditioning of the datardquo Inaddition the bounds 119881

1le 1198881(X1015840X) le 119901(119881

1+ sdot sdot sdot + 119881

119901) of [5]

apply also in stepwise regression as in [6ndash9]Users deserve to be apprised not only that data are

ill conditioned but also about workings of the diagnosticsthemselves Accordingly we undertake here to rectify long-standing misconceptions in the use and properties of VIFsnecessarily retracing through several decades to their begin-nings

To choose a model with or without intercept is sub-stantive is specific to each experimental paradigm and isbeyond the scope of the present study Whatever the choicefixed in advance in a particular setting these models followon specializing 119901 = 119896 + 1 then 119901 = 119896 respectively as1198720 Y = X

0120579 + 120598 with intercept and M

119908 Y = X120573 + 120598

without where X0= [1119899X] X = [X

1X2 X

119896] comprise

vectors of regressors and 1205791015840

= [12057301205731015840

] with 1205730as inter-

cept and 1205731015840 = [1205731 120573

119896] as slopes The VIFs as defined

are essentially undisputed for models in 119872w as noted in[10] serving to gage effects of nonorthogonal regressors

2 Advances in Decision Sciences

as ratios of variances In contrast a yet unresolved debatesurrounds the choice of conditioning diagnostics for mod-els in 119872

0 namely between uncentered regressors giving

VIF119906s for [120573

0 1205731 120573

119896] and regressors X

119894centered to their

means giving VIF119888s for [120573

1 120573

119896] Specifically VIF

119906(120573119895) =

119906119895119895119907119895119895 119895 = 0 1 119896 on taking U = X1015840

0X0and V =

(X10158400X0)minus1 In contrast letting R = [119877

119894119895] be the correlation

matrix for = [1205731 120573

119896]1015840

and Rminus1 = [119877119894119895] its inverse then

centered versions are VIF119888(120573119895) = 119877119895119895 1 le 119895 le 119896 for slopes

onlyIt is seen here that (i) these differ profoundly in regard

to their respective concepts of orthogonality (ii) objectivesand meanings differ accordingly (iii) sharp divisions trace tomuddling these concepts and (iv) this distinction assumesa pivotal role here Over time a large body of widely heldbeliefs conjectures intrinsic propositions and conventionalwisdom has accumulated much from flawed logic some tobe dispelled here The key to our studies is that VIFs tobe meaningful must compare actual variances to those ofan ldquoidealrdquo second-moment matrix as reference the latter toembody the conjectured type of orthogonality This differsbetween centered and uncentered diagnostics and for bothtypes requires the reference matrix to be feasible An outlinefollows

Our undertaking consists essentially of four partsThe first is a literature survey of some essentials of ill-conditioning to include the divide between centered anduncentered diagnostics and conventional VIFs The struc-ture of orthogonality is examined next Anomalies inusage meaning and interpretation of conventional VIFs areexposed analytically and through elementary and transparentcase studies Long-standing but ostensible foundations inturn are reassessed and corrected through the constructionof ldquoReference modelsrdquo These are moment arrays constrainedto encode orthogonalities of the types considered Neitherarray returns the conventional VIF

119906s nor VIF

119888s Direct

rules for finding the amended Reference models are givenpreempting the need for constrained numerical algorithmsFinally studies of ill-conditioned data from the literature arereexamined in light of these findings

2 Preliminaries

21 Notation Designate byR119901 the Euclidean 119901-space Matri-ces and vectors are set in bold type the transpose and inverseof A are A1015840 and Aminus1 and A(119894119895) refers on occasion to the(119894 119895) element of A Special arrays are the identity I

119901 the unit

vector 1119901= [1 1 1]

1015840isin R119901 the diagonal matrix D

119886=

Diag(1198861 119886

119901) and B

119899= (I119899minus 119899minus1111989911015840119899) as idempotent of

rank 119899 minus 1 The Frobenius norm of x isin R119901 is x = (x1015840x)12ForX of order (119899times119901) and rank 119901 le 119899 a generalized inverse isdesignated as Xdagger its ordered singular values as 120590(X) = 120585

1ge

1205852

ge sdot sdot sdot ge 120585119901

gt 0 and by S119901(X) sub R119899 the subspace of

R119899 spanned by the columns of X By accepted conventionits condition number is 119888

2(X) = 120585

1120585119901 specifically 119888

2(X) =

[1198881(X1015840X)]

12 For our model Y = X120573 + 120598 with dispersion

119881(120598)=1205902I119899 we take 120590

2= 10 unless stated otherwise since

variance ratios are scale invariant

22 Case Study 1 A First Look That anomalies pervadeconventional VIF

119906s and VIF

119888s may be seen as follows Given

1198720 119884119894= 1205730+12057311198831+12057321198832+120598119894 the designX

0= [15X1X2]

of order (5 times 3)U=X10158400X0 and its inverse V=(X1015840

0X0)minus1 as in

X10158400= [

[

1 1 1 1 1

0 05 05 1 1

minus1 1 1 0 0

]

]

U = [

[

5 3 1

3 25 1

1 1 3

]

]

V = [

[

072222 minus088889 005556

minus088889 155556 minus022222

005556 minus022222 038889

]

]

(1)

Conventional centered and uncentered VIFs are

[VIF119888(1205731) VIF

119888(1205732)] = [10889 10889] (2)

[VIF119906(1205730)VIF

119906(1205731)VIF

119906(1205732)]= [36111 38889 11667]

(3)

respectively the former for slopes only and the latter takingreciprocals of the diagonals of X1015840

0X0as reference

A Critique The following points are basic Note first thatmodel (1) is nonorthogonal in both the centered and uncen-tered regressors

Remark 1 The VIF119906s are not ratios of variances and thus fail

to gage relative increases in variances owing to nonorthog-onal columns of X

0 This follows since the first row and

column of the second-moment matrix X10158400X0= U are fixed

and nonzero by design so that taking X10158400X0to be diagonal as

reference cannot be feasible

Remark 1 runs contrary to assertions throughout the liter-ature In consequence for models in 119872

0the mainstay VIF

119906s

in recent vogue are largely devoid of meaning Subsequentlythese are identified instead with angles quantifying degrees ofmulticollinearity among the regressors

On the other hand feasible Reference models for allparameters as identified later for centered and uncentereddata in Definition 13 Section 42 give

[VF119888(1205730)VF119888(1205731)VF119888(1205732)]= [09913 10889 10889]

(4)

[VF119907(1205730)VF119907(1205731)VF119907(1205732)]= [07704 08889 08889]

(5)

in lieu of conventional VIF119888s and VIF

119906s respectively The

former comprise corrected VIF119888s extended to include the

intercept Both sets in fact are genuine variance inflationfactors as ratios of variances in themodel (1) relative to thosein Reference models feasible for centered and for uncenteredregressors respectively

This example flagrantly contravenes conventional wis-dom (i) variances for slopes are inflated in (4) but for the

Advances in Decision Sciences 3

intercept deflated in comparison with the feasible centeredreference Specifically 120573

0is estimated here with greater

efficiency VF119888(1205730) = 09913 in the initial design (1) despite

nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments

23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872

119908refers unambiguously to the vector-

space concept X119894perp X119895 that is X1015840

119894X119895

= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In

contrast nonorthogonality in 1198720often refers instead to the

statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862

perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903

119894119895does measure linear dependency between 119909

119894

and 119909119895in the datardquo

24 The Models 1198720 Consider 119872

0 Y = X

0120579 + 120598 X

0=

[1119899X] 1205791015840 = [120573

01205731015840] together with

X10158400X0= [

119899 119899X1015840

119899X X1015840X] (X1015840

0X0)minus1

= [11988800

c01

c101584001

G ] (6)

where 119899X1015840 = 11015840119899X 11988800

= [119899 minus 1198992X1015840(X1015840X)

minus1X]minus1 C =

(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned

inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B

119899X arises exclusively in models with intercept

with or without reparametrizing (1205730120573) rarr (120572120573) in the

mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573

119896119883119896

Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals

3 Historical Perspective

Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872

0are at issue since as noted in [10] centered

diagnostics have no place in119872119908

31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888

2(X) [12 p120] based on

[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for

ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt

Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study

32 To Center Advocacy for centering includes the follow-ing

(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF

119888s widely adopted without justification

as default to the exclusion of VIF119906s

(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo

(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]

33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering

(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]

(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]

Other issues against centering derive in part from numer-ical analysis and work by Belsley

(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as

ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo

(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]

(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]

(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model

4 Advances in Decision Sciences

whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]

(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo

(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX

119894rsquos results in over a 40 relative

change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly

(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]

Other desiderata pertain

(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]

(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]

On balance formodels in1198720the jury is out regarding the

use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888

2(X) other purveyors of

ill-conditioning specifically VIFs are not treated explicitly

34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral

Remark 2 Typically 119884 represents response to input vari-ables 119883

1 1198832 119883

119896 In a controlled experiment levels

are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573

0is then the standard against which

changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature

Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872

119908

have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872

0

4 The Structure of Orthogonality

This section develops the foundations for Reference modelscapturing orthogonalities of types 119881

perp and 119862perp Essential

collateral results are given in support as well

41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x

119895 1 le 119895 le 119901 and Xdagger = (X1015840X)

minus1X1015840 is thegeneralized inverse of note having xdagger

119895 1 le 119895 le 119901 as

its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as

120581119895=10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817 1 le 119895 le 119901 (7)

constructed so as to be scale invariant Observe that xdagger1198952 is

found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1

WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section

3 of [9] shows that the centered collinearity indices 120581119888satisfy

1205812

119888119895= VIF

119888(120573119895) 1 le 119895 le 119896 In 119872

0with parameters (120573

0

120573) values corresponding to xdagger1198952 from Xdagger

0= (X10158400X0)minus1X10158400

lie along the principal diagonal of (X10158400X0)minus1 the uncentered

collinearity indices 120581119906now satisfy 120581

2

119906119895= x1198952sdot xdagger1198952

119895 =

0 1 119896 In particular since x0

= 1119899 we have 120581

2

1199060=

119899xdagger02 Moreover in119872

0it follows that the uncentered VIF

119906s

are squares of the collinearity indices that is VIF119906(120573119895) =

1205812

119906119895 119895 = 0 1 119896 Note the asymmetry that VIF

119888s exclude

the intercept in contrast to the inclusive VIF119906s That the

label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF

119906(120573119895) = 1205812

119906119895 119895 = 0 1 119896

Transcending the essential developments of [9] are con-nections between collinearity indices and angles between

Advances in Decision Sciences 5

subspaces To these ends choose a typical x119895in X0 and

rearrange X0as [x119895X[119895]] We next seek elements of

Q1015840119895(X10158400X0)minus1

Q119895= [

x1015840119895x119895

x1015840119895X[119895]

X1015840[119895]x119895X1015840[119895]X[119895]

]

minus1

(8)

as reordered by the permutation matrix Q119895 From the clock-

wise rule the (1 1) element of the inverse is

[x1015840119895(I119899minus P119895) x119895]minus1

= [x1015840119895x119895minus x1015840119895P119895x119895]minus1

=10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817

2 (9)

in succession for each 119895 = 0 1 119896 where P119895

=

X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]

is the projection operator onto the sub-space S

119901(X[119895]) sub R119899 spanned by the columns of X

[119895] These

in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly

for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-

conditioning as follows

Theorem 4 For models in 1198720let VIF

119906(120573119895) = 120581

2

119906119895 119895 = 0

1 119896 be conventional uncentered VIF119906s in terms of Stew-

artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows

(i) Angles between (x119895S119901(X[119895])) are given by 120579

119895=

arccos[(1minus11205812uj)12

] in succession for 119895 = 0 1 119896

(ii) In particular 1205790

= arccos[(1 minus 11205812u0)12

] quantifiesthe degree of collinearity between the regressors and theconstant vector

(iii) Similarly let B119899X = Z = [z

1 z

119896] be regressors

centered to their means rearrange as [z119895Z[119895]] and let

VIF119888(120573119895) = 120581

2

119888119895 119895 = 1 119896 be centered VIF

119888s in

terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z

119895S119901(Z[119895])) are given by

120579119895= arccos[(1 minus 11205812cj)

12] 1 le j le k

Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x

1198952

= P119895x1198952+

x119895minus P119895x1198952 where x

119895minus P119895x1198952

= RS119895is the residual sum

of squares from the projection It follows that the principalangle between (x

119895P119895x119895) is given by

cos (120579119895)=

x1015840119895P119895x119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895

10038171003817100381710038171003817

=

10038171003817100381710038171003817P119895x119895

1003817100381710038171003817100381710038171003817100381710038171003817x119895

10038171003817100381710038171003817

=radic1 minusRS119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817

2=radic1 minus

1

1205812119906119895

(10)

for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x

0P0x0) with x

0= 1119899and

P0= X(X1015840X)

minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z

119895P119895z119895) where P

119895

now is the projection operator onto the subspace S119901(Z[119895]) sub

R119899 spanned by the columns of Z[119895] to complete our proof

Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579

119895lt 300 deg

42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881

perp and 119862perp The keys

are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872

119908with moment matrix X1015840X

our opening paragraph prescribes as reference the modelR119881

= D = Diag(11988911 119889

119901119901) as diagonal elements of

X1015840X for assessing 119881perp-orthogonality Moreover on scaling

columns of X to equal lengths R119881is perfectly conditioned

with 1198881(R119881) = 10 In addition every model in 119872

119908clearly

conforms with its Reference in the sense that R119881is positive

definite as distinct from models in1198720to follow

Consider again models in 1198720as in (6) let C = (X1015840X minus

M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889

11 119889

119896119896) comprise the diagonal elements of

X1015840X

(i) 119881perpReference ModelThe uncentered VIF119906s in119872

0 defined

as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of

diagonal elements of X10158400X0 appear to have seen exclusive

usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1

Theorem 6 Statements that conventional VIF119906s quantify var-

iance inflation owing to nondiagonalX10158400X0are false for models

in1198720having X = 0

Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840

0X0 this usage is predicated on the false

postulate that X10158400X0can be diagonal for X = 0 Specifically

[1119899X] are linearly independent that is 11015840

119899X = 0 if and only

if X has been mean centered beforehand

To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the

price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590

2119899 and Var(sdot | X = 0) =

120590211988800

ge 1205902119899 from (6) their ratio is shown in [25] to be

1205812

1199060= 11989911988800

ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573

0| X = 0) = 120590

211988800

whereas Var( | X = 0) = 1205902119899 with 120572 = 120573

0+ 12057311198831+

sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find

the conventional VIF119906(1205730) to be useful for purposes to follow

Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579

0=

arccos[(1 minus 11205812u0)12

] as an informative gage on the extent of

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 2: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

2 Advances in Decision Sciences

as ratios of variances In contrast a yet unresolved debatesurrounds the choice of conditioning diagnostics for mod-els in 119872

0 namely between uncentered regressors giving

VIF119906s for [120573

0 1205731 120573

119896] and regressors X

119894centered to their

means giving VIF119888s for [120573

1 120573

119896] Specifically VIF

119906(120573119895) =

119906119895119895119907119895119895 119895 = 0 1 119896 on taking U = X1015840

0X0and V =

(X10158400X0)minus1 In contrast letting R = [119877

119894119895] be the correlation

matrix for = [1205731 120573

119896]1015840

and Rminus1 = [119877119894119895] its inverse then

centered versions are VIF119888(120573119895) = 119877119895119895 1 le 119895 le 119896 for slopes

onlyIt is seen here that (i) these differ profoundly in regard

to their respective concepts of orthogonality (ii) objectivesand meanings differ accordingly (iii) sharp divisions trace tomuddling these concepts and (iv) this distinction assumesa pivotal role here Over time a large body of widely heldbeliefs conjectures intrinsic propositions and conventionalwisdom has accumulated much from flawed logic some tobe dispelled here The key to our studies is that VIFs tobe meaningful must compare actual variances to those ofan ldquoidealrdquo second-moment matrix as reference the latter toembody the conjectured type of orthogonality This differsbetween centered and uncentered diagnostics and for bothtypes requires the reference matrix to be feasible An outlinefollows

Our undertaking consists essentially of four partsThe first is a literature survey of some essentials of ill-conditioning to include the divide between centered anduncentered diagnostics and conventional VIFs The struc-ture of orthogonality is examined next Anomalies inusage meaning and interpretation of conventional VIFs areexposed analytically and through elementary and transparentcase studies Long-standing but ostensible foundations inturn are reassessed and corrected through the constructionof ldquoReference modelsrdquo These are moment arrays constrainedto encode orthogonalities of the types considered Neitherarray returns the conventional VIF

119906s nor VIF

119888s Direct

rules for finding the amended Reference models are givenpreempting the need for constrained numerical algorithmsFinally studies of ill-conditioned data from the literature arereexamined in light of these findings

2 Preliminaries

21 Notation Designate byR119901 the Euclidean 119901-space Matri-ces and vectors are set in bold type the transpose and inverseof A are A1015840 and Aminus1 and A(119894119895) refers on occasion to the(119894 119895) element of A Special arrays are the identity I

119901 the unit

vector 1119901= [1 1 1]

1015840isin R119901 the diagonal matrix D

119886=

Diag(1198861 119886

119901) and B

119899= (I119899minus 119899minus1111989911015840119899) as idempotent of

rank 119899 minus 1 The Frobenius norm of x isin R119901 is x = (x1015840x)12ForX of order (119899times119901) and rank 119901 le 119899 a generalized inverse isdesignated as Xdagger its ordered singular values as 120590(X) = 120585

1ge

1205852

ge sdot sdot sdot ge 120585119901

gt 0 and by S119901(X) sub R119899 the subspace of

R119899 spanned by the columns of X By accepted conventionits condition number is 119888

2(X) = 120585

1120585119901 specifically 119888

2(X) =

[1198881(X1015840X)]

12 For our model Y = X120573 + 120598 with dispersion

119881(120598)=1205902I119899 we take 120590

2= 10 unless stated otherwise since

variance ratios are scale invariant

22 Case Study 1 A First Look That anomalies pervadeconventional VIF

119906s and VIF

119888s may be seen as follows Given

1198720 119884119894= 1205730+12057311198831+12057321198832+120598119894 the designX

0= [15X1X2]

of order (5 times 3)U=X10158400X0 and its inverse V=(X1015840

0X0)minus1 as in

X10158400= [

[

1 1 1 1 1

0 05 05 1 1

minus1 1 1 0 0

]

]

U = [

[

5 3 1

3 25 1

1 1 3

]

]

V = [

[

072222 minus088889 005556

minus088889 155556 minus022222

005556 minus022222 038889

]

]

(1)

Conventional centered and uncentered VIFs are

[VIF119888(1205731) VIF

119888(1205732)] = [10889 10889] (2)

[VIF119906(1205730)VIF

119906(1205731)VIF

119906(1205732)]= [36111 38889 11667]

(3)

respectively the former for slopes only and the latter takingreciprocals of the diagonals of X1015840

0X0as reference

A Critique The following points are basic Note first thatmodel (1) is nonorthogonal in both the centered and uncen-tered regressors

Remark 1 The VIF119906s are not ratios of variances and thus fail

to gage relative increases in variances owing to nonorthog-onal columns of X

0 This follows since the first row and

column of the second-moment matrix X10158400X0= U are fixed

and nonzero by design so that taking X10158400X0to be diagonal as

reference cannot be feasible

Remark 1 runs contrary to assertions throughout the liter-ature In consequence for models in 119872

0the mainstay VIF

119906s

in recent vogue are largely devoid of meaning Subsequentlythese are identified instead with angles quantifying degrees ofmulticollinearity among the regressors

On the other hand feasible Reference models for allparameters as identified later for centered and uncentereddata in Definition 13 Section 42 give

[VF119888(1205730)VF119888(1205731)VF119888(1205732)]= [09913 10889 10889]

(4)

[VF119907(1205730)VF119907(1205731)VF119907(1205732)]= [07704 08889 08889]

(5)

in lieu of conventional VIF119888s and VIF

119906s respectively The

former comprise corrected VIF119888s extended to include the

intercept Both sets in fact are genuine variance inflationfactors as ratios of variances in themodel (1) relative to thosein Reference models feasible for centered and for uncenteredregressors respectively

This example flagrantly contravenes conventional wis-dom (i) variances for slopes are inflated in (4) but for the

Advances in Decision Sciences 3

intercept deflated in comparison with the feasible centeredreference Specifically 120573

0is estimated here with greater

efficiency VF119888(1205730) = 09913 in the initial design (1) despite

nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments

23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872

119908refers unambiguously to the vector-

space concept X119894perp X119895 that is X1015840

119894X119895

= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In

contrast nonorthogonality in 1198720often refers instead to the

statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862

perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903

119894119895does measure linear dependency between 119909

119894

and 119909119895in the datardquo

24 The Models 1198720 Consider 119872

0 Y = X

0120579 + 120598 X

0=

[1119899X] 1205791015840 = [120573

01205731015840] together with

X10158400X0= [

119899 119899X1015840

119899X X1015840X] (X1015840

0X0)minus1

= [11988800

c01

c101584001

G ] (6)

where 119899X1015840 = 11015840119899X 11988800

= [119899 minus 1198992X1015840(X1015840X)

minus1X]minus1 C =

(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned

inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B

119899X arises exclusively in models with intercept

with or without reparametrizing (1205730120573) rarr (120572120573) in the

mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573

119896119883119896

Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals

3 Historical Perspective

Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872

0are at issue since as noted in [10] centered

diagnostics have no place in119872119908

31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888

2(X) [12 p120] based on

[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for

ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt

Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study

32 To Center Advocacy for centering includes the follow-ing

(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF

119888s widely adopted without justification

as default to the exclusion of VIF119906s

(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo

(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]

33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering

(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]

(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]

Other issues against centering derive in part from numer-ical analysis and work by Belsley

(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as

ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo

(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]

(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]

(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model

4 Advances in Decision Sciences

whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]

(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo

(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX

119894rsquos results in over a 40 relative

change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly

(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]

Other desiderata pertain

(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]

(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]

On balance formodels in1198720the jury is out regarding the

use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888

2(X) other purveyors of

ill-conditioning specifically VIFs are not treated explicitly

34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral

Remark 2 Typically 119884 represents response to input vari-ables 119883

1 1198832 119883

119896 In a controlled experiment levels

are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573

0is then the standard against which

changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature

Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872

119908

have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872

0

4 The Structure of Orthogonality

This section develops the foundations for Reference modelscapturing orthogonalities of types 119881

perp and 119862perp Essential

collateral results are given in support as well

41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x

119895 1 le 119895 le 119901 and Xdagger = (X1015840X)

minus1X1015840 is thegeneralized inverse of note having xdagger

119895 1 le 119895 le 119901 as

its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as

120581119895=10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817 1 le 119895 le 119901 (7)

constructed so as to be scale invariant Observe that xdagger1198952 is

found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1

WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section

3 of [9] shows that the centered collinearity indices 120581119888satisfy

1205812

119888119895= VIF

119888(120573119895) 1 le 119895 le 119896 In 119872

0with parameters (120573

0

120573) values corresponding to xdagger1198952 from Xdagger

0= (X10158400X0)minus1X10158400

lie along the principal diagonal of (X10158400X0)minus1 the uncentered

collinearity indices 120581119906now satisfy 120581

2

119906119895= x1198952sdot xdagger1198952

119895 =

0 1 119896 In particular since x0

= 1119899 we have 120581

2

1199060=

119899xdagger02 Moreover in119872

0it follows that the uncentered VIF

119906s

are squares of the collinearity indices that is VIF119906(120573119895) =

1205812

119906119895 119895 = 0 1 119896 Note the asymmetry that VIF

119888s exclude

the intercept in contrast to the inclusive VIF119906s That the

label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF

119906(120573119895) = 1205812

119906119895 119895 = 0 1 119896

Transcending the essential developments of [9] are con-nections between collinearity indices and angles between

Advances in Decision Sciences 5

subspaces To these ends choose a typical x119895in X0 and

rearrange X0as [x119895X[119895]] We next seek elements of

Q1015840119895(X10158400X0)minus1

Q119895= [

x1015840119895x119895

x1015840119895X[119895]

X1015840[119895]x119895X1015840[119895]X[119895]

]

minus1

(8)

as reordered by the permutation matrix Q119895 From the clock-

wise rule the (1 1) element of the inverse is

[x1015840119895(I119899minus P119895) x119895]minus1

= [x1015840119895x119895minus x1015840119895P119895x119895]minus1

=10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817

2 (9)

in succession for each 119895 = 0 1 119896 where P119895

=

X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]

is the projection operator onto the sub-space S

119901(X[119895]) sub R119899 spanned by the columns of X

[119895] These

in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly

for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-

conditioning as follows

Theorem 4 For models in 1198720let VIF

119906(120573119895) = 120581

2

119906119895 119895 = 0

1 119896 be conventional uncentered VIF119906s in terms of Stew-

artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows

(i) Angles between (x119895S119901(X[119895])) are given by 120579

119895=

arccos[(1minus11205812uj)12

] in succession for 119895 = 0 1 119896

(ii) In particular 1205790

= arccos[(1 minus 11205812u0)12

] quantifiesthe degree of collinearity between the regressors and theconstant vector

(iii) Similarly let B119899X = Z = [z

1 z

119896] be regressors

centered to their means rearrange as [z119895Z[119895]] and let

VIF119888(120573119895) = 120581

2

119888119895 119895 = 1 119896 be centered VIF

119888s in

terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z

119895S119901(Z[119895])) are given by

120579119895= arccos[(1 minus 11205812cj)

12] 1 le j le k

Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x

1198952

= P119895x1198952+

x119895minus P119895x1198952 where x

119895minus P119895x1198952

= RS119895is the residual sum

of squares from the projection It follows that the principalangle between (x

119895P119895x119895) is given by

cos (120579119895)=

x1015840119895P119895x119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895

10038171003817100381710038171003817

=

10038171003817100381710038171003817P119895x119895

1003817100381710038171003817100381710038171003817100381710038171003817x119895

10038171003817100381710038171003817

=radic1 minusRS119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817

2=radic1 minus

1

1205812119906119895

(10)

for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x

0P0x0) with x

0= 1119899and

P0= X(X1015840X)

minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z

119895P119895z119895) where P

119895

now is the projection operator onto the subspace S119901(Z[119895]) sub

R119899 spanned by the columns of Z[119895] to complete our proof

Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579

119895lt 300 deg

42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881

perp and 119862perp The keys

are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872

119908with moment matrix X1015840X

our opening paragraph prescribes as reference the modelR119881

= D = Diag(11988911 119889

119901119901) as diagonal elements of

X1015840X for assessing 119881perp-orthogonality Moreover on scaling

columns of X to equal lengths R119881is perfectly conditioned

with 1198881(R119881) = 10 In addition every model in 119872

119908clearly

conforms with its Reference in the sense that R119881is positive

definite as distinct from models in1198720to follow

Consider again models in 1198720as in (6) let C = (X1015840X minus

M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889

11 119889

119896119896) comprise the diagonal elements of

X1015840X

(i) 119881perpReference ModelThe uncentered VIF119906s in119872

0 defined

as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of

diagonal elements of X10158400X0 appear to have seen exclusive

usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1

Theorem 6 Statements that conventional VIF119906s quantify var-

iance inflation owing to nondiagonalX10158400X0are false for models

in1198720having X = 0

Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840

0X0 this usage is predicated on the false

postulate that X10158400X0can be diagonal for X = 0 Specifically

[1119899X] are linearly independent that is 11015840

119899X = 0 if and only

if X has been mean centered beforehand

To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the

price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590

2119899 and Var(sdot | X = 0) =

120590211988800

ge 1205902119899 from (6) their ratio is shown in [25] to be

1205812

1199060= 11989911988800

ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573

0| X = 0) = 120590

211988800

whereas Var( | X = 0) = 1205902119899 with 120572 = 120573

0+ 12057311198831+

sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find

the conventional VIF119906(1205730) to be useful for purposes to follow

Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579

0=

arccos[(1 minus 11205812u0)12

] as an informative gage on the extent of

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 3: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 3

intercept deflated in comparison with the feasible centeredreference Specifically 120573

0is estimated here with greater

efficiency VF119888(1205730) = 09913 in the initial design (1) despite

nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments

23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872

119908refers unambiguously to the vector-

space concept X119894perp X119895 that is X1015840

119894X119895

= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In

contrast nonorthogonality in 1198720often refers instead to the

statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862

perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903

119894119895does measure linear dependency between 119909

119894

and 119909119895in the datardquo

24 The Models 1198720 Consider 119872

0 Y = X

0120579 + 120598 X

0=

[1119899X] 1205791015840 = [120573

01205731015840] together with

X10158400X0= [

119899 119899X1015840

119899X X1015840X] (X1015840

0X0)minus1

= [11988800

c01

c101584001

G ] (6)

where 119899X1015840 = 11015840119899X 11988800

= [119899 minus 1198992X1015840(X1015840X)

minus1X]minus1 C =

(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned

inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B

119899X arises exclusively in models with intercept

with or without reparametrizing (1205730120573) rarr (120572120573) in the

mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573

119896119883119896

Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals

3 Historical Perspective

Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872

0are at issue since as noted in [10] centered

diagnostics have no place in119872119908

31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888

2(X) [12 p120] based on

[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for

ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt

Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study

32 To Center Advocacy for centering includes the follow-ing

(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF

119888s widely adopted without justification

as default to the exclusion of VIF119906s

(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo

(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]

33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering

(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]

(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]

Other issues against centering derive in part from numer-ical analysis and work by Belsley

(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as

ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo

(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]

(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]

(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model

4 Advances in Decision Sciences

whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]

(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo

(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX

119894rsquos results in over a 40 relative

change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly

(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]

Other desiderata pertain

(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]

(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]

On balance formodels in1198720the jury is out regarding the

use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888

2(X) other purveyors of

ill-conditioning specifically VIFs are not treated explicitly

34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral

Remark 2 Typically 119884 represents response to input vari-ables 119883

1 1198832 119883

119896 In a controlled experiment levels

are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573

0is then the standard against which

changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature

Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872

119908

have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872

0

4 The Structure of Orthogonality

This section develops the foundations for Reference modelscapturing orthogonalities of types 119881

perp and 119862perp Essential

collateral results are given in support as well

41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x

119895 1 le 119895 le 119901 and Xdagger = (X1015840X)

minus1X1015840 is thegeneralized inverse of note having xdagger

119895 1 le 119895 le 119901 as

its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as

120581119895=10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817 1 le 119895 le 119901 (7)

constructed so as to be scale invariant Observe that xdagger1198952 is

found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1

WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section

3 of [9] shows that the centered collinearity indices 120581119888satisfy

1205812

119888119895= VIF

119888(120573119895) 1 le 119895 le 119896 In 119872

0with parameters (120573

0

120573) values corresponding to xdagger1198952 from Xdagger

0= (X10158400X0)minus1X10158400

lie along the principal diagonal of (X10158400X0)minus1 the uncentered

collinearity indices 120581119906now satisfy 120581

2

119906119895= x1198952sdot xdagger1198952

119895 =

0 1 119896 In particular since x0

= 1119899 we have 120581

2

1199060=

119899xdagger02 Moreover in119872

0it follows that the uncentered VIF

119906s

are squares of the collinearity indices that is VIF119906(120573119895) =

1205812

119906119895 119895 = 0 1 119896 Note the asymmetry that VIF

119888s exclude

the intercept in contrast to the inclusive VIF119906s That the

label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF

119906(120573119895) = 1205812

119906119895 119895 = 0 1 119896

Transcending the essential developments of [9] are con-nections between collinearity indices and angles between

Advances in Decision Sciences 5

subspaces To these ends choose a typical x119895in X0 and

rearrange X0as [x119895X[119895]] We next seek elements of

Q1015840119895(X10158400X0)minus1

Q119895= [

x1015840119895x119895

x1015840119895X[119895]

X1015840[119895]x119895X1015840[119895]X[119895]

]

minus1

(8)

as reordered by the permutation matrix Q119895 From the clock-

wise rule the (1 1) element of the inverse is

[x1015840119895(I119899minus P119895) x119895]minus1

= [x1015840119895x119895minus x1015840119895P119895x119895]minus1

=10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817

2 (9)

in succession for each 119895 = 0 1 119896 where P119895

=

X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]

is the projection operator onto the sub-space S

119901(X[119895]) sub R119899 spanned by the columns of X

[119895] These

in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly

for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-

conditioning as follows

Theorem 4 For models in 1198720let VIF

119906(120573119895) = 120581

2

119906119895 119895 = 0

1 119896 be conventional uncentered VIF119906s in terms of Stew-

artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows

(i) Angles between (x119895S119901(X[119895])) are given by 120579

119895=

arccos[(1minus11205812uj)12

] in succession for 119895 = 0 1 119896

(ii) In particular 1205790

= arccos[(1 minus 11205812u0)12

] quantifiesthe degree of collinearity between the regressors and theconstant vector

(iii) Similarly let B119899X = Z = [z

1 z

119896] be regressors

centered to their means rearrange as [z119895Z[119895]] and let

VIF119888(120573119895) = 120581

2

119888119895 119895 = 1 119896 be centered VIF

119888s in

terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z

119895S119901(Z[119895])) are given by

120579119895= arccos[(1 minus 11205812cj)

12] 1 le j le k

Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x

1198952

= P119895x1198952+

x119895minus P119895x1198952 where x

119895minus P119895x1198952

= RS119895is the residual sum

of squares from the projection It follows that the principalangle between (x

119895P119895x119895) is given by

cos (120579119895)=

x1015840119895P119895x119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895

10038171003817100381710038171003817

=

10038171003817100381710038171003817P119895x119895

1003817100381710038171003817100381710038171003817100381710038171003817x119895

10038171003817100381710038171003817

=radic1 minusRS119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817

2=radic1 minus

1

1205812119906119895

(10)

for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x

0P0x0) with x

0= 1119899and

P0= X(X1015840X)

minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z

119895P119895z119895) where P

119895

now is the projection operator onto the subspace S119901(Z[119895]) sub

R119899 spanned by the columns of Z[119895] to complete our proof

Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579

119895lt 300 deg

42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881

perp and 119862perp The keys

are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872

119908with moment matrix X1015840X

our opening paragraph prescribes as reference the modelR119881

= D = Diag(11988911 119889

119901119901) as diagonal elements of

X1015840X for assessing 119881perp-orthogonality Moreover on scaling

columns of X to equal lengths R119881is perfectly conditioned

with 1198881(R119881) = 10 In addition every model in 119872

119908clearly

conforms with its Reference in the sense that R119881is positive

definite as distinct from models in1198720to follow

Consider again models in 1198720as in (6) let C = (X1015840X minus

M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889

11 119889

119896119896) comprise the diagonal elements of

X1015840X

(i) 119881perpReference ModelThe uncentered VIF119906s in119872

0 defined

as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of

diagonal elements of X10158400X0 appear to have seen exclusive

usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1

Theorem 6 Statements that conventional VIF119906s quantify var-

iance inflation owing to nondiagonalX10158400X0are false for models

in1198720having X = 0

Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840

0X0 this usage is predicated on the false

postulate that X10158400X0can be diagonal for X = 0 Specifically

[1119899X] are linearly independent that is 11015840

119899X = 0 if and only

if X has been mean centered beforehand

To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the

price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590

2119899 and Var(sdot | X = 0) =

120590211988800

ge 1205902119899 from (6) their ratio is shown in [25] to be

1205812

1199060= 11989911988800

ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573

0| X = 0) = 120590

211988800

whereas Var( | X = 0) = 1205902119899 with 120572 = 120573

0+ 12057311198831+

sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find

the conventional VIF119906(1205730) to be useful for purposes to follow

Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579

0=

arccos[(1 minus 11205812u0)12

] as an informative gage on the extent of

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 4: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

4 Advances in Decision Sciences

whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]

(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo

(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX

119894rsquos results in over a 40 relative

change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly

(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]

Other desiderata pertain

(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]

(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]

On balance formodels in1198720the jury is out regarding the

use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888

2(X) other purveyors of

ill-conditioning specifically VIFs are not treated explicitly

34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral

Remark 2 Typically 119884 represents response to input vari-ables 119883

1 1198832 119883

119896 In a controlled experiment levels

are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573

0is then the standard against which

changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature

Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872

119908

have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872

0

4 The Structure of Orthogonality

This section develops the foundations for Reference modelscapturing orthogonalities of types 119881

perp and 119862perp Essential

collateral results are given in support as well

41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x

119895 1 le 119895 le 119901 and Xdagger = (X1015840X)

minus1X1015840 is thegeneralized inverse of note having xdagger

119895 1 le 119895 le 119901 as

its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as

120581119895=10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817 1 le 119895 le 119901 (7)

constructed so as to be scale invariant Observe that xdagger1198952 is

found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1

WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section

3 of [9] shows that the centered collinearity indices 120581119888satisfy

1205812

119888119895= VIF

119888(120573119895) 1 le 119895 le 119896 In 119872

0with parameters (120573

0

120573) values corresponding to xdagger1198952 from Xdagger

0= (X10158400X0)minus1X10158400

lie along the principal diagonal of (X10158400X0)minus1 the uncentered

collinearity indices 120581119906now satisfy 120581

2

119906119895= x1198952sdot xdagger1198952

119895 =

0 1 119896 In particular since x0

= 1119899 we have 120581

2

1199060=

119899xdagger02 Moreover in119872

0it follows that the uncentered VIF

119906s

are squares of the collinearity indices that is VIF119906(120573119895) =

1205812

119906119895 119895 = 0 1 119896 Note the asymmetry that VIF

119888s exclude

the intercept in contrast to the inclusive VIF119906s That the

label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF

119906(120573119895) = 1205812

119906119895 119895 = 0 1 119896

Transcending the essential developments of [9] are con-nections between collinearity indices and angles between

Advances in Decision Sciences 5

subspaces To these ends choose a typical x119895in X0 and

rearrange X0as [x119895X[119895]] We next seek elements of

Q1015840119895(X10158400X0)minus1

Q119895= [

x1015840119895x119895

x1015840119895X[119895]

X1015840[119895]x119895X1015840[119895]X[119895]

]

minus1

(8)

as reordered by the permutation matrix Q119895 From the clock-

wise rule the (1 1) element of the inverse is

[x1015840119895(I119899minus P119895) x119895]minus1

= [x1015840119895x119895minus x1015840119895P119895x119895]minus1

=10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817

2 (9)

in succession for each 119895 = 0 1 119896 where P119895

=

X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]

is the projection operator onto the sub-space S

119901(X[119895]) sub R119899 spanned by the columns of X

[119895] These

in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly

for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-

conditioning as follows

Theorem 4 For models in 1198720let VIF

119906(120573119895) = 120581

2

119906119895 119895 = 0

1 119896 be conventional uncentered VIF119906s in terms of Stew-

artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows

(i) Angles between (x119895S119901(X[119895])) are given by 120579

119895=

arccos[(1minus11205812uj)12

] in succession for 119895 = 0 1 119896

(ii) In particular 1205790

= arccos[(1 minus 11205812u0)12

] quantifiesthe degree of collinearity between the regressors and theconstant vector

(iii) Similarly let B119899X = Z = [z

1 z

119896] be regressors

centered to their means rearrange as [z119895Z[119895]] and let

VIF119888(120573119895) = 120581

2

119888119895 119895 = 1 119896 be centered VIF

119888s in

terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z

119895S119901(Z[119895])) are given by

120579119895= arccos[(1 minus 11205812cj)

12] 1 le j le k

Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x

1198952

= P119895x1198952+

x119895minus P119895x1198952 where x

119895minus P119895x1198952

= RS119895is the residual sum

of squares from the projection It follows that the principalangle between (x

119895P119895x119895) is given by

cos (120579119895)=

x1015840119895P119895x119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895

10038171003817100381710038171003817

=

10038171003817100381710038171003817P119895x119895

1003817100381710038171003817100381710038171003817100381710038171003817x119895

10038171003817100381710038171003817

=radic1 minusRS119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817

2=radic1 minus

1

1205812119906119895

(10)

for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x

0P0x0) with x

0= 1119899and

P0= X(X1015840X)

minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z

119895P119895z119895) where P

119895

now is the projection operator onto the subspace S119901(Z[119895]) sub

R119899 spanned by the columns of Z[119895] to complete our proof

Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579

119895lt 300 deg

42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881

perp and 119862perp The keys

are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872

119908with moment matrix X1015840X

our opening paragraph prescribes as reference the modelR119881

= D = Diag(11988911 119889

119901119901) as diagonal elements of

X1015840X for assessing 119881perp-orthogonality Moreover on scaling

columns of X to equal lengths R119881is perfectly conditioned

with 1198881(R119881) = 10 In addition every model in 119872

119908clearly

conforms with its Reference in the sense that R119881is positive

definite as distinct from models in1198720to follow

Consider again models in 1198720as in (6) let C = (X1015840X minus

M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889

11 119889

119896119896) comprise the diagonal elements of

X1015840X

(i) 119881perpReference ModelThe uncentered VIF119906s in119872

0 defined

as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of

diagonal elements of X10158400X0 appear to have seen exclusive

usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1

Theorem 6 Statements that conventional VIF119906s quantify var-

iance inflation owing to nondiagonalX10158400X0are false for models

in1198720having X = 0

Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840

0X0 this usage is predicated on the false

postulate that X10158400X0can be diagonal for X = 0 Specifically

[1119899X] are linearly independent that is 11015840

119899X = 0 if and only

if X has been mean centered beforehand

To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the

price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590

2119899 and Var(sdot | X = 0) =

120590211988800

ge 1205902119899 from (6) their ratio is shown in [25] to be

1205812

1199060= 11989911988800

ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573

0| X = 0) = 120590

211988800

whereas Var( | X = 0) = 1205902119899 with 120572 = 120573

0+ 12057311198831+

sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find

the conventional VIF119906(1205730) to be useful for purposes to follow

Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579

0=

arccos[(1 minus 11205812u0)12

] as an informative gage on the extent of

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 5: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 5

subspaces To these ends choose a typical x119895in X0 and

rearrange X0as [x119895X[119895]] We next seek elements of

Q1015840119895(X10158400X0)minus1

Q119895= [

x1015840119895x119895

x1015840119895X[119895]

X1015840[119895]x119895X1015840[119895]X[119895]

]

minus1

(8)

as reordered by the permutation matrix Q119895 From the clock-

wise rule the (1 1) element of the inverse is

[x1015840119895(I119899minus P119895) x119895]minus1

= [x1015840119895x119895minus x1015840119895P119895x119895]minus1

=10038171003817100381710038171003817xdagger119895

10038171003817100381710038171003817

2 (9)

in succession for each 119895 = 0 1 119896 where P119895

=

X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]

is the projection operator onto the sub-space S

119901(X[119895]) sub R119899 spanned by the columns of X

[119895] These

in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly

for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-

conditioning as follows

Theorem 4 For models in 1198720let VIF

119906(120573119895) = 120581

2

119906119895 119895 = 0

1 119896 be conventional uncentered VIF119906s in terms of Stew-

artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows

(i) Angles between (x119895S119901(X[119895])) are given by 120579

119895=

arccos[(1minus11205812uj)12

] in succession for 119895 = 0 1 119896

(ii) In particular 1205790

= arccos[(1 minus 11205812u0)12

] quantifiesthe degree of collinearity between the regressors and theconstant vector

(iii) Similarly let B119899X = Z = [z

1 z

119896] be regressors

centered to their means rearrange as [z119895Z[119895]] and let

VIF119888(120573119895) = 120581

2

119888119895 119895 = 1 119896 be centered VIF

119888s in

terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z

119895S119901(Z[119895])) are given by

120579119895= arccos[(1 minus 11205812cj)

12] 1 le j le k

Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x

1198952

= P119895x1198952+

x119895minus P119895x1198952 where x

119895minus P119895x1198952

= RS119895is the residual sum

of squares from the projection It follows that the principalangle between (x

119895P119895x119895) is given by

cos (120579119895)=

x1015840119895P119895x119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895

10038171003817100381710038171003817

=

10038171003817100381710038171003817P119895x119895

1003817100381710038171003817100381710038171003817100381710038171003817x119895

10038171003817100381710038171003817

=radic1 minusRS119895

10038171003817100381710038171003817x119895

10038171003817100381710038171003817

2=radic1 minus

1

1205812119906119895

(10)

for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x

0P0x0) with x

0= 1119899and

P0= X(X1015840X)

minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z

119895P119895z119895) where P

119895

now is the projection operator onto the subspace S119901(Z[119895]) sub

R119899 spanned by the columns of Z[119895] to complete our proof

Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579

119895lt 300 deg

42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881

perp and 119862perp The keys

are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872

119908with moment matrix X1015840X

our opening paragraph prescribes as reference the modelR119881

= D = Diag(11988911 119889

119901119901) as diagonal elements of

X1015840X for assessing 119881perp-orthogonality Moreover on scaling

columns of X to equal lengths R119881is perfectly conditioned

with 1198881(R119881) = 10 In addition every model in 119872

119908clearly

conforms with its Reference in the sense that R119881is positive

definite as distinct from models in1198720to follow

Consider again models in 1198720as in (6) let C = (X1015840X minus

M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889

11 119889

119896119896) comprise the diagonal elements of

X1015840X

(i) 119881perpReference ModelThe uncentered VIF119906s in119872

0 defined

as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of

diagonal elements of X10158400X0 appear to have seen exclusive

usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1

Theorem 6 Statements that conventional VIF119906s quantify var-

iance inflation owing to nondiagonalX10158400X0are false for models

in1198720having X = 0

Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840

0X0 this usage is predicated on the false

postulate that X10158400X0can be diagonal for X = 0 Specifically

[1119899X] are linearly independent that is 11015840

119899X = 0 if and only

if X has been mean centered beforehand

To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the

price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590

2119899 and Var(sdot | X = 0) =

120590211988800

ge 1205902119899 from (6) their ratio is shown in [25] to be

1205812

1199060= 11989911988800

ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573

0| X = 0) = 120590

211988800

whereas Var( | X = 0) = 1205902119899 with 120572 = 120573

0+ 12057311198831+

sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find

the conventional VIF119906(1205730) to be useful for purposes to follow

Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579

0=

arccos[(1 minus 11205812u0)12

] as an informative gage on the extent of

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 6: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

6 Advances in Decision Sciences

this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity

Instead of conventional VIF119906s given the foregoing dis-

claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact

Definition 8 Given a model in 1198720with second-moment

matrixX10158400X0 the amendedReferencemodel for assessing119881perp-

orthogonality is R119881

= [ 119899 119899X1015840

119899X D] with D = Diag(119889

11 119889

119896119896)

as diagonal elements of X1015840X provided that R119881is positive

definite We identify a model to be 119881perp-orthogonal when

X10158400X0= R119881

As anticipated a prospective R119881

fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601

119894designates

the angle between (1119899X119894) 119894 = 1 119896

Lemma 9 Take R119881

as a prospective Reference for 119881perp-

orthogonality

(i) In order that R119881maybe positive definite it is necessary

that 119899X1015840Dminus1X lt 1 that is that 119899sum119896

119894=1(1198832

119894119889119894119894) lt 1

(ii) Equivalently it is necessary thatsum119896119894=1

cos2(120601119894) lt 1with

120601119894as the angle between (1

119899Xi) 119894 = 1 119896

(iii) The Reference variance for 1205730is Var(120573

0| X1015840X = D) =

[119899(1 minus 119899X1015840Dminus1X)]minus1

(iv) The Reference variances for slopes are given by

Var (120573119894| X1015840X = D) =

1

119889119894119894

[1 +1205741198832

119894

119889119894119894

] 1 le 119894 le 119896 (11)

where 120574 = 119899(1 minus 119899X1015840Dminus1X)

Proof The clockwise rule for determinants gives |R119881| =

|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0

The computation

cos (120601119894) =

11015840119899X119894

1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894

1003817100381710038171003817=

11015840119899X119894

radic119899119889119894119894

= radic1198991198832

119894

119889119894119894

(12)

in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1

119881

is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On

identifying a = b = X 120572 = 119899 and 119886lowast

119894= 119886119894119889119894119894 in Theorem

833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof

Corollary 10 For the case 119896 = 2 in order that R119881maybe

positive definite it is necessary that 1206011+ 1206012gt 90 deg

Proof Beginning with Lemma 9(ii) compute

cos (1206011+ 1206012)

= cos1206011cos1206012minus sin120601

1sin1206012

= cos1206011cos1206012minus[1minus(cos2120601

1+cos2120601

2)+cos2120601

1cos21206012]12

(13)

which is lt0 when 1 minus (cos21206011+ cos2120601

2) gt 0

Moreover the matrix R119881itself is intrinsically ill conditioned

owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896

Lemma 11 Let R119881= [ 119899 T1015840

T 119888I119896

] as in Definition 8 with T = 119899XD = 119888I

119896 and 119888 gt 0

(i) The ordered eigenvalues of R119881are [120582

1 119888 119888 120582

119901]

where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin

[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591

2= T1015840T

(ii) The roots are positive andR119881is positive definite if and

only if (119899119888 minus 1205912) gt 0

(iii) If R119881

is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591

2

Proof Eigenvalues are roots of the determinantal equation

Det [(119899 minus 120582) T1015840T (119888 minus 120582) I

119896

]

= 0 lArrrArr (119888 minus 120582)119896minus1

[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0

(14)

from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582

2minus (119899 + 119888)120582 + 119899119888 minus 120591

2= 0

to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591

2) and the greater

root is positive Conclusion (iii) follows directly to completeour proof

(119894119894) 119862perp Reference Model As noted the notion of 119862

perp-or-thogonality applies exclusively for models in 119872

0 Accord-

ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862

perp-orthogonality and to anticipate notation forsubsequent use we have the following

Definition 12 (i) For a model in 1198720with second-moment

matrix X10158400X0and inverse as in (6) let C

119908= (W minus M) and

identify an indicator vector G = [11989212 11989213 119892

1119896 11989223

1198922119896 119892

119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic

order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C

119908is zero

and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC

119908is (X1015840119894X119895minus119899119883119894119883119895)

that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B

119899X as the regressors centered to their means

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 7: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 7

(ii) In particular the Reference model in1198720for assessing

119862perp-orthogonality is R

119862= [ 119899 119899X

1015840

119899X W] such that C

119908= (W minus

M) and its inverse G from (6) are diagonal that is takingW = [119908

119894119895] such that 119908

119894119894= 119889119894119894 1 le 119894 le 119896 and 119908

119894119895=

119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this

case we identify the model to be 119862perp-orthogonal

Recall that conventional VIF119888s for [120573

1 1205732 120573

119896] are

ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872

0implies that

Z1015840Z is diagonal However the converse fails and instead isembodied in R

119862isin 1198720 In consequence conventional VIF

119888s

are deficient in applying only to slopes whereas the VF119888s

resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573

119896]

Definition 13 Designate VF119907(sdot) and VF

119888(sdot) as Variance Fac-

tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs

Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that

they are translation invariant as well To these ends we shiftthe columns of X=[X

1X2 X

119896] to a new origin through

X rarr X minus L = H where L = [11988811119899 11988821119899 119888

1198961119899] The

resulting model is

Y = (12057301119899+ L120573) +H120573 + 120598

lArrrArr Y = (1205730+ 120595) 1

119899+H120573 + 120598

(15)

thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888

119896120573119896)

Corresponding to X0= [1119899X] is U

0= [1119899H] and to X1015840

0X0

and its inverse are

U10158400U0= [

119899 11015840119899H

H10158401119899

H1015840H] (U10158400U0)minus1

= [11988700

b01

b101584001

G ] (16)

in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows

Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the

shifted version Y = (1205730+ 120595)1

119899+ H120573 + 120576 both in 119872

0 Then

the matrices G appearing in (6) and (16) are identical

Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of

(H1015840H minus 119899minus1H10158401

11989911015840119899H) = H1015840 (I

119899minus 119899minus1111989911015840119899)H

= H1015840B119899H = X1015840B

119899X

(17)

since B119899L = 0 to complete our proof

These facts in turn support subsequent claims that cen-tered VIF

119888s are translation and scale invariant for slopes

1015840

=

[1205731 1205732 120573

119896] apart from the intercept 120573

0

43 A Critique Again we distinguish VIF119888s and VIF

119906s from

centered and uncentered regressorsThe following commentsapply

(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840

0X0

orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862

perp are exclusive andhence work at crossed purposes

(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881

perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C

119908(119894119895) = X1015840

119894X119895minus 119899119883119894119883119895=

0 and X1015840119894X119895

= 0 Conversely if X indeed is 119881perp-

orthogonal so that X1015840X = D= Diag(11988911 119889

119896119896)

then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF

119888s are tenuous

(C3) Conventional VIF119906s based on uncentered X in 119872

0

with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840

0X0can be diagonal

(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868

= VIF119895(minus119868)

VIF119895 1 le 119895 le 119896 are studied in [27]

for assessing the impact of subsets on variance infla-tion Here VIF

119895is from the full data and VIF

119895(minus119868)on

deleting observations in the index set 119868 Similarly [28]proposes using VIF

(119894)on deleting the 119894th observation

In the present context these would gain substance onmodifying VF

119907and VF

119888accordingly

5 Case Study 1 Continued

51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872

0 119884119894= 1205730+12057311198831+

12057321198832+120598119894 of Section 22 the designX

0= [15X1X2] of order

(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in

expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840

0X0nor the

centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as

listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X

0 Indeed the only

opportunity for 119881perp-orthogonality here is between columns

[X1X2]

Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS

119895= x

119895minus P119895x1198952 119895 = 0 1 2 fur-

ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579

0between the constant vector

and the span of the regressor vectors is 1205790= 31751 deg as

anticipated in Remark 7

52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix

U (119909) = [

[

5 3 1

3 25 119909

1 119909 3

]

]

(18)

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 8: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

8 Advances in Decision Sciences

Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X

1198952 for 1205812

119906119895= VI F

119906(120573119895) for (1 minus 1120581

2

119906119895)12 and for 120579

119895 for 119895 = 0 1 2

119895 RS119895

X1198952

1205812

119895(1 minus 1120581

2

119906119895)12

120579119895(deg)

0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792

Invoking Definition 8 we seek a 119881perp-orthogonal Reference

with moment matrix U(0) having X10158401X2

= 0 that is 119863(0)This is found constructively on retaining [1

5 ∙X2] in X

0

but replacing X1by X01

= [1 05 05 1 0]1015840 as listed in

Table 2 giving the design 119863(0) with columns [15X01X2] as

ReferenceAccordingly R

119881= U(0) is ldquoas 119881

perp-orthogonal as it cangetrdquo given the lengths and sums of (X

1X2) as prescribed

in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]

1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =

234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances

Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]

minus1

= 09375

Var (1205731|X1015840X=D)=

1

11988911

[1+1205741198832

1

11988911

]=2

5[1+

072120574

5]=17500

Var (1205732|X1015840X=D)=

1

11988922

[1+1205741198832

2

11988922

]=1

3[1+

004120574

3]=04375

(19)

As Reference these combine with actual variances from119881 = [119881

119894119895] at (1) giving VF

119907for the original designX

0relative

to R119881 For example VF

119907(1205730) = 0722209375 = 07704 with

11988111

= 07222 from (1) and 09375 from (19) This contrastswith VIF

119906(1205730) = 36111 in (3) as in Section 22 and Table 2

it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881

perp-orthogonal R119881

namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]

of our basic design are listed for comparison in Table 2 where119909 = X1015840

1X2for the various designs The designs themselves are

exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]

from Table 2 and substituting each into the template X00

=

[15 ∙X2] The design 119863(2) was constructed but not listed

since its X10158400X0is not invertible Clearly these are actual

designs amenable to experimental deployment

53 119862perp-Orthogonality Continuing X0

= [15X1X2] and

invoking Definition 12(ii) we seek a 119862perp-orthogonal

Reference having the matrix C119908

as diagonal This is

identified as 119863(06) in Table 2 From this the matrix R119862and

its inverse are

R119862= [

[

5 3 1

3 25 06

1 06 3

]

]

Rminus1119862

= [

[

07286 minus08572 minus00714

minus08571 14286 00

minus00714 00 03571

]

]

(20)

The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-

tional VIF119888s for 120573

1and 120573

2only our VF

119888(1205730) here reflects

Variance Deflationwherein 1205730is estimated more precisely in

the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose

that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF

119888(1205731) =

12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)

from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862

perp-orthogonalAs in Remark 2 we next alter the basic designX

0at (1) on

shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X

1015840

0X0 andits inverse V are

U = [

[

5 42426 42426

42426 5 4

42426 4 5

]

]

V = [

[

1 minus04714 minus04714

minus04714 07778 minus02222

minus04714 minus02222 07778

]

]

(21)

giving conventional VIF119906s as [5 38889 38889] Against 119862perp-

orthogonality this gives the diagnostics

[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]

(22)

demonstrating in comparison with (4) that VF119888s apart from

1205730 are invariant under translation and scaling a consequence

of Lemma 14We further seek to compare variances for the shifted

and scaled (X1X2) against 119881perp-orthogonality as Reference

However from U = X10158400X0at (21) we determine that 119883

1=

084853 = 1198832and 5[(119883

2

15) + (119883

2

25)] = 14400 Since

the latter exceeds unity we infer from Lemma 9(i) that

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 9: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 9

Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)

and corresponding variances Var (1205730)Var (120573

1)Var (120573

2)

Design 11988311

11988312

11988313

11988314

11988315

Var (1205730) Var (120573

1) Var (120573

2)

119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087

119881perp-orthogonality is incompatible with this configuration

of the data so that the VF119907s are undefined Equivalently

on evaluating 1206011

= 1206012

= 2290 deg and 1206011+ 1206012

=

4580 lt 90 deg Corollary 10 asserts that R119881is not positive

definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors

54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX

0andX1015840

0X0in

1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is

VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF

119888s for

such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862

perp orthogonality in connection with Table 2(C1) For models in 119872

0having X = 0 we reiterate that

the uncentered VIF119906s at expression (3) namely

[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840

0X0

cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF

119907s

at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the

119881perp-nonorthogonal array119863(1)

(C2) For 119881perp-orthogonality the claim P1 thus is false by

counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840

1X2= 1 relative to

the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573

0(119909)] Var[120573

1(119909)]

Var[1205732(119909)] in Table 2 are all seen to decrease from

119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881

perp-orthogonal 119863(0) to the 119881

perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature

(C4) For 119862perp-orthogonal designs the claim P2 is false by

counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862

perp-nonorthogonal design as demonstrated by the VF

119888s

for 1205730at (4) and (22) Similar trends are confirmed

elsewhere as seen subsequently

(C5) VFs for 1205730are critical in prediction where prediction

variances necessarily depend on Var(1205730) especially

for predicting near the origin of the system of coor-dinates

(C6) Dissonance between 119881perp and 119862

perp is seen in Table 2where 119863(06) as 119862

perp-orthogonal with X10158401X2

= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0

(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872

0 and

they serve to illuminate the contributing structures

6 Orthogonal and Linked Arrays

A genuine 119881perp-orthogonal array was generated as eigenvec-

tors E = [E1E2 E

8] from a positive definite (8 times 8)

matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X

1X2X3] as the

second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872

0to be analyzed In addition linked vectors

(Z1Z2Z3) were constructed as Z

1= 119888(09E

1+ 01E

2)

Z2

= 119888(09E2+ 01E

3) and Z

3= 119888(09E

3+ 01E

4) where

119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation

61 The Model 1198720 We consider in turn the orthogonal and

the linked series

611 Orthogonal Data Matrices X10158400X0and (X1015840

0X0)minus1 for the

orthogonal data under model1198720are listed in Table 4 where

variances occupy diagonals of (X10158400X0)minus1 The conventional

uncentered VIF119906s are [120296 102144 112256 105888]

Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579

0= 65748 deg as

in Theorem 4(ii) Moreover the angle between X1and the

span of [1119899X2X3] namely 120579

1= 81670 deg is not 90 deg

because of collinearity with the constant vector despite themutual orthogonality of [X

1X2X3] Observe here thatX1015840

0X0

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 10: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

10 Advances in Decision Sciences

Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)

Orthogonal (X1X2X3) Linked (Z

1Z2Z3)

18

X1

X2

X3

18

Z1

Z2

Z3

1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567

Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872

0

X10158400X0

(X10158400X0)minus1

8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324

already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus

the VF119907s are all unity

In view of dissonance between 119881perp and 119862

perp orthogonalitythe sums of squares and products for the mean-centered Xare

X1015840B119899X = [

[

78572 minus03411 minus02365

minus03411 71847 minus05652

minus02365 minus05652 76082

]

]

(23)

reflecting nonnegligible negative dependencies to distin-guish the 119881

perp-orthogonal X from the 119862perp-nonorthogonal

matrix Z = B119899X of deviations

To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862

perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840

0X0)minus1 in Table 4 to those

of Rminus1119862

in Table 5 are

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [10168 10032 10082 10070]

(24)

reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]

612 Linked Data For the Linked Data under model 1198720

the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered

VIF119906s are now [120320 103408 113760 105480] These

again fail to gage variance inflation since Z10158400Z0cannot be

diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579

0= 65735

degrees

Remark 15 Observe that the choice for R119862may be posed as

seeking R119862

= U(119909) such that Rminus1119862(2 3) = 00 then solving

numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as

X10158401X2minus 35 = 0 so that X1015840

1X2= 06 in R

119862at expression (20)

To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus

110889)12 = 02857 and 120579 = 73398 deg as the angle between

the vectors [X1X2] of (1) when centered to their means For

slopes in 1198720 [9] shows that VIF

119888(120573119895) lt VIF

119906(120573119895) 1 le

119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF

119906(1205733) = 03889(13) = 11667 and for

VIF119888(1205733) = 03889(128) = 10889 but denominators are

reciprocals of X10158402X2= 3 and sum

5

119895=1(1198832119895

minus 1198832)2= 28

(119894) 119881perp Reference Model To rectify this malapropism we seek

in Definition 8 a model R119881as Reference This is found on

setting all off-diagonal elements of Z10158400Z0to zero excluding

the first row and column Its inverse Rminus1119881

is listed in Table 6The VF

119907s are ratios of diagonal elements of (Z1015840

0Z0)minus1 on the

left to diagonal elements of Rminus1119881

on the right namely

[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]

= [09697 09991 09936 09943]

(25)

Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the

119881perp-nonorthogonal model Z1015840

0Z0than in the 119881

perp-orthogonalmodel R

119881of Definition 8 As in Section 54 this serves again

to refute the tenet that ill-conditioning espouses inflated

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 11: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 11

Table 5 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the orthogonal data X0 under model1198720

R119862

Rminus1119862

8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314

Table 6 Matrices (Z10158400Z0)minus1 and Rminus1

119881for checking 119881

perp-orthogonality for the linked data under model1198720

(Z10158400Z0)minus1 Rminus1

119881

01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326

variances for models in1198720 that is that VFs necessarily equal

or exceed unity(119894119894) 119862

perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R

119862such that C

119908= (W minus

M) is diagonal The required R119862

and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1

119862on the right

in Table 7 their ratios give the centered VF119888s as

[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]

= [09920 10049 10047 10030]

(26)

We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment

As in Remark 15 the reference matrix R119862is constrained

by the first and second moments from X10158400X0and has free

parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries

The constraints for 119862perp-orthogonality are Rminus1

119862(2 3) =

Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system of

equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here

62 The Model 119872119908 For the Linked Data take 119884

119894= 12057311198851198941+

12057321198851198942

+ 12057331198851198943

+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598

where the expected response at 1198851

= 1198852

= 1198853

= 0

is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840

0Z0(4 times 4)

from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios

of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF

119906s

namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF

119906sThroughout Section 6 these

comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios

7 Case Study Body Fat Data

71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X

1 triceps skinfold thickness X

2 thigh circumfer-

ence and X3 mid-arm circumference under119872

0 119884119894= 1205730+

12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported

the condition number of X10158400X0is 1039931 and the VIF

119906s are

[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X

119894 = 50 119894 = 1 2 3 the

resulting condition number of the revisedX10158400X0is 2 9696518

and the uncenteredVIF119906s are as before from invariance under

scalingAgainst complaints that regressors are remote from their

natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X

119894 = 50 119894 = 1 2 3

This is motivated on grounds that centered diagnostics apartfrom VIF

119888(1205730) are invariant under shifts and scalings from

Lemma 14 In addition under the new origin [0 0 0] 1205730

assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]

Taking these shifted and scaled vectors as columnsof X = [X

1X2X3] in the revised X

0= [1

119899X]

values for X10158400X0

and its inverse are listed in Table 9The condition number of X1015840

0X0

is now 1136969 andits VIF

119906s are [6775617998742782174484] Additionally

from Theorem 4(ii) we recover the angle 1205790= 22592 deg to

quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for

X is

R = [

[

10 00847 08781

00847 10 01424

08781 01424 10

]

]

(27)

indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor

119862perp-orthogonal since neither the lower right (3 times 3) block of

X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 12: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

12 Advances in Decision Sciences

Table 7 Matrices R119862and Rminus1

119862for checking 119862

perp-orthogonality in the linked data under model1198720

R119862

Rminus1119862

8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314

Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908

(Z1015840Z)minus1 Cminus1

01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123

(119894) 119881perp Reference Model To compare this with a 119881

perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881

perp-orthogonalityso that the VF

119907s are undefined Comparisons with 119862

perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering

matrix is

119899XX1015840 = [

[

188889 189402 187499

189402 189916 188007

187499 188007 186118

]

]

(28)

To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840

0X0by corresponding off-

diagonal elements of 119899XX1015840 The result is the matrix R119862and

its inverse as given in Table 10 where diagonal elements ofRminus1119862

are the Reference variances The VF119888s found as ratios

of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1

119862

are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case

72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12

To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a

followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8

Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892

12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]

[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are

listed in Table 12 To fix ideas the Reference R119862for G =

[1 0 0] that is for the constraint R119862(2 3) = X1015840

0X0(2 3) =

194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840

0X0)minus1 relative to those of

Rminus1119862

in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]

In short R119862is ldquoas 119862

perp-orthogonal as it could berdquo given theconstraint Other VF

119888s in Table 12 proceed similarly For all

cases in Table 12 excluding G = [0 1 1] 1205730is estimated

with greater efficiency in the original model than any of thepostulated Reference models

As in Remark 15 the reference matrix R119862

for G =

[1 0 0] is constrained by X10158401X2

= 194533 to be retainedwhereas X1015840

1X3

= 1199091and X1015840

2X3

= 1199092are to be determined

from Rminus1119862(2 4) = Rminus1

119862(3 4) = 0 Although this system

can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here

Recall 1198831 triceps skinfold thickness 119883

2 thigh cir-

cumference and 1198833 mid-arm circumference and from

(27) that (1198831 1198833) are strongly linked Invoking the cutoff

rule [24] for VIFs in excess of 40 it is seen for G =

[0 0 0] in Table 12 that VF119888(1205731) and VF

119888(1205733) are excessive

where [X1X2X3] are forced to be mutually uncorrelated

as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883

1 1198833) are postulated to be

uncorrelated however incrediblyOn the other hand VF

119888s for G isin [0 1 0] [1 1 0]

[0 1 1] in Table 12 are likewise comparable where (X1X3)

are allowed to remain linked at X10158401X3= 242362 Here the

VF119888s reflect negligible changes in efficiency of the estimates

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 13: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 13

Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872

0centered to minima and scaled to equal lengths

X10158400X0

(X10158400X0)minus1

20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979

Table 10 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]

R119862

Rminus1119862

20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565

in comparison with the originalX10158400X0 where [X

1X2X3] are

all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X

1X2)

and (X2X3) were pairwise uncorrelated

Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840

1X2and

X10158401X3are retained at their initial values under X1015840

0X0 namely

194533 and 242362 from Table 9 with (X2X3) unlinked

Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is

[VF (1205730) VF (120573

1) VF (120573

2) VF (120573

3)]

= [09196

0984810073

1005710282

1020810208

10208]

= [09338 10017 10072 10000]

(29)

8 Conclusions

Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X

0= [1119899X]

with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF

119888s for mean-centered regressors and

VIF119906s for uncentered data We take issue with both Conven-

tional but clouded vision holds that (i) VIF119906s gage variance

inflation owing to nonorthogonal columns of X0as vectors

in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF

119888s of

value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872

119908 appear to have been

expropriated without verification for models in1198720hence the

anomaliesAccordingly we have distinguished vector-space orthog-

onality (119881perp) of columns of X0 in contrast to orthogonality

(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their

Variance Factors as VF119907rsquos and VF

119888rsquos respectively Contrary

to convention we demonstrate analytically and through casestudies that (i1015840) VIF

119906s do not gage variance inflation (ii1015840)

VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates

In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872

0 Our summary conclusions follow

(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows

(S2) ConventionalVIF119888s apply incompletely to slopes only

not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862

perp-orthogonality pertainsthen VF

119888s may supplant VIF

119888s as applicable to all of

[1205730 1205731 120573

119896] Moreover the latter may reveal that

VF119888(1205730) lt 10 as in cited examples and of special

import in predicting near the origin of the coordinatesystem

(S3) Our VF119907s capture correctly the concept apparently

intended by conventional VIF119906s Specifically if 119881perp-

orthogonality pertains then VF119907s as genuine ratios

of variances may supplant VIF119906s as now discredited

gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for

different reasons to Belsleyrsquos and othersrsquo contention

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 14: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

14 Advances in Decision Sciences

Table 11 Matrices R119862and Rminus1

119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]

R119862

Rminus1119862

20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565

Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892

12 11989213 11989223]

from Definition 12

Elements of G VF119888s

11989212

11989213

11989223

1205730

1205731

1205732

1205733

0 0 0 06665 43996 10282 44586

1 0 0 07002 43681 10208 44586

0 1 0 09196 10073 10282 10208

0 0 1 07201 43996 10073 43681

1 1 0 09848 10057 10208 10208

1 0 1 07595 43681 10003 43681

0 1 1 10248 10073 10073 10160

that uncentered VIF119906s are essential in assessing ill-

conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X

0=

[1119899X1X2 X

119896] with the subspace spanned by

remaining columns(S5) Specifically the degree of collinearity of regressors

with the intercept is quantified by 1205790deg which

if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579

0in importance as a critical ill-conditioning

diagnostic(S6) In contrast Theorem 4(iii) identifies conventional

VIF119888s with angles between a centered regressor and

the subspace spanned by remaining centered regres-sors For example

1205791= arccos([

[

1 minus1

VIFc ( ldquo1205731)]

]

12

) (30)

is the angle between the centered vector Z1= B119899X1

and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1

119899isin R119899 and thus bear on the

geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to

be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances

attributed to allowable partial dependencies amongsome but not other regressors

In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF

119888(120573119895) 1 le 119895 le 119896 To compute

the uncentered VIF119906s one needs to define a variable for

the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF

119888s and VF

119907s as introduced here require

additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here

References

[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984

[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982

[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988

[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975

[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977

[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983

Page 15: ResearchArticle Revision: Variance Inflation in Regressionpeople.virginia.edu/~der/pdf/VIF.pdf · variances”; estimates excessive in magnitude, irregular in sign,and“verysensitivetosmallchangesin

Advances in Decision Sciences 15

[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975

[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970

[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987

[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984

[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990

[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980

[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969

[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996

[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003

[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006

[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008

[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010

[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981

[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983

[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987

[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986

[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980

[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007

[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984

[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969

[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997

[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000

[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003

[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983