misconceptions about multicollinearity in …...editorial misconceptions about multicollinearity in...

16
EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas Lindner 1 , Jonas Puck 1 and Alain Verbeke 2,3,4 1 Institute for International Business, Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria; 2 Haskayne School of Business, University of Calgary, 2500 University Drive NW, Calgary T2T1N4, Canada; 3 Henley Business School, University of Reading, Reading, UK; 4 Solvay Business School, Vrije Universiteit Brussel, Brussels, Belgium Correspondence: A Verbeke, Haskayne School of Business, University of Calgary, 2500 University Drive NW, Calgary T2T1N4, Canada e-mail: [email protected] Abstract Collinearity between independent variables is a recurrent problem in quantitative empirical research in International Business (IB). We explore insufficient and inappropriate treatment of collinearity and use simulations to illustrate the potential impact on results. We also show how IB researchers doing quantitative work can avoid collinearity issues that lead to spurious and unstable results. Our six principal insights are the following: first, multicollinearity does not introduce bias. It is not an econometric problem in the sense that it would violate assumptions necessary for regression models to work. Second, variance inflation factors are indicators of standard errors that are too large, not too small. Third, coefficient instability is not a consequence of multicollinearity. Fourth, in the presence of a higher partial correlation between the variables, it can paradoxically become more problematic to omit one of these variables. Fifth, ignoring clusters in data can lead to spurious results. Sixth, accounting for country clusters does not pick up all country-level variation. Journal of International Business Studies (2020) 51, 283–298. https://doi.org/10.1057/s41267-019-00257-1 Keywords: multicollinearity; collinearity; regression analysis; hierarchical modeling; quantitative research methods INTRODUCTION International Business (IB) researchers frequently rely on quantita- tive empirical analyses. Indeed, more than a third of the Journal of International Business Studies (JIBS) articles available online (as of January 2019) include some form of regression analysis. As the unit of analysis is often the international firm, IB researchers are seldom able to collect data through controlled experiments where partic- ipants are randomly assigned to treatment and control groups, or through other ways of making sure observations are independent. As a result, IB researchers usually deal with large control models in their regression analyses in order to make their units of analysis comparable. Nielsen and Raswant (2018), among others, highlight that integrating the ‘‘right’’ control variables is a significant challenge for IB researchers. Consequently, IB studies may suffer Online publication date: 5 August 2019 Journal of International Business Studies (2020) 51, 283–298 ª 2019 Academy of International Business All rights reserved 0047-2506/20 www.jibs.net

Upload: others

Post on 01-Jun-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

EDITORIAL

Misconceptions about multicollinearity

in international business research:

Identification, consequences, and remedies

Thomas Lindner1, Jonas Puck1

and Alain Verbeke2,3,4

1Institute for International Business, Vienna

University of Economics and Business,Welthandelsplatz 1, 1020 Vienna, Austria;2Haskayne School of Business, University of

Calgary, 2500 University Drive NW,Calgary T2T1N4, Canada; 3Henley Business

School, University of Reading, Reading, UK;4Solvay Business School, Vrije Universiteit Brussel,

Brussels, Belgium

Correspondence:A Verbeke, Haskayne School of Business,University of Calgary, 2500 University DriveNW, Calgary T2T1N4, Canadae-mail: [email protected]

AbstractCollinearity between independent variables is a recurrent problem in

quantitative empirical research in International Business (IB). We explore

insufficient and inappropriate treatment of collinearity and use simulations toillustrate the potential impact on results. We also show how IB researchers

doing quantitative work can avoid collinearity issues that lead to spurious and

unstable results. Our six principal insights are the following: first,multicollinearity does not introduce bias. It is not an econometric problem in

the sense that it would violate assumptions necessary for regression models to

work. Second, variance inflation factors are indicators of standard errors that aretoo large, not too small. Third, coefficient instability is not a consequence of

multicollinearity. Fourth, in the presence of a higher partial correlation between

the variables, it can paradoxically become more problematic to omit one ofthese variables. Fifth, ignoring clusters in data can lead to spurious results.

Sixth, accounting for country clusters does not pick up all country-level

variation.

Journal of International Business Studies (2020) 51, 283–298.https://doi.org/10.1057/s41267-019-00257-1

Keywords: multicollinearity; collinearity; regression analysis; hierarchical modeling;quantitative research methods

INTRODUCTIONInternational Business (IB) researchers frequently rely on quantita-tive empirical analyses. Indeed, more than a third of the Journal ofInternational Business Studies (JIBS) articles available online (as ofJanuary 2019) include some form of regression analysis. As the unitof analysis is often the international firm, IB researchers are seldomable to collect data through controlled experiments where partic-ipants are randomly assigned to treatment and control groups, orthrough other ways of making sure observations are independent.As a result, IB researchers usually deal with large control models intheir regression analyses in order to make their units of analysiscomparable. Nielsen and Raswant (2018), among others, highlightthat integrating the ‘‘right’’ control variables is a significantchallenge for IB researchers. Consequently, IB studies may suffer

Online publication date: 5 August 2019

Journal of International Business Studies (2020) 51, 283–298ª 2019 Academy of International Business All rights reserved 0047-2506/20

www.jibs.net

Page 2: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

from omitted variable bias, which affects therobustness and validity of their results.

Extensive control models potentially pose achallenge in quantitative IB research, which inconjunction with dependence on typically smallsets of historical data make it almost impossible tobase quantitative analysis on unrelated variables.Survivor bias, for example, makes it difficult toinvestigate how balance sheet positions affectprofitability since the firms for which data areavailable are those with a combination of resourcesthat lead to company success. Hence, a large bodyof quantitative IB research features empirical anal-yses that have highly collinear variables. This isreflected in the fact that multicollinearity isacknowledged in about 400 of the approximately1400 quantitative articles in JIBS available online asof January 2019; in at least 150 of them authorscompute variance inflation factors (VIFs) to deter-mine whether a variable introduces ‘‘too much’’multicollinearity into a regression analysis. Justhow much collinearity in terms of VIFs is too muchvaries from a high of 20 (Greene, 2003, Griffiths,Judge, Hill, Lutkepohl, & Lee, 1985), to over 10(Wooldridge, 2014), to as low as five (Rogerson,2001) or even three (Read & Read, 2004). Inaddition to there being no agreement about justwhat is too much collinearity, there is a wide rangeof strategies to address it. One of the most frequent(e.g. Meyer & Sinani, 2009; Muethel & Bond, 2013;Zhao, Park, & Zhou, 2014) is to exclude variablesthat have high partial correlations with othervariables. While doing that can indeed reducecollinearity among variables, it simultaneouslyincreases the risk of omitted variable bias.

Unlike in IB, multicollinearity among explana-tory variables has received a lot of attention ineconometric theory and in econometric texts (e.g.,Goldberger, 1991; Greene, 2003; Wooldridge,2014). This is not to say that the topic has beenneglected in IB. Editorials (e.g., Cuervo-Cazurra,Andersson, Brannen, Nielsen, & Reuber, 2016) havehelped considerably in raising awareness and havealso offered potential solutions. Nonetheless, thereis some misunderstanding about how findings fromeconometric theory can be applied to empiricalresearch in IB. We identify in this paper threeimportant gaps between econometric theory andthe practice of applied IB research. First, there arealmost no illustrations in econometric textbooks ofhow problems arising from potential collinearitybetween explanatory variables can be detected.Second, there is a paucity of simulations that

illustrate how not treating multicollinearity, ortreating it insufficiently, may lead to severe bias,spurious findings, and flawed conclusions. Third,econometric textbooks tend to illustrate the effectof the violation of an assumption on point esti-mates or their variance with ‘‘clean’’ examples, i.e.,simple cases that deal with one problem at a time.Yet, IB research usually deals with complex rela-tionships between many interrelated variables. Tothe best of our knowledge, there has been noinvestigation into how specific econometric anddata problems affect regression outcomes in so-called ‘‘messy’’ cases where, among other issues,multicollinearity is present.We address all three gaps. First, we use a simu-

lation to show how researchers can identify multi-collinearity issues in regression results. There hasbeen little discussion of what happens whenresearchers drop or compound collinear variablesto reduce multicollinearity – and hence VIFs. Weprovide evidence of the consequences of suchomissions. Second, we show how instability inresults (Enright, 2009) is related to multicollinear-ity between explanatory variables, and explain whythat is. We discuss how instability arises frominappropriate treatment of multicollinearity, andrespond to calls for explanations of how this couldbe remedied. For instance, Kalnins (2018) investi-gates how collinearity affects coefficient instabilityunder measurement error. We first investigatecollinearity between relevant independent vari-ables, and then between relevant and irrelevantindependent variables. We combine the findings todevelop a decision matrix that researchers canutilize to determine which situation they may wantto avoid, and when using it, outline strategies todetect and respond to potential multicollinearityissues. Third, we present a complex simulationexample that mimics the data analysis process in atypical quantitative IB research project. We use theexample to show how multicollinearity may affectresearch outcomes if treated inappropriately.Finally, we summarize our findings and suggestionsand based on them, we develop a list ofrecommendations.This paper contributes to the IB literature by

highlighting misconceptions about collinearity andhow empirical challenges associated with it can beaddressed. We argue that multicollinearity does notintroduce bias into regression results and that resultinstability is not a consequence of it. We do showwhich effects researchers can expect if multi-collinearity is addressed inappropriately or

Misconceptions about multicollinearity Thomas Lindner et al.

284

Journal of International Business Studies

Page 3: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

insufficiently. As the true relationship betweendependent and independent constructs usually isunknown, the kinds of simulations and illustra-tions we present can help researchers in identifyingwhich econometric problems they might be facingand how they can expect their results to changeunder different specifications. In doing so, wecontribute to the body of methodological literatureby reducing the gap between ‘‘clean’’ methodolog-ical results in econometric theory and the applica-tion of such theory to real-world, ‘‘messy’’ IBresearch phenomena.

MULTICOLLINEARITYMulticollinearity is typically one of the mostdiscussed issues in the empirics section of IBmanuscripts. In JIBS, approximately 400 of thearticles available online (as of January 2019) explic-itly address the topic. Much of the discussioncenters around VIFs and their detrimental effecton regression results. Our review of papers revealedtwo main sub-issues: variance inflation and coeffi-cient stability given high collinearity.

It is acknowledged in the econometric literaturethat ‘‘[…] the ‘problem’ of multicollinearity is notreally well defined’’ (Wooldridge, 2014, p. 83), butthere is nonetheless consistency in how specificconsequences of collinearity can (or cannot) betreated. Specifically, econometric textbooks arguethat multicollinearity is caused by small samplesize, i.e., ‘‘micronumerosity’’ (e.g., Goldberger,1991), and that its solution lies in collecting moredata. This also means that multicollinearity is asample characteristic: when two variables exhibitcollinearity in one sample, it does not follow thatthey will be collinear in all other samples as well.Consequently, the problems discussed most in IBresearch – coefficient stability and variance infla-tion – have relatively straightforward answerswhen drawing on econometric theory.

Nonetheless, there is still controversy in IB overhow to successfully address collinearity. We thinkthis is due partly to a lack of examples usingsimulation, and partly also to the available exam-ples being too ‘‘clean’’ to improve our understand-ing, when in fact IB research contexts tend to be‘‘messy’’, i.e., there are many non-independentvariables and sample sizes tend to be relativelysmall. Consequently, we will address stability andvariance inflation using simulations and a morecomplex example than usually found in theliterature.

In this paper, ‘‘collinearity’’ and ‘‘multicollinear-ity’’ refer to a relationship between two or moredistinct constructs, not two or more measures ofthe same construct. Before moving to possiblesolutions, we will, using a minimum of economet-ric theory and several illustrations from simulation,investigate the effect that multicollinearity has onpoint estimators and estimator variance whenmulticollinearity between explanatory variables ispresent. First, we will look into a case where bothcollinear variables are part of the true relationshipbetween independent and dependent constructs.We will then investigate the effects of multi-collinearity between independent constructs whenone of these constructs is not part of the proposedrelationship between independent and dependentconstructs. In both cases, the hypothetical researchcontext for simulation is that of the determinantsof a firm’s foreign direct investment (FDI).For this simulation, we assume in very generic

terms, and as a stylized but realistic example, thatthe amount firms invest abroad could be a functionof three variables, typical for relatively youngMNEs. First, we acknowledge the role of priorinternational experience (intexp), reflecting thefirm’s – or its founders’ – demonstrated ability towork across borders and the learning associatedwith this prior work, thereby indirectly measuringthe non-location boundedness of its firm-specificadvantages (FSAs), see Verbeke, Zargarzadeh, andOsiyevskyy (2014). Second, firm size (size) reflectsthe firm’s proven ability to grow domestically, itselfthe result of underlying location-bound FSAs interms of, e.g., innovative knowledge or entrepre-neurial effectiveness (whereby the question ofcourse arises, whether these domestically relevantFSAs can be deployed effectively across borders).Third, there is organizational slack (slack) in thesense of a quantity of unused resources that from aPenrosean perspective can be deployed to expandinto foreign markets (Verbeke & Yuan, 2013). Firmsize and international experience thus reflect ingeneric terms the quality of the firm’s resourcebase, whereas slack represents the quantity ofresources available for expansion. Consequently,in the following subsections we investigate varia-tions of the relationship:

FDI ¼ b0 þ b1 � intexpþ b2 � sizeþ b3 � slackþN 0;0:5ð Þ:

Wang, Hong, Kafouros, and Wright (2012) inves-tigate a similar relationship using a much more

Misconceptions about multicollinearity Thomas Lindner et al.

285

Journal of International Business Studies

Page 4: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

complex econometric model, which, nevertheless,contains the three variables presented above. Forthe sake of simplicity, we assume the aboverelationship to hold. Unless indicated otherwise,we ran 100 models with different correlationsbetween international experience (intexp) and size,keeping slack independent. The observations forintexp and slack will be drawn from a covariatedistribution with correlations varying from justabove - 1 to just below 1. This means that multi-collinearity will vary across the models. We willinvestigate the effects of collinearity on coeffi-cients, errors, and other model parameters. Eachof the 100 models will have 1000 simulated obser-vations. The R code and data for all simulationspresented in this paper are available from theauthors upon request.

Collinearity between Relevant IndependentVariablesIn this subsection, we assume that FDI is a functionof international experience, firm size, and slack,following the specification

FDI ¼ 2 � intexp þ 4 � sizeþ 3 � slackþN 0;0:5ð Þ:

Variance inflationIn the first analysis, we are interested in how thecorrelation between intexp and size affects thestandard errors of the coefficients in the regression.The model estimated is

dFDI ¼ b0 þ b1 � intexpþ b2 � sizeþ b3 � slackþ e:

If the estimation works appropriately, we shouldsee the true coefficients in our regression output,i.e., b0 should be 0, b1 should be 2, b2 should be 4,and b3 should be 3. The left panel of Figure 1 showsthe standard errors for the variables in the regres-sion depending on the correlation between theobservations for intexp and size. It is evident thatincreasing the correlation (in absolute terms)between intexp and size leads to an inflation of therespective standard errors, but importantly, thestandard error of the coefficient for slack does notchange with this collinearity. Thus, there is noevidence for ‘‘smearing’’ (Greene, 2003, p. 265) ofthe error inflation to other independent variables.In this context, ‘‘smearing’’ would mean that thestandard errors of the other coefficients are affectedby collinearity between international experienceand slack.The right panel of Figure 1 shows the square root

of VIFs against the correlation between intexp andsize. The VIFs for intexp and size are almost equal,therefore only one of the two lines is clearly visible.Comparing the left and right panels, we see thatthere is a strong relationship between the VIFs anderror inflation. The VIF of slack, which is quiteconstant at 1 throughout the bandwidth of corre-lations, again indicates that collinearity betweenintexp and size does not ‘‘smear’’ over to otherindependent variables. The strong relationship

Figure 1 The effect of correlations on standard errors and VIFs. Left panel: standard error of coefficient for intexp (red), size (green),

slack (blue, dashed), and constant (black, dotted). Right panel: variance inflation factors of intexp (red), size (green), and slack (blue,

dashed).

Misconceptions about multicollinearity Thomas Lindner et al.

286

Journal of International Business Studies

Page 5: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

between the square root of the VIFs for intexp andsize call for an illustration of the direct relationshipbetween the two. This is given in Figure 2. It is clearthat there is an almost perfect model fit betweenthem. The regression lines for international expe-rience in the left panel of Figure 2 and firm size inthe right one almost perfectly capture the variationof the two.

This illustration shows that variance inflationfactors are representations of how much error termsare inflated if multicollinearity is present in a linearregression model. The square root of the VIFs is thefactor by which error terms are inflated comparedto when there is no multicollinearity. It is impor-tant to note, however, that inflated variance doesnot lead to spurious results. In fact, the opposite istrue: The higher the collinearity, i.e., the higher theVIFs, the harder it is to find statistical evidence of arelationship. If collinearity between two variablesincreases, the standard errors in a regression areinflated correspondingly (hence the name for VIFs‘‘variance inflation factors’’). If we compare twoscenarios, whereby the effects of two independentvariables on a dependent construct are the same,but the two independent variables are correlated todifferent degrees, we will obtain higher standarderrors (and consequently lower t values, and higherp values) the higher the correlation (in absoluteterms) between the two independent variables. Thismeans that the statistical support for a relationshipas suggested by a regression analysis becomes

weaker if the independent construct is highlycorrelated with another independent (or control)variable. In the following section, we explorefurther how the statistical evidence for a relation-ship between independent and dependent variableschanges with collinearity under different scenarios.

Spurious coefficientsGiven that VIFs are fully explained by the relativeerror terms of the collinear variables, the next stepis to investigate the effect of collinearity on thecoefficients in a regression. Figure 3 shows thatthere is no relationship between the correlation ofintexp and size and any of the coefficients in theregression. The red line for b1, the coefficient ofintexp, is stationary at its real value of 2, the greenline indicating b2, the coefficient of size, is station-ary at 4, the dashed blue line indicating b3, thecoefficient of slack, is stationary at 3, and the dottedblack line, the intercept, is stationary around 0. Thedifferences in the degree to which the lines inFigure 3 fluctuate are the result of how muchvariance the respective variables are assumed tohave when we create them in the simulation. Thefact that no trend can be observed for any of thevariables with increasing correlations shows thatcollinearity does not affect the coefficients in aregression.In reality, it is quite rare in IB research that the

‘‘true’’ underlying relationship is known. Conse-quently, researchers frequently discuss collinearity

Figure 2 The relationship of standard errors with VIFs. Relationship between relative error terms and square root of VIF for

international experience (left) and firm size (right).

Misconceptions about multicollinearity Thomas Lindner et al.

287

Journal of International Business Studies

Page 6: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

when the size or the significance of a coefficientchanges as a collinear variable is added to a model.We illustrate this by estimating the equation

dFDI0 ¼ b00 þ b01 � intexpþ b03 � slackþ e0;

while the true dependency of FDI on experience,size, and slack still looks the same as before:

FDI ¼ 2 � intexp þ 4 � sizeþ 3 � slackþN 0;0:5ð Þ:

When a collinear variable is omitted, VIFs arelow, and there is no indication that a relevantvariable has been omitted. In our case, the standarderror of the coefficient of b01 for internationalexperience (intexp) is significantly reduced if ahighly collinear variable (here firm size) is omittedfrom the regression, as is clear from Figure 4. Thus,omitting a relevant but collinear variable is prob-lematic in a regression because it deflates standarderrors and may lead to spurious findings. Of coursethis is not what researchers intend when they dropone of the two variables to ‘‘remedy’’ high partialcollinearity. As we show in this simulation, this cancreate spurious significance by deflating the stan-dard error of the included variable. If in doubt, aresearcher would be well advised to keep thevariables in the regression model. Although thismay inflate standard errors, it will not createspurious results. As Wooldridge (p. 83) put it when

writing on the same topic, ‘‘[…] multicollinearityviolates none of our assumptions.’’While it is undeniably problematic to omit a

collinear variable from a regression as it may lead tospuriously significant coefficients, it is much worseif it introduces bias into the coefficient of theremaining variables themselves. To investigate theextent to which this is the case, we again simulatethe incomplete regression model (with coefficientsb0i) shown above. We also run the complete regres-sion model:

dFDI ¼ b0 þ b1 � intexpþ b2 � sizeþ b3 � slackþ e

using the same data as in the incomplete one. Wecompare the coefficients b0i to bi. Figure 5 shows theresults of the simulation for the coefficients b01 andb1 of international experience (intexp) if the colli-near variable firm size (size) is omitted (red, dashed)or included (red, solid).

Figure 5 shows that omitting a relevant, butcollinear, variable may not only lead to deflatederror terms and hence spuriously significant coef-ficients, but also to biased ones. The degree of biaswill depend on the degree of correlation betweenthe variable included and the one omitted. Hence,the omission of relevant but collinear variables maycreate spuriously significant results, it may severelybias coefficients, it may even do both at the sametime, yielding spuriously significant biased coeffi-cients. In our case, a researcher who omits the

Figure 3 The effect of correlations on regression coefficients.

Relationship between correlation of intexp and size and the

coefficients of intexp (red), size (green), slack (blue, dashed), and

the intercept (black, dotted) in the regression.

Figure 4 The effect of variable omission on error deflation.

Relationship between correlation of intexp and size and the

standard error of b01, the coefficient for international experience.

Misconceptions about multicollinearity Thomas Lindner et al.

288

Journal of International Business Studies

Page 7: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

variable firm size from a regression model whenthere is a negative correlation between interna-tional experience and firm size, might concludethat international experience has a negative effecton a firm’s FDI. We can also see from Figure 5 thatthe higher the collinearity (in absolute terms)between two variables, the more severe the conse-quences of an omission. At the same time, includ-ing collinear variables in the regression onlyinflates standard errors and hence makes regressionresults, if they turn out to be significant, moreconservative. This is because collinearity, when onehas independent variables in a correctly specifiedmodel, makes the estimation more efficient andincreases standard errors (as expressed by VIFs)while leaving coefficient estimates unbiased.

This result also explains what many researchersin IB have found when they add to a regressionmodel a variable that is highly collinear with onealready in the model: significance levels changesubstantially and the sign of the coefficient of thevariable previously in the model changes dramat-ically. Table 1 shows such a case. If the correlationbetween intexp and size is - 0.8, a researcher whoomits size from the regression model may concludethat there is a significant negative effect of inter-national experience on FDI (model 1, p\0.0001).This result, however, is spurious as the true under-lying relationship (as specified above) suggests thatthe coefficient on intexp should be 2. If the collinearvariable size is added to the regression model, the

researcher will find a significant positive relation-ship between international experience and FDI(model 2, p\ 0.0001). Given that in reality we donot know the underlying ‘‘true’’ model, it is impor-tant not to drop collinear variables, as doing so maylead to spurious, biased, or both spurious andbiased results. Researchers – and reviewers – mightdraw the conclusion that high collinearity makesthe coefficient of international experience ‘‘unsta-ble’’. We hope that our illustration has shown thisto be flawed thinking. The problem in model 1 isnot ‘‘instability caused by collinearity’’ but omis-sion of a relevant collinear variable. Of course, itcould happen that two highly collinear variableshave some outlier observations such that whencontrolling for one and adding the other to themodel, outliers become highly influential and leadto a flip in the signs of the coefficients. In such acase, the problem is not collinearity but too fewobservations, because this is what drives the impactof outliers on regression results. This brings us backto Goldberger (1991) equating a multicollinearityproblem with a ‘‘micronumerosity’’ one.

Collinearity between Relevant and IrrelevantIndependent VariablesSo far, our discussion has focused on collinearitybetween two variables that are both relevant for thedependent construct. Even though theory fre-quently gives a researcher a good starting point,in the real world, the ‘‘true’’ underlying relationshipbetween independent and dependent constructs isnot known. Let us address cases where one of the

Figure 5 The effect of variable omission on coefficient

estimates. Relationship between correlation of intexp and size

and the coefficients b1 (red, solid) and b01 (red, dashed).

Table 1 The effect of variable omission on bias in regression

results

(1) (2)

Intercept 30.248 - 0.212

(7.867) (1.576)

International experience - 1.205 2.036

(0.075) (0.026)

Firm size 3.989

(0.026)

Slack 2.545 3.011

(1.572) (0.312)

Observations 1000 1000

Adjusted R2 0.206 0.969

Regression results for incomplete (1) and complete (2) models. Coeffi-cients b01 (model 1) and b1 (model 2) for international experience differsignificantly. Correlation between intexp and size equals - 0.80. Truecoefficients are 0 for the intercept, 2 for international experience, 4 forfirm size, and 3 for slack. Standard errors are in parentheses.

Misconceptions about multicollinearity Thomas Lindner et al.

289

Journal of International Business Studies

Page 8: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

collinear independent variables is irrelevant, i.e.,not part of the ‘‘true’’ relationship. Given that ourillustrations up to now show that the coefficients ina complete model are independent of VIFs (and theother way around), we will only discuss potentiallyspurious results resulting from omission here. The‘‘true’’ underlying relationship between the threeindependent variables international experience (in-texp), firm size (size), as well as slack, and thedependent variable firm FDI is now

FDI ¼ 2 � intexp þ 0 � sizeþ 3 � slackþN 0;0:5ð Þ¼ 2 � intexp þ 3 � slackþN 0;0:5ð Þ:

Figure 6 shows the relative errors of the non-omitted, independent variable for the followingregressions (where first size and then intexp areomitted):

dFDI01 ¼ b00;1 þ b01;1 � intexpþ b03;1 � slackþ e01;

dFDI02 ¼ b00;2 þ b02;2 � sizeþ b03;2 � slackþ e02;

In the left panel of Figure 6, the irrelevant variable,firm size, is omitted from the regression. As onewould expect, the omission of an irrelevant variablehas no effect on the variance of the remainingcoefficients. This also holds true if the omittedvariable is collinear with one that is included. Onthe other hand, and as we have shown previously,if the relevant variable international experience isomitted, the error of the irrelevant variable firm sizeis deflated. The omission might consequently lead

to a spuriously significant impact of firm size onFDI while the true impact should be zero.

Given the potential for error deflation, omitting acollinear variable may even lead to spurious resultsif coefficients are biased sufficiently far from theirtrue value. This can be especially problematic if thetrue value of the coefficient is zero, i.e., there is nounderlying relationship, as in the case of firm size(size) in this example. To show this, we again runboth the regression with omitted variables and thatincluding both intexp and size:

dFDI1 ¼ b0;1 þ b1;1 � intexp þ b2;1 � sizeþ b3;1 � slackþ e1

Figure 7 shows the dependency of the coeffi-cients for international experience (intexp) and firmsize on the correlation between intexp and size inregressions when the respective other variable isomitted. In the left panel, the coefficients in thecorrectly specified and mis-specified models arealmost the same. This tells us that omitting acollinear but irrelevant variable does not introducebias in the coefficient of the collinear relevantvariable. Omitting a relevant collinear variable,however, deflates errors and biases the coefficient ofthe irrelevant variable, as shown in the right panel.Finally, we illustrate how an econometrician

might make an incorrect interpretation of changesin regression results that may be attributed to‘‘multicollinearity’’. Table 2 shows regressionresults for estimations with firm size omitted(model 1), with international experience omitted

Figure 6 The effect of omitting irrelevant and relevant collinear variables on standard errors. Relationship between correlation of

intexp and size and the (left panel) error of coefficient b01;1 of international experience (intexp) as well as the (right panel) error of

coefficient b02;2 of firm size (size) with the respective other variable omitted from regression.

Misconceptions about multicollinearity Thomas Lindner et al.

290

Journal of International Business Studies

Page 9: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

(model 2), and with a fully specified model (model3). In this simulation, the correlation betweeninternational experience and firm size is set to 0.8(as opposed to - 0.8 in the above example). Yet, wewould expect, given what is shown in Figures 6 and7 that omitting an irrelevant correlated variable(size in model 1) would not affect the true relation-ships in model 1. Indeed, the coefficient for inter-national experience is unbiased in model 1.However, if the relevant variable intexp, which iscorrelated with the irrelevant variable size, is omit-ted, a researcher might find spuriously significanteffects of size on FDI (model 2, p\0.0001). As isshown in model 3, there is no statistically

significant relationship between firm size and FDI(p\0.5) despite the spuriously significant positiveresult on firm size in model 2.Using simulations, we have highlighted the

problems multicollinearity creates in multipleregression analysis. The analytical background ofour illustrations is readily available in econometricstexts such as those by Wooldridge (2014) andGreene (2003). The most important conclusion tobe drawn is that multicollinearity, while it mayinflate standard errors, is not an econometricproblem per se. In other words, it does not reducethe degree to which regression results are valid, butonly underestimates reliability. The most impor-tant issue is the risk of obtaining biased resultsthrough the incorrect omission of relevant vari-ables. We hope that we have convincingly shownthat this may lead to spurious significance and biasand thus render a regression analysis useless. At thesame time, multicollinearity in a regression modelis likely to do no worse than inflate standard errorswhich, although not optimal, will do no more thanmake results more conservative. Moreover, addingcollinear variables to a linear model does notintroduce instability. Nonetheless, we agree thatthe interpretation of coefficients of highly collinearvariables is difficult and may require a ‘‘jointinterpretation’’, but this is not an econometricproblem as such. We highly recommend, if therewere any doubt, that collinear variables beincluded, rather than excluded from models. If allrelevant variables are accounted for, the parameter

Table 2 The effect of omitting irrelevant and relevant collinear

variables on regression results

(1) (2) (3)

Intercept 0.917 8.994 0.892

(1.596) (4.054) (1.597)

International experience 1.988 2.004

(0.016) (0.027)

Firm size 1.550 - 0.019

(0.040) (0.026)

Slack 2.824 2.276 2.824

(0.318) (0.809) (0.318)

Observations 1000 1000 1000

Adjusted R2 0.939 0.602 0.938

Regression results for coefficients b0i;1, and b0i;2, as well as bi;1. The cor-relation between intexp and size equals 0.8. Standard errors are inparentheses.

Figure 7 The effect of omitting irrelevant and relevant collinear variables on regression coefficients. Relationship between correlation

of intexp and size and the (left panel) coefficient of international experience as well as (right panel) coefficient of firm size. Dashed lines

are for coefficients b01;1 and b02;2 when the respective other variable (intexp and size) are omitted (models 1 and 2 in Table 2), and solid

lines when they are included (coefficients b1;1 and b2;1 in model 3 in Table 2).

Misconceptions about multicollinearity Thomas Lindner et al.

291

Journal of International Business Studies

Page 10: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

estimates in the full model are unbiased despitecollinearity (Cohen, West, & Aiken, 2014).

Table 3 gives an overview of the consequences ofincluding or not including a relevant versus irrel-evant collinear variable in a regression model.Naturally, not including an irrelevant variable hasno detrimental consequences on regression results(quadrant 4 in Table 3). It is clear that including acollinear variable, regardless of whether it is rele-vant, leads to error inflation and an increase in VIFs(quadrants 1 and 2 in Table 3), which makes itmore difficult for the researcher to identify relevantrelationships. It is important to note, however, thatnot including a relevant collinear variable will leadto both biased coefficients and deflated error terms.This means that there will be spurious findings onany coefficients with non-zero correlations with theexcluded collinear variables, and deflated errorterms for all those coefficients (quadrant 3 inTable 3).

A ‘‘MESSY’’ EXAMPLEOur illustrations have focused on isolated cases sofar. One reason so few findings from econometrictheory are fully used in applied research is thateconometric textbooks lack examples that featurethe kind of ‘‘messy’’ problems facing IB researchers.We therefore present a more complex (though stillvery stylized) illustration, that could be found inthe empirical section of a research paper. This timethe illustration centers on how much FDI, sub-sidiaries receive from their parent firm abroad. Thedata are for five countries with a different averageFDI inflow per subsidiary. Differences in FDI flowscould be driven by differences in GDP (as a proxyfor market size, and thus host-country locationadvantages), and modeled by a random interceptc0;j.

Such a data structure is known as ‘‘hierarchical’’,and the literature suggests that ‘‘multilevel analy-sis’’ (e.g., Hox, 1995), ‘‘hierarchical modeling’’ (e.g.,Steenbergen, 2012), or ‘‘random coefficient models’’(e.g., Alcacer, Chung, Hawk, & Pacheco-de-

Almeida, 2018) are ways to consider data structuredinto two (or more) levels. These methods providesignificant opportunities for IB research where dataare often naturally hierarchical. Yet, ignoring thehierarchical data structure in econometric specifi-cations leads to misspecification, and may be thereason for the spurious findings of country-levelvariables influencing firm-level decisions. The gen-eral idea is that, similar to the multicollinearityissues discussed in this paper, inappropriate treat-ment of clusters may introduce bias into regressionresults. It is important to note, however, that thereason for this bias is an unconsidered associationbetween observations, whereas with multicollinear-ity, the source of bias is the potential omission of avariable. As in the multicollinearity case, the reasonis bias resulting from non-independent error terms.The model specification shown below accounts fordiffering means in the predicted dependent vari-able (i.e., the c0;j) for the different countries j. Thisspecification is what some scholars call a fixed-effect hierarchical model (Hox, 1995), or a randomintercept model (Alcacer et al., 2018). It allows for aconstant term that is country-specific by includinga dummy for all countries but one (to avoid thedummy-variable trap).We again present a stylized example to illustrate

the broader points we want to make. In this stylizedexample, FDI inflows into subsidiaries are a func-tion of six variables. First, market size in a sub-sidiary country can be important to determine howmuch firms will be prepared to invest in thesubsidiary, at least in the context of market-seekinginvestment (e.g., Driffield & Munday, 2000). Sec-ond, FDI inflows will increase as a function of thelength of time during which a subsidiary has beenoperating in a target market (e.g., Asmussen, Ped-ersen, & Dhanaraj, 2009). Third, we speculate, inline with the concept of subsidiary-specific advan-tage (Rugman & Verbeke, 2001), that affiliates withsubstantial slack resources, which they can use forfurther investments, will receive less FDI, becauseslack allows them to invest further without addi-tional funds from headquarters. Fourth, if

Table 3 Consequences of different treatments of collinearity in regression models

Collinear variable included Collinear variable not included

Collinear variable relevant (1)

Error inflation

(3)

Biased coefficients, deflated standard errors

Collinear variable irrelevant (2)

Error inflation

(4)

Unbiased results

Overview of consequences if a relevant/irrelevant collinear variable is included/not included in a regression model.

Misconceptions about multicollinearity Thomas Lindner et al.

292

Journal of International Business Studies

Page 11: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

subsidiaries are run by more experienced CEOs,they will receive more FDI because those CEOs aresupposedly more competent to scale up the busi-ness (e.g., Reuber & Fischer, 1997). Fifth, similar tothe example in the section before, a subsidiary’sability to generate intangible assets will increaseinvestment in this subsidiary. Finally, politicalstability in the subsidiary country will also posi-tively influence FDI inflows (e.g., Habib & Zuraw-icki, 2002). The ‘‘true’’ model specification for FDIinflows into subsidiary i in country j takes thefollowing form:

FDIini;j ¼ c0;j þ 3 � customersi þ 4 � agei � 5 � slacki þ 6

� CEOexperiencei þ 7 � intangibleassetsiþ 800 � politicalstabilityj þN 0;1ð Þ:

The data that we use, for illustrative purposesonly, are again simulated. In contrast to theprevious examples, we create data that are to adifferent extent collinear, and which have a morecomplex data structure, with clusters on the coun-try level. Values for a subsidiary’s number ofcustomers, age, slack, CEO experience, and intan-gible assets (all dimensions of subsidiary level) aredrawn from a multivariate distribution withdefined correlations (see Table 4). Political stabilityis simulated to be independent of the subsidiary-level variables beyond the country clusters. Politicalstability varies only on the country level (j). Toillustrate how ignoring the clustered data structurecan bias regression results, we introduce the vari-able ‘‘subsidiary registration number’’ into themodel. This variable is quite obviously unrelatedto FDI inflows, but we will see how ignoring thedata structure may make an econometrician believethe ‘‘subsidiary registration number’’ may be an

important determinant of FDI inflows. This variableis simulated to be independent from all othervariables, and clustered on the country level.Since the sample size is not very large (500

observations), even the variables drawn from inde-pendent distributions will have non-zero correla-tions with the other variables. The correlationmatrix in Table 4 reflects the actual correlations,with substantial partial correlation between thevariables drawn from a joint distribution (numberof customers, age, slack, CEO experience, andintangible assets). Descriptive statistics of thedependent and independent variables are given inTable 4 as well.In the next step, we show what results a

researcher might obtain when fitting differentmodels to the simulated data. Table 5 showsregression results. We ran five OLS models (models1–5) and two fixed effect models that account forcountry-level differences in the mean amount ofFDI received by subsidiaries. Models 1 and 2 showthat omitting relevant variables that are not inde-pendent of those in the ‘‘true’’ model wreaks havocwith the conclusions that are drawn from linearmodeling, as is illustrated in the second section ofthis paper. From the results of model 1, a researchercould conclude that a subsidiary’s number ofcustomers has a positive effect on inward FDI(p\0.000000051). If subsidiary age is controlledfor, however, the conclusion from model 2 wouldbe that the number of customers has a negativeeffect on the FDI a subsidiary receives (p\0.047).Thus, the direction of the effect changes because apreviously omitted relevant variable is nowincluded.The effect of the number of customers on FDI

inflows disappears in model 3 (p\ 0.44) when we

Table 4 Descriptive statistics and partial correlations for the variables in the ‘‘messy’’ illustration

(1) (2) (3) (4) (5) (6) (7) (8)

FDIin (1) 1 0.216 0.614 - 0.336 0.435 0.242 - 0.550 0.674

Customers (2) 0.216 1 0.479 0.325 - 0.142 - 0.070 - 0.095 0.107

Age (3) 0.614 0.479 1 - 0.257 0.410 0.031 - 0.014 0.025

Slack (4) - 0.336 0.325 - 0.257 1 - 0.010 - 0.072 - 0.012 0.022

CEO experience (5) 0.435 - 0.142 0.410 - 0.010 1 0.028 0.057 - 0.055

Intangible assets (6) 0.242 - 0.070 0.031 - 0.072 0.028 1 0.023 - 0.035

‘‘Sub. Reg. No.’’ (7) - 0.550 - 0.095 - 0.014 - 0.012 0.057 0.023 1 - 0.815

Political stability (8) 0.674 0.107 0.025 0.022 - 0.055 - 0.035 - 0.815 1

Mean 78.565 4.957 2.979 0.176 6.036 0.541 321.718 0.016

SD 15.376 1.015 0.980 0.992 0.988 0.516 107.250 0.013

Min 41.164 2.259 - 0.247 - 2.963 3.359 - 1.162 - 11 0.005

Max 125.490 8.047 6.025 2.884 9.192 2.235 543 0.040

Misconceptions about multicollinearity Thomas Lindner et al.

293

Journal of International Business Studies

Page 12: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

control for subsidiary slack, which has a negativeeffect on FDI inflows (p\ 0.0000024). Yet, whenconsidering CEO experience, and hence all relevantindependent variables except country-level vari-ables (model 4), it turns out that there is in fact asignificant positive relationship between the num-ber of customers and the amount of FDI a sub-sidiary receives (p\ 10-8), as in the ‘‘true’’ model.At the same time, it appears from model 4 that thecoefficient of subsidiary age was biased upwards inmodels 2 and 3, and that CEO experience is anothersignificant predictor of FDI received by a subsidiary(p\ 10-15). A researcher concerned with multi-collinearity might well be troubled by increasingcollinearity, as shown by the increase in VIFs frommodel 1 to model 4 (Table 6). As mentioned above,proposed cutoffs for VIFs (from 20 to 3) do not help

in resolving the underlying econometric concerns.Yet, as we have argued and illustrated, collinearityitself is not an econometric problem in the sensethat it would violate assumptions necessary forregression models to work. Consequently, aresearcher would be well advised to include allrelevant variables irrespective of their collinearityor VIFs, so as to avoid misspecification and conse-quently bias (see the Table 3 discussion).The addition of a country intercept (c0;j) repre-

sents – at first sight – only a slight change to theeconometric specification. However, it ensures thenon-violation of the assumption of independentlydistributed error terms. Violation of this assump-tion is not a major problem if one is only interestedin firm-level variables, and if these are independentof each other. Yet, if the researcher adds a country-

Table 5 Regression results for the ‘‘messy’’ illustration

(1) (2) (3) (4) (5) (6) (7)

Intercept 56.790 50.774 47.765 8.022 36.832 16.232 - 0.619

(3.362) (2.720) (2.735) (5.010) (3.075) (5.137) (0.599)

Number of customers 3.552 - 1.172 0.522 4.489 3.359 3.186 3.176

(0.642) (0.587) (0.675) (0.759) (0.446) (0.076) (0.076)

Company age 10.108 8.491 3.433 3.920 3.778 3.787

(0.607) (0.684) (0.838) (0.491) (0.084) (0.083)

Slack - 2.990 - 5.522 - 5.118 - 5.216 - 5.211

(0.626) (0.641) (0.375) (0.064) (0.064)

CEO experience 5.881 6.014 6.134 6.129

(0.640) (0.374) (0.064) (0.064)

Intangible assets 7.701 6.448 6.363 6.549 6.801 7.070 7.078

(1.261) (1.014) (0.993) (0.919) (0.537) (0.092) (0.091)

‘‘Subsidiary registration number’’ - 0.080 - 0.002 - 0.001

(0.003) (0.001) (0.001)

Political stability 800.414

(6.221)

Country fixed effect NO NO NO NO NO YES YES

Observations 500 500 500 500 500 500 500

Log likelihood - 2045 - 1934 - 1923 - 1884 - 1615 - 765 - 744

Regression result for stepwise modeling of a ‘‘messy’’ case. DV is FDI inflows. Models 1 through 5 are OLS, models 6 and 7 are fixed effect hierarchicalmodels. The underlying simulated relationship is presented above. The registration number is irrelevant in the true model. Standard errors are inparentheses.

Table 6 VIFs for the ‘‘messy’’ illustration

(1) (2) (3) (4) (5)

Customers 1.005 1.311 1.809 2.673 2.691

Age 1.005 1.306 1.729 3.038 3.041

Slack 1.011 1.484 1.820 1.823

CEO experience 1.011 1.797 1.797

Intangible assets 1.011 1.012

Subsidiary registration number 1.013

Misconceptions about multicollinearity Thomas Lindner et al.

294

Journal of International Business Studies

Page 13: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

specific variable (such as national culture or avariable capturing national institutions), as is oftenthe case in IB research, this misspecification maybecome substantial and may lead to spurioussupport for hypotheses related to these variables.Models 5–7 in Table 5 show how this may comeabout.

In model 5, the researcher investigates whetherthere might be some country-level determinant ofFDI inflows into subsidiaries. By adding the variable‘‘subsidiary registration number’’ (which theresearcher might somehow believe is relevant toFDI inflows), the OLS regression shown in model 5of Table 5 suggests that the higher the subsidiary’sregistration number, the less FDI it receives(p\ 10-15). This effect seems to be statisticallyhighly significant. However, model 6, which is ahierarchical model that accounts for country clus-ters in the error terms, shows that this result isspurious, and that it disappears if one uses country-level fixed effects (p\ 0.06). If one accounts for thecountry-level variable ‘‘political stability’’ (model7), the spurious effect of ‘‘subsidiary registrationnumber’’ is even less significant (p\0.11).

Some researchers argue that introducing fixedeffects in a regression will eliminate the influenceof all variables that work at that level. For example,Delmar and Shane (2003, p. 1171) argue in therealm of ventures:

[…] fixed effects regression partials out the effect of venture-

level factors, such as the quality of the venture opportunity,

and allows for an unbiased estimate of the relationship

between business planning and the outcome under

investigation.

In their case, the cluster-level variable is ventures.While it is true that the estimate obtained usingcountry-fixed effects is unbiased by country-levelclusters (Griliches, 1986), this does not mean thatfixed effects eliminate all influence of variables atthat level. In model 7 in Table 5, we see thatpolitical stability does show up as highly significant(p\ 10-16), and with a coefficient that is onlyinsignificantly different from the ‘‘true’’ coefficientof 800, which represents the relationship betweenpolitical stability and FDI inflows presented, wherewe define the ‘‘true’’ relationship between FDIinflows and its above determinants. While coun-try-level fixed effects ensure that firm-level vari-ables are unbiased by the omission of country-levelvariables and eliminate spurious results of country-level variables, they do not eliminate the possibility

of finding statistical support for effects driven bycountry-level independent variables.In conclusion, this ‘‘messy’’ example shows two

things. First, omitting highly collinear variablesintroduces bias into regression result. Because thisbias may lead to spurious findings or spurious non-findings (as illustrated in greater detail in theprevious section), researchers that omit collinearvariables risk that all findings from a regressionanalysis lose their validity. In addition, one mustbear in mind that clusters in the data may also leadto spuriously significant results if clustering is nottreated correctly in econometric specifications.

FINDINGS AND RECOMMENDATIONSWe have illustrated how regression results changewhen we consider correlated variables in a regres-sion. More specifically, we have investigated howcorrelation affects error inflation; what omittingcorrelated variables does to regression results; andhow these results are different in scenarios wherethe omitted variable is relevant for explainingvariation in the dependent variable, as well as inscenarios where it is not. We have investigated boththe effects of omissions on standard errors inregressions, and on the point estimates (thecoefficients).In the ‘‘messy’’ illustration, we extended the

isolated analysis on multicollinearity to correspondto a more realistic scenario that an IB researchermay be confronted with: We used several variables,which are to different degrees correlated, and weintroduced clustering in the data structure. In sucha case, several econometric problems co-exist, andthe applied researcher must take care of severalpotential biases at the same time. In order tofacilitate the correct treatment of the differenteffects, the following insights need to be taken onboard:

1. Multicollinearity does not introduce bias. Collinear-ity itself is not an econometric problem in thesense that it would violate the assumptionsnecessary for regression models to work. If onevariable that is highly collinear with anotherone, or with several other ones, is added to aregression model, its collinearity does not affectregression coefficients. In all but extreme cases ofcollinearity (e.g., 0.8 and above, see Allison,1999, p. 64) researchers are advised to add, ratherthan omit, potentially relevant collinearvariables.

Misconceptions about multicollinearity Thomas Lindner et al.

295

Journal of International Business Studies

Page 14: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

2. Significant results under high variance inflationunderestimate, rather than overestimate, statisticalsignificance. Variance inflation factors are indica-tors of standard errors that are too large, not toosmall. In a regression model, VIFs represent thedegree to which regression coefficient variance istoo large, i.e., larger than in a model with onlyindependent variables with zero partial correla-tion. Consequently, significant results underhigh variance inflation may be taken to beconservative.

3. Coefficient instability is not a consequence of mul-ticollinearity. If coefficients change when a poten-tially collinear variable is added to a regressionmodel, it indicates that the model was mis-specified prior to the collinear variable beingadded. As over-specification only inflates errorterms while on the other hand under-specifica-tion introduces bias to coefficients and errorterms, we strongly advise researchers who areuncertain about a variable to add it rather thanto drop it.

4. The higher the partial correlation between the vari-ables, the more problematic it is to omit one. If twovariables are perfectly independent, omittingone will not affect the results of a regressionmodel. As soon as there is some correlationbetween an omitted and an included variable,however, coefficients will be biased and errorterms may be too low, potentially leading tobiased and spuriously significant results.

5. Ignoring clusters in data may lead to spurious results.While hierarchical models definitely have con-siderable potential, not accounting for clustersin the data may lead researchers to conclude thatrelationships between variables exist when inreality they do not. Whether one is interested incross-level relationships or not, it is crucial toaccount for clusters in the data.

6. Accounting for country clusters does not pick up allcountry-level variation. If a researcher uses hierar-chical modeling, country fixed effects, or ran-dom intercept models, the effects of country-level independent variables on e.g., firm-leveldependent variables can still be detected. Whileusing different intercepts across countries avoidsfinding spurious results from country-level vari-ables, it does not eliminate the potential influ-ence of all country-level explanatory variables.This is the case especially for time-variant coun-try-level variables.

CONCLUSIONInternational Business scholars engaged in empir-ical research usually cannot conduct controlledexperiments. Consequently, they have difficultyobtaining ex ante independent observations, andhence face the challenge of correlated variables.While multicollinearity is well analyzed in theeconometrics literature, illustrations are lacking ofhow it might affect research results in the kind of‘‘messy’’ empirical contexts with which IB research-ers must often contend. We do provide suchillustrations. As the underlying econometric theoryis readily available from econometrics texts, we donot repeat the derivations here. However, thetreatment of collinearity has continued to be asensitive issue in empirical IB research. In a largenumber of papers published in JIBS, authors explic-itly compute variance inflation factors (VIFs), anduse these as a means of choosing which variables toinclude in regression models. In at least 10% ofthese papers, the authors explicitly state thatvariables were dropped from models as a conse-quence of high VIFs.Our aim was to illustrate how multicollinearity

affects error terms in regression models so that IBresearchers can effectively specify empirical modelsand interpret regression results computed forcollinear variables. Although we recognize thathigh multicollinearity makes it difficult to interprettwo coefficients that were specified as being inde-pendent, we suggest that it is possible to do so byengaging in cautious interpretation. Thus, highcollinearity, either expressed by pairwise correla-tion or high VIFs, does not justify omitting relevantvariables from a regression, and we show that theconsequences of doing this can be biased coeffi-cients and spurious results.Our paper contributes to highlighting miscon-

ceptions about collinearity that cause problems insome IB research. We have shown that collinearitydoes not introduce bias and does not cause insta-bility of results. The example we gave of a ‘‘messy’’case illustrates both the kind of results researcherscan expect to see, if empirical models are under-specified, and how these results are likely to changeunder different specifications. The simulationapproach we adopted may help researchers connectfindings from econometric theory with their ownquantitative research. Our approach does havesome limitations. First, we did not thoroughlyaddress other specifications and related

Misconceptions about multicollinearity Thomas Lindner et al.

296

Journal of International Business Studies

Page 15: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

econometric problems – hierarchical data structuresfor instance. Second, we did not deal with theproblem of highly impactful outliers in modelssuffering from high multicollinearity, beyond sug-gesting the obvious solution of collecting more

data or eliminating these outliers in the first place.Third, we did not extend the discussion beyondregression analysis. All three issues deserve furthertreatment.

REFERENCESAlcacer, J., Chung, W., Hawk, A., & Pacheco-de-Almeida, G.2018. Applying random coefficient models to strategyresearch: Identifying and exploring firm heterogeneouseffects. Strategy Science, 3(3): 533–553.

Allison, P. D. 1999. Multiple regression: A primer. Newbury Park:Pine Forge Press.

Asmussen, C. G., Pedersen, T., & Dhanaraj, C. 2009. Host-country environment and subsidiary competence: Extendingthe diamond network model. Journal of International BusinessStudies, 40(1): 42–57.

Cohen, P., West, S. G., & Aiken, L. S. 2014. Applied multipleregression/correlation analysis for the behavioral sciences. Lon-don: Psychology Press.

Cuervo-Cazurra, A., Andersson, U., Brannen, M. Y., Nielsen, B.B., & Reuber, A. R. 2016. From the editors: Can I trust yourfindings? Ruling out alternative explanations in internationalbusiness research. Berlin: Springer.

Delmar, F., & Shane, S. 2003. Does business planning facilitatethe development of new ventures? Strategic ManagementJournal, 24(12): 1165–1185.

Driffield, N., & Munday, M. 2000. Industrial performance,agglomeration, and foreign manufacturing investment in theUK. Journal of International Business Studies, 31(1): 21–37.

Enright, M. J. 2009. The location of activities of manufacturingmultinationals in the Asia-Pacific. Journal of InternationalBusiness Studies, 40(5): 818–839.

Goldberger, A. S. 1991. A course in econometrics. Cambridge:Harvard University Press.

Greene, W. H. 2003. Econometric analysis. New Delhi: PearsonEducation India.

Griffiths, W. E., Judge, G. G., Hill, R. C., Lutkepohl, H., & Lee, T.-C. 1985. The theory and practice of econometrics. New York:Wiley.

Griliches, Z. 1986. Economic data issues. In Z. Griliches & M.D. Intriligator (Eds), Handbook of econometrics, Vol. 3:1465–1514. Amsterdam: North-Holland.

Habib, M., & Zurawicki, L. 2002. Corruption and foreign directinvestment. Journal of International Business Studies, 33(2):291–307.

Hox, J. J. 1995. Applied multilevel analysis. Amsterdam: TT-publikaties.

Kalnins, A. 2018. Multicollinearity: How common factors causetype 1 errors in multivariate regression. Strategic ManagementJournal, 39(8): 2362–2385.

Meyer, K. E., & Sinani, E. 2009. When and where does foreigndirect investment generate positive spillovers? A meta-analy-sis. Journal of International Business Studies, 40(7): 1075–1094.

Muethel, M., & Bond, M. H. 2013. National context andindividual employees’ trust of the out-group: The role ofsocietal trust. Journal of International Business Studies, 44(4):312–333.

Nielsen, B. B., & Raswant, A. 2018. The selection, use, andreporting of control variables in international businessresearch: A review and recommendations. Journal of WorldBusiness, 53: 958–968.

Read, D., & Read, N. L. 2004. Time discounting over thelifespan. Organizational Behavior and Human Decision Pro-cesses, 94(1): 22–32.

Reuber, A. R., & Fischer, E. 1997. The influence of themanagement team’s international experience on the interna-tionalization behaviors of SMEs. Journal of International Busi-ness Studies, 28(4): 807–825.

Rogerson, P. 2001. Statistical methods for geography. ThousandOaks: Sage.

Rugman, A. M., & Verbeke, A. 2001. Subsidiary-specific advan-tages in multinational enterprises. Strategic ManagementJournal, 22(3): 237–250.

Steenbergen, M. R. 2012. Hierarchical linear models for electoralresearch: A worked example in Stata. ELECDEM.www.elecdem.eu/media/universityofexeter/elecdem/pdfs/istanbulwkspjan2012/Hierarchical_Linear_Models_for_Electoral_Research_A_worked_example_in_Stata.pdf. Accessed February15, 2014.

Verbeke, A., Amin Zargarzadeh, M., & Osiyevskyy, O. 2014.Internalization theory, entrepreneurship and international newventures. Multinational Business Review, 22(3): 246–269.

Verbeke, A., & Yuan, W. 2013. The drivers of multinationalenterprise subsidiary entrepreneurship in China: A newresource-based view perspective. Journal of ManagementStudies, 50(2): 236–258.

Wang, C., Hong, J., Kafouros, M., & Wright, M. 2012. Exploringthe role of government involvement in outward FDI fromemerging economies. Journal of International Business Studies,43(7): 655–676.

Wooldridge, J. M. 2014. Introduction to econometrics: Europe,Middle East and Africa Edition. Boston: Cengage Learning.

Zhao, M., Park, S. H., & Zhou, N. 2014. MNC strategy andsocial adaptation in emerging markets. Journal of InternationalBusiness Studies, 45(7): 842–861.

ABOUT THE AUTHORSThomas Lindner is an Assistant Professor at theInstitute for International Business at WU Vienna.His research covers the influence of the externalenvironment on firm internationalization, inter-national corporate finance, and cross-national dis-tance. Thomas’ research predominantly usesquantitative methodologies.

Jonas Puck is Full Professor of International Busi-ness at WU Vienna University of Economics andBusiness. His current research interests lie in theoverlaps of global strategy, finance, and politicalscience. Jonas serves on the Editorial Boards of theJournal of International Business Studies, Long RangePlanning, European Management Journal, Journal ofWorld Business, and the Multinational BusinessReview. His research has been published in journals

Misconceptions about multicollinearity Thomas Lindner et al.

297

Journal of International Business Studies

Page 16: Misconceptions about multicollinearity in …...EDITORIAL Misconceptions about multicollinearity in international business research: Identification, consequences, and remedies Thomas

such as the Journal of International Business Studies,Journal of Management Studies, Global Strategy Jour-nal, and Journal of World Business, among others .

Dr. Alain Verbeke is a Professor of InternationalBusiness Strategy and holds the McCaig ResearchChair in Management at the Haskayne School ofBusiness, University of Calgary. He was elected asthe Inaugural Alan M. Rugman Memorial Fellow atthe Henley Business School, University of Reading,and is Editor-in-Chief of the Journal of Interna-tional Business Studies. He is also an Adjunct Pro-fessor at the Solvay Business School, VrijeUniversiteit Brussel. He has published several books

and numerous refereed articles, including manypieces in leading scholarly journals such as JIBS,JMS and SMJ. His research agenda consists ofrethinking and augmenting the core paradigms instrategic management and international business,especially internalization theory, which is a jointtransaction cost economics and resource-basedview of the firm, focused on the governance of newresource combinations. He has particular expertisein managing headquarters – subsidiary relation-ships and broader governance challenges in largemultinational enterprises.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutionalaffiliations.

Misconceptions about multicollinearity Thomas Lindner et al.

298

Journal of International Business Studies