
Estimating Interaction Effects Using Multiple Regression

Herman Aguinis, Ph.D.

Mehalchin Term Professor of Management

The Business School

University of Colorado at Denver

www.cudenver.edu/~haguinis

Overview

• What is an Interaction Effect?
• The “So What” Question: Importance of Interaction Effects for Theory and Practice
• Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
• Problems with MMR
• Aguinis, Beaty, Boik, & Pierce (2005, J. of Applied Psychology)
• The “Now What” Question: Addressing problems with MMR
• Some Conclusions

What is an Interaction Effect?

• The relationship between X and Y depends on Z (i.e., a moderator)

[Figure: path diagrams relating X, Y, and the moderator Z]

• Other terms used:
– Population control variable (Gaylord & Carroll, 1948); Subgrouping variable (Frederiksen & Melville, 1954); Predictability variable (Ghiselli, 1956); Referent variable (Toops, 1959); Modifier variable (Grooms & Endler, 1960); Homologizer variable (Johnson, 1966)

Importance of Interaction Effects: Theory

• Going beyond main effects
• We typically say “it depends”
• More complex models
• “If we want to know how well we are doing in the biological, psychological, and social sciences, an index that will serve us well is how far we have advanced in our understanding of the moderator variables of our field” (Hall & Rosenthal, 1991, p. 447)

Importance of Interaction Effects: Practice

For example, personnel selection:
• Test bias: The relationship between a test and a criterion depends on gender or ethnicity
• “No bias exists if the regression equations relating the test and the criterion are indistinguishable for the groups in question” (Standards, 1999, p. 79)
• In other words, the X-Y relationship differs depending on the value of Z (e.g., 1 = Female, 0 = Male)

Illustration of Gender as a Moderator in Personnel Selection

[Figure: Job Performance plotted against Test Scores (X), with separate regression lines for men (Ŷmen) and women (Ŷwomen) and a common line (Ŷcommon)]

Importance of Interaction Effects: Practice

• Management in General
– Does an intervention work similarly well for, for example, Cantonese and American employees working in Hong Kong? (categorical moderator)
• Example: Performance management system regarding teaching at university in Hong Kong. Would the same evaluation methods lead to employee (i.e., faculty) satisfaction depending on the national origin of faculty members?

Estimating Interaction Effects

• Moderated Multiple Regression (MMR)

• Ŷ = a + b1 X + b2 Z + b3 X·Z,

where Y = criterion (continuous variable)

X = predictor (typically continuous)

Z = moderator (continuous or categorical)

X·Z = product term carrying information about the moderating effect (i.e., interaction between X and Z)

Statistical Significance Test

• Ŷ = a + b1 X + b2 Z; $R_1^2$
• Ŷ = a + b1 X + b2 Z + b3 X·Z; $R_2^2$
• Ho: ψ1 = ψ2

$$F = \frac{(R_2^2 - R_1^2) / (k_2 - k_1)}{(1 - R_2^2) / (N - k_2 - 1)}$$

• Ho: β3 = 0 (using a t-statistic)
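As a rough illustration of the R² comparison on this slide, the sketch below computes the F statistic for the gain in R² when the product term is added. The function name and the numeric inputs are hypothetical and chosen only for illustration; k1 and k2 are assumed to be the number of predictors in the two models and N the total sample size.

```python
def r2_change_f(r2_reduced, r2_full, k_reduced, k_full, n):
    """F statistic for the gain in R^2 when predictors are added.

    r2_reduced, k_reduced: R^2 and number of predictors without X*Z.
    r2_full, k_full: R^2 and number of predictors with X*Z.
    n: total sample size.
    """
    df_num = k_full - k_reduced          # number of added terms
    df_den = n - k_full - 1              # residual df of the full model
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Hypothetical values, for illustration only.
f_stat, df1, df2 = r2_change_f(r2_reduced=0.20, r2_full=0.22,
                               k_reduced=2, k_full=3, n=204)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

When only the single product term is added, this F equals the square of the t-statistic for b3, so the two tests on the slide lead to the same decision.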

Estimating Interaction Effects Using Moderated Multiple Regression (MMR)

• Ŷ = a + b1 X + b2 Z + b3 X·Z
• For example:
– Personnel selection: Y = measure of performance, X = test score, Z = gender
– Additional research areas: training, turnover, performance appraisal, return on investment, mentoring, self-efficacy, job satisfaction, organizational commitment, and career development, among others

Interpreting Interactions (Z is continuous)

• Ŷ = a + b1 X + b2 Z + b3 X·Z,

• b3 = 2 means that a one-unit change in X (Z) increases the slope of Y on Z (Y on X) by 2 points
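The interpretation of b3 follows from regrouping the equation as Ŷ = (a + b2 Z) + (b1 + b3 Z) X, so the slope of Y on X at a given Z is b1 + b3 Z. A minimal sketch, with a hypothetical helper name and hypothetical coefficient values:

```python
def simple_slope_of_x(b1, b3, z):
    """Slope of Y on X at a given value of the moderator Z."""
    return b1 + b3 * z


# Hypothetical coefficients, for illustration only.
b1, b3 = 0.5, 2.0
for z in (0, 1, 2):
    print(f"Z = {z}: slope of Y on X = {simple_slope_of_x(b1, b3, z)}")
```

Each one-unit step in Z changes the printed slope by b3; by symmetry, the slope of Y on Z at a given X is b2 + b3 X.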

Interpreting Interactions (Z is binary, dummy coded)

• Ŷ = a + b1 X + b2 Z + b3 X·Z,

• b3 = estimated difference in the slope of Y on X between the group coded as 1 and the group coded as 0.

• b2 = estimated difference in Y scores between a member of the group coded as 1 and a member of the group coded as 0, assuming the scores on X are 0.

• b1 = estimated slope of Y on X for members of the group coded as 0.

• a = estimated Y score for members of the group coded as 0, assuming the scores on X are 0.
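With a dummy-coded Z, the MMR point estimates coincide with those from fitting a separate simple regression of Y on X in each group. The sketch below uses that equivalence to recover a, b1, b2, and b3. The function names and the noise-free data are hypothetical and serve only to make the mapping visible.

```python
from statistics import mean


def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mmr_from_groups(x0, y0, x1, y1):
    """MMR coefficients from the two within-group regression lines."""
    a0, s0 = simple_regression(x0, y0)   # group coded 0
    a1, s1 = simple_regression(x1, y1)   # group coded 1
    return {"a": a0, "b1": s0, "b2": a1 - a0, "b3": s1 - s0}


# Hypothetical noise-free data: group 0 follows Y = 2 + 0.5 X,
# group 1 follows Y = 3 + 1.5 X.
x = [1, 2, 3, 4, 5]
y_group0 = [2 + 0.5 * xi for xi in x]
y_group1 = [3 + 1.5 * xi for xi in x]
coef = mmr_from_groups(x, y_group0, x, y_group1)
print(coef)
```

Here a and b1 describe the line for the group coded 0, while b2 and b3 are the differences in intercept and slope for the group coded 1.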

Pervasive Use of MMR in the Organizational Sciences

• Recent review: MMR was used in over 600 attempts to detect moderating effects of categorical variables in AMJ, JAP, and PP between 1977 and 1998 (Aguinis, Beaty, Boik, & Pierce, 2005, JAP)

Selected Research on MMR

• Aguinis (2004, Regression Analysis for Categorical Moderators, Guilford Press)
• Aguinis, Beaty, Boik, and Pierce (2005, J. of Applied Psychology)
• Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
• Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
• Aguinis and Pierce (1998, Organizational Research Methods)
• Aguinis and Pierce (1998, Ed. & Psychological Measurement)
• Aguinis and Stone-Romero (1997, J. of Applied Psychology)
• Aguinis, Bommer, and Pierce (1996, Ed. & Psychological Measurement)
• Aguinis (1995, J. of Management)

Methodology: Monte Carlo Simulations

• Research question: Does MMR do a good job at estimating moderating effects?

• Difficulty: We don’t know the population
• Solution: Monte Carlo methodology
– Create a population
– Generate random samples
– Perform MMR analyses on samples
– Compare population versus samples
– Assess % of hits and misses
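The steps above can be sketched as a small simulation for a binary moderator. Everything here is an assumption made for illustration, not the design used in the cited studies: the population slopes, the group sizes, unit error variance in both groups, and the use of a normal critical value in place of the exact t critical value (reasonable only because the degrees of freedom are large).

```python
import random
from statistics import NormalDist, mean


def fit_line(x, y):
    """Return slope, sum of squares of x, and residual sum of squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, sse


def draw_group(rng, n, slope):
    """Random sample from a population where Y = slope * X + error."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def b3_is_significant(g0, g1, alpha=0.05):
    """Test b3 (the slope difference) using the pooled error variance."""
    s0, sxx0, sse0 = fit_line(*g0)
    s1, sxx1, sse1 = fit_line(*g1)
    df = len(g0[0]) + len(g1[0]) - 4          # N minus 4 estimated terms
    mse = (sse0 + sse1) / df
    t = (s1 - s0) / (mse * (1 / sxx0 + 1 / sxx1)) ** 0.5
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # large-df approximation
    return abs(t) > crit


def hit_rate(slope0, slope1, n_per_group, reps, seed):
    """Proportion of samples in which b3 is declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        g0 = draw_group(rng, n_per_group, slope0)
        g1 = draw_group(rng, n_per_group, slope1)
        hits += b3_is_significant(g0, g1)
    return hits / reps


# A population with a moderating effect (hits estimate power) ...
power = hit_rate(0.2, 0.6, n_per_group=100, reps=500, seed=1)
# ... and one without (hits estimate the Type I error rate).
false_alarm = hit_rate(0.4, 0.4, n_per_group=100, reps=500, seed=2)
print(f"power = {power:.2f}, Type I error rate = {false_alarm:.2f}")
```

Artifacts such as truncation or unequal subgroup sizes could be studied by changing `draw_group` or the group sizes and watching how the hit rate moves.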

Problems with MMR

1. We don’t find moderators

2. If we find them, they are small

Why should we care?
• Theory: Failure to find support for correct hypotheses (derailment of theory advancement process; model misspecification)
• Practice: Erroneous decision making (e.g., over and under prediction of performance, implementation of ineffective interventions)
– Ethical implications
– Legal implications

Some Culprits for Erroneous Estimation of Moderating Effects

• Small total sample size
• Unequal sample size across moderator-based groups
• Range restriction (i.e., truncation) in predictor variable X
• Scale coarseness
• Violation of homogeneity of error variance assumption
• Unreliability of measurement
• Artificial dichotomization/polychotomization of continuous variables
• Interactive effects

Unequal Sample Size Across Moderator-based Subgroups

• Applies to categorical moderators (e.g., gender, national origin)

• In many research situations, n1 ≠ n2

• Two studies examined this issue (Aguinis & Stone-Romero, 1997; Stone, Alliger, and Aguinis, 1994) (see also Aguinis, 1995)

• $N' = \dfrac{2 n_1 n_2}{n_1 + n_2}$

• Conclusion: n1 needs to be (.3 n2) or larger to detect medium moderating effects
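The quantity N' = 2 n1 n2 / (n1 + n2) is the harmonic mean of the two subgroup sizes. A minimal sketch of how quickly it shrinks as the split becomes uneven, using a hypothetical function name and hypothetical sample sizes:

```python
def effective_n(n1, n2):
    """Harmonic mean of the two subgroup sizes: 2*n1*n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)


# Same total N = 300, split evenly and unevenly (hypothetical sizes).
balanced = effective_n(150, 150)
unbalanced = effective_n(30, 270)
print(balanced, unbalanced)  # 150.0 54.0
```

With the same total of 300 cases, the 30/270 split behaves roughly like two equal groups of 54, which is one way to see why unequal subgroup sizes hurt power.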

Truncation in Predictor X

• Non-random sampling
• Pervasive in field settings (systematic in personnel selection/test validation research, [X,Y] | X > x)
• Aguinis and Stone-Romero (1997) (categorical moderator); McClelland and Judd (1993) (continuous moderator)
• Truncation has a dramatic impact on power
– N = 300, medium moderating effect, power = .81
– Same conditions, truncation = .80, power = .51
• Conclusion: Even mild levels of truncation can have a substantial detrimental effect on power

Violation of Homogeneity of Error Variance Assumption

• Applies to categorical moderators
• Error variance: Variance in Y that remains after predicting Y from X is equal across subgroups (e.g., women, men)
• $\sigma^2_{e(i)} = \sigma^2_{Y(i)} \left(1 - \rho^2_{XY(i)}\right)$
• Distinct from homoscedasticity assumption
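The subgroup error variance is the variance of Y in subgroup i multiplied by one minus the squared X-Y correlation in that subgroup. The sketch below applies that expression to sample descriptives and forms the ratio of the larger to the smaller error variance. The function name and the numbers are hypothetical; the ratio is the kind of quantity a rule of thumb such as DeShon and Alexander's 1.5 rule is compared against.

```python
def error_variance(sd_y, r_xy):
    """Variance in Y left over after predicting Y from X in one subgroup."""
    return sd_y ** 2 * (1.0 - r_xy ** 2)


# Hypothetical descriptives: same SD of Y, different X-Y correlations.
women = error_variance(sd_y=2.0, r_xy=0.5)
men = error_variance(sd_y=2.0, r_xy=0.1)
ratio = max(women, men) / min(women, men)
print(f"women = {women:.2f}, men = {men:.2f}, ratio = {ratio:.2f}")
```

Note that the two subgroups have identical variance in Y here, yet their error variances differ because the predictor-criterion correlations differ.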

Regression of Homoscedastic Data

[Figure: scatterplot of Criterion (Y) against Predictor (X) with regression line; Total Sample: Women & Men]

Regression for Subgroups

[Figure: two scatterplots of Criterion (Y) against Predictor (X) with regression lines, one panel for Women and one for Men]

Artificial polychotomization of continuous variables

• Median split and other common methods for “simplifying the data” before conducting ANOVAs
• Cohen (1983) showed this practice is inappropriate
• In the context of MMR, some have used a median split procedure on continuous predictor Z and compared correlations across groups
• MMR always performs better than comparing artificially-created subgroups (Stone-Romero & Anderson, 1994)
• Conclusion: Do not polychotomize truly continuous predictors

Interactions Among Artifacts

• Concurrent manipulation of truncation, N, n1 and n2, and moderating effect magnitude (Aguinis & Stone-Romero, JAP, 1997).

• Results: Methodological artifacts have interactive effects on power.

• Even if conditions conducive to high power are favorable regarding one factor (e.g., N), conditions unfavorable regarding other factors (e.g., truncation) will lead to low power.

• Conclusion: Relying on a single strategy (e.g., increase N) to improve power will not be successful if other methodological and statistical artifacts

Aguinis, Beaty, Boik, & Pierce (2005, JAP)

• Q1: What is the size of observed moderating effects of categorical variables in published research?

• Q2: What would the size of moderating effects of categorical variables be in published research under conditions of perfect reliability?

• Q3: What is the a priori power of MMR to detect moderating effects of categorical variables in published research?

• Q4: Do MMR tests reported in published research have sufficient statistical power to detect moderating effects conventionally defined as small, medium, and large?

Method

• Review of all articles published from 1969 to 1998 in Academy of Management Journal (AMJ), Journal of Applied Psychology (JAP), and Personnel Psychology (PP)

• Criteria for study inclusion:
– At least one MMR analysis
– The MMR analysis included a continuous criterion Y, a continuous predictor X, and a categorical moderator Z

Effect Size and Power Computation

• Total of 636 MMR analyses
• Moderator sample sizes for 507 (79.72%)
• Moderator group sample sizes and predictor-criterion rs for 261 (41.04%)
• Effect sizes and power computation based on 261 MMR analyses for which ns and rs were available. We used SD information when available, and assumed homogeneity of error variance when this information was not available

Results (I): Frequency of MMR Use over Time

[Figure: frequency of MMR use (axis from 0 to 120) by Publication Year, 1977 to 1997]

Q1: Size of Observed Effects (I)

• Effect size metric: $f^2 = \dfrac{R_2^2 - R_1^2}{1 - R_2^2}$

• Median f² = .002,

• Mean (SD) = .009 (.025)

– 95% CI = .0089 to .0091

– 25th percentile = .0004

– 75th percentile = .0053

• Effect size values over time: r(261) = .15, p < .05
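The effect size metric, f² = (R2² − R1²) / (1 − R2²), is the gain in R² from adding the product term relative to the variance left unexplained by the full model. A minimal sketch with a hypothetical function name and hypothetical R² values:

```python
def f_squared(r2_without_product, r2_with_product):
    """f^2 for the moderating effect: R^2 gain over unexplained variance."""
    return (r2_with_product - r2_without_product) / (1.0 - r2_with_product)


# Hypothetical R^2 values, for illustration only.
effect = f_squared(r2_without_product=0.20, r2_with_product=0.22)
print(f"f^2 = {effect:.4f}")
```

For comparison, the median observed value reported on this slide (.002) corresponds to a far smaller R² gain than the .02 used in this example.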

Q1: Size of Observed Effects (II)

• F(2, 258) = 4.97, p = .008, η² = .04

• Tukey HSD tests: AMJ > JAP and PP > JAP

                 Mean (SD)      Median
AMJ (k = 6)      .040 (.047)    .025
JAP (k = 236)    .007 (.024)    .002
PP (k = 19)      .017 (.025)    .006

Q1: Size of Observed Effects (III)

• F(2, 258) = 8.71, p < .001, η² = .06

• Tukey HSD tests: Other > Ethnicity

                      Mean (SD)      Median
Gender (k = 63)       .005 (.011)    .002
Ethnicity (k = 45)    .002 (.002)    .001
Other (k = 153)       .013 (.031)    .002

Q1: Size of Observed Effects (IV)

• t(259) = -.226, p = ns

• t(259) = -0.95, p = ns

                                Mean (SD)      Median
Personnel Selection (k = 20)    .010 (.023)    .001
Other (k = 241)                 .009 (.025)    .002

                                Mean (SD)      Median
Work Attitudes (k = 96)         .005 (.015)    .002
Other (k = 165)                 .011 (.029)    .002

Q2: Construct-level Effects (I)

• Median f² = .003
– Increase of .001 over median observed effect size
• Mean (SD) = .017
– Increase of .008 over mean observed effect size

Q3: Statistical Power (I)

[Figure: Statistical Power (0.00 to 1.00) plotted against Effect Size (0.00 to 0.11)]

Q3: Statistical Power (II)

[Figure: Statistical Power (0.00 to 1.00) plotted against Effect Size (0.000 to 0.023)]

Q4: Power to Detect Small, Medium, and Large Effects

• Small f² (.02); mean power = .84; 72% of tests would have a power of .80 or higher
• Medium f² (.15); mean power = .98
• Large f² (.35); mean power = 1.0

Some Conclusions

• We expected effect size to be small, but not so small (i.e., median of .002)
• Computation of construct-level effect sizes did not improve things by much (i.e., median of .003)
• More encouraging results:
– None of the 95% CIs around the mean effect size for the various comparisons included zero
– Effect sizes have increased over time
– Given the observed sample sizes, mean power is sufficient to detect effects ≥ .02
– 72% of studies had sufficient power to detect an effect ≥ .02

Some Implications

• Are theories in dozens of research domains incorrect in hypothesizing moderators?
• Are hundreds of researchers in dozens of disparate domains wrong and population moderating effects so small?
• Could be, but... more likely, methodological artifacts decrease the observed effect sizes substantially vis-à-vis their population counterparts

• More attention needs to be paid to design and analysis issues that decrease observed effect sizes

• Conventional definitions of effect size (f²) for moderators should probably be revised

The “Now What” Question

• Before data are collected
– Larger sample size *
– More reliable measures *
– Avoid truncated samples *
– Use non-coarse scales (e.g., program by Aguinis, Bommer, & Pierce, 1996, Ed. & Psych. Measurement)
– Equalize sample size across moderator-based subgroups
– Use computer programs in the public domain to estimate sample size needed for desired power level
– Gather information on research design trade-offs

* Easier said than done!

Tools to Improve Moderating Effect Estimation (Aguinis, 2004)

• Scale coarseness
– Aguinis, Bommer, and Pierce (1996, Educational & Psychological Measurement)

• Homogeneity of error variance
– Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)

• Power estimation and research design trade-offs
– Aguinis, Pierce, and Stone-Romero (1994, Educational & Psychological Measurement)
– Aguinis and Pierce (1998, Educational & Psychological Measurement)
– Aguinis, Boik, and Pierce (2001, Organizational Research Methods)

Assessment of Assumption Compliance

• DeShon and Alexander’s (1996) 1.5 rule of thumb

• Bartlett’s homogeneity test:

$$M = \frac{\left(\sum_i v_i\right) \log_e\!\left(\sum_i v_i s_i^2 \Big/ \sum_i v_i\right) - \sum_i v_i \log_e s_i^2}{1 + \dfrac{1}{3(k - 1)}\left(\sum_i \dfrac{1}{v_i} - \dfrac{1}{\sum_i v_i}\right)}$$

• k = number of sub-groups
• nk = number of observations in each sub-group
• s2 = sub-group variance on the criterion
• v = degrees of freedom from which s2 is based
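Bartlett's M compares the log of the pooled variance with the weighted logs of the subgroup variances, divided by a small-sample correction term. The sketch below computes M from subgroup variances and their degrees of freedom. The function name and inputs are hypothetical, and the degrees of freedom in the example simply assume n − 2 per subgroup.

```python
from math import log


def bartlett_m(variances, dfs):
    """Bartlett's M from subgroup variances s_i^2 and their df v_i."""
    k = len(variances)
    v_total = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / v_total
    numerator = v_total * log(pooled) - sum(
        v * log(s2) for v, s2 in zip(dfs, variances))
    correction = 1.0 + (sum(1.0 / v for v in dfs) - 1.0 / v_total) / (
        3.0 * (k - 1))
    return numerator / correction


# Hypothetical subgroup variances with 100 cases per subgroup (v = 98).
m_equal = bartlett_m([2.0, 2.0], [98, 98])
m_unequal = bartlett_m([3.0, 3.96], [98, 98])
print(f"M (equal) = {m_equal:.3f}, M (unequal) = {m_unequal:.3f}")
```

Under the null hypothesis of equal variances, M is referred to a chi-square distribution with k − 1 degrees of freedom, so larger values of M signal stronger evidence of heterogeneity.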

Homogeneity is not Met... Now What?

• Use alternatives to MMR
– Alexander and colleagues' normalized-t approximation:

$$z_i = c + \frac{c^3 + 3c}{b} - \frac{4c^7 + 33c^5 + 240c^3 + 855c}{10b^2 + 8bc^4 + 1000b}$$

where $a = v_i - .5$, $b = 48a^2$, $c = \left[a \ln\left(1 + t_i^2 / v_i\right)\right]^{1/2}$, and $v_i = n_k - 2$

– OR James's second-order approximation:

[Equation: critical value h(α) for James's second-order approximation; the expression is not recoverable from this copy]
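The normalized-t approximation converts a t value with v degrees of freedom into an approximately standard normal deviate, using a = v − .5, b = 48a², and c = [a ln(1 + t²/v)]^(1/2). The sketch below implements that transformation in its commonly stated form. The function name is hypothetical, and carrying the sign of t over to z is an assumption made here for convenience.

```python
from math import copysign, log, sqrt


def normalized_t(t, v):
    """Approximate standard normal deviate for a t value with v df."""
    a = v - 0.5
    b = 48.0 * a ** 2
    c = sqrt(a * log(1.0 + t ** 2 / v))
    z = (c + (c ** 3 + 3.0 * c) / b
         - (4.0 * c ** 7 + 33.0 * c ** 5 + 240.0 * c ** 3 + 855.0 * c)
         / (10.0 * b ** 2 + 8.0 * b * c ** 4 + 1000.0 * b))
    return copysign(z, t)  # assumption: z takes the sign of t


# The two-tailed .05 critical t for 10 df (2.228) should map close to 1.96.
z_check = normalized_t(2.228, 10)
print(f"z = {z_check:.3f}")
```

The check at the end is a quick sanity test: a t value sitting at the .05 critical point for 10 degrees of freedom should land near the familiar normal critical value.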

Program ALTMMR

• Calculates
– Error variance ratio (highest if more than 2 subgroups)
– Bartlett’s M
– James’s J
– Alexander’s A
• Uses sample descriptive data
– nk , sx , sy , rxy
– User sets p = .05 or .01 (for all but James’s statistic)

Program ALTMMR

• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on MMR icon on left side of page)
• Executable on-line or locally

Power Estimation

• Program POWER
– Aguinis, Pierce, and Stone-Romero (1994, Ed. & Psych. Measurement)

• Program MMRPWR
– Aguinis and Pierce (1998, Ed. & Psych. Measurement)

• Program MMRPOWER
– Aguinis, Boik, and Pierce (2001, Organizational Research Methods)

Program MMRPOWER

• Problems/Challenges regarding POWER and MMRPWR programs:
– Based on extrapolation from simulations: Range of values is limited
– Absence of factors known to affect power of MMR (e.g., unreliability)

• Theoretical approximation to power:

[Equation: power expressed as a probability statement about an F statistic; the expression is not recoverable from this copy]

Program MMRPOWER

• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on MMR icon on left side of page)
• Executable on-line or locally

Some Conclusions

• Observed moderating effects are very small
• MMR is a low power test for detecting effect sizes as typically observed
• Researchers are not aware of problems with MMR
• Implications for theory and practice
• User-friendly programs are available and allow researchers to improve moderating effect estimation

• Using these tools will allow researchers to make more informed decisions regarding the operation of moderating effects
