Estimating Interaction Effects Using Multiple Regression
Herman Aguinis, Ph.D.
Mehalchin Term Professor of Management
The Business School
University of Colorado at Denver
www.cudenver.edu/~haguinis
Overview
• What is an Interaction Effect?
• The “So What” Question: Importance of Interaction Effects for Theory and Practice
• Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
• Problems with MMR
• Aguinis, Beaty, Boik, & Pierce (2005, J. of Applied Psychology)
• The “Now What” Question: Addressing problems with MMR
• Some Conclusions
What is an Interaction Effect?
• The relationship between X and Y depends on Z (i.e., a moderator)
[Diagrams: the X–Y relationship, with Z as moderator]
• Other terms used:
– Population control variable (Gaylord & Carroll, 1948)
– Subgrouping variable (Frederiksen & Melville, 1954)
– Predictability variable (Ghiselli, 1956)
– Referent variable (Toops, 1959)
– Modifier variable (Grooms & Endler, 1960)
– Homologizer variable (Johnson, 1966)
Importance of Interaction Effects: Theory
• Going beyond main effects
• We typically say “it depends”
• More complex models
• “If we want to know how well we are doing in the biological, psychological, and social sciences, an index that will serve us well is how far we have advanced in our understanding of the moderator variables of our field” (Hall & Rosenthal, 1991, p. 447)
Importance of Interaction Effects: Practice
For example, personnel selection:
• Test bias: The relationship between a test and a criterion depends on gender or ethnicity
• “No bias exists if the regression equations relating the test and the criterion are indistinguishable for the groups in question” (Standards, 1999, p. 79)
• In other words, the X-Y relationship differs depending on the value of Z (e.g., 1 = Female, 0 = Male)
Illustration of Gender as a Moderator in Personnel Selection
[Figure: Job Performance (Y) plotted against Test Scores (X), with separate regression lines for men (Ŷmen) and women (Ŷwomen) and a common line (Ŷcommon)]
Importance of Interaction Effects: Practice
• Management in General
– Does an intervention work similarly well for, for example, Cantonese and American employees working in Hong Kong? (categorical moderator)
• Example: Performance management system regarding teaching at a university in Hong Kong. Would the same evaluation methods lead to employee (i.e., faculty) satisfaction regardless of the national origin of faculty members?
Estimating Interaction Effects
• Moderated Multiple Regression (MMR)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
where Y = criterion (continuous variable)
X = predictor (typically continuous)
Z = moderator (continuous or categorical)
X·Z = product term carrying information about the moderating effect (i.e., interaction between X and Z)
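As an illustration, the MMR equation can be estimated by ordinary least squares on a design matrix that includes the product term. The sketch below uses only the Python standard library; the helper names (solve, fit_mmr) and the simulated data are hypothetical and are not taken from the programs discussed in these slides.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mmr(x, z, y):
    """Least-squares estimates (a, b1, b2, b3) for Y-hat = a + b1*X + b2*Z + b3*X*Z."""
    rows = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]  # design matrix with product term
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

# Hypothetical data generated with a = 1, b1 = 0.5, b2 = 0.3, b3 = 0.4.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [float(rng.random() < 0.5) for _ in range(500)]
y = [1 + 0.5 * xi + 0.3 * zi + 0.4 * xi * zi + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
a, b1, b2, b3 = fit_mmr(x, z, y)
print(f"a={a:.2f}  b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}")
```

The estimates should fall close to the generating values, with b3 carrying the moderating effect.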
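Statistical Significance Test
• Model 1: Ŷ = a + b1 X + b2 Z; yields R1²
• Model 2: Ŷ = a + b1 X + b2 Z + b3 X·Z; yields R2²
• Ho: ψ1 = ψ2
F = [(R2² − R1²) / (k2 − k1)] / [(1 − R2²) / (N − k2 − 1)]
where k1 and k2 = number of predictors in Models 1 and 2, and N = total sample size
• Ho: β3 = 0 (using a t-statistic)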
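The test compares the R² of the model without the product term against the R² of the model with it, using the standard F statistic for a change in R²: (R2² − R1²)/(k2 − k1) divided by (1 − R2²)/(N − k2 − 1). A minimal sketch follows; the function name and the numeric inputs are hypothetical.

```python
def delta_r2_f(r2_1, r2_2, k1, k2, n):
    """F statistic for the increase in R-squared from Model 1 to Model 2."""
    numerator = (r2_2 - r2_1) / (k2 - k1)
    denominator = (1.0 - r2_2) / (n - k2 - 1)
    return numerator / denominator

# Hypothetical values: adding X*Z raises R-squared from .20 to .25 with N = 104.
f_stat = delta_r2_f(r2_1=0.20, r2_2=0.25, k1=2, k2=3, n=104)
print(round(f_stat, 2))  # → 6.67, with (k2 - k1, N - k2 - 1) = (1, 100) degrees of freedom
```

When a single product term is added (k2 − k1 = 1), this F equals the square of the t-statistic for b3.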
Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
• Ŷ = a + b1 X + b2 Z + b3 X·Z
• For example:
– Personnel selection: Y = measure of performance, X = test score, Z = gender
– Additional research areas: training, turnover, performance appraisal, return on investment, mentoring, self-efficacy, job satisfaction, organizational commitment, and career development, among others
Interpreting Interactions (Z is continuous)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
• b3 = 2 means that a one-unit change in X (Z) increases the slope of Y on Z (Y on X) by 2 points
Interpreting Interactions (Z is binary, dummy coded)
• Ŷ = a + b1 X + b2 Z + b3 X·Z,
• b3 = estimated difference in the slope of Y on X between the group coded as 1 and the group coded as 0.
• b2 = estimated difference in Y scores between a member of the group coded as 1 and a member of the group coded as 0, assuming their scores on X are 0.
• b1 = estimated slope of Y on X for members of the group coded as 0.
• a = estimated Y score for members of the group coded as 0, assuming their scores on X are 0.
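With a dummy-coded Z, the MMR equation reduces to Ŷ = a + b1 X for the group coded 0 and Ŷ = (a + b2) + (b1 + b3) X for the group coded 1, so the four coefficients can be read off two separate subgroup regressions. The sketch below illustrates this; the function names and the data are hypothetical.

```python
def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mmr_from_subgroups(x0, y0, x1, y1):
    """MMR coefficients (a, b1, b2, b3) implied by separate regressions
    in the group coded 0 (x0, y0) and the group coded 1 (x1, y1)."""
    int0, slope0 = simple_regression(x0, y0)
    int1, slope1 = simple_regression(x1, y1)
    a = int0              # predicted Y at X = 0 in the group coded 0
    b1 = slope0           # slope of Y on X in the group coded 0
    b2 = int1 - int0      # group difference in predicted Y at X = 0
    b3 = slope1 - slope0  # group difference in the slope of Y on X
    return a, b1, b2, b3

# Hypothetical test scores (X) and performance ratings (Y) for two groups.
men_x, men_y = [1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.0]
women_x, women_y = [1, 2, 3, 4, 5], [3.0, 3.4, 4.1, 4.5, 5.0]
a, b1, b2, b3 = mmr_from_subgroups(men_x, men_y, women_x, women_y)
print(f"a={a:.2f}  b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}")
```

Here men are coded 0 and women 1, so a negative b3 indicates a flatter test-performance slope for women in these made-up data.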
Pervasive Use of MMR in the Organizational Sciences
• Recent review: MMR was used in over 600 attempts to detect moderating effects of categorical variables in AMJ, JAP, and PP between 1977 and 1998 (Aguinis, Beaty, Boik, & Pierce, 2005, JAP)
Selected Research on MMR
• Aguinis (2004, Regression Analysis for Categorical Moderators, Guilford Press)
• Aguinis, Beaty, Boik, and Pierce (2005, J. of Applied Psychology)
• Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
• Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
• Aguinis and Pierce (1998, Organizational Research Methods)
• Aguinis and Pierce (1998, Ed. & Psychological Measurement)
• Aguinis and Stone-Romero (1997, J. of Applied Psychology)
• Aguinis, Bommer, and Pierce (1996, Ed. & Psychological Measurement)
• Aguinis (1995, J. of Management)
Methodology: Monte Carlo Simulations
• Research question: Does MMR do a good job at estimating moderating effects?
• Difficulty: We don’t know the population
• Solution: Monte Carlo methodology
– Create a population
– Generate random samples
– Perform MMR analyses on samples
– Compare population versus samples
– Assess % of hits and misses
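The Monte Carlo steps can be sketched as follows for a binary moderator. This is only an illustration under stated assumptions: the population values (slopes of 0.3 and 0.3 + slope_diff, unit error variance), the sample sizes, and the function names are hypothetical, and the cutoff of 1.96 is a large-sample normal approximation to the t critical value. For a binary Z, the t-statistic for b3 is the difference between the two subgroup slopes divided by its standard error, computed from the pooled error variance.

```python
import math
import random

def slope_difference_t(x0, y0, x1, y1):
    """t statistic for b3 (difference between subgroup slopes) in MMR with a binary Z."""
    sse_total, slopes, sxx_all = 0.0, [], []
    for x, y in ((x0, y0), (x1, y1)):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        intercept = my - slope * mx
        sse_total += sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        slopes.append(slope)
        sxx_all.append(sxx)
    df = len(x0) + len(x1) - 4  # N minus the four MMR coefficients
    mse = sse_total / df        # pooled error variance
    se_b3 = math.sqrt(mse * (1 / sxx_all[0] + 1 / sxx_all[1]))
    return (slopes[1] - slopes[0]) / se_b3

def simulate_hit_rate(slope_diff, n_per_group=100, reps=500, seed=1):
    """Proportion of samples in which Ho: b3 = 0 is rejected (|t| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = []
        for slope in (0.3, 0.3 + slope_diff):  # population slopes for Z = 0 and Z = 1
            x = [rng.gauss(0, 1) for _ in range(n_per_group)]
            y = [slope * xi + rng.gauss(0, 1) for xi in x]
            groups.append((x, y))
        (x0, y0), (x1, y1) = groups
        if abs(slope_difference_t(x0, y0, x1, y1)) > 1.96:
            hits += 1
    return hits / reps

print("Rejection rate, no moderator in population:", simulate_hit_rate(0.0))
print("Hit rate, slope difference of 0.5:", simulate_hit_rate(0.5))
```

With no moderator in the population, the rejection rate estimates the Type I error rate; with a nonzero slope difference, it estimates statistical power.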
Problems with MMR
1. We don’t find moderators
2. If we find them, they are small
Why should we care?
• Theory: Failure to find support for correct hypotheses (derailment of theory advancement process; model misspecification)
• Practice: Erroneous decision making (e.g., over- and under-prediction of performance, implementation of ineffective interventions)
– Ethical implications
– Legal implications
Some Culprits for Erroneous Estimation of Moderating Effects
• Small total sample size
• Unequal sample size across moderator-based groups
• Range restriction (i.e., truncation) in predictor variable X
• Scale coarseness
• Violation of homogeneity of error variance assumption
• Unreliability of measurement
• Artificial dichotomization/polychotomization of continuous variables
• Interactive effects
Unequal Sample Size Across Moderator-based Subgroups
• Applies to categorical moderators (e.g., gender, national origin)
• In many research situations, n1 ≠ n2
• Two studies examined this issue (Aguinis & Stone-Romero, 1997; Stone, Alliger, and Aguinis, 1994) (see also Aguinis, 1995)
• N′ = 2 n1 n2 / (n1 + n2)
• Conclusion: n1 needs to be (.3 n2) or larger to detect medium moderating effects
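The quantity N′ = 2 n1 n2 / (n1 + n2) is the harmonic mean of the two subgroup sizes, so it shrinks quickly as the subgroups become more unequal even when the total N is fixed. A small worked example, with hypothetical subgroup sizes and a hypothetical function name:

```python
def harmonic_n(n1, n2):
    """N' = 2 * n1 * n2 / (n1 + n2), the harmonic mean of the two subgroup sizes."""
    return 2 * n1 * n2 / (n1 + n2)

# Hypothetical designs, each with a total of 300 participants.
for n1, n2 in [(150, 150), (90, 210), (30, 270)]:
    print(f"n1={n1:3d}  n2={n2:3d}  N'={harmonic_n(n1, n2):.1f}")
```

In these examples N′ falls from 150 to 126 to 54 as the split moves from 50/50 to 10/90.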
Truncation in Predictor X
• Non-random sampling
• Pervasive in field settings (systematic in personnel selection/test validation research, [X,Y] | X > x)
• Aguinis and Stone-Romero (1997) (categorical moderator); McClelland and Judd (1993) (continuous moderator)
• Truncation has a dramatic impact on power
– N = 300, medium moderating effect, power = .81
– Same conditions, truncation = .80, power = .51
• Conclusion: Even mild levels of truncation can have a substantial detrimental effect on power
Violation of Homogeneity of Error Variance Assumption
• Applies to categorical moderators
• Assumption: the error variance (i.e., the variance in Y that remains after predicting Y from X) is equal across subgroups (e.g., women, men)
• σ²e(i) = σ²Y(i) (1 − ρ²XY(i))
• Distinct from homoscedasticity assumption
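The error variance in each subgroup is the variance of Y multiplied by one minus the squared X-Y correlation in that subgroup. The sketch below plugs hypothetical sample statistics into that expression and reports the ratio of the largest to the smallest subgroup error variance; the subgroup values and the function name are made up for illustration.

```python
def error_variance(var_y, r_xy):
    """Variance in Y left over after predicting Y from X: var_y * (1 - r_xy**2)."""
    return var_y * (1.0 - r_xy ** 2)

# Hypothetical subgroup statistics: (variance of Y, X-Y correlation).
subgroups = {"women": (4.0, 0.50), "men": (4.0, 0.20)}
variances = {name: error_variance(v, r) for name, (v, r) in subgroups.items()}
ratio = max(variances.values()) / min(variances.values())
for name, value in variances.items():
    print(f"{name}: error variance = {value:.2f}")
print(f"largest / smallest = {ratio:.2f}")
```

Because the subgroup correlations differ whenever a moderating effect is present, equal Y variances alone do not guarantee equal error variances.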
Regression of Homoscedastic Data
[Figure: Total Sample: Women & Men. x-axis: Predictor (X), 0 to 18; y-axis: Criterion (Y), 0 to 10]
Regression for Subgroups
[Figure: two panels, Women and Men. x-axis: Predictor (X), 0 to 18; y-axis: Criterion (Y), 0 to 10]
Artificial polychotomization of continuous variables
• Median split and other common methods for “simplifying the data” before conducting ANOVAs
• Cohen (1983) showed this practice is inappropriate
• In the context of MMR, some have used a median split procedure on continuous predictor Z and compared correlations across groups
• MMR always performs better than comparing artificially-created subgroups (Stone-Romero & Anderson, 1994)
• Conclusion: Do not polychotomize truly continuous predictors
Interactions Among Artifacts
• Concurrent manipulation of truncation, N, n1 and n2, and moderating effect magnitude (Aguinis & Stone-Romero, JAP, 1997).
• Results: Methodological artifacts have interactive effects on power.
• Even if conditions are favorable to high power regarding one factor (e.g., N), unfavorable conditions regarding other factors (e.g., truncation) will lead to low power.
• Conclusion: Relying on a single strategy (e.g., increase N) to improve power will not be successful if other methodological and statistical artifacts are present
Aguinis, Beaty, Boik, & Pierce (2005, JAP)
• Q1: What is the size of observed moderating effects of categorical variables in published research?
• Q2: What would the size of moderating effects of categorical variables be in published research under conditions of perfect reliability?
• Q3: What is the a priori power of MMR to detect moderating effects of categorical variables in published research?
• Q4: Do MMR tests reported in published research have sufficient statistical power to detect moderating effects conventionally defined as small, medium, and large?
Method
• Review of all articles published from 1969 to 1998 in Academy of Management Journal (AMJ), Journal of Applied Psychology (JAP), and Personnel Psychology (PP)
• Criteria for study inclusion:
– At least one MMR analysis
– The MMR analysis included a continuous criterion Y, a continuous predictor X, and a categorical moderator Z
Effect Size and Power Computation
• Total of 636 MMR analyses
• Moderator group sample sizes available for 507 (79.72%)
• Moderator group sample sizes and predictor-criterion rs available for 261 (41.04%)
• Effect sizes and power computation based on the 261 MMR analyses for which ns and rs were available. We used SD information when available, and assumed homogeneity of error variance when this information was not available
Results (I)
Frequency of MMR Use over Time
[Figure: frequency of MMR use (0 to 120) by Publication Year, 1977 to 1997]
Q1: Size of Observed Effects (I)
• Effect size metric: f² = (R2² − R1²) / (1 − R2²)
• Median f² = .002
• Mean (SD) = .009 (.025)
– 95% CI = .0089 to .0091
– 25th percentile = .0004
– 75th percentile = .0053
• Effect size values over time: r(261) = .15, p < .05
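The effect size metric f² divides the increase in R² due to the product term by the proportion of variance left unexplained by the full model. A brief worked example, using hypothetical R² values and a hypothetical function name:

```python
def f_squared(r2_1, r2_2):
    """Effect size f^2 = (R2^2 - R1^2) / (1 - R2^2) for the product term."""
    return (r2_2 - r2_1) / (1.0 - r2_2)

# Hypothetical R-squared values for the models without and with X*Z.
print(round(f_squared(r2_1=0.100, r2_2=0.118), 3))  # → 0.02
```

Because 1 − R2² cannot exceed 1, f² is never smaller than the R² increase itself, so a median f² of .002 implies that the product term typically added no more than .002 to R².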
Q1: Size of Observed Effects (II)
• F(2, 258) = 4.97, p = .008, η² = .04
• Tukey HSD tests: AMJ > JAP and PP > JAP

Journal          Mean (SD)      Median
AMJ (k = 6)      .040 (.047)    .025
JAP (k = 236)    .007 (.024)    .002
PP (k = 19)      .017 (.025)    .006
Q1: Size of Observed Effects (III)
• F(2, 258) = 8.71, p < .001, η² = .06
• Tukey HSD tests: Other > Ethnicity

Moderator             Mean (SD)      Median
Gender (k = 63)       .005 (.011)    .002
Ethnicity (k = 45)    .002 (.002)    .001
Other (k = 153)       .013 (.031)    .002
Q1: Size of Observed Effects (IV)
• t(259) = -.226, ns
• t(259) = -0.95, ns

Comparison                      Mean (SD)      Median
Personnel Selection (k = 20)    .010 (.023)    .001
Other (k = 241)                 .009 (.025)    .002

Comparison                      Mean (SD)      Median
Work Attitudes (k = 96)         .005 (.015)    .002
Other (k = 165)                 .011 (.029)    .002
Q2: Construct-level Effects (I)
• Median f² = .003
– Increase of .001 over median observed effect size
• Mean (SD) = .017
– Increase of .008 over mean observed effect size
Q3: Statistical Power (I)
[Figure: Statistical Power (0.00 to 1.00) as a function of Effect Size (0.00 to 0.11)]
Q3: Statistical Power (II)
[Figure: Statistical Power (0.00 to 1.00) as a function of Effect Size (0.000 to 0.023)]
Q4: Power to Detect Small, Medium, and Large Effects
• Small f² (.02); mean power = .84; 72% of tests would have a power of .80 or higher
• Medium f² (.15); mean power = .98
• Large f² (.35); mean power = 1.0
Some Conclusions
• We expected effect size to be small, but not so small (i.e., median of .002)
• Computation of construct-level effect sizes did not improve things by much (i.e., median of .003)
• More encouraging results:
– None of the 95% CIs around the mean effect size for the various comparisons included zero
– Effect sizes have increased over time
– Given the observed sample sizes, mean power is sufficient to detect effects ≥ .02
– 72% of studies had sufficient power to detect an effect ≥ .02
Some Implications
• Are theories in dozens of research domains incorrect in hypothesizing moderators?
• Are hundreds of researchers in dozens of disparate domains wrong and population moderating effects so small?
• Could be, but... more likely, methodological artifacts decrease the observed effect sizes substantially vis-à-vis their population counterparts
• More attention needs to be paid to design and analysis issues that decrease observed effect sizes
• Conventional definitions of effect size (f²) for moderators should probably be revised
The “Now What” Question
• Before data are collected
– Larger sample size *
– More reliable measures *
– Avoid truncated samples *
– Use non-coarse scales (e.g., program by Aguinis, Bommer, & Pierce, 1996, Ed. & Psych. Measurement)
– Equalize sample size across moderator-based subgroups
– Use computer programs in the public domain to estimate sample size needed for desired power level
– Gather information on research design trade-offs
* Easier said than done!
Tools to Improve Moderating Effect Estimation (Aguinis, 2004)
• Scale coarseness
– Aguinis, Bommer, and Pierce (1996, Educational & Psychological Measurement)
• Homogeneity of error variance
– Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
• Power estimation and research design trade-offs
– Aguinis, Pierce, and Stone-Romero (1994, Educational & Psychological Measurement)
– Aguinis and Pierce (1998, Educational & Psychological Measurement)
– Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
Assessment of Assumption Compliance
• DeShon and Alexander’s (1996) 1.5 rule of thumb
• Bartlett’s homogeneity test:
M = [(Σ v_i) log_e(Σ v_i s_i² / Σ v_i) − Σ v_i log_e(s_i²)] / [1 + (1 / (3(k − 1))) (Σ 1/v_i − 1 / Σ v_i)]
• k = number of sub-groups
• nk = number of observations in each sub-group
• s² = sub-group variance on the criterion
• v = degrees of freedom on which s² is based
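Bartlett's M compares the log of the pooled (df-weighted) variance with the df-weighted sum of the log subgroup variances, then divides by a correction factor that depends on k and the degrees of freedom. A minimal sketch follows; the function name and the input values are hypothetical, and taking v_i = n_i − 2 for each subgroup is an assumption made for the example.

```python
import math

def bartlett_m(variances, dfs):
    """Bartlett's M for k subgroup variances s_i^2, each based on v_i degrees of freedom."""
    k = len(variances)
    total_df = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(dfs, variances)
    )
    correction = 1.0 + (sum(1.0 / v for v in dfs) - 1.0 / total_df) / (3.0 * (k - 1))
    return numerator / correction

# Hypothetical subgroup variances; degrees of freedom assume v_i = n_i - 2.
m = bartlett_m(variances=[3.0, 3.84], dfs=[98, 148])
print(f"M = {m:.3f}")  # referred to a chi-square distribution with k - 1 df
```

M equals zero when all subgroup variances are identical and grows as they diverge.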
Homogeneity is not Met... Now What?
• Use alternatives to MMR
– Alexander and colleagues' normalized-t approximation:
z_i = c + (c³ + 3c) / b − (4c⁷ + 33c⁵ + 240c³ + 855c) / (10b² + 8bc⁴ + 1000b)
where a = v_i − .5; b = 48a²; c = [a ln(1 + t_i² / v_i)]^(1/2); and v_i = n_i − 2
– OR James's second-order approximation:
[Equation: James's second-order critical value h(α); the expression is not recoverable from this transcript]
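The normalizing transformation behind the normalized-t approximation converts a t value with v degrees of freedom into an approximately standard normal deviate, using a = v − .5, b = 48a², and c = [a ln(1 + t²/v)]^(1/2). The sketch below implements only that transformation; the function name is hypothetical, and computing the subgroup t values themselves is outside its scope.

```python
import math

def normalized_t(t, v):
    """Approximate normal deviate z for a t statistic with v degrees of freedom.

    The result is non-negative because c is defined as a square root.
    """
    a = v - 0.5
    b = 48.0 * a ** 2
    c = math.sqrt(a * math.log(1.0 + t ** 2 / v))
    return (
        c
        + (c ** 3 + 3.0 * c) / b
        - (4.0 * c ** 7 + 33.0 * c ** 5 + 240.0 * c ** 3 + 855.0 * c)
        / (10.0 * b ** 2 + 8.0 * b * c ** 4 + 1000.0 * b)
    )

# t = 2.228 is the two-tailed .05 critical value of t with 10 df,
# so the transformed value should land near the normal critical value of 1.96.
print(round(normalized_t(2.228, 10), 2))
```

As a sanity check, the transformation returns 0 for t = 0 and approaches t itself as the degrees of freedom grow.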
Program ALTMMR
• Calculates
– Error variance ratio (highest if more than 2 subgroups)
– Bartlett’s M
– James’s J
– Alexander’s A
• Uses sample descriptive data
– nk , sx , sy , rxy
– User sets p = .05 or .01 (for all but James’s statistic)
Program ALTMMR
• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on MMR icon on left side of page)
• Executable on-line or locally
Power Estimation
• Program POWER
– Aguinis, Pierce, and Stone-Romero (1994, Ed. & Psych. Measurement)
• Program MMRPWR
– Aguinis and Pierce (1998, Ed. & Psych. Measurement)
• Program MMRPOWER
– Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
Program MMRPOWER
• Problems/Challenges regarding POWER and MMRPWR programs:
– Based on extrapolation from simulations: Range of values is limited
– Absence of factors known to affect power of MMR (e.g., unreliability)
• Theoretical approximation to power:
[Equation: Power ≈ Pr[...]; the expression is not recoverable from this transcript]
Program MMRPOWER
• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on MMR icon on left side of page)
• Executable on-line or locally
Some Conclusions
• Observed moderating effects are very small
• MMR is a low power test for detecting effect sizes as typically observed
• Researchers are not aware of problems with MMR
• Implications for theory and practice
• User-friendly programs are available and allow researchers to improve moderating effect estimation
• Using these tools will allow researchers to make more informed decisions regarding the operation of moderating effects