Multiple Regression
Previously we discussed the one-predictor scenario. Multiple regression is the case of having two or more independent variables predicting some outcome variable. The basic idea is the same as in simple regression; however, more will need to be considered in its interpretation.
The best fitting plane
Before, we attempted to find the best fitting line to our 2D scatterplot of values. With the addition of another predictor, our cloud of values becomes 3D, and we are now looking for what amounts to the best fitting plane. With three or more predictors we get into hyperspace and are dealing with a regression surface.

Regression equation:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$
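As a concrete sketch of fitting such an equation, here is a minimal R example; the data frame 'dat' and the names y, x1, x2, x3 are hypothetical stand-ins, not the example data used later:

fit <- lm(y ~ x1 + x2 + x3, data = dat)  # fits Y-hat = b0 + b1*X1 + b2*X2 + b3*X3
coef(fit)     # the intercept b0 and the weights b1, b2, b3
summary(fit)  # overall fit and coefficient tests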
Linear combination
The notion of a linear combination is important for you to understand, both for MR and for multivariate techniques in general. Again, what MR analysis does is create a linear combination (weighted sum) of the predictors. The weights are important in helping us assess the nature of the predictor-DV relationships, with consideration of the other variables in the model. We then look to see how well the linear combination matches up with the DV. One way to think about it: we extract the relevant information from the predictors to help us understand the DV.
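Continuing the hypothetical sketch above, the model's fitted values are exactly this weighted sum of the predictors:

b    <- coef(fit)                                # intercept and weights
X    <- as.matrix(dat[, c("x1", "x2", "x3")])    # predictor matrix
yhat <- b[1] + X %*% b[-1]                       # the linear combination (weighted sum)
all.equal(as.vector(yhat), unname(fitted(fit)))  # TRUE: identical to the model's fitted values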
MR Example
[Figure: path diagram in which four predictors, (X1) Pros of Condom Use, (X2) Cons of Condom Use, (X3) Self-Efficacy of Condom Use, and (X4) Psychosexual Functioning, are weighted and combined into a new linear combination X' that predicts Stage of Condom Use]
Considerations in multiple regression
Assumptions
Overall fit
Parameter estimates and variable importance
Variable entry
IV relationships
Prediction
Assumptions: Normality
The assumptions for simple regression will continue to hold: normality, homoscedasticity, independence. Multivariate normality can be at least partially checked through examination of the individual variables for normality, linearity, and homoscedasticity. Tests for multivariate normality seem to be easily obtained in every package except SPSS.
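For those wanting such a test in R, here is a minimal sketch assuming the mvnormtest package and hypothetical variable names:

library(mvnormtest)                         # one option; install.packages("mvnormtest") if needed
X <- as.matrix(dat[, c("x1", "x2", "x3")])  # the variables to check (hypothetical names)
mshapiro.test(t(X))                         # multivariate Shapiro-Wilk; expects variables in rows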
Assumptions: Model misspecification
In addition, we must worry about model misspecification: omitting relevant variables, including irrelevant ones, or specifying incorrect paths. There is not much one can do about omitting relevant variables, but doing so may produce biased and less valid results. However, we can't just throw in every variable we can think of either: that risks overfitting, a violation of Ockham's razor. Including irrelevant variables contributes to the standard error of estimate (and thus the standard errors for our coefficients), which will affect the statistical tests on the individual variables (a simulation sketch follows the two tables below).
Coefficients (model including the irrelevant 'visits' predictor)

                                    B     Std. Error   Beta      t     Sig.   Zero-order  Partial   Part
(Constant)                        .845      .409               2.067   .039
Visits to health professionals   -.001      .017      -.003    -.075   .940      .256     -.003    -.003
Physical health symptoms          .761      .078       .434    9.818   .000      .505      .416     .381
Stressful life events             .007      .001       .238    5.754   .000      .370      .259     .223

a. Dependent Variable: Mental health symptoms
Coefficients (same model with the irrelevant predictor removed; note the smaller standard errors)

                              B     Std. Error   Beta       t      Sig.   Zero-order  Partial   Part
(Constant)                  .849      .403                2.105    .036
Physical health symptoms    .759      .071       .432    10.622    .000      .505      .443     .412
Stressful life events       .007      .001       .238     5.840    .000      .370      .262     .226

a. Dependent Variable: Mental health symptoms
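A small simulation can make the point concrete. In this sketch (all names and values hypothetical), an irrelevant predictor that is correlated with a relevant one inflates the relevant coefficient's standard error:

set.seed(42)
n  <- 200
x1 <- rnorm(n)                  # the relevant predictor
x2 <- x1 + rnorm(n, sd = 0.5)   # irrelevant to y, but correlated with x1
y  <- 2 * x1 + rnorm(n)         # x2 plays no part in generating y
summary(lm(y ~ x1))$coefficients       # note the SE for x1
summary(lm(y ~ x1 + x2))$coefficients  # SE for x1 is inflated once x2 is included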
Example data
Current salary predicted by educational level, time since hire, and previous experience (N = 474). As with any analysis, initial data analysis should be extensive prior to examination of the inferential results.
Initial examination of data
We can use the descriptives to get a general feel for what's going on with the variables in question. Here we can also see that months since hire and previous experience are not well correlated with our dependent variable of current salary. Ack! We'd also want to look at the scatterplots to further aid our assessment of the predictor-DV relationships.
Descriptive Statistics

                                   Mean      Std. Deviation     N
Current Salary                   34419.57      17075.661       474
Educational Level (years)           13.49          2.885       474
Beginning Salary                 17016.09       7870.638       474
Months since Hire                   81.11         10.061       474
Previous Experience (months)        95.86        104.586       474
Correlations (Pearson; N = 474 throughout; 2-tailed p-values in parentheses)

                                Educational      Months          Previous         Current
                                Level (years)    since Hire      Experience       Salary
Educational Level (years)        1               .047 (.303)     -.252** (.000)   .661** (.000)
Months since Hire                .047 (.303)     1                .003 (.948)      .084 (.067)
Previous Experience (months)    -.252** (.000)   .003 (.948)     1                -.097* (.034)
Current Salary                   .661** (.000)   .084 (.067)     -.097* (.034)    1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
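Descriptives, a correlation matrix, and scatterplots like those above might be obtained in R along these lines (data frame and variable names hypothetical):

vars <- c("salary", "educ", "monthshire", "prevexp")  # hypothetical names
sapply(dat[vars], function(v) c(Mean = mean(v), SD = sd(v), N = length(v)))
cor(dat[vars])    # correlation matrix, as in the table above
pairs(dat[vars])  # scatterplot matrix for the predictor-DV relationships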
Starting point: Statistical significance of the model
The ANOVA summary table tells us whether our model is statistically significant: whether R² is different from zero, i.e. whether the equation is a better predictor than the mean. As with simple regression, the analysis involves the ratio of variance predicted to residual variance. As we can see below, the F statistic reflects the relationship of the predictors to the DV (R²), the number of predictors in the model, and the sample size.
ANOVA

             Sum of Squares    df    Mean Square        F       Sig.
Regression      6.13E+10         3    2.042E+10      125.176    .000a
Residual        7.67E+10       470    163112654.7
Total           1.38E+11       473

a. Predictors: (Constant), Previous Experience (months), Months since Hire, Educational Level (years)
b. Dependent Variable: Current Salary
$$F = \frac{R^2\,(N - p - 1)}{(1 - R^2)\,p}, \qquad df = p,\ N - p - 1$$
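As a check, plugging the salary model's values into the formula:

$$F = \frac{.444\,(474 - 3 - 1)}{(1 - .444)\times 3} = \frac{208.68}{1.668} \approx 125$$

which matches the F in the ANOVA table within rounding of R².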
Multiple correlation coefficient
The multiple correlation coefficient is the correlation between the DV and the linear combination of predictors that minimizes the sum of the squared residuals. More simply, it is the correlation between the observed values and the values that would be predicted by our model. Its squared value, R², is the amount of variance in the dependent variable accounted for by the independent variables.
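This definition is easy to verify in code; continuing the hypothetical sketch from earlier:

cor(dat$y, fitted(fit))    # multiple R: observed vs. fitted values
cor(dat$y, fitted(fit))^2  # R-squared, the variance accounted for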
R²
Here it appears we have an OK model for predicting current salary.
Model Summary

Model     R      R Square    Adjusted R Square    Std. Error of the Estimate
1       .666a      .444            .441                 $12,771.556

a. Predictors: (Constant), Previous Experience (months), Months since Hire, Educational Level (years)
Variable importance: Statistical significance
After noting that our model is viable, we can begin to interpret the predictors' relative contributions. To begin with, we can examine the output to determine which variables statistically significantly contribute to the model. The standard error here is a measure of the variability that would be found among the different slopes estimated from other samples drawn from the same population.
Variable importance: Statistical significance
We can see from the output that only previous experience and education level are statistically significant predictors.
Coefficients

                                    B       Std. Error    Beta       t      Sig.    95% CI for B
(Constant)                     -27886.3      5529.479              -5.043   .000    [-38751.849, -17020.730]
Educational Level (years)       4004.576      210.628     .677     19.013   .000    [3590.687, 4418.466]
Months since Hire                 87.951       58.441     .052      1.505   .133    [-26.887, 202.788]
Previous Experience (months)      11.936        5.803     .073      2.057   .040    [.533, 23.340]

a. Dependent Variable: Current Salary
Variable importance: Weights
Statistical significance, as usual, is only a starting point for our assessment of the results. What we'd really want is a measure of the unique contribution of an IV to the model. Unfortunately the raw regression coefficient, though useful in understanding a particular variable's relationship to the DV, is not useful for comparison with other IVs that are on a different scale.
(Coefficients table repeated from above.)
Variable importance: Standardized coefficients
Standardized regression coefficients get around that problem. Now we can see how much the DV will change, in standard deviation units, with a one standard deviation change in an IV (all others held constant). Here we can see that education level seems to have much more influence on the DV: another 3 years of education predicts a salary bump of more than $11,000 (in raw terms, 3 x 4004.58, roughly $12,000).
(Coefficients table repeated from above.)
Variable importance
However, we still have other output to help us understand variable contribution. The partial correlation is the contribution of an IV after the contributions of the other IVs have been taken out of both the IV and the DV. The semipartial correlation is the unique contribution of an IV after the contributions of the other IVs have been taken out of the predictor in question only.
Variable importance: Partial correlation
In the Venn diagram shown below, A+B+C+D represents all the variability in the DV to be explained, and A+B+C = R². The squared partial correlation is the amount a variable explains relative to the amount of DV variance left to explain after the contributions of the other IVs have been removed from both the predictor and the criterion. For IV1 it is A/(A+D); for IV2 it would be B/(B+D).
Variable importance: Semipartial correlation
The semipartial correlation (squared) is perhaps the more useful measure of contribution. It refers to the unique contribution of A to the model, i.e. the relationship between the DV and IV after the contributions of the other IVs have been removed from the predictor only. For IV1 it is A/(A+B+C+D); for IV2, B/(A+B+C+D).
Interpretation (of the squared value): out of all the variance to be accounted for, how much does this variable explain that no other IV does? Equivalently, how much would R² drop if the variable were removed? Summing the squared semipartials, each computed at its point of entry:

$$R^2 = r^2_{y1} + sr^2_2 + \dots + sr^2_p$$
Variable importance
[Figure: Venn diagram of DV variance overlapping with IV1 and IV2; region A is unique to IV1, B is unique to IV2, C is shared by both IVs, and D is the unexplained portion]
Note that exactly how the partial and semipartial correlations are figured will depend on the type of multiple regression employed. The previous examples concerned a standard multiple regression situation. For sequential (i.e. hierarchical) regression, the partial correlations would be: IV1 = (A+C)/(A+C+D); IV2 = B/(B+D).
Variable importance
For the semipartial correlation in the sequential case: IV1 = (A+C)/(A+B+C+D); IV2 is the same as before. The result for the variable added second is the same as it would be in standard MR. Thus, if the goal is to see the unique contribution of a single variable after all others have been controlled for, there is no real reason to perform a sequential rather than a standard MR. In general terms, the semipartial is the unique contribution of the variable at the point it enters the equation (sequential or stepwise).
Variable importance: Example data
The semipartial correlation is labeled as the 'part' correlation in SPSS. Here we can see that education level is really doing all the work in this model. Obviously data from some alternate universe.
Coefficients

                                    B       Std. Error    Beta       t      Sig.   Zero-order  Partial   Part
(Constant)                     -27886.3      5529.479              -5.043   .000
Educational Level (years)       4004.576      210.628     .677     19.013   .000      .661      .659     .654
Months since Hire                 87.951       58.441     .052      1.505   .133      .084      .069     .052
Previous Experience (months)      11.936        5.803     .073      2.057   .040     -.097      .094     .071

a. Dependent Variable: Current Salary
Another example
Mental health symptoms predicted by number of doctor visits, physical health symptoms, and number of stressful life events.
Model Summary

Model     R      R Square    Adjusted R Square    Std. Error of the Estimate
1       .553a      .306            .302                   3.504

a. Predictors: (Constant), Stressful life events, Visits to health professionals, Physical health symptoms
ANOVA

             Sum of Squares    df    Mean Square       F       Sig.
Regression      2498.626         3      832.875      67.820    .000a
Residual        5661.387       461       12.281
Total           8160.013       464

a. Predictors: (Constant), Stressful life events, Visits to health professionals, Physical health symptoms
b. Dependent Variable: Mental health symptoms
Here we see that physical health symptoms and stressful life events both significantly contribute to the model, with physical health symptoms the more 'important' of the two.
Coefficients

                                    B     Std. Error   Beta      t     Sig.   Zero-order  Partial   Part
(Constant)                        .845      .409               2.067   .039
Visits to health professionals   -.001      .017      -.003    -.075   .940      .256     -.003    -.003
Physical health symptoms          .761      .078       .434    9.818   .000      .505      .416     .381
Stressful life events             .007      .001       .238    5.754   .000      .370      .259     .223

a. Dependent Variable: Mental health symptoms
Variable Importance: Comparison
Comparison of standardized coefficients, partial, and semipartial correlation coefficients: all of them are 'partial' correlations in the sense of adjusting for the other predictors. For the two-predictor case:
$$\beta_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}$$

$$\text{Partial:}\qquad pr_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{\sqrt{1 - r_{y2}^2}\,\sqrt{1 - r_{12}^2}}$$

$$\text{Semipartial:}\qquad sr_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{\sqrt{1 - r_{12}^2}}$$
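These are easy to compute directly; a quick R sketch using hypothetical correlation values:

ry1 <- .50; ry2 <- .30; r12 <- .20  # hypothetical bivariate correlations
beta1 <- (ry1 - ry2 * r12) / (1 - r12^2)                          # standardized coefficient
pr1   <- (ry1 - ry2 * r12) / (sqrt(1 - ry2^2) * sqrt(1 - r12^2))  # partial
sr1   <- (ry1 - ry2 * r12) / sqrt(1 - r12^2)                      # semipartial
c(beta = beta1, partial = pr1, semipartial = sr1)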
Another Approach to Variable Importance
The methods just described give us a glimpse of variable importance, but interestingly, none of them is a unique-contribution statistic that is a true decomposition of R², i.e. measures of importance we could add up to equal the overall R². One statistic that does decompose R² is the average R² increase a variable provides, depending on the order in which it enters the model. Three-predictor example of orderings: A B C; B A C; C A B; etc. One way to think about it, using what you've just learned, is as the squared semipartial correlation for a variable whether it enters first, second, third, etc. Note that the average is taken over all possible permutations; e.g. the R² contribution for B entering first includes both B A C and B C A, which would of course be the same value. The following example comes from the survey data.
Model Summary

Model     R      R Square   Adj. R Square   Std. Error   R² Change   F Change   df1   df2   Sig. F Change
1       .793a     .629         .618          1.46685       .629       54.346      1    32       .000
2       .809b     .654         .619          1.46367       .025        1.070      2    30       .356

a. Predictors: (Constant), war on terror
b. Predictors: (Constant), war on terror, mathematical ability, grade for bush
Model Summary

Model     R      R Square   Adj. R Square   Std. Error   R² Change   F Change   df1   df2   Sig. F Change
1       .096a     .009        -.022          2.39850       .009         .295      1    32       .591
2       .805b     .649         .626          1.45116       .639       56.417      1    31       .000
3       .809c     .654         .619          1.46367       .005         .472      1    30       .497

a. Predictors: (Constant), mathematical ability
b. Predictors: (Constant), mathematical ability, war on terror
c. Predictors: (Constant), mathematical ability, war on terror, grade for bush
Model Summary

Model     R      R Square   Adj. R Square   Std. Error   R² Change   F Change   df1   df2   Sig. F Change
1       .742a     .551         .537          1.61427       .551       39.296      1    32       .000
2       .799b     .638         .615          1.47194       .087        7.488      1    31       .010
3       .809c     .654         .619          1.46367       .016        1.351      1    30       .254

a. Predictors: (Constant), grade for bush
b. Predictors: (Constant), grade for bush, war on terror
c. Predictors: (Constant), grade for bush, war on terror, mathematical ability
Model Summary

Model     R      R Square   Adj. R Square   Std. Error   R² Change   F Change   df1   df2   Sig. F Change
1       .746a     .557         .528          1.63025       .557       19.453      2    31       .000
2       .809b     .654         .619          1.46367       .098        8.458      1    30       .007

a. Predictors: (Constant), mathematical ability, grade for bush
b. Predictors: (Constant), mathematical ability, grade for bush, war on terror
As predictor 1: R² change = .629 (there are 2 orderings in which war enters first, both contributing .629)
As predictor 2: R² change = .639 (entering after math) and .087 (entering after bush)
As predictor 3: R² change = .098 (again 2 orderings, both contributing .098)
Interpretation
The average of these is the average contribution to R² for a particular variable over all possible orderings. In this case, for war it is (.629 + .629 + .639 + .087 + .098 + .098)/6, i.e. approximately .36: on average it increases R² by .36, or 36% of the variance accounted for. Furthermore, if we add up the average R² contributions for all three predictors, .36 + .28 + .01 = .65, which is the R² for the full model.
R program example

library(relaimpo)
RegModel.1 <- lm(SOCIAL ~ BUSH + MTHABLTY + WAR, data = Dataset)
calc.relimp(RegModel.1, type = c("lmg", "last", "first", "betasq", "pratt"))
Output: LMG is what we were just talking about; it stands for Lindeman, Merenda and Gold, the authors who introduced it. 'Last' is simply the squared semipartial correlation. 'First' is just the square of the simple bivariate correlation between predictor and DV. 'Betasq' is the square of the beta coefficient with all variables in the model. 'Pratt' is the product of the standardized coefficient and the simple bivariate correlation; it too adds up to the model R², but it is not recommended, one reason being that it can actually be negative.
lmg last first betasq pratt
BUSH 0.278 0.005 0.551 0.024 0.116
MATH 0.012 0.016 0.009 0.016 0.012
WAR 0.363 0.098 0.629 0.439 0.526
*Note: the relaimpo package is also equipped to provide bootstrapped estimates.
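For instance, along these lines, a sketch using the package's boot.relimp() and booteval.relimp() functions (the number of resamples here is illustrative):

bootres <- boot.relimp(RegModel.1, b = 1000, type = c("lmg", "last", "first"))
booteval.relimp(bootres)  # bootstrap confidence intervals for the importance measures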
Different Methods
Note that one's assessment of relative importance may depend on the method. Much of the time the methods will largely agree, but they may not, so use multiple estimates to help you decide. One might typically go with the LMG, as it is both intuitive and a decomposition of R².
(relaimpo output repeated from above.)
Relative Importance Summary
There are multiple ways to estimate a variable's contribution to the model, and some may be better than others. A general approach: check the simple bivariate relationships first. If you don't see worthwhile correlations with the DV there, you shouldn't expect much from your results regarding the model. Check for outliers and compare with robust measures as well. You may detect that some variables are so highly correlated that one is redundant. Statistical significance is not a useful means of assessing relative importance, nor is the raw coefficient. Standardized coefficients and partial correlations are a first step; compare the standardized coefficients to the simple correlations as a check on possible suppression. Of the typical output, the semipartial correlation is probably the more intuitive assessment. The LMG is also intuitive, and unlike the others it is a natural decomposition of R².
Relative Importance Summary
One thing to keep in mind is that determining variable importance, while possible for a single sample, should not be overgeneralized. Variable orderings will likely change upon repeated sampling. E.g. while one might think that war and bush are better predictors than math (it certainly makes theoretical sense), saying that either would be better than the other would be quite a stretch with just one sample. What you see in your sample is specific to it, and it would be wise not to make any bold claims without validation.
Regression Diagnostics
Of course, all of the previous information would be relatively useless if we are not meeting our assumptions and/or have overly influential data points. In fact, you shouldn't really look at the results until you have tested assumptions and looked for outliers, even though this requires running the analysis to begin with. Various tools are available for the detection of outliers.
Classical methods: Standardized Residuals (ZRESID), Studentized Residuals (SRESID), Studentized Deleted Residuals (SDRESID)
Ways to think about outliers: leverage, discrepancy, influence
Thinking 'robustly'
Regression Diagnostics
Standardized Residuals (ZRESID): standardized errors in prediction, with mean 0 and SD equal to the standard error of estimate; to standardize, divide each residual by the s.e.e. At best an initial indicator (e.g. the ±2 rule of thumb), but because the case itself determines what the mean residual would be, it is almost useless on its own.
Studentized Residuals (SRESID): the same idea, but the studentized residual recognizes that the error associated with predicting values far from the mean of X is larger than the error associated with predicting values closer to the mean of X; the standard error is multiplied by a value that takes this into account.
Studentized Deleted Residuals (SDRESID): studentized residuals in which the standard error is calculated with the case in question removed from the others.
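In R, all three flavors are available from a fitted lm object; a sketch (the mapping to the SPSS labels is approximate, and 'fit' is the hypothetical model from earlier):

resid(fit) / summary(fit)$sigma  # residual / s.e.e., like SPSS's ZRESID
rstandard(fit)                   # (internally) studentized residuals, like SRESID
rstudent(fit)                    # studentized deleted residuals, like SDRESID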
Regression Diagnostics
Mahalanobis Distance: the distance of a case from the centroid of the remaining points (the point where the means meet in n-dimensional space).
Cook's Distance: identifies an influential data point, whether in terms of the predictors or the DV. It is a measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients; with larger (relative) values, excluding a case would change the coefficients substantially.
DfBeta: the change in a regression coefficient that results from the exclusion of a particular case. Note that you get a DfBeta for each coefficient associated with the predictors.
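Base R covers these as well; a brief sketch for the hypothetical fitted model 'fit':

X <- model.matrix(fit)[, -1]         # predictor values, intercept column dropped
mahalanobis(X, colMeans(X), cov(X))  # distance of each case from the predictor centroid
cooks.distance(fit)                  # Cook's distance per case
dfbeta(fit)                          # change in each coefficient when a case is dropped
                                     # (dfbetas(fit) gives the standardized versions)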
Regression Diagnostics
Leverage assesses outliers among the IVs (e.g. via Mahalanobis distance; a relatively high value suggests an outlier on one or more variables).
Discrepancy measures the extent to which a case is in line with the others.
Influence is a product of leverage and discrepancy: how much would the coefficients change if the case were deleted? (Cook's distance, DfBetas)
Outliers: Influence plots
With a couple of measures of 'outlierness' we can construct a scatterplot to note especially problematic cases. After fitting a regression model in R Commander, i.e. running the analysis, this graph is available via point and click. Here we have what is effectively a 3-d plot, with two outlier measures on the x and y axes (studentized residuals and 'hat' values, a measure of leverage) and a third shown as the size of each circle (Cook's distance). For this example, case 35 appears to be a problem.
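Outside the menus, the same graph can be produced at the console; a sketch assuming the car package (which R Commander itself uses) and the hypothetical model 'fit':

library(car)
influencePlot(fit)  # studentized residuals vs. hat values; circle size reflects Cook's distance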
Outliers
It should be made clear to interested readers whatever has been done to deal with outliers. Applications such as S-PLUS, R, and even SAS and Stata (pretty much all but SPSS) provide methods of robust regression analysis, and these would be preferred.
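For instance, a minimal robust regression sketch in R using the MASS package (data and names hypothetical):

library(MASS)                              # ships with R
rfit <- lm_robust <- rlm(y ~ x1 + x2 + x3, data = dat)  # M-estimation; outlying cases are downweighted
summary(rfit)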
Summary: Outliers
No matter the analysis, some cases will be the 'most extreme'; however, none may really qualify as being overly influential. Whatever you do, always run some diagnostic analysis, and do not ignore influential cases. It should be made clear to interested readers whatever has been done to deal with outliers. As noted before, the best approach when influential outliers do occur is to run a robust regression with capable software.