Unit 6: The basics of multiple regression
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 1
Unit 6: The basics of multiple regression
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 2
The S-030 roadmap: Where’s this unit in the big picture?
Building a solid foundation
• Unit 1: Introduction to simple linear regression
• Unit 2: Correlation and causality
• Unit 3: Inference for the regression model

Mastering the subtleties
• Unit 4: Regression assumptions: Evaluating their tenability
• Unit 5: Transformations to achieve linearity

Adding additional predictors
• Unit 6: The basics of multiple regression
• Unit 7: Statistical control in depth: Correlation and collinearity

Generalizing to other types of predictors and effects
• Unit 8: Categorical predictors I: Dichotomies
• Unit 9: Categorical predictors II: Polychotomies
• Unit 10: Interaction and quadratic effects

Pulling it all together
• Unit 11: Regression modeling in practice
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 3
In this unit, we’re going to learn about…
• Various representations of the multiple regression model
  – An algebraic representation
  – A three-dimensional graphic representation
  – A two-dimensional graphic representation
• Multiple regression: how it works and helps improve predictions
  – Estimating the parameters of the multiple regression model
  – Holding predictors constant: what does this really mean?
• Plotting the fitted multiple regression model
  – Deciding how to construct the plot
  – Choosing prototypical values
  – Learning how to actually construct the plot (and interpret it correctly!)
• R2 and the Analysis of Variance (ANOVA) in multiple regression
• Inference in multiple regression
  – The omnibus F-test in multiple regression
  – Individual t-tests
• How might we summarize multiple regression results in tables/figures?
• How do we test our regression assumptions?
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 4
US News Peer Ratings of Graduate Schools of Education (GSEs)
RQs: What predicts Peer Ratings?
• This unit (Unit 6): doctoral student characteristics
• Next unit (Unit 7): faculty research productivity
Learn more about the ratings at USNews.com (education school ratings methodology page)
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 5
[Stem-and-leaf display and boxplot of Peer Ratings (stems ×10): ratings run from 280 to 470, centered near 340]
A first look at the data: Peer Ratings, mean GREs and N Doc Grads
Ratings of US Graduate schools of education
ID  School         GRE    DocGrad  PeerRat  USNewsRat
 1  Harvard        6.625     60      450       100
 2  UCLA           5.780     53      410        97
 3  Stanford       6.775     38      470        95
 4  TC             6.045    193      440        92
 5  Vanderbilt     6.605     22      430        88
 6  Northwestern   6.770     10      390        83
 7  Berkeley       6.050     43      440        82
 8  Penn           6.040     61      380        82
 9  Michigan       6.090     38      430        79
10  Madison        5.800    106      430        79
11  NYU            5.960    112      360        77
12  MinneTC        5.750     89      390        73
13  Oregon         6.115     39      340        71
14  MichiganState  5.865     52      420        70
15  Indiana        5.960    110      390        69
16  UTAustin       5.865    102      400        69
17  Washington     5.930     37      370        68
18  Urbana         6.330     50      410        67
19  USC            5.695    119      360        67
20  BC             5.845     42      360        66
...
The UNIVARIATE Procedure
Variable: PeerRat

Mean    344.8276    Std Deviation         45.13190
Median  340.0000    Variance            2036.88853
Mode    300.0000    Range                190.00000
                    Interquartile Range   60.00000
Outcome: Peer Rating
RQs: What doctoral student characteristics predict variation in the peer ratings of GSEs?
Question predictor: Is it quality (GRE scores)?
Control predictor: Is it size (N doc grads)?
n = 87
Plot labels, from highest rating to lowest: Stanford; HGSE; TC, Berkeley; St Johns, Cincinnati, USF
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 6
Examining the predictors: Mean GRE scores and Number of doctoral grads
The UNIVARIATE Procedure
Variable: GRE

Mean    5.578966    Std Deviation        0.42899
Median  5.505000    Variance             0.18403
Mode    5.210000    Range                2.03000
                    Interquartile Range  0.57000
Question Predictor: Mean GRE scores
[Stem-and-leaf display and boxplot of mean GRE scores (stems ×0.1): scores run from 4.7 to 6.8, centered near 5.5]
The UNIVARIATE Procedure
Variable: DocGrad

Mean    45.67816    Std Deviation        33.03293
Median  38.00000    Variance           1091.17428
Mode    18.00000    Range               189.00000
                    Interquartile Range  40.00000
Control Predictor: Number of doctoral graduates
[Stem-and-leaf display and boxplot of number of doctoral graduates (stems ×10): counts run from 4 to 193, with TC (193) flagged as an extreme outlier]
GRE plot labels, high to low: Stanford, Northwestern; HGSE, Vanderbilt; Delaware; Auburn; St John's, Illinois State
DocGrad plot labels, high to low: TC; Georgia; HGSE (60); VA Comm, Cornell; UC Davis, UC Irvine
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 7
Simple linear regression of Peer Ratings on mean GRE scores
The REG Procedure
Dependent Variable: PeerRat
Analysis of Variance
                                Sum of        Mean
Source             DF          Squares      Square    F Value    Pr > F
Model               1            75507       75507      64.40    <.0001
Error              85            99665        1172.53242
Corrected Total    86           175172
Root MSE 34.24226 R-Square 0.4310
Parameter Estimates
                         Parameter     Standard
Variable        DF        Estimate        Error    t Value    Pr > |t|
Intercept        1       -40.51619     48.15953      -0.84      0.4025
GRE              1        69.07083      8.60722       8.02      <.0001

P̂eerRat = -40.52 + 69.07(GRE)
Effect is strong: 43.1% of the variation in ratings is associated with mean GRE scores
Effect is large and statistically significant: schools whose mean GRE scores are 100 points higher have peer ratings that are, on average, 69 points higher (p < 0.0001).
Tentative conclusion: student body quality has an effect (or at least it does, not controlling for size).
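To make the slope interpretation concrete, here is a small Python sketch (the function name is mine) that evaluates the fitted equation from the SAS output above; recall that GRE is coded in hundreds of points, so a 1-unit difference corresponds to 100 points.

```python
# Fitted simple regression from the SAS output above.
# GRE is coded in hundreds of points (e.g., 6.625 means 662.5).

def peer_rating_hat(gre):
    """Fitted peer rating: -40.52 + 69.07 * GRE (full-precision estimates)."""
    return -40.51619 + 69.07083 * gre

# Two schools 100 GRE points apart differ by the slope, ~69 rating points:
gap = peer_rating_hat(6.0) - peer_rating_hat(5.0)
print(round(gap, 2))   # 69.07
```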
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 8
Simple linear regression of Peer Ratings on program size
P̂eerRat = 247.63 + 18.90(L2Doc)
The REG Procedure
Dependent Variable: PeerRat
Analysis of Variance
                                Sum of        Mean
Source             DF          Squares      Square    F Value    Pr > F
Model               1            37702       37702      23.31    <.0001
Error              85           137470        1617.29328
Corrected Total    86           175172
Root MSE 40.21559 R-Square 0.2152
Parameter Estimates
                         Parameter     Standard
Variable        DF        Estimate        Error    t Value    Pr > |t|
Intercept        1       247.62760     20.58800      12.03      <.0001
L2Doc            1        18.90487      3.91546       4.83      <.0001
Effect is moderately strong: 21.5% of the variation in ratings is associated with number of doctoral graduates.
Effect is moderately large and statistically significant: programs that are twice as large have peer ratings that are an average of 18.9 points higher (p < 0.0001).
Plot label: TC
Conclusion: We should control for size when evaluating the effects of GRE scores.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 9
From simple regression to multiple regression: Putting it all together
Y = β0 + β1X1 + β2X2 + … + βkXk + ε

PeerRat = β0 + β1(GRE) + β2(L2Doc) + ε

How does multiple regression help us?
1. Simultaneous consideration of many contributing factors
2. We explain more of the variation in Y
3. More accurate predictions (so the residuals are smaller)
4. A separate understanding of each predictor, controlling for the effects of the other predictors in the model (that is, holding all these other predictors constant)

P̂eerRat = -40.52 + 69.07(GRE)    P̂eerRat = 247.63 + 18.90(L2Doc)
More generally, let X1, X2, … Xk represent k predictors
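In code, the systematic part of the general model is just an intercept plus a weighted sum of the k predictors; a minimal Python sketch (the function name is mine):

```python
def fitted_value(b0, betas, xs):
    """b0 + b1*x1 + ... + bk*xk, for any number of predictors k."""
    return b0 + sum(b * x for b, x in zip(betas, xs))

# With k = 1 this reduces to the simple regression of ratings on GRE:
print(round(fitted_value(-40.51619, [69.07083], [6.625]), 1))   # 417.1
```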
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 10
What does the multiple regression model look like graphically?
Let’s go 3D!
School     GRE   L2Doc  PeerRat
StJohns    4.75  4.39   280
IllState   4.79  5.04   290
SDState    5.01  4.25   320
UNM        5.04  5.49   300
UCIrvine   5.20  3.00   310
Utah       5.22  4.86   320
UIChicago  5.33  4.58   350
Uarizona   5.34  5.88   360
Vcomm      5.39  2.00   300
Georgia    5.49  7.14   380
BU         5.61  4.52   340
Cornell    5.62  2.00   350
GMU        5.62  4.39   320
SUNYAlb    5.65  5.39   330
Madison    5.80  6.73   430
BC         5.85  5.39   360
MichSt     5.87  5.70   420
Uconn      5.89  5.70   340
Boulder    5.91  3.58   350
NYU        5.96  6.81   360
Iowa       6.01  6.02   360
TC         6.05  7.59   440
Harvard    6.62  5.91   450
Stanford   6.78  5.25   470
n = 24 for display purposes
Sorted by L2Doc (low to high):

Small schools
School     GRE   L2Doc  PeerRat
Vcomm      5.39  2.00   300
Cornell    5.62  2.00   350
UCIrvine   5.20  3.00   310
Boulder    5.91  3.58   350
SDState    5.01  4.25   320
StJohns    4.75  4.39   280
GMU        5.62  4.39   320

Medium schools
School     GRE   L2Doc  PeerRat
BU         5.61  4.52   340
UIChicago  5.33  4.58   350
Utah       5.22  4.86   320
IllState   4.79  5.04   290
Stanford   6.78  5.25   470
SUNYAlb    5.65  5.39   330
BC         5.85  5.39   360
UNM        5.04  5.49   300
MichSt     5.87  5.70   420
Uconn      5.89  5.70   340

Large schools
School     GRE   L2Doc  PeerRat
Uarizona   5.34  5.88   360
Harvard    6.62  5.91   450
Iowa       6.01  6.02   360
Madison    5.80  6.73   430
NYU        5.96  6.81   360
Georgia    5.49  7.14   380
TC         6.05  7.59   440

Notice: schools with the same GRE scores but different sizes; schools with the same size but different GRE scores; and schools with the same GRE scores AND size. (The first listing is sorted by GRE, low to high.)
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 11
What does the multiple regression model look like graphically?
Fitted regression plane
Observations OVER-predicted
(purple)
Observations UNDER-predicted (blue)
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 12
Returning to Flatland, Part I: A 3D graph drawn in 2D (using perspective)
[3D perspective plot: PeerRat (225 to 475) on the vertical axis vs. GRE (4.5 to 7) and L2Doc (3 to 7)]
Ratings are higher, on average, in schools with higher GRE scores, and this holds at each level of L2Doc (i.e., holding L2Doc constant).
Ratings are higher, on average, in larger schools, and this holds at each level of GRE (i.e., holding GRE constant).
PeerRat = β0 + β1(GRE) + β2(L2Doc) + ε
Note that this image has a different orientation than the one on the last slide.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 13
Returning to Flatland, Part II: Projecting the 3D graph back into 2D
[Projection of the 3D plot: PeerRat (225 to 475) vs. GRE (4.5 to 7), with fitted lines drawn at L2Doc = 3, 4, 5, 6, and 7]
PeerRat = β0 + β1(GRE) + β2(L2Doc) + ε
Each of these lines describes the effect of GRE at a given value of L2Doc; notice that this effect is the same at all levels of L2Doc.
Notice that these lines are equidistant (or at least they appear to be so in perspective).
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 14
Returning to Flatland, Part II: Projecting the 3D graph back into 2D
[Side view of the 3D plot (PEER vs. GRE), next to a two-dimensional plot of the same prototypical fitted lines]
Looking at our 3D plot from the side, we can see how to move from the fitted plane to a two-dimensional representation of prototypical fitted lines.
Note that this image has a different orientation than the one on the last slide.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 15
Multiple regression assumptions (with more than 1 predictor)
[Diagram: distributions of Y at combinations of X1 and X2]

Y = β0 + β1X1 + β2X2 + … + βkXk + ε
1. At each combination of the X's there is a distribution of Y. These distributions have a mean µ(Y|X1…Xk) and a variance σ²(Y|X1…Xk).
2. The straight line model is correct. The means of each of these distributions, the µ's, may be joined by a plane.
3. Homoscedasticity. The variances of each of these distributions, the σ²'s, are identical.
4. Independence of observations. Conditional on each combination of the X's, the values of Y are independent of each other (we still can't see this visually).
5. Normality. At each combination of the X's, the values of Y are normally distributed.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 16
Multiple regression results: Regressing Peer Ratings on both L2Doc and GRE
The REG Procedure
Dependent Variable: PeerRat
Analysis of Variance

                                Sum of        Mean
Source             DF          Squares      Square    F Value    Pr > F
Model               2            99814       49907      55.63    <.0001
Error              84            75359         897.12759
Corrected Total    86           175172

Root MSE           29.95209    R-Square    0.5698
Dependent Mean    344.82759    Adj R-Sq    0.5596
Coeff Var           8.68611

Parameter Estimates

                         Parameter     Standard
Variable        DF        Estimate        Error    t Value    Pr > |t|
Intercept        1       -87.29494     43.07364      -2.03      0.0459
GRE              1        63.31660      7.60956       8.32      <.0001
L2Doc            1        15.34201      2.94746       5.21      <.0001
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
Interpretation of intercept: the value of Y when all X's = 0. When L2Doc = 0 and GRE = 0, the predicted Peer Rating is -87.29. Here, the intercept is not meaningful.
Interpretation of slope coefficients: the difference in Y per 1-unit difference in X, holding all other X's in the model constant. Holding mean GRE scores constant, schools with twice as many doctoral graduates have peer ratings that are an average of 15.34 points higher. Holding L2Doc constant, schools whose doctoral students have mean GRE scores that are 100 points higher have peer ratings that are 63.32 points higher.
Synonyms: "statistically controlling for," "partialling out," "holding constant"
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 17
Understanding the fitted multiple regression model algebraically & graphically
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
The controlled effect of GRE can be seen in the common slope (63.32) of these lines; the controlled effect of L2Doc can be seen in the common distance (15.34) between these lines.
Algebraically: plug in different values of L2Doc.

L2Doc = 3: P̂eerRat = -87.29 + 63.32(GRE) + 15.34(3) = -41.27 + 63.32(GRE)
L2Doc = 4: P̂eerRat = -87.29 + 63.32(GRE) + 15.34(4) = -25.93 + 63.32(GRE)
L2Doc = 5: P̂eerRat = -87.29 + 63.32(GRE) + 15.34(5) = -10.59 + 63.32(GRE)
L2Doc = 6: P̂eerRat = -87.29 + 63.32(GRE) + 15.34(6) = 4.75 + 63.32(GRE)

Adjacent intercepts differ by the L2Doc coefficient:
-25.93 - (-41.27) = 15.34
-10.59 - (-25.93) = 15.34
4.75 - (-10.59) = 15.34

Graphically: return to the plot from before.
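This substitution can be checked numerically; a short Python sketch using the slide's rounded coefficients:

```python
# Substituting L2Doc = 3, 4, 5, 6 into the fitted model (rounded
# coefficients) yields four partial equations whose intercepts differ
# by exactly the L2Doc coefficient.
b0, b_l2doc = -87.29, 15.34

intercepts = {d: round(b0 + b_l2doc * d, 2) for d in (3, 4, 5, 6)}
print(intercepts)   # {3: -41.27, 4: -25.93, 5: -10.59, 6: 4.75}

gaps = [round(intercepts[d + 1] - intercepts[d], 2) for d in (3, 4, 5)]
print(gaps)         # [15.34, 15.34, 15.34]
```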
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 18
Conceptualizing a 2D graph that will display our findings
Recall how we plotted a fitted simple regression line, e.g., F̂OSTIQ = 9.72 + 0.91(OWNIQ): substitute in any two values of X, compute the fitted points, here (80, 82.35) and (120, 118.67), and connect them.
In multiple regression, we use the same general approach, but because we have more than 1 predictor, we have to make two decisions. So let's discuss how we make these two decisions:
1. Decide which predictor you’d like to display on the X axis—in multiple regression, you have several predictors, but in a 2D graph, you only have 1 X axis
2. For all the other predictors (note: here we have only 1 other predictor, but usually we have more) identify prototypical values you’d like to use for plotting
Having made these 2 decisions, we then:
3. Systematically substitute in the prototypical values for those predictor(s), which yields a set of partial regression equations
4. Plot each partial regression equation as before (substitute in any 2 values for the remaining predictor, get the corresponding value of y-hat, plot the points, and connect them)
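The four steps can be sketched in Python (rounded coefficients from the fitted model above; the helper name is mine):

```python
b0, b_gre, b_l2doc = -87.29, 63.32, 15.34   # fitted MR model (rounded)

def partial_equation(l2doc):
    """Step 3: substitute a prototypical L2Doc value into the fitted model."""
    intercept = b0 + b_l2doc * l2doc
    return lambda gre: intercept + b_gre * gre

# Step 4: any two GRE values give two points; connect them to draw the line.
line_medium = partial_equation(5)   # "medium" schools, L2Doc = 5
points = [(gre, round(line_medium(gre))) for gre in (5.0, 6.5)]
print(points)   # [(5.0, 306), (6.5, 401)]
```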
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 19
Decision 1: Use sketches to select the predictor to display on the X axis
(note: I mean sketches that don’t need to be drawn perfectly to scale)
[Sketch: PeerRat vs. GRE, with separate lines for small, medium, and large L2Doc]
From the multiple regression equation, we know that the fitted lines corresponding to 1-unit differences in L2Doc will be 15.34 rating points apart.
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
[Sketch: PeerRat vs. L2Doc, with separate lines for low, medium, and high GREs]
From the multiple regression equation, we know that the fitted lines corresponding to 1-unit differences in GRE will be 63.32 rating points apart (and the slopes for L2Doc will be shallower on this graph).
Two general principles when deciding:
• It's usually easier to see/talk about a predictor displayed on the X axis (because its effect is seen through the slope).
• Corollary: usually put the question predictor on the X axis. You're typically less interested in control predictors and generally want to focus on question predictors.
We now need to define these prototypical values.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 20
Decision 2: Helpful strategies for selecting prototypical values
The UNIVARIATE Procedure
Variable: L2Doc

Mean    5.141533    Std Deviation    1.10755
Median  5.247928    Variance         1.22666

Quantile      Estimate
100% Max       7.59246
 95%           6.78136
 90%           6.47573
 75% Q3        5.93074
 50% Median    5.24793
 25% Q1        4.39232
 10%           3.70044
  5%           3.32193
  0% Min       2.00000

[Stem-and-leaf display and boxplot of L2Doc]
Examine the distribution of the remaining predictors and consider selecting:
1. Substantively interesting values. This is easiest when the predictor has inherently appealing values (e.g., 8, 12, and 16 years of education in the US)
2. A range of percentiles. When there are no well-known values, consider using a range of percentiles (either the 25th, 50th and 75th or the 10th, 50th, and 90th)
3. The sample mean ± 0.5 (or 1) standard deviation. Best used with predictors with a symmetric distribution
4. The sample mean (on its own). If you don’t want to display a predictor’s effect but just control for it, using only the sample mean will yield a “controlled” fitted regression equation
Remember that exposition is easier if you select whole-number values (if the scale permits) or easily communicated fractions (e.g., ¼, ½, ¾, ⅛)
Mean = 5.14, sd = 1.11; 10th = 3.7, 25th = 4.4, 50th = 5.2, 75th = 5.9, 90th = 6.5
Use L2Doc = 4, 5, and 6:
2^4 = 16 (small); 2^5 = 32 (medium); 2^6 = 64 (large)
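Because L2Doc is the base-2 log of the number of doctoral graduates, the whole-number prototypical values map back to easily described program sizes; a quick Python check:

```python
# Back-transforming prototypical L2Doc values to raw counts of graduates.
sizes = {l2doc: 2 ** l2doc for l2doc in (4, 5, 6)}
print(sizes)              # {4: 16, 5: 32, 6: 64}

# The sample mean of 5.14 corresponds to roughly 35 graduates:
print(round(2 ** 5.14))   # 35
```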
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 21
Substitute in the prototypical values to graph the fitted MR equation
Three prototypical lines representing the relationship between Peer Ratings and GRE scores, holding L2Doc constant (again, notice the identical slopes but different intercepts).
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
Small schools (L2Doc = 4):  P̂eerRat = -87.29 + 63.32(GRE) + 15.34(4) = -25.93 + 63.32(GRE)
Medium schools (L2Doc = 5): P̂eerRat = -87.29 + 63.32(GRE) + 15.34(5) = -10.59 + 63.32(GRE)
Large schools (L2Doc = 6):  P̂eerRat = -87.29 + 63.32(GRE) + 15.34(6) = 4.75 + 63.32(GRE)
[Plot: three parallel fitted lines labeled Small, Medium, and Large]
The vertical distance between the parallel lines spaced 1 unit apart is the slope coefficient for L2Doc: 15.34.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 22
What would happen if we put L2Doc on the X axis?
The UNIVARIATE Procedure
Variable: GRE

Mean    5.578966    Std Deviation    0.428999
Median  5.505000    Variance         0.184035

Quantile      Estimate
100% Max        6.775
 95%            6.540
 90%            6.050
 75% Q3         5.845
 50% Median     5.505
 25% Q1         5.275
 10%            5.040
  5%            4.945
  0% Min        4.745

[Stem-and-leaf display and boxplot of GRE]
Use GRE = 5, 5.5 and 6
5 (low GRE); 5.5 (~median GRE); 6 (high GRE)
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)

Low GRE (5.0):  P̂eerRat = -87.29 + 63.32(5.0) + 15.34(L2Doc) = 229.31 + 15.34(L2Doc)
Med GRE (5.5):  P̂eerRat = -87.29 + 63.32(5.5) + 15.34(L2Doc) = 260.97 + 15.34(L2Doc)
High GRE (6.0): P̂eerRat = -87.29 + 63.32(6.0) + 15.34(L2Doc) = 292.63 + 15.34(L2Doc)
[Plot: three parallel fitted lines labeled Lo GRE, Med GRE, and Hi GRE]
The vertical distance between the parallel lines spaced 1 unit apart (Low to High) is the slope coefficient for GRE: 63.32.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 23
Understanding how the fitted MR equation provides statistical control
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
[Plot: the Small, Medium, and Large prototypical lines; at GRE = 6.0 their fitted values are 354, 369, and 385]
Comparison 1: programs with identical mean GRE scores may have different ratings because of their size.
Comparison 2: programs of equal size may have different ratings because of the quality of their student body.
Comparing fitted values on a line drawn perpendicular to the X axis is holding GRE constant.
Points on the Large line: (5.0, 321), (6.0, 385), (7.0, 448).
Comparing values "on" any fitted line is holding L2Doc constant.
Comparison 3: there is more than one way to earn a specific rating (this is an unusual view of the data because it's reasoning backwards; we don't really ever want to hold Y constant, but nevertheless it's worth thinking about).
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 24
Towards understanding how and why MR can provide a better fit
[Plot legends: Lo GRE / Med GRE / Hi GRE; Small schools / Medium schools / Large schools]
Why multiple regression helps provide a better fit: if the additional predictor(s) improve(s) the quality of the fit, the observed values of Y (the y_i) will be closer to the predicted values of Y (the ŷ_i), the fitted values on the relevant line, i.e., the line corresponding to that specific combination of predictor values.
[Plot labels: Lo GRE, Med GRE, Hi GRE; Small, Medium, Large]
Why are these lines parallel? They're parallel because we assume that they are. This is known as a main effects assumption: we're assuming the effect of each predictor is the same regardless of the levels of the other predictor. Might this not be a correct assumption?
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)

P̂eerRat = -40.52 + 69.07(GRE)    P̂eerRat = 247.63 + 18.90(L2Doc)

Note: in actuality, there are fitted lines for every value of L2Doc, not just for large, medium, and small schools.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 25
Might these lines NOT be parallel?: Let’s imagine what else they might be
[Plot legends: Lo GRE / Med GRE / Hi GRE; Small schools / Medium schools / Large schools]
Right now, let's assume that the main effects assumption is correct.
What does it mean if the lines aren't parallel?
• It means that the effect of one predictor (say, the effect of L2Doc) differs by levels of the other predictor (here, GRE).
• This is called a statistical interaction, and in Unit 10 we'll learn how to test for it and modify the model if necessary.
Hmmm… the larger the school, the larger the effect of GRE? Hmmm… the better the student body, the larger the effect of program size?
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 26
From simple to multiple regression: R2 & the Analysis of Variance (ANOVA)
Reprise of R2 in simple linear regression, and R2 in multiple linear regression:

Total deviation:      (y_i - ȳ)
Regression deviation: (ŷ_i - ȳ)
Error deviation:      (y_i - ŷ_i)

R2 = SS Regression / SS Total

Analysis of Variance

Source               Sum of Squares        df                      Mean Square
Model (Regression)   SSR = Σ(ŷ_i - ȳ)²    # predictors            MSR = SSR / df_SSR
Error (Residual)     SSE = Σ(y_i - ŷ_i)²  (n - 1) - # predictors  MSE = SSE / df_SSE = V̂ar(Y|X) (residuals)
Total                SST = Σ(y_i - ȳ)²    n - 1                   MST = SST / df_SST = V̂ar(Y)
Note that this table and the formula for R2 apply in both simple and multiple regression—it’s only the fitted values of Y that change!
The residual is now the vertical distance between the observation and the fitted regression plane.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 27
Comparing fitted values and residuals from simple and multiple regression models
ID  School        GRE    L2Doc  PeerRat  yhatgre  yhatdoc  yhatmr  residgre  residdoc  residmr
 1  Harvard       6.625  5.91   450      417.1    359.3    422.8     32.9      90.7     27.2
 2  UCLA          5.780  5.73   410      358.7    355.9    366.6     51.3      54.1     43.4
 3  Stanford      6.775  5.25   470      427.4    346.8    422.2     42.6     123.2     47.8
 4  TC            6.045  7.59   440      377.0    391.2    411.9     63.0      48.8     28.1
 5  Vanderbilt    6.605  4.46   430      415.7    331.9    399.3     14.3      98.1     30.7
 6  Northwestern  6.770  3.32   390      427.1    310.4    392.3    -37.1      79.6     -2.3
 7  Berkeley      6.050  5.43   440      377.4    350.2    379.0     62.6      89.8     61.0
 8  Penn          6.040  5.93   380      376.7    359.7    386.1      3.3      20.3     -6.1
 9  Michigan      6.090  5.25   430      380.1    346.8    378.8     49.9      83.2     51.2
10  Madison       5.800  6.73   430      360.1    374.8    383.2     69.9      55.2     46.8
...
81  Colorado      5.210  3.91   300      319.3    321.5    302.5    -19.3     -21.5     -2.5
82  UWMilw        5.030  4.25   330      306.9    327.9    296.4     23.1       2.1     33.6
83  Hofstra       5.910  3.70   290      367.7    317.6    343.7    -77.7     -27.6    -53.7
84  IllState      4.785  5.04   290      290.0    343.0    293.1      0.0     -53.0     -3.1
85  IndianaSt     4.955  4.17   290      301.7    326.5    290.4    -11.7     -36.5     -0.4
86  StJohns       4.745  4.39   280      287.2    330.7    280.5     -7.2     -50.7     -0.5
87  UVM           5.340  4.17   310      328.3    326.5    314.8    -18.3     -16.5     -4.8
P̂eerRat = -40.52 + 69.07(GRE)
P̂eerRat = 247.63 + 18.90(L2Doc)
P̂eerRat = -87.29 + 63.32(GRE) + 15.34(L2Doc)
Sum of Squared Errors: 99,665 (GRE only); 137,470 (L2Doc only); 75,359 (both)
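One row of the table (Harvard) can be recomputed in Python from the three fitted equations; the coefficients come from the SAS output on earlier slides, and Harvard's 60 doctoral graduates give L2Doc = log2(60):

```python
import math

gre, peerrat = 6.625, 450   # Harvard
l2doc = math.log2(60)       # 60 doctoral graduates -> about 5.91

yhatgre = -40.51619 + 69.07083 * gre
yhatdoc = 247.62760 + 18.90487 * l2doc
yhatmr  = -87.29494 + 63.31660 * gre + 15.34201 * l2doc

print(round(yhatgre, 1), round(yhatdoc, 1), round(yhatmr, 1))
# 417.1 359.3 422.8
print(round(peerrat - yhatmr, 1))   # 27.2, the smallest residual of the three
```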
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 28
Interpreting R2 and the Analysis of Variance in multiple regression
Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square
Model               2        99814       49907
Error              84        75359         897.12759
Corrected Total    86       175172

Root MSE    29.95209    R-Square    0.5698
Analysis of Variance

Source               Sum of Squares    df    Mean Square
Model (Regression)     99,814           2     49,907.00
Error (Residual)       75,359          84        897.13
Total                 175,172          86      2,036.89
R2(Y|GRE) = 43.1%    R2(Y|L2Doc) = 21.5%

R2(Y|X1, X2, …, Xk) = r²(Y, Ŷ), the squared correlation between Y and Ŷ.
Here r(Y, Ŷ) = .755, and .755² = .570: 57.0% of the variation in Peer Ratings is associated with L2Doc and GRE.

R2(Y|X1, X2) ≠ R2(Y|X1) + R2(Y|X2) unless r(X1, X2) = 0 (more on this in Unit 7).

Root MST = √2,036.89 = 45.13, the estimated SD(Y).
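The quantities on this slide can be recomputed from the ANOVA sums of squares; a Python check:

```python
import math

ss_model, ss_total = 99814, 175172   # from the ANOVA table above

r_squared = ss_model / ss_total
print(round(r_squared, 4))                  # 0.5698

# R-square is also the squared correlation between y and y-hat:
print(round(math.sqrt(r_squared), 3))       # 0.755

# Root MST estimates SD(Y):
print(round(math.sqrt(ss_total / 86), 2))   # 45.13
```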
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 29
Statistical inference: Two distinct types of hypotheses we can test
Across all my predictors, is there anything going on, or would I do just as well without them?
Controlling for all other predictors in the model, does each individual predictor, Xj, have an effect?
Overall/Omnibus F-test:
H0: β1 = β2 = … = βk = 0 (regression doesn't help at all)
H1: some βj ≠ 0 (at least 1 predictor's effect is non-zero)

Individual t-tests:
H0: βj = 0 (this predictor has no controlled effect)
H1: βj ≠ 0 (this predictor has a controlled effect)
With only 1 predictor (that is, in simple linear regression), these two tests are identical.
In multiple regression, these two types of tests are decidedly different!
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 30
Towards a heuristic understanding of the omnibus F-test: comparing the regression decomposition when H0 is not true and when it is true
Regression decomposition if H0 is not true:
Total deviation (y_i - ȳ) = Regression deviation (ŷ_i - ȳ) + Error deviation (y_i - ŷ_i)
• Regression deviations are large; error deviations are small.
• SS Regression is large; SS Error is small.
• MSR is large; MSE is small.
• So, if MSR/MSE is large, reject H0.

Regression decomposition if H0 is true:
• Regression deviations are small; error deviations are large.
• SS Regression is small; SS Error is large.
• MSR is small; MSE is large.
• So, if MSR/MSE is small, fail to reject H0.

Omnibus F-test: F_obs = MSR / MSE
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 31
Conducting omnibus hypothesis tests in multiple regression
                                Sum of        Mean
Source             DF          Squares      Square    F Value    Pr > F
Model               2            99814       49907      55.63    <.0001
Error              84            75359         897.12759
Corrected Total    86           175172
F_obs = MSR / MSE = 49,907 / 897.13 = 55.63
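The same arithmetic in Python:

```python
ss_model, df_model = 99814, 2    # from the ANOVA table above
ss_error, df_error = 75359, 84

msr = ss_model / df_model        # 49907.0
mse = ss_error / df_error        # ~897.13
f_obs = msr / mse
print(round(f_obs, 2))           # 55.63
```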
Q: Is 55.63 "large enough" to reject H0?
Because F(2, 84) = 55.63 (p < 0.0001), we reject H0 that all β's = 0 and conclude that at least one βj ≠ 0.
Sound statistical practice: when reporting F-tests, be sure to provide not just the p-value but also both the numerator and denominator degrees of freedom.

Critical values of F_observed (α = .05)

df for              df for denominator (MSE)
numerator (MSR)       25      50     100     inf
    1               4.24    4.03    3.94    3.84
    2               3.39    3.18    3.09    3.00
    3               2.99    2.79    2.70    2.60
    4               2.76    2.56    2.46    2.37
    5               2.60    2.40    2.31    2.21
   10               2.24    2.03    1.93    1.83
   20               2.01    1.78    1.68    1.57
  120               1.77    1.64    1.38    1.22
 1000               1.72    1.56    1.30    1.00

When the numerator df = 1, F = t² (1.96² = 3.84); this relationship makes sense so that the omnibus F-test and the single-parameter t-test give identical results.
Omnibus F-test: across all my predictors, is there anything going on, or would I do just as well without them?
H0: β1 = β2 = … = βk = 0
H1: some βj ≠ 0
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 32
Conducting individual t-tests in multiple regression
                         Parameter     Standard
Variable        DF        Estimate        Error    t Value    Pr > |t|
Intercept        1       -87.29494     43.07364      -2.03      0.0459
GRE              1        63.31660      7.60956       8.32      <.0001
L2Doc            1        15.34201      2.94746       5.21      <.0001
Statistically controlling for L2Doc, there is an effect of GRE.
Statistically controlling for GRE, there is an effect of L2Doc.
Individual t-tests: controlling for all other predictors in the model, does each individual predictor, Xj, have an effect?
H0: βj = 0
H1: βj ≠ 0
Individual t-tests in multiple regression are analogous to those in single-variable regression; the key difference comes in our interpretation of the results.
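Each t statistic is simply the estimate divided by its standard error; recomputing from the Parameter Estimates table in Python:

```python
# Parameter estimates and standard errors from the SAS output above.
estimates = {
    "Intercept": (-87.29494, 43.07364),
    "GRE":       (63.31660, 7.60956),
    "L2Doc":     (15.34201, 2.94746),
}

t_values = {name: round(b / se, 2) for name, (b, se) in estimates.items()}
print(t_values)   # {'Intercept': -2.03, 'GRE': 8.32, 'L2Doc': 5.21}
```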
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 33
How might we summarize the results of these analyses?
Comparison of regression models predicting peer ratings of US Graduate Schools of Education (n=87) (US News and World Report, 2005)
Predictor                  Model A          Model B          Model C
Intercept                  -40.52           247.63           -87.29
                           (48.16)          (20.59)          (43.07)
                           -0.84            12.03***         -2.03*
Mean GRE scores            69.07                             63.32
                           (8.61)                            (7.61)
                           8.02***                           8.32***
Log2(N doctoral grads)                      18.90            15.34
                                            (3.92)           (2.95)
                                            4.83***          5.21***
R2 (%)                     43.1             21.5             57.0
F (df)                     64.40 (1, 85)    23.31 (1, 85)    55.63 (2, 84)
p                          <0.0001          <0.0001          <0.0001

Cell entries are estimated regression coefficients, (standard errors), and t-statistics.
* p<0.05, ** p<0.01, *** p<0.001
[Plot: prototypical lines labeled Small, Medium, and Large]
Definitions of program size: Small = 16 doctoral grads (2^4); Medium = 32 doctoral grads (2^5); Large = 64 doctoral grads (2^6)
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 34
Examining residuals to evaluate assumptions, as in simple regression
[Stem-and-leaf display and boxplot of studentized residuals (stems ×0.1): roughly symmetric around 0, with one low outlier near -2.7]
Plot labels: Berkeley; UC Davis, Ohio State; Delaware; Hofstra; USF, UHouston; UC Berkeley; Delaware (very high GREs)
Plot residuals (either raw or studentized) vs. each X.
Under-predicted: ratings are higher than we expected.
Over-predicted: ratings are lower than we expected.
(Vertical axis: studentized residuals.)
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 35
We should also plot residuals vs. fitted values
UC Berkeley: y_obs = 440, ŷ = 379
Delaware (very high GREs): y_obs = 310, ŷ = 390
• Possible nonlinearity?
• Might this improve when we add other predictors in Unit 7?
• Might this improve if we allow the effect of GRE to interact with L2Doc (in Unit 10)?
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 36
What’s the big takeaway from this unit?
• Multiple regression serves several purposes
  – We can more accurately explain the variation in the outcome Y by considering several predictors simultaneously
  – The basic principles of model fitting, data analysis, and inference remain essentially the same
• Inference in multiple regression focuses both on the overall model and on the role of individual predictors (controlling for other predictors in the model)
  – Omnibus F-tests tell us about the model as a whole
  – Individual t-tests provide information about an individual predictor when controlling for the other predictor(s) in the model
• We have to make wise decisions about how best to present findings
  – Multidimensional graphs would be ideal, but we usually find ourselves displaying our findings in just two dimensions
  – Different plots emphasize different messages; you need to learn how to think about what the prototypical plots will look like and make educated decisions about which plots to display
  – Tables can be helpful in presenting results from several models that include different combinations of predictors
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 37
Appendix: Annotated PC-SAS Code for fitting multiple regression models
Note that the handouts include only annotations for the needed additional code. For the complete program, check program “Unit 6—EdSchools analysis” on the website.
*-----------------------------------------------------------------*
 Fitting multiple regression model PEERRAT on L2DOC and GRE
*-----------------------------------------------------------------*;
proc reg data=one;
  model PeerRat=L2Doc GRE;
  output out=resdat1 r=residual student=student predicted=yhat;
run;
*-----------------------------------------------------------------*
 Univariate summary information on studentized residuals
 from multiple regression model PEERRAT on L2DOC GRE
*-----------------------------------------------------------------*;
proc univariate data=resdat1 plots;
  var student;
  id school;
run;
*-----------------------------------------------------------------*
 Plotting studentized residuals vs. each predictor and YHat
*-----------------------------------------------------------------*;
proc gplot data=resdat1;
  plot student*(L2Doc GRE yhat);
  symbol value='dot';
run;
*-----------------------------------------------------------------*
 Computing fitted values and residuals from the 3 models
*-----------------------------------------------------------------*;
data one;
  set one;
  yhatgre = -40.51619 + 69.07083*GRE;
  yhatdoc = 247.62760 + 18.90487*L2Doc;
  yhatmr  = -87.29494 + 63.31660*GRE + 15.34201*L2Doc;
  residgre = peerrat - yhatgre;
  residdoc = peerrat - yhatdoc;
  residmr  = peerrat - yhatmr;
run;
Proc reg allows you to fit multiple regression models by adding additional predictors to your model statement (following the equals "=" sign). The syntax for the output statement is similar, except that now you also need to ask for the predicted values (the fitted values of Y), to use in residual plots to explore assumption violations.
proc univariate can be used as usual (with the plots option) to analyze the new dataset RESDAT1 and to provide summary statistics for the residuals.
To examine the residual assumptions for the multiple regression model, use proc gplot to produce plots of the residuals vs. each predictor and vs. the predicted values of Y.
You can obtain these fitted values and residuals from the separate PROC REGs, but it's MUCH easier to just write code in a data step, which is what I did.
© Judith D. Singer, Harvard Graduate School of Education Unit 6/Slide 38
Glossary terms included in Unit 6
• Statistical control
• Interactions
• Main effects
• Omnibus F-test