
Multiple Regression Analysis

y = β0 + β1x1 + β2x2 + … + βkxk + u

1. Estimation


Introduction

- The main drawback of simple regression is that, with only one RHS variable, it is unlikely that u is uncorrelated with x.

- Multiple regression allows us to control for those “other” factors.

- The more variables we include, the more of the variation in y we can explain (better predictions).

- e.g. Earn = β0 + β1Educ + β2Exper + u allows us to measure the effect of education on earnings holding experience fixed.


Parallels with Simple Regression

- β0 is still the intercept.
- β1 through βk are all called slope parameters.
- u is still the error term (or disturbance).
- We still need a zero conditional mean assumption, which now becomes E(u | x1, x2, …, xk) = 0: other factors affecting y are unrelated, on average, to x1, x2, …, xk.
- We still estimate by minimizing the sum of squared residuals.


Estimating Multiple Regression

- The fitted equation can be written:

  ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk

- We choose the β̂'s to minimize the sum of squared residuals:

  Σ (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik)²

- This gives k+1 first order conditions:

  Σ (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
  Σ xi1 (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
  Σ xi2 (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik) = 0
  etc.
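As an illustration, here is a minimal numpy sketch with simulated data and made-up coefficients; solving the k+1 first order conditions amounts to solving the normal equations X'X β̂ = X'y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data: y = 1 + 2*x1 - 0.5*x2 + u  (made-up parameter values)
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)      # x1 and x2 are correlated
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Solving the k+1 first order conditions is equivalent to solving X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS estimates (b0, b1, b2):", beta_hat)

# The first order conditions say the residuals are orthogonal to each column of X
residuals = y - X @ beta_hat
print("X' times residuals (all ~0):", X.T @ residuals)
```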


Interpreting the Coefficients

- From the fitted equation ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk we can obtain the predicted change in y given changes in the x variables (the intercept drops out because it is a constant, so its change is zero):

  Δŷ = β̂1Δx1 + β̂2Δx2 + … + β̂kΔxk

- If we hold x2, …, xk fixed, then

  Δŷ = β̂1Δx1,  or  Δŷ/Δx1 = β̂1


Interpreting the Coefficients

- Note that this is the same interpretation we gave in the univariate regression model: it tells us how much a 1-unit change in x changes y.
- In this more general case, β̂1 tells us the effect of x1 on y, holding the other x’s constant. We can call this a ceteris paribus effect.
- Likewise, β̂2 tells us the ceteris paribus effect of a 1-unit change in x2, and so forth.


iClickers

Question: What day of the week is today?
A) Monday  B) Tuesday  C) Wednesday  D) Thursday  E) Friday

Question: Is next week Spring Break?
A) Yes  B) No


iClickers

Press any letter on your iClicker for showing up on a rainy Friday after Midterm 1 and before Spring Break.


Ceteris Paribus interpretation

- Imagine we rerun our education/earnings regression, but include lots of controls on the RHS (x2, x3, …, xk).
  - If we control for everything that affects earnings and is correlated with education, then we can truly claim that, holding everything else constant, β̂1 gives an estimate of the effect of an extra year of education on earnings.
  - Because if everything else is controlled for, the zero conditional mean assumption holds.


Ceteris Paribus interpretation

- This is a very powerful feature of multiple regression.
  - Suppose only two things determine earnings: education and experience.
  - In a laboratory setting, you would want to pick a group of people with the same experience (perhaps average experience), vary education across them, and look for the resulting differences in earnings.
  - This is obviously impractical/unethical.


Ceteris Paribus interpretation

- If you couldn’t conduct an experiment, you could go out and select only people with average experience and then ask them how much education they have and how much they earn.
  - Then run a univariate regression of earnings on education.
  - But it is inconvenient to have to pick your sample so carefully.
- Multiple regression allows you to control (almost as if you’re in a lab; more on that qualifier later in the course) for differences in individuals along dimensions other than their level of education.
  - Even though we haven’t picked a sample holding experience constant, we can interpret the coefficient on education as if we had held experience constant in our sampling.


A “Partialling Out” Interpretation

- Suppose that x2 affects y and that x1 and x2 are correlated. This means the zero conditional mean assumption is violated if we simply estimate a univariate regression of y on x1. (Convince yourself of this.)
  - The usual way to deal with this is to “control for x2” by including x2 on the right-hand side (multiple regression). In practice, this is how we’ll usually deal with the problem.
  - In principle, there is another way we could do this. If we could come up with a measure of x1 that is purged of its relationship with x2, then we could estimate a univariate regression of y on this “purged” version of x1. x2 would still be contained in the error term, but it would no longer be correlated with x1, so the zero conditional mean assumption would not be violated.


A “Partialling Out” Interpretation

- Consider the case where k = 2, so the multiple regression we estimate is

  yi = β̂0 + β̂1xi1 + β̂2xi2 + ûi

- The estimate of the coefficient β1 can be written

  β̂1 = Σ r̂i1 yi / Σ r̂i1²

  where the r̂i1 are the residuals from the estimated regression of x1 on x2:

  xi1 = γ̂0 + γ̂2xi2 + r̂i1,  with fitted values x̂i1 = γ̂0 + γ̂2xi2 and residuals r̂i1 = xi1 − x̂i1

- Another way to obtain this same estimate is to run the univariate regression of y on r̂i1:

  yi = α0 + β1r̂i1 + errori


“Partialling Out” continued

- This implies that we estimate the same effect of x1 either by
  1. regressing y on x1 and x2 (multiple regression), or by
  2. regressing y on the residuals from a regression of x1 on x2 (univariate regression).
- These residuals are what is left of the variation in x1 after x2 has been partialled out: they contain the variation in x1 that is independent of x2.
- Therefore, only the part of x1 that is uncorrelated with x2 is being related to y.
- By regressing y on the residuals of the regression of x1 on x2, we are estimating the effect of x1 on y after the effect of x2 on x1 has been “netted out” or “partialled out.”
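A minimal numpy sketch of this equivalence, using simulated data and made-up coefficients: the coefficient on x1 from the multiple regression matches the slope from a univariate regression of y on the residuals of x1 on x2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Simulated data with correlated regressors (made-up parameter values)
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# (1) Multiple regression of y on x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
b_multiple = np.linalg.solve(X.T @ X, X.T @ y)

# (2) Partial out x2 from x1, then regress y on the residuals
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1 = x1 - Z @ gamma                    # variation in x1 that is independent of x2
b1_partial = (r1 @ y) / (r1 @ r1)      # slope from univariate regression of y on r1

print("beta1 from multiple regression:", b_multiple[1])
print("beta1 from partialling out    :", b1_partial)   # identical up to rounding
```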


Simple vs Multiple Reg Estimate

- Compare the simple regression ỹ = β̃0 + β̃1x1 with the multiple regression ŷ = β̂0 + β̂1x1 + β̂2x2.
- Generally, β̃1 ≠ β̂1 unless:
  1. β̂2 = 0 (i.e. no partial effect of x2 on y): the first order conditions will be the same in either case if this is true, or
  2. x1 and x2 are uncorrelated in the sample: if this is the case, then regressing x1 on x2 results in no partialling out (the residuals give the complete variation in x1).
- In the general case (with k regressors), we need x1 uncorrelated with all the other x’s.
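A small simulated illustration (made-up coefficients) of when the two estimates agree and when they differ:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

def slope_simple(x, y):
    # Slope from a simple regression of y on x (with intercept)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def beta1_multiple(x1, x2, y):
    # Coefficient on x1 from the multiple regression of y on x1 and x2 (with intercept)
    X = np.column_stack([np.ones(len(y)), x1, x2])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

# Case 1: x1 and x2 correlated and beta2 != 0 -> the two estimates differ
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
print("correlated x's  :", slope_simple(x1, y), "vs", beta1_multiple(x1, x2, y))

# Case 2: x1 and x2 (nearly) uncorrelated -> the estimates (nearly) agree
x2u = rng.normal(size=n)
yu = 1 + 2 * x1 + 3 * x2u + rng.normal(size=n)
print("uncorrelated x's:", slope_simple(x1, yu), "vs", beta1_multiple(x1, x2u, yu))
```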


Goodness-of-Fit

- As in the univariate case, we can think of each observation as being made up of an explained part and an unexplained part:

  yi = ŷi + ûi

- We can then define the following:

  SST = Σ (yi − ȳ)²   (total sum of squares)
  SSE = Σ (ŷi − ȳ)²   (explained sum of squares)
  SSR = Σ ûi²          (residual sum of squares)

- Then SST = SSE + SSR.


Goodness-of-Fit (continued)

- We can use this to think about how well our sample regression line fits our sample data.
- The fraction of the total sum of squares (SST) that is explained by the model is called the R-squared of the regression:

  R² = SSE/SST = 1 − SSR/SST

- We can also think of R-squared as the squared correlation coefficient between the actual and fitted values of y:

  R² = [ Σ (yi − ȳ)(ŷi − ȳ) ]² / [ Σ (yi − ȳ)² · Σ (ŷi − ȳ)² ]
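A quick numpy check of these identities on simulated data (made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Simulated data (made-up parameter values)
x1, x2 = rng.normal(size=(2, n))
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

print("SST vs SSE + SSR    :", SST, SSE + SSR)
print("R2 = SSE/SST        :", SSE / SST)
print("R2 = 1 - SSR/SST    :", 1 - SSR / SST)
print("R2 = corr(y, yhat)^2:", np.corrcoef(y, y_hat)[0, 1] ** 2)
```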


More about R-squared

- R² can never decrease when another independent variable is added to a regression, and it usually increases.
- When you add another variable to the RHS, SSE will either stay the same (if that variable explains none of the variation in y) or increase (if it explains some of the variation in y).
- SST will not change.
- So while it is tempting to interpret an increase in the R-squared after adding a variable to the RHS as evidence that the new variable is a useful addition to the model, one shouldn’t read too much into it.
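A small simulated sketch (made-up data) of the first point: adding a regressor that is pure noise still cannot lower R².

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)              # pure noise, unrelated to y

def r_squared(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])

print("R2 without the junk regressor:", r_squared(X_small, y))
print("R2 with the junk regressor   :", r_squared(X_big, y))   # never smaller
```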


Assumptions for Unbiasedness

- The assumptions needed to get unbiased estimates in simple regression can be restated:

1. The population model is linear in parameters: y = β0 + β1x1 + β2x2 + … + βkxk + u

2. We have a random sample of size n from the population model, {(xi1, xi2, …, xik, yi): i = 1, 2, …, n}, i.e. no sample selection bias. The sample model is then

   yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui


Assumptions for Unbiasedness

3. None of the x’s is constant, and there are no exact linear relationships among the x’s.
   - We say “perfect collinearity” exists when one of the x’s is an exact linear combination of the other x’s.
   - It’s OK for the x’s to be correlated with each other, but perfect correlation makes estimation impossible; high correlation can be problematic.

4. E(u | x1, x2, …, xk) = 0: all of the explanatory variables are “exogenous.”
   - We call an x variable “endogenous” if it is correlated with the error term.
   - We need to respecify the model if at least one x variable is endogenous.


Examples of Perfect Collinearity

1. x1 = 2·x2: one of these is redundant.
   - x2 perfectly predicts x1, so we should drop either x1 or x2.
   - If an x is constant (e.g. x = c), then it is a perfect linear function of the intercept term: c = a·β0 for some constant a.

2. The linear combinations can be more complicated.
   - Suppose you had data on people’s age and their education, but not on their experience. If you wanted to measure experience, you could assume they started work right out of school and calculate experience as Exper = Age − (Educ + 5), where the 5 accounts for the 5 years before we start school as kids.


Examples of Perfect Collinearity

   - But then these three variables (Age, Educ, and Exper) would be perfectly collinear (each is a linear combination of the others).
   - In this case, only two of those variables could be included on the RHS.

3. Including both income and income² does not violate this assumption (income² is not an exact linear function of income).
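A quick numpy sketch (made-up data) of why perfect collinearity breaks estimation: with Exper constructed exactly as Age − Educ − 5 and included alongside Age and Educ, the design matrix is rank deficient and the individual coefficients are not identified.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

age = rng.uniform(25, 60, size=n)
educ = rng.uniform(10, 20, size=n)
exper = age - educ - 5                  # exact linear combination of age and educ
y = 5 + 0.8 * educ + 0.2 * exper + rng.normal(size=n)

X = np.column_stack([np.ones(n), age, educ, exper])
print("columns of X:", X.shape[1])
print("rank of X   :", np.linalg.matrix_rank(X))            # one less than the column count
print("condition number of X'X:", np.linalg.cond(X.T @ X))  # astronomically large

# lstsq still returns the minimum-norm solution, but the coefficients on age,
# educ and exper are not separately identified (infinitely many solutions exist).
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("effective rank used by lstsq:", rank)
```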


Unbiasedness

- If assumptions 1-4 hold, it can be shown that

  E(β̂j) = βj,  j = 0, 1, …, k

  i.e. the estimated parameters are unbiased.

- What happens if we include variables in our specification that don’t belong (over-specify)?
  - This is called “overspecification” or “inclusion of irrelevant regressors.”
  - It is not a problem for unbiasedness.
- What if we exclude a variable from our specification that does belong?
  - This violates assumption 4. In general, OLS will be biased; this is called the “omitted variables problem.”


Omitted Variable Bias

- This is one of the greatest problems in regression analysis.
  - If you omit a variable that affects y and is correlated with x, then the coefficient you estimate on x will partly pick up the effect of the omitted variable, and hence will misrepresent the true effect of x on y.
  - This is why we need the zero conditional mean assumption to hold; and to make sure it holds, we need to include all relevant x variables.


iClickers

- Question: Suppose the true model of y is given by y = β0 + β1x1 + β2x2 + u, but you don’t have data on x2, so you estimate ỹ = β̃0 + β̃1x1 + v instead. Assume that x1 and x2 are correlated, but not perfectly. This is a problem because:

  A) x1 and x2 are perfectly collinear.
  B) u and v are different from each other.
  C) v contains x2.
  D) None of the above.


Omitted Variable Bias

- Suppose the true model is given by

  y = β0 + β1x1 + β2x2 + u

- But we estimate ỹ = β̃0 + β̃1x1. Then

  β̃1 = Σ (xi1 − x̄1) yi / Σ (xi1 − x̄1)²

- Recall the true model, so that yi = β0 + β1xi1 + β2xi2 + ui.


Omitted Variable Bias (cont)

- Substituting the true model into the numerator of β̃1:

  Σ (xi1 − x̄1) yi = Σ (xi1 − x̄1)(β0 + β1xi1 + β2xi2 + ui)
                  = β0 Σ (xi1 − x̄1) + β1 Σ (xi1 − x̄1) xi1 + β2 Σ (xi1 − x̄1) xi2 + Σ (xi1 − x̄1) ui

- Recall that Σ (xi1 − x̄1) = 0 and Σ (xi1 − x̄1) xi1 = Σ (xi1 − x̄1)², so the numerator becomes

  β1 Σ (xi1 − x̄1)² + β2 Σ (xi1 − x̄1) xi2 + Σ (xi1 − x̄1) ui


Omitted Variable Bias (cont)

- Returning to our formula for β̃1:

  β̃1 = Σ (xi1 − x̄1) yi / Σ (xi1 − x̄1)²

      = [ β1 Σ (xi1 − x̄1)² + β2 Σ (xi1 − x̄1) xi2 + Σ (xi1 − x̄1) ui ] / Σ (xi1 − x̄1)²

      = β1 + β2 · Σ (xi1 − x̄1) xi2 / Σ (xi1 − x̄1)² + Σ (xi1 − x̄1) ui / Σ (xi1 − x̄1)²


Omitted Variable Bias (cont)

- So,

  β̃1 = β1 + β2 · Σ (xi1 − x̄1) xi2 / Σ (xi1 − x̄1)² + Σ (xi1 − x̄1) ui / Σ (xi1 − x̄1)²

- Taking expectations (conditional on the x’s) and recalling that E(ui) = 0, we get

  E(β̃1) = β1 + β2 · Σ (xi1 − x̄1) xi2 / Σ (xi1 − x̄1)²

- In other words, our estimate is (generally) biased. :(
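A small Monte Carlo sketch of this result, with made-up parameter values: across many samples, the average of the short-regression estimate β̃1 is close to β1 + β2·δ1, where δ1 is the slope from regressing x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(6)
beta1, beta2 = 2.0, 3.0            # made-up true parameters
n, reps = 200, 2000

tilde_b1 = np.empty(reps)
delta1 = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)       # x1 and x2 are correlated
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    dx1 = x1 - x1.mean()
    tilde_b1[r] = (dx1 @ y) / (dx1 @ dx1)    # short regression: y on x1 only
    delta1[r] = (dx1 @ x2) / (dx1 @ dx1)     # slope from regressing x2 on x1

print("average of tilde beta1 :", tilde_b1.mean())
print("beta1 + beta2 * delta1 :", beta1 + beta2 * delta1.mean())
print("true beta1             :", beta1)    # the short regression is biased upward here
```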


Omitted Variable Bias (cont)

- How might we interpret the bias? Consider the regression of x2 on x1:

  x̃2 = δ̃0 + δ̃1x1,  where  δ̃1 = Σ (xi1 − x̄1) xi2 / Σ (xi1 − x̄1)²

  so the bias in β̃1 is β2·δ̃1.

- The omitted variable bias equals zero if:
  1. β2 = 0 (e.g. if x2 doesn’t belong in the model), or
  2. δ̃1 = 0 (e.g. if x1 and x2 don’t covary in the sample).


Summary of Direction of Bias

- The direction of the bias, (β2·δ̃1), depends on:
  1. The sign of δ̃1:
     - positive if x1 and x2 are positively correlated
     - negative if x1 and x2 are negatively correlated
  2. The sign of β2.

Summary table (sign of the bias in β̃1):

                      β2 > 0          β2 < 0
  Corr(x1, x2) > 0    Positive bias   Negative bias
  Corr(x1, x2) < 0    Negative bias   Positive bias


iClickers

- Question: Suppose the true model of y is given by y = β0 + β1x1 + β2x2 + u, but you estimate ỹ = β̃0 + β̃1x1 + v instead. Suppose that x1 and x2 are negatively correlated, and that x2 has a negative effect on y. What will be the sign of the bias of β̃1?

  A) Positive.
  B) Negative.
  C) The bias will be zero.
  D) None of the above.


iClickers

- Suppose that two things, education and ability, determine earnings. Assume that both matter positively for earnings, and that ability and education positively covary.

- Question: If you estimate a model of earnings that excludes ability, the coefficient estimate on education will be

  A) positively biased.
  B) negatively biased.
  C) unbiased.
  D) none of the above.


Omitted Variable Bias Summary

- The size of the bias is also important.
- Typically we don’t know the magnitude of β2 or δ̃1, but we can usually make an educated guess about whether each is positive or negative.
- What about the more general case (k+1 variables)? Technically, we can only sign the bias in the more general case if all the included x’s are uncorrelated with one another.


OVB Example 1

- Suppose that the true model of the wage is

  wage = β0 + β1educ + β2exper + u

  but you estimate

  wage = β0 + β1educ + v

  Which way will the estimated coefficient on educ be biased?
  1) δ̃1 has the same sign as corr(educ, exper) < 0.
  2) β2 > 0 (because people with more experience tend to be more productive).
  Therefore bias = (β2·δ̃1) < 0: the estimate is negatively biased.

- So, if you omit experience when it should be included, the estimated coefficient on educ will be negatively biased (too small on average).


OVB Example 2

- Suppose that the true model of the wage is

  wage = β0 + β1educ + β2ability + u

  but you estimate

  wage = β0 + β1educ + v

  Which way will the estimated coefficient on educ be biased?
  - δ̃1 has the same sign as corr(educ, ability) > 0.
  - β2 > 0 (because people with more ability tend to be more productive).
  - bias = (β2·δ̃1) > 0: the estimate is positively biased.

- In other words, if you omit ability when it should be included, the estimated coefficient on educ will be positively biased (too big on average).


The More General Case

- For example, suppose that the true model is

  y = β0 + β1x1 + β2x2 + β3x3 + u

  and we estimate

  ỹ = β̃0 + β̃1x1 + β̃2x2

- Suppose x1 and x3 are correlated. In general, both β̃1 and β̃2 will be biased; β̃2 will only be unbiased if x1 and x2 are uncorrelated.


Variance of the OLS Estimators

- We now know (if assumptions 1-4 hold) that the sampling distribution of our estimate is centered around the true parameter.
- We now want to think about how spread out this distribution is.
- It is much easier to think about this variance under an additional assumption, so:

Assumption 5 (Homoskedasticity): Var(u | x1, x2, …, xk) = σ²

- This implies that Var(y | x1, x2, …, xk) = σ².


Variance of OLS (cont)

- The 4 assumptions for unbiasedness, plus this homoskedasticity assumption, are known as the Gauss-Markov assumptions.
- These can be used to show:

  Var(β̂j) = σ² / [ Σ (xij − x̄j)² · (1 − Rj²) ]

- Rj² refers to the R-squared obtained from a regression of xj on all the other independent variables (including an intercept).


Important to understand what Rj² is

- It’s not the R-squared for the regression of y on the x’s (the usual R-squared we think of).
- It’s an R-squared, but for a different regression.
  - Suppose we’re calculating the variance of the estimator of the coefficient on x1; then R1² is the R-squared from a regression of x1 on x2, x3, ..., xk.
  - If we’re calculating the variance of the estimator of the coefficient on x2, then R2² is the R-squared from a regression of x2 on x1, x3, ..., xk, etc.
  - Whichever coefficient you’re calculating the variance for, you just regress that variable on all the other RHS variables (this measures how closely that variable is related, linearly, to the other RHS variables).
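A short numpy sketch (simulated data) of how one would compute R1² in practice: regress x1 on the remaining regressors, with an intercept, and take the R-squared of that auxiliary regression.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.7 * x2 - 0.3 * x3 + rng.normal(size=n)   # x1 partly explained by x2 and x3

def r_squared(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Auxiliary regression: x1 on an intercept, x2 and x3
Z = np.column_stack([np.ones(n), x2, x3])
print("R1^2 from regressing x1 on the other regressors:", r_squared(Z, x1))
```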


Components of OLS Variances

- Thus, how precisely we can estimate the coefficients (their variance) depends on:

1. The error variance (σ²):
   - A larger σ² implies a higher variance of the slope coefficient estimators.
   - The more “noise” (unexplained variation) in the equation, the harder it is to get precise estimates.
   - Just like the univariate case.

2. The total variation in xj:
   - A larger SSTj gives us more variation in xj from which to discern an impact on y, which lowers the variance.
   - Just like the univariate case.


Components of OLS Variances

3. The linear relationships among the independent variables (Rj²). Note that we didn’t need this term in the univariate case:
   - A larger Rj² implies a larger variance for the estimator.
   - The more correlated the RHS variables, the higher the variance of the estimators (the more imprecise the estimates).

- The problem of highly correlated x variables is called “multicollinearity.”
  - It is not clear what the solution is. You could drop some of the correlated x variables, but then you may end up with omitted variable bias.
  - You can increase your sample size, which will boost the SSTj, and maybe let you shrink the variance of your estimators that way.
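A small sketch with simulated data (and an assumed σ² = 1) of how collinearity inflates Var(β̂1) through the 1/(1 − R1²) term in the formula above:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
sigma2 = 1.0                                   # assumed error variance

for rho in [0.0, 0.5, 0.9, 0.99]:
    # Build x1 and x2 with correlation approximately rho
    x2 = rng.normal(size=n)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)

    # Auxiliary regression of x1 on x2 (with intercept) gives R1^2
    Z = np.column_stack([np.ones(n), x2])
    g = np.linalg.lstsq(Z, x1, rcond=None)[0]
    resid = x1 - Z @ g
    R1_sq = 1 - resid @ resid / np.sum((x1 - x1.mean()) ** 2)

    SST1 = np.sum((x1 - x1.mean()) ** 2)
    var_b1 = sigma2 / (SST1 * (1 - R1_sq))     # the formula from the slide above
    print(f"corr~{rho:4.2f}  R1^2~{R1_sq:5.3f}  Var(beta1_hat)~{var_b1:.2e}")
```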


iClickers

- Recall the expression for the variance of the slope OLS estimators in multivariate regression:

  Var(β̂j) = σ² / [ Σ (xij − x̄j)² · (1 − Rj²) ]

- Question: Suppose you estimate yi = β0 + β1x1i + β2x2i + β3x3i + ui and want an expression for the variance of β̂3. You would obtain the needed R3² by estimating which of the following equations?

  A) yi = β0 + β3x3i + vi
  B) x3i = δ0 + δ1x1i + δ2x2i + wi
  C) yi = β0 + β1x1i + β2x2i + β3x3i + ui
  D) None of the above.


Example of Multi-Collinearity (high Rj²)

- Suppose that you want to try to explain differences in mortality rates across different states.
- You might use variation in medical expenditures across the states (e.g. expenditures on hospitals, medical equipment, doctors, nurses, etc.).
- Problem?
  - Expenditures on hospitals will tend to be highly correlated with expenditures on doctors, etc.
  - So estimates of the impact of hospital spending on mortality rates will tend to be imprecise.


Variances in misspecified models

- Consider again the misspecified model (x2 omitted):

  ỹ = β̃0 + β̃1x1

- We showed that the estimate of β1 is biased if (β2·δ̃1) does not equal zero, and that including x2 in the model does not bias the estimate even if β2 = 0; this suggests that adding x variables can only reduce bias.
- So, Q: why not always throw in every possible RHS variable, just to be safe? A: Multicollinearity.
- We know that in the misspecified model

  Var(β̃1) = σ² / SST1

- R1² = 0 if there are no other variables on the RHS; adding variables raises R1², and with it the variance of the correctly specified estimator, Var(β̂1) = σ² / [ SST1 (1 − R1²) ].


Misspecified Models (cont)

- Thus, holding σ² constant, Var(β̃1) < Var(β̂1), unless x1 and x2 are uncorrelated (R1² = 0); in that case the two variances are the same (holding σ² constant).
- Assuming that x1 and x2 are correlated:
  1) If β2 = 0: β̃1 and β̂1 will both be unbiased, and Var(β̃1) < Var(β̂1).
  2) If β2 ≠ 0: then β̃1 will be biased, β̂1 will be unbiased, and the variance of the misspecified estimator could be greater or less than the variance of the correctly specified estimator, depending on what happens to σ².
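A Monte Carlo sketch of case 1, with made-up parameters and β2 = 0: both estimators are centered on β1, but the estimate from the regression that omits x2 varies less across samples because x1 and x2 are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(9)
beta1 = 2.0                     # made-up true effect of x1; x2 has no effect (beta2 = 0)
n, reps = 100, 3000

short, full = np.empty(reps), np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.4 * rng.normal(size=n)     # highly correlated with x1
    y = 1.0 + beta1 * x1 + rng.normal(size=n)

    dx1 = x1 - x1.mean()
    short[r] = (dx1 @ y) / (dx1 @ dx1)           # y on x1 only (x2 omitted)

    X = np.column_stack([np.ones(n), x1, x2])
    full[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]   # y on x1 and x2

print("x2 omitted : mean", short.mean(), " variance", short.var())
print("x2 included: mean", full.mean(),  " variance", full.var())
```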


Misspecified Models (cont)

- So, if x2 doesn’t affect y (i.e. β2 = 0), it’s best to leave x2 out, because we’ll get a more precise estimate of the effect of x1 on y.
- Note that if x2 is uncorrelated with x1 but β2 ≠ 0, then including x2 will actually lower the variance of the estimated coefficient on x1, because its inclusion lowers σ².
- Generally, we need to include variables whose omission would cause bias.
  - But we often don’t want to include other variables if their correlation with other RHS variables will reduce the precision of the estimates.


Estimating the Error Variance

- We don’t know what the error variance σ² is, because we don’t observe the errors ui; what we observe are the residuals ûi.
- We can use the residuals to form an estimate of the error variance, as we did for simple regression:

  σ̂² = ( Σ ûi² ) / ( n − k − 1 ) = SSR / df

- df = (# of observations) − (# of estimated parameters) = n − (k + 1).


Error Variance Estimate (cont)

- Thus, an estimate of the standard error of the jth coefficient is:

  se(β̂j) = σ̂ / [ Σ (xij − x̄j)² · (1 − Rj²) ]^(1/2) = σ̂ / [ SSTj (1 − Rj²) ]^(1/2)
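A numpy sketch (simulated data) checking this formula for se(β̂1) against the usual matrix expression, the square root of the corresponding diagonal entry of σ̂²(X'X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)       # made-up parameter values

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1                             # number of slope parameters
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)       # sigma^2 hat = SSR / df

# se(beta1_hat) from the formula above
Z = np.column_stack([np.ones(n), x2])          # the other regressors
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
r = x1 - Z @ g
R1_sq = 1 - r @ r / np.sum((x1 - x1.mean()) ** 2)
SST1 = np.sum((x1 - x1.mean()) ** 2)
se_formula = np.sqrt(sigma2_hat / (SST1 * (1 - R1_sq)))

# Same quantity from the matrix expression sigma2_hat * (X'X)^{-1}
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

print("se(beta1_hat), formula above :", se_formula)
print("se(beta1_hat), matrix version:", se_matrix)
```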


The Gauss-Markov Theorem

- Allows us to say something about the efficiency of OLS: why we should use OLS instead of some other estimator.
- Given our 5 Gauss-Markov assumptions, it can be shown that the OLS estimates are “BLUE.”

Best:
- Smallest variance among the class of linear unbiased estimators:

  Var(β̂1) ≤ Var(β̇1)

  where β̇1 denotes any other linear unbiased estimator of β1.


The Gauss-Markov Theorem

Linear:
- The estimator can be expressed as a linear function of y,

  β̂j = Σ wij yi

  where the weights wij are some function of the sample x’s.

Unbiased:
- E(β̂j) = βj, for all j.

Estimator:
- It provides an estimate of the underlying parameters.

- When the assumptions hold, use OLS.
- When the assumptions fail, OLS is not BLUE: a better estimator may exist.
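As a final numpy sketch (simulated data): the OLS estimates can indeed be written as β̂ = W·y, with weights W = (X'X)⁻¹X' that depend only on the sample x's.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)       # made-up parameter values

X = np.column_stack([np.ones(n), x1, x2])

# The weights depend only on the x's, not on y
W = np.linalg.inv(X.T @ X) @ X.T               # shape (3, n): one row of weights per coefficient

beta_from_weights = W @ y                      # beta_hat_j = sum_i w_ij * y_i
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("beta from weights W @ y:", beta_from_weights)
print("beta from OLS          :", beta_ols)    # identical up to rounding
```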