econometrics the simple regression modeldocentes.fe.unl.pt/~azevedoj/web...

SLR Model Gauss-Markov OLS Estimation More OLS Terminology Unbiasedness Variance Example

EconometricsThe Simple Regression Model

Joao Valle e Azevedo

Faculdade de EconomiaUniversidade Nova de Lisboa

Spring Semester

Joao Valle e Azevedo (FEUNL) Econometrics Lisbon, February 2011 1 / 35


Objectives

Objectives

Given the model

y = β0 + β1x + u

Where y is earnings and x is years of education

Or y is sales and x is spending in advertising

Or y is the market value of an apartment and x is its area

Our objectives for the moment will be:

Estimate the unknown parameters β0 and β1

Assess how much x explains y



Assumptions

The Gauss-Markov Assumptions

Assumption SLR.1 (Linearity in Parameters)

y = β0 + β1x + u

I y is the dependent variable (or explained variable, or response variable,or the regressand)

I x is the independent variable (or explanatory variable, or controlvariable, or the regressor)

I u is the error term or disturbance (y is allowed to vary for the samelevel of x)

This is a population equation. It is linear in the (unknown)parameters.



Assumptions


Assumption SLR.2 (Random Sampling) Random sample of size n,{(xi ,yi ): i=1,2,...,n} such that:

yi = β0 + β1xi + ui , i = 1, 2, ..., n

Assumption SLR.3 (Sample Variation in the ExplanatoryVariable)The sample outcomes on x, namely {xi : i=1,2,...,n}, are not all thesame value



Assumptions


Assumption SLR.4 (Zero Conditional Mean) The error u has anexpected value of zero given any value of the explanatory variable

E (u|x) = 0

Crucial Assumption!Remember the previous examples:

Earnings = β0 + β1Education + u

CrimeRate = β0 + β1Policeman + u

In these cases SLR.4 most likely fail



Assumptions


Assumption SLR.4’ (Zero Mean) The error u has an expected valueof zero given any value of the explanatory variable

E (u) = 0

Not really necessary given that SLR.4 implies SLR.4’

I This is not a restrictive assumption since we can always use β0 so thatSLR.4’ holds



Assumptions

Zero Conditional Mean

E (u|x) = 0⇒ E (y |x) = β0 + β1x

..

x1 x2

E(y|x) = b0 + b1x

y

f(y)

Figure: For any x, the distribution of y is centered about E(y |x)Joao Valle e Azevedo (FEUNL) Econometrics Lisbon, February 2011 7 / 35


Assumptions

Graph of yi=β0+β1xi+ui

.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

u1

u2

u3

u4

x

y xxyE 10)|(

Figure: Population regression line, data points and the associated error terms



Estimators

OLS Estimators

Want to estimate the population parameters from a sample

Let {(xi ,yi ): i=1,2,...,n} denote a random sample of size n from thepopulation

For each observation in this sample:

yi = β0 + β1xi + ui , i = 1, 2, ..., n

To derive the OLS estimates we need to realize that our mainassumption of E(u|x)=E(u)=0 also implies:

Cov(x , u) = E (xu) = 0

Remember from basic probability that Cov(x,y) = E(xy)-E(x)E(y)



Estimators

OLS EstimatorsTwo Restrictions:

E (u|x) = E (u) = 0

Cov(x,u) = E (xu) = 0

Since u = y − β0 − β1x :I Two Moment Conditions:

E (y − β0 − β1x) = 0

E [x(y − β0 − β1x)] = 0I Sample versions:

n−1n∑

i=1

(yi − β0 − β1xi ) = 0

n−1n∑

i=1

xi (yi − β0 − β1xi ) = 0



Estimators

OLS Estimators

We have two equations and two unknowns β0, β1

Pick β0 and β1 so that the sample moments match the populationmoments

From the first sample moment condition and given the definitionof a sample mean and properties of summation:

n−1n∑

i=1

(yi − β0 − β1xi ) = 0

⇔ y = β0 + β1x

⇔ β0 = y − β1x



Estimators

OLS EstimatorsFrom the second sample moment condition:

n−1n∑

i=1

xi (yi − β0 − β1xi ) = 0

Plug-in β0 ⇒n∑

i=1

xi (yi − (y − β1x)− β1xi ) = 0

This simplifies to⇒n∑

i=1

xi (yi − y) = β1

n∑i=1

xi (xi − x)

Rearranging⇒n∑

i=1

(xi − x)(yi − y) = β1

n∑i=1

(xi − x)2

Thus, β1 =

∑ni=1(xi − x)(yi − y)∑n

i=1(xi − x)2



Estimators

OLS Estimators

β1 =

∑ni=1(xi − x)(yi − y)∑n

i=1(xi − x)2

Need x to vary in our sample so that the slope parameter iswell-defined!

n∑i=1

(xi − x)2 > 0

Sample covariance between x and y divided by the sample variance ofx

If x and y are positively correlated, the slope will be positive

If x and y are negatively correlated, the slope will be negative

I It is an estimator if we treat (xi , yi ) as random variablesI It is an estimate once we substitute (xi , yi ) by the actual observations



Estimators

Example

ˆwagei = 245.073 + 52.513educi

n = 11064, Data from the Employment Survey (INE), 2003

wage is net monthly wage and educ measures the number of years ofschooling completed

So, we estimate a positive relation between these two variables:

I An additional year of schooling leads to an estimated expected increaseof 52.513 euros in the wage on average, ceteris paribus

I Caution about the interpretation of these estimates!



Definitions

Some Definitions

Intuitively, OLS is fitting a line through the sample points such that theSSR is as small as possible, hence the term least squares

The residual, u, is an estimate of the error term, u, and is the differencebetween the fitted line (sample regression function) and the sample point

Fitted Valueyi = β0 + β1xi

Residualui = yi − yi = yi − β0 − β1xi

Sum of Squared Residuals (SSR)

n∑i=1

ui2 =

n∑i=1

(yi − β0 − β1xi )2



Definitions

Graph of yi=β0+β1xi

.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

û1

û2

û3

û4

x

y

xy 10ˆˆˆ

Figure: Sample regression line, sample data points and the associated estimatederror terms - the residuals



Definitions

Alternative approach to derive OLS

We want to choose our parameters such that we minimize the SSR:

SSR =n∑

i=1

ui2 =

n∑i=1

(yi − β0 − β1xi )2

δRSS

δβ0

= −2n∑

i=1

(yi − β0 − β1xi ) = 0

δRSS

δβ1

= −2n∑

i=1

xi (yi − β0 − β1xi ) = 0

These equations are equivalent to the same equations we had before:thus, same solution!



More Terminology

More TerminologyThink of each observation as being made up of an explained part and anunexplained part:

yi = yi + uiTotal Sum of Squares (SST)

n∑i=1

(yi − y)2

Explained Sum of Squares (SSE)

n∑i=1

(yi − y)2

Residual Sum of Squares (SSR)

n∑i=1

ui2



More Terminology

SST = SSE + SSR

SST =n∑

i=1

(yi − y)2 =n∑

i=1

[(yi − yi ) + (yi − y)]2

=n∑

i=1

[ui + (yi − y)]2

=n∑

i=1

ui2 + 2

n∑i=1

ui (yi − y) +n∑

i=1

(yi − y)2

=n∑

i=1

ui2 +

n∑i=1

(yi − y)2

= SSR + SSE



More Terminology

Goodness of Fit

How well does our sample regression line fit our sample data?

I If SSR is very small relative to SST (or SSE very large relative to SST),then a lot of the variation in the dependent variable y is explained bythe model

Can compute the fraction of the total sum of squares (SST) that isexplained by the model: the R-squared of the regression

R2 =SSE

SST= 1− SSR

SST

0 ≤ R2 ≤ 1

I The lower the SSR, the higher will be the R2

I OLS maximises the R2 (minimises SSR)



More Terminology

Goodness of Fit - Example (Cont.)

ˆwagei = 245.073 + 52.513educi


R2 = 0.262

A lot of variation in the dependent variable (wage) is left unexplainedby the model

This is not related to the fact that educ is most probably correlatedwith u

Can have low R2 even if all the assumptions hold true



Properties of OLS

Properties of OLS EstimatorsThought experiment:

I Get a sample of size n from the population many times (infinite times)I Compute the OLS estimatesI Is the average of these estimates equal to the true parameter values?I Under SLR.1 to SLR.4 it turns out that the answer is positive

β1 =

∑ni=1(xi − x)(yi − y)∑n

i=1(xi − x)2

β0 = y − β1x

I These are estimators if we treat (xi , yi ) as random variables

Theorem

Under these assumptions:

I E(β0)=β0 and E(β1)=β1



Properties of OLS

Proof of Unbiasedness of OLS

β1 =

∑ni=1(xi − x)(yi − y)∑n

i=1(xi − x)2

=

∑ni=1(xi − x)(yi − y)

SSTx

where:

SST =n∑

i=1

(xi − x)2

Note that:n∑

i=1

(xi − x) = 0 and

n∑i=1

(xi − x)xi =n∑

i=1

(xi − x)2



Properties of OLS

Proof of Unbiasedness of OLS (Cont.)Focusing on the numerator:

n∑i=1

(xi − x)(yi − y) =n∑

i=1

(xi − x)yi

=n∑

i=1

(xi − x)(β0 + β1xi + ui )

= β1SSTx +n∑

i=1

(xi − x)ui

Putting together:

β1 =β1SSTx +

∑ni=1(xi − x)ui

SSTx

= β1 +

∑ni=1(xi − x)ui

SSTx



Properties of OLS

Proof of Unbiasedness of OLS (Cont.)

β1 = β1 +

∑ni=1(xi − x)ui

SSTx

Now take expectations conditional on the values of the x’s, that is,treat the x ’s as constants. To avoid heavy notation that isn’t doneexplicitly

E (β1) = β1 +

∑ni=1(xi − x)E (ui )

SSTx

= β1 +

∑ni=1(xi − x)0

SSTx

= β1

And if the conditional expectation is a constant, the unconditinalexpectation is also that constant...



Properties of OLS

Proof of Unbiasedness of OLS (Cont.)

β0 = y − β1x

Now, it’s really easy to prove unbiasedness of β0. Try it yourself!

Unbiasedness Summary

I The OLS estimators of β0 and β1 are unbiased

I However, in a given sample we may be close or far from the trueparameter, unbiasedness is a minimal requirement

I Unbiasedness depends on the 4 assumptions: if any assumption fails,then OLS is not necessarily unbiased



Properties of OLS

Variance of the OLS Estimators

We know already that the sampling distribution of our estimators iscentered around the true parameter

But, is the distribution concentrated around the true parameter?

To answer this question, we add an additional assumption:

I Assumption SLR.5 (Homoskedasticity) The error u has the samevariance given any value of the explanatory variable.

Var(u|x) = σ2



Properties of OLS

Variance of the OLS Estimators

Remember that:

σ2 = Var(u|x) = E (u2|x)− [E (u|x)]2

= E (u2|x),given that E (u|x)=0

= E (u2) = Var(u)

I So, σ2 is also the unconditional variance, called the variance of the errorI σ, the square root of the error variance, is called the standard deviation

of the error

Can write:

E (y |x) = β0 + β1x and Var(y |x) = σ2



Properties of OLS

Homoskedastic Case

..

x1 x2

E(y|x) = b0 + b1x

y

f(y)

Figure: How spread out is the distribution of the estimator



Properties of OLS

Heteroskedastic Case

.

xx1 x2

f(y|x)

x3

..

E(y|x) = b0 + b1x

Figure: How spread out is the distribution of the estimator



Properties of OLS

Variance of the OLS Estimators (Cont.)

β1 = β1 +

∑ni=1(xi − x)ui

SSTx,where SSTx =

n∑i=1

(xi − x)2

Var(β1) = Var

(β1 +

1

SSTx

n∑i=1

(xi − x)ui

)

=

(1

SSTx

)2

Var

( n∑i=1

(xi − x)ui

)

=

(1

SSTx

)2 n∑i=1

(xi − x)2Var(ui )

=

(1

SSTx

)2 n∑i=1

(xi − x)2σ2 = σ2

(1

SSTx

)2

SSTx

This is a conditional variance!



Properties of OLS

Variance of the OLS Estimators (Cont.)

Var(β1) =σ2

SSTx,where SSTx =

n∑i=1

(xi − x)2

The larger the error variance, σ2, the larger the variance of the slopeestimator

The larger the variability in the xi , the smaller the variance of theslope estimate

SSTx tends to increase with the sample size n, so variance tends todecrease with the sample size

Can also show:

Var(β0) =n−1σ2

∑ni=1 x

2i∑n

i=1(xi − x)2



Properties of OLS

Variance of the OLS

Theorem

Under Assumptions SLR.1 through SLR.5

Var(β1) =σ2∑n

i=1(xi − x)2

Var(β0) =n−1σ2

∑ni=1 x

2i∑n

i=1(xi − x)2

conditional on the sample values {x1, x2, ...}



Properties of OLS

Estimating the Error Variance

In practice, the error variance σ2 is unknown since we don’t observethe errors, we must estimate it...

σ2 =

∑ni=1 ui

2

n − 2=

SSR

n − 2

Can be show that this is an unbiased estimator of the error variance

The so-called standard error of the regression (SER) is given by:

σ =√σ2

Recall that:

se(β1) =σ

[∑n

i=1(xi − x)2]1/2,and similarly for β0



Example

Example (Cont.)

ˆwagei = 245.073 + 52.513educi


R2 = 0.262

σ2 = 155366 and σ = 394.164

se(β0) = 7.575 and se(β1) = 0.837


econometrics the simple regression modeldocentes.fe.unl.pt/~azevedoj/web...

Documents