simple linear regression estimation and properties
TRANSCRIPT
Simple Linear Regression: Estimation and Properties
Outline
• Review of the Reading
• Estimate parameters using OLS
• Other features of OLS
– Numerical Properties of OLS
– Assumptions of OLS
– Goodness of Fit
Checking Understanding
• What is the best estimate of E(Y)?
• How would we find E(Y|Xi)?
• Y = B1 + B2X + u
– What is B1?
– What is B2?
– What is u?
Checking Understanding
• What is a z-score?
• What is the mean of z(x)?
• What is the standard deviation of z(x)?

z(x) = (x − x̄) / sx
Checking Understanding
• Correlation:

r = Σ(zx · zy) / (n − 1)
Checking Understanding
• The regression line in z-scores:

zy = m · zx
Checking Understanding
• The regression line in z-scores can also be written as:

zy = r · zx
Checking Understanding
• Remember:

m = cov(X, Y) / var(X)
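These identities can be checked numerically. A minimal sketch in Python, using made-up data (not from the lecture): z-scores have mean 0 and standard deviation 1, the correlation is r = Σ(zx · zy)/(n − 1), and the slope m = cov(X, Y)/var(X) reduces to r once both variables are standardized.

```python
import math

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Sample standard deviation (n - 1 in the denominator)
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

# z-scores: z(x) = (x - x̄) / s_x
zx = [(xi - mean(x)) / sd(x) for xi in x]
zy = [(yi - mean(y)) / sd(y) for yi in y]

# Correlation: r = Σ(zx * zy) / (n - 1)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Slope in original units: m = cov(X, Y) / var(X)
cov_xy = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y)) / (n - 1)
m = cov_xy / sd(x) ** 2

# Expressed in z-score units, the slope collapses to r
assert abs(m * sd(x) / sd(y) - r) < 1e-12
print(round(r, 4))  # the correlation for this sample
```
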
And What is Covariance?
• Cov(X,Y) = E[(X − E[X])(Y − E[Y])]
• Cov(X,Y) = E[XY] − E[X]E[Y]
• Covariance is positive if X and Y tend to be both below their means or both above their means. It is negative if X is above its mean while Y is below its mean, or vice versa.

σxy = cov(X, Y) = E[(X − µx)(Y − µy)]
And What is Covariance?
• But covariance has units: the sign is easy to interpret, but the number itself is hard to interpret.
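To make the two formulas concrete, here is a short Python check on made-up numbers: both expressions give the same covariance, and rescaling X by a factor of 1000 rescales the covariance by the same factor, which is exactly the units problem the slide describes.

```python
# Hypothetical data, chosen only for illustration.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

mx = sum(x) / n
my = sum(y) / n

# Cov(X,Y) = E[(X - E[X])(Y - E[Y])]   (population form: divide by n)
cov1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
# Cov(X,Y) = E[XY] - E[X]E[Y]
cov2 = sum(xi * yi for xi, yi in zip(x, y)) / n - mx * my
assert abs(cov1 - cov2) < 1e-12

# Covariance has units: measuring X in different units rescales it
x_scaled = [xi * 1000 for xi in x]
mx2 = sum(x_scaled) / n
cov_scaled = sum((xi - mx2) * (yi - my) for xi, yi in zip(x_scaled, y)) / n
assert abs(cov_scaled - 1000 * cov1) < 1e-9
```
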
Total Population of Money Spent and the Number of Votes

[Scatterplot, “Effect of Money on Votes”: Number of Votes (0 to 50,000) plotted against Amount Spent in millions (0 to 10)]
What we can see from the graph
• We can see the average value of Y for each value of X
– These are the conditional expected values E(Y|X)
• If we join the conditional values of Y given each value of X, we get the Population Regression Line
Population Regression Function and the Linear Model
• E(Y|Xi) = f(Xi)
– The expected value of the distribution of Y, given Xi, is functionally related to Xi
• E(Y|Xi) = B1 + B2Xi
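A quick simulation makes this concrete (with hypothetical parameters B1 = 5 and B2 = 2, chosen only for illustration): for each value of X, the average of many draws of Y = B1 + B2X + u lands on the population regression line.

```python
import random

random.seed(0)
B1, B2 = 5.0, 2.0  # hypothetical population parameters

# For each value of X, draw many Y's from Y = B1 + B2*X + u, with E[u] = 0
xs = [0, 3, 5, 8, 10]
cond_means = {}
for xv in xs:
    ys = [B1 + B2 * xv + random.gauss(0, 1) for _ in range(20000)]
    cond_means[xv] = sum(ys) / len(ys)

# The conditional means E(Y|X) trace out the line B1 + B2*X
for xv in xs:
    assert abs(cond_means[xv] - (B1 + B2 * xv)) < 0.05
```
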
Two interpretations of linearity
• Linear in Variables
– Which of the following is linear in variables, and why?
• E(Y|Xi) = B1 + B2Xi²
• E(Y|Xi) = B1 + B2Xi
• Linear in Parameters
– Which of the following is linear in parameters, and why?
• E(Y|Xi) = B1 + B2Xi²
• E(Y|Xi) = B1 + B2²Xi
• Why Should We Care?
– Linear regression requires linearity in parameters only
Straight Line
Y = B1 + B2Xi

Quadratic
Y = B1 + B2X + B3X²
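The distinction matters because a model can be nonlinear in the variables yet still estimable by OLS. A sketch with made-up coefficients: generating noiseless data from E(Y|X) = B1 + B2X², then treating Z = X² as the regressor, recovers the parameters with the ordinary simple-regression formulas.

```python
# E(Y|X) = B1 + B2*X^2 is nonlinear in X but linear in the parameters,
# so simple OLS applies after transforming the regressor to Z = X^2.
# Hypothetical noiseless data generated with B1 = 2, B2 = 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi ** 2 for xi in x]

z = [xi ** 2 for xi in x]  # transformed regressor
n = len(z)
mz = sum(z) / n
my = sum(y) / n

# Ordinary simple-regression slope and intercept, on Z instead of X
b2 = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
      / sum((zi - mz) ** 2 for zi in z))
b1 = my - b2 * mz
assert abs(b1 - 2.0) < 1e-9 and abs(b2 - 3.0) < 1e-9
```
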
Adding in the Stochastic Term
• Yi = E(Y|Xi) + ui
• Systematic Component: E(Y|Xi)
• Stochastic Disturbance: ui
The Sample Regression Function (SRF)
• Because of sampling fluctuation, any sample will only approximate our true Population Regression Function
• Stochastic form of the SRF: Yi = b1 + b2Xi + ûi, where b1 and b2 are the sample estimates of B1 and B2, and ûi is the residual
Primary Goal in Regression Analysis
• We want to estimate the PRF:
Yi = B1 + B2Xi + ui
• On the basis of the SRF
One method
• Choose the Sample Regression Function such that the sum of the residuals is as small as possible
Illustration and Problem

[Scatterplot of Y against X with four residuals marked: u1 = 10, u2 = −2, u3 = 2, u4 = −10. They sum to zero even though the individual errors are large.]
Alternative Method
• Ordinary Least Squares (OLS) is a method of finding the linear model which minimizes the sum of the squared errors
– Example: (10)² + (−2)² + (2)² + (−10)² = 208
• This method yields the best linear unbiased estimator
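A short Python illustration (hypothetical numbers, not the lecture's data) of why the squared errors are the right criterion: every line forced through the sample means has residuals that sum to zero, including wildly wrong ones, but their sums of squared errors differ sharply.

```python
# Hypothetical data to show why minimizing the *sum* of residuals fails.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
mx = sum(x) / len(x)
my = sum(y) / len(y)

def residual_sum(b2):
    b1 = my - b2 * mx  # force the line through the sample means
    return sum(yi - b1 - b2 * xi for xi, yi in zip(x, y))

def sse(b2):
    b1 = my - b2 * mx
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

# A terrible slope and the OLS slope both give a residual sum of zero...
assert abs(residual_sum(-5.0)) < 1e-9
assert abs(residual_sum(1.4)) < 1e-9
# ...but their sums of squared errors differ enormously, which is what OLS uses
assert sse(-5.0) > sse(1.4)
```
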
Good Spot for a break
Minimizing the Sum of Squares
• Our goal is to minimize the sum of the squared errors:
Σui² = Σ(Yi − B1 − B2Xi)²
• Since we have two unknowns, B1 and B2, we need to take the partial derivatives of this expression with respect to each
Partial Derivatives for B’s
• We start with our original equation:
Σui² = Σ(Yi − B1 − B2Xi)²
• Now we take the partial derivatives
– First equation is the partial derivative with respect to B1:
∂(Σui²)/∂B1 = −2Σ(Yi − B1 − B2Xi)
– Second equation is with respect to B2:
∂(Σui²)/∂B2 = −2ΣXi(Yi − B1 − B2Xi)
Set Equal to Zero
• Setting the last set of equations equal to zero:
−2Σ(Yi − B1 − B2Xi) = 0
−2ΣXi(Yi − B1 − B2Xi) = 0
The Normal Equations
• Divide both equations by −2, multiply through, then separate the summation terms and rearrange to get the normal equations:
ΣYi = nB1 + B2ΣXi
ΣXiYi = B1ΣXi + B2ΣXi²
Solving the Equations
• We have two equations with two unknowns, for which we can use algebra
• Multiply the first equation by ΣXi and the second by n:
ΣXiΣYi = nB1ΣXi + B2(ΣXi)²
nΣXiYi = nB1ΣXi + nB2ΣXi²
• Subtract the first equation from the second and rearrange:
B2 = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²)
Last step
• Multiply the numerator and denominator by 1/n, and recall that (1/n)ΣXi = X̄ and (1/n)ΣYi = Ȳ
• End up with:
B2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
We can now solve for B1
• If we go back to the first normal equation, ΣYi = nB1 + B2ΣXi, and divide through by n:
B1 = Ȳ − B2X̄
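The derivation can be verified numerically. A minimal sketch on made-up data: the raw-sums form and the deviation form of B2 agree, and B1 follows from the first normal equation.

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 8.0, 9.0, 11.0]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi ** 2 for xi in x)

# Raw-sums form: B2 = (n ΣXiYi - ΣXi ΣYi) / (n ΣXi² - (ΣXi)²)
b2_raw = (n * sxy - sx * sy) / (n * sxx - sx ** 2)

# Deviation form: B2 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
mx, my = sx / n, sy / n
b2_dev = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
assert abs(b2_raw - b2_dev) < 1e-12

# B1 from the first normal equation: B1 = Ȳ - B2 X̄
b1 = my - b2_dev * mx
print(round(b1, 4), round(b2_dev, 4))
```
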
What Does B2 Mean?
• The equation for B2 may not seem to make intuitive sense at first
• But if we break it down into pieces, we can begin to see the logic
In sum…
• If the changes in Y are EQUAL to the changes in X, then B2 = 1
• If the changes in Y are LARGER than the changes in X, then B2 > 1
• If the changes in Y are SMALLER than the changes in X, then B2 < 1
Let’s Do An Example!
Calculating B1 and B2
• Mean of X is 4
• Mean of Y is 12.71429
Which Looks Like…This!

[Chart, “Regression of Y on X”: Y axis from 0 to 30, X axis from 0 to 8]
Practice Problem
• We have a sample of the amount of money each candidate spent in a state (in millions) and the percentage of the vote they received
• Calculate the regression line and interpret
Data

State   % vote   Money spent
CA      40       10
FL      35       12
GA      15       4
MO      20       6
OH      40       11
VT      25       8
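One way to check your answer: apply the OLS formulas from the lecture to the table above in Python, regressing % vote on money spent.

```python
# Data from the practice problem: CA, FL, GA, MO, OH, VT
money = [10.0, 12.0, 4.0, 6.0, 11.0, 8.0]   # X: amount spent, in millions
vote  = [40.0, 35.0, 15.0, 20.0, 40.0, 25.0]  # Y: % of the vote received
n = len(money)

mx = sum(money) / n
my = sum(vote) / n

# B2 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²,  B1 = Ȳ - B2 X̄
b2 = (sum((xi - mx) * (yi - my) for xi, yi in zip(money, vote))
      / sum((xi - mx) ** 2 for xi in money))
b1 = my - b2 * mx

print(round(b1, 2), round(b2, 2))  # → 1.88 3.21
# Interpretation: each additional million spent is associated with roughly
# 3.2 more percentage points of the vote.
```
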
Numerical Properties of OLS
• Those properties that result from the method of OLS
– Expressed in terms of observable quantities of X and Y
– Point estimators for the B’s
– Sample regression line passes through the sample means of Y and X
– Sum of the residuals is zero
– Residuals are uncorrelated with the predicted Yi
– Residuals are uncorrelated with Xi
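These properties are mechanical consequences of the normal equations, so they can be checked on any dataset. A sketch with made-up numbers:

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.5, 4.0, 4.5, 6.5, 7.0]
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# OLS estimates
b2 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b1 = my - b2 * mx

yhat = [b1 + b2 * xi for xi in x]
resid = [yi - yh for yi, yh in zip(y, yhat)]

assert abs(sum(resid)) < 1e-9                                   # residuals sum to zero
assert abs(sum(ui * xi for ui, xi in zip(resid, x))) < 1e-9     # uncorrelated with Xi
assert abs(sum(ui * yh for ui, yh in zip(resid, yhat))) < 1e-9  # uncorrelated with predicted Yi
assert abs((b1 + b2 * mx) - my) < 1e-9                          # line passes through (X̄, Ȳ)
```
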
Assumptions of Classical Linear Regression
• A1: Linear regression model: linear in parameters
• A2: X values are fixed in repeated sampling
• A3: Zero mean value of the disturbance term ui
• A4: Homoskedasticity, or equal variance of ui
More Assumptions
• A5: No autocorrelation between disturbances
• A6: Zero covariance between ui and Xi
• A7: Number of observations n is greater than the number of parameters to be estimated
• A8: Variability in X values
More Assumptions
• A9: Regression model is correctly specified
– The correct variables are included
– We have the correct functional form
– Correct assumptions about the probability distributions of Yi, Xi and ui
• A10: With multiple regression, we add the assumption of no perfect multicollinearity
How “good” does it fit?
• To measure “reduction in errors” we need a benchmark for comparison.
• The mean of the dependent variable is a relevant and tractable benchmark for comparing predictions.
• The mean of Y represents our “best guess” at the value of Yi absent other information.
Sums of Squares
• This gives us the following 'sum-of-squares' measures:
TSS = Σ(Yi − Ȳ)²   (Total)
ESS = Σ(Ŷi − Ȳ)²   (Explained)
USS = Σ(Yi − Ŷi)²   (Unexplained)
• Total Variation = Explained Variation + Unexplained Variation: TSS = ESS + USS
How well does our model perform?
• R squared statistic
– = (TSS − USS)/TSS
– = ESS/TSS
• Bounded between 0 and 1
• Higher values indicate a better fit
• Lower values mean more of the variance is unexplained than explained
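Putting the goodness-of-fit pieces together in Python on made-up data: TSS decomposes into ESS + USS, and R² comes out the same whichever formula is used.

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 6.0]
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# OLS fit
b2 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b1 = my - b2 * mx
yhat = [b1 + b2 * xi for xi in x]

tss = sum((yi - my) ** 2 for yi in y)                  # total variation
ess = sum((yh - my) ** 2 for yh in yhat)               # explained variation
uss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # unexplained variation

assert abs(tss - (ess + uss)) < 1e-9       # TSS = ESS + USS
r2 = ess / tss
assert abs(r2 - (tss - uss) / tss) < 1e-9  # both R² formulas agree
assert 0.0 <= r2 <= 1.0
print(round(r2, 3))
```
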