ordinary least square estimation
TRANSCRIPT
8/11/2019 Ordinary Least Square Estimation
http://slidepdf.com/reader/full/ordinary-least-square-estimation 1/30
EC114 Introduction to Quantitative Economics
12. Ordinary Least Squares Estimation
Marcus Chambers
Department of Economics
University of Essex
24/26 January 2012
Ordinary Least Squares (OLS) Estimation
Recall that the population regression line is given by
\[ E(Y) = \alpha + \beta X \]
and the sample regression line is given by
\[ \hat{Y} = a + bX \]
where \(a\) and \(b\) can be regarded as estimates of \(\alpha\) and \(\beta\).
Another way to think of these relationships is in terms of \(Y\) itself:
\[ Y = \alpha + \beta X + \varepsilon, \qquad Y = a + bX + e, \]
where \(\varepsilon\) is the disturbance and \(e\) is the residual.
It is clear that, if we vary the sample regression line in
some way, we will obtain a different set of residuals.
In other words, if we vary the estimation method for the
sample regression line, we will obtain a different set of residuals.
The best known method of fitting a straight line to a scatter diagram is Ordinary Least Squares (OLS).
The sample regression line is determined by the intercept \(a\) and the slope \(b\).
A good criterion in the choice of \(a\) and \(b\) is to make the residuals ‘small’ somehow.
Small residuals imply that the differences between the actual \(Y\) and the fitted \(Y\) are small.
The OLS method of estimation chooses \(a\) and \(b\) in order to minimise the sum of the squares of the residuals:
\[ \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \dots + e_n^2. \]
We know that \(e_i = Y_i - a - bX_i\), so that the sum of squared residuals can be written
\[ S = \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2, \]
which is a function of \(a\) and \(b\) alone because \(Y_i\) and \(X_i\) are the given data points.
The objective is to minimise \(S\) with respect to \(a\) and \(b\).
To do this we need to partially differentiate \(S\) with respect to \(a\) and \(b\) and set these derivatives equal to zero:
\[ \frac{\partial S}{\partial a} = -2 \sum_i (Y_i - a - bX_i) = 0, \]
\[ \frac{\partial S}{\partial b} = -2 \sum_i X_i (Y_i - a - bX_i) = 0. \]
As these derivatives are set equal to zero we can divide both sides by \(-2\) and re-arrange the terms to give:
\[ \sum_i Y_i = na + b \sum_i X_i, \]
\[ \sum_i X_i Y_i = a \sum_i X_i + b \sum_i X_i^2, \]
noting that \(\sum_i a = na\).
These are known as the normal equations (but are not related to the normal distribution).
Note that, because \(Y_i - a - bX_i = e_i\), we can also write the first-order conditions in the form:
\[ \frac{\partial S}{\partial a} = -2 \sum_i e_i = 0, \]
\[ \frac{\partial S}{\partial b} = -2 \sum_i X_i e_i = 0. \]
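As a rough numerical illustration of the normal equations, the sketch below solves them for a small made-up data set and checks that the two first-order conditions hold. The data values and all variable names are assumptions chosen for illustration; they do not come from the lecture.

```python
# Hypothetical toy data, chosen only for illustration (not from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

# Sample sums that appear in the normal equations.
sum_x = sum(X)
sum_y = sum(Y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Normal equations:
#   sum_y  = n * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_xx
# Solve this 2x2 linear system by Cramer's rule.
det = n * sum_xx - sum_x ** 2
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

# Residuals; the first-order conditions require both sums below to be zero
# (up to floating-point rounding).
e = [y - a - b * x for x, y in zip(X, Y)]
sum_e = sum(e)
sum_xe = sum(x * ei for x, ei in zip(X, e))
print(a, b, sum_e, sum_xe)
```

Cramer's rule is used here only because the system has two unknowns; any linear solver would do the same job.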
We therefore have two equations in two unknowns which we can solve for \(a\) and \(b\).
The extension question on Problem Set 12 deals with this solution.
A compact representation of the solution is:
\[ b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \bar{Y} - b\bar{X}, \]
where \(x_i = X_i - \bar{X}\), \(y_i = Y_i - \bar{Y}\), and \(\bar{X}\) and \(\bar{Y}\) are the sample means of \(X\) and \(Y\) respectively.
The above expressions for \(a\) and \(b\) are the OLS estimators of \(\alpha\) and \(\beta\).
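The deviation-form solution translates almost directly into code. The following is a minimal sketch, assuming the data arrive as two equal-length lists; the function name `ols_fit` and the toy data are hypothetical and used only for illustration.

```python
def ols_fit(X, Y):
    """Return (a, b) for the sample regression line Y-hat = a + b*X."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Sums of products and squares of deviations from the sample means.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in X)
    b = sum_xy / sum_xx          # slope: sum(x_i * y_i) / sum(x_i ** 2)
    a = y_bar - b * x_bar        # intercept: Y-bar minus b times X-bar
    return a, b


# Hypothetical toy data (not from the lecture).
a, b = ols_fit([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(a, b)
```

Note that the slope is undefined when every X value is identical, since the denominator is then zero; the sketch does not guard against that case.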
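We can compute \(a\) and \(b\) from various sample sums, making use of the following:
\[ \sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = \sum X_i^2 - n\bar{X}^2, \]
\[ \sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = \sum X_i Y_i - n\bar{X}\bar{Y}. \]
In view of this another common expression for \(b\) is:
\[ b = \frac{\sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}. \]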
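The data on money stock (\(Y\)) and GDP (\(X\)) in Table 9.1 of Thomas yield:
\[ \sum X_i = 132.004, \quad \sum X_i^2 = 1247.66, \quad \sum X_i Y_i = 220.956, \]
\[ \sum Y_i = 23.718, \quad \sum Y_i^2 = 45.154, \quad n = 30. \]
Based on these quantities we obtain:
\[ \sum x_i^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = 1247.66 - \frac{132.004^2}{30} = 666.86, \]
\[ \sum x_i y_i = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = 220.956 - \frac{132.004 \times 23.718}{30} = 116.60. \]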
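The slope coefficient is therefore
\[ b = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{116.60}{666.86} = 0.1748. \]
We also find that
\[ \bar{X} = \frac{132.004}{30} = 4.40, \qquad \bar{Y} = \frac{23.718}{30} = 0.7906, \]
and so the intercept is
\[ a = \bar{Y} - b\bar{X} = 0.7906 - (0.1748 \times 4.40) = 0.0212, \]
where the final figure uses the unrounded values of \(b\) and \(\bar{X}\) (the rounded values shown give 0.0215).
The sample regression line is therefore
\[ \hat{Y} = 0.021 + 0.175X. \]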
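The worked example can be reproduced from the reported sample sums alone. The sketch below is an illustration under the assumption that only those rounded totals are available (the variable names are ours); because the totals are rounded, the intermediate sums can differ from the slide values in the second decimal place, while the slope and intercept agree to the precision quoted.

```python
# Sample sums as reported for the money stock (Y) and GDP (X) data, n = 30.
n = 30
sum_X, sum_X2 = 132.004, 1247.66
sum_Y, sum_XY = 23.718, 220.956

# Sums in deviation form, computed from the raw sums.
sum_xx = sum_X2 - sum_X ** 2 / n        # sum of (X_i - X-bar)^2
sum_xy = sum_XY - sum_X * sum_Y / n     # sum of (X_i - X-bar)(Y_i - Y-bar)

# OLS slope and intercept.
b = sum_xy / sum_xx
x_bar, y_bar = sum_X / n, sum_Y / n
a = y_bar - b * x_bar

print(round(b, 4), round(a, 4))
```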
Note that the residuals (the vertical distance between the data point and the line) are larger for the countries with larger GDP (\(X\)).
Note, too, that the sample regression line passes through the point \((\bar{X}, \bar{Y})\).
The reason can be seen by re-arranging the equation for \(a\), which gives
\[ \bar{Y} = a + b\bar{X}. \]
The point \((\bar{X}, \bar{Y})\) is known as the point of sample means.
In our example this point is \((4.40, 0.791)\).
Note that the intercept \(a > 0\), although its value is small.
We had expected a relationship of the form \(Y = bX\), suggesting \(a = 0\).
Although \(a\) is small we will want to know whether it is significantly different from zero; we shall consider testing the hypothesis that \(\alpha = 0\) at a later point.
The value \(b = 0.175\) means that the demand for money per head will increase by $175 whenever GDP per head increases by $1000.
But a more interesting quantity is the income (GDP) elasticity of the demand for money.
We can use the previous results to compute an estimate of it; the required elasticity is given by the formula
\[ \eta = \frac{dY}{dX} \cdot \frac{X}{Y}. \]
However, the elasticity varies along our sample regression line because the values of \(X\) and \(Y\) vary along the line.
It is, however, common to evaluate \(\eta\) at the sample means of \(X\) and \(Y\), while \(dY/dX\) can be estimated by \(b\).
In our case the elasticity evaluated at the sample means is
\[ \eta = 0.175 \times \frac{4.40}{0.791} = 0.973. \]
Thus we obtain a GDP elasticity close to unity.
A 1% rise in GDP per head leads to a 0.97% rise in demand for money per head.
It would be of interest to test the hypothesis that \(\eta = 1\) and we will examine how to do this later on in the term.
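The elasticity calculation is a one-line formula, sketched below with the rounded values quoted above. The function name `elasticity_at_means` is a hypothetical helper introduced only for this illustration.

```python
def elasticity_at_means(b, x_bar, y_bar):
    """Elasticity of Y with respect to X, with dY/dX estimated by the slope b
    and the ratio X/Y evaluated at the sample means."""
    return b * x_bar / y_bar


# Rounded slope and sample means from the money demand example.
eta = elasticity_at_means(0.175, 4.40, 0.791)
print(round(eta, 3))  # 0.973
```

Evaluating at a different point on the line, for example a country with larger GDP, would give a different elasticity, which is why the point of evaluation should always be stated.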
Goodness-of-fit
So far we have fitted the sample regression line
\[ \hat{Y} = 0.021 + 0.175X \]
to our scatter of points in the money-income example.
The values of the intercept (0.021) and slope (0.175) were obtained by the method of ordinary least squares (OLS) which chooses these values so as to minimise the sum of squared residuals:
\[ \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2. \]
But we might want to ask the question: how well does our sample regression line fit the data?
Let’s begin by taking a look at the graph:
We can observe that the sample regression line passes ‘fairly close’ to each point in the scatter, although with greater dispersion for larger values of \(X\) (GDP per head).
We need, however, to be more precise about this; in other words we need some sort of numerical measure of goodness of fit.
We will use the coefficient of determination, \(R^2\), which is equal to the square of the correlation coefficient \(R\), where
\[ R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}. \]
We know that \(-1 \le R \le 1\), and so it follows that \(0 \le R^2 \le 1\).
In our example of the demand for money we found that \(R = 0.8787\).
Hence the coefficient of determination is \(R^2 = (0.8787)^2 = 0.772\).
In regression analysis it is possible to give a precise interpretation to the value 0.772 obtained for \(R^2\).
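For readers who want to check the correlation coefficient numerically, here is a small sketch that recovers \(R\) and \(R^2\) from the sample sums reported earlier for the money demand data. It assumes only those rounded totals are available; the variable names are ours.

```python
import math

# Reported sample sums for GDP (X) and money stock (Y), n = 30.
n = 30
sum_X, sum_X2 = 132.004, 1247.66
sum_Y, sum_Y2 = 23.718, 45.154
sum_XY = 220.956

# Sums of squares and cross-products in deviation form.
sum_xx = sum_X2 - sum_X ** 2 / n
sum_yy = sum_Y2 - sum_Y ** 2 / n
sum_xy = sum_XY - sum_X * sum_Y / n

# Correlation coefficient and its square, the coefficient of determination.
R = sum_xy / math.sqrt(sum_xx * sum_yy)
R2 = R ** 2
print(round(R, 4), round(R2, 3))
```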
Suppose we ask the question:
What proportion of the variation in the demand for money in our 30 countries can be attributed to the variation in GDP?
If our sample regression line is able to explain a high proportion of the variation in the demand for money then it must provide a good fit to the data.
Consider the next Figure, which refers to a single sample point, namely France, which is observation \(i = 8\).
We have \(Y_8 = 2.3912\); \(\hat{Y}_8 = 1.6776\); and the overall sample mean is \(\bar{Y} = 0.7906\).
The diagram shows the fitted line, the sample mean line, the residual \(e_8 = Y_8 - \hat{Y}_8\), as well as the deviations \(Y_8 - \bar{Y}\) and \(\hat{Y}_8 - \bar{Y}\).
The variations in demand for money are measured relative to the mean.
The following relationship holds:
total variation = variation due to X + residual variation
\[ Y_8 - \bar{Y} = (\hat{Y}_8 - \bar{Y}) + e_8 \]
\[ 1.6006 = 0.8870 + 0.7136 \]
Such a relationship holds for all points in the sample, so that we can write
\[ (Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + e_i, \qquad i = 1, \dots, n. \]
Note that these variations can be positive or negative, and that they only apply to a single point in the sample.
However, we require an overall measure for the entire sample, and when we talk about variation we usually have in mind a positive measure.
A measure of variation of \(Y\) taken over the entire sample is the total sum of squares (SST):
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \]
This is the total variation in \(Y\) that we attempt to explain by our regression line, and is always non-negative.
We have seen this sort of quantity before: dividing by \(n - 1\) gives the sample variance.
A sample-wide measure of the variation in \(Y\) due to \(X\) is given by the explained sum of squares (SSE):
\[ \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2. \]
This quantity is also non-negative.
Finally, a measure of the total residual variation is the residual sum of squares (SSR):
\[ \sum_{i=1}^{n} e_i^2, \]
which is also non-negative.
The following relationship holds:
total sum of squares = explained sum of squares + residual sum of squares
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2 \]
SST = SSE + SSR
The extension question on Problem Set 12 deals with this identity.
These quantities are used to define the coefficient of determination, \(R^2\), as follows:
\[ R^2 = \frac{\text{variation in } Y \text{ due to } X}{\text{total variation in } Y} = \frac{\text{SSE}}{\text{SST}}. \]
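The decomposition SST = SSE + SSR can be checked numerically. The sketch below fits a line by OLS to a small made-up data set (the data and variable names are assumptions, not taken from the lecture) and computes the three sums of squares, following the lecture's convention that SSE is the explained and SSR the residual sum of squares.

```python
# Hypothetical toy data, for illustration only (not the lecture's data set).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS slope and intercept in deviation form.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)
b = sum_xy / sum_xx
a = y_bar - b * x_bar

# Fitted values and the three sums of squares.
fitted = [a + b * x for x in X]
sst = sum((y - y_bar) ** 2 for y in Y)               # total
sse = sum((f - y_bar) ** 2 for f in fitted)          # explained
ssr = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # residual

r2 = sse / sst
print(sst, sse + ssr, r2)
```

The identity relies on the line being fitted by OLS with an intercept; for an arbitrary line through the scatter the cross-product term does not vanish and the three sums need not add up.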
Alternative (but equivalent) expressions for \(R^2\) include
\[ R^2 = 1 - \frac{\text{SSR}}{\text{SST}}, \]
obtained by making the substitution SSE = SST − SSR.
Another expression is
\[ R^2 = b^2 \frac{\sum x_i^2}{\sum y_i^2}, \]
where \(x_i = X_i - \bar{X}\) and \(y_i = Y_i - \bar{Y}\).
The derivation of this last expression requires showing that SSE \(= b^2 \sum x_i^2\) and noting that SST \(= \sum y_i^2\) (see the extension question on Problem Set 12).
In our demand for money example we have already shown that \(R^2 = 0.772\) by squaring the correlation coefficient \(R = 0.8787\).
However, we know that
\[ b = 0.17485, \quad \sum x_i^2 = 666.86, \quad \sum y_i^2 = 26.403, \]
and hence an alternative derivation is
\[ R^2 = b^2 \frac{\sum x_i^2}{\sum y_i^2} = \frac{0.17485^2 \times 666.86}{26.403} = 0.772. \]
This implies that just over 77% of the variation in the demand for money can be attributed to the variation in GDP.
Remember: \(0 \le R^2 \le 1\).
In (a) and (c) \(R^2 = 1\) because all points lie on a single sample line: in (a) \(R = +1\) and in (c) \(R = -1\).
In (b) \(R = 0\) due to the lack of association between the two variables and hence \(R^2 = 0\).
The correlation coefficient, \(R\), is a measure of the strength of association between two variables, and says nothing about the direction of causation (if any exists).
The coefficient of determination, \(R^2\), however, is based on the regression model \(Y = \alpha + \beta X + \varepsilon\) in which the causation is assumed to go from \(X\) to \(Y\).
However, we should be careful to refer to \(R^2\) as the percentage of the variation in \(Y\) attributed to \(X\) rather than explained by \(X\), because any such relationship could be spurious.
Computing OLS Estimates
In practice we use computer software for OLS calculations.
As an example, the Stata output for the money demand example is of the form:
. regress m g

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =   94.88
       Model |  20.3862321     1  20.3862321           Prob > F      =  0.0000
    Residual |  6.01600434    28  .214857298           R-squared     =  0.7721
-------------+------------------------------           Adj R-squared =  0.7640
       Total |  26.4022364    29  .910421946           Root MSE      =  .46353

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1748489   .0179502     9.74   0.000     .1380795    .2116182
       _cons |   .0212579   .1157594     0.18   0.856    -.2158645    .2583803
------------------------------------------------------------------------------
Quite a lot of information is provided by default, but note that the estimates \(a\) and \(b\) are given at the start of the final two rows (under the heading ‘Coef.’).
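Several of the summary statistics in this output follow directly from the sums of squares in its upper-left block. The sketch below recomputes them in Python from the figures printed above; the variable names are ours, and the formulas used are the standard textbook definitions rather than anything stated in the lecture itself.

```python
import math

# Sums of squares and degrees of freedom copied from the output above.
ss_model, ss_resid, ss_total = 20.3862321, 6.01600434, 26.4022364
df_model, df_resid, df_total = 1, 28, 29

# R-squared: explained (Model) sum of squares over total sum of squares.
r2 = ss_model / ss_total

# Adjusted R-squared: compares residual and total mean squares.
adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / df_total)

# F statistic: ratio of the Model mean square to the Residual mean square.
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

# Root MSE: square root of the Residual mean square.
root_mse = math.sqrt(ss_resid / df_resid)

# t ratio for the slope: coefficient divided by its standard error.
t_slope = 0.1748489 / 0.0179502

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2),
      round(root_mse, 5), round(t_slope, 2))
```

In the lecture's notation the ‘Model’ row corresponds to SSE, the ‘Residual’ row to SSR and the ‘Total’ row to SST.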
Summary
Ordinary Least Squares (OLS) Estimation
Goodness-of-fit
Next week:
Non-Linear Models