Christopher Dougherty
EC220 - Introduction to econometrics (chapter 1)
Slideshow: deriving linear regression coefficients
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 1). [Teaching Resource]
© 2012 The Author
This version available at: http://learningresources.lse.ac.uk/127/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: scatter diagram of Y against X (X from 0 to 3, Y from 0 to 6) showing three observations, Y_1, Y_2 and Y_3.]
True model: $Y = \beta_1 + \beta_2 X + u$
This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares).
1
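DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: scatter diagram of Y against X (X from 0 to 3, Y from 0 to 6) showing three observations, Y_1, Y_2 and Y_3.]
True model: $Y = \beta_1 + \beta_2 X + u$
We will start with a numerical example with just three observations: (1,3), (2,5), and (3,6).
2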
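DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the same scatter diagram with a fitted line of intercept b1 and slope b2, and the fitted values marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\hat{Y}_1 = b_1 + b_2, \qquad \hat{Y}_2 = b_1 + 2 b_2, \qquad \hat{Y}_3 = b_1 + 3 b_2$$
Writing the fitted regression as $\hat{Y} = b_1 + b_2 X$, we will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals.
3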
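DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the same scatter diagram with the fitted line, the fitted values, and the residuals marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = 3 - b_1 - b_2 \\
e_2 &= Y_2 - \hat{Y}_2 = 5 - b_1 - 2 b_2 \\
e_3 &= Y_3 - \hat{Y}_3 = 6 - b_1 - 3 b_2
\end{aligned}$$
Given our choice of b1 and b2, the residuals are as shown.
4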
SIMPLE REGRESSION ANALYSIS
$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = 3 - b_1 - b_2 \\
e_2 &= Y_2 - \hat{Y}_2 = 5 - b_1 - 2 b_2 \\
e_3 &= Y_3 - \hat{Y}_3 = 6 - b_1 - 3 b_2
\end{aligned}$$
$$RSS = e_1^2 + e_2^2 + e_3^2 = (3 - b_1 - b_2)^2 + (5 - b_1 - 2 b_2)^2 + (6 - b_1 - 3 b_2)^2$$
The sum of the squares of the residuals is thus as shown above.
5
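SIMPLE REGRESSION ANALYSIS
$$\begin{aligned}
RSS &= (3 - b_1 - b_2)^2 + (5 - b_1 - 2 b_2)^2 + (6 - b_1 - 3 b_2)^2 \\
&= 9 + b_1^2 + b_2^2 - 6 b_1 - 6 b_2 + 2 b_1 b_2 \\
&\quad + 25 + b_1^2 + 4 b_2^2 - 10 b_1 - 20 b_2 + 4 b_1 b_2 \\
&\quad + 36 + b_1^2 + 9 b_2^2 - 12 b_1 - 36 b_2 + 6 b_1 b_2
\end{aligned}$$
The quadratics have been expanded.
6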
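SIMPLE REGRESSION ANALYSIS
$$\begin{aligned}
RSS &= 9 + b_1^2 + b_2^2 - 6 b_1 - 6 b_2 + 2 b_1 b_2 \\
&\quad + 25 + b_1^2 + 4 b_2^2 - 10 b_1 - 20 b_2 + 4 b_1 b_2 \\
&\quad + 36 + b_1^2 + 9 b_2^2 - 12 b_1 - 36 b_2 + 6 b_1 b_2 \\
&= 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2
\end{aligned}$$
Like terms have been added together.
7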
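The expansion can be checked numerically. The following is a minimal sketch, not part of the original slides: it evaluates RSS directly from the three observations and compares it with the collected polynomial 70 + 3b1^2 + 14b2^2 - 28b1 - 62b2 + 12b1b2. The function names are illustrative only.

```python
from fractions import Fraction

# The three observations (X, Y) from the numerical example.
DATA = [(1, 3), (2, 5), (3, 6)]


def rss_direct(b1, b2, data=DATA):
    """Sum of squared residuals, with e_i = Y_i - b1 - b2 * X_i."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in data)


def rss_expanded(b1, b2):
    """The collected polynomial obtained after adding like terms."""
    return 70 + 3 * b1**2 + 14 * b2**2 - 28 * b1 - 62 * b2 + 12 * b1 * b2


# Arbitrary trial values of (b1, b2); both forms should agree for each pair.
trial_values = [(0, 0), (1, 2), (-2, 4), (Fraction(1, 2), Fraction(3, 4))]
for b1, b2 in trial_values:
    print(b1, b2, rss_direct(b1, b2), rss_expanded(b1, b2))
```

Exact integer and fraction arithmetic is used so that the two forms can be compared for equality without rounding error.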
SIMPLE REGRESSION ANALYSIS
$$RSS = 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2$$
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6 b_1 + 12 b_2 - 28 = 0$$
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12 b_1 + 28 b_2 - 62 = 0$$
For a minimum, the partial derivatives of RSS with respect to b1 and b2 should be zero. (We should also check a second-order condition.)
8
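The second-order condition is not worked through on the slides. As a brief sketch, assuming the standard test for a function of two variables (both own second derivatives positive, and their product greater than the square of the cross derivative), the check for RSS = 70 + 3b1^2 + 14b2^2 - 28b1 - 62b2 + 12b1b2 is:

```latex
\frac{\partial^2 RSS}{\partial b_1^2} = 6 > 0, \qquad
\frac{\partial^2 RSS}{\partial b_2^2} = 28 > 0, \qquad
\frac{\partial^2 RSS}{\partial b_1 \, \partial b_2} = 12,
\qquad
6 \times 28 - 12^2 = 24 > 0
```

Under that test, the stationary point given by the first-order conditions is a minimum.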
SIMPLE REGRESSION ANALYSIS
$$RSS = 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2$$
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6 b_1 + 12 b_2 - 28 = 0$$
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12 b_1 + 28 b_2 - 62 = 0$$
The first-order conditions give us two equations in two unknowns.
9
SIMPLE REGRESSION ANALYSIS
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6 b_1 + 12 b_2 - 28 = 0$$
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12 b_1 + 28 b_2 - 62 = 0$$
$$b_1 = 1.67, \qquad b_2 = 1.50$$
Solving them, we find that RSS is minimized when b1 and b2 are equal to 1.67 and 1.50, respectively.
10
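The two first-order conditions, 6b1 + 12b2 = 28 and 12b1 + 28b2 = 62, can be solved mechanically. The sketch below is an illustration rather than the author's method: it uses Cramer's rule with exact fractions, and the helper name solve_2x2 is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*x + a12*y = c1 and a21*x + a22*y = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * a22 - a12 * c2, det)
    y = Fraction(a11 * c2 - c1 * a21, det)
    return x, y


# First-order conditions from the numerical example:
#   6*b1 + 12*b2 = 28
#  12*b1 + 28*b2 = 62
b1, b2 = solve_2x2(6, 12, 28, 12, 28, 62)
print(b1, b2)  # exact values
print(round(float(b1), 2), round(float(b2), 2))  # rounded to two decimals
```

The exact solution is b1 = 5/3 and b2 = 3/2, which round to the 1.67 and 1.50 reported above.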
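DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: scatter diagram of Y against X (X from 0 to 3, Y from 0 to 6) with the three observations, a fitted line of intercept b1 and slope b2, and the fitted values marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\hat{Y}_1 = b_1 + b_2, \qquad \hat{Y}_2 = b_1 + 2 b_2, \qquad \hat{Y}_3 = b_1 + 3 b_2$$
Here is the scatter diagram again.
11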
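DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the scatter diagram with the fitted line drawn, intercept 1.67 and slope 1.50, and the fitted values marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = 1.67 + 1.50 X$
$$\hat{Y}_1 = 3.17, \qquad \hat{Y}_2 = 4.67, \qquad \hat{Y}_3 = 6.17$$
The fitted line and the fitted values of Y are as shown.
12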
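The fitted values can be reproduced from the coefficients. This is a small sketch under the assumption that the exact solution b1 = 5/3, b2 = 3/2 is used rather than the rounded 1.67 and 1.50; the variable names are illustrative.

```python
from fractions import Fraction

# Observations (X, Y) from the numerical example.
data = [(1, 3), (2, 5), (3, 6)]

# Exact OLS coefficients (1.67 and 1.50 after rounding).
b1, b2 = Fraction(5, 3), Fraction(3, 2)

# Fitted value and residual for each observation.
fitted = [b1 + b2 * x for x, _ in data]
residuals = [y - y_hat for (_, y), y_hat in zip(data, fitted)]

for (x, y), y_hat, e in zip(data, fitted, residuals):
    print(x, y, round(float(y_hat), 2), round(float(e), 2))
```

Rounded to two decimals, the fitted values are 3.17, 4.67 and 6.17, matching the figure.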
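DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: scatter diagram of Y against X for the general case, with observations at X_1, ..., X_n and the values Y_1 and Y_n marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Now we will do the same thing for the general case with n observations.
13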
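DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the general scatter diagram with a fitted line of intercept b1 and slope b2, and the fitted values for the first and last observations marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\hat{Y}_1 = b_1 + b_2 X_1, \qquad \hat{Y}_n = b_1 + b_2 X_n$$
Given our choice of b1 and b2, we will obtain a fitted line as shown.
14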
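DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the general scatter diagram with the fitted line and the residual e_1 for the first observation marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = Y_1 - b_1 - b_2 X_1 \\
&\;\;\vdots \\
e_n &= Y_n - \hat{Y}_n = Y_n - b_1 - b_2 X_n
\end{aligned}$$
The residual for the first observation is defined.
15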
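DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the general scatter diagram with the fitted line and the residuals e_1 and e_n marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1 = Y_1 - b_1 - b_2 X_1 \\
&\;\;\vdots \\
e_n &= Y_n - \hat{Y}_n = Y_n - b_1 - b_2 X_n
\end{aligned}$$
Similarly we define the residuals for the remaining observations. That for the last one is marked.
16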
DERIVING LINEAR REGRESSION COEFFICIENTS
General case:
$$RSS = e_1^2 + \dots + e_n^2 = (Y_1 - b_1 - b_2 X_1)^2 + \dots + (Y_n - b_1 - b_2 X_n)^2$$
Numerical example:
$$RSS = e_1^2 + e_2^2 + e_3^2 = (3 - b_1 - b_2)^2 + (5 - b_1 - 2 b_2)^2 + (6 - b_1 - 3 b_2)^2$$
RSS, the sum of the squares of the residuals, is defined for the general case. The data for the numerical example are shown for comparison.
17
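DERIVING LINEAR REGRESSION COEFFICIENTS
General case:
$$\begin{aligned}
RSS &= (Y_1 - b_1 - b_2 X_1)^2 + \dots + (Y_n - b_1 - b_2 X_n)^2 \\
&= Y_1^2 + b_1^2 + b_2^2 X_1^2 - 2 b_1 Y_1 - 2 b_2 X_1 Y_1 + 2 b_1 b_2 X_1 \\
&\quad + \dots \\
&\quad + Y_n^2 + b_1^2 + b_2^2 X_n^2 - 2 b_1 Y_n - 2 b_2 X_n Y_n + 2 b_1 b_2 X_n
\end{aligned}$$
Numerical example:
$$\begin{aligned}
RSS &= 9 + b_1^2 + b_2^2 - 6 b_1 - 6 b_2 + 2 b_1 b_2 \\
&\quad + 25 + b_1^2 + 4 b_2^2 - 10 b_1 - 20 b_2 + 4 b_1 b_2 \\
&\quad + 36 + b_1^2 + 9 b_2^2 - 12 b_1 - 36 b_2 + 6 b_1 b_2
\end{aligned}$$
The quadratics are expanded.
18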
DERIVING LINEAR REGRESSION COEFFICIENTS
General case:
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
Numerical example:
$$RSS = 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2$$
Like terms are added together.
19
DERIVING LINEAR REGRESSION COEFFICIENTS
General case:
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
Numerical example:
$$RSS = 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2$$
Note that in this equation the observations on X and Y are just data that determine the coefficients in the expression for RSS.
20
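To see concretely how the data determine the coefficients, the sums in the general expression can be computed for the three observations of the numerical example. This is a minimal sketch; the function name and the dictionary keys are illustrative only.

```python
# Observations (X, Y) from the numerical example.
DATA = [(1, 3), (2, 5), (3, 6)]


def rss_coefficients(data):
    """Coefficients of RSS, viewed as a polynomial in b1 and b2."""
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_xx = sum(x * x for x, _ in data)
    sum_yy = sum(y * y for _, y in data)
    sum_xy = sum(x * y for x, y in data)
    return {
        "constant": sum_yy,   # sum of Y_i^2
        "b1^2": n,            # n
        "b2^2": sum_xx,       # sum of X_i^2
        "b1": -2 * sum_y,     # -2 * sum of Y_i
        "b2": -2 * sum_xy,    # -2 * sum of X_i * Y_i
        "b1*b2": 2 * sum_x,   # 2 * sum of X_i
    }


for term, coefficient in rss_coefficients(DATA).items():
    print(term, coefficient)
```

The resulting values, 70, 3, 14, -28, -62 and 12, are the coefficients of the numerical expression for RSS.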
DERIVING LINEAR REGRESSION COEFFICIENTS
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
The choice variables in the expression are b1 and b2. This may seem a bit strange because in elementary calculus courses b1 and b2 are usually constants and X and Y are variables.
21
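DERIVING LINEAR REGRESSION COEFFICIENTS
General case:
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
Numerical example:
$$RSS = 70 + 3 b_1^2 + 14 b_2^2 - 28 b_1 - 62 b_2 + 12 b_1 b_2$$
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6 b_1 + 12 b_2 - 28 = 0, \qquad \frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12 b_1 + 28 b_2 - 62 = 0$$
$$b_1 = 1.67, \qquad b_2 = 1.50$$
However, if you have any doubts, compare what we are doing in the general case with what we did in the numerical example.
22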
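DERIVING LINEAR REGRESSION COEFFICIENTS
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 2 n b_1 - 2 \sum Y_i + 2 b_2 \sum X_i = 0$$
Numerical example:
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 6 b_1 + 12 b_2 - 28 = 0$$
The first derivative with respect to b1.
23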
DERIVING LINEAR REGRESSION COEFFICIENTS
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 2 n b_1 - 2 \sum Y_i + 2 b_2 \sum X_i = 0$$
$$n b_1 = \sum Y_i - b_2 \sum X_i \;\Rightarrow\; b_1 = \bar{Y} - b_2 \bar{X}$$
With some simple manipulation we obtain a tidy expression for b1.
24
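As a quick check of the expression for b1 against the numerical example (a worked sketch using the three observations (1,3), (2,5), (3,6) and the value of b2 found there, 1.50):

```latex
\bar{X} = \frac{1 + 2 + 3}{3} = 2, \qquad
\bar{Y} = \frac{3 + 5 + 6}{3} = \frac{14}{3}, \qquad
b_1 = \bar{Y} - b_2 \bar{X} = \frac{14}{3} - 1.50 \times 2 = \frac{5}{3} \approx 1.67
```

This agrees with the value of b1 obtained by solving the two first-order conditions directly.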
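DERIVING LINEAR REGRESSION COEFFICIENTS
$$RSS = \sum Y_i^2 + n b_1^2 + b_2^2 \sum X_i^2 - 2 b_1 \sum Y_i - 2 b_2 \sum X_i Y_i + 2 b_1 b_2 \sum X_i$$
$$\frac{\partial RSS}{\partial b_1} = 0 \;\Rightarrow\; 2 n b_1 - 2 \sum Y_i + 2 b_2 \sum X_i = 0 \;\Rightarrow\; b_1 = \bar{Y} - b_2 \bar{X}$$
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 2 b_2 \sum X_i^2 - 2 \sum X_i Y_i + 2 b_1 \sum X_i = 0$$
Numerical example:
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 12 b_1 + 28 b_2 - 62 = 0$$
The first derivative with respect to b2.
25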
SIMPLE REGRESSION ANALYSIS
$$b_1 = \bar{Y} - b_2 \bar{X}$$
$$\frac{\partial RSS}{\partial b_2} = 0 \;\Rightarrow\; 2 b_2 \sum X_i^2 - 2 \sum X_i Y_i + 2 b_1 \sum X_i = 0$$
$$b_2 \sum X_i^2 - \sum X_i Y_i + b_1 \sum X_i = 0$$
Divide through by 2.
26
SIMPLE REGRESSION ANALYSIS
$$b_1 = \bar{Y} - b_2 \bar{X}$$
$$b_2 \sum X_i^2 - \sum X_i Y_i + b_1 \sum X_i = 0$$
$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X}) \sum X_i = 0$$
We now substitute for b1 using the expression obtained for it and we thus obtain an equation that contains b2 only.
27
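SIMPLE REGRESSION ANALYSIS
$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X}) \sum X_i = 0$$
$$\bar{X} = \frac{\sum X_i}{n} \;\Rightarrow\; \sum X_i = n \bar{X}$$
$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X}) \, n \bar{X} = 0$$
The definition of the sample mean has been used.
28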
SIMPLE REGRESSION ANALYSIS
$$b_2 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - b_2 \bar{X}) \, n \bar{X} = 0$$
$$b_2 \sum X_i^2 - \sum X_i Y_i + n \bar{X} \bar{Y} - b_2 n \bar{X}^2 = 0$$
The last two terms have been disentangled.
29
SIMPLE REGRESSION ANALYSIS
$$b_2 \sum X_i^2 - \sum X_i Y_i + n \bar{X} \bar{Y} - b_2 n \bar{X}^2 = 0$$
$$b_2 \left( \sum X_i^2 - n \bar{X}^2 \right) = \sum X_i Y_i - n \bar{X} \bar{Y}$$
Terms not involving b2 have been transferred to the right side.
30
SIMPLE REGRESSION ANALYSIS
$$b_2 \left( \sum X_i^2 - n \bar{X}^2 \right) = \sum X_i Y_i - n \bar{X} \bar{Y}$$
To create space, the equation is shifted to the top of the slide.
31
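SIMPLE REGRESSION ANALYSIS
$$b_2 \left( \sum X_i^2 - n \bar{X}^2 \right) = \sum X_i Y_i - n \bar{X} \bar{Y}$$
$$b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}$$
Hence we obtain an expression for b2.
32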
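Applied to the numerical example, the expression for b2 reproduces the value found earlier. A worked sketch using the three observations (1,3), (2,5), (3,6), for which n = 3, the sum of X_iY_i is 31, the sum of X_i^2 is 14, and the sample means are 2 and 14/3:

```latex
b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
    = \frac{31 - 3 \times 2 \times \tfrac{14}{3}}{14 - 3 \times 2^2}
    = \frac{31 - 28}{14 - 12}
    = \frac{3}{2} = 1.50
```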
SIMPLE REGRESSION ANALYSIS
$$b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}$$
$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
In practice, we shall use an alternative expression. We will demonstrate that it is equivalent.
33
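SIMPLE REGRESSION ANALYSIS
$$b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}, \qquad b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \sum X_i \bar{Y} - \sum \bar{X} Y_i + \sum \bar{X} \bar{Y}$$
Expanding the numerator, we obtain the terms shown.
34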
SIMPLE REGRESSION ANALYSIS
$$\begin{aligned}
\sum (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum X_i Y_i - \sum X_i \bar{Y} - \sum \bar{X} Y_i + \sum \bar{X} \bar{Y} \\
&= \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n \bar{X} \bar{Y}
\end{aligned}$$
In the second term the mean value of Y is a common factor. In the third, the mean value of X is a common factor. The last term is the same for all i.
35
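SIMPLE REGRESSION ANALYSIS
$$\bar{X} = \frac{\sum X_i}{n} \;\Rightarrow\; \sum X_i = n \bar{X}$$
$$\begin{aligned}
\sum (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n \bar{X} \bar{Y} \\
&= \sum X_i Y_i - \bar{Y} \, n \bar{X} - \bar{X} \, n \bar{Y} + n \bar{X} \bar{Y}
\end{aligned}$$
We use the definitions of the sample means to simplify the expression.
36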
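SIMPLE REGRESSION ANALYSIS
$$\begin{aligned}
\sum (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum X_i Y_i - \sum X_i \bar{Y} - \sum \bar{X} Y_i + \sum \bar{X} \bar{Y} \\
&= \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n \bar{X} \bar{Y} \\
&= \sum X_i Y_i - \bar{Y} \, n \bar{X} - \bar{X} \, n \bar{Y} + n \bar{X} \bar{Y} \\
&= \sum X_i Y_i - n \bar{X} \bar{Y}
\end{aligned}$$
Hence we have shown that the numerators of the two expressions are the same.
37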
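SIMPLE REGRESSION ANALYSIS
$$b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n \bar{X} \bar{Y}, \qquad \sum (X_i - \bar{X})^2 = \sum X_i^2 - n \bar{X}^2$$
The denominator is mathematically a special case of the numerator, replacing Y by X. Hence the expressions are equivalent.
38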
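The equivalence of the two expressions for b2 can also be checked numerically. The following is a minimal sketch, assuming integer (or Fraction) data so that the arithmetic is exact; the function names are illustrative and not taken from the source.

```python
from fractions import Fraction


def slope_raw_sums(xs, ys):
    """b2 = (sum X_i Y_i - n * Xbar * Ybar) / (sum X_i^2 - n * Xbar^2)."""
    n = len(xs)
    x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar**2
    return numerator / denominator


def slope_deviations(xs, ys):
    """b2 = sum (X_i - Xbar)(Y_i - Ybar) / sum (X_i - Xbar)^2."""
    n = len(xs)
    x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def intercept(xs, ys, b2):
    """b1 = Ybar - b2 * Xbar."""
    n = len(xs)
    return Fraction(sum(ys), n) - b2 * Fraction(sum(xs), n)


# The three observations from the numerical example.
xs, ys = [1, 2, 3], [3, 5, 6]
b2 = slope_deviations(xs, ys)
print(slope_raw_sums(xs, ys), b2, intercept(xs, ys, b2))
```

Both functions divide by a quantity that is zero when all the X values are identical, in which case the slope is not defined and a ZeroDivisionError is raised.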
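DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the general scatter diagram with a fitted line of intercept b1 and slope b2, and the fitted values for the first and last observations marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$\hat{Y}_1 = b_1 + b_2 X_1, \qquad \hat{Y}_n = b_1 + b_2 X_n$$
The scatter diagram is shown again. We will summarize what we have done. We hypothesized that the true model is as shown, we obtained some data, and we fitted a line.
39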
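DERIVING LINEAR REGRESSION COEFFICIENTS
[Figure: the general scatter diagram with a fitted line of intercept b1 and slope b2, and the fitted values for the first and last observations marked.]
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted line: $\hat{Y} = b_1 + b_2 X$
$$b_1 = \bar{Y} - b_2 \bar{X}, \qquad b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for b1 and b2.
40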
DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
Typically, an intercept should be included in the regression specification. Occasionally, however, one may have reason to fit the regression without an intercept. In the case of a simple regression model, the true and fitted models become as shown.
41
DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
$$e_i = Y_i - \hat{Y}_i = Y_i - b_2 X_i$$
We will derive the expression for b2 from first principles using the least squares criterion. The residual in observation i is $e_i = Y_i - b_2 X_i$.
42
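DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
$$e_i = Y_i - \hat{Y}_i = Y_i - b_2 X_i$$
$$RSS = \sum_{i=1}^{n} (Y_i - b_2 X_i)^2 = \sum_{i=1}^{n} Y_i^2 - 2 b_2 \sum_{i=1}^{n} X_i Y_i + b_2^2 \sum_{i=1}^{n} X_i^2$$
With this, we obtain the expression for the sum of the squares of the residuals.
43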
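DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
$$e_i = Y_i - \hat{Y}_i = Y_i - b_2 X_i$$
$$RSS = \sum_{i=1}^{n} (Y_i - b_2 X_i)^2 = \sum_{i=1}^{n} Y_i^2 - 2 b_2 \sum_{i=1}^{n} X_i Y_i + b_2^2 \sum_{i=1}^{n} X_i^2$$
$$\frac{\mathrm{d} RSS}{\mathrm{d} b_2} = 2 b_2 \sum_{i=1}^{n} X_i^2 - 2 \sum_{i=1}^{n} X_i Y_i = 0$$
Differentiating with respect to b2, we obtain the first-order condition for a minimum.
44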
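DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
$$RSS = \sum_{i=1}^{n} (Y_i - b_2 X_i)^2 = \sum_{i=1}^{n} Y_i^2 - 2 b_2 \sum_{i=1}^{n} X_i Y_i + b_2^2 \sum_{i=1}^{n} X_i^2$$
$$\frac{\mathrm{d} RSS}{\mathrm{d} b_2} = 2 b_2 \sum_{i=1}^{n} X_i^2 - 2 \sum_{i=1}^{n} X_i Y_i = 0$$
$$b_2 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$
Hence, we obtain the OLS estimator of b2 for this model.
45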
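DERIVING LINEAR REGRESSION COEFFICIENTS
True model: $Y = \beta_2 X + u$
Fitted line: $\hat{Y} = b_2 X$
$$RSS = \sum_{i=1}^{n} (Y_i - b_2 X_i)^2 = \sum_{i=1}^{n} Y_i^2 - 2 b_2 \sum_{i=1}^{n} X_i Y_i + b_2^2 \sum_{i=1}^{n} X_i^2$$
$$\frac{\mathrm{d} RSS}{\mathrm{d} b_2} = 2 b_2 \sum_{i=1}^{n} X_i^2 - 2 \sum_{i=1}^{n} X_i Y_i = 0
\;\Rightarrow\; b_2 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$
$$\frac{\mathrm{d}^2 RSS}{\mathrm{d} b_2^2} = 2 \sum_{i=1}^{n} X_i^2 > 0$$
The second derivative is positive, confirming that we have found a minimum.
46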
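The estimator for the model without an intercept is simple to compute. The sketch below is illustrative only: it reuses the three observations from the earlier numerical example, which the slides do not themselves fit without an intercept, and the function name is hypothetical. Integer inputs are assumed so that the result is an exact fraction.

```python
from fractions import Fraction


def slope_through_origin(xs, ys):
    """OLS slope for the fitted line Y-hat = b2 * X (no intercept)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("all X values are zero, so b2 is not defined")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return Fraction(sum_xy, sum_xx)


# Illustrative data: the observations (1,3), (2,5), (3,6) used earlier.
xs, ys = [1, 2, 3], [3, 5, 6]
b2 = slope_through_origin(xs, ys)

# Second derivative of RSS with respect to b2: 2 * sum of X_i^2.
second_derivative = 2 * sum(x * x for x in xs)
print(b2, second_derivative)
```

For these data the sums are 31 and 14, so b2 = 31/14, and the second derivative, 28, is positive as required for a minimum.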
Copyright Christopher Dougherty 2011.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 1.3 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might
benefit from participation in a formal course should consider the London School
of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
20 Elements of Econometrics
www.londoninternational.ac.uk/lse.
11.07.25