2 The simple regression model
Version: 29-09-2011
Ezequiel Uriel
2.1 Definition of the simple regression model
Population model
In the simple regression model, the reference model or population model is the
following:
y = β₀ + β₁x + u    (1)
We are going to look at the different elements of the model (1) and the
terminology used to designate them. We are going to consider that the three variables of
the model (y, x and u) are random variables. In this model x is the only factor that explains y. All the other factors that affect y are captured by u.
We typically refer to y as the endogenous (from the Greek: generated inside)
variable or dependent variable. Other denominations are also used to designate y: left-
hand side variable, explained variable, or regressand. In this model all these
denominations are equivalent, but in other models, as we will see later, there can be
some differences.
In the simple linear regression of y on x, we typically refer to x as the exogenous (from the Greek: generated outside) variable or independent variable. Other denominations are also used to designate x: right-hand side variable, explanatory variable, regressor, covariate, or control variable. All these denominations are equivalent, but in other models, as we will see later, there can be some differences.
The variable u represents factors other than x that affect y. It is denominated
error or random disturbance. The error term can also capture measurement error in the
dependent variable. The error is an unobservable variable.
The parameters β₀ and β₁ are fixed and unknown.
Population regression function
In order to define the population regression function, we are previously going to introduce two statistical assumptions.
The first assumption to introduce is that the mean value of the error in the population is 0. That is to say, the expectation of u is 0:
E(u) = 0    (2)
This is not a restrictive assumption, since we can always use β₀ to normalize E(u) to 0. Let us suppose, for example, that E(u) = 4; then we could redefine the model in the following way:
y = (β₀ + 4) + β₁x + v
where v = u − 4. Therefore, the expectation of the new error, v, is 0, and the expectation of u has been absorbed by the intercept.
The second assumption is about how x and u are related. Concretely, this assumption states that
E(ux) = 0    (3)
This is a crucial assumption in regression analysis. If this assumption were not acceptable, it would be difficult to separate the effects of x and u on y.
Taking into account the two assumptions together, we have
E(u|x) = E(u) = 0    (4)
which implies
E(y|x) = β₀ + β₁x    (5)
This equation is known as the population regression function (PRF) or population line. Therefore, E(y|x), as can be seen in figure 2.1, is a linear function of x with intercept β₀ and slope β₁.
Figure 2.1. The population regression function (PRF)
The linearity means that a one-unit increase in x changes the expected value of y by the amount β₁.
Now, let us suppose we have a random sample of size n, {(yᵢ, xᵢ): i = 1, 2, …, n}, from the population. In figure 2.2 the scatter diagram corresponding to these data is displayed.
Figure 2.2. The scatter diagram
We can express the population model for each observation of the sample:
yᵢ = β₀ + β₁xᵢ + uᵢ    i = 1, 2, …, n    (6)
In figure 2.3 the population regression function and the scatter diagram are put together, but it is important to keep in mind that β₀ and β₁ are fixed, but unknown. According to the model it is possible, from a theoretical point of view, to make the following decomposition:
yᵢ = E(yᵢ|xᵢ) + uᵢ    i = 1, 2, …, n    (7)
which has been represented in figure 2.3 for observation i. However, from an empirical point of view, it is not possible to do it, because β₀ and β₁ are unknown parameters and uᵢ is not observable.
Figure 2.3. The population regression function and the scatter diagram
Sample regression function
The basic idea of regression is to estimate the population parameters, β₀ and β₁, from a given sample.
The sample regression function (SRF) is the sample counterpart of the population regression function (PRF). Since the SRF is obtained for a given sample, a new sample will generate different estimates.
The SRF, which is an estimate of the PRF, given by
ŷᵢ = β̂₀ + β̂₁xᵢ    (8)
allows us to calculate the fitted value (ŷᵢ) for y when x = xᵢ. In the SRF, β̂₀ and β̂₁ are estimators of the parameters β₀ and β₁.
For each xᵢ we have an observed value (yᵢ) and a fitted value (ŷᵢ).
We call the difference between yᵢ and ŷᵢ the residual (ûᵢ). That is,
ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ    (9)
In other words, the residual ûᵢ is the difference between the sample point and the fitted line, as can be seen in figure 2.4. In this case, it is possible to calculate the decomposition
yᵢ = ŷᵢ + ûᵢ
for a given sample.
Figure 2.4. The sample regression function and the scatter diagram
To sum up, β̂₀, β̂₁, ŷᵢ and ûᵢ are the sample counterparts of β₀, β₁, E(y|x) and uᵢ respectively. It is possible to calculate β̂₀ and β̂₁ for a given sample, but for each sample the estimates will change. On the contrary, β₀ and β₁ are fixed, but unknown.
2.2 Deriving the Ordinary Least Squares Estimates
Different criteria of estimation
Before deriving the ordinary least squares estimates, we are going to examine
three alternative methods to illustrate the problem which we face. These three methods
have in common that they try to minimize, in some way, the value of the residuals as a
whole.
Criterion 1
A first criterion would consist of taking as estimators those values of β̂₀ and β̂₁ that make the sum of all the residuals as near to zero as possible. With this criterion the expression to minimize would be the following:
Min. Σᵢ₌₁ⁿ ûᵢ    (10)
The main problem of this method of estimation is that residuals of different sign can compensate each other. Such a situation can be observed graphically in figure 2.5, in which three aligned observations, (x₁, y₁), (x₂, y₂) and (x₃, y₃), are graphed. In this case the following happens:
(y₂ − y₁)/(x₂ − x₁) = (y₃ − y₁)/(x₃ − x₁)
Figure 2.5. The problems of criterion 1.
If a straight line is fitted so that it crosses through the three points, each one of the residuals will take the value zero, so that
Σᵢ₌₁³ ûᵢ = 0
This fit could be considered optimal. But it is also possible to obtain Σᵢ₌₁³ ûᵢ = 0 by rotating the straight line around the point (x₂, y₂) in any direction, as figure 2.5 shows, because û₃ = −û₁. In other words, by rotating in this way the result Σᵢ₌₁³ ûᵢ = 0 is always obtained. This simple example shows us that this criterion is not appropriate for the estimation of the parameters because, for any set of observations, infinitely many straight lines exist that satisfy it.
Criterion 2
A way to avoid the compensation of positive residuals with negative ones consists of taking the absolute values of the residuals. In this case the following expression would be minimized:
Min. Σᵢ₌₁ⁿ |ûᵢ|    (11)
Unfortunately, although the estimators thus obtained have some interesting properties, their calculation is complicated, requiring the solution of a linear programming problem or the application of an iterative calculation procedure.
Criterion 3
A third method consists of minimizing the sum of the squared residuals, that is to say, minimizing
Min. S = Min. Σᵢ₌₁ⁿ ûᵢ²    (12)
The estimators obtained are denominated least squares estimators, and they enjoy certain desirable statistical properties, which we will study later. On the one hand, as opposed to the first of the examined criteria, when we square the residuals their compensation is avoided; on the other hand, unlike the second criterion, the least squares estimators are simple to obtain. It is important to indicate that, from the moment we square the residuals, we penalize large residuals proportionally more than small ones (if a residual is double another one, its square will be four times greater), which also characterizes least squares estimation with respect to other possible methods.
Application of least square criterion
Now, we are going to look at the process to obtain the least squares estimators. The objective is to minimize the sum of squared residuals (S). To do this, in the first place we are going to express S as a function of the estimators, using (9). Therefore, we must solve
min_{β̂₀,β̂₁} S = min_{β̂₀,β̂₁} Σᵢ₌₁ⁿ ûᵢ² = min_{β̂₀,β̂₁} Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²    (13)
To minimize S, we differentiate partially with respect to β̂₀ and β̂₁:
∂S/∂β̂₀ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)
∂S/∂β̂₁ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ
The least squares estimators are obtained by setting the previous derivatives equal to 0:
Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (14)
Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0    (15)
Equations (14) and (15) are denominated the normal equations or LS first order conditions.
In operations with summations, the following rules must be taken into account:
Σᵢ₌₁ⁿ a = na
Σᵢ₌₁ⁿ a·xᵢ = a Σᵢ₌₁ⁿ xᵢ
Σᵢ₌₁ⁿ (xᵢ + yᵢ) = Σᵢ₌₁ⁿ xᵢ + Σᵢ₌₁ⁿ yᵢ
Operating with the normal equations, we have
Σᵢ₌₁ⁿ yᵢ = nβ̂₀ + β̂₁ Σᵢ₌₁ⁿ xᵢ    (16)
Σᵢ₌₁ⁿ yᵢxᵢ = β̂₀ Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²    (17)
Dividing both sides of (16) by n, we have
ȳ = β̂₀ + β̂₁x̄    (18)
Therefore
β̂₀ = ȳ − β̂₁x̄    (19)
Substituting this value of β̂₀ in the second normal equation (17), we have
Σᵢ₌₁ⁿ yᵢxᵢ = (ȳ − β̂₁x̄) Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²
Σᵢ₌₁ⁿ yᵢxᵢ = ȳ Σᵢ₌₁ⁿ xᵢ − β̂₁x̄ Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²
Solving for β̂₁:
β̂₁ = (Σᵢ₌₁ⁿ yᵢxᵢ − ȳ Σᵢ₌₁ⁿ xᵢ) / (Σᵢ₌₁ⁿ xᵢ² − x̄ Σᵢ₌₁ⁿ xᵢ)    (20)
Taking into account that
Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) = Σᵢ₌₁ⁿ (yᵢxᵢ − ȳxᵢ − yᵢx̄ + ȳx̄) = Σᵢ₌₁ⁿ yᵢxᵢ − nȳx̄ = Σᵢ₌₁ⁿ yᵢxᵢ − ȳ Σᵢ₌₁ⁿ xᵢ
Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₁ⁿ (xᵢ² − 2xᵢx̄ + x̄²) = Σᵢ₌₁ⁿ xᵢ² − nx̄² = Σᵢ₌₁ⁿ xᵢ² − x̄ Σᵢ₌₁ⁿ xᵢ
then (20) can be expressed in the following way:
β̂₁ = Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ₌₁ⁿ (xᵢ − x̄)²    (21)
Once β̂₁ is calculated, we can obtain β̂₀ by using (19).
These are the least squares (LS) estimators. Since other, more complicated methods also called least squares methods exist, the method we have applied is denominated ordinary least squares (OLS), due to its simplicity.
In the preceding sections β̂₀ and β̂₁ have been used to designate generic estimators. From now on, this notation will designate only the OLS estimators.
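The whole computation in (19) and (21) uses only sample sums, so it can be checked in a few lines of code. A minimal Python sketch; the sample below is hypothetical, chosen only to illustrate the formulas:

```python
# OLS estimates from the closed-form expressions (19) and (21).
# The sample is hypothetical, used only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Equation (21): slope = sum (y_i - y_bar)(x_i - x_bar) / sum (x_i - x_bar)^2
beta1_hat = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
            / sum((xi - x_bar) ** 2 for xi in x)

# Equation (19): intercept = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # fitted line: y_hat = beta0_hat + beta1_hat * x
```

Note that (19) and (21) are exact formulas, not an iterative search, so any statistical package reproduces the same two numbers for the same sample.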
An alternative approach to estimation
We are going to look at an alternative method to obtain estimators: the method of moments. This method is a general method to estimate econometric models.
The main idea behind this method is quite intuitive and very simple. The
assumptions in an econometric model imply some restrictions on the means, variances
or covariances (i.e., moments) of the population. In the method of moments the
estimates are the values that satisfy these restrictions when we use sample means,
variances and covariances, instead of the population ones.
In the presentation of the population model in simple regression, we introduced assumptions (2) and (3), which are restrictions on the model. Taking into account that
u = y − β₀ − β₁x
we can express our two restrictions just in terms of x, y, β₀ and β₁:
E(y − β₀ − β₁x) = 0    (22)
E[(y − β₀ − β₁x)x] = 0    (23)
The sample restrictions corresponding to (22) and (23) are the following:
(1/n) Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (24)
(1/n) Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0    (25)
From the two previous equations the method of moments estimators β̂₀ and β̂₁ are obtained.
But these two equations are precisely equal to the two normal equations, (14) and (15), obtained by minimizing the sum of squared residuals. In any case, it must
be kept in mind that the estimators obtained by the method of moments are conditional on the restrictions imposed on the model. Consequently, the equality between the two approaches is due to the two assumptions considered.
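Conditions (24) and (25) are two linear equations in β̂₀ and β̂₁, so they can be solved directly, for example by Cramer's rule. A Python sketch on a hypothetical sample, confirming that the method of moments reproduces the OLS slope of (21):

```python
# Solve the sample moment conditions (24)-(25) as a 2x2 linear system:
#   n*b0          + (sum x_i)*b1   = sum y_i
#   (sum x_i)*b0  + (sum x_i^2)*b1 = sum x_i*y_i
# The data are hypothetical, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Cramer's rule for the 2x2 system.
det = n * sxx - sx * sx
b0 = (sy * sxx - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# The OLS formula (21) gives the same slope.
x_bar, y_bar = sx / n, sy / n
b1_ols = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
print(b0, b1)
```

The agreement is not a numerical coincidence: the two moment conditions are algebraically identical to the normal equations (14) and (15).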
2.3 Properties of OLS on any sample of data
Algebraic properties of OLS estimators
The algebraic properties of OLS are properties derived exclusively from the
application of the method of OLS:
1. The sum of the OLS residuals is equal to 0:
Σᵢ₌₁ⁿ ûᵢ = 0    (26)
From the definition of the residual,
ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ    i = 1, 2, …, n    (27)
If we sum over the n observations, we get
Σᵢ₌₁ⁿ ûᵢ = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (28)
which is precisely the first equation (14) of the system of normal equations.
Note that, if (26) holds, it implies that
Σᵢ₌₁ⁿ yᵢ = Σᵢ₌₁ⁿ ŷᵢ    (29)
and, dividing (26) and (29) by n, we obtain
(1/n) Σᵢ₌₁ⁿ ûᵢ = 0  and  ȳ = ŷ̄    (30)
2. The OLS line always goes through the mean point of the sample, (x̄, ȳ).
Effectively, dividing equation (16) by n, we have:
ȳ = β̂₀ + β̂₁x̄    (31)
3. The sample covariance between the regressor and the OLS residuals is zero.
The sample covariance between the regressor and the OLS residuals is equal to
S_xû = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)(ûᵢ − û̄) = (1/n) Σᵢ₌₁ⁿ xᵢûᵢ − x̄û̄ = (1/n) Σᵢ₌₁ⁿ xᵢûᵢ
since û̄ = 0. Therefore, if
(1/n) Σᵢ₌₁ⁿ xᵢûᵢ = 0    (32)
then the sample covariance between the regressor and the OLS residuals is equal to 0.
It can be seen that (32) is equivalent to the second normal equation,
Σᵢ₌₁ⁿ xᵢûᵢ = Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
given in (15).
Therefore, the sample covariance between the regressor and the OLS residuals is
zero.
4. The sample covariance between the fitted values (ŷᵢ) and the OLS residuals (ûᵢ) is zero.
Analogously to property 3, the sample covariance between the fitted values (ŷᵢ) and the OLS residuals (ûᵢ) is equal to 0 if the following holds:
Σᵢ₌₁ⁿ ŷᵢûᵢ = 0    (33)
Proof
Taking into account the algebraic properties 1, given in (26), and 3, given in (32), we have
S_ŷû = (1/n) Σᵢ₌₁ⁿ ŷᵢûᵢ = (1/n) Σᵢ₌₁ⁿ (β̂₀ + β̂₁xᵢ)ûᵢ = β̂₀ (1/n) Σᵢ₌₁ⁿ ûᵢ + β̂₁ (1/n) Σᵢ₌₁ⁿ xᵢûᵢ = β̂₀·0 + β̂₁·0 = 0
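Properties 1, 3 and 4, i.e. (26), (32) and (33), hold exactly for any sample, because they restate the normal equations. A quick numerical check in Python on a hypothetical sample:

```python
# Numerical check of the algebraic OLS properties (26), (32) and (33)
# on a hypothetical sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]       # residuals

sum_u = sum(u_hat)                                  # property 1: 0
sum_xu = sum(xi * ui for xi, ui in zip(x, u_hat))   # property 3: 0
sum_yhu = sum(yh * ui for yh, ui in zip(y_hat, u_hat))  # property 4: 0
print(sum_u, sum_xu, sum_yhu)
```

All three sums come out as zero up to floating-point rounding, regardless of the data used, as long as the model includes an intercept.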
Decomposition of the variance of y
By definition
yᵢ = ŷᵢ + ûᵢ    (34)
Subtracting ȳ from both sides (remember that ȳ is equal to ŷ̄), we have
yᵢ − ȳ = (ŷᵢ − ȳ) + ûᵢ
Squaring both sides:
(yᵢ − ȳ)² = (ŷᵢ − ȳ)² + ûᵢ² + 2ûᵢ(ŷᵢ − ȳ)
Summing over all i:
Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ ûᵢ² + 2 Σᵢ ûᵢ(ŷᵢ − ȳ)
Taking into account the algebraic properties 1 and 4, the third term of the right-hand side is equal to 0. Analytically,
Σᵢ ûᵢ(ŷᵢ − ȳ) = Σᵢ ûᵢŷᵢ − ȳ Σᵢ ûᵢ = 0    (35)
Therefore, we have
Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ ûᵢ²    (36)
It must be stressed that it is necessary to use relation (26) to ensure that (35) is equal to 0. We must remember that (26) is associated with the first normal equation, that is to say, with the equation corresponding to the intercept. If the fitted model has no intercept, then in general the decomposition (36) will not hold.
In words,
total sum of squares (SST) = explained sum of squares (SSE) + residual sum of squares (SSR)
or,
SST = SSE + SSR
This decomposition can be done in terms of variances, dividing both sides of (36) by n:
(1/n) Σᵢ (yᵢ − ȳ)² = (1/n) Σᵢ (ŷᵢ − ȳ)² + (1/n) Σᵢ ûᵢ²    (37)
In words,
total variance = explained variance + residual variance
Goodness of fit: the coefficient of determination (R²)
A priori we have obtained the estimators by minimizing the sum of squared residuals.
Now, once the estimation has been done, we can look at how well our sample regression line fits our data.
The statistics that indicate how well the sample regression line fits the data are denominated goodness-of-fit measures. We are going to look at the best-known measure, which is called the coefficient of determination or R-squared (R²). This measure is defined in the following way:
R² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²    (38)
Therefore, R² is the proportion of the total sum of squares that is explained by the regression, that is to say, by the model. We can also say that 100·R² is the percentage of the sample variation in y explained by x.
Alternatively, taking into account (36), we have:
Σᵢ (ŷᵢ − ȳ)² = Σᵢ (yᵢ − ȳ)² − Σᵢ ûᵢ²
Substituting in (38), we have
R² = [Σᵢ (yᵢ − ȳ)² − Σᵢ ûᵢ²] / Σᵢ (yᵢ − ȳ)² = 1 − Σᵢ ûᵢ² / Σᵢ (yᵢ − ȳ)²    (39)
Therefore, R² is equal to 1 minus the proportion of the total sum of squares that is not explained by the regression.
According to the definition of R², the following must hold:
0 ≤ R² ≤ 1
Extreme cases:
a) If we have a perfect fit, then ûᵢ = 0 for all i. This implies that
ŷᵢ = yᵢ ∀i ⟹ Σᵢ (ŷᵢ − ȳ)² = Σᵢ (yᵢ − ȳ)² ⟹ R² = 1
b) If ŷᵢ = c for all i, it implies that
ȳ = c ⟹ Σᵢ (ŷᵢ − ȳ)² = 0 ⟹ R² = 0
If R² is close to 0, it implies that we have a poor fit. In other words, very little variation in y is explained by x.
In many cases, a high R² is obtained when the model is fitted using time series data, due to the effect of a common trend. On the contrary, when we use cross-sectional data a low value is obtained in many cases, but this does not mean that the fitted model is bad.
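The decomposition SST = SSE + SSR and the two equivalent expressions (38) and (39) for R² can be verified numerically. A Python sketch on a hypothetical sample:

```python
# SST = SSE + SSR decomposition (36) and R-squared, equations (38)-(39).
# The sample is hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
sse = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
ssr = sum(ui ** 2 for ui in u_hat)            # residual sum of squares

r2 = sse / sst           # definition (38)
r2_alt = 1 - ssr / sst   # equivalent form (39)
print(r2)
```

Both expressions give the same value, and the decomposition holds exactly, because the fitted model includes an intercept.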
Coefficient of determination and coefficient of correlation
In descriptive statistics, the coefficient of correlation is studied. It is calculated according to the following formula:
r_xy = cov(x, y) / √(var(x)·var(y))
     = [(1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄)] / √[(1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² · (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)²]
     = Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) / √[Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)²]    (40)
What is the relationship between the coefficient of determination and the coefficient of correlation?
Answer: the coefficient of determination is equal to the squared coefficient of correlation. Analytically,
R² = r²ₓᵧ    (41)
(This equality is valid in the simple regression model, but not in the multiple regression model.)
Proof
In the first place we are going to see an equivalence that will be used in the proof. By definition,
ŷᵢ = β̂₀ + β̂₁xᵢ
From the first normal equation, we have
ȳ = β̂₀ + β̂₁x̄
Subtracting the second equation from the first one:
ŷᵢ − ȳ = β̂₁(xᵢ − x̄)
Squaring both sides:
(ŷᵢ − ȳ)² = β̂₁²(xᵢ − x̄)²
and summing over all i, we have
Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² = β̂₁² Σᵢ₌₁ⁿ (xᵢ − x̄)²
Taking into account the previous equivalence, we have
R² = Σᵢ (ŷᵢ − ȳ)² / Σᵢ (yᵢ − ȳ)² = β̂₁² Σᵢ (xᵢ − x̄)² / Σᵢ (yᵢ − ȳ)²
   = [Σᵢ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ (xᵢ − x̄)²]² · Σᵢ (xᵢ − x̄)² / Σᵢ (yᵢ − ȳ)²
   = [Σᵢ (yᵢ − ȳ)(xᵢ − x̄)]² / [Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)²] = r²ₓᵧ
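Equality (41) is easy to confirm numerically: compute r_xy from (40) and R² from (38) on the same sample and compare. A Python sketch on hypothetical data:

```python
import math

# Check that R-squared equals the squared correlation coefficient, (41).
# The sample is hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r_xy = sxy / math.sqrt(sxx * syy)   # correlation coefficient (40)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
r2 = sum((yh - y_bar) ** 2 for yh in y_hat) / syy   # R-squared (38)
print(r_xy ** 2, r2)
```

With more than one regressor this equality no longer holds, which is why multiple regression uses the R² definition rather than a squared pairwise correlation.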
2.4 Units of measurement and functional form
Units of Measurement
Changing the units of measurement (changing of scale)
If x is multiplied/divided by a constant c ≠ 0, then the OLS slope is divided/multiplied by the same constant c. Thus
ŷᵢ = β̂₀ + (β̂₁/c)(c·xᵢ)    (42)
Example
Let us suppose the following estimated consumption function, in which both variables are measured in thousands of euros:
ĉonsᵢ = 0.2 + 0.85 incᵢ    (43)
If we now express income in euros (multiplying by 1000) and call it ince, the fitted model with the new units of measurement of income would be the following:
ĉonsᵢ = 0.2 + 0.00085 inceᵢ
As can be seen, changing the units of measurement of the explanatory variable does not affect the intercept.
If y is multiplied/divided by a constant c ≠ 0, then the OLS slope and intercept are both multiplied/divided by the same constant c. Thus,
c·ŷᵢ = (c·β̂₀) + (c·β̂₁)xᵢ    (44)
Example
If we express, in model (43), consumption in euros (multiplying by 1000) and call it conse, the fitted model with the new units of measurement of consumption would be the following:
ĉonseᵢ = 200 + 850 incᵢ
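The scaling rules (42) and (44) can be checked by refitting after rescaling. A Python sketch with a small reusable OLS helper; the income/consumption figures are hypothetical:

```python
# Effect of changing units of measurement on OLS estimates,
# checking rules (42) and (44). The data are hypothetical.
def ols(x, y):
    """Return (intercept, slope) from the closed-form OLS formulas."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((yi - yb) * (xi - xb) for xi, yi in zip(x, y)) \
         / sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

inc = [10.0, 15.0, 20.0, 25.0, 30.0]   # thousands of euros (hypothetical)
cons = [9.0, 13.0, 17.5, 21.0, 26.0]

b0, b1 = ols(inc, cons)
b0_e, b1_e = ols([1000 * v for v in inc], cons)   # income in euros
c0, c1 = ols(inc, [1000 * v for v in cons])       # consumption in euros
print(b0, b1, b1_e, c0, c1)
```

Rescaling x divides the slope by 1000 and leaves the intercept untouched; rescaling y multiplies both coefficients by 1000, exactly as (42) and (44) state.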
Changing the origin
If one adds/subtracts a constant d to/from x and/or y, the OLS slope is not affected. However, changing the origin of either x and/or y affects the intercept of the regression.
If one subtracts a constant d from x, the intercept will change in the following way:
ŷᵢ = (β̂₀ + β̂₁d) + β̂₁(xᵢ − d)    (45)
Example
Let us suppose that the average income is 20 thousand euros. If we define the variable incdᵢ = incᵢ − 20 and both variables are measured in thousands of euros, the fitted model with this change of origin will be the following:
ĉonsᵢ = (0.2 + 0.85·20) + 0.85(incᵢ − 20) = 17.2 + 0.85 incdᵢ
If one subtracts a constant d from y, the intercept will change in the following way:
ŷᵢ − d = (β̂₀ − d) + β̂₁xᵢ    (46)
Example
Let us suppose that the average consumption is 15 thousand euros. If we define the variable consdᵢ = consᵢ − 15 and both variables are measured in thousands of euros, the fitted model with this change of origin will be the following:
ĉonsᵢ − 15 = 0.2 − 15 + 0.85 incᵢ
that is to say,
ĉonsdᵢ = −14.8 + 0.85 incᵢ
Finally, note that R² is invariant to changes in the units of x and/or y, and is also invariant to changes in the origin of the variables.
Functional Form
Linear relationships are not adequate for many economic applications. However, we can incorporate many nonlinearities (in the variables) into simple regression analysis by appropriately redefining the dependent and independent variables.
Before examining the different functional forms, we are going to look at some definitions which will be useful in the exposition.
Some definitions
We are going to look at some definitions of variation measures that will be
useful in the interpretation of the coefficients corresponding to different functional
forms. Specifically, we will look at the following: absolute change, proportional change,
percentage change and changes in logarithms.
The absolute change between x₀ and x₁ is given by
Δx₁ = x₁ − x₀    (47)
This measure of variation is not much used by economists.
The proportional change (or relative rate of variation) between x₀ and x₁ is given by:
Δx₁/x₀ = (x₁ − x₀)/x₀    (48)
Multiplying a proportional change by 100, a percentage change is obtained. That is to say:
(Δx₁/x₀) × 100 %    (49)
In economic reports this is the most used measure.
The change in logarithms between x₀ and x₁ is given by
Δlog(x) = log(x₁) − log(x₀)    (50)
This is also a rate of variation, which is used in economic research. The relationship between the proportional change and the change in logarithms can be seen if we expand (50) in a Taylor series around x₁/x₀ = 1:
log(x₁) − log(x₀) = log(x₁/x₀)
  = log(1) + (x₁/x₀ − 1) − (1/2)(x₁/x₀ − 1)² + (1/3)(x₁/x₀ − 1)³ − …
  = Δx₁/x₀ − (1/2)(Δx₁/x₀)² + (1/3)(Δx₁/x₀)³ − …    (51)
Therefore, if we take the linear approximation in this expansion, we have
Δlog(x) = log(x₁) − log(x₀) = log(x₁/x₀) ≈ Δx₁/x₀    (52)
This approximation is good when the proportional change is small, but the differences can be important when the proportional change is big, as can be seen in table 1.
Table 1. Examples of proportional change and change in logarithms
x₁                        202     210     220     240     300
x₀                        200     200     200     200     200
Proportional change       0.010   0.050   0.100   0.200   0.500
Change in logarithms      0.010   0.049   0.095   0.182   0.405
Proportional change %     1.0%    5.0%    10.0%   20.0%   50.0%
Change in logarithms %    1.0%    4.9%    9.5%    18.2%   40.5%
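Table 1 can be reproduced with a few lines of Python; (48) gives the proportional change and (50) the change in logarithms:

```python
import math

# Proportional change (48) vs change in logarithms (50), as in table 1.
x0 = 200.0
rows = []
for x1 in [202.0, 210.0, 220.0, 240.0, 300.0]:
    prop = (x1 - x0) / x0               # proportional change
    dlog = math.log(x1) - math.log(x0)  # change in logarithms
    rows.append((x1, round(prop, 3), round(dlog, 3)))

for x1, prop, dlog in rows:
    print(x1, prop, dlog)
```

The two measures agree to three decimals for the 1% change and drift apart as the change grows, which is exactly the pattern shown in table 1.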
Elasticity is the ratio of the relative changes of two variables.
If we use proportional changes, the elasticity of the variable y with respect to the variable x is given by
ε_{y/x} = (Δy/y₀) / (Δx/x₀)    (53)
If we use changes in logarithms and consider infinitesimal changes, the elasticity of the variable y with respect to the variable x is given by
ε_{y/x} = (dy/y) / (dx/x) = d log(y) / d log(x)    (54)
In general, in econometric models elasticity is defined by using (54).
Alternative functional forms
The OLS method can be applied without any problem to models in which transformations of the endogenous variable and/or of the exogenous variable have been made. In the presentation of the model (1) we said that exogenous variable and regressor were equivalent terms. But from now on, the regressor is the specific form in which an exogenous variable appears in the equation. For example, in the model
y = β₀ + β₁ log(x) + u
the exogenous variable is x, but the regressor is log(x).
In the presentation of the model (1) we also said that endogenous variable and regressand were equivalent. But from now on, the regressand is the specific form in which an endogenous variable appears in the equation. For example, in the model
log(y) = β₀ + β₁x + u
the endogenous variable is y, but the regressand is log(y).
The two previous models are linear in the parameters, although they are not linear in the variable x (the first one) or in the variable y (the second one). In any case, if a model is linear in the parameters, it can be estimated by applying the OLS method. On the contrary, if a model is not linear in the parameters, iterative methods must be used in the estimation. However, there are certain nonlinear models that, by means of suitable transformations, can become linear. These models are denominated linearizable.
Thus, on some occasions potential (power) models are postulated in economic theory, as in the well-known Cobb-Douglas production function. A potential model with a unique explanatory variable is given by
y = A·x^β₁
If we introduce the error term in the following way
y = A·x^β₁·eᵘ    (55)
then, taking natural logarithms on both sides of (55), we obtain a model that is linear in the parameters:
log(y) = β₀ + β₁ log(x) + u    (56)
where we have called β₀ = log(A).
On the contrary, if we introduce the error term in the following way
y = A·x^β₁ + u
then there is no transformation that turns this model into a linear one. This is a non-linearizable model.
Now, we are going to consider some models with alternative functional forms, all of them linear in the parameters. We will look at the interpretation of the coefficient β₁ in each case.
a) Linear model
In this model, defined in (1), if the other factors included in u are held fixed, so that the change in u is zero (Δu = 0), then x has a linear effect on y:
Δy = β₁Δx  if Δu = 0
Therefore, β₁ is the change in y (in the units in which y is measured) per unit change in x (in the units in which x is measured).
For example, in the fitted function (43), if income increases by 1 unit, consumption will increase by 0.85 units.
The linearity of this model implies that a one-unit change in x always has the same effect on y, regardless of the value of x considered.
b) Linear-log model
A linear-log model is given by
y = β₀ + β₁ log(x) + u    (57)
Taking first differences in (57) and setting Δu = 0, we have
Δy = β₁ Δlog(x)  if Δu = 0
Therefore, if x increases by 1%, then y will increase by β₁/100 units, with Δu = 0.
c) Log-linear model
A log-linear model is given by
log(y) = β₀ + β₁x + u    (58)
This model can be obtained from the following one:
y = exp(β₀ + β₁x + u)
by taking natural logs on both sides. For this reason this model is also called exponential.
Taking first differences in (58) and setting Δu = 0, we have
Δlog(y) = β₁Δx  if Δu = 0
Therefore, if x increases by 1 unit, then y will increase by 100·β₁ %, with Δu = 0.
d) Log-log model
The model given in (56) is a log-log model or, before the transformation, a potential model (55). This model is also called the constant elasticity model.
Taking first differences in (56) and setting Δu = 0, we have
Δlog(y) = β₁ Δlog(x)  if Δu = 0
Therefore, if x increases by 1%, then y will increase by β₁%, with Δu = 0. It is important to remark that, in this model, β₁ is the elasticity of y with respect to x, for any value of x and y. Consequently, in this model the elasticity is constant.
In table 2 the interpretation of β₁ in these four fitted models is shown.
Table 2. Interpretation of β₁ in different models
Model         If x increases by    then y will increase by
linear        1 unit               β₁ units
linear-log    1%                   β₁/100 units
log-linear    1 unit               (100·β₁)%
log-log       1%                   β₁%
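The constant-elasticity interpretation of the log-log model can be illustrated by generating data that lie exactly on a power curve y = A·x^β₁ (with hypothetical values A = 2 and β₁ = 0.75) and running OLS on the logs: the slope recovers the elasticity, and the exponential of the intercept recovers A.

```python
import math

# Log-log (constant elasticity) model of (55)-(56), with noiseless data
# generated on y = A * x**beta1. A and beta1 are hypothetical.
A, beta1 = 2.0, 0.75
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [A * x ** beta1 for x in xs]

lx = [math.log(x) for x in xs]   # regressor: log(x)
ly = [math.log(y) for y in ys]   # regressand: log(y)
n = len(lx)
lxb, lyb = sum(lx) / n, sum(ly) / n

# OLS slope and intercept on the log-transformed variables, formula (21).
b1 = sum((a - lyb) * (b - lxb) for b, a in zip(lx, ly)) \
     / sum((b - lxb) ** 2 for b in lx)
b0 = lyb - b1 * lxb

print(b1, math.exp(b0))  # b1 estimates the elasticity, exp(b0) estimates A
```

Because the data have no error term, OLS on the logs recovers the parameters exactly, up to floating-point rounding.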
2.5 Statistical Properties of OLS
We are now going to study the statistical properties of the OLS estimators, β̂₀ and β̂₁, considered as estimators of the population parameters β₀ and β₁.
The first property to study is the unbiasedness of OLS.
Unbiasedness of the OLS
To prove this property we need to formulate the following assumptions:
Assumptions
SLR.1: Linear in parameters
The population model is linear in parameters and given by
y = β₀ + β₁x + u    (59)
SLR.2: Random sampling
We have a random sample of size n, {(yᵢ, xᵢ): i = 1, 2, …, n}, from the population model.
Thus we can write the population model in terms of the sample:
yᵢ = β₀ + β₁xᵢ + uᵢ    i = 1, 2, …, n    (60)
SLR.3: Sample variation in the explanatory variable
The sample outcomes on x, namely xᵢ, i = 1, 2, …, n, are not all the same value. This assumption implies that
S²_X = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² > 0    (61)
SLR.4: Zero conditional mean
E(u|x) = 0
For a random sample, this assumption implies that
E(uᵢ|xᵢ) = 0,  i = 1, 2, …, n
According to SLR.4, all derivations using this assumption will be conditional on the sample values of x.
We are going to prove only the unbiasedness of the estimator β̂₁, which is the most important one. In order to do this proof, we need to rewrite our estimator in terms of the population parameter. Start with a simple rewrite of the OLS formula (21) as
β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₁ⁿ (xᵢ − x̄)yᵢ / Σᵢ₌₁ⁿ (xᵢ − x̄)²    (62)
because Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ₌₁ⁿ (xᵢ − x̄)yᵢ − ȳ Σᵢ₌₁ⁿ (xᵢ − x̄), and Σᵢ₌₁ⁿ (xᵢ − x̄) = 0.
Substituting (60) into (62), we have
β̂₁ = Σᵢ (xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ) / Σᵢ (xᵢ − x̄)²
   = [β₀ Σᵢ (xᵢ − x̄) + β₁ Σᵢ (xᵢ − x̄)xᵢ + Σᵢ (xᵢ − x̄)uᵢ] / Σᵢ (xᵢ − x̄)²
   = β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)²    (63)
The estimator equals the population slope, β₁, plus a term that is a linear combination of the errors {u₁, …, uₙ}.
In the derivation of (63), we have taken into account that
β₀ Σᵢ (xᵢ − x̄) = 0
Σᵢ (xᵢ − x̄)xᵢ = Σᵢ xᵢ² − x̄ Σᵢ xᵢ = Σᵢ xᵢ² − nx̄² = Σᵢ (xᵢ − x̄)²
THEOREM 2.1 Unbiasedness of OLS
Under assumptions SLR.1 to SLR.4, the estimators β̂₀ and β̂₁, conditional on x, are unbiased.
Proof that β̂₁ is unbiased:
E(β̂₁|X) = E[β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)² | X]
        = β₁ + Σᵢ (xᵢ − x̄) E(uᵢ|X) / Σᵢ (xᵢ − x̄)² = β₁    (64)
(From now on, in order to simplify the notation, we will use E(β̂₁) instead of E(β̂₁|X), but understanding that it is a conditional expectation.)
Therefore, the OLS estimator of β₁ is unbiased. Similarly, one can show that the OLS estimator of β₀ is also unbiased. In any case, the proof of unbiasedness depends on assumptions SLR.1 to SLR.4. If any assumption fails, then the OLS estimators are not necessarily unbiased. Remember that unbiasedness is a general property of the estimator: in a given sample we may be near or far from the true parameter, but the estimator's distribution will be centered at the population parameter.
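Unbiasedness can be illustrated by simulation under SLR.1 to SLR.4: generate many samples from a known population (hypothetical β₀ = 1, β₁ = 2, standard normal errors, fixed regressor values), estimate β₁ in each sample, and average. A Python sketch:

```python
import random

# Monte Carlo sketch of unbiasedness: the average of the OLS slope
# estimates over many samples is close to the true beta1.
# beta0, beta1 and the design are hypothetical.
random.seed(42)
beta0, beta1 = 1.0, 2.0
x = [float(i) for i in range(1, 21)]   # fixed regressor values
xb = sum(x) / len(x)
sxx = sum((xi - xb) ** 2 for xi in x)

estimates = []
for _ in range(2000):
    # SLR.1-SLR.4: linear model, random sample, E(u|x) = 0
    y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]
    yb = sum(y) / len(y)
    b1 = sum((yi - yb) * (xi - xb) for xi, yi in zip(x, y)) / sxx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)
print(mean_b1)  # close to beta1 = 2
```

Individual estimates scatter above and below 2, but their average is very close to the population value, which is what unbiasedness asserts.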
Variance of the OLS estimators
Now we know that the sampling distribution of our estimator is centered around the true parameter. How spread out is this distribution? This will be a measure of uncertainty.
In order to calculate the variance of β̂₁ (and β̂₀) an additional assumption is needed.
SLR.5: Homoskedasticity
Homoskedasticity implies that all errors have the same variance. That is to say,
var(u|x) = σ²    (65)
We add SLR.5 because it simplifies the variance calculations and because it implies that OLS has certain efficiency properties.
The homoskedasticity assumption is quite distinct from the zero conditional mean assumption, E(u|x) = 0. SLR.4 involves the expected value of u, while SLR.5 concerns the variance of u. Homoskedasticity plays no role in showing that β̂₀ and β̂₁ are unbiased.
Taking into account that E(u|x) = 0, and that E(u²|x) = E(u²) (because E(xu) = 0), we obtain
var(u|x) = E(u²|x) − [E(u|x)]² = E(u²|x) = E(u²) = var(u) = σ²
Thus, σ² is the conditional variance, but also the unconditional variance. It is called the error variance. The square root of the error variance, σ, is called the standard deviation of the error.
We can say
E(y|x) = β₀ + β₁x  and  var(u|x) = σ²
So, the conditional expectation of y given x is linear in x, but the variance of y given x is constant. When var(u|x) depends on x, the error term is said to exhibit heteroskedasticity.
On the other hand,

\[
\mathrm{var}(y \mid x) = E\big[(y - E(y \mid x))^2 \mid x\big]
= E\big[(\beta_0 + \beta_1 x + u - \beta_0 - \beta_1 x)^2 \mid x\big]
= E(u^2 \mid x) = \mathrm{var}(u \mid x) \qquad (66)
\]

Since $\mathrm{var}(u \mid x)=\mathrm{var}(y \mid x)$, heteroskedasticity is present whenever $\mathrm{var}(y \mid x)$ is a function of x.
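A short simulation can make the distinction concrete. In the sketch below (our own illustration, not from the text; the functional form of the variance is an arbitrary choice) the error standard deviation grows with x, so $\mathrm{var}(u \mid x)$ is a function of x; grouping the draws by the size of x shows the spread of u increasing even though $E(u \mid x)$ stays at zero.

```python
import random

random.seed(0)

# Heteroskedastic errors: sd(u|x) = 0.5 * x, so var(u|x) = 0.25 * x**2
xs = [random.uniform(1, 10) for _ in range(20000)]
us = [random.gauss(0, 0.5 * x) for x in xs]

low = [u for x, u in zip(xs, us) if x < 4]     # small-x group
high = [u for x, u in zip(xs, us) if x > 7]    # large-x group

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

print(var(low), var(high))  # the large-x group has a much larger error variance
```

Both groups have a sample mean of u near zero, but their variances differ sharply: this is heteroskedasticity, the failure of SLR.5.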
THEOREM 2.2 Sampling variances of OLS estimators

Under assumptions SLR.1 to SLR.5, the variances of $\hat\beta_1$ and $\hat\beta_0$, conditional on $\{x_1,\ldots,x_n\}$, are the following:
\[
\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar x)^2} \qquad (67)
\]

and

\[
\mathrm{Var}(\hat\beta_0) = \frac{\sigma^2\, n^{-1}\sum_{i=1}^{n}x_i^2}{\sum_{i=1}^{n}(x_i-\bar x)^2} \qquad (68)
\]
Proof of (67):

Taking into account (63),

\[
\begin{aligned}
\mathrm{var}(\hat\beta_1 \mid X) &= E\big[(\hat\beta_1-\beta_1)^2 \mid X\big]
= E\!\left[\left(\frac{\sum_{i=1}^{n}(x_i-\bar x)u_i}{\sum_{i=1}^{n}(x_i-\bar x)^2}\right)^{\!2} \Bigg|\; X\right] \\
&= \frac{1}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}\,
E\!\left[\sum_{i=1}^{n}(x_i-\bar x)^2 u_i^2 + \sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)u_i u_j \;\Bigg|\; X\right] \\
&= \frac{1}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}
\left[\sum_{i=1}^{n}(x_i-\bar x)^2 E(u_i^2) + \sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)E(u_i u_j)\right] \\
&= \frac{\sigma^2 \sum_{i=1}^{n}(x_i-\bar x)^2}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}
= \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}
= \frac{\sigma^2}{n S_X^2} \qquad (69)
\end{aligned}
\]
where

\[
S_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2
\]

is the sample variance of X.
In the above proof we have taken into account that $E(u_i u_j)$, for $i \neq j$, is equal to 0, because the sampling is random and the errors are therefore independent.

(From now on, in order to simplify the notation, we will not write conditional expectations explicitly.)

We can draw the following conclusions from the last expression:
1) The larger the error variance, $\sigma^2$, the larger the variance of the slope estimate. This is not at all surprising: more noise in the equation, that is, a larger $\sigma^2$, makes it more difficult to estimate the effect of x on y, and this is reflected in a higher variance of the OLS slope estimator. Since $\sigma^2$ is a feature of the population, it has nothing to do with the sample size.

2) The larger the variability in the $x_i$, the smaller the variance of the slope estimate. Everything else being equal, for estimating $\beta_1$ we prefer to have as much sample variation in x as possible. In any case, Assumption SLR.3 rules out $S_X^2 = 0$.

3) The larger the sample size n, the smaller the variance of the slope estimate.
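These three conclusions can be read directly off formula (67). The following sketch (illustrative numbers of our own choosing, not from the text) evaluates $\sigma^2/\sum_{i}(x_i-\bar x)^2$ for a baseline design, a design with doubled spread in x, and a design with ten times as many observations at the same spread.

```python
# Evaluate Var(b1_hat) = sigma^2 / sum((x_i - xbar)^2), equation (67),
# for different sample sizes and different spreads of x.
sigma2 = 4.0  # assumed error variance (our choice)

def slope_variance(x, sigma2):
    xbar = sum(x) / len(x)
    return sigma2 / sum((xi - xbar) ** 2 for xi in x)

narrow = [float(i) for i in range(10)]          # n = 10, x spread over 0..9
wide = [2.0 * i for i in range(10)]             # n = 10, doubled spread
large = [float(i % 10) for i in range(100)]     # n = 100, same spread as 'narrow'

v_narrow = slope_variance(narrow, sigma2)
v_wide = slope_variance(wide, sigma2)
v_large = slope_variance(large, sigma2)
print(v_narrow, v_wide, v_large)
```

For this design, doubling the spread of x divides the slope variance by four, and multiplying n by ten (holding the spread fixed) divides it by ten, in line with conclusions 2) and 3).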
We have a problem in calculating $\mathrm{var}(\hat\beta_1)$: the error variance, $\sigma^2$, is unknown.
Estimating the Error Variance
We don't know what the error variance, $\sigma^2$, is, and we cannot estimate it from the errors, $u_i$, because we don't observe them.

Given that $\sigma^2 = E(u^2)$, an unbiased estimator would be

\[
\frac{\sum_{i=1}^{n}u_i^2}{n} \qquad (70)
\]
Unfortunately, this is not a true estimator, because we do not observe the errors $u_i$. But we do have estimates of the $u_i$, namely the OLS residuals $\hat u_i$.

The relation between errors and residuals is given by

\[
\hat u_i = y_i - \hat y_i = \beta_0 + \beta_1 x_i + u_i - \hat\beta_0 - \hat\beta_1 x_i
= u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)x_i \qquad (71)
\]

Hence $\hat u_i$ is not the same as $u_i$, although the difference between them does have an expected value of zero. If we replace the errors with the OLS residuals, we have

\[
\tilde\sigma^2 = \frac{\sum_{i=1}^{n}\hat u_i^2}{n} \qquad (72)
\]

This is a true estimator, because it gives a computable rule for any sample of the data, x and y. However, this estimator is biased, essentially because it does not account for two restrictions that must be satisfied by the OLS residuals,
\[
\sum_{i=1}^{n}\hat u_i = 0, \qquad \sum_{i=1}^{n}x_i\hat u_i = 0 \qquad (73)
\]

One way to view these restrictions is the following: if we know n − 2 of the residuals, we can get the other two residuals by using the restrictions implied by the moment conditions.

Thus, there are only n − 2 degrees of freedom in the OLS residuals, as opposed to n degrees of freedom in the errors. The unbiased estimator of $\sigma^2$ that we will use makes an adjustment to take the degrees of freedom into account:

\[
\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat u_i^2}{n-2} \qquad (74)
\]
THEOREM 2.3 Unbiased estimator of $\sigma^2$

Under assumptions SLR.1 to SLR.5,

\[
E(\hat\sigma^2) = \sigma^2 \qquad (75)
\]

If $\hat\sigma^2$ is plugged into the variance formulas, we then have unbiased estimators of $\mathrm{var}(\hat\beta_0)$ and $\mathrm{var}(\hat\beta_1)$.
The natural estimator of $\sigma$ is $\hat\sigma = \sqrt{\hat\sigma^2}$, and it is called the standard error of the regression.

The square root of the variance of $\hat\beta_1$, defined in (69), is called the standard deviation of $\hat\beta_1$, that is to say,

\[
sd(\hat\beta_1) = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}} \qquad (76)
\]

Therefore, its natural estimator is

\[
se(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}} \qquad (77)
\]

Note that $se(\hat\beta_1)$, the standard error of $\hat\beta_1$, is viewed as a random variable when we think of running OLS over different samples; this is because $\hat\sigma$ varies with different samples. The standard error of any estimate gives us an idea of how precise the estimator is.
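Equations (71) to (77) translate directly into code. The sketch below uses toy data of our own (purely illustrative, not from the text): it computes the OLS residuals, verifies the two restrictions in (73), forms $\hat\sigma^2$ with the n − 2 adjustment of (74), and finally the standard error of the slope as in (77).

```python
# Toy data (our own numbers, chosen only for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS estimates of the slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residuals; they satisfy the restrictions (73) by construction
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Degrees-of-freedom-adjusted error variance estimator, equation (74)
sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)

# Standard error of the slope, equation (77)
se_b1 = (sigma2_hat / sxx) ** 0.5
print(b1, sigma2_hat, se_b1)
```

Checking that the residuals sum to zero and are orthogonal to x is a useful sanity test for any hand-rolled OLS routine.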
2.6 Regression through the origin

If we force the regression line to pass through the point (0,0), we are constraining the intercept to be zero, as can be seen in figure 2.5. This is called a regression through the origin.

Figure 2.5. A regression through the origin.

This is not very often done because, among other problems, when $\beta_0 \neq 0$ the slope estimate will be biased.

We are now going to see how to estimate a regression line through the origin.

The fitted model is the following:

\[
\hat y_i = \tilde\beta_1 x_i \qquad (78)
\]
Therefore, we must minimize

\[
\min_{\tilde\beta_1} S = \min_{\tilde\beta_1} \sum_{i=1}^{n}(y_i - \tilde\beta_1 x_i)^2 \qquad (79)
\]

To minimize S, we differentiate with respect to $\tilde\beta_1$ and set the derivative equal to 0:

\[
\frac{\partial S}{\partial \tilde\beta_1} = -2\sum_{i=1}^{n}(y_i - \tilde\beta_1 x_i)x_i = 0 \qquad (80)
\]

Solving for $\tilde\beta_1$,
\[
\tilde\beta_1 = \frac{\sum_{i=1}^{n}y_i x_i}{\sum_{i=1}^{n}x_i^2} \qquad (81)
\]
Another problem with fitting a regression line through the origin is that, in general, the usual decomposition of the sum of squares does not hold:

\[
\sum_{i=1}^{n}(y_i-\bar y)^2 \neq \sum_{i=1}^{n}(\hat y_i-\bar y)^2 + \sum_{i=1}^{n}\hat u_i^2
\]

If the decomposition of the variance of y into two components (explained and residual) is not possible, then the $R^2$ is meaningless; in this case the coefficient can take values that are negative or greater than 1.

To sum up, an intercept should be included in regressions, unless there are strong theoretical reasons, from economic theory, to exclude it.
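The possibility of a negative $R^2$ is easy to demonstrate. In the sketch below (toy data of our own, with a large true intercept) the through-origin slope from (81) fits so badly that the residual sum of squares exceeds the total sum of squares.

```python
# Data where the true intercept is clearly nonzero (our own toy numbers,
# roughly y = 10 + 0.85 * x)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.5, 11.0, 12.2, 12.8, 14.0]

n = len(x)
ybar = sum(y) / n

# Slope of the regression through the origin, equation (81)
b1_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# "R-squared" computed the usual way: 1 - SSR / SST
resid = [yi - b1_origin * xi for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in resid)
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ssr / sst
print(b1_origin, r2)
```

Because the fitted line is forced through (0,0) while the data sit far above it, the residuals are huge and the conventional $R^2$ formula turns negative, illustrating why it is meaningless for through-origin regressions.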
Case study. Engel's curve. Demand for dairy products

The expression Engel's curve refers to the line which shows the relationship between the various quantities of a good a consumer is willing to purchase at varying income levels.

In a survey of 40 households, data on expenditure on dairy products and on income were obtained. These data appear in table 3. In order to avoid distortions due to the different sizes of the households, both consumption and income have been expressed in per capita terms. The data are expressed in euros per month.

Table 3. Expenditure on dairy products (dairy) and disposable income (inc), in per capita terms.

Unit: euros per month.
household dairy inc household dairy inc
1 8.87 1,250 21 16.20 2,100
2 6.59 985 22 10.39 1,470
3 11.46 2,175 23 13.50 1,225
4 15.07 1,025 24 8.50 1,380
5 15.60 1,690 25 19.77 2,450
6 6.71 670 26 9.69 910
7 10.02 1,600 27 7.90 690
8 7.41 940 28 10.15 1,450
9 11.52 1,730 29 13.82 2,275
10 7.47 640 30 13.74 1,620
11 6.73 860 31 4.91 740
12 8.05 960 32 20.99 1,125
13 11.03 1,575 33 20.06 1,335
14 10.11 1,230 34 18.93 2,875
15 18.65 2,190 35 13.19 1,680
16 10.30 1,580 36 5.86 870
17 15.30 2,300 37 7.43 1,620
18 13.75 1,720 38 7.15 960
19 11.49 850 39 9.10 1,125
20 6.69 780 40 15.31 1,875
There are multiple types of models used in demand studies. In particular, in this case study we are going to consider the following models: linear, inverse, linear-log (semi-logarithmic), log-log (potential), log-linear (exponential) and inverse exponential. In the first three models the regressand of the equation is the endogenous variable itself, whereas in the last three the regressand is the natural logarithm of the endogenous variable.

In all the models we will calculate the marginal propensity to expenditure, as well as the expenditure/income elasticity.
Linear model

The linear model for the demand of dairy products is the following:

\[
dairy = \beta_0 + \beta_1 inc + u \qquad (82)
\]

As is known, the marginal propensity indicates how expenditure changes when income varies, and it is obtained by differentiating expenditure with respect to income in the demand equation. In the linear model the marginal propensity to expenditure is given by
\[
\frac{d\,dairy}{d\,inc} = \beta_1 \qquad (83)
\]
In other words, in the linear model the marginal propensity is constant and, therefore, independent of the level of income. The fact that it is constant is an advantage, but at the same time it has the disadvantage of not being well suited to describing consumer behaviour, especially when there are important differences in the income of the households. Thus, it is unreasonable for the marginal propensity of expenditure on dairy products to be the same in a low-income family as in a high-income family. However, if the variation of income in the sample is not very large, a linear model can be used to describe the demand for certain goods.
In this model the expenditure/income elasticity is the following:

\[
\varepsilon^{linear}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \beta_1\frac{inc}{dairy} \qquad (84)
\]
Estimating the model (82) with the data of table 3, we obtain

\[
\widehat{dairy} = 4.012 + 0.005288\,inc \qquad R^2 = 0.4584 \qquad (85)
\]
Inverse model

In an inverse model there is a linear relationship between expenditure and the inverse of income. Therefore, this model is directly linear in the parameters, and it is expressed in the following way:

\[
dairy = \beta_0 + \beta_1\frac{1}{inc} + u \qquad (86)
\]

The sign of the coefficient $\beta_1$ will be negative if income is positively correlated with expenditure. It is easy to see that, when income tends towards infinity, expenditure tends towards a limit equal to $\beta_0$. In other words, $\beta_0$ represents the maximum consumption of this good.
In figure 2.6, we can see a double representation of the population function corresponding to this model. In the first one, the relationship between the dependent variable and the explanatory variable is represented; in the second one, the relationship between the regressand and the regressor. As can be seen, the second function is linear.

Figure 2.6. The inverse model
In the inverse model, the marginal propensity to expenditure is given by
\[
\frac{d\,dairy}{d\,inc} = -\beta_1\frac{1}{inc^2} \qquad (87)
\]
According to (87), the propensity is inversely proportional to the square of the income level.

On the other hand, the elasticity is inversely proportional to the product of expenditure and income, as can be seen in the following expression:

\[
\varepsilon^{inv}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = -\beta_1\frac{1}{inc\cdot dairy} \qquad (88)
\]
Estimating the model (86) with the data of table 3, we obtain

\[
\widehat{dairy} = 18.652 - 8702\,\frac{1}{inc} \qquad R^2 = 0.4281 \qquad (89)
\]

In this case the coefficient $\beta_1$ does not have an economic meaning.
Linear-log model

This model is denominated the linear-log model because expenditure is a linear function of the logarithm of income, that is to say,

\[
dairy = \beta_0 + \beta_1\log(inc) + u \qquad (90)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\frac{d\log(inc)}{d\,inc} = \beta_1\frac{1}{inc} \qquad (91)
\]
and the expenditure/income elasticity by

\[
\varepsilon^{lin\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \beta_1\frac{1}{dairy} \qquad (92)
\]
As can be seen, in the linear-log model the marginal propensity is inversely
proportional to the level of income, while the elasticity is inversely proportional to the
level of expenditure on dairy products.
In figure 2.7, we can see a double representation of the population function
corresponding to this model.
Figure 2.7. The linear log model
Estimating the model (90) with the data of table 3, we obtain

\[
\widehat{dairy} = -41.623 + 7.399\log(inc) \qquad R^2 = 0.4567 \qquad (93)
\]

The interpretation of $\hat\beta_1$ is the following: if income increases by 1%, the demand for dairy products will increase by 0.07399 euros.
Log-log model or potential model

This potential model is defined in the following way:

\[
dairy = A\,inc^{\beta_1}e^{u} \qquad (94)
\]

This model is not linear in the parameters, but it is linearizable: by taking natural logarithms, we obtain the model

\[
\log(dairy) = \beta_0 + \beta_1\log(inc) + u \qquad (95)
\]

where $\beta_0 = \log(A)$.

This model is also called the log-log model, because that is the structure of the corresponding linearized model.

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\frac{dairy}{inc} \qquad (96)
\]

In the log-log model, the elasticity is constant. Therefore, if income increases by 1%, expenditure will increase by $\beta_1$%, since
\[
\varepsilon^{log\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\log(inc)} = \beta_1 \qquad (97)
\]
In figure 2.8, we can see a double representation of the population function
corresponding to this model.
Figure 2.8. The log log model
Estimating the model (95) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = -2.556 + 0.6866\log(inc) \qquad R^2 = 0.5190 \qquad (98)
\]
In this case $\hat\beta_1$ is the expenditure/income elasticity. Its interpretation is the following: if income increases by 1%, the demand for dairy products will increase by 0.68%.
Log-linear or exponential model

This exponential model is defined in the following way:

\[
dairy = \exp(\beta_0 + \beta_1 inc + u) \qquad (99)
\]

By taking natural logarithms on both sides of (99), we obtain the following model, which is linear in the parameters:

\[
\log(dairy) = \beta_0 + \beta_1 inc + u \qquad (100)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\,dairy \qquad (101)
\]
In the exponential model, unlike the other models seen previously, the marginal propensity increases with the level of expenditure. For this reason, this model is adequate for describing the demand for luxury goods. On the other hand, the elasticity is proportional to the level of income:

\[
\varepsilon^{exp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\,inc}\,inc = \beta_1 inc \qquad (102)
\]
In figure 2.9, we can see a double representation of the population function corresponding to this model.

Figure 2.9. The log-linear model
Estimating the model (100) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = 1.694 + 0.00048\,inc \qquad R^2 = 0.4978 \qquad (103)
\]

The interpretation of $\hat\beta_1$ is the following: if income increases by 1 euro, the demand for dairy products will increase by 0.048%.
Inverse exponential model

The inverse exponential model, which is a mixture of the exponential model and the inverse model, is given by
\[
dairy = \exp\!\left(\beta_0 + \beta_1\frac{1}{inc} + u\right) \qquad (104)
\]

By taking natural logarithms on both sides of (104), we obtain the following model, which is linear in the parameters:

\[
\log(dairy) = \beta_0 + \beta_1\frac{1}{inc} + u \qquad (105)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = -\beta_1\frac{dairy}{inc^2} \qquad (106)
\]
and the elasticity by

\[
\varepsilon^{invexp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\,inc}\,inc = -\beta_1\frac{1}{inc} \qquad (107)
\]
Estimating the model (105) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = 3.049 - 822.02\,\frac{1}{inc} \qquad R^2 = 0.5040 \qquad (108)
\]

In this case, as in the inverse model, the coefficient $\beta_1$ does not have an economic meaning.
In table 4, the marginal propensity, the expenditure/income elasticity and the $R^2$ of the six fitted models are shown.

Table 4. Marginal propensity, expenditure/income elasticity and $R^2$ in the fitted models (marginal propensities and elasticities evaluated at the sample means).

Model               | Marginal propensity                          | Elasticity                                      | $R^2$
Linear              | $\hat\beta_1 = 0.0053$                       | $\hat\beta_1\,inc/dairy = 0.6505$               | 0.4440
Inverse             | $-\hat\beta_1/inc^2 = 0.0044$                | $-\hat\beta_1/(dairy\cdot inc) = 0.5361$        | 0.4279
Linear-log          | $\hat\beta_1/inc = 0.0052$                   | $\hat\beta_1/dairy = 0.6441$                    | 0.4566
Log-log             | $\hat\beta_1\,dairy/inc = 0.0056$            | $\hat\beta_1 = 0.6864$                          | 0.5188
Log-linear          | $\hat\beta_1\,dairy = 0.0055$                | $\hat\beta_1\,inc = 0.6783$                     | 0.4976
Inverse exponential | $-\hat\beta_1\,dairy/inc^2 = 0.0047$         | $-\hat\beta_1/inc = 0.5815$                     | 0.5038
The $R^2$ obtained in the first three models is not comparable with the $R^2$ obtained in the last three, because the functional form of the regressand is different: y in the first three models and log(y) in the last three.

Comparing the first three models, the best one is the linear-log model, if we use the $R^2$ as the goodness-of-fit measure. Comparing the last three models, the best one is the log-log model. If we had used the Akaike Information Criterion (AIC), which allows comparing models with different functional forms for the regressand, then the log-log model would have been the best among the six fitted models.
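As a check on the case study, the six equations can be re-estimated from the table 3 data with a few lines of plain Python. The sketch below is our own: the `ols` helper and the model labels are ours, and each model is fitted by regressing the appropriate regressand on the appropriate regressor, as described in the text.

```python
import math

# Table 3 data (euros per month, per capita), transcribed from the text
dairy = [8.87, 6.59, 11.46, 15.07, 15.60, 6.71, 10.02, 7.41, 11.52, 7.47,
         6.73, 8.05, 11.03, 10.11, 18.65, 10.30, 15.30, 13.75, 11.49, 6.69,
         16.20, 10.39, 13.50, 8.50, 19.77, 9.69, 7.90, 10.15, 13.82, 13.74,
         4.91, 20.99, 20.06, 18.93, 13.19, 5.86, 7.43, 7.15, 9.10, 15.31]
inc = [1250, 985, 2175, 1025, 1690, 670, 1600, 940, 1730, 640,
       860, 960, 1575, 1230, 2190, 1580, 2300, 1720, 850, 780,
       2100, 1470, 1225, 1380, 2450, 910, 690, 1450, 2275, 1620,
       740, 1125, 1335, 2875, 1680, 870, 1620, 960, 1125, 1875]

def ols(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / sst

log_dairy = [math.log(d) for d in dairy]
results = {
    "linear":  ols(inc, dairy),                              # model (82)
    "inverse": ols([1 / i for i in inc], dairy),             # model (86)
    "lin-log": ols([math.log(i) for i in inc], dairy),       # model (90)
    "log-log": ols([math.log(i) for i in inc], log_dairy),   # model (95)
    "log-lin": ols(inc, log_dairy),                          # model (100)
    "inv-exp": ols([1 / i for i in inc], log_dairy),         # model (105)
}
for name, (b0, b1, r2) in results.items():
    print(f"{name}: b0={b0:.4f}, b1={b1:.6g}, R2={r2:.4f}")
```

The signs come out as the text predicts (negative slopes in the two models that use 1/inc as regressor, positive in the rest), and the $R^2$ values are all moderate, in line with the fitted equations reported above.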