2 The simple regression model
Version: 29-09-2011
Ezequiel Uriel
2.1 Definition of the simple regression model
Population model
In the simple regression model, the reference model or population model is the
following:
y = β₀ + β₁x + u    (1)
We are going to look at the different elements of the model (1) and the
terminology used to designate them. We are going to consider that the three variables of
the model (y, x and u) are random variables. In this model x is the only factor that explains y. All the other factors that affect y are captured by u.
We typically refer to y as the endogenous (from the Greek: generated inside)
variable or dependent variable. Other denominations are also used to designate y: left-
hand side variable, explained variable, or regressand. In this model all these
denominations are equivalent, but in other models, as we will see later, there can be
some differences.
In the simple linear regression of y on x, we typically refer to x as the exogenous (from the Greek: generated outside) variable or independent variable. Other denominations are also used to designate x: right-hand side variable, explanatory variable, regressor, covariate, or control variable. All these denominations are equivalent, but in other models, as we will see later, there can be some differences.
The variable u represents factors other than x that affect y. It is denominated
error or random disturbance. The error term can also capture measurement error in the
dependent variable. The error is an unobservable variable.
The parameters β₀ and β₁ are fixed and unknown.
Population regression function
In order to define the population regression function, we are previously going to introduce two statistical assumptions.
The first assumption to introduce is that the mean value of the error in the population is 0. That is to say, the expectation of u is 0:
E(u) = 0    (2)
This is not a restrictive assumption, since we can always use β₀ to normalize E(u) to 0. Let us suppose, for example, that E(u) = 4; then we could redefine the model in the following way:
y = (β₀ + 4) + β₁x + v
where v = u − 4. Therefore, the expectation of the new error, v, is 0, and the expectation of u has been absorbed by the intercept.
The second assumption is about how x and u are related. Concretely, this assumption states that
E(ux) = 0    (3)
This is a crucial assumption in regression analysis. If this assumption were not acceptable, it would be difficult to separate the effects of x and u on y.
Taking into account the two assumptions together, we have
E(u|x) = E(u) = 0    (4)
which implies
E(y|x) = β₀ + β₁x    (5)
This equation is known as the population regression function (PRF) or population line. Therefore, E(y|x), as can be seen in figure 2.1, is a linear function of x with intercept β₀ and slope β₁.
Figure 2.1. The population regression function (PRF)
The linearity means that a one-unit increase in x changes the expected value of y by the amount β₁.
Now, let us suppose we have a random sample of size n, {(yᵢ, xᵢ): i = 1, 2, …, n}, from the population. In figure 2.2 the scatter diagram corresponding to these data is displayed.
Figure 2.2. The scatter diagram
We can express the population model for each observation of the sample:
yᵢ = β₀ + β₁xᵢ + uᵢ    i = 1, 2, …, n    (6)
In figure 2.3 the population regression function and the scatter diagram are put together, but it is important to keep in mind that β₀ and β₁ are fixed, but unknown. According to the model it is possible, from a theoretical point of view, to make the following decomposition:
yᵢ = E(yᵢ|xᵢ) + uᵢ    i = 1, 2, …, n    (7)
which has been represented in figure 2.3 for observation i. However, from an empirical point of view, it is not possible to do it, because β₀ and β₁ are unknown parameters and uᵢ is not observable.
Figure 2.3. The population regression function and the scatter diagram
Sample regression function
The basic idea of regression is to estimate the population parameters, β₀ and β₁, from a given sample.
The sample regression function (SRF) is the sample counterpart of the population regression function (PRF). Since the SRF is obtained for a given sample, a new sample will generate different estimates.
The SRF, which is an estimate of the PRF, given by
ŷᵢ = β̂₀ + β̂₁xᵢ    (8)
allows us to calculate the fitted value (ŷᵢ) for y when x = xᵢ. In the SRF, β̂₀ and β̂₁ are estimators of the parameters β₀ and β₁.
For each xᵢ we have an observed value (yᵢ) and a fitted value (ŷᵢ).
We call the difference between yᵢ and ŷᵢ the residual (ûᵢ). That is,
ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ    (9)
In other words, the residual ûᵢ is the difference between the sample point and the fitted line, as can be seen in figure 2.4. In this case, it is possible to calculate the decomposition
yᵢ = ŷᵢ + ûᵢ
for a given sample.
Figure 2.4. The sample regression function and the scatter diagram
To sum up, β̂₀, β̂₁, ŷᵢ and ûᵢ are the sample counterparts of β₀, β₁, E(y|x) and uᵢ respectively. It is possible to calculate β̂₀ and β̂₁ for a given sample, but for each sample the estimates will change. On the contrary, β₀ and β₁ are fixed, but unknown.
2.2 Deriving the Ordinary Least Squares Estimates
Different criteria of estimation
Before deriving the ordinary least squares estimates, we are going to examine
three alternative methods to illustrate the problem which we face. These three methods
have in common that they try to minimize, in some way, the value of the residuals as a
whole.
Criterion 1
A first criterion would consist of taking as estimators those values of β̂₀ and β̂₁ that make the sum of all the residuals as near to zero as possible. With this criterion the expression to minimize would be the following:
Min. Σᵢ₌₁ⁿ ûᵢ    (10)
The main problem of this method of estimation is that residuals of different sign can compensate each other. Such a situation can be observed graphically in figure 2.5, in which three aligned observations, (x₁, y₁), (x₂, y₂) and (x₃, y₃), are graphed. In this case the following happens:
(y₂ − y₁)/(x₂ − x₁) = (y₃ − y₁)/(x₃ − x₁)
Figure 2.5. The problems of criterion 1.
If a straight line is fitted so that it crosses through the three points, each one of the residuals will take the value zero, so that
Σᵢ₌₁³ ûᵢ = 0
This fit could be considered optimal. But it is also possible to obtain Σᵢ₌₁³ ûᵢ = 0 by rotating the straight line around the point (x₂, y₂) in any direction, as figure 2.5 shows, because û₃ = −û₁. In other words, by rotating in this way the result Σᵢ₌₁³ ûᵢ = 0 is always obtained. This simple example shows us that this criterion is not appropriate for the estimation of the parameters because, for any set of observations, infinitely many straight lines exist that satisfy it.
Criterion 2
A way to avoid the compensation of positive residuals with negative ones consists of taking the absolute values of the residuals. In this case the following expression would be minimized:
Min. Σᵢ₌₁ⁿ |ûᵢ|    (11)
Unfortunately, although the estimators thus obtained have some interesting properties, their calculation is complicated, requiring the solution of a linear programming problem or the application of an iterative calculation procedure.
Criterion 3
A third method consists of minimizing the sum of the squared residuals, that is to say, minimizing
Min. S = Min. Σᵢ₌₁ⁿ ûᵢ²    (12)
The estimators obtained are denominated least squares estimators, and they enjoy certain desirable statistical properties, which we will study later. On the one hand, as opposed to the first of the examined criteria, when we square the residuals their compensation is avoided; on the other hand, unlike the second criterion, the least squares estimators are simple to obtain. It is important to indicate that, from the moment we square the residuals, we penalize large residuals proportionally more than small ones (if a residual is double another one, its square will be four times greater), which also characterizes least squares estimation with respect to other possible methods.
Application of least square criterion
Now, we are going to look at the process to obtain the least squares estimators. The objective is to minimize the sum of squared residuals (S). To do this, in the first place we are going to express S as a function of the estimators, using (9). Therefore, we must solve
min_{β̂₀,β̂₁} S = min_{β̂₀,β̂₁} Σᵢ₌₁ⁿ ûᵢ² = min_{β̂₀,β̂₁} Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²    (13)
To minimize S, we differentiate partially with respect to β̂₀ and β̂₁:
∂S/∂β̂₀ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)
∂S/∂β̂₁ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ
The least squares estimators are obtained by setting the previous derivatives equal to 0:
Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (14)
Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0    (15)
Equations (14) and (15) are denominated the normal equations or LS first order conditions.
In operations with summations, the following rules must be taken into account:
Σᵢ₌₁ⁿ a = na
Σᵢ₌₁ⁿ a·xᵢ = a Σᵢ₌₁ⁿ xᵢ
Σᵢ₌₁ⁿ (xᵢ + yᵢ) = Σᵢ₌₁ⁿ xᵢ + Σᵢ₌₁ⁿ yᵢ
Operating with the normal equations, we have
Σᵢ₌₁ⁿ yᵢ = nβ̂₀ + β̂₁ Σᵢ₌₁ⁿ xᵢ    (16)
Σᵢ₌₁ⁿ yᵢxᵢ = β̂₀ Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²    (17)
Dividing both sides of (16) by n, we have
ȳ = β̂₀ + β̂₁x̄    (18)
Therefore
β̂₀ = ȳ − β̂₁x̄    (19)
Substituting this value of β̂₀ in the second normal equation (17), we have
Σᵢ₌₁ⁿ yᵢxᵢ = (ȳ − β̂₁x̄) Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²
Σᵢ₌₁ⁿ yᵢxᵢ = ȳ Σᵢ₌₁ⁿ xᵢ − β̂₁x̄ Σᵢ₌₁ⁿ xᵢ + β̂₁ Σᵢ₌₁ⁿ xᵢ²
Solving for β̂₁:
β̂₁ = (Σᵢ₌₁ⁿ yᵢxᵢ − ȳ Σᵢ₌₁ⁿ xᵢ) / (Σᵢ₌₁ⁿ xᵢ² − x̄ Σᵢ₌₁ⁿ xᵢ)    (20)
Taking into account that
Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) = Σᵢ₌₁ⁿ (yᵢxᵢ − ȳxᵢ − yᵢx̄ + ȳx̄) = Σᵢ₌₁ⁿ yᵢxᵢ − nȳx̄ = Σᵢ₌₁ⁿ yᵢxᵢ − ȳ Σᵢ₌₁ⁿ xᵢ
Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₁ⁿ (xᵢ² − 2xᵢx̄ + x̄²) = Σᵢ₌₁ⁿ xᵢ² − nx̄² = Σᵢ₌₁ⁿ xᵢ² − x̄ Σᵢ₌₁ⁿ xᵢ
then (20) can be expressed in the following way:
β̂₁ = Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ₌₁ⁿ (xᵢ − x̄)²    (21)
Once β̂₁ is calculated, we can obtain β̂₀ by using (19).
These are the least squares (LS) estimators. Since other, more complicated methods also called least squares methods exist, the method we have applied is denominated ordinary least squares (OLS), due to its simplicity.
In the preceding sections β̂₀ and β̂₁ have been used to designate generic estimators. From now on, this notation will designate only the OLS estimators.
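The whole computation in (19) and (21) uses only sample sums, so it can be checked in a few lines of code. A minimal Python sketch; the sample below is hypothetical, chosen only to illustrate the formulas:

```python
# OLS estimates from the closed-form expressions (19) and (21).
# The sample is hypothetical, used only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Equation (21): slope = sum (y_i - y_bar)(x_i - x_bar) / sum (x_i - x_bar)^2
beta1_hat = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
            / sum((xi - x_bar) ** 2 for xi in x)

# Equation (19): intercept = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # fitted line: y_hat = beta0_hat + beta1_hat * x
```

Note that (19) and (21) are exact formulas, not an iterative search, so any statistical package reproduces the same two numbers for the same sample.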
An alternative approach to estimation
We are going to look at an alternative method to obtain estimators: the method of moments. This method is a general method to estimate econometric models.
The main idea behind this method is quite intuitive and very simple. The
assumptions in an econometric model imply some restrictions on the means, variances
or covariances (i.e., moments) of the population. In the method of moments the
estimates are the values that satisfy these restrictions when we use sample means,
variances and covariances, instead of the population ones.
In the presentation of the population model in simple regression, we introduced assumptions (2) and (3), which are restrictions on the model. Taking into account that
u = y − β₀ − β₁x
we can express our two restrictions just in terms of x, y, β₀ and β₁:
E(y − β₀ − β₁x) = 0    (22)
E[(y − β₀ − β₁x)x] = 0    (23)
The sample restrictions corresponding to (22) and (23) are the following:
(1/n) Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (24)
(1/n) Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)xᵢ = 0    (25)
From the two previous equations the method of moments estimators β̂₀ and β̂₁ are obtained.
But these two equations are precisely equal to the two normal equations, (14) and (15), obtained by minimizing the sum of squared residuals. In any case, it must
be kept in mind that the estimators obtained by the method of moments are conditional on the restrictions imposed on the model. Consequently, the equality between the two approaches is due to the two assumptions considered.
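Conditions (24) and (25) are two linear equations in β̂₀ and β̂₁, so they can be solved directly, for example by Cramer's rule. A Python sketch on a hypothetical sample, confirming that the method of moments reproduces the OLS slope of (21):

```python
# Solve the sample moment conditions (24)-(25) as a 2x2 linear system:
#   n*b0          + (sum x_i)*b1   = sum y_i
#   (sum x_i)*b0  + (sum x_i^2)*b1 = sum x_i*y_i
# The data are hypothetical, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Cramer's rule for the 2x2 system.
det = n * sxx - sx * sx
b0 = (sy * sxx - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# The OLS formula (21) gives the same slope.
x_bar, y_bar = sx / n, sy / n
b1_ols = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
print(b0, b1)
```

The agreement is not a numerical coincidence: the two moment conditions are algebraically identical to the normal equations (14) and (15).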
2.3 Properties of OLS on any sample of data
Algebraic properties of OLS estimators
The algebraic properties of OLS are properties derived exclusively from the
application of the method of OLS:
1. The sum of the OLS residuals is equal to 0:
Σᵢ₌₁ⁿ ûᵢ = 0    (26)
From the definition of the residual,
ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ    i = 1, 2, …, n    (27)
If we sum over the n observations, we get
Σᵢ₌₁ⁿ ûᵢ = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0    (28)
which is precisely the first equation (14) of the system of normal equations.
Note that, if (26) holds, it implies that
Σᵢ₌₁ⁿ yᵢ = Σᵢ₌₁ⁿ ŷᵢ    (29)
and, dividing (26) and (29) by n, we obtain
(1/n) Σᵢ₌₁ⁿ ûᵢ = 0  and  ȳ = ŷ̄    (30)
2. The OLS line always goes through the mean point of the sample, (x̄, ȳ).
Effectively, dividing equation (16) by n, we have:
ȳ = β̂₀ + β̂₁x̄    (31)
3. The sample covariance between the regressor and the OLS residuals is zero.
The sample covariance between the regressor and the OLS residuals is equal to
S_xû = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)(ûᵢ − û̄) = (1/n) Σᵢ₌₁ⁿ xᵢûᵢ − x̄û̄ = (1/n) Σᵢ₌₁ⁿ xᵢûᵢ
since û̄ = 0. Therefore, if
(1/n) Σᵢ₌₁ⁿ xᵢûᵢ = 0    (32)
then the sample covariance between the regressor and the OLS residuals is equal to 0.
It can be seen that (32) is equivalent to the second normal equation,
Σᵢ₌₁ⁿ xᵢûᵢ = Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
given in (15).
Therefore, the sample covariance between the regressor and the OLS residuals is
zero.
4. The sample covariance between the fitted values (ŷᵢ) and the OLS residuals (ûᵢ) is zero.
Analogously to property 3, the sample covariance between the fitted values (ŷᵢ) and the OLS residuals (ûᵢ) is equal to 0 if the following holds:
Σᵢ₌₁ⁿ ŷᵢûᵢ = 0    (33)
Proof
Taking into account the algebraic properties 1, given in (26), and 3, given in (32), we have
S_ŷû = (1/n) Σᵢ₌₁ⁿ ŷᵢûᵢ = (1/n) Σᵢ₌₁ⁿ (β̂₀ + β̂₁xᵢ)ûᵢ = β̂₀ (1/n) Σᵢ₌₁ⁿ ûᵢ + β̂₁ (1/n) Σᵢ₌₁ⁿ xᵢûᵢ = β̂₀·0 + β̂₁·0 = 0
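Properties 1, 3 and 4, i.e. (26), (32) and (33), hold exactly for any sample, because they restate the normal equations. A quick numerical check in Python on a hypothetical sample:

```python
# Numerical check of the algebraic OLS properties (26), (32) and (33)
# on a hypothetical sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]       # residuals

sum_u = sum(u_hat)                                  # property 1: 0
sum_xu = sum(xi * ui for xi, ui in zip(x, u_hat))   # property 3: 0
sum_yhu = sum(yh * ui for yh, ui in zip(y_hat, u_hat))  # property 4: 0
print(sum_u, sum_xu, sum_yhu)
```

All three sums come out as zero up to floating-point rounding, regardless of the data used, as long as the model includes an intercept.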
Decomposition of the variance of y
By definition
yᵢ = ŷᵢ + ûᵢ    (34)
Subtracting ȳ from both sides (remember that ȳ is equal to ŷ̄), we have
yᵢ − ȳ = (ŷᵢ − ȳ) + ûᵢ
Squaring both sides:
(yᵢ − ȳ)² = (ŷᵢ − ȳ)² + ûᵢ² + 2ûᵢ(ŷᵢ − ȳ)
Summing over all i:
Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ ûᵢ² + 2 Σᵢ ûᵢ(ŷᵢ − ȳ)
Taking into account the algebraic properties 1 and 4, the third term of the right-hand side is equal to 0. Analytically,
Σᵢ ûᵢ(ŷᵢ − ȳ) = Σᵢ ûᵢŷᵢ − ȳ Σᵢ ûᵢ = 0    (35)
Therefore, we have
Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ ûᵢ²    (36)
It must be stressed that it is necessary to use relation (26) to ensure that (35) is equal to 0. We must remember that (26) is associated with the first normal equation, that is to say, with the equation corresponding to the intercept. If the fitted model has no intercept, then in general the decomposition (36) will not hold.
In words,
total sum of squares (SST) = explained sum of squares (SSE) + residual sum of squares (SSR)
or,
SST = SSE + SSR
This decomposition can be done in terms of variances, dividing both sides of (36) by n:
(1/n) Σᵢ (yᵢ − ȳ)² = (1/n) Σᵢ (ŷᵢ − ȳ)² + (1/n) Σᵢ ûᵢ²    (37)
In words,
total variance = explained variance + residual variance
Goodness of fit: the coefficient of determination (R²)
A priori we have obtained the estimators by minimizing the sum of squared residuals.
Now, once the estimation has been done, we can look at how well our sample regression line fits our data.
The statistics that indicate how well the sample regression line fits the data are denominated goodness-of-fit measures. We are going to look at the best-known measure, which is called the coefficient of determination or R-squared (R²). This measure is defined in the following way:
R² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²    (38)
Therefore, R² is the proportion of the total sum of squares that is explained by the regression, that is to say, by the model. We can also say that 100·R² is the percentage of the sample variation in y explained by x.
Alternatively, taking into account (36), we have:
Σᵢ (ŷᵢ − ȳ)² = Σᵢ (yᵢ − ȳ)² − Σᵢ ûᵢ²
Substituting in (38), we have
R² = [Σᵢ (yᵢ − ȳ)² − Σᵢ ûᵢ²] / Σᵢ (yᵢ − ȳ)² = 1 − Σᵢ ûᵢ² / Σᵢ (yᵢ − ȳ)²    (39)
Therefore, R² is equal to 1 minus the proportion of the total sum of squares that is not explained by the regression.
According to the definition of R², the following must hold:
0 ≤ R² ≤ 1
Extreme cases:
a) If we have a perfect fit, then ûᵢ = 0 for all i. This implies that
ŷᵢ = yᵢ ∀i ⟹ Σᵢ (ŷᵢ − ȳ)² = Σᵢ (yᵢ − ȳ)² ⟹ R² = 1
b) If ŷᵢ = c for all i, it implies that
ȳ = c ⟹ Σᵢ (ŷᵢ − ȳ)² = 0 ⟹ R² = 0
If R² is close to 0, it implies that we have a poor fit. In other words, very little variation in y is explained by x.
In many cases, a high R² is obtained when the model is fitted using time series data, due to the effect of a common trend. On the contrary, when we use cross-sectional data a low value is obtained in many cases, but this does not mean that the fitted model is bad.
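The decomposition SST = SSE + SSR and the two equivalent expressions (38) and (39) for R² can be verified numerically. A Python sketch on a hypothetical sample:

```python
# SST = SSE + SSR decomposition (36) and R-squared, equations (38)-(39).
# The sample is hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
sse = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
ssr = sum(ui ** 2 for ui in u_hat)            # residual sum of squares

r2 = sse / sst           # definition (38)
r2_alt = 1 - ssr / sst   # equivalent form (39)
print(r2)
```

Both expressions give the same value, and the decomposition holds exactly, because the fitted model includes an intercept.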
Coefficient of determination and coefficient of correlation
In descriptive statistics, the coefficient of correlation is studied. It is calculated according to the following formula:
r_xy = cov(x, y) / √(var(x)·var(y))
     = [(1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄)] / √[(1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² · (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)²]
     = Σᵢ₌₁ⁿ (yᵢ − ȳ)(xᵢ − x̄) / √[Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)²]    (40)
What is the relationship between the coefficient of determination and the coefficient of correlation?
Answer: the coefficient of determination is equal to the squared coefficient of correlation. Analytically,
R² = r²ₓᵧ    (41)
(This equality is valid in the simple regression model, but not in the multiple regression model.)
Proof
In the first place we are going to see an equivalence that will be used in the proof. By definition,
ŷᵢ = β̂₀ + β̂₁xᵢ
From the first normal equation, we have
ȳ = β̂₀ + β̂₁x̄
Subtracting the second equation from the first one:
ŷᵢ − ȳ = β̂₁(xᵢ − x̄)
Squaring both sides:
(ŷᵢ − ȳ)² = β̂₁²(xᵢ − x̄)²
and summing over all i, we have
Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² = β̂₁² Σᵢ₌₁ⁿ (xᵢ − x̄)²
Taking into account the previous equivalence, we have
R² = Σᵢ (ŷᵢ − ȳ)² / Σᵢ (yᵢ − ȳ)² = β̂₁² Σᵢ (xᵢ − x̄)² / Σᵢ (yᵢ − ȳ)²
   = [Σᵢ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ (xᵢ − x̄)²]² · Σᵢ (xᵢ − x̄)² / Σᵢ (yᵢ − ȳ)²
   = [Σᵢ (yᵢ − ȳ)(xᵢ − x̄)]² / [Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)²] = r²ₓᵧ
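Equality (41) is easy to confirm numerically: compute r_xy from (40) and R² from (38) on the same sample and compare. A Python sketch on hypothetical data:

```python
import math

# Check that R-squared equals the squared correlation coefficient, (41).
# The sample is hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r_xy = sxy / math.sqrt(sxx * syy)   # correlation coefficient (40)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
r2 = sum((yh - y_bar) ** 2 for yh in y_hat) / syy   # R-squared (38)
print(r_xy ** 2, r2)
```

With more than one regressor this equality no longer holds, which is why multiple regression uses the R² definition rather than a squared pairwise correlation.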
2.4 Units of measurement and functional form
Units of Measurement
Changing the units of measurement (changing of scale)
If x is multiplied/divided by a constant c ≠ 0, then the OLS slope is divided/multiplied by the same constant c. Thus
ŷᵢ = β̂₀ + (β̂₁/c)(c·xᵢ)    (42)
Example
Let us suppose the following estimated consumption function, in which both variables are measured in thousands of euros:
ĉonsᵢ = 0.2 + 0.85 incᵢ    (43)
If we now express income in euros (multiplying by 1000) and call it ince, the fitted model with the new units of measurement of income would be the following:
ĉonsᵢ = 0.2 + 0.00085 inceᵢ
As can be seen, changing the units of measurement of the explanatory variable does not affect the intercept.
If y is multiplied/divided by a constant c ≠ 0, then the OLS slope and intercept are both multiplied/divided by the same constant c. Thus,
c·ŷᵢ = (c·β̂₀) + (c·β̂₁)xᵢ    (44)
Example
If we express, in model (43), consumption in euros (multiplying by 1000) and call it conse, the fitted model with the new units of measurement of consumption would be the following:
ĉonseᵢ = 200 + 850 incᵢ
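The scaling rules (42) and (44) can be checked by refitting after rescaling. A Python sketch with a small reusable OLS helper; the income/consumption figures are hypothetical:

```python
# Effect of changing units of measurement on OLS estimates,
# checking rules (42) and (44). The data are hypothetical.
def ols(x, y):
    """Return (intercept, slope) from the closed-form OLS formulas."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((yi - yb) * (xi - xb) for xi, yi in zip(x, y)) \
         / sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

inc = [10.0, 15.0, 20.0, 25.0, 30.0]   # thousands of euros (hypothetical)
cons = [9.0, 13.0, 17.5, 21.0, 26.0]

b0, b1 = ols(inc, cons)
b0_e, b1_e = ols([1000 * v for v in inc], cons)   # income in euros
c0, c1 = ols(inc, [1000 * v for v in cons])       # consumption in euros
print(b0, b1, b1_e, c0, c1)
```

Rescaling x divides the slope by 1000 and leaves the intercept untouched; rescaling y multiplies both coefficients by 1000, exactly as (42) and (44) state.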
Changing the origin
If one adds/subtracts a constant d to/from x and/or y, the OLS slope is not affected. However, changing the origin of either x and/or y affects the intercept of the regression.
If one subtracts a constant d from x, the intercept will change in the following way:
ŷᵢ = (β̂₀ + β̂₁d) + β̂₁(xᵢ − d)    (45)
Example
Let us suppose that the average income is 20 thousand euros. If we define the variable incdᵢ = incᵢ − 20 and both variables are measured in thousands of euros, the fitted model with this change of origin will be the following:
ĉonsᵢ = (0.2 + 0.85·20) + 0.85(incᵢ − 20) = 17.2 + 0.85 incdᵢ
If one subtracts a constant d from y, the intercept will change in the following way:
ŷᵢ − d = (β̂₀ − d) + β̂₁xᵢ    (46)
Example
Let us suppose that the average consumption is 15 thousand euros. If we define the variable consdᵢ = consᵢ − 15 and both variables are measured in thousands of euros, the fitted model with this change of origin will be the following:
ĉonsᵢ − 15 = 0.2 − 15 + 0.85 incᵢ
that is to say,
ĉonsdᵢ = −14.8 + 0.85 incᵢ
Finally, note that R² is invariant to changes in the units of x and/or y, and is also invariant to changes in the origin of the variables.
Functional Form
Linear relationships are not adequate for many economic applications. However, we can incorporate many nonlinearities (in the variables) into simple regression analysis by appropriately redefining the dependent and independent variables.
Before examining the different functional forms, we are going to look at some definitions which will be useful in the exposition.
Some definitions
We are going to look at some definitions of variation measures that will be
useful in the interpretation of the coefficients corresponding to different functional
forms. Specifically, we will look at the following: absolute change, proportional change,
percentage change and changes in logarithms.
The absolute change between x₀ and x₁ is given by
Δx₁ = x₁ − x₀    (47)
This measure of variation is not much used by economists.
The proportional change (or relative rate of variation) between x₀ and x₁ is given by:
Δx₁/x₀ = (x₁ − x₀)/x₀    (48)
Multiplying a proportional change by 100, a percentage change is obtained. That is to say:
(Δx₁/x₀) × 100 %    (49)
In economic reports this is the most used measure.
The change in logarithms between x₀ and x₁ is given by
Δlog(x) = log(x₁) − log(x₀)    (50)
This is also a rate of variation, which is used in economic research. The relationship between the proportional change and the change in logarithms can be seen if we expand (50) in a Taylor series around x₁/x₀ = 1:
log(x₁) − log(x₀) = log(x₁/x₀)
  = log(1) + (x₁/x₀ − 1) − (1/2)(x₁/x₀ − 1)² + (1/3)(x₁/x₀ − 1)³ − …
  = Δx₁/x₀ − (1/2)(Δx₁/x₀)² + (1/3)(Δx₁/x₀)³ − …    (51)
Therefore, if we take the linear approximation in this expansion, we have
Δlog(x) = log(x₁) − log(x₀) = log(x₁/x₀) ≈ Δx₁/x₀    (52)
This approximation is good when the proportional change is small, but the differences can be important when the proportional change is big, as can be seen in table 1.
Table 1. Examples of proportional change and change in logarithms
x₁                        202     210     220     240     300
x₀                        200     200     200     200     200
Proportional change       0.010   0.050   0.100   0.200   0.500
Change in logarithms      0.010   0.049   0.095   0.182   0.405
Proportional change %     1.0%    5.0%    10.0%   20.0%   50.0%
Change in logarithms %    1.0%    4.9%    9.5%    18.2%   40.5%
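Table 1 can be reproduced with a few lines of Python; (48) gives the proportional change and (50) the change in logarithms:

```python
import math

# Proportional change (48) vs change in logarithms (50), as in table 1.
x0 = 200.0
rows = []
for x1 in [202.0, 210.0, 220.0, 240.0, 300.0]:
    prop = (x1 - x0) / x0               # proportional change
    dlog = math.log(x1) - math.log(x0)  # change in logarithms
    rows.append((x1, round(prop, 3), round(dlog, 3)))

for x1, prop, dlog in rows:
    print(x1, prop, dlog)
```

The two measures agree to three decimals for the 1% change and drift apart as the change grows, which is exactly the pattern shown in table 1.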
Elasticity is the ratio of the relative changes of two variables.
If we use proportional changes, the elasticity of the variable y with respect to the variable x is given by
ε_{y/x} = (Δy/y₀) / (Δx/x₀)    (53)
If we use changes in logarithms and consider infinitesimal changes, the elasticity of the variable y with respect to the variable x is given by
ε_{y/x} = (dy/y) / (dx/x) = d log(y) / d log(x)    (54)
In general, in econometric models elasticity is defined by using (54).
Alternative functional forms
The OLS method can be applied without any problem to models in which transformations of the endogenous variable and/or of the exogenous variable have been made. In the presentation of the model (1) we said that exogenous variable and regressor were equivalent terms. But from now on, the regressor is the specific form in which an exogenous variable appears in the equation. For example, in the model
y = β₀ + β₁ log(x) + u
the exogenous variable is x, but the regressor is log(x).
In the presentation of the model (1) we also said that endogenous variable and regressand were equivalent. But from now on, the regressand is the specific form in which an endogenous variable appears in the equation. For example, in the model
log(y) = β₀ + β₁x + u
the endogenous variable is y, but the regressand is log(y).
The two previous models are linear in the parameters, although they are not linear in the variable x (the first one) or in the variable y (the second one). In any case, if a model is linear in the parameters, it can be estimated by applying the OLS method. On the contrary, if a model is not linear in the parameters, iterative methods must be used in the estimation. However, there are certain nonlinear models that, by means of suitable transformations, can become linear. These models are denominated linearizable.
Thus, on some occasions potential (power) models are postulated in economic theory, as in the well-known Cobb-Douglas production function. A potential model with a unique explanatory variable is given by
y = A·x^β₁
If we introduce the error term in the following way
y = A·x^β₁·eᵘ    (55)
then, taking natural logarithms on both sides of (55), we obtain a model that is linear in the parameters:
log(y) = β₀ + β₁ log(x) + u    (56)
where we have called β₀ = log(A).
On the contrary, if we introduce the error term in the following way
y = A·x^β₁ + u
then there is no transformation that turns this model into a linear one. This is a non-linearizable model.
Now, we are going to consider some models with alternative functional forms, all of them linear in the parameters. We will look at the interpretation of the coefficient β₁ in each case.
a) Linear model
In this model, defined in (1), if the other factors included in u are held fixed, so that the change in u is zero (Δu = 0), then x has a linear effect on y:
Δy = β₁Δx  if Δu = 0
Therefore, β₁ is the change in y (in the units in which y is measured) per unit change in x (in the units in which x is measured).
For example, in the fitted function (43), if income increases by 1 unit, consumption will increase by 0.85 units.
The linearity of this model implies that a one-unit change in x always has the same effect on y, regardless of the value of x considered.
b) Linear-log model
A linear-log model is given by
y = β₀ + β₁ log(x) + u    (57)
Taking first differences in (57) and setting Δu = 0, we have
Δy = β₁ Δlog(x)  if Δu = 0
Therefore, if x increases by 1%, then y will increase by β₁/100 units, with Δu = 0.
c) Log-linear model
A log-linear model is given by
log(y) = β₀ + β₁x + u    (58)
This model can be obtained from the following one:
y = exp(β₀ + β₁x + u)
by taking natural logs on both sides. For this reason this model is also called exponential.
Taking first differences in (58) and setting Δu = 0, we have
Δlog(y) = β₁Δx  if Δu = 0
Therefore, if x increases by 1 unit, then y will increase by 100·β₁ %, with Δu = 0.
d) Log-log model
The model given in (56) is a log-log model or, before the transformation, a potential model (55). This model is also called the constant elasticity model.
Taking first differences in (56) and setting Δu = 0, we have
Δlog(y) = β₁ Δlog(x)  if Δu = 0
Therefore, if x increases by 1%, then y will increase by β₁%, with Δu = 0. It is important to remark that, in this model, β₁ is the elasticity of y with respect to x, for any value of x and y. Consequently, in this model the elasticity is constant.
In table 2 the interpretation of β₁ in these four fitted models is shown.
Table 2. Interpretation of β₁ in different models
Model         If x increases by    then y will increase by
linear        1 unit               β₁ units
linear-log    1%                   β₁/100 units
log-linear    1 unit               (100·β₁)%
log-log       1%                   β₁%
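The constant-elasticity interpretation of the log-log model can be illustrated by generating data that lie exactly on a power curve y = A·x^β₁ (with hypothetical values A = 2 and β₁ = 0.75) and running OLS on the logs: the slope recovers the elasticity, and the exponential of the intercept recovers A.

```python
import math

# Log-log (constant elasticity) model of (55)-(56), with noiseless data
# generated on y = A * x**beta1. A and beta1 are hypothetical.
A, beta1 = 2.0, 0.75
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [A * x ** beta1 for x in xs]

lx = [math.log(x) for x in xs]   # regressor: log(x)
ly = [math.log(y) for y in ys]   # regressand: log(y)
n = len(lx)
lxb, lyb = sum(lx) / n, sum(ly) / n

# OLS slope and intercept on the log-transformed variables, formula (21).
b1 = sum((a - lyb) * (b - lxb) for b, a in zip(lx, ly)) \
     / sum((b - lxb) ** 2 for b in lx)
b0 = lyb - b1 * lxb

print(b1, math.exp(b0))  # b1 estimates the elasticity, exp(b0) estimates A
```

Because the data have no error term, OLS on the logs recovers the parameters exactly, up to floating-point rounding.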
2.5 Statistical Properties of OLS
We are now going to study the statistical properties of the OLS estimators, β̂₀ and β̂₁, considered as estimators of the population parameters β₀ and β₁.
The first property to study is the unbiasedness of OLS.
Unbiasedness of the OLS
To prove this property we need to formulate the following assumptions:
Assumptions
SLR.1: Linear in parameters
The population model is linear in parameters and given by
y = β₀ + β₁x + u    (59)
SLR.2: Random sampling
We have a random sample of size n, {(yᵢ, xᵢ): i = 1, 2, …, n}, from the population model.
Thus we can write the population model in terms of the sample:
yᵢ = β₀ + β₁xᵢ + uᵢ    i = 1, 2, …, n    (60)
SLR.3: Sample variation in the explanatory variable
The sample outcomes on x, namely xᵢ, i = 1, 2, …, n, are not all the same value. This assumption implies that
S²_X = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² > 0    (61)
SLR.4: Zero conditional mean
E(u|x) = 0
For a random sample, this assumption implies that
E(uᵢ|xᵢ) = 0,  i = 1, 2, …, n
According to SLR.4, all derivations using this assumption will be conditional on the sample values of x.
We are going to prove only the unbiasedness of the estimator β̂₁, which is the most important one. In order to do this proof, we need to rewrite our estimator in terms of the population parameter. Start with a simple rewrite of the OLS formula (21) as
β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₁ⁿ (xᵢ − x̄)yᵢ / Σᵢ₌₁ⁿ (xᵢ − x̄)²    (62)
because Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ₌₁ⁿ (xᵢ − x̄)yᵢ − ȳ Σᵢ₌₁ⁿ (xᵢ − x̄), and Σᵢ₌₁ⁿ (xᵢ − x̄) = 0.
Substituting (60) into (62), we have
β̂₁ = Σᵢ (xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ) / Σᵢ (xᵢ − x̄)²
   = [β₀ Σᵢ (xᵢ − x̄) + β₁ Σᵢ (xᵢ − x̄)xᵢ + Σᵢ (xᵢ − x̄)uᵢ] / Σᵢ (xᵢ − x̄)²
   = β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)²    (63)
The estimator equals the population slope, β₁, plus a term that is a linear combination of the errors {u₁, …, uₙ}.
In the derivation of (63), we have taken into account that
β₀ Σᵢ (xᵢ − x̄) = 0
Σᵢ (xᵢ − x̄)xᵢ = Σᵢ xᵢ² − x̄ Σᵢ xᵢ = Σᵢ xᵢ² − nx̄² = Σᵢ (xᵢ − x̄)²
THEOREM 2.1 Unbiasedness of OLS
Under assumptions SLR.1 to SLR.4, the estimators β̂₀ and β̂₁, conditional on x, are unbiased.
Proof that β̂₁ is unbiased:
E(β̂₁|X) = E[β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)² | X]
        = β₁ + Σᵢ (xᵢ − x̄) E(uᵢ|X) / Σᵢ (xᵢ − x̄)² = β₁    (64)
(From now on, in order to simplify the notation, we will use E(β̂₁) instead of E(β̂₁|X), but understanding that it is a conditional expectation.)
Therefore, the OLS estimator of β₁ is unbiased. Similarly, one can show that the OLS estimator of β₀ is also unbiased. In any case, the proof of unbiasedness depends on assumptions SLR.1 to SLR.4. If any assumption fails, then the OLS estimators are not necessarily unbiased. Remember that unbiasedness is a general property of the estimator: in a given sample we may be near or far from the true parameter, but the estimator's distribution will be centered at the population parameter.
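Unbiasedness can be illustrated by simulation under SLR.1 to SLR.4: generate many samples from a known population (hypothetical β₀ = 1, β₁ = 2, standard normal errors, fixed regressor values), estimate β₁ in each sample, and average. A Python sketch:

```python
import random

# Monte Carlo sketch of unbiasedness: the average of the OLS slope
# estimates over many samples is close to the true beta1.
# beta0, beta1 and the design are hypothetical.
random.seed(42)
beta0, beta1 = 1.0, 2.0
x = [float(i) for i in range(1, 21)]   # fixed regressor values
xb = sum(x) / len(x)
sxx = sum((xi - xb) ** 2 for xi in x)

estimates = []
for _ in range(2000):
    # SLR.1-SLR.4: linear model, random sample, E(u|x) = 0
    y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]
    yb = sum(y) / len(y)
    b1 = sum((yi - yb) * (xi - xb) for xi, yi in zip(x, y)) / sxx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)
print(mean_b1)  # close to beta1 = 2
```

Individual estimates scatter above and below 2, but their average is very close to the population value, which is what unbiasedness asserts.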
Variance of the OLS estimators
Now we know that the sampling distribution of our estimator is centered around the true parameter. How spread out is this distribution? This will be a measure of uncertainty.
In order to calculate the variance of β̂₁ (and β̂₀) an additional assumption is needed.
SLR.5: Homoskedasticity
Homoskedasticity implies that all errors have the same variance. That is to say,
var(u|x) = σ²    (65)
We add SLR.5 because it simplifies the variance calculations and because it implies that OLS has certain efficiency properties.
The homoskedasticity assumption is quite distinct from the zero conditional mean assumption, E(u|x) = 0. SLR.4 involves the expected value of u, while SLR.5 concerns the variance of u. Homoskedasticity plays no role in showing that β̂₀ and β̂₁ are unbiased.
Taking into account that E(u|x) = 0, and that E(u²|x) = E(u²) (because E(xu) = 0), we obtain
var(u|x) = E(u²|x) − [E(u|x)]² = E(u²|x) = E(u²) = var(u) = σ²
Thus, σ² is the conditional variance, but also the unconditional variance. It is called the error variance. The square root of the error variance, σ, is called the standard deviation of the error.
We can say
E(y|x) = β₀ + β₁x  and  var(u|x) = σ²
So, the conditional expectation of y given x is linear in x, but the variance of y given x is constant. When var(u|x) depends on x, the error term is said to exhibit heteroskedasticity.
On the other hand,

\[
\mathrm{var}(y \mid x) = E\big[(y - E(y \mid x))^2 \mid x\big]
= E\big[(\beta_0 + \beta_1 x + u - \beta_0 - \beta_1 x)^2 \mid x\big]
= E(u^2 \mid x) = \mathrm{var}(u \mid x) \qquad (66)
\]

Since $\mathrm{var}(u \mid x)=\mathrm{var}(y \mid x)$, heteroskedasticity is present whenever $\mathrm{var}(y \mid x)$ is a function of x.
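A short simulation can make the distinction concrete. In the sketch below (our own illustration, not from the text; the functional form of the variance is an arbitrary choice) the error standard deviation grows with x, so $\mathrm{var}(u \mid x)$ is a function of x; grouping the draws by the size of x shows the spread of u increasing even though $E(u \mid x)$ stays at zero.

```python
import random

random.seed(0)

# Heteroskedastic errors: sd(u|x) = 0.5 * x, so var(u|x) = 0.25 * x**2
xs = [random.uniform(1, 10) for _ in range(20000)]
us = [random.gauss(0, 0.5 * x) for x in xs]

low = [u for x, u in zip(xs, us) if x < 4]     # small-x group
high = [u for x, u in zip(xs, us) if x > 7]    # large-x group

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

print(var(low), var(high))  # the large-x group has a much larger error variance
```

Both groups have a sample mean of u near zero, but their variances differ sharply: this is heteroskedasticity, the failure of SLR.5.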
THEOREM 2.2 Sampling variances of OLS estimators

Under assumptions SLR.1 to SLR.5, the variances of $\hat\beta_1$ and $\hat\beta_0$, conditional on $\{x_1,\ldots,x_n\}$, are the following:
\[
\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar x)^2} \qquad (67)
\]

and

\[
\mathrm{Var}(\hat\beta_0) = \frac{\sigma^2\, n^{-1}\sum_{i=1}^{n}x_i^2}{\sum_{i=1}^{n}(x_i-\bar x)^2} \qquad (68)
\]
Proof of (67):

Taking into account (63),

\[
\begin{aligned}
\mathrm{var}(\hat\beta_1 \mid X) &= E\big[(\hat\beta_1-\beta_1)^2 \mid X\big]
= E\!\left[\left(\frac{\sum_{i=1}^{n}(x_i-\bar x)u_i}{\sum_{i=1}^{n}(x_i-\bar x)^2}\right)^{\!2} \Bigg|\; X\right] \\
&= \frac{1}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}\,
E\!\left[\sum_{i=1}^{n}(x_i-\bar x)^2 u_i^2 + \sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)u_i u_j \;\Bigg|\; X\right] \\
&= \frac{1}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}
\left[\sum_{i=1}^{n}(x_i-\bar x)^2 E(u_i^2) + \sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)E(u_i u_j)\right] \\
&= \frac{\sigma^2 \sum_{i=1}^{n}(x_i-\bar x)^2}{\Big(\sum_{i=1}^{n}(x_i-\bar x)^2\Big)^{2}}
= \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}
= \frac{\sigma^2}{n S_X^2} \qquad (69)
\end{aligned}
\]
where

\[
S_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2
\]

is the sample variance of X.
In the above proof we have taken into account that $E(u_i u_j)$, for $i \neq j$, is equal to 0, because the sampling is random and the errors are therefore independent.

(From now on, in order to simplify the notation, we will not write conditional expectations explicitly.)

We can draw the following conclusions from the last expression:
1) The larger the error variance, $\sigma^2$, the larger the variance of the slope estimate. This is not at all surprising: more noise in the equation, that is, a larger $\sigma^2$, makes it more difficult to estimate the effect of x on y, and this is reflected in a higher variance of the OLS slope estimator. Since $\sigma^2$ is a feature of the population, it has nothing to do with the sample size.

2) The larger the variability in the $x_i$, the smaller the variance of the slope estimate. Everything else being equal, for estimating $\beta_1$ we prefer to have as much sample variation in x as possible. In any case, Assumption SLR.3 rules out $S_X^2 = 0$.

3) The larger the sample size n, the smaller the variance of the slope estimate.
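These three conclusions can be read directly off formula (67). The following sketch (illustrative numbers of our own choosing, not from the text) evaluates $\sigma^2/\sum_{i}(x_i-\bar x)^2$ for a baseline design, a design with doubled spread in x, and a design with ten times as many observations at the same spread.

```python
# Evaluate Var(b1_hat) = sigma^2 / sum((x_i - xbar)^2), equation (67),
# for different sample sizes and different spreads of x.
sigma2 = 4.0  # assumed error variance (our choice)

def slope_variance(x, sigma2):
    xbar = sum(x) / len(x)
    return sigma2 / sum((xi - xbar) ** 2 for xi in x)

narrow = [float(i) for i in range(10)]          # n = 10, x spread over 0..9
wide = [2.0 * i for i in range(10)]             # n = 10, doubled spread
large = [float(i % 10) for i in range(100)]     # n = 100, same spread as 'narrow'

v_narrow = slope_variance(narrow, sigma2)
v_wide = slope_variance(wide, sigma2)
v_large = slope_variance(large, sigma2)
print(v_narrow, v_wide, v_large)
```

For this design, doubling the spread of x divides the slope variance by four, and multiplying n by ten (holding the spread fixed) divides it by ten, in line with conclusions 2) and 3).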
We have a problem in calculating $\mathrm{var}(\hat\beta_1)$: the error variance, $\sigma^2$, is unknown.
Estimating the Error Variance
We don't know what the error variance, $\sigma^2$, is, and we cannot estimate it from the errors, $u_i$, because we don't observe them.

Given that $\sigma^2 = E(u^2)$, an unbiased estimator would be

\[
\frac{\sum_{i=1}^{n}u_i^2}{n} \qquad (70)
\]
Unfortunately, this is not a true estimator, because we do not observe the errors $u_i$. But we do have estimates of the $u_i$, namely the OLS residuals $\hat u_i$.

The relation between errors and residuals is given by

\[
\hat u_i = y_i - \hat y_i = \beta_0 + \beta_1 x_i + u_i - \hat\beta_0 - \hat\beta_1 x_i
= u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)x_i \qquad (71)
\]

Hence $\hat u_i$ is not the same as $u_i$, although the difference between them does have an expected value of zero. If we replace the errors with the OLS residuals, we have

\[
\tilde\sigma^2 = \frac{\sum_{i=1}^{n}\hat u_i^2}{n} \qquad (72)
\]

This is a true estimator, because it gives a computable rule for any sample of the data, x and y. However, this estimator is biased, essentially because it does not account for two restrictions that must be satisfied by the OLS residuals,
\[
\sum_{i=1}^{n}\hat u_i = 0, \qquad \sum_{i=1}^{n}x_i\hat u_i = 0 \qquad (73)
\]

One way to view these restrictions is the following: if we know n − 2 of the residuals, we can get the other two residuals by using the restrictions implied by the moment conditions.

Thus, there are only n − 2 degrees of freedom in the OLS residuals, as opposed to n degrees of freedom in the errors. The unbiased estimator of $\sigma^2$ that we will use makes an adjustment to take the degrees of freedom into account:

\[
\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat u_i^2}{n-2} \qquad (74)
\]
THEOREM 2.3 Unbiased estimator of $\sigma^2$

Under assumptions SLR.1 to SLR.5,

\[
E(\hat\sigma^2) = \sigma^2 \qquad (75)
\]

If $\hat\sigma^2$ is plugged into the variance formulas, we then have unbiased estimators of $\mathrm{var}(\hat\beta_0)$ and $\mathrm{var}(\hat\beta_1)$.
The natural estimator of $\sigma$ is $\hat\sigma = \sqrt{\hat\sigma^2}$, and it is called the standard error of the regression.

The square root of the variance of $\hat\beta_1$, defined in (69), is called the standard deviation of $\hat\beta_1$, that is to say,

\[
sd(\hat\beta_1) = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}} \qquad (76)
\]

Therefore, its natural estimator is

\[
se(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}} \qquad (77)
\]

Note that $se(\hat\beta_1)$, the standard error of $\hat\beta_1$, is viewed as a random variable when we think of running OLS over different samples; this is because $\hat\sigma$ varies with different samples. The standard error of any estimate gives us an idea of how precise the estimator is.
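Equations (71) to (77) translate directly into code. The sketch below uses toy data of our own (purely illustrative, not from the text): it computes the OLS residuals, verifies the two restrictions in (73), forms $\hat\sigma^2$ with the n − 2 adjustment of (74), and finally the standard error of the slope as in (77).

```python
# Toy data (our own numbers, chosen only for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS estimates of the slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residuals; they satisfy the restrictions (73) by construction
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Degrees-of-freedom-adjusted error variance estimator, equation (74)
sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)

# Standard error of the slope, equation (77)
se_b1 = (sigma2_hat / sxx) ** 0.5
print(b1, sigma2_hat, se_b1)
```

Checking that the residuals sum to zero and are orthogonal to x is a useful sanity test for any hand-rolled OLS routine.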
2.6 Regression through the origin

If we force the regression line to pass through the point (0,0), we are constraining the intercept to be zero, as can be seen in figure 2.5. This is called a regression through the origin.

Figure 2.5. A regression through the origin.

This is not very often done because, among other problems, when $\beta_0 \neq 0$ the slope estimate will be biased.

We are now going to see how to estimate a regression line through the origin.

The fitted model is the following:

\[
\hat y_i = \tilde\beta_1 x_i \qquad (78)
\]
Therefore, we must minimize

\[
\min_{\tilde\beta_1} S = \min_{\tilde\beta_1} \sum_{i=1}^{n}(y_i - \tilde\beta_1 x_i)^2 \qquad (79)
\]

To minimize S, we differentiate with respect to $\tilde\beta_1$ and set the derivative equal to 0:

\[
\frac{\partial S}{\partial \tilde\beta_1} = -2\sum_{i=1}^{n}(y_i - \tilde\beta_1 x_i)x_i = 0 \qquad (80)
\]

Solving for $\tilde\beta_1$,
\[
\tilde\beta_1 = \frac{\sum_{i=1}^{n}y_i x_i}{\sum_{i=1}^{n}x_i^2} \qquad (81)
\]
Another problem with fitting a regression line through the origin is that, in general, the usual decomposition of the sum of squares does not hold:

\[
\sum_{i=1}^{n}(y_i-\bar y)^2 \neq \sum_{i=1}^{n}(\hat y_i-\bar y)^2 + \sum_{i=1}^{n}\hat u_i^2
\]

If the decomposition of the variance of y into two components (explained and residual) is not possible, then the $R^2$ is meaningless; in this case the coefficient can take values that are negative or greater than 1.

To sum up, an intercept should be included in regressions, unless there are strong theoretical reasons, from economic theory, to exclude it.
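The possibility of a negative $R^2$ is easy to demonstrate. In the sketch below (toy data of our own, with a large true intercept) the through-origin slope from (81) fits so badly that the residual sum of squares exceeds the total sum of squares.

```python
# Data where the true intercept is clearly nonzero (our own toy numbers,
# roughly y = 10 + 0.85 * x)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.5, 11.0, 12.2, 12.8, 14.0]

n = len(x)
ybar = sum(y) / n

# Slope of the regression through the origin, equation (81)
b1_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# "R-squared" computed the usual way: 1 - SSR / SST
resid = [yi - b1_origin * xi for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in resid)
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ssr / sst
print(b1_origin, r2)
```

Because the fitted line is forced through (0,0) while the data sit far above it, the residuals are huge and the conventional $R^2$ formula turns negative, illustrating why it is meaningless for through-origin regressions.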
Case study. Engel's curve. Demand for dairy products

The expression Engel's curve refers to the line which shows the relationship between the various quantities of a good a consumer is willing to purchase at varying income levels.

In a survey of 40 households, data on expenditure on dairy products and on income were obtained. These data appear in table 3. In order to avoid distortions due to the different sizes of the households, both consumption and income have been expressed in per capita terms. The data are expressed in euros per month.

Table 3. Expenditure on dairy products (dairy) and disposable income (inc), in per capita terms.

Unit: euros per month.
household dairy inc household dairy inc
1 8.87 1,250 21 16.20 2,100
2 6.59 985 22 10.39 1,470
3 11.46 2,175 23 13.50 1,225
4 15.07 1,025 24 8.50 1,380
5 15.60 1,690 25 19.77 2,450
6 6.71 670 26 9.69 910
7 10.02 1,600 27 7.90 690
8 7.41 940 28 10.15 1,450
9 11.52 1,730 29 13.82 2,275
10 7.47 640 30 13.74 1,620
11 6.73 860 31 4.91 740
12 8.05 960 32 20.99 1,125
13 11.03 1,575 33 20.06 1,335
14 10.11 1,230 34 18.93 2,875
15 18.65 2,190 35 13.19 1,680
16 10.30 1,580 36 5.86 870
17 15.30 2,300 37 7.43 1,620
18 13.75 1,720 38 7.15 960
19 11.49 850 39 9.10 1,125
20 6.69 780 40 15.31 1,875
There are multiple types of models used in demand studies. In particular, in this case study we are going to consider the following models: linear, inverse, linear-log (semi-logarithmic), log-log (potential), log-linear (exponential) and inverse exponential. In the first three models the regressand of the equation is the endogenous variable itself, whereas in the last three the regressand is the natural logarithm of the endogenous variable.

In all the models we will calculate the marginal propensity to expenditure, as well as the expenditure/income elasticity.
Linear model

The linear model for the demand of dairy products is the following:

\[
dairy = \beta_0 + \beta_1 inc + u \qquad (82)
\]

As is known, the marginal propensity indicates how expenditure changes when income varies, and it is obtained by differentiating expenditure with respect to income in the demand equation. In the linear model the marginal propensity to expenditure is given by
\[
\frac{d\,dairy}{d\,inc} = \beta_1 \qquad (83)
\]
In other words, in the linear model the marginal propensity is constant and, therefore, independent of the level of income. The fact that it is constant is an advantage, but at the same time it has the disadvantage of not being well suited to describing consumer behaviour, especially when there are important differences in the income of the households. Thus, it is unreasonable for the marginal propensity of expenditure on dairy products to be the same in a low-income family as in a high-income family. However, if the variation of income in the sample is not very large, a linear model can be used to describe the demand for certain goods.
In this model the expenditure/income elasticity is the following:

\[
\varepsilon^{linear}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \beta_1\frac{inc}{dairy} \qquad (84)
\]
Estimating the model (82) with the data of table 3, we obtain

\[
\widehat{dairy} = 4.012 + 0.005288\,inc \qquad R^2 = 0.4584 \qquad (85)
\]
Inverse model

In an inverse model there is a linear relationship between expenditure and the inverse of income. Therefore, this model is directly linear in the parameters, and it is expressed in the following way:

\[
dairy = \beta_0 + \beta_1\frac{1}{inc} + u \qquad (86)
\]

The sign of the coefficient $\beta_1$ will be negative if income is positively correlated with expenditure. It is easy to see that, when income tends towards infinity, expenditure tends towards a limit equal to $\beta_0$. In other words, $\beta_0$ represents the maximum consumption of this good.
In figure 2.6, we can see a double representation of the population function corresponding to this model. In the first one, the relationship between the dependent variable and the explanatory variable is represented; in the second one, the relationship between the regressand and the regressor. As can be seen, the second function is linear.

Figure 2.6. The inverse model
In the inverse model, the marginal propensity to expenditure is given by
\[
\frac{d\,dairy}{d\,inc} = -\beta_1\frac{1}{inc^2} \qquad (87)
\]
According to (87), the propensity is inversely proportional to the square of the income level.

On the other hand, the elasticity is inversely proportional to the product of expenditure and income, as can be seen in the following expression:

\[
\varepsilon^{inv}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = -\beta_1\frac{1}{inc\cdot dairy} \qquad (88)
\]
Estimating the model (86) with the data of table 3, we obtain

\[
\widehat{dairy} = 18.652 - 8702\,\frac{1}{inc} \qquad R^2 = 0.4281 \qquad (89)
\]

In this case the coefficient $\beta_1$ does not have an economic meaning.
Linear-log model

This model is denominated the linear-log model because expenditure is a linear function of the logarithm of income, that is to say,

\[
dairy = \beta_0 + \beta_1\log(inc) + u \qquad (90)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\frac{d\log(inc)}{d\,inc} = \beta_1\frac{1}{inc} \qquad (91)
\]
and the expenditure/income elasticity by

\[
\varepsilon^{lin\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \beta_1\frac{1}{dairy} \qquad (92)
\]
As can be seen, in the linear-log model the marginal propensity is inversely
proportional to the level of income, while the elasticity is inversely proportional to the
level of expenditure on dairy products.
In figure 2.7, we can see a double representation of the population function
corresponding to this model.
Figure 2.7. The linear log model
Estimating the model (90) with the data of table 3, we obtain

\[
\widehat{dairy} = -41.623 + 7.399\log(inc) \qquad R^2 = 0.4567 \qquad (93)
\]

The interpretation of $\hat\beta_1$ is the following: if income increases by 1%, the demand for dairy products will increase by 0.07399 euros.
Log-log model or potential model

This potential model is defined in the following way:

\[
dairy = A\,inc^{\beta_1}e^{u} \qquad (94)
\]

This model is not linear in the parameters, but it is linearizable: by taking natural logarithms, we obtain the model

\[
\log(dairy) = \beta_0 + \beta_1\log(inc) + u \qquad (95)
\]

where $\beta_0 = \log(A)$.

This model is also called the log-log model, because that is the structure of the corresponding linearized model.

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\frac{dairy}{inc} \qquad (96)
\]

In the log-log model, the elasticity is constant. Therefore, if income increases by 1%, expenditure will increase by $\beta_1$%, since
\[
\varepsilon^{log\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\log(inc)} = \beta_1 \qquad (97)
\]
In figure 2.8, we can see a double representation of the population function
corresponding to this model.
Figure 2.8. The log log model
Estimating the model (95) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = -2.556 + 0.6866\log(inc) \qquad R^2 = 0.5190 \qquad (98)
\]
In this case $\hat\beta_1$ is the expenditure/income elasticity. Its interpretation is the following: if income increases by 1%, the demand for dairy products will increase by 0.68%.
Log-linear or exponential model

This exponential model is defined in the following way:

\[
dairy = \exp(\beta_0 + \beta_1 inc + u) \qquad (99)
\]

By taking natural logarithms on both sides of (99), we obtain the following model, which is linear in the parameters:

\[
\log(dairy) = \beta_0 + \beta_1 inc + u \qquad (100)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = \beta_1\,dairy \qquad (101)
\]
In the exponential model, unlike the other models seen previously, the marginal propensity increases with the level of expenditure. For this reason, this model is adequate for describing the demand for luxury goods. On the other hand, the elasticity is proportional to the level of income:

\[
\varepsilon^{exp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\,inc}\,inc = \beta_1 inc \qquad (102)
\]
In figure 2.9, we can see a double representation of the population function corresponding to this model.

Figure 2.9. The log-linear model
Estimating the model (100) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = 1.694 + 0.00048\,inc \qquad R^2 = 0.4978 \qquad (103)
\]

The interpretation of $\hat\beta_1$ is the following: if income increases by 1 euro, the demand for dairy products will increase by 0.048%.
Inverse exponential model

The inverse exponential model, which is a mixture of the exponential model and the inverse model, is given by
\[
dairy = \exp\!\left(\beta_0 + \beta_1\frac{1}{inc} + u\right) \qquad (104)
\]

By taking natural logarithms on both sides of (104), we obtain the following model, which is linear in the parameters:

\[
\log(dairy) = \beta_0 + \beta_1\frac{1}{inc} + u \qquad (105)
\]

In this model the marginal propensity to expenditure is given by

\[
\frac{d\,dairy}{d\,inc} = -\beta_1\frac{dairy}{inc^2} \qquad (106)
\]
and the elasticity by

\[
\varepsilon^{invexp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\cdot\frac{inc}{dairy} = \frac{d\log(dairy)}{d\,inc}\,inc = -\beta_1\frac{1}{inc} \qquad (107)
\]
Estimating the model (105) with the data of table 3, we obtain

\[
\widehat{\log(dairy)} = 3.049 - 822.02\,\frac{1}{inc} \qquad R^2 = 0.5040 \qquad (108)
\]

In this case, as in the inverse model, the coefficient $\beta_1$ does not have an economic meaning.
In table 4, the marginal propensity, the expenditure/income elasticity and the $R^2$ of the six fitted models are shown.

Table 4. Marginal propensity, expenditure/income elasticity and $R^2$ in the fitted models (marginal propensities and elasticities evaluated at the sample means).

Model               | Marginal propensity                          | Elasticity                                      | $R^2$
Linear              | $\hat\beta_1 = 0.0053$                       | $\hat\beta_1\,inc/dairy = 0.6505$               | 0.4440
Inverse             | $-\hat\beta_1/inc^2 = 0.0044$                | $-\hat\beta_1/(dairy\cdot inc) = 0.5361$        | 0.4279
Linear-log          | $\hat\beta_1/inc = 0.0052$                   | $\hat\beta_1/dairy = 0.6441$                    | 0.4566
Log-log             | $\hat\beta_1\,dairy/inc = 0.0056$            | $\hat\beta_1 = 0.6864$                          | 0.5188
Log-linear          | $\hat\beta_1\,dairy = 0.0055$                | $\hat\beta_1\,inc = 0.6783$                     | 0.4976
Inverse exponential | $-\hat\beta_1\,dairy/inc^2 = 0.0047$         | $-\hat\beta_1/inc = 0.5815$                     | 0.5038
The $R^2$ obtained in the first three models is not comparable with the $R^2$ obtained in the last three, because the functional form of the regressand is different: y in the first three models and log(y) in the last three.

Comparing the first three models, the best one is the linear-log model, if we use the $R^2$ as the goodness-of-fit measure. Comparing the last three models, the best one is the log-log model. If we had used the Akaike Information Criterion (AIC), which allows comparing models with different functional forms for the regressand, then the log-log model would have been the best among the six fitted models.
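As a check on the case study, the six equations can be re-estimated from the table 3 data with a few lines of plain Python. The sketch below is our own: the `ols` helper and the model labels are ours, and each model is fitted by regressing the appropriate regressand on the appropriate regressor, as described in the text.

```python
import math

# Table 3 data (euros per month, per capita), transcribed from the text
dairy = [8.87, 6.59, 11.46, 15.07, 15.60, 6.71, 10.02, 7.41, 11.52, 7.47,
         6.73, 8.05, 11.03, 10.11, 18.65, 10.30, 15.30, 13.75, 11.49, 6.69,
         16.20, 10.39, 13.50, 8.50, 19.77, 9.69, 7.90, 10.15, 13.82, 13.74,
         4.91, 20.99, 20.06, 18.93, 13.19, 5.86, 7.43, 7.15, 9.10, 15.31]
inc = [1250, 985, 2175, 1025, 1690, 670, 1600, 940, 1730, 640,
       860, 960, 1575, 1230, 2190, 1580, 2300, 1720, 850, 780,
       2100, 1470, 1225, 1380, 2450, 910, 690, 1450, 2275, 1620,
       740, 1125, 1335, 2875, 1680, 870, 1620, 960, 1125, 1875]

def ols(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / sst

log_dairy = [math.log(d) for d in dairy]
results = {
    "linear":  ols(inc, dairy),                              # model (82)
    "inverse": ols([1 / i for i in inc], dairy),             # model (86)
    "lin-log": ols([math.log(i) for i in inc], dairy),       # model (90)
    "log-log": ols([math.log(i) for i in inc], log_dairy),   # model (95)
    "log-lin": ols(inc, log_dairy),                          # model (100)
    "inv-exp": ols([1 / i for i in inc], log_dairy),         # model (105)
}
for name, (b0, b1, r2) in results.items():
    print(f"{name}: b0={b0:.4f}, b1={b1:.6g}, R2={r2:.4f}")
```

The signs come out as the text predicts (negative slopes in the two models that use 1/inc as regressor, positive in the rest), and the $R^2$ values are all moderate, in line with the fitted equations reported above.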