-
Introductory Econometrics
ECON2206/ECON3209
Slides04
Lecturer: Minxian Yang
ie_Slides04 my, School of Economics, UNSW 1
Assignment 1 is due in Week 5. Please submit it
to your tutor at the beginning of your tutorial.
See the Course Outline for more information.
Staple your pages together. Do not submit loose pages.
Do not use plastic sheets or binders.
-
4. Multiple Regression Model: Inference (Ch4)
4. Multiple Regression Model: Inference
Lecture plan
Classical linear model assumptions
Sampling distribution of OLS estimators under CLM
Testing hypotheses about one population parameter
p-values
Confidence intervals
Testing hypotheses with CIs
-
Motivation: y = β0 + β1x1 + ... + βkxk + u
The goal is to gain knowledge about the population parameters (the βj's) in the model.
OLS provides point estimates of the parameters.
OLS gets it right on average (it is unbiased).
Knowing the mean and variance of β̂j is not enough:
how do we decide whether a hypothesis is supported?
what can we say about the true values?
We need the sampling distribution of the OLS
estimators to answer these questions.
To simplify, we use a strong assumption here (and will relax it for large-sample cases).
-
Normality assumption
6. (MLR6, normality) The disturbance u is independent
of all explanatory variables and normally distributed
with mean zero and variance σ²:
u ~ Normal(0, σ²).
This is a very strong assumption. It implies both MLR4 (ZCM) and MLR5 (homoskedasticity).
MLR1-6 together are known as the classical linear model (CLM) assumptions.
Under CLM, OLS produces the minimum-variance unbiased estimators.
They are the best of all unbiased estimators (not just the best of linear unbiased estimators).
-
Normality assumption
CLM implies
y | x ~ Normal(β0 + β1x1 + ... + βkxk, σ²).
CLM also implies β̂j is normally distributed. (A normal distribution is completely characterised by its mean and variance.)
Whether or not MLR6 is a reasonable assumption depends on the data.
Is it reasonable for the wage model, given that no wage can be negative?
Empirically, it is reasonable for the log(wage) model.
MLR6 is restrictive. But the results here will be useful for large-sample cases (Ch5) without MLR6.
-
Sampling distribution of OLS
Theorem 4.1 (normal sampling distribution)
Under CLM, conditional on the independent variables,
β̂j ~ Normal(βj, Var(β̂j)),
where the variance is given in Ch3 (ie_Slides03):
Var(β̂j) = σ² / [SSTj(1 − Rj²)], j = 1, ..., k.
It implies:
(β̂j − βj) / sd(β̂j) ~ Normal(0, 1), where sd(β̂j) = √Var(β̂j).
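Theorem 4.1 can be illustrated by simulation: across repeated samples (with the regressors held fixed), the OLS slope estimates are centred at the true β with the stated variance. A minimal sketch, assuming NumPy is available; the model, sample size and parameter values below are invented for illustration:

```python
import numpy as np

# Simulate the sampling distribution of the OLS slope in
# y = beta0 + beta1*x + u, u ~ Normal(0, sigma^2), with x held fixed
# across replications (i.e. conditional on the regressors).
rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 1.0
n, reps = 50, 5000
x = rng.normal(size=n)                      # fixed design
X = np.column_stack([np.ones(n), x])
slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=sigma, size=n)
    y = beta0 + beta1 * x + u
    bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
    slopes[r] = bhat[1]

# Theorem 4.1 (here k = 1, R_j^2 = 0): slopes ~ Normal(beta1, sigma^2 / SST_x).
sst_x = ((x - x.mean()) ** 2).sum()
print(slopes.mean())                    # close to beta1 = 2.0 (unbiasedness)
print(slopes.var(), sigma**2 / sst_x)   # simulated vs theoretical variance
```

The histogram of `slopes` would be bell-shaped around 2.0, which is exactly what the theorem asserts.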
-
Sampling distribution of OLS
In practice, we have to estimate σ² and Var(β̂j):
V̂ar(β̂j) = σ̂² / [SSTj(1 − Rj²)], j = 1, ..., k.
Theorem 4.2 (t-distribution)
Under CLM, conditional on the independent variables,
(β̂j − βj) / se(β̂j) ~ t_{n−k−1},
where se(β̂j) = √V̂ar(β̂j) (known as the standard error of β̂j) and t_{n−k−1} is the t-distribution with n − k − 1 df.
This is a basis for statistical inference.
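In matrix form the same quantities are σ̂² = SSR/(n − k − 1) and V̂ar(β̂) = σ̂²(X′X)⁻¹, whose j-th diagonal element matches the formula above. A minimal sketch with simulated data, assuming NumPy; all data-generating values are made up for illustration:

```python
import numpy as np

# OLS with standard errors: sigma2_hat = SSR/(n-k-1),
# Var_hat(beta_hat) = sigma2_hat * (X'X)^{-1}.
rng = np.random.default_rng(1)
n, k = 200, 2
x = rng.normal(size=(n, k))
y = 0.5 + 1.0 * x[:, 0] + 0.0 * x[:, 1] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS point estimates
resid = y - X @ beta_hat
ssr = resid @ resid
sigma2_hat = ssr / (n - k - 1)                 # df = n - k - 1
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_ratios = beta_hat / se                       # t-ratios: test H0: beta_j = 0
print(np.column_stack([beta_hat, se, t_ratios]))
```

This reproduces the Coeff / Stderr / t-ratio columns that STATA reports for each regressor.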
-
Testing a simple null hypothesis
Some questions of interest may be formulated as a simple null hypothesis and an alternative
hypothesis about a population parameter,
H0: βj = aj, H1: βj ≠ aj (or βj > aj, or βj < aj),
where aj is a known value (often zero).
eg. In the log wage model
log(wage) = β0 + β1educ + β2exper + β3tenure + u,
H0: β1 = 0 is economically interesting. It says that, holding exper and tenure fixed, a person's education level has no effect on wage.
-
Testing a simple null hypothesis
To test a simple null hypothesis, the test statistic is usually the t-statistic
t_β̂j = (β̂j − aj) / se(β̂j), where se(β̂j) = √V̂ar(β̂j).
We will call the t-statistic the t-ratio when aj = 0. The STATA output includes the t-ratios.
By Theorem 4.2, the t-statistic has the t-distribution with n − k − 1 df under the null H0.
When df or n is large, the t-distribution approaches the standard normal distribution. See Table G.2.
The decision rule depends on the alternative H1.
-
Testing a simple null hypothesis
When H0 is true, we can choose a critical value c such that the probability of the t-statistic "exceeding" c
is a small number, known as the significance level,
eg. P(|t_β̂j| > c) = .05.
c depends on the df and the significance level. (Use normal critical values when df > 120.)
If we reject H0 whenever the t-statistic "exceeds" c, the probability that we make a (Type I) error is small.
Hence, "reject H0 whenever the t-statistic exceeds c" is a reasonable decision rule.
What do we mean by "the t-statistic exceeds c"?
It depends on the alternative hypothesis H1.
-
Testing a simple null hypothesis
Decision rule for H0: βj = aj:
  Alternative H1:   βj < aj (lower tail)   βj > aj (upper tail)   βj ≠ aj (two tail)
  Reject H0 when:   t_β̂j < −c             t_β̂j > c              |t_β̂j| > c
eg. 5% significance level, df = 19:
for a one-tail H1, c = 1.729;
for a two-tail H1, c = 2.093.
(Table G.2 of Wooldridge. The figure shows the t density f(t) with the critical values −c and c marked.)
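The three decision rules in the table can be coded directly. A minimal sketch in pure Python; the critical values for df = 19 are the ones quoted on the slide:

```python
def reject_h0(t_stat: float, c: float, alternative: str) -> bool:
    """Decision rule for H0: beta_j = a_j, given critical value c > 0."""
    if alternative == "lower":      # H1: beta_j < a_j
        return t_stat < -c
    if alternative == "upper":      # H1: beta_j > a_j
        return t_stat > c
    if alternative == "two-sided":  # H1: beta_j != a_j
        return abs(t_stat) > c
    raise ValueError("alternative must be 'lower', 'upper' or 'two-sided'")

# 5% level, df = 19: one-tail c = 1.729, two-tail c = 2.093 (Table G.2).
print(reject_h0(2.0, 1.729, "upper"))      # True
print(reject_h0(2.0, 2.093, "two-sided"))  # False
```

Note that the same t-statistic of 2.0 rejects against a one-tail alternative but not against the two-tail one: the two-tail critical value is larger.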
-
Testing a simple null hypothesis
Example 4.1
log wage model (standard errors are in brackets):
log(wage) = .284 + .092educ + .0041exper + .022tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316
Q. Is the return to education statistically significant at the
1% level, after controlling for experience and tenure?
Hypotheses: H0: β_educ = 0 vs H1: β_educ ≠ 0.
Test statistic and decision rule: reject H0 if |t_educ| > c.
Critical value (large df, normal): c = 2.576.
Conclusion: reject H0 at the 1% level because
|t_educ| = .092/.007 ≈ 13.1 > c.
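A quick check of this two-tail test in pure Python; the estimate and standard error are taken from the slide:

```python
# Two-tailed t test of H0: beta_educ = 0 at the 1% level (Example 4.1).
b_educ, se_educ = 0.092, 0.007
t_educ = b_educ / se_educ       # t-ratio, since a_j = 0
c = 2.576                       # 1% two-tail normal critical value (large df)
print(round(t_educ, 2))         # 13.14
print(abs(t_educ) > c)          # True: reject H0 at the 1% level
```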
-
Testing a simple null hypothesis
Example 4.5. Housing Prices and Air Pollution:
log(price) = 11.08 − .954log(nox) − .134log(dist)
(.32) (.117) (.043)
+ .255rooms − .052stratio
(.019) (.006)
n = 506, R² = .581
(All coefficients are significantly different from 0 at the 5% level.)
Q. Is the price elasticity w.r.t. nox equal to −1?
Hypotheses: H0: β_nox = −1 vs H1: β_nox > −1
Test statistic: t_nox = (β̂_nox − (−1)) / se(β̂_nox)
Decision rule: reject H0 if t_nox > c
5% critical value: c = 1.645
Conclusion: do not reject H0 at the 5% level because
t_nox = (−.954 + 1)/.117 = .393 < c.
After controlling for dist, rooms and stratio, there is little evidence against H0.
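The one-tail counterpart, again with the numbers from the slide:

```python
# One-tailed t test of H0: beta_nox = -1 vs H1: beta_nox > -1 (Example 4.5).
b_nox, se_nox = -0.954, 0.117
a = -1.0                        # hypothesised value a_j
t_nox = (b_nox - a) / se_nox    # t-statistic for H0: beta_nox = a_j
c = 1.645                       # 5% one-tail critical value (large df)
print(round(t_nox, 3))          # 0.393
print(t_nox > c)                # False: do not reject H0
```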
-
Terminology
In the conclusion, we need to be explicit about the hypotheses and the significance level, eg,
"H0 is (is not) rejected in favour of H1 at the 5% level of significance."
In the case of H0: βj = 0 and H1: βj ≠ 0, we say
"xj is (is not) statistically significant at the 5% level of significance", or
"xj is (is not) statistically different from 0 at the 5% level of significance."
-
p-values
The choice of significance level is somewhat arbitrary. Different researchers may make different choices
(we tend to use higher levels with small samples).
It is more informative to measure the strength of the data evidence regarding H0. The p-value serves this role.
The p-value is the probability, under the null distribution, beyond the observed test statistic:
p-value = P(|T| > |t_β̂j|), where T ~ t_{n−k−1},
reported routinely by software.
The smaller the p-value, the stronger the evidence against H0.
(The figure shows the t density f(t) with the tail areas beyond the observed t shaded.)
-
Economic/statistical significance
An explanatory variable is statistically significant when the size of its t-ratio t_β̂j is sufficiently large
(beyond the critical value c).
An explanatory variable is economically (or practically) significant when the size of its estimated coefficient β̂j
is sufficiently large (in comparison to the size of y).
An important x should be both statistically and economically significant.
Over-emphasising statistical significance may lead to false conclusions about the importance of an x.
See the guidelines on p137, and Examples 4.6-4.7.
-
Confidence intervals
The confidence interval (CI) for βj is based on Theorem 4.2, namely,
(β̂j − βj) / se(β̂j) ~ t_{n−k−1},
which directly leads to the CI
β̂j ± c·se(β̂j) = [β̂j − c·se(β̂j), β̂j + c·se(β̂j)] = [L, U].
Here, c is the t_{n−k−1} critical value, which depends on the level of confidence. The probability that [L, U] covers the parameter is the level of confidence.
Example 4.5. The 95% CI for β_nox:
n − k − 1 = 501, c = 1.96 (large sample, use normal cv),
−.954 ± 1.96(.117) = [−1.183, −.725].
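The CI calculation for Example 4.5 in pure Python, using the slide's estimate, standard error and large-sample critical value:

```python
# 95% CI for beta_nox (Example 4.5): beta_hat +/- c * se(beta_hat).
b_nox, se_nox, c = -0.954, 0.117, 1.96   # c: large-sample normal critical value
lower = b_nox - c * se_nox
upper = b_nox + c * se_nox
print(round(lower, 3), round(upper, 3))  # -1.183 -0.725
print(lower <= -1.0 <= upper)            # True: -1 lies inside the CI
```

That −1 falls inside the 95% CI anticipates the CI/two-tailed-test duality discussed two slides below.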
-
Confidence intervals
To construct a CI, we need se(β̂j) and c. For c, we need the df and the confidence level.
When df is large (> 120), the t_{n−k−1} distribution is very close to the normal distribution and we use N(0,1)
critical values.
eg. For large df, the 95% CI is about β̂j ± 2·se(β̂j).
The interpretation of the 95% CI:
If many random samples are drawn and [L, U] is
computed for each sample, then 95% of these [L, U]
will cover the true population parameter βj.
-
Confidence intervals
The width of the CI depends on the standard error se(β̂j) and the critical value c:
high confidence level → large c → wide CI;
large standard error → wide CI.
CI and two-tailed test:
to test H0: βj = aj against H1: βj ≠ aj,
reject H0 at the 5% significance level if (and only if) the 95% CI does not contain aj.
Example 4.5
95% CI = [−1.183, −0.725].
Do not reject H0: β_nox = −1 in favour of H1: β_nox ≠ −1 at the 5% significance level.
-
Summary so far
CLM assumptions
Sampling distribution of OLS estimators
Standard error, t-statistic, t-distribution and df
Testing hypotheses about a single βj
p-values
Confidence interval for a single βj
CI and two-tailed test
To do: the F test
hypotheses about a linear combination of parameters
multiple linear restrictions on parameters
STATA
-
Hypotheses about a linear combination of parameters
In the log wage model
log(wage) = β0 + β1educ + β2exper + u,
we wish to see whether or not educ has the same
causal effect on log(wage) as exper, ie, to test
H0: β1 − β2 = 0 vs H1: β1 − β2 ≠ 0,
which involves a combination of 2 parameters.
If we had se(β̂1 − β̂2), we could use
t = (β̂1 − β̂2) / se(β̂1 − β̂2),
where se(β̂1 − β̂2) = [var(β̂1) + var(β̂2) − 2cov(β̂1, β̂2)]^{1/2}.
But se(β̂1 − β̂2) is not usually reported by software.
-
Hypotheses about a linear combination of parameters
We re-parameterise the log wage model as
log(wage) = β0 + θeduc + β2(exper + educ) + u,
where θ = β1 − β2 and β1 is replaced by θ + β2.
The hypotheses become
H0: θ = 0 vs H1: θ ≠ 0,
which can easily be tested by regressing log(wage)
on educ and (exper + educ).
The main idea here is to isolate the parameter of interest θ = β1 − β2 by re-parameterisation. The OLS output provides both θ̂ and se(θ̂).
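The re-parameterisation can be verified numerically: the coefficient on educ in the regression on educ and (exper + educ) equals β̂1 − β̂2 from the original regression, as an exact algebraic identity (both design matrices span the same column space). A sketch with simulated data, assuming NumPy; the data-generating values are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
educ = rng.normal(12, 2, n)
exper = rng.normal(10, 3, n)
y = 0.5 + 0.08 * educ + 0.02 * exper + rng.normal(scale=0.4, size=n)

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Original parameterisation: y = b0 + b1*educ + b2*exper + u
X1 = np.column_stack([np.ones(n), educ, exper])
b = ols(X1, y)

# Re-parameterisation: y = b0 + theta*educ + b2*(exper + educ) + u
X2 = np.column_stack([np.ones(n), educ, exper + educ])
g = ols(X2, y)

print(g[1], b[1] - b[2])   # theta_hat equals beta1_hat - beta2_hat
print(g[2], b[2])          # coefficient on (exper+educ) equals beta2_hat
```

In software the payoff is that the standard error of θ̂ is reported directly by the second regression, so the covariance term never has to be computed by hand.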
-
Hypotheses about a linear combination of parameters
eg. log wage model (standard errors are in brackets):
log(wage) = .284 + .0920educ + .0041exper + .022tenure
(.104) (.0073) (.0017) (.003)
n = 526, R² = .316
Hypotheses: H0: β_educ − β_exper = 0 vs H1: β_educ − β_exper ≠ 0.
Re-parameterised model with exed = exper + educ:
log(wage) = .284 + .0879educ + .0041exed + .022tenure
(.104) (.0070) (.0017) (.003)
n = 526, R² = .316
Hypotheses: H0: θ = 0 vs H1: θ ≠ 0.
Test statistic t = .0879/.0070 = 12.59.
-
Hypotheses about a linear combination of parameters
How do you test H0: β_educ − 2β_exper = 0 by re-parameterisation?
(The slide shows STATA output, including the t-ratios and p-values.)
-
Testing multiple linear restrictions: the F test
Exclusion restrictions
It is of interest to check whether or not a group of x variables has a joint effect on y (with the rest of the x
variables as controls).
This question is formulated as the null (H0) that all coefficients of the group are zero, called exclusion
restrictions. The model under H0 is known as the
restricted model.
The alternative (H1) is simply that the null is false. The model under H1 is known as the unrestricted model.
In general, multiple linear restrictions involve more than one linear restriction on the parameters.
-
Testing multiple linear restrictions: the F test
Example
Child birth weight and parents' education:
bwght = β0 + β1cigs + β2parity + β3faminc
+ β4motheduc + β5fatheduc + u
bwght : birth weight
cigs : average cigarettes per day by the mother
parity : birth order
faminc : family income
motheduc : years of education for the mother
fatheduc : years of education for the father
H0: β4 = 0 and β5 = 0, vs H1: H0 is false.
We need the F statistic to test the joint hypotheses.
-
Testing multiple linear restrictions: the F test
Example
Unrestricted model (ur):
bwght = β0 + β1cigs + β2parity + β3faminc
+ β4motheduc + β5fatheduc + u,
with sum of squared residuals SSRur.
Restricted model (r):
bwght = β0 + β1cigs + β2parity + β3faminc + u(r),
with sum of squared residuals SSRr.
If SSRr is much greater than SSRur, we reject the
restrictions (ie, H0).
But how much greater is "much greater"?
-
Testing multiple linear restrictions: the F test
Example
The F statistic is the relative difference between
SSRr and SSRur:
F = [(SSRr − SSRur)/2] / [SSRur/(n − 6)] ~ F_{2,n−6},
where 2 is the number of restrictions and 6 is the number of coefficients in the unrestricted model.
Equivalently, in terms of the R-squareds,
F = [(R²ur − R²r)/2] / [(1 − R²ur)/(n − 6)].
Under H0, F has the F-distribution
with (2, n − 6) degrees of freedom.
Decision rule: reject H0 if F > c,
where c is the F_{2,n−6} critical value.
-
Testing multiple linear restrictions: the F test
Example
Use the data in BWGHT.RAW: n = 1191,
SSRr = 465166.792 and SSRur = 464041.135.
F = [(SSRr − SSRur)/2] / [SSRur/(n − 6)] ≈ 1.44.
The 5% F_{2,n−6} critical value is c = 3.00
(see Table G.3b).
According to the decision rule, H0 is not rejected at
the 5% level because F < c.
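Plugging the slide's SSRs into the F formula, in pure Python:

```python
# F statistic from restricted and unrestricted SSRs (BWGHT example).
ssr_r, ssr_ur = 465166.792, 464041.135
n, q, n_coef = 1191, 2, 6          # q restrictions; 6 coefficients in ur model
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - n_coef))
print(round(F, 2))                 # about 1.44
print(F > 3.00)                    # False: do not reject H0 at the 5% level
```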
-
Testing multiple linear restrictions: the F test
Example
(The slide shows STATA regression output, which reports the SSR and the overall significance test (see p34). You can use the STATA test command to do the test.)
-
Testing multiple linear restrictions: the F test
General case with CLM assumptions
The unrestricted model:
y = β0 + β1x1 + ... + βkxk + u.
There are q restrictions:
H0: β_{k−q+1} = 0, ..., β_k = 0,
which lead to the restricted model:
y = β0 + β1x1 + ... + β_{k−q}x_{k−q} + u(r).
Test statistic:
F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ~ F_{q,n−k−1} under H0.
Decision rule: reject H0 if F > c (the F_{q,n−k−1} critical value).
-
Testing multiple linear restrictions: the F test
General case with CLM assumptions
If H0 is rejected, we say that x_{k−q+1}, ..., x_k are jointly statistically significant.
If H0 is not rejected, we say that x_{k−q+1}, ..., x_k are jointly statistically insignificant, which justifies dropping them
from the model.
It is possible that a group of variables is jointly significant but individually insignificant. This is a
symptom of a group of highly correlated variables: when x variables are highly correlated, it is hard to precisely estimate their individual partial effects.
The p-value for the F test is the probability of the F-distribution beyond the observed F-stat:
p-value = P(F_{q,n−k−1} > F).
-
Testing multiple linear restrictions: the F test
F and t statistics
When q = 1, H0 can be tested
with either the t-stat
or the F-stat.
It turns out that
(t-stat)² = F-stat
in the case q = 1.
(The slide shows the F distribution density.)
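The (t-stat)² = F-stat identity for q = 1 can be checked numerically by running both tests on the same data. A sketch with simulated data, assuming NumPy; the data-generating values are invented:

```python
import numpy as np

# Check (t-stat)^2 = F-stat when q = 1, on simulated data.
rng = np.random.default_rng(3)
n = 120
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

X_ur = np.column_stack([np.ones(n), x1, x2])   # unrestricted model
X_r = np.column_stack([np.ones(n), x1])        # restricted: beta2 = 0

def fit_ssr(X, y):
    """Return (SSR, OLS coefficients)."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e, b

ssr_ur, b_ur = fit_ssr(X_ur, y)
ssr_r, _ = fit_ssr(X_r, y)

k = 2
F = ((ssr_r - ssr_ur) / 1) / (ssr_ur / (n - k - 1))   # F-stat, q = 1

sigma2 = ssr_ur / (n - k - 1)
se_b2 = np.sqrt(sigma2 * np.linalg.inv(X_ur.T @ X_ur)[2, 2])
t = b_ur[2] / se_b2                                   # t-ratio on x2

print(t**2, F)   # numerically equal
```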
-
Testing multiple linear restrictions: the F test
The overall significance of a regression
When q = k, the null H0: β1 = 0, ..., βk = 0 is routinely tested by most regression packages, known as the F
test for overall significance.
The null is that none of the explanatory variables has an effect on y. The restricted model is simply
y = β0 + u.
The F-stat under the null has an F_{k,n−k−1} distribution. As the R-squared of the restricted model is zero under the null, this F-stat is
F = (R²/k) / [(1 − R²)/(n − k − 1)],
where R² is from the unrestricted model.
-
Testing multiple linear restrictions: the F test
Testing general linear restrictions
Based on the restricted and unrestricted regression SSRs, the F test is applicable to testing any linear
restrictions on the parameters. For example,
H0: β1 = 1, β2 = 0, β3 = 2β4.
We only need the SSRr from the restricted model (H0), which may involve re-parameterisation, and the SSRur from
the unrestricted model.
eg. House price (4.47):
log(price) = β0 + β1log(assess) + β2log(lotsize)
+ β3log(sqrft) + β4bdrms + u.
If the assessment is a rational valuation,
H0: β1 = 1, β2 = 0, β3 = 0, β4 = 0 should hold.
-
Reporting regression results
Good practice (minimum):
Report estimated coefficients AND standard errors
Report the mean of the dependent variable
Report the sample size
Report R-squared and SSR
Report in equation form if the number of equations is small
Report in table form and indicate the dependent variable
eg. log wage model
Dependent variable: log(WAGE), mean = 1.623
Model 1 Model 2
Variable Coeff Stderr Coeff Stderr
CONSTANT 0.2844 0.1042 0.5014 0.1019
EDUC 0.0920 0.0073 0.0875 0.0069
EXPER 0.0041 0.0017 0.0046 0.0016
TENURE 0.0221 0.0031 0.0174 0.0030
FEMALE -0.3012 0.0373
Sample size 526 526
R-squared 0.316 0.392
SSR 101.460 90.144
-
Summary
The methods covered allow us to infer knowledge about
population parameters from a random sample.
CLM assumptions → OLS estimators follow normal distributions → t-stat and F-stat follow t and F-distributions.
To test hypotheses: choose a (small) level of significance, find the critical value, compute the test statistic, and use the
decision rule (which depends on the alternative) to draw a
conclusion.
To construct a CI: choose a (large) level of confidence, find the critical value and standard error, and use β̂j ± c·se(β̂j).
In practice, both the statistical and economic significance of your results need to be commented on.
-
What we are able to do now
We covered statistical inference methods, based on the
sampling distribution of the OLS estimators under the CLM
assumptions.
We are able to test hypotheses about a parameter.
We are able to construct confidence intervals for parameters.
We are able to test single or multiple restrictions on parameters.
Under the CLM assumptions, the above inference methods are exact, in the sense that we know the exact
distribution of the t-stat or F-stat, regardless of the sample
size.
But MLR6 in the CLM assumptions is too strong....