q201_lec1_2

QUAN201 Introductory Econometrics Dean Hyslop Lecture 1 (Ref: Wooldridge, Chapter 1)

1. What is Econometrics?

It is the statistical or empirical analysis of economic relationships.

This may involve:

Using economic theory to guide statistical modelling

Using statistical methods to:

1. Estimate economic relationships

E.g. the effect of education on wages

2. Test economic theories

E.g. does raising minimum wage reduce employment?

3. Evaluate / implement govt / business policy

E.g. how effective are job training programmes at improving employment and wages for the low-skilled?

4. Forecast / predict economic variables

E.g. GDP growth, inflation, interest rates, etc.

2. How does Econometrics differ from (Classical) Statistics?

Classical Statistics generally deals in the context of experimental data collected in laboratory environments in the natural sciences:

So the effects of control/treatment variables on outcome/response variables can be examined directly by changing the level of the control variable(s), holding all-else equal, and measuring the effect on the outcome variable(s).

In contrast, most economic (and other social science) data is generally non-experimental in nature:

So there is often no reason to believe that, when two observations differ in the value of a control variable, all else is the same between them.

E.g. can we assume that 2 randomly selected people with different education levels are the same in other respects?

Therefore, many non-experimental issues need to be addressed in econometrics in order to confidently infer causality between a control variable and an outcome variable.

This makes econometrics both difficult and interesting!

3. Stages of Econometric Analysis

1. Careful formulation of the question of interest

E.g. what is the effect of an additional year of education on wages?

2. Either use economic theory or an informal/intuitive approach to develop an economic framework for estimation & testing

E.g. Mincer's (1962) Human Capital Theory

3. Translate the economic model into an econometric model to be estimated statistically. This requires:

– the functional form of the relationship(s) to be estimated between the observed variables of interest;

– assumptions on the effects of unobserved factors.

E.g. linear relationship between log(wages) and years of education, and assume other factors that affect wages are uncorrelated with education

4. Formulate hypotheses of interest in terms of the (unknown) model parameters.

E.g. H0: returns to education = 0 vs H1: not so

Empirical analysis requires data!

5. Given data, proceed to estimation, hypothesis testing and general model-specification evaluation.

Note: Generally “econometrics” begins at stage 3, the econometric specification of the model …
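
As an illustration, stages 3 and 4 for the returns-to-education example might be written as below. This is only a sketch: the variable names wage and educ are assumptions, and the symbols a, b and e are borrowed from the regression notation of Lecture 2. Under this form, b is approximately the proportional change in wages from one more year of education.

```latex
\begin{align*}
\log(\text{wage}_i) &= a + b\,\text{educ}_i + e_i,
  \qquad \mathrm{cov}(\text{educ}_i, e_i) = 0 \\
H_0 &: b = 0 \quad \text{vs} \quad H_1 : b \neq 0
\end{align*}
```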

QUAN201 Introductory Econometrics Dean Hyslop Lecture 2 (Ref: Wooldridge, Chapter 2)

The starting point for any statistical analysis should be a description of the data to:

i. understand what the data are supposed to measure

ii. check whether there are values that look like they were mis-measured – e.g. outliers

This can be done using:

i. simple graphical methods – e.g. histograms or scatterplots, etc; and/or

ii. common descriptive statistics – e.g. mean, median, standard deviation, etc.
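
A descriptive check of this kind can be sketched in a few lines of Python. The wage figures below are made up for illustration, the helper names `describe` and `flag_outliers` are hypothetical, and the 1.5 × interquartile-range rule is just one common convention for flagging possible outliers, not a method prescribed by the lecture.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (N-1 divisor)
        "min": min(values),
        "max": max(values),
    }

def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical hourly wages ($/hour); the last value looks mis-measured.
wages = [12.5, 14.0, 15.5, 16.0, 18.0, 19.5, 21.0, 150.0]

print(describe(wages))
print(flag_outliers(wages))
```

A flagged value is only a prompt to look more closely at the data; whether it is a recording error or a genuine observation is a judgement call.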

After some such descriptive analysis, regression is the most important building block in econometrics …

1. What is Regression?

… possible answers:

i. Fitting a line through data – e.g. scatterplot of two variables X and Y.

ii. Estimating the relationship between two or more variables – e.g. how are variables X and Y related?

iii. Ultimately, used as a basis for causal interpretation in econometrics – e.g. to answer the question: what is the effect of a change in variable-X on outcome-Y?

2. Regression as line fitting

Consider the following scatterplot of data for two variables X and Y :

Our interest is to fit a (simple) linear regression, represented by

$$Y_i = a + bX_i + e_i,$$

to the (Xi,Yi) scatterplot of data, where:

Yi is called the outcome or dependent variable,

Xi is the explanatory or independent variable,

ei is the residual (or error),

(a,b) are regression parameters or coefficients: a is called the intercept, and b is called the slope, and

the subscript i denotes observation-i.

Three classic Examples:

1. Galton (1886, “Regression towards Mediocrity …”):

Xi = father-i's height (in metres),

Yi = son-i's height (in metres)

What is the intergenerational relationship within families, as measured by fathers' and sons' heights?

2. Returns to education (e.g. Mincer, 1962):

Xi = person-i's schooling level (years of education),

Yi = person-i's wage ($/hour worked)

What is the average effect of an additional year of education on a worker's wage?

3. Phillips-curve (Phillips, 1958, Economica):

Xt = year-t unemployment rate (percent unemployed),

Yt = year-t wage inflation rate (percent)

What is the relationship between unemployment and wage inflation?

Consider three possible lines fit to these data, labelled A, B, and C as follows.

First, consider line-A

Let $\hat{Y}_i = a + bX_i$ be the fitted (or predicted) value of Y, given the value Xi – i.e. the point on the fitted line corresponding to Xi.

Then the residual is the difference between the actual and predicted Y-values: i.e.

$$e_i = Y_i - \hat{Y}_i = Y_i - (a + bX_i).$$

Intuitively, line-A doesn't fit the data – i.e. it doesn't go through the scatterplot!

More formally, the residuals (ei) are all negative … so the average residual is negative:

$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i < 0.$$

This suggests:

Property 1: one desirable property of a regression line fit is that the average residual is 0: i.e.

$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i = 0.$$
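
Property 1 can be checked numerically for any candidate line. The sketch below uses made-up data and hypothetical helper names (`mean`, `residuals`); the line with a = 5 and b = 2 is chosen only because it lies above every data point, which mimics the situation described for line-A.

```python
def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)

def residuals(x, y, a, b):
    """Residuals e_i = Y_i - (a + b*X_i) for the candidate line (a, b)."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.0]

# A candidate line that lies above every data point (like line-A).
e_A = residuals(X, Y, a=5.0, b=2.0)

print(all(e < 0 for e in e_A))  # every residual is negative
print(mean(e_A))                # so the average residual is negative as well
```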

Next, consider line-B

Line-B satisfies the zero average residual condition … but still doesn't look like a good fit, because there are mostly negative residuals for low X's and positive residuals for high X's.

More formally, the problem is that the residuals are correlated with the Xi's, whereas for a good fit their covariance should be zero:

$$\mathrm{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N}(e_i - \bar{e})(X_i - \bar{X}) = 0.$$

This suggests:

Property 2: a second desirable property of a regression line fit is that the residuals are uncorrelated with the X's.

Note: if $\bar{e} = 0$, then this implies:

$$\mathrm{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N} e_i X_i = 0.$$
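
A line can satisfy Property 1 and still fail Property 2. As a rough illustration with made-up data, the sketch below takes a horizontal line through the average of Y (an assumed stand-in for line-B): its residuals average to zero, but they are negative for low X and positive for high X, so their covariance with X is positive. The helper names `mean` and `covariance` are hypothetical.

```python
def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)

def covariance(u, v):
    """Covariance with a 1/N divisor: (1/N) * sum((u_i - u_bar) * (v_i - v_bar))."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / len(u)

# Hypothetical data, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.0]

# Horizontal line through the average of Y: a = Y-bar, b = 0.
a, b = mean(Y), 0.0
e_B = [yi - (a + b * xi) for xi, yi in zip(X, Y)]

print(mean(e_B))           # approximately 0: Property 1 holds
print(covariance(e_B, X))  # clearly positive: Property 2 fails
```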

Finally, consider line-C

Line-C looks like a good fitting line, and satisfies both properties 1 and 2.

This very intuitive sense of a good fitting regression line is based on a 'method of moments' approach …

It turns out that this approach leads to (essentially) the most commonly used estimators in regression analysis.

These are generally referred to as the Ordinary Least Squares (OLS) estimators … the OLS name comes from an alternative approach to the one intuited here.

But, let's see what this 'method of moments' approach implies about the regression coefficient estimates …

3. Summary / Implications

This very intuitive discussion of fitting a good line to a scatterplot relied on three aspects:

1. The assumed functional form of the relationship between Y and X – i.e. it is linear;

2. The resulting residuals should have zero average; and

3. The residuals should be uncorrelated with the Xi's.

To estimate the coefficients (a & b) of the “good-fitting” line, we use these three points.

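Property 1: Zero average residual

$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \hat{Y}_i) = 0 \;\Rightarrow\; \bar{Y} = \bar{\hat{Y}}$$

– i.e. avg actual-Y = avg predicted-Y.

And, using the linear functional form assumption,

$$\frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i - (a + bX_i)\bigr) = 0,$$

which implies

$$\bar{Y} = a + b\bar{X},$$

and solving for a gives:

$$a = \bar{Y} - b\bar{X}$$

i.e. the intercept = avg-Y – b*avg-X.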

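Property 2: Zero correlation between residuals and X's

$$\mathrm{cov}(e_i, X_i) = \frac{1}{N-1}\sum_{i=1}^{N} e_i X_i = 0$$

(whether the divisor is $N$ or $N-1$ makes no difference here, since the sum is set to zero).

Substituting $e_i = Y_i - a - bX_i$ with $a = \bar{Y} - b\bar{X}$ gives

$$\sum_{i=1}^{N}\bigl(Y_i - \bar{Y} + b\bar{X} - bX_i\bigr)X_i = 0$$

$$\sum_{i=1}^{N}(Y_i - \bar{Y})X_i - b\sum_{i=1}^{N}(X_i - \bar{X})X_i = 0$$

Solving for b gives

$$b = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})X_i}{\sum_{i=1}^{N}(X_i - \bar{X})X_i}.$$

Since $\sum_{i=1}^{N}(Y_i - \bar{Y})\bar{X} = 0 = \sum_{i=1}^{N}(X_i - \bar{X})\bar{X}$, we can rewrite this to solve for b,

$$b = \frac{\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X})}{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(X_i - \bar{X})},$$

which is simply

$$b = \frac{\mathrm{Cov}(X_i, Y_i)}{\mathrm{Var}(X_i)}$$

i.e. the slope parameter is the covariance between Xi and Yi divided (i.e. normalised) by the variance of Xi.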
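
The two formulas, slope = Cov(X, Y) / Var(X) and intercept = avg-Y – b*avg-X, can be applied directly. The following is a minimal sketch on made-up data, with hypothetical helper names (`mean`, `fit_line`); it assumes the X values are not all identical, so that the variance of X is non-zero. The divisor (N or N–1) cancels in the ratio, so only the sums are computed.

```python
def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)

def fit_line(x, y):
    """Method-of-moments estimates: b = Cov(X, Y) / Var(X), a = Y-bar - b * X-bar."""
    x_bar, y_bar = mean(x), mean(y)
    # The 1/N (or 1/(N-1)) factor cancels between numerator and denominator.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(X, Y)
e = [yi - (a + b * xi) for xi, yi in zip(X, Y)]

print(a, b)                                   # estimated intercept and slope
print(mean(e))                                # approximately 0 (Property 1)
print(sum(ei * xi for ei, xi in zip(e, X)))   # approximately 0 (Property 2)
```

Both printed checks come out at zero up to floating-point rounding, which is what imposing the two properties on the fitted line should deliver.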