introduction to econometrics€¦ · the dataset contains data on test performance, school...

1

Introduction to Econometrics The statistical analysis of economic (and related) data

STATS301

Info Session (2015/2016)

Titulaire: Christopher Bruffaerts

Assistant: Lorenzo Ricci

2

Contacts Titulaire : Christopher Bruffaerts

E-mail : [email protected]

Assistant : Lorenzo Ricci

Office: H.4.152

Office Hour: Monday 14h-16h (or by appointment)

E-mail : [email protected]

Etudiants-Assistants (Help for the exercices sessions):

Lina Frank

Hélène Seynaeve please use the email wisely

3

Organisation of the Course

STAT-S-301: Introduction to Econometrics

5 ECTS

3 parts:

• Theory (Christopher Bruffaerts)

• Theoretical exercises (Lorenzo Ricci)

• Practical work on computer (Lorenzo Ricci)

Website : http://metricssbsem.wordpress.com/ Other material in : http://beta.respublicae.be (managed by students!)

4

Organisation of the Course - Schedule 1 - Theory

Friday 8-10 See Gehol

Saturday (2 times) 9-12 See Gehol

2 - Theoretical exercises (All Group) Thursday 10-12 See Gehol

The goal of “theoretical exercises” is to help to solve the exercises at the end of each chapter. You will find the solution of all exercises on Respublicae (student Forum). For any question regarding exercises and computer session, please contact Lorenzo Ricci.

5

Date Subject (theory & exercises) September 18 Friday Introduction and Review of Statistics

25 Friday Review of Statistics

26 Saturday Linear Regression October 1 Thursday Review of Statistics October 2 Friday Multiple regression

3 Saturday Multiple regression/ Model evaluation

8 Thursday Linear Regression

9 Friday Nonlinear Regression

15 Thursday Multiple Regression

16 Friday Panel Data

22 Thursday Nonlinear Regression /Model evaluation

23 Friday Instrumental Variable

29 Thursday Panel Data November 5 Thursday Instrumental Variables

13 Friday Instrumental Variable

19 Thursday Instrumental Variable

26 Thursday Instrumental Variables

27 Friday Time series December 3 Thursday Time Series Regression

4 Friday Quasi-experiment/Summary

10 Thursday Quasi-experiment/Summary

6

Computer Sessions (by group):

September 28,29 STATA Introduction and Review of Statistics

October 5,6 Linear Regression/Multiple Regression

October 19,20 Nonlinear Regression and Panel Data

November 30 Instrumental Variable and Time Series

December 2 Regression

• You have to register for the Computer sessions next week

(~ next Tuesday)

• There is an announcement on monulb

• Registration will be held in the office H 3.159. (Nicole Vanderroost’s Office).

7

Organisation of the Course – Final Exam

Structure of the exam:

1. multiple choice questions

2. open question on theoretical exercise

3. open questions on practical exercise

Option 1: Final exam

Option 2: Problem sets + Mid-term exam + Final exam November 6

• We will not give you copies of the old exams • however, there is a 2010 final exam and all the mid-terms on Respublicae

8

Organisation of the Course – Final grade

1) Final Exam (100% of the Grade)

2) The weighted average between i. Problem set (25%),

ii. Mid-term exam (25%), iii. Final Exam (50%).

Important: The final grade will be the best between the two options. Suggestion: Take the mid-term exam and work regularly on the personal work. Statistical evidence based on last years : On average the grade of those that take option 2 is 2 points higher

9

Organisation of the Course – Problem sets

• Written Reports

• STATA is the statistical software used for this Course.

• Reports should be Team work

NO FREE RIDER WITHIN THE TEAM IS ALLOWED

- You can work in groups (maximum 4 people) - but each student must participate actively on the report - More details on the Problem set will be given during the first

computer session

10

Reports and Waivers

• The exam will be at the beginning of January.

• The course is considered successful if the total score is greater than or equal to 10/20.

• Students who are repeating the year are exempted from the course if and only if they have obtained at least 10/20 before.

11

References

J. H. Stock and M. W. Watson, Introduction to Econometrics (third edition), Pearson Education Limited, 2012

Chapter Subject 2-3 Review of Statistics 4-5 Linear Regression 6-7 Multiple Regression 8 Nonlinear Regression 9 Model evaluation

10 Panel Data 12 Instrumental Variable 13 Quasi-experiment 14 Time Series Regression

• I strongly recommend you to buy the book

• Study the book, do not limit your study to reading the slides!!!

12

Brief Overview of the Course Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects. Meaning of econometrics Econometrics is the application of mathematics, statistical methods, and computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations (source : Wikipedia). Econometrics = « Economy » + « metrics »

13

Some questions we might ask ourselves • What is the quantitative effect of reducing class size on

student achievement? • What is the price elasticity of cigarettes?

• What is the effect on output growth of a 1 percentage

point increase in interest rates by the ECB? • How does another year of education change earnings?

14

This course is about using data to measure causal effects • Ideally, we would like an experiment • But almost always we only have observational data

o returns to education o cigarette prices o monetary policy

• Most of the course deals with difficulties arising from using observational to estimate causal effects :

o confounding effects (omitted factors) o simultaneous causality

Remember: “correlation does not imply causation”

15

In this course you will: • Learn methods for estimating causal effects using

observational data

• Learn some tools that can be used for other purposes, for example forecasting using time series data

• Learn to evaluate the regression analysis of others – this

means you will be able to read/understand empirical economics papers in other econ courses

• Get some hands-on experience with regression analysis in your problem sets.

16

A first example of an empirical problem Class size and educational output

Policy question:

What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? by 8 students/class?

Answer : We must use data to find out (is there any way to answer this without data?)

17

The California Test Score Data Set Data: California school districts (n = 420) The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.

Variables:

1. 5th grade test scores (Stanford-9 achievement test, combined math and reading), district average

2. Student-teacher ratio (STR) = number of students in the district divided by number of full-time equivalent teachers

18

Initial look at the data: univariate

This table does not tell us anything about the relationship between test scores (TS) and the Student Teacher Ratio (STR).

19

The California Test Score Data Set

20

Initial look at the data: bivariate Scatterplot of test score versus student-teacher ratio

Do districts with smaller classes have higher test scores?

21

The empirical strategy

We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?

1. Compare average test scores in districts with low STRs to those with high STRs (“estimation”)

2. Test the hypothesis that the mean test scores in the two types of districts are the same, against the hypothesis that they differ (“hypothesis testing”)

3. Estimate an interval for the difference in the mean test scores, high versus low STR districts (“confidence interval”)

22

Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:

Class Size

Average score (Y )

Standard deviation (sY)

n

Small 657.4 19.4 238 Large 650.0 17.9 182

Δ = µsmall – µlarge = E(TS|STR < 20) – E(TS|STR ≥ 20) 1. Estimation of Δ = difference between group means 2. Test the hypothesis that Δ = 0 3. Construct a confidence interval for Δ

23

1. Estimation

small largeY Y− = small

1small

1 n

iiY

n =∑ –

large

1large

1 n

iiY

n =∑

= 657.4 – 650.0 = 7.4

Is this a large difference in a real-world sense? • Standard deviation across districts = 19.1 • Difference between 60th and 75th percentiles of test score

distribution is 667.6 – 659.4 = 8.2 • This is a big enough difference to be important for school

reform discussions, for parents, or for a school committee?

24

2. Hypothesis testing Formulating hypothesis

H0: µsmall – µlarge = 0 H1: µsmall – µlarge ≠ 0

Comparison of means for 2 populations (test at level α) Statistic: compute the t-statistic,

2 2 ( )s l

s l

s l s l

s s s ln n

Y Y Y YtSE Y Y

− −= =

−+

where SE( sY – lY ) is the “standard error” of sY – lY , the subscripts s and l refer to “small” and “large” STR districts,

and 2 2

1

1 ( )1

sn

s i sis

s Y Yn =

= −− ∑

25

2. Hypothesis testing Compute the difference-of-means t-statistic:

Size Y sY n small 657.4 19.4 238 large 650.0 17.9 182

2 2 2 219.4 17.9238 182

657.4 650.0 7.41.83s l

s l

s l

s sn n

Y Yt − −= = =

+ + = 4.05

Distribution under H0 : t ≈ N(0,1) (for n1,n2 > 30) Decision rule : |t| > 1.96, so reject (at the 5% significance level) the null hypothesis that the two means are the same.

26

3.Confidence interval A 95% confidence interval for the difference between the means is,

CI 95%(µsmall – µlarge) = [ ( sY – lY ) ± 1.96 × SE( sY – lY ) ] = [7.4 ± 1.96 × 1.83] = [3.8, 11.0]

Two equivalent statements: 1. The 95% confidence interval for Δ does not include 0 2. The hypothesis that Δ = µsmall – µlarge = 0 is rejected at the

5% level

27

What comes next… • The mechanics of estimation, hypothesis testing, and

confidence intervals should be familiar • These concepts extend directly to regression and its

variants • Before turning to regression, however, we will review

some of the underlying theory of estimation, hypothesis testing, and confidence intervals: o why do these procedures work, and why use these

rather than others? o we will review the intellectual foundations of statistics

and econometrics

introduction to econometrics€¦ · the dataset contains data on test performance, school...

Documents