introduction to econometrics€¦ · the dataset contains data on test performance, school...
TRANSCRIPT
1
Introduction to Econometrics The statistical analysis of economic (and related) data
STATS301
Info Session (2015/2016)
Titulaire: Christopher Bruffaerts
Assistant: Lorenzo Ricci
2
Contacts Titulaire : Christopher Bruffaerts
E-mail : [email protected]
Assistant : Lorenzo Ricci
Office: H.4.152
Office Hour: Monday 14h-16h (or by appointment)
E-mail : [email protected]
Etudiants-Assistants (Help for the exercices sessions):
Lina Frank
Hélène Seynaeve please use the email wisely
3
Organisation of the Course
STAT-S-301: Introduction to Econometrics
5 ECTS
3 parts:
• Theory (Christopher Bruffaerts)
• Theoretical exercises (Lorenzo Ricci)
• Practical work on computer (Lorenzo Ricci)
Website : http://metricssbsem.wordpress.com/ Other material in : http://beta.respublicae.be (managed by students!)
4
Organisation of the Course - Schedule 1 - Theory
Friday 8-10 See Gehol
Saturday (2 times) 9-12 See Gehol
2 - Theoretical exercises (All Group) Thursday 10-12 See Gehol
The goal of “theoretical exercises” is to help to solve the exercises at the end of each chapter. You will find the solution of all exercises on Respublicae (student Forum). For any question regarding exercises and computer session, please contact Lorenzo Ricci.
5
Date Subject (theory & exercises) September 18 Friday Introduction and Review of Statistics
25 Friday Review of Statistics
26 Saturday Linear Regression October 1 Thursday Review of Statistics October 2 Friday Multiple regression
3 Saturday Multiple regression/ Model evaluation
8 Thursday Linear Regression
9 Friday Nonlinear Regression
15 Thursday Multiple Regression
16 Friday Panel Data
22 Thursday Nonlinear Regression /Model evaluation
23 Friday Instrumental Variable
29 Thursday Panel Data November 5 Thursday Instrumental Variables
13 Friday Instrumental Variable
19 Thursday Instrumental Variable
26 Thursday Instrumental Variables
27 Friday Time series December 3 Thursday Time Series Regression
4 Friday Quasi-experiment/Summary
10 Thursday Quasi-experiment/Summary
6
Computer Sessions (by group):
September 28,29 STATA Introduction and Review of Statistics
October 5,6 Linear Regression/Multiple Regression
October 19,20 Nonlinear Regression and Panel Data
November 30 Instrumental Variable and Time Series
December 2 Regression
• You have to register for the Computer sessions next week
(~ next Tuesday)
• There is an announcement on monulb
• Registration will be held in the office H 3.159. (Nicole Vanderroost’s Office).
7
Organisation of the Course – Final Exam
Structure of the exam:
1. multiple choice questions
2. open question on theoretical exercise
3. open questions on practical exercise
Option 1: Final exam
Option 2: Problem sets + Mid-term exam + Final exam November 6
• We will not give you copies of the old exams • however, there is a 2010 final exam and all the mid-terms on Respublicae
8
Organisation of the Course – Final grade
1) Final Exam (100% of the Grade)
2) The weighted average between i. Problem set (25%),
ii. Mid-term exam (25%), iii. Final Exam (50%).
Important: The final grade will be the best between the two options. Suggestion: Take the mid-term exam and work regularly on the personal work. Statistical evidence based on last years : On average the grade of those that take option 2 is 2 points higher
9
Organisation of the Course – Problem sets
• Written Reports
• STATA is the statistical software used for this Course.
• Reports should be Team work
NO FREE RIDER WITHIN THE TEAM IS ALLOWED
- You can work in groups (maximum 4 people) - but each student must participate actively on the report - More details on the Problem set will be given during the first
computer session
10
Reports and Waivers
• The exam will be at the beginning of January.
• The course is considered successful if the total score is greater than or equal to 10/20.
• Students who are repeating the year are exempted from the course if and only if they have obtained at least 10/20 before.
11
References
J. H. Stock and M. W. Watson, Introduction to Econometrics (third edition), Pearson Education Limited, 2012
Chapter Subject 2-3 Review of Statistics 4-5 Linear Regression 6-7 Multiple Regression 8 Nonlinear Regression 9 Model evaluation
10 Panel Data 12 Instrumental Variable 13 Quasi-experiment 14 Time Series Regression
• I strongly recommend you to buy the book
• Study the book, do not limit your study to reading the slides!!!
12
Brief Overview of the Course Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects. Meaning of econometrics Econometrics is the application of mathematics, statistical methods, and computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations (source : Wikipedia). Econometrics = « Economy » + « metrics »
13
Some questions we might ask ourselves • What is the quantitative effect of reducing class size on
student achievement? • What is the price elasticity of cigarettes?
• What is the effect on output growth of a 1 percentage
point increase in interest rates by the ECB? • How does another year of education change earnings?
14
This course is about using data to measure causal effects • Ideally, we would like an experiment • But almost always we only have observational data
o returns to education o cigarette prices o monetary policy
• Most of the course deals with difficulties arising from using observational to estimate causal effects :
o confounding effects (omitted factors) o simultaneous causality
Remember: “correlation does not imply causation”
15
In this course you will: • Learn methods for estimating causal effects using
observational data
• Learn some tools that can be used for other purposes, for example forecasting using time series data
• Learn to evaluate the regression analysis of others – this
means you will be able to read/understand empirical economics papers in other econ courses
• Get some hands-on experience with regression analysis in your problem sets.
16
A first example of an empirical problem Class size and educational output
Policy question:
What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? by 8 students/class?
Answer : We must use data to find out (is there any way to answer this without data?)
17
The California Test Score Data Set Data: California school districts (n = 420) The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.
Variables:
1. 5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
2. Student-teacher ratio (STR) = number of students in the district divided by number of full-time equivalent teachers
18
Initial look at the data: univariate
This table does not tell us anything about the relationship between test scores (TS) and the Student Teacher Ratio (STR).
19
The California Test Score Data Set
20
Initial look at the data: bivariate Scatterplot of test score versus student-teacher ratio
Do districts with smaller classes have higher test scores?
21
The empirical strategy
We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?
1. Compare average test scores in districts with low STRs to those with high STRs (“estimation”)
2. Test the hypothesis that the mean test scores in the two types of districts are the same, against the hypothesis that they differ (“hypothesis testing”)
3. Estimate an interval for the difference in the mean test scores, high versus low STR districts (“confidence interval”)
22
Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:
Class Size
Average score (Y )
Standard deviation (sY)
n
Small 657.4 19.4 238 Large 650.0 17.9 182
Δ = µsmall – µlarge = E(TS|STR < 20) – E(TS|STR ≥ 20) 1. Estimation of Δ = difference between group means 2. Test the hypothesis that Δ = 0 3. Construct a confidence interval for Δ
23
1. Estimation
small largeY Y− = small
1small
1 n
iiY
n =∑ –
large
1large
1 n
iiY
n =∑
= 657.4 – 650.0 = 7.4
Is this a large difference in a real-world sense? • Standard deviation across districts = 19.1 • Difference between 60th and 75th percentiles of test score
distribution is 667.6 – 659.4 = 8.2 • This is a big enough difference to be important for school
reform discussions, for parents, or for a school committee?
24
2. Hypothesis testing Formulating hypothesis
H0: µsmall – µlarge = 0 H1: µsmall – µlarge ≠ 0
Comparison of means for 2 populations (test at level α) Statistic: compute the t-statistic,
2 2 ( )s l
s l
s l s l
s s s ln n
Y Y Y YtSE Y Y
− −= =
−+
where SE( sY – lY ) is the “standard error” of sY – lY , the subscripts s and l refer to “small” and “large” STR districts,
and 2 2
1
1 ( )1
sn
s i sis
s Y Yn =
= −− ∑
25
2. Hypothesis testing Compute the difference-of-means t-statistic:
Size Y sY n small 657.4 19.4 238 large 650.0 17.9 182
2 2 2 219.4 17.9238 182
657.4 650.0 7.41.83s l
s l
s l
s sn n
Y Yt − −= = =
+ + = 4.05
Distribution under H0 : t ≈ N(0,1) (for n1,n2 > 30) Decision rule : |t| > 1.96, so reject (at the 5% significance level) the null hypothesis that the two means are the same.
26
3.Confidence interval A 95% confidence interval for the difference between the means is,
CI 95%(µsmall – µlarge) = [ ( sY – lY ) ± 1.96 × SE( sY – lY ) ] = [7.4 ± 1.96 × 1.83] = [3.8, 11.0]
Two equivalent statements: 1. The 95% confidence interval for Δ does not include 0 2. The hypothesis that Δ = µsmall – µlarge = 0 is rejected at the
5% level
27
What comes next… • The mechanics of estimation, hypothesis testing, and
confidence intervals should be familiar • These concepts extend directly to regression and its
variants • Before turning to regression, however, we will review
some of the underlying theory of estimation, hypothesis testing, and confidence intervals: o why do these procedures work, and why use these
rather than others? o we will review the intellectual foundations of statistics
and econometrics